Zheqing (Bill) Zhu Home Page

Zheqing (Bill) Zhu
朱哲清

Founder and CEO, Pokee AI (pokee.ai)

ex-Senior Staff Research Lead Manager
ex-Head of Applied Reinforcement Learning
Meta AI

Stanford PhD in Reinforcement Learning

LinkedIn: https://www.linkedin.com/in/zheqingzhubill/
Twitter: https://twitter.com/ZheqingZhu

Contact:
billzhu@pokee.ai (For industry-related inquiries)
zheqzhu@alumni.stanford.edu (For academic-related inquiries)

Zheqing (Bill) Zhu is the Founder and CEO of Pokee AI and the former head of the Applied Reinforcement Learning team at Meta AI. Bill leads Pokee AI in building next-generation foundation AI agents that excel in reasoning, planning, and tool usage. Pokee AI's foundation tool usage model surpasses GPT-4o, Claude 3.7 and Gemini 2.5 Pro in function calling by a large margin and can easily extend to more than 6000 tools. So far, Pokee AI has secured 12M USD in seed funding from Point72 Ventures, with participation from Qualcomm Ventures and Samsung NEXT, among others, as well as top angels such as Lip-Bu Tan (CEO of Intel) and Abhay Parasnis (Founder of Typeface and ex-CTO of Adobe).

Before Pokee AI, Bill was a Senior Staff Research Lead Manager at Meta AI, where he served as Head of the Applied Reinforcement Learning team. He led the development and open-sourcing of Pearl, Meta's flagship reinforcement learning training platform for production use cases, and deployed RL models across ads, recommender systems, and Reality Labs, realizing more than 500M USD of annual revenue. Prior to serving as Head of Applied Reinforcement Learning, he was the engineering manager and tech lead for Meta’s Ads Growth Machine Learning team, where he built the first advertiser growth AI product and grew Meta's active advertisers from 2M to 12M.

Bill earned his PhD in Reinforcement Learning at Stanford University, advised by Professor Benjamin Van Roy, while working full-time at Meta AI leading the Applied Reinforcement Learning team. His main research focus is understanding the theoretical and practical gaps in existing reinforcement learning algorithms in real-world contexts. He received a Master of Science in Computer Science from Stanford University (also while full-time at Meta AI) and a Bachelor of Science in Computer Science with a Minor in Finance, summa cum laude, from Duke University. His work has been recognized as a Meta AI NeurIPS 2023 highlight and a CMO highlight launch, and with multiple Win of the Month awards at Meta. He is a recipient of the Alex Vasilos Memorial Award, the Highest Distinction Graduate Award from Duke University, and the Ericsson BUSS Shanghai Quarterly Technical Award. His publications have appeared in top venues including JMLR, ICML, ICLR, KDD, Machine Learning, RecSys, ICRA, IROS and more.

Professional Experience

Founder and CEO, Pokee AI, 2024 - now
Senior Staff Research Lead Manager - Head of Applied Reinforcement Learning, Facebook (Meta) AI, 2021 - 2024
Engineering Manager / Tech Lead, Ads Growth Machine Learning, Facebook (Meta), 2018 - 2021
Machine Learning Engineer, Ads Growth Machine Learning, Facebook (Meta), 2017 - 2018

Selected Publicly Available Research

Improving Generative Ad Text on Facebook using Reinforcement Learning
ArXiv Link
Daniel R Jiang, Alex Nikulkov, Yu-Chia Chen, Yang Bai, Zheqing Zhu
Aligned Multi-Objective Optimization
ArXiv Link, ICML 2025
Yonathan Efroni, Daniel Jiang, Ben Kretzu, Jalaj Bhandari, Zheqing Zhu, Karen Ullrich
Exploiting Structure in Offline Multi-Agent RL: The Benefits of Low Interaction Rank
ArXiv Link, ICLR 2025
Wenhao Zhan, Scott Fujimoto, Zheqing Zhu, Jason D. Lee, Daniel R. Jiang, Yonathan Efroni
Pearl - A Production-ready Reinforcement Learning Agent
ArXiv Link, Website, GitHub Repo, JMLR
Zheqing Zhu, Rodrigo de Salvo Braz, Jalaj Bhandari, Daniel Jiang, Yi Wan, Yonathan Efroni, Ruiyang Xu, Liyuan Wang, Hongbo Guo, Alex Nikulkov, Dmytro Korenkevych, Urun Dogan, Frank Cheng, Zheng Wu, Wanqiao Xu
Non-Stationary Contextual Bandit Learning via Neural Predictive Ensemble Sampling
ArXiv Link, Presented at INFORMS 2023
Zheqing Zhu, Yueyang Liu, Xu Kuang, Benjamin Van Roy
Offline Reinforcement Learning for Optimizing Production Bidding Policies
ArXiv Link, KDD 2024
Dmytro Korenkevych, Frank Cheng, Artsiom Balakir, Alex Nikulkov, Zhihao Cen, Zuobing Xu, Zheqing Zhu
Optimizing Long-term Value for Auction-Based Recommender Systems via On-Policy Reinforcement Learning.
ArXiv Link, RecSys 2023 (Also presented at KDD Workshop 2023)
Ruiyang Xu*, Jalaj Bhandari*, Dmytro Korenkevych, Fan Liu, Yuchen He, Alex Nikulkov, Zheqing Zhu
Deep Exploration for Recommendation Systems.
ArXiv Link, RecSys 2023
Zheqing Zhu, Benjamin Van Roy
Scalable Neural Contextual Bandit for Recommender Systems.
ArXiv Link, CIKM 2023 (also presented at KDD Workshop 2023)
Zheqing Zhu, Benjamin Van Roy
Learning to Bid and Rank Together in Recommendation Systems.
(ArXiv Coming Soon), Springer Machine Learning Journal
Geng Ji, Wentao Jiang, Jiang Li, Fahmid Morshed Fahid, Zhengxing Chen, Yinghua Li, Jun Xiao, Chongxi Bao, Zheqing Zhu
IQL-TD-MPC: Implicit Q-Learning for Hierarchical Model Predictive Control.
ArXiv Link, ICRA 2024 (also presented at ICML Workshop 2023)
Rohan Chitnis*, Yingchen Xu*, Bobak Hashemi, Lucas Lehnert, Urun Dogan, Zheqing Zhu, Olivier Delalleau

Evaluating Online Bandit Exploration In Large-Scale Recommender System.
ArXiv Link, KDD Workshop on Multi-Armed Bandits and Reinforcement Learning: Advancing Decision Making in E-Commerce and Beyond, 2023
Hongbo Guo, Ruben Naeff, Alex Nikulkov, Zheqing Zhu
Two-tiered Online Optimization of Region-wide Datacenter Resource Allocation via Deep Reinforcement Learning.
ArXiv Link, Submitted to CoNEXT, 2023
Chang-Lin Chen, Hanhan Zhou, Jiayu Chen, Mohammad Pedramfar, Vaneet Aggarwal, Tian Lan, Zheqing Zhu, Chi Zhou, Tim Gasser, Pol Mauri Ruiz, Vijay Menon, Neeraj Kumar, Hongbo Dong
Multi-Agent Safe Planning with Gaussian Processes.
ArXiv Link, IROS 2020
Zheqing Zhu, Erdem Biyik, Dorsa Sadigh

Education

PhD, Reinforcement Learning, Stanford University, Advisor: Benjamin Van Roy, 2023 (Completed while leading the Applied Reinforcement Learning team at Meta AI)
MS, Computer Science, Stanford University, 2020 (Completed while working full-time at Meta AI)
BS, Computer Science, summa cum laude, Duke University, Advisor: Ronald Parr, 2017

Honors

Company-wide NeurIPS AI highlight - Pearl open-source RL library, Meta AI, 2023
CMO's Highlight Launch List, Facebook (Meta), 2021
Stanford Graduate Student Fellowship, 2020
Win of Month / Win of Quarter, Ads Growth, Facebook (Meta), 2017-2021 (Multi-time Winner)
Alex Vasilos Memorial Award, Duke University, 2017
Graduate with Highest Distinction, Duke University, 2017
Ericsson BUSS Shanghai Quarterly Technical Award, 2015

Community Services

Workshop Chair of AAAI 2023 Reinforcement Learning Ready for Production Workshop
Workshop Advisor of RLC 2024 Deployable Reinforcement Learning Workshop
Reviewer: NeurIPS, AAAI, MLJ, ICLR, KDD

Invited Talks

Reinforcement Learning and AI Agents, World AI Conference (upcoming), 2025
Reinforcement Learning and AI Agents, Future Builderz, 2025
The Year of Breakout for RL Agents, Jinqiu Capital, 2025
Production-Ready Reinforcement Learning Platform - Pearl, NeurIPS 2023
Reinforcement Learning for Recommender Systems, Netflix Research, 2023
Reinforcement Learning for Recommender Systems, DataFunSummit, 2023
Deep Exploration for Recommendation Systems, University of Chinese Academy of Sciences, 2023
Deep Machine Learning Panel, ML Summit San Francisco, 2019
Deep Reinforcement Learning Applications, Shanshu.ai, 2019
