Openings

I am actively seeking self-motivated students with strong mathematical or programming skills for the following positions:
🎓 We offer a special PhD application track for applicants with a bachelor’s degree from a C9 League, UCAS, or QS/THE Top 20 university, featuring an expedited offer process and a much higher scholarship. Eligible applicants are welcome to get in touch with me as soon as possible!

C9 League: THU, PKU, SJTU, FDU, ZJU, NJU, USTC, XJTU, HIT

🎓 We offer a joint PhD program for PhD students from FDU, ZJU, NJU, USTC, XJTU, and HIT. Eligible applicants are welcome to get in touch with me as soon as possible!

If you have a background in or are passionate about both applied and theoretical aspects of the following topics:
News

09/2025 📃 We release our world model work, LatticeWorld, for building an industry-grade 3D world with multimodal LLMs. Check out this demo!

09/2025 🎓 We offer a special PhD application track for applicants with a bachelor’s degree from a C9 League, UCAS, or QS/THE Top 20 university, featuring an expedited offer process and a much higher scholarship. Eligible applicants are welcome to get in touch with me as soon as possible! 📧 (Limited Openings)

09/2025 🎓 We offer a joint PhD program for PhD students from FDU, ZJU, NJU, USTC, XJTU, and HIT. Eligible applicants are welcome to get in touch with me as soon as possible! 📧

09/2025 🎉 I will be serving as an Area Chair for ICLR 2026.

08/2025 🎉 I am honored to receive research funding from the renowned VC MiraclePlus (奇绩创坛) to support our research on Agentic AI and LLMs.

07/2025 🎉 I will be serving as a Senior PC member for AAAI 2026.
Research Interests

My recent research focuses on sequential decision-making problems and their applications in Agentic AI, Large Language Models, and Embodied AI. Specifically, my research interests include both the applications and the theory of reinforcement learning, game theory, large language models, robotics, AI agents, diffusion models, and nonconvex optimization.
Selected Recent Publications [Full List]

[World Model] LatticeWorld: A Multimodal Large Language Model-Empowered Framework for Interactive Complex World Generation
(Preprint) [PDF] [Demo Video]

[World Model] Revisiting Multi-Agent World Modeling from a Diffusion-Inspired Perspective
Advances in Neural Information Processing Systems (NeurIPS), 2025 [PDF]

[LLM Reasoning] Segment Policy Optimization: Effective Segment-Level Credit Assignment in RL for Large Language Models
Advances in Neural Information Processing Systems (NeurIPS), 2025 [PDF] [Code] [Chinese Media Coverage | Highlighted as Lead Story]

[Multi-Objective RL] Traversing Pareto Optimal Policies: Provably Efficient Multi-Objective Reinforcement Learning
(Preprint) [PDF]

[Robust LLM] ROPO: Robust Preference Optimization for Large Language Models
International Conference on Machine Learning (ICML), 2025 [PDF]

[RL] On the Value of Myopic Behavior in Policy Reuse
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025 [PDF]

[LLM] Online Preference Alignment for Language Models via Count-based Exploration
International Conference on Learning Representations (ICLR Spotlight), 2025 [PDF] [Code] [Chinese Media Coverage]

[Robust RL] Tackling Data Corruption in Offline Reinforcement Learning via Sequence Modeling
International Conference on Learning Representations (ICLR), 2025 [PDF]

[RL & Diffusion] Forward KL Regularized Preference Optimization for Aligning Diffusion Policies
AAAI Conference on Artificial Intelligence (AAAI), 2025 [PDF]

[RL & Econ] Learning Dynamic Mechanisms in Unknown Environments: A Reinforcement Learning Approach
Journal of Machine Learning Research (JMLR), 2024 [PDF]

[Multi-Objective LLM] Rewards-in-Context: Multi-Objective Alignment of Foundation Models with Dynamic Preference Adjustment
International Conference on Machine Learning (ICML), 2024 [PDF] [Code]

[Risk-Sensitive RL] Pessimism Meets Risk: Risk-Sensitive Offline Reinforcement Learning
International Conference on Machine Learning (ICML Spotlight), 2024 [PDF]

[Multi-Objective LLM] Arithmetic Control of LLMs for Diverse User Preferences: Directional Preference Alignment with Multi-Objective Rewards
Annual Meeting of the Association for Computational Linguistics (ACL main), 2024 [PDF] [Code]

[RL & Game] Posterior Sampling for Competitive RL: Function Approximation and Partial Observation
Advances in Neural Information Processing Systems (NeurIPS), 2023 [PDF]

[RL] Optimistic Exploration with Learned Features Provably Solves Markov Decision Processes with Neural Dynamics
International Conference on Learning Representations (ICLR), 2023 [PDF]

[Optimization] Gradient-Variation Bound for Online Convex Optimization with Constraints
AAAI Conference on Artificial Intelligence (AAAI), 2023 [PDF]

[RL & Game] Contrastive UCB: Provably Efficient Contrastive Self-Supervised Learning in Online Reinforcement Learning
International Conference on Machine Learning (ICML), 2022 [PDF] [Code]

[Optimization] In-Database Machine Learning with CorgiPile: Stochastic Gradient Descent without Full Data Shuffle
International Conference on Management of Data (SIGMOD), 2022 [PDF] [Extended Version]

[RL & Game] On Reward-Free RL with Kernel and Neural Function Approximations: Single-Agent MDP and Markov Game
International Conference on Machine Learning (ICML), 2021 [PDF]

[RL & Game] Provably Efficient Fictitious Play Policy Optimization for Zero-Sum Markov Games with Structured Transitions
International Conference on Machine Learning (ICML), 2021 [PDF]

[RL] On Finite-Time Convergence of Actor-Critic Algorithm
IEEE Journal on Selected Areas in Information Theory (JSAIT), 2021 [PDF]

[Image Rendering] Stylized Neural Painting
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021 [PDF] [Code] [Project]

[Graph Embedding] Pine: Universal Deep Embedding for Graph Nodes via Partial Permutation Invariant Set Functions
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021 [PDF]

[Safe RL] Upper Confidence Primal-Dual Reinforcement Learning for CMDP with Adversarial Loss
Advances in Neural Information Processing Systems (NeurIPS), 2020 [PDF]

[Compressed Sensing] Robust One-Bit Recovery via ReLU Generative Networks: Near-Optimal Statistical Rate and Global Landscape Analysis
International Conference on Machine Learning (ICML), 2020 [PDF]
Grant
Academic Service

Conference