I am a third-year Ph.D. student in Computer Science at Fudan University, advised by Prof. Pengfei Liu and Prof. Yu Qiao. My research interests lie at the intersection of reinforcement learning and multimodal models (I am a strong RL believer✊).
In 2025, I built substantial experience in RL for vision-language models, with a focus on RL scaling across single-visual tasks (e.g., MAYE), multi-visual tasks (e.g., One-RL-to-See-Them-All), and agentic vision tool use (e.g., Med). Prior to this, I also worked on unified multimodal models (e.g., ANOLE) and text generation (e.g., MoPS).
My current focus is pre-training for multimodal generative models.
I have interned at MiniMax and HailuoAI (2025.01 - 2026.02), where I contributed to the M-series foundation models and Hailuo video generation models; at Shanghai AI Laboratory (2023.07 - 2025.01), where I focused on text generation and multimodal foundation models; and at Netease Games AI Laboratory (2022.06 - 2022.08), where I contributed to game AI research.

The full list can also be found on my Google Scholar profile, but here you can find links to related material such as code suites.
Website template adapted from joschu.net.