Yan Ma's Homepage

I am a third-year Ph.D. student in Computer Science at Fudan University, advised by Prof. Pengfei Liu and Prof. Yu Qiao. My research interests lie at the intersection of reinforcement learning and multimodal models (I am a strong RL believer✊).

In 2025, I built substantial experience in RL for vision-language models, with a focus on RL scaling across single-visual tasks (e.g., MAYE), multi-visual tasks (e.g., One-RL-to-See-Them-All), and agentic vision tool use (e.g., Med). Prior to this, I also worked on unified multimodal models (e.g., ANOLE) and text generation (e.g., MoPS).

My current focus is pre-training for multimodal generative models.

I have interned at MiniMax and HailuoAI (2025.01 - 2026.02), where I contributed to the M-series foundation models and Hailuo video generation models; at Shanghai AI Laboratory (2023.07 - 2025.01), where I focused on text generation and multimodal foundation models; and at Netease Games AI Laboratory (2022.06 - 2022.08), where I contributed to Game AI research.

News

Feb 2026 — 🔍 My paper "What Does Vision Tool-Use Reinforcement Learning Really Learn? Disentangling Tool-Induced and Intrinsic Effects for Crop-and-Zoom" is out on arXiv! [Paper] [Homepage] [Code] [Huggingface]
Jun 2025 — 📘 A collaborative survey "Thinking with images for multimodal reasoning: Foundations, methods, and future frontiers" is out on arXiv! [Paper]
May 2025 — 🤖 My paper "One RL to See Them All: Visual Triple Unified Reinforcement Learning" is out on arXiv! [Paper] [Code]
May 2025 — 🖼️ Our work "Thinking with Generated Images" is now available on arXiv! [Paper] [Code]
Apr 2025 — 🚀 Our new survey "Test Time Scaling Drives Cognition Engineering" is out on arXiv! 🔥 [Paper]
Apr 2025 — 📝 My paper "Rethinking RL Scaling for Vision Language Models" is out on arXiv! [Paper] 🎯
Jan 2025 — 🧠 The extended blog version of ANOLE was accepted to the ICLR 2025 Blog Track! [Blog] [Poster]
Sep 2024 — 🏟 "OlympicArena" accepted to NeurIPS 2024 Dataset & Benchmark Track. [Paper]
Sep 2024 — 💡 "Weak-to-strong reasoning" accepted to EMNLP 2024 Findings. [Paper]
Jul 2024 — 🐍 We released ANOLE, our open-source autoregressive vision-language model! [arXiv] / [Code] ⭐️700+
Jun 2024 — 🧹 My paper "MoPS: Modular Story Premise Synthesis" accepted to ACL 2024! [Paper]
Feb 2023 — ✨ Our AAAI 2023 paper on cross-domain adaptation is out! [Paper]
Dec 2022 — 🎤 Presented "Evolutionary Action Selection" at ICONIP 2022 (Oral). [Paper]

Selected Publications

The full listing can be found on my Google Scholar Profile. Here you can also find links to related material such as code.
