|
Chunhui Zhang
Greetings! I am Chunhui Zhang (pronounced as Ch'un-hui Chang / εΌ΅ζ₯ζ). I am a Ph.D. candidate in Computer Science at Dartmouth π², where I have the great fortune of working with π Professor Soroush Vosoughi. From June to September of 2025, I was a research intern at Google DeepMind, Mountain View, CA.
I obtained my MSCS degree (research-based) with the GSAS Fellowship at Brandeis University. My research journey began at Northeastern University, where I obtained a Bachelor's degree in CS and received the Outstanding Honor Thesis Award.
 / 
 / 
 / 
 / 
CV  / 
Google Scholar
|
|
|
Research
My work focuses on scaling modular cognitive architectures of multimodal models, particularly supporting their memory capacity and long-range behaviors. I specialize in optimizing training pipelines and accelerating RL systems for complex multimodal and agentic LLM training.
|
|
News
[Aug 21, 2025] Three papers accepted to EMNLP 2025. Thanks to my excellent collaborators. See you in Suzhou!
[Jun 2025] Started as a research intern at Google DeepMind, Mountain View, CA. Excited to meet friends in the Bay Area!
[May 15, 2025] Four papers accepted at ACL 2025 (one selected as oral presentation). Thanks to my excellent collaborators. See you in Vienna!
[May 01, 2025] Spotlight at ICML 2025 (Top 2.59%). Thanks to my excellent collaborators. Happy to revisit Vancouver!
[Apr 23, 2025] Released audio-long-form-reasoner on GitHub. First to implement Qwen2.5-Omni and vLLM for faster RL reasoning across audio and other unified modalities.
[Jan 22, 2025] Three papers accepted to NAACL 2025. Thanks to my excellent collaborators. See you in Albuquerque!
|
|
β
Overcoming Multi-step Complexity in Theory-of-Mind Reasoning: A Scalable Bayesian Planner
Chunhui Zhang, Zhongyu Ouyang, Kwonjoon Lee, Nakul Agarwal, Sean Dae Houlihan, Soroush Vosoughi, Shao-Yuan Lo
ICML 2025 β Spotlight (Top 2.59%)
Code / Paper
First scalable solution for multi-step Theory-of-Mind reasoning. Uses Bayesian inverse planning for global planning, then lets LLMs focus on local reasoning. Works on 70B+ models where others fail.
|
|
β
Growing Through Experience: Scaling Episodic Grounding in Language Models
Chunhui Zhang, Sirui Wang, Zhongyu Ouyang, Xiangchi Yuan, Soroush Vosoughi
ACL 2025 β Oral Presentation (Top 3.24%)
Code / Paper
Post-trained reasoning LLMs (across 3B, 8B, 70B) on MCTS-sampled data from physical simulators.
|
|
β
Pretrained Image-Text Models are Secretly Video Captioners
Chunhui Zhang*, Yiren Jian*, Zhongyu Ouyang, Soroush Vosoughi
NAACL 2025 Main Conference β Oral Presentation (Top 2.88%)
Code / Paper / Slides
RL post-training recipe that achieved Top-2 on PapersWithCode video captioning leaderboard, outperforming industry MLLM video captioners.
|
|
β
Knowing More, Acting Better: Hierarchical Representation for Embodied Decision-Making
Chunhui Zhang, Zhongyu Ouyang, Xingjian Diao, Zheyuan Liu, Soroush Vosoughi
EMNLP 2025 Findings
Paper
Refines vision-language-action LLM representations to enable more effective PPO-based RL training in embodied AI.
|
|
Working Memory Identifies Reasoning Limits in Language Models
Chunhui Zhang, Yiren Jian, Zhongyu Ouyang, Soroush Vosoughi
EMNLP 2024
Code / Paper
Introduced working memory as a diagnostic tool for LLM reasoning limits. This work inspired a follow-up NAACL paper on long-context multimodal understanding.
|
|
Temporal Working Memory: Query-Guided Segment Refinement for Enhanced Multimodal Understanding
Xingjian Diao*, Chunhui Zhang*, Weiyi Wu, Zhongyu Ouyang, Peijun Qing, Ming Cheng, Soroush Vosoughi, Jiang Gui
NAACL 2025 Findings
Code / Paper
Inspired by working memory in my EMNLP 2024 paper, this work is a follow-up study on long-context video-language understanding.
|
|
SoundMind: RL-Incentivized Logic Reasoning for Audio-Language Models
Xingjian Diao, Chunhui Zhang, Keyi Kong, Weiyi Wu, Chiyu Ma, Zhongyu Ouyang, Peijun Qing, Soroush Vosoughi, Jiang Gui
EMNLP 2025 β Oral Presentation
Paper
|
|
Superficial Self-Improved Reasoners Benefit from Model Merging
Xiangchi Yuan, Chunhui Zhang, Zheyuan Liu, Dachuan Shi, Soroush Vosoughi, Wenke Lee
EMNLP 2025
Code / Paper
|
|
Expedited Training of Visual Conditioned Language Generation via Redundancy Reduction
Yiren Jian, Tingkai Liu, Yunzhe Tao, Chunhui Zhang, Soroush Vosoughi, Hongxia Yang
ACL 2024 β Oral Presentation (Top 3.10%)
Code / Paper
|
|
When Sparsity Meets Contrastive Models: Less Data Can Bring Better Class-Balanced Representations
Chunhui Zhang, Chao Huang, Yijun Tian, Qianlong Wen, Zhongyu Ouyang, Youhuan Li, Yanfang Ye, et al.
ICML 2023 β AAAI-DCAA 2023 Best Paper Runner-up Award
Paper
|
|
Chasing All-Round Graph Representation Robustness: Model, Training, and Optimization
Chunhui Zhang, Yijun Tian, Mingxuan Ju, Zheyuan Liu, Yanfang Ye, Nitesh Chawla, et al.
ICLR 2023
Code / Paper
|
|
Aligning Relational Learning with Lipschitz Fairness
Yaning Jia*, Chunhui Zhang*, Soroush Vosoughi
ICLR 2024 (*co-first author)
Code / Paper
|
|
Mitigating Emergent Robustness Degradation on Graphs while Scaling Up
Xiangchi Yuan*, Chunhui Zhang*, Yijun Tian, Yanfang Ye, et al.
ICLR 2024 (*co-first author)
Code / Paper
|
|
To All Students/Friends
Feel free to drop me an email if you're up for a laid-back chat about life, career, or research. I'm dedicating (at least) 30 minutes each week for these discussions, and I'm particularly eager to connect with students from underrepresented backgrounds or those dealing with challenges or inequity. Just reach outβI'm here to listen!
|
|
Fun Stuff
Racingβa happy part of my life. I particularly enjoy go-karting and circuit racing (some fun facts:
1st and 2nd place at Supercharged). But there is one type of racing that I have yet to tryβmy favorite rally driving (My favorite rally driver is Han Han).
|
|