Hi there! I am Hao Li (ζζ in Chinese), a final-year Ph.D. candidate in the School of Computer Science at Peking University, advised by Prof. Li Yuan and Prof. Yonghong Tian. Before that, I received my Bachelor's degree in Computer Science from Peking University, graduating summa cum laude.
My research interests include multimodal learning, visual understanding, and AI for chemical science. I have published more than 20 papers at top international AI conferences, with 2000+ total Google Scholar citations.
News
- 2026.05: Our MHE-LLama-8B for unified molecule understanding & generation via LLM has been accepted by Nature Communications.
- 2025.06: We release ChemCoTBench, the first LLM benchmark evaluating step-wise reasoning on complex chemical tasks, which has been accepted by NeurIPS-2025.
- 2025.06: Our SUR-LID for Face Forgery Detection was accepted by CVPR-2025.
- 2024.12: Our ECDFormer for Efficient Spectra Prediction was accepted by Nature Computational Science.
- 2024.08: Two papers accepted by The 18th European Conference on Computer Vision (ECCV-2024).
- 2023.08: One paper accepted by The International Conference on Computer Vision (ICCV-2023).
- 2023.06: One paper accepted by Transactions on Image Processing (TIP).
- 2023.04: Three papers accepted by The International Joint Conference on Artificial Intelligence (IJCAI-2023).
- 2022.08: One oral paper accepted by The International Conference on Multimedia and Expo (ICME-2022-Oral).
Selected Publications

Navigating Chemical-Linguistic Sharing Space with Heterogeneous Molecular Encoding
Hao Li, Liuzhenghao Lv, Yu Wang, Zijun Chen, Yuyang Liu, Li Yuan†, Yonghong Tian†
- Official Homepage for MHE, a Nature Communications work on heterogeneous molecular encoding for chemical language models. MHE bridges molecular structures and natural language by unifying sequence, topology, geometry, and fragment-level information, supporting bidirectional molecular understanding and generation across chemical-linguistic space.

Beyond Chemical QA: Evaluating LLM's Chemical Reasoning with Modular Chemical Operations
Hao Li, He Cao, Bin Feng, Yanjun Shao, Xiangru Tang, Zhiyuan Yan, Li Yuan†, Yonghong Tian†, Yu Li†
- Official Homepage for ChemCoTBench, the first step-wise reasoning benchmark for LLMs in the chemical domain, covering molecular understanding, editing, optimization, and reaction-related predictions. We contribute an expert-annotated benchmark dataset of 1.5K samples, along with a high-quality SFT-RL reasoning dataset comprising 22K instances.

Hierarchical Banzhaf interaction for general video-language representation learning
Hao Li, Peng Jin, Shuicheng Yan, Li Yuan†, Jie Chen†
- Official Homepage for HBI, a new approach that models video and text as game players using multivariate cooperative game theory, handling the uncertainty in fine-grained semantic interactions that have diverse granularity, flexible combination, and vague intensity.

FreestyleRet: Retrieving Images from Style-Diversified Queries
Hao Li, Yanhao Jia, Peng Jin, Zesen Cheng, Kehan Li, Jialu Sui, Chang Liu, Li Yuan
- Official Code for the FreestyleRet framework. Official Release for the Diversified-Style Retrieval Dataset (DSR).

Decoupled peak property learning for efficient and interpretable ECD spectra prediction
Hao Li, Da Long, Li Yuan, Yu Wang, Yonghong Tian, Xinchang Wang, Fanyang Mo
- Existing predictive approaches rarely consider ECD spectra because of data scarcity, and they lack the interpretability needed for trustworthy prediction. Here, we establish a large-scale dataset for Chiral Molecular ECD spectra (CMCDS) and propose ECDFormer for accurate and interpretable ECD spectra prediction. ECDFormer decomposes ECD spectra into peak entities, employs the QFormer architecture to learn peak properties, and renders peaks into spectra. Compared to spectra sequence prediction methods, our decoupled peak prediction approach substantially enhances both accuracy and efficiency, improving the peak symbol accuracy from 37.3% to 72.7% and decreasing the time cost from an average of 4.6 CPU hours to 1.5 seconds.

DiffusionRet: Generative text-video retrieval with diffusion model
Hao Li, Peng Jin, Zesen Cheng, Kehan Li, Chang Liu, Li Yuan, Jie Chen
- Official Code for the DiffusionRet model. We propose DiffusionRet, a generative text-video retrieval framework that models retrieval as generating joint distributions from noise. Unlike standard methods, it excels not only on standard benchmarks but also in retrieving out-of-distribution videos, demonstrating superior generalization.
Honors and Awards
- 2025.9 National Scholarship for Doctoral Students (Top-3), Peking University
- 2023.9 Hongqiao Scholarship (Top 1%), Peking University
Education
- 2017.09 - 2021.06, Bachelor, School of Electronics Engineering and Computer Science (EECS), Peking University.
- 2021.09 - 2023.09, Master, School of Electronics and Computer Engineering (ECE), Peking University.
- 2023.09 - 2026.07, PhD, School of Computer Science, Peking University.
- 2026.06 - present, Senior Engineer, Qwen Post-Training Team.
Internships
- 2020.07 - 2021.09, Mentored by Xu Li, Cognitive Computing Lab, Baidu Research, Beijing, China.
- 2022.07 - 2023.02, Mentored by Songyang Zhang, OpenMMLab, Shanghai AI Lab, Shanghai, China.
- 2024.12 - present, Mentored by Yu Li, International Digital Economy Academy, Shenzhen, China.