About Me

I am a sophomore undergraduate majoring in Artificial Intelligence at Yuanpei College, Peking University.

I am fortunate to be advised by Professor Yaodong Yang at the Institute for AI, Peking University.

My research interests cover Alignment and Interaction (e.g., Scalable Oversight, which is essential to the safety of advanced AI systems). I am also interested in Game Theory and Multi-Agent Systems. My current research focuses on constructing safe and trustworthy AI systems. You can find my research statement here.

My answer to the Hamming question (“What are the most important problems [that you should probably work on]?”):

  • How can we align systems smarter than humans, and supervise them on tasks that are too challenging for humans to evaluate? (i.e., scalable oversight)
  • How can we integrate theory and experimental validation to embed moral values into AI systems (e.g., moral reflection and moral progress), and address the AI alignment problem from a socio-technical perspective?

I have only just set out on this long road, and I intend to use my youth and curiosity to make the most of every opportunity to explore these problems in depth.


News

  • (06/2024)🎉 We introduce the PKU-SafeRLHF dataset, designed to promote research on safety alignment in LLMs.
  • (06/2024)🎙️ Happy to introduce our new work on the elasticity of LLMs. Click here for further details.
  • (04/2024)🎊 Our work, BeaverTails, has been recognized by Meta, further contributing to AI safety research.
  • (03/2024)💥 Our alignment survey has been recognized by NIST! More details.
  • (03/2024)🚀 We have made significant updates to the alignment survey (V4)!
  • (02/2024)💥 We release Aligner, a new efficient alignment paradigm that bypasses the entire RLHF pipeline.
  • (11/2023)🎙️ I was honored to give a talk about our alignment survey!
  • (11/2023)🚀 We release the AI Alignment Survey and the Alignment Resource Website. Further discussion is welcome!


Publications

  • (Under Review) Language Models Resist Alignment
    Jiaming Ji*, Kaile Wang*, Tianyi Qiu*, Boyuan Chen*, Jiayi Zhou, Changye Li, Hantao Lou, and Yaodong Yang
  • (Under Review) Aligner: Achieving Efficient Alignment through Weak-to-Strong Correction
    Jiaming Ji*, Boyuan Chen*, Hantao Lou, Donghai Hong, Borong Zhang, Xuehai Pan, Juntao Dai, and Yaodong Yang
    📄[Paper] 🌐[Website] 🌟[Media]
  • (Under Review) PKU-SafeRLHF: A Safety Alignment Preference Dataset for Llama Family Models
    🤗[Dataset] 🌐[Website]
  • (Under Review) Efficient Model-agnostic Alignment via Bayesian Persuasion
  • (Preprint) AI Alignment: A Comprehensive Survey
    Jiaming Ji*, Tianyi Qiu*, Boyuan Chen*, Borong Zhang*, Hantao Lou, Kaile Wang, Yawen Duan, Zhonghao He, Jiayi Zhou, Zhaowei Zhang, Fanzhi Zeng, Kwan Yee Ng, Juntao Dai, Xuehai Pan, Aidan O'Gara, Yingshan Lei, Hua Xu, Brian Tse, Jie Fu, Stephen McAleer, Yaodong Yang, Yizhou Wang, Song-Chun Zhu, Yike Guo, and Wen Gao
    📄[Paper] 🌐[Website] 🎥[Video] 🌟[PKU-Alignment Group]

  • (NeurIPS 2023) BEAVERTAILS: Towards Improved Safety Alignment of LLM via a Human-Preference Dataset
    Jiaming Ji*, Mickel Liu*, Juntao Dai*, Xuehai Pan, Chi Zhang, Ce Bian, Boyuan Chen, Ruiyang Sun, Yizhou Wang, Yaodong Yang
    📄[Paper] 🌐[Website]


Selected Awards

  • 2024: SenseTime Scholarship (25 recipients per year nationwide, ¥20,000)
  • 2024: Yicong Huang Scholarship (research innovation award, ¥8,000)
  • 2024: Research Excellence Award (¥5,000)
  • 2024: Ching-Ling Soong Future Scholarship (¥5,000)
  • 2023: Yicong Huang Scholarship (¥8,000)
  • 2023: Peking University Merit Student (top 2%)
  • 2023: Peking University Third Prize Scholarship (¥4,000)
  • 2023: Peking University Public Service Scholarship (¥2,000)
  • 2022: Peking University Freshman Scholarship (¥10,000)