• 2025.11:  🌟 We are thrilled to announce the release of the first comprehensive AI Deception Report! Our work has been recognized and referenced by the UN Secretary-General's Scientific Advisory Board (UN SAB), and has received praise from leading scholars Yoshua Bengio and Stuart Russell.
  • 2025.09:  🎉 Three papers have been accepted to NeurIPS 2025! Among them, InterMT was selected as a Spotlight (Top 2.6%).
  • 2025.09:  🎉 Our work AI Alignment: A Comprehensive Survey has been accepted by ACM Computing Surveys (Impact Factor: 28.0, ranked 1/147 in Computer Science Theory & Methods)!
  • 2025.09:  🎙️ We are excited to announce our latest work, “Shadows of Intelligence: A Comprehensive Survey of AI Deception.” For more details, please visit here.
  • 2025.07:  🎉 Our work Language Models Resist Alignment has been awarded the ACL 2025 Best Paper!
  • 2025.06:  🎊 Our work MedAligner has been accepted by The Innovation (IF: 33.2)! MedAligner demonstrates the potential of Aligner (our NeurIPS 2024 oral work) in the medical domain.
  • 2025.06:  🎉 Two papers have been accepted to the ACL 2025 main conference.
  • 2025.05:  🎉 We open-source InterMT, the first multi-turn multimodal understanding and generation human preference dataset. Discussion and collaboration are welcome!
More news
  • 2025.01:  🎉 We release Align-DS-V, the first multimodal strong reasoning model.
  • 2024.10:  💥 We open-source Align-Anything, the first all-modality alignment framework!
  • 2024.09:  💥 Aligner has been accepted as an Oral presentation at NeurIPS 2024!
  • 2024.06:  🎉 We introduce the PKU-SafeRLHF dataset, designed to promote research on safety alignment in LLMs.
  • 2024.06:  🎙️ Happy to introduce our new work on the elasticity of LLMs. Click here for further details.
  • 2024.04:  🎊 Our work BeaverTails has been recognized by Meta, further contributing to AI safety research.
  • 2024.03:  💥 Our alignment survey has been recognized by NIST! More details.
  • 2024.03:  🚀 We have made significant updates to the alignment survey (V4)!
  • 2024.02:  💥 We release Aligner, a new efficient alignment paradigm that bypasses the whole RLHF process.
  • 2023.11:  🚀 We release the AI Alignment Survey and the Alignment Resource Website. Further discussion is welcome!