- 2025.11: 🌟 We are thrilled to announce the release of the first comprehensive AI Deception Report! Our work has been recognized and referenced by the UN Secretary-General's Scientific Advisory Board (UN SAB), and has received praise from leading scholars Yoshua Bengio and Stuart Russell.
- 2025.09: 🎉 Three papers have been accepted to NeurIPS 2025! Among them, InterMT was selected as a Spotlight (Top 2.6%).
- 2025.09: 🎉 Our work AI Alignment: A Comprehensive Survey has been accepted by ACM Computing Surveys (Impact Factor: 28.0, ranked 1/147 in Computer Science Theory & Methods)!
- 2025.09: 🎙️ We are excited to announce our latest work, “Shadows of Intelligence: A Comprehensive Survey of AI Deception.” For more details, please visit here.
- 2025.07: 🎉 Our work Language Models Resist Alignment has been awarded the ACL 2025 Best Paper!
- 2025.06: 🎊 Our work MedAligner has been accepted by The Innovation (IF: 33.2)! MedAligner demonstrates the potential of Aligner (our NeurIPS 2024 oral work) in the medical domain.
- 2025.06: 🎉 Two papers have been accepted to the ACL 2025 Main Conference.
- 2025.05: 🎉 We open-source InterMT, the first human preference dataset for multi-turn multimodal understanding and generation. We welcome discussion and collaboration!
More news
- 2025.01: 🎉 We release Align-DS-V, the first multimodal strong-reasoning model.
- 2024.10: 💥 We open-source the first all-modality alignment framework - Align-Anything!
- 2024.09: 💥 Aligner has been accepted as an Oral presentation at NeurIPS 2024!
- 2024.06: 🎉 We introduce the PKU-SafeRLHF dataset, designed to promote research on safety alignment in LLMs.
- 2024.06: 🎙️ We are happy to introduce our new work on the elasticity of LLMs. Click here for further details.
- 2024.04: 🎊 Our work BeaverTails has been recognized by Meta, further contributing to AI safety research.
- 2024.03: 💥 Our alignment survey has been recognized by NIST! More details.
- 2024.03: 🚀 We have made significant updates to the alignment survey (V4)!
- 2024.02: 💥 We release Aligner: a new, efficient alignment paradigm that bypasses the whole RLHF process.
- 2023.11: 🚀 We release the AI Alignment Survey and the Alignment Resource Website. We welcome further discussion!