arXiv2
Search results for au:"Eli Tran-Johnson" (arXiv2 Search)
Showing 1–7 of 7 results
Nov 4, 2022
Measuring Progress on Scalable Oversight for Large Language Models
Feb 15, 2023
The Capacity for Moral Self-Correction in Large Language Models
Aug 23, 2022
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Dec 15, 2022
Constitutional AI: Harmlessness from AI Feedback
Oct 20, 2023
Specific versus General Principles for Constitutional AI
Jul 11, 2022
Language Models (Mostly) Know What They Know
Dec 19, 2022
Discovering Language Model Behaviors with Model-Written Evaluations