"au:"Florian Dorner"" — arXiv2 Search

Showing 1–14 of 14 results


Oct 17, 2024 - Limits to scalable evaluation at the frontier: LLM as Judge won't beat twice the data
Feb 3, 2024 - Don't Label Twice: Quantity Beats Quality when Comparing Binary Classifiers on a Budget
May 25, 2023 - Incentivizing Honesty among Competitors in Collaborative Learning and Optimization
Dec 20, 2022 - Human-Guided Fair Classification for Natural Language Processing
Jul 16, 2025 - ROC-n-reroll: How verifier imperfection affects test-time scaling
Oct 10, 2021 - Algorithmic collusion: A critical review
Feb 9, 2021 - Measuring Progress in Deep Reinforcement Learning Sample Efficiency
Aug 6, 2018 - Melting Si: beyond density functional theory
Aug 4, 2020 - Forecasting AI Progress: A Research Agenda
Jun 9, 2025 - How Benchmark Prediction from Fewer Data Misses the Mark
Jul 10, 2024 - Training on the Test Task Confounds Evaluation and Emergence
Nov 9, 2023 - Challenging the Validity of Personality Tests for Large Language Models
Jul 30, 2025 - Stop Evaluating AI with Human Tests, Develop Principled, AI-specific Tests instead
Jun 9, 2024 - Whose Preferences? Differences in Fairness Preferences and Their Impact on the Fairness of AI Utilizing Human Feedback