Feb 12, 2024: Score-based Diffusion Models via Stochastic Differential Equations -- a Technical Tutorial
Oct 5, 2024: RainbowPO: A Unified Framework for Combining Improvements in Preference Optimization
Mar 13, 2025: RPO: Fine-Tuning Visual Generative Models via Rich Vision-Language Preferences
May 30, 2023: Policy Optimization for Continuous Reinforcement Learning
Jan 23, 2024: Contractive Diffusion Probabilistic Models
Oct 2, 2025: DiFFPO: Training Diffusion LLMs to Reason Fast and Furious via Reinforcement Learning
Feb 3, 2025: Score as Action: Fine-Tuning Diffusion Generative Models by Continuous-time Reinforcement Learning
Sep 12, 2024: Scores as Actions: a framework of fine-tuning diffusion models by continuous-time reinforcement learning
May 23, 2024: MallowsPO: Fine-Tune Your LLM with Preference Dispersions
May 19, 2025: R3: Robust Rubric-Agnostic Reward Models
Oct 12, 2025: Understanding Sampler Stochasticity in Training Diffusion Models for RLHF
Oct 16, 2024: WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines
Jun 2, 2025: Datasheets Aren't Enough: DataRubrics for Automated Quality Metrics and Accountability
Nov 1, 2025: SOCRATES: Simulation Optimization with Correlated Replicas and Adaptive Trajectory Evaluations
Sep 17, 2024: Preference Tuning with Human Feedback on Language, Speech, and Vision Tasks: A Survey