RPO: Fine-Tuning Visual Generative Models via Rich Vision-Language Preferences
cs.LG
/ Abstract
Traditional preference-tuning methods for LLMs and visual generative models often rely solely on reward-model labels, which are opaque, offer limited insight into the rationale behind preferences, and are prone to issues such as reward hacking and overfitting. We introduce Rich Preference Optimization (RPO), a novel pipeline that leverages rich feedback signals from Vision-Language Models (VLMs) to improve the curation of preference pairs for fine-tuning visual generative models such as text-to-image diffusion models. Our approach begins by prompting VLMs to generate detailed critiques of synthesized images; we then prompt VLMs to extract reliable, actionable image-editing instructions from these critiques. Applying these instructions yields refined images, producing synthetic, informative preference pairs that serve as enhanced tuning datasets. We demonstrate the effectiveness of our pipeline and the resulting datasets in fine-tuning state-of-the-art diffusion models.
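The data-curation pipeline described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the functions `vlm_critique`, `extract_edit_instructions`, and `apply_edits` are hypothetical stubs standing in for real VLM and image-editing backends, and the resulting dictionary format is an assumed representation of a preference pair.

```python
# Sketch of the RPO preference-pair curation pipeline (illustrative only).
# All three helper functions are hypothetical stubs; a real system would
# call a VLM and an instruction-guided image-editing model.

def vlm_critique(prompt: str, image: str) -> str:
    # Stub: prompt a VLM for a detailed critique of the synthesized image.
    return f"Critique for '{prompt}': subject is off-center."

def extract_edit_instructions(critique: str) -> list[str]:
    # Stub: prompt the VLM again to distill reliable, actionable edits
    # from the free-form critique.
    return ["center the subject"]

def apply_edits(image: str, instructions: list[str]) -> str:
    # Stub: an image-editing model implements the instructions,
    # producing a refined image.
    return image + "+edited"

def build_preference_pair(prompt: str, image: str) -> dict:
    """Turn one synthesized image into a (preferred, dispreferred) pair."""
    critique = vlm_critique(prompt, image)
    instructions = extract_edit_instructions(critique)
    refined = apply_edits(image, instructions)
    # The refined image is treated as preferred over the original,
    # yielding a synthetic preference pair for fine-tuning.
    return {"prompt": prompt, "preferred": refined, "dispreferred": image}

pair = build_preference_pair("a cat on a mat", "img_001")
print(pair["preferred"])  # img_001+edited
```

Under this sketch, each synthesized image yields exactly one preference pair; the collected pairs would then feed a standard preference-optimization objective (e.g., DPO-style fine-tuning of the diffusion model).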