Showing 1–20 of 26 results
Date / Name

May 27, 2024 - RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness
Jun 23, 2025 - RLPR: Extrapolating RLVR to General Domains without Verifiers
Oct 1, 2023 - Reformulating Vision-Language Foundation Models and Datasets Towards Universal Multimodal Assistants
Sep 16, 2025 - MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe
Apr 30, 2026 - MiniCPM-o 4.5: Towards Real-Time Full-Duplex Omni-Modal Interaction
Dec 1, 2023 - RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
Aug 21, 2023 - SeqGPT: An Out-of-the-box Large Language Model for Open Domain Sequence Understanding
Mar 9, 2023 - Knowledge-augmented Few-shot Visual Relation Detection
Aug 23, 2023 - Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages
Nov 22, 2022 - Visually Grounded Commonsense Knowledge Acquisition
Feb 9, 2023 - Guttation Monitor: Wearable Guttation Sensor for Plant Condition Monitoring and Diagnosis
Aug 3, 2024 - MiniCPM-V: A GPT-4V Level MLLM on Your Phone
Dec 11, 2024 - Euclid: Supercharging Multimodal LLMs with Synthetic High-Fidelity Visual Descriptions
Jan 5, 2026 - Thinking with Blueprints: Assisting Vision-Language Models in Spatial Reasoning via Structured Object Representation
Jan 21, 2026 - The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models
Jul 27, 2023 - MESED: A Multi-modal Entity Set Expansion Dataset with Fine-grained Semantic Classes and Hard Negative Entities
Apr 16, 2022 - Contrastive Learning with Hard Negative Entities for Entity Set Expansion
Jul 24, 2024 - AI-Gadget Kit: Integrating Swarm User Interfaces with LLM-driven Agents for Rich Tabletop Game Applications
Feb 3, 2025 - Process Reinforcement through Implicit Rewards
Jan 21, 2025 - EmbodiedEval: Evaluate Multimodal LLMs as Embodied Agents