Date | Name
Dec 18, 2023 | VolumeDiffusion: Flexible Text-to-3D Generation with Efficient Volumetric Encoder
Apr 16, 2025 | InstantCharacter: Personalize Any Characters with a Scalable Diffusion Transformer Framework
Sep 28, 2025 | HunyuanImage 3.0 Technical Report
Jan 16, 2026 | Image-Text Knowledge Modeling for Unsupervised Multi-Scenario Person Re-Identification
Sep 4, 2025 | PromptEnhancer: A Simple Approach to Enhance Text-to-Image Models via Chain-of-Thought Prompt Rewriting
Feb 6, 2026 | ChatUMM: Robust Context Tracking for Conversational Interleaved Generation
Apr 22, 2024 | Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Mar 3, 2022 | Correlation-Aware Deep Tracking
Nov 2, 2021 | Relational Self-Attention: What's Missing in Attention for Video Understanding
May 6, 2025 | Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning
Dec 23, 2025 | YCB-Handovers Dataset: Analyzing Object Weight Impact on Human Handovers to Adapt Robotic Handover Motion
Apr 6, 2026 | Hierarchical SVG Tokenization: Learning Compact Visual Programs for Scalable Vector Graphics Modeling
Apr 27, 2026 | Meta-CoT: Enhancing Granularity and Generalization in Image Editing
Nov 30, 2021 | MMPTRACK: Large-scale Densely Annotated Multi-camera Multiple People Tracking Benchmark
Jul 31, 2022 | One-Shot Medical Landmark Localization by Edge-Guided Transform and Noisy Landmark Refinement
Aug 7, 2022 | Robust Multi-Object Tracking by Marginal Inference
Dec 11, 2020 | A Multi-task Joint Framework for Real-time Person Search
Nov 30, 2023 | MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation
Nov 30, 2023 | ART·V: Auto-Regressive Text-to-Video Generation with Diffusion Models
Jul 9, 2024 | RodinHD: High-Fidelity 3D Avatar Generation with Diffusion Models