Date          Name
Mar 17, 2021  Disentangled Cycle Consistency for Highly-realistic Virtual Try-On
Apr 19, 2023  MetaBEV: Solving Sensor Failures for BEV Detection and Map Segmentation
Oct 28, 2024  CompGS: Unleashing 2D Compositionality for Compositional Text-to-3D via Dynamically Optimizing 3D Gaussians
Mar 30, 2023  Soft Neighbors are Positive Supporters in Contrastive Visual Representation Learning
Nov 26, 2023  Advancing Vision Transformers with Group-Mix Attention
Oct 11, 2021  Revitalizing CNN Attentions via Transformers in Self-Supervised Visual Representation Learning
May 26, 2022  AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition
Oct 8, 2023   InstructDET: Diversifying Referring Object Detection with Generalized Instructions
Sep 30, 2023  PixArt-$α$: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
Jul 5, 2024   WOMD-Reasoning: A Large-Scale Dataset for Interaction Reasoning in Driving
Oct 6, 2025   Character Mixing for Video Generation
Sep 4, 2025   A Generative Foundation Model for Chest Radiography
Dec 3, 2025   RELIC: Interactive Video World Model with Long-Horizon Memory
Dec 6, 2025   Rethinking Training Dynamics in Scale-wise Autoregressive Generation
Jul 21, 2021  CycleMLP: A MLP-like Architecture for Dense Prediction
Feb 25, 2024  RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis
Nov 24, 2023  Large Language Models as Automated Aligners for benchmarking Vision-Language Models
Sep 25, 2023  Speed Co-Augmentation for Unsupervised Audio-Visual Pre-training
Dec 15, 2025  DiffusionBrowser: Interactive Diffusion Previews via Multi-Branch Decoders
Dec 12, 2025  CreativeVR: Diffusion-Prior-Guided Approach for Structure and Motion Restoration in Generative and Real Videos