Showing 1–20 of 29 results
Date / Name
May 29, 2015 / Cross-domain Image Retrieval with a Dual Attribute-aware Ranking Network
Aug 7, 2023 / DiT: Efficient Vision Transformers with Dynamic Token Routing
Apr 27, 2021 / Rethinking BiSeNet For Real-time Semantic Segmentation
Jan 13, 2026 / UM-Text: A Unified Multimodal Model for Image Understanding and Visual Text Editing
Oct 5, 2022 / Meta-Ensemble Parameter Learning
Feb 8, 2024 / Scalable Diffusion Models with State Space Backbone
Nov 2, 2023 / Enriching Phrases with Coupled Pixel and Object Contexts for Panoptic Narrative Grounding
Jun 3, 2024 / Dimba: Transformer-Mamba Diffusion Models
Sep 1, 2024 / FLUX that Plays Music
Jan 3, 2025 / JoyGen: Audio-Driven 3D Depth-Aware Talking-Face Video Editing
Feb 1, 2023 / EfficientRep: An Efficient Repvgg-style ConvNets with Hardware-aware Neural Network Design
Nov 30, 2022 / Uncertainty-Aware Image Captioning
Apr 20, 2024 / Music Consistency Models
Jul 21, 2023 / Divide and Adapt: Active Domain Adaptation via Customized Learning
Jul 16, 2024 / Scaling Diffusion Transformers to 16 Billion Parameters
Sep 27, 2025 / Dynamic-TreeRPO: Breaking the Independent Trajectory Bottleneck with Structured Sampling
Dec 22, 2023 / Tuning-Free Inversion-Enhanced Control for Consistent Image Editing
Nov 27, 2023 / A-JEPA: Joint-Embedding Predictive Architecture Can Listen
Jun 8, 2022 / Language-Bridged Spatial-Temporal Interaction for Referring Video Object Segmentation
Mar 25, 2017 / More is Less: A More Complicated Network with Less Inference Complexity