Date | Name
Feb 15, 2026 | LaViDa-R1: Advancing Reasoning for Unified Multimodal Diffusion Language Models
Apr 6, 2023 | TopNet: Transformer-based Object Placement Network for Image Compositing
Nov 10, 2022 | High-Quality Entity Segmentation
Apr 22, 2022 | Unified Pretraining Framework for Document Understanding
Mar 27, 2018 | Dual Attention Matching Network for Context-Aware Feature Sequence based Person Re-Identification
Dec 2, 2024 | XQ-GAN: An Open-source Image Tokenization Framework for Autoregressive Generation
Apr 18, 2024 | SOHES: Self-supervised Open-world Hierarchical Entity Segmentation
Sep 15, 2025 | Image Tokenizer Needs Post-Training
Nov 25, 2025 | HBridge: H-Shape Bridging of Heterogeneous Experts for Unified Multimodal Understanding and Generation
Dec 16, 2025 | Sparse-LaViDa: Sparse Multimodal Discrete Diffusion Language Models
Dec 17, 2024 | LazyDiT: Lazy Learning for the Acceleration of Diffusion Transformers
Mar 16, 2026 | SNCE: Geometry-Aware Supervision for Scalable Discrete Image Generation
Aug 17, 2022 | Text-to-Image Generation via Implicit Visual Guidance and Hypernetwork
Dec 9, 2021 | CA-SSL: Class-Agnostic Semi-Supervised Learning for Detection and Segmentation
Jun 7, 2021 | SelfDoc: Self-Supervised Document Representation Learning
Nov 24, 2021 | Open-Vocabulary Instance Segmentation via Robust Cross-Modal Pseudo-Labeling
Sep 14, 2021 | Multi-Scale Aligned Distillation for Low-Resolution Detection
Dec 11, 2025 | VGent: Visual Grounding via Modular Design for Disentangling Reasoning and Prediction
Mar 17, 2026 | ViT-AdaLA: Adapting Vision Transformers with Linear Attention
Mar 20, 2026 | DiffGraph: An Automated Agent-driven Model Merging Framework for In-the-Wild Text-to-Image Generation