Apr 8, 2026 | TC-AE: Unlocking Token Capacity for Deep Compression Autoencoders
Mar 6, 2026 | StruVis: Enhancing Reasoning-based Text-to-Image Generation via Thinking with Structured Vision
Oct 28, 2025 | Ming-Flash-Omni: A Sparse, Unified Architecture for Multimodal Perception and Generation
Oct 23, 2025 | ARGenSeg: Image Segmentation with Autoregressive Image Generation Model
Oct 15, 2025 | When In Doubt, Abstain: The Impact of Abstention on Strategic Classification
Sep 28, 2025 | HieraTok: Multi-Scale Visual Tokenizer Improves Image Reconstruction and Generation
May 5, 2025 | Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction
Jan 14, 2025 | ADAM: An AI Reasoning and Bioinformatics Model for Alzheimer's Disease Detection and Microbiome-Clinical Data Integration
Oct 10, 2024 | Intuitive interaction flow: A Dual-Loop Human-Machine Collaboration Task Allocation Model and an experimental study
Dec 15, 2023 | SkySense: A Multi-Modal Remote Sensing Foundation Model Towards Universal Interpretation for Earth Observation Imagery
Sep 14, 2023 | Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning
Aug 20, 2023 | Towards Real-World Visual Tracking with Temporal Contexts
Dec 9, 2022 | Physically Plausible Animation of Human Upper Body from a Single Image
Nov 28, 2022 | Progressive Learning without Forgetting
Sep 5, 2022 | RLIP: Relational Language-Image Pre-training for Human-Object Interaction Detection
Jul 24, 2022 | MAR: Masked Autoencoders for Efficient Action Recognition
Apr 6, 2022 | Learning from Untrimmed Videos: Self-Supervised Video Representation Learning with Hierarchical Consistency
Mar 3, 2022 | TCTrack: Temporal Contexts for Aerial Tracking
Aug 24, 2021 | ParamCrop: Parametric Cubic Cropping for Video Contrastive Learning
Jun 15, 2021 | Relation Modeling in Spatio-Temporal Action Localization