Date / Name
Nov 8, 2020 / Adaptive Linear Span Network for Object Skeleton Detection
Nov 23, 2022 / Fast-iTPN: Integrally Pre-Trained Transformer Pyramid Network with Token Migration
Jan 18, 2024 / VMamba: Visual State Space Model
Jul 7, 2020 / Discretization-Aware Architecture Search
Feb 28, 2025 / Adaptive Keyframe Sampling for Long Video Understanding
Dec 5, 2021 / Exploring Complicated Search Spaces with Interleaving-Free Sampling
Feb 18, 2025 / YOLOv12: Attention-Centric Real-Time Object Detectors
Jan 24, 2024 / ChatterBox: Multi-round Multimodal Referring and Grounding
Mar 27, 2022 / Beyond Masking: Demystifying Token-Based Pre-Training for Vision Transformers
Jun 1, 2024 / Artemis: Towards Referential Understanding in Complex Videos
Sep 17, 2021 / GraFormer: Graph Convolution Transformer for 3D Pose Estimation
Jun 17, 2024 / ClawMachine: Learning to Fetch Visual Tokens for Referential Comprehension
Dec 23, 2024 / Personalized Large Vision-Language Models
Nov 25, 2021 / Semantic-Aware Generation for Self-Supervised Visual Representation Learning
May 30, 2022 / HiViT: Hierarchical Vision Transformer Meets Masked Image Modeling
Mar 11, 2026 / From Imitation to Intuition: Intrinsic Reasoning for Open-Instance Video Classification
Aug 21, 2023 / Spatial Transform Decoupling for Oriented Object Detection
May 26, 2024 / Building Vision Models upon Heat Conduction
Jun 30, 2025 / PathDiff: Histopathology Image Synthesis with Unpaired Text and Mask Conditions
Sep 18, 2025 / AutoEdit: Automatic Hyperparameter Tuning for Image Editing