Showing 1–20 of 24 results
May 11, 2023: An Inverse Scaling Law for CLIP Training
Jun 25, 2020: SmallBigNet: Integrating Core and Contextual Views for Video Classification
Jun 3, 2021: CT-Net: Channel Tensorization Network for Video Classification
May 3, 2022: In Defense of Image Pre-Training for Spatiotemporal Recognition
May 30, 2024: Scaling White-Box Transformers for Vision
Feb 9, 2022: L2B: Learning to Bootstrap Robust Models for Combating Label Noise
Jun 27, 2023: CLIPA-v2: Scaling CLIP Training with 81.1% Zero-shot ImageNet Accuracy within a $10,000 Budget; An Extra $4,000 Unlocks 81.8% Accuracy
Aug 6, 2024: MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine
Sep 29, 2025: Rethinking JEPA: Compute-Efficient Video SSL with Frozen Teachers
Jan 21, 2026: OpenVision 3: A Family of Unified Visual Encoder for Both Understanding and Generation
Oct 11, 2023: 3D TransUNet: Advancing Medical Image Segmentation through Vision Transformers
Mar 23, 2024: 3D-TransUNet for Brain Metastases Segmentation in the BraTS2023 Challenge
Apr 21, 2022: Fast AdvProp
Jul 21, 2023: Consistency-guided Meta-Learning for Bootstrapping Semi-Supervised Medical Image Segmentation
May 7, 2025: OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning
Nov 25, 2024: CLIPS: An Enhanced CLIP Framework for Learning with Synthetic Captions
Jun 12, 2024: What If We Recaption Billions of Web Images with LLaMA-3?
Dec 20, 2022: Unleashing the Power of Visual Prompting At the Pixel Level
Jan 9, 2024: Revisiting Adversarial Training at Scale
Jun 8, 2024: Medical Vision Generalist: Unifying Medical Imaging Tasks in Context