Date | Name
Sep 15, 2022 | OmniVL: One Foundation Model for Image-Language and Video-Language Tasks
Oct 30, 2023 | MM-VID: Advancing Video Understanding with GPT-4V(ision)
Dec 21, 2017 | Smart, Sparse Contours to Represent and Edit Images
Jul 9, 2021 | ViTGAN: Training GANs with Vision Transformers
Aug 22, 2024 | AlphaFolding: 4D Diffusion for Dynamic Protein Structure Prediction with Reference and Motion Guidance
Sep 2, 2025 | DroneSR: Rethinking Few-shot Thermal Image Super-Resolution from Drone-based Perspective
Apr 22, 2024 | Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Apr 27, 2026 | IPRU: Input-Perturbation-based Radio Frequency Fingerprinting Unlearning for LAWNs
Mar 20, 2023 | MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action
Jun 3, 2022 | Visual Clues: Bridging Vision and Language Foundations for Image Paragraph Captioning
Apr 29, 2021 | AutoFlow: Learning a Better Training Set for Optical Flow
May 6, 2021 | LASR: Learning Articulated Shape Reconstruction from a Monocular Video
Apr 26, 2021 | DVMark: A Deep Multiscale Framework for Video Watermarking
Jul 31, 2024 | The Llama 3 Herd of Models
Mar 6, 2025 | Simulating the Real World: A Unified Survey of Multimodal Generative Models
Nov 13, 2024 | TDGCN-Based Mobile Multiuser Physical-Layer Authentication for EI-Enabled IIoT
Apr 21, 2026 | LoViF 2026 Challenge on Real-World All-in-One Image Restoration: Methods and Results
May 27, 2022 | GIT: A Generative Image-to-text Transformer for Vision and Language
May 5, 2011 | Adaptively Learning the Crowd Kernel
Apr 23, 2020 | Supervised Contrastive Learning