Date / Name
Oct 30, 2023 / Harvest Video Foundation Models via Efficient Post-Pretraining
May 12, 2026 / BitLM: Unlocking Multi-Token Language Generation with Bitwise Continuous Diffusion
Jan 20, 2022 / CP-Net: Contour-Perturbed Reconstruction Network for Self-Supervised Point Cloud Learning
Oct 31, 2023 / SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction
Dec 19, 2023 / M-BEV: Masked BEV Perception for Robust Autonomous Driving
Jan 17, 2024 / Vlogger: Make Your Dream A Vlog
Feb 12, 2020 / Progressive Object Transfer Detection
Sep 15, 2021 / Learning Dynamical Human-Joint Affinity for 3D Pose Estimation in Videos
Sep 1, 2016 / Weakly Supervised PatchNets: Describing and Aggregating Local Patches for Scene Recognition
Mar 5, 2018 / LSTD: A Low-Shot Transfer Detector for Object Detection
Jan 8, 2025 / H-MBA: Hierarchical MamBa Adaptation for Multi-Modal Video Understanding in Autonomous Driving
Jan 21, 2025 / InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling
May 18, 2025 / Video-GPT via Next Clip Diffusion
Dec 16, 2024 / CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding
Mar 6, 2025 / An Egocentric Vision-Language Model based Portable Real-time Smart Assistant
Jul 3, 2025 / VRAgent-R1: Boosting Video Recommendation with MLLM-based Agents via Reinforcement Learning
Dec 1, 2025 / InternVideo-Next: Towards General Video Foundation Models without Video-Text Supervision
Jan 26, 2024 / From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities
Apr 24, 2024 / MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI
Jul 13, 2023 / InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation