Date / Name
Nov 20, 2024 / Extending Video Masked Autoencoders to 128 frames
Jul 25, 2020 / OpenRooms: An End-to-End Open Framework for Photorealistic Indoor Scene Datasets
Jul 7, 2025 / Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities
Oct 9, 2023 / Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation
Aug 14, 2019 / Multiview-Consistent Semi-Supervised Learning for 3D Human Pose Estimation
Sep 24, 2019 / Multi-Person 3D Human Pose Estimation from Monocular Images
Jul 6, 2023 / VideoGLUE: Video General Understanding Evaluation of Foundation Models
Feb 20, 2024 / VideoPrism: A Foundational Visual Encoder for Video Understanding
May 30, 2022 / Temporal Latent Bottleneck: Synthesis of Fast and Slow Processing Mechanisms in Sequence Learning
Jul 5, 2021 / Test-Time Personalization with a Transformer for Human Pose Estimation
Dec 12, 2024 / Neptune: The Long Orbit to Benchmarking Long Video Understanding