Date          Name
Feb 25, 2019  GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering
Mar 1, 2021   Generative Adversarial Transformers
Mar 8, 2018   Compositional Attention Networks for Machine Reasoning
Nov 29, 2023  SODA: Bottleneck Diffusion Models for Representation Learning
Nov 17, 2021  Compositional Transformers for Scene Generation
Jul 9, 2019   Learning by Abstraction: The Neural State Machine
Dec 4, 2025   SIMA 2: A Generalist Embodied Agent for Virtual Worlds
Feb 27, 2026  A Mixed Diet Makes DINO An Omnivorous Vision Encoder
Nov 16, 2022  Holistic Evaluation of Language Models
Sep 12, 2025  LayerLock: Non-collapsing Representation Learning with Progressive Freezing
Dec 15, 2025  Recurrent Video Masked Autoencoders
Aug 16, 2021  On the Opportunities and Risks of Foundation Models
Jun 13, 2024  Neural Assets: 3D-Aware Multi-Object Scene Synthesis with Image Diffusion Models
Nov 8, 2024   Moving Off-the-Grid: Scene-Grounded Video Representations
Mar 13, 2024  Scaling Instructable Agents Across Many Simulated Worlds
Dec 19, 2024  Scaling 4D Representations
Dec 13, 2024  How to Spin an Object: First, Get the Shape Right
Oct 30, 2020  SLM: Learning a Discourse Language Representation with Sentence Unshuffling
Dec 9, 2024   Can foundation models actively gather information in interactive environments to test hypotheses?
Mar 8, 2024   Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context