Date          Name
Jan 24, 2024  VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks
Feb 7, 2020   SideInfNet: A Deep Neural Network for Semi-Automatic Semantic Segmentation with Side Information
Jan 12, 2021  Cross-Modal Contrastive Learning for Text-to-Image Generation
May 18, 2021  Pathdreamer: A World Model for Indoor Navigation
Jan 31, 2023  Grounding Language Models to Images for Multimodal Inputs and Outputs
Apr 6, 2022   Simple and Effective Synthesis of Indoor 3D Scenes
Nov 7, 2020   Text-to-Image Generation Grounded by Fine-Grained User Attention
May 26, 2023  Generating Images with Multimodal Language Models
Jul 1, 2024   Tree Search for Language Model Agents
Oct 9, 2021   Vector-quantized Image Modeling with Improved VQGAN
Oct 11, 2023  Multimodal Graph Learning for Generative Tasks
Apr 14, 2021  Revisiting Hierarchical Approach for Persistent Long-Term Video Prediction
Jun 29, 2016  Object Boundary Detection and Classification with Image-level Labels
Oct 6, 2022   A New Path: Scaling Vision-and-Language Navigation with Synthetic Instructions and Imitation Learning
Jun 18, 2024  Dissecting Adversarial Robustness of Multimodal LM Agents
Feb 14, 2023  VQ3D: Learning a 3D-Aware Generative Model on ImageNet
Jun 22, 2022  Scaling Autoregressive Models for Content-Rich Text-to-Image Generation
Feb 27, 2024  OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web