"au:"Jing Yu Koh"" — arXiv2 Search


Showing 1–18 of 18 results

Jan 24, 2024  VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks
Feb 7, 2020  SideInfNet: A Deep Neural Network for Semi-Automatic Semantic Segmentation with Side Information
Jan 12, 2021  Cross-Modal Contrastive Learning for Text-to-Image Generation
May 18, 2021  Pathdreamer: A World Model for Indoor Navigation
Jan 31, 2023  Grounding Language Models to Images for Multimodal Inputs and Outputs
Apr 6, 2022  Simple and Effective Synthesis of Indoor 3D Scenes
Nov 7, 2020  Text-to-Image Generation Grounded by Fine-Grained User Attention
May 26, 2023  Generating Images with Multimodal Language Models
Jul 1, 2024  Tree Search for Language Model Agents
Oct 9, 2021  Vector-quantized Image Modeling with Improved VQGAN
Oct 11, 2023  Multimodal Graph Learning for Generative Tasks
Apr 14, 2021  Revisiting Hierarchical Approach for Persistent Long-Term Video Prediction
Jun 29, 2016  Object Boundary Detection and Classification with Image-level Labels
Oct 6, 2022  A New Path: Scaling Vision-and-Language Navigation with Synthetic Instructions and Imitation Learning
Jun 18, 2024  Dissecting Adversarial Robustness of Multimodal LM Agents
Feb 14, 2023  VQ3D: Learning a 3D-Aware Generative Model on ImageNet
Jun 22, 2022  Scaling Autoregressive Models for Content-Rich Text-to-Image Generation
Feb 27, 2024  OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web