Showing 1–20 of 28 results
Date          Name
Jan 28, 2021  The Role of Syntactic Planning in Compositional Image Captioning
May 5, 2020   It's Easier to Translate out of English than into it: Measuring Neural Translation Difficulty by Cross-Mutual Information
Oct 25, 2023  On the Interplay between Fairness and Explainability
Aug 22, 2023  StoryBench: A Multifaceted Benchmark for Continuous Story Visualization
Oct 24, 2022  Multilingual Multimodal Learning with Machine Translated Text
May 23, 2023  Weakly-Supervised Learning of Visual Relations in Multimodal Pretraining
Apr 22, 2022  Mostra: A Flexible Balancing Framework to Trade-off User, Artist and Platform Objectives for Music Sequencing
May 24, 2022  Reassessing Evaluation Practices in Visual Question Answering: A Case Study on Out-of-Distribution Generalization
Sep 28, 2021  Visually Grounded Reasoning across Languages and Cultures
May 30, 2019  Matrix Completion in the Unit Hypercube via Structured Matrix Factorization
Sep 6, 2019   Enhancing Machine Translation with Dependency-Aware Self-Attention
Jan 27, 2022  IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages
Mar 30, 2023  A Study of Autoregressive Decoders for Multi-Tasking in Computer Vision
May 12, 2023  Measuring Progress in Fine-grained Vision-and-Language Understanding
Nov 30, 2020  Multimodal Pretraining Unmasked: A Meta-Analysis and a Unified Framework of Vision-and-Language BERTs
Sep 9, 2021   Vision-and-Language or Vision-for-Language? On Cross-Modal Influence in Multimodal Transformers
Mar 6, 2025   What Are You Doing? A Closer Look at Controllable Human Video Generation
Jun 9, 2022   Ancestor-to-Creole Transfer is Not a Walk in the Park
Sep 19, 2025  Dynamic Classifier-Free Diffusion Guidance via Online Feedback
Jul 14, 2022  Language Modelling with Pixels