"au:"Zineng Tang"" — arXiv2 Search

Showing 1–13 of 13 results

Date / Name

Nov 21, 2022 / Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention
Jul 6, 2021 / VidLanKD: Improving Language Understanding via Video-Distilled Knowledge Transfer
Nov 30, 2023 / CoDi-2: In-Context, Interleaved, and Interactive Any-to-Any Generation
Sep 28, 2022 / TVLT: Textless Vision-Language Transformer
Oct 4, 2024 / Grounding Language in Multi-Perspective Referential Communication
May 13, 2020 / Dense-Caption Matching and Frame-Selection Gating for Temporal Localization in VideoQA
Jun 4, 2025 / Images are Worth Variable Length of Representations
Mar 19, 2025 / TULIP: Towards Unified Language-Image Pretraining
May 19, 2023 / Any-to-Any Generation via Composable Diffusion
Dec 5, 2022 / Unifying Vision, Text, and Layout for Universal Document Processing
Dec 9, 2024 / Evaluating Model Perception of Color Illusions in Photorealistic Scenes
Jul 23, 2024 / AMONGAGENTS: Evaluating Large Language Models in the Interactive Text-Based Social Deduction Game
May 18, 2023 / Paxion: Patching Action Knowledge in Video-Language Foundation Models