"au:"Kai Qiu"" — arXiv2 Search

Showing 1–20 of 25 results

Jun 14, 2024 - ControlVAR: Exploring Controllable Visual Autoregressive Modeling
Dec 14, 2023 - Exploring Transferability for Randomized Smoothing
Aug 16, 2024 - Efficient Autoregressive Audio Modeling via Next-Scale Prediction
Sep 15, 2025 - Image Tokenizer Needs Post-Training
Mar 11, 2025 - Robust Latent Matters: Boosting Image Generation with Sampling Error Synthesis
Nov 30, 2023 - MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation
Nov 30, 2023 - ART$\boldsymbol{\cdot}$V: Auto-Regressive Text-to-Video Generation with Diffusion Models
Feb 2, 2026 - RE-TRAC: REcursive TRAjectory Compression for Deep Search Agents
Nov 22, 2022 - Weakly-supervised Pre-training for 3D Human Pose Estimation via Perspective Knowledge
Oct 2, 2024 - ImageFolder: Autoregressive Image Generation with Folded Tokens
Jan 7, 2025 - Three-dimensional attention Transformer for state evaluation in real-time strategy games
Nov 20, 2024 - REDUCIO! Generating 1K Video within 16 Seconds using Extremely Compressed Motion Latents
Mar 7, 2024 - $\text{R}^2$-Bench: Benchmarking the Robustness of Referring Perception Models under Perturbations
Jul 29, 2019 - Learn to Scale: Generating Multipolar Normalized Density Maps for Crowd Counting
Dec 5, 2024 - MageBench: Bridging Large Multimodal Models to Agents
Dec 2, 2024 - XQ-GAN: An Open-source Image Tokenization Framework for Autoregressive Generation
Sep 29, 2025 - InfoAgent: Advancing Autonomous Information-Seeking Agents
Mar 14, 2025 - HiTVideo: Hierarchical Tokenizers for Enhancing Text-to-Video Generation with Autoregressive Large Language Models
Jul 31, 2025 - Phi-Ground Tech Report: Advancing Perception in GUI Grounding
May 21, 2025 - ViaRL: Adaptive Temporal Grounding via Visual Iterated Amplification Reinforcement Learning