Date / Name
Jun 14, 2024 / ControlVAR: Exploring Controllable Visual Autoregressive Modeling
Dec 14, 2023 / Exploring Transferability for Randomized Smoothing
Aug 16, 2024 / Efficient Autoregressive Audio Modeling via Next-Scale Prediction
Sep 15, 2025 / Image Tokenizer Needs Post-Training
Mar 11, 2025 / Robust Latent Matters: Boosting Image Generation with Sampling Error Synthesis
Nov 30, 2023 / MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation
Nov 30, 2023 / ART·V: Auto-Regressive Text-to-Video Generation with Diffusion Models
Feb 2, 2026 / RE-TRAC: REcursive TRAjectory Compression for Deep Search Agents
Nov 22, 2022 / Weakly-supervised Pre-training for 3D Human Pose Estimation via Perspective Knowledge
Oct 2, 2024 / ImageFolder: Autoregressive Image Generation with Folded Tokens
Jan 7, 2025 / Three-dimensional attention Transformer for state evaluation in real-time strategy games
Nov 20, 2024 / REDUCIO! Generating 1K Video within 16 Seconds using Extremely Compressed Motion Latents
Mar 7, 2024 / R²-Bench: Benchmarking the Robustness of Referring Perception Models under Perturbations
Jul 29, 2019 / Learn to Scale: Generating Multipolar Normalized Density Maps for Crowd Counting
Dec 5, 2024 / MageBench: Bridging Large Multimodal Models to Agents
Dec 2, 2024 / XQ-GAN: An Open-source Image Tokenization Framework for Autoregressive Generation
Sep 29, 2025 / InfoAgent: Advancing Autonomous Information-Seeking Agents
Mar 14, 2025 / HiTVideo: Hierarchical Tokenizers for Enhancing Text-to-Video Generation with Autoregressive Large Language Models
Jul 31, 2025 / Phi-Ground Tech Report: Advancing Perception in GUI Grounding
May 21, 2025 / ViaRL: Adaptive Temporal Grounding via Visual Iterated Amplification Reinforcement Learning