"au:"Shengju Qian"" — arXiv2 Search

Showing 1–20 of 25 results

Date / Name

Aug 20, 2019  Make a Face: Towards Arbitrary High Fidelity Face Manipulation
Jul 14, 2025  EmbRACE-3K: Embodied Reasoning and Action in Complex Environments
Mar 1, 2023  StraIT: Non-autoregressive Generation with Stratified Image Transformer
Apr 23, 2024  ID-Animator: Zero-Shot Identity-Preserving Human Video Generation
Mar 25, 2024  Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning
Oct 28, 2021  Blending Anti-Aliasing into Vision Transformer
Jun 25, 2024  Text-Animator: Controllable Visual Text Video Generation
Aug 18, 2019  Aggregation via Separation: Boosting Facial Landmark Detector with Semi-Supervised Style Translation
Dec 21, 2022  What Makes for Good Tokenizers in Vision Transformer?
Feb 12, 2026  AssetFormer: Modular 3D Assets Generation with Autoregressive Transformer
Mar 24, 2025  MuMA: 3D PBR Texturing via Multi-Channel Multi-View Generation and Agentic Post-Processing
Sep 21, 2023  LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
Mar 26, 2025  MAR-3D: Progressive Masked Auto-regressor for High-Resolution 3D Generation
Apr 15, 2023  TagCLIP: Improving Discrimination Ability of Open-Vocabulary Semantic Segmentation
Dec 7, 2023  Prompt Highlighter: Interactive Control for Multi-Modal LLMs
Jan 17, 2020  Temporal Interlacing Network
May 26, 2025  StyleAR: Customizing Multimodal Autoregressive Model for Style-Aligned Text-to-Image Generation
Dec 19, 2021  On Efficient Transformer-Based Image Pre-training for Low-Level Vision
Sep 29, 2025  Light-SQ: Structure-aware Shape Abstraction with Superquadrics for Generated Meshes
Nov 1, 2025  Rethinking Facial Expression Recognition in the Era of Multimodal Large Language Models: Benchmark, Datasets, and Beyond