Showing 1–20 of 25 results
Date / Name
Aug 20, 2019 / Make a Face: Towards Arbitrary High Fidelity Face Manipulation
Jul 14, 2025 / EmbRACE-3K: Embodied Reasoning and Action in Complex Environments
Mar 1, 2023 / StraIT: Non-autoregressive Generation with Stratified Image Transformer
Apr 23, 2024 / ID-Animator: Zero-Shot Identity-Preserving Human Video Generation
Mar 25, 2024 / Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning
Oct 28, 2021 / Blending Anti-Aliasing into Vision Transformer
Jun 25, 2024 / Text-Animator: Controllable Visual Text Video Generation
Aug 18, 2019 / Aggregation via Separation: Boosting Facial Landmark Detector with Semi-Supervised Style Translation
Dec 21, 2022 / What Makes for Good Tokenizers in Vision Transformer?
Feb 12, 2026 / AssetFormer: Modular 3D Assets Generation with Autoregressive Transformer
Mar 24, 2025 / MuMA: 3D PBR Texturing via Multi-Channel Multi-View Generation and Agentic Post-Processing
Sep 21, 2023 / LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
Mar 26, 2025 / MAR-3D: Progressive Masked Auto-regressor for High-Resolution 3D Generation
Apr 15, 2023 / TagCLIP: Improving Discrimination Ability of Open-Vocabulary Semantic Segmentation
Dec 7, 2023 / Prompt Highlighter: Interactive Control for Multi-Modal LLMs
Jan 17, 2020 / Temporal Interlacing Network
May 26, 2025 / StyleAR: Customizing Multimodal Autoregressive Model for Style-Aligned Text-to-Image Generation
Dec 19, 2021 / On Efficient Transformer-Based Image Pre-training for Low-Level Vision
Sep 29, 2025 / Light-SQ: Structure-aware Shape Abstraction with Superquadrics for Generated Meshes
Nov 1, 2025 / Rethinking Facial Expression Recognition in the Era of Multimodal Large Language Models: Benchmark, Datasets, and Beyond