Date          Name
Sep 6, 2024   Qihoo-T2X: An Efficient Proxy-Tokenized Diffusion Transformer for Text-to-Any-Task
Jul 8, 2024   Affordances-Oriented Planning using Foundation Models for Continuous Vision-Language Navigation
Dec 5, 2023   DreamVideo: High-Fidelity Image-to-Video Generation with Image Retention and Text Guidance
Aug 31, 2023  Towards High-Fidelity Text-Guided 3D Face Generation and Manipulation Using only Images
Aug 22, 2023  GrowCLIP: Data-aware Automatic Model Growing for Large-scale Contrastive Language-Image Pre-training
Apr 10, 2023  DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-training via Word-Region Alignment
Dec 2, 2022   3D-TOGO: Towards Text-Guided Cross-Category 3D Object Generation
Feb 17, 2022  Revisiting Over-smoothing in BERT from the Perspective of Graph
Feb 14, 2022  Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark
Aug 7, 2021   NASOA: Towards Faster Task-oriented Online Fine-tuning with a Zoo of Models
Jun 21, 2021  SODA10M: A Large-Scale 2D Self/Semi-Supervised Object Detection Dataset for Autonomous Driving
Feb 25, 2021  SparseBERT: Rethinking the Importance Analysis in Self-attention
Jul 18, 2020  CATCH: Context-based Meta Reinforcement Learning for Transferrable Architecture Search