Date / Name
Mar 25, 2025 / AudCast: Audio-Driven Human Video Generation by Cascaded Diffusion Transformers
Jan 3, 2025 / Robust Self-Paced Hashing for Cross-Modal Retrieval with Noisy Labels
Dec 6, 2024 / LinVT: Empower Your Image-level Large Language Model to Understand Videos
Nov 22, 2024 / Health AI Developer Foundations
Oct 2, 2024 / Harnessing the Latent Diffusion Model for Training-Free Image Style Transfer
Sep 20, 2024 / ChemDFM-X: Towards Large Multimodal Model for Chemistry
Aug 3, 2024 / SynopGround: A Large-Scale Dataset for Multi-Paragraph Video Grounding from TV Dramas and Synopses
Jul 31, 2024 / Open-Vocabulary Audio-Visual Semantic Segmentation
Jul 3, 2024 / MuDiT & MuSiT: Alignment with Colloquial Expression in Description-to-Song Generation
Jul 2, 2024 / To Forget or Not? Towards Practical Knowledge Unlearning for Large Language Models
Jun 21, 2024 / EmpathyEar: An Open-source Avatar Multimodal Empathetic Chatbot
Jun 2, 2024 / Once-for-All: Controllable Generative Image Compression with Dynamic Granularity Adaptation
May 29, 2024 / Transcending Fusion: A Multi-Scale Alignment Method for Remote Sensing Image-Text Retrieval
May 24, 2024 / Looking Backward: Streaming Video-to-Video Translation with Feature Banks
May 23, 2024 / Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation
Apr 29, 2024 / G-Refine: A General Quality Refiner for Text-to-Image Generation
Apr 21, 2024 / Counterfactual Reasoning Using Predicted Latent Personality Dimensions for Optimizing Persuasion Outcome
Mar 26, 2024 / Panonut360: A Head and Eye Tracking Dataset for Panoramic Video
Mar 18, 2024 / QEAN: Quaternion-Enhanced Attention Network for Visual Dance Generation
Mar 1, 2024 / An Experimental Study of Low-Latency Video Streaming over 5G