arXiv2 Search: au:"Di Zhang"
Showing 1–7 of 7 results
Date          Name
Nov 6, 2025   NVIDIA Nemotron Nano V2 VL
Aug 1, 2025   AudioGen-Omni: A Unified Multimodal Diffusion Transformer for Video-Synchronized Audio, Speech, and Song Generation
Jun 24, 2025  Kling-Foley: Multimodal Diffusion Transformer for High-Quality Video-to-Audio Generation
Mar 11, 2025  A Survey on Wi-Fi Sensing Generalizability: Taxonomy, Techniques, Datasets, and Future Research Prospects
Dec 12, 2024  Owl-1: Omni World Model for Consistent Long Video Generation
Oct 10, 2024  Koala-36M: A Large-scale Video Dataset Improving Consistency between Fine-grained Conditions and Video Content
Apr 15, 2024  UNIAA: A Unified Multi-modal Image Aesthetic Assessment Baseline and Benchmark