Showing 1–20 of 92 results
Date / Name
Apr 23, 2026 / StyleID: A Perception-Aware Dataset and Metric for Stylization-Agnostic Facial Identity Recognition
Apr 23, 2026 / UAU-Net: Uncertainty-aware Representation Learning and Evidential Classification for Facial Action Unit Detection
Apr 22, 2026 / ONOTE: Benchmarking Omnimodal Notation Processing for Expert-level Music Intelligence
Apr 22, 2026 / AttentionBender: Manipulating Cross-Attention in Video Diffusion Transformers as a Creative Probe
Apr 22, 2026 / Building a Precise Video Language with Human-AI Oversight
Apr 21, 2026 / AutoAWG: Adverse Weather Generation with Adaptive Multi-Controls for Automotive Videos
Apr 20, 2026 / XEmbodied: A Foundation Model with Enhanced Geometric and Physical Cues for Large-Scale Embodied Environments
Apr 17, 2026 / MOMENTA: Mixture-of-Experts Over Multimodal Embeddings with Neural Temporal Aggregation for Misinformation Detection
Apr 9, 2026 / MSCT: Differential Cross-Modal Attention for Deepfake Detection
Apr 8, 2026 / LungCURE: Benchmarking Multimodal Real-World Clinical Reasoning for Precision Lung Cancer Diagnosis and Treatment
Feb 23, 2026 / A Very Big Video Reasoning Suite
Feb 7, 2026 / Learning Brain Representation with Hierarchical Visual Embeddings
Jan 15, 2026 / Handling Missing Modalities in Multimodal Survival Prediction for Non-Small Cell Lung Cancer
Jan 7, 2026 / Apollo: Unified Multi-Task Audio-Video Joint Generation
Dec 23, 2025 / SemCovert: Secure and Covert Video Transmission via Deep Semantic-Level Hiding
Aug 1, 2025 / AudioGen-Omni: A Unified Multimodal Diffusion Transformer for Video-Synchronized Audio, Speech, and Song Generation
Jul 27, 2025 / T$^\text{3}$SVFND: Towards an Evolving Fake News Detector for Emergencies with Test-time Training on Short Video Platforms
May 19, 2025 / MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix
Apr 24, 2025 / Machine Learning-Based Prediction of Quality Shifts on Video Streaming Over 5G
Apr 1, 2025 / A Survey on Music Generation from Single-Modal, Cross-Modal, and Multi-Modal Perspectives