Date / Name
Apr 7, 2025 / Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting
Mar 25, 2024 / DPStyler: Dynamic PromptStyler for Source-Free Domain Generalization
Aug 21, 2024 / CaRDiff: Video Salient Object Ranking Chain of Thought Reasoning for Saliency Prediction with Diffusion
Apr 18, 2024 / V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning
Mar 24, 2024 / Empowering LLMs with Pseudo-Untrimmed Videos for Audio-Visual Temporal Understanding
Jul 7, 2023 / LaunchpadGPT: Language Model as Music Visualization Designer on Launchpad
Sep 25, 2022 / Multi-modal Segment Assemblage Network for Ad Video Editing with Importance-Coherence Reward
Jun 17, 2023 / LLMVA-GEBC: Large Language Model with Video Adapter for Generic Event Boundary Captioning
Dec 29, 2023 / Video Understanding with Large Language Models: A Survey
Jun 18, 2024 / Do More Details Always Introduce More Hallucinations in LVLM-based Image Captioning?
May 29, 2025 / ZeroSep: Separate Anything in Audio with Zero Training
Oct 8, 2025 / ToolMem: Enhancing Multimodal Agents with Learnable Tool Capability Memory
Apr 7, 2021 / Mechanical Properties of Gradient Copper Nano-Gyroid Cellular Structures: A Molecular Dynamics Study
May 4, 2023 / Caption Anything: Interactive Image Description with Diverse Multimodal Controls
Sep 23, 2024 / AIM 2024 Challenge on Video Saliency Prediction: Methods and Results
Oct 13, 2024 / MMCOMPOSITION: Revisiting the Compositionality of Pre-trained Vision-Language Models
Oct 16, 2024 / Large Enhancement of Properties in Strained Lead-free Multiferroic Solid Solutions with Strong Deviation from Vegard's Law
Jan 12, 2025 / Fatigue-free ferroelectricity in Hf0.5Zr0.5O2 ultrathin films via interfacial design
Apr 15, 2025 / Harnessing the Computation Redundancy in ViTs to Boost Adversarial Transferability
Feb 1, 2024 / GaussianStyle: Gaussian Head Avatar via StyleGAN