"au:"YunLong Tang"" — arXiv2 Search

Showing 1–20 of 43 results

Apr 7, 2025 - Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting
Mar 25, 2024 - DPStyler: Dynamic PromptStyler for Source-Free Domain Generalization
Aug 21, 2024 - CaRDiff: Video Salient Object Ranking Chain of Thought Reasoning for Saliency Prediction with Diffusion
Apr 18, 2024 - V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning
Mar 24, 2024 - Empowering LLMs with Pseudo-Untrimmed Videos for Audio-Visual Temporal Understanding
Jul 7, 2023 - LaunchpadGPT: Language Model as Music Visualization Designer on Launchpad
Sep 25, 2022 - Multi-modal Segment Assemblage Network for Ad Video Editing with Importance-Coherence Reward
Jun 17, 2023 - LLMVA-GEBC: Large Language Model with Video Adapter for Generic Event Boundary Captioning
Dec 29, 2023 - Video Understanding with Large Language Models: A Survey
Jun 18, 2024 - Do More Details Always Introduce More Hallucinations in LVLM-based Image Captioning?
May 29, 2025 - ZeroSep: Separate Anything in Audio with Zero Training
Oct 8, 2025 - ToolMem: Enhancing Multimodal Agents with Learnable Tool Capability Memory
Apr 7, 2021 - Mechanical Properties of Gradient Copper Nano-Gyroid Cellular Structures: A Molecular Dynamics Study
May 4, 2023 - Caption Anything: Interactive Image Description with Diverse Multimodal Controls
Sep 23, 2024 - AIM 2024 Challenge on Video Saliency Prediction: Methods and Results
Oct 13, 2024 - MMCOMPOSITION: Revisiting the Compositionality of Pre-trained Vision-Language Models
Oct 16, 2024 - Large Enhancement of Properties in Strained Lead-free Multiferroic Solid Solutions with Strong Deviation from Vegard's Law
Jan 12, 2025 - Fatigue-free ferroelectricity in Hf0.5Zr0.5O2 ultrathin films via interfacial design
Apr 15, 2025 - Harnessing the Computation Redundancy in ViTs to Boost Adversarial Transferability
Feb 1, 2024 - GaussianStyle: Gaussian Head Avatar via StyleGAN