"au:"Zhenfang Chen"" — arXiv2 Search

Showing 1–20 of 39 results

Date / Name

Jun 6, 2019 / Weakly-Supervised Spatio-Temporally Grounding Natural Sentence in Video
Feb 9, 2024 / ContPhy: Continuum Physical Concept Learning and Reasoning from Videos
Aug 2, 2024 / Compositional Physical Reasoning of Objects and Events from Videos
Jan 25, 2020 / Look Closer to Ground Better: Weakly-Supervised Temporal Grounding of Sentence in Video
Nov 8, 2023 / GENOME: GenerativE Neuro-symbOlic visual reasoning by growing and reusing ModulEs
Oct 10, 2023 / TextPSG: Panoptic Scene Graph Generation from Textual Descriptions
Oct 28, 2021 / Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language
Sep 2, 2021 / Deep Face Video Inpainting via UV Mapping
Jan 12, 2023 / See, Think, Confirm: Interactive Prompting Between Vision and Language Models for Knowledge-based Visual Reasoning
Apr 6, 2023 / Visual Dependency Transformers: Dependency Tree Emerges from Reversed Attention
Jun 7, 2023 / ModuleFormer: Modularity Emerges from Mixture-of-Experts
Jul 29, 2024 / FlexAttention for Efficient High-Resolution Vision-Language Models
Apr 22, 2025 / Vidi: Large Multimodal Models for Video Understanding and Editing
Jul 24, 2023 / 3D-LLM: Injecting the 3D World into Large Language Models
Mar 9, 2023 / Planning with Large Language Models for Code Generation
Oct 9, 2023 / SALMON: Self-Alignment with Instructable Reward Models
Oct 11, 2023 / Sparse Universal Transformer
Nov 6, 2023 / CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding
May 15, 2024 / SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World Knowledge
Jan 30, 2024 / Improving Reinforcement Learning from Human Feedback with Efficient Reward Model Ensemble