Showing 1–12 of 12 results
/ Date/ Name
Jan 26, 2022Self-supervised 3D Semantic Representation Learning for Vision-and-Language NavigationJan 26, 2022An Automated Question-Answering Framework Based on Evolution AlgorithmSep 16, 2021Knowledge-based Embodied Question AnsweringApr 30, 2020Towards Embodied Scene DescriptionOct 6, 2022Embodied Referring Expression for Manipulation Question Answering in Interactive EnvironmentDec 8, 2022OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist ModelsSep 18, 2024Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any ResolutionDec 1, 2022Mixed Neural Voxels for Fast Multi-view Video SynthesisAug 24, 2023Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and BeyondSep 28, 2023Qwen Technical ReportOct 2, 2024A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegrained Image GenerationJul 15, 2024Qwen2 Technical Report