au:"Yali Wang" - arXiv2 Search
Showing 1–7 of 7 results
Feb 15, 2026
UniWeTok: An Unified Binary Tokenizer with Codebook Size $\mathit{2^{128}}$ for Unified Multimodal Large Language Model
Jun 26, 2024
EgoVideo: Exploring Egocentric Foundation Model and Downstream Adaptation
Jun 12, 2024
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
Mar 22, 2024
InternVideo2: Scaling Foundation Models for Multimodal Video Understanding
Mar 11, 2024
VideoMamba: State Space Model for Efficient Video Understanding
Dec 6, 2022
InternVideo: General Video Foundation Models via Generative and Discriminative Learning
Nov 24, 2021
MorphMLP: An Efficient MLP-Like Backbone for Spatial-Temporal Representation Learning