Showing 1–20 of 24 results
Date / Name

Jun 30, 2025  On the Predictive Power of Representation Dispersion in Language Models
Dec 31, 2024  Chunk-Distilled Language Modeling
Dec 23, 2025  Distilling to Hybrid Attention Models via KL-Guided Layer Selection
Oct 21, 2025  Text or Pixels? It Takes Half: On the Token Efficiency of Visual Text Inputs in Multimodal LLMs
Sep 2, 2023   Self-Supervised Video Transformers for Isolated Sign Language Recognition
Mar 25, 2025  Context-Efficient Retrieval with Factual Decomposition
Apr 14, 2024  When Hindsight is Not 20/20: Testing Limits on Reflective Thinking in Large Language Models
Jan 16, 2012  Wetting-induced budding of vesicles in contact with several aqueous phases
Apr 14, 2025  How Instruction and Reasoning Data shape Post-Training: Data Quality through the Lens of Layer-wise Gradients
Jul 2, 2025   DIY-MKG: An LLM-Based Polyglot Language Learning System
Apr 13, 2025  SPICE: A Synergistic, Precise, Iterative, and Customizable Image Editing Workflow
Oct 31, 2024  What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective
Oct 31, 2025  OKBench: Democratizing LLM Evaluation with Fully Automated, On-Demand, Open Knowledge Benchmarking
Feb 19, 2026  Reverso: Efficient Time Series Foundation Models for Zero-shot Forecasting
Mar 4, 2026   Why Are Linear RNNs More Parallelizable?
Mar 18, 2016  The joint distribution of the Parisian ruin time and the number of claims until Parisian ruin in the classical risk model
Apr 28, 2026  Training Transformers as a Universal Computer
Mar 18, 2016  Number of claims and ruin time for a refracted risk process
Mar 5, 2024   Towards Training A Chinese Large Language Model for Anesthesiology
Dec 14, 2023  Learning from Polar Representation: An Extreme-Adaptive Model for Long-Term Time Series Forecasting