Showing 1–20 of 28 results
Date | Name
Feb 21, 2024 | Ouroboros: Generating Longer Drafts Phrase by Phrase for Faster Speculative Decoding
Sep 29, 2025 | InfLLM-V2: Dense-Sparse Switchable Attention for Seamless Short-to-Long Adaptation
Jul 15, 2023 | CA-LoRA: Adapting Existing LoRA for Compressed LLMs to Enable Efficient Multi-Tasking on Personal Devices
Feb 20, 2025 | FR-Spec: Accelerating Large-Vocabulary Language Models via Frequency-Ranked Speculative Sampling
Aug 3, 2024 | MiniCPM-V: A GPT-4V Level MLLM on Your Phone
Dec 5, 2024 | Densing Law of LLMs
Jul 11, 2025 | BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity
Jun 9, 2025 | MiniCPM4: Ultra-Efficient LLMs on End Devices
Sep 24, 2025 | BurstEngine: an Efficient Distributed Framework for Training Transformers on Extremely Long Sequences of over 1M Tokens
Jan 21, 2026 | The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models
Nov 3, 2021 | OpenPrompt: An Open-source Framework for Prompt-learning
Apr 9, 2024 | MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies
Jan 29, 2026 | Spava: Accelerating Long-Video Understanding via Sequence-Parallelism-aware Approximate Attention
Mar 26, 2022 | A Roadmap for Big Model
Sep 4, 2024 | Configurable Foundation Models: Building LLMs from a Modular Perspective
Sep 18, 2024 | Enabling Real-Time Conversations with Minimal Training Costs
Jul 5, 2023 | OpenDelta: A Plug-and-play Library for Parameter-efficient Adaptation of Pre-trained Models
Oct 5, 2023 | Predicting Emergent Abilities with Infinite Resolution Evaluation
Jun 5, 2024 | Seq1F1B: Efficient Sequence-Level Pipeline Parallelism for Large Language Model Training
Mar 14, 2024 | BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences