Showing 1–20 of 28 results
Date | Name
Feb 21, 2024 | Ouroboros: Generating Longer Drafts Phrase by Phrase for Faster Speculative Decoding
Sep 29, 2025 | InfLLM-V2: Dense-Sparse Switchable Attention for Seamless Short-to-Long Adaptation
Jul 15, 2023 | CA-LoRA: Adapting Existing LoRA for Compressed LLMs to Enable Efficient Multi-Tasking on Personal Devices
Feb 20, 2025 | FR-Spec: Accelerating Large-Vocabulary Language Models via Frequency-Ranked Speculative Sampling
Aug 3, 2024 | MiniCPM-V: A GPT-4V Level MLLM on Your Phone
Dec 5, 2024 | Densing Law of LLMs
Jul 11, 2025 | BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity
Jun 9, 2025 | MiniCPM4: Ultra-Efficient LLMs on End Devices
Sep 24, 2025 | BurstEngine: an Efficient Distributed Framework for Training Transformers on Extremely Long Sequences of over 1M Tokens
Jan 21, 2026 | The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models
Nov 3, 2021 | OpenPrompt: An Open-source Framework for Prompt-learning
Apr 9, 2024 | MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies
Jan 29, 2026 | Spava: Accelerating Long-Video Understanding via Sequence-Parallelism-aware Approximate Attention
Mar 26, 2022 | A Roadmap for Big Model
Sep 4, 2024 | Configurable Foundation Models: Building LLMs from a Modular Perspective
Sep 18, 2024 | Enabling Real-Time Conversations with Minimal Training Costs
Jul 5, 2023 | OpenDelta: A Plug-and-play Library for Parameter-efficient Adaptation of Pre-trained Models
Oct 5, 2023 | Predicting Emergent Abilities with Infinite Resolution Evaluation
Jun 5, 2024 | Seq1F1B: Efficient Sequence-Level Pipeline Parallelism for Large Language Model Training
Mar 14, 2024 | BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences