Date | Name
Mar 8, 2026 | DualSpec: Accelerating Deep Research Agents via Dual-Process Action Speculation
Aug 19, 2024 | AdapMoE: Adaptive Sensitivity-based Expert Gating and Management for Efficient MoE Inference
Aug 26, 2023 | Memory-aware Scheduling for Complex Wired Networks with Iterative Graph Optimization
Feb 21, 2024 | ProPD: Dynamic Token Tree Pruning and Generation for LLM Parallel Decoding
Apr 8, 2025 | HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference
Oct 12, 2024 | PrivQuant: Communication-Efficient Private Inference with Quantized Network/Protocol Co-Optimization
Sep 11, 2025 | HD-MoE: Hybrid and Dynamic Parallelism for Mixture-of-Expert LLMs with 3D Near-Memory Processing
Feb 6, 2026 | HyPER: Bridging Exploration and Exploitation for Scalable LLM Reasoning with Hypothesis Path Expansion and Reduction
Jul 24, 2025 | SpecASR: Accelerating LLM-based Automatic Speech Recognition via Speculative Decoding
Aug 20, 2025 | H2EAL: Hybrid-Bonding Architecture with Hybrid Sparse Attention for Efficient Long-Context LLM Inference