"au:"Jiakun Fan"" — arXiv2 Search

Showing 1–6 of 6 results

Jun 11, 2025: SLED: A Speculative LLM Decoding Framework for Efficient Edge Serving
Jun 3, 2025: APEX: Asynchronous Parallel CPU-GPU Execution for Online LLM Inference on Constrained GPUs
Dec 18, 2025: Taming the Memory Footprint Crisis: System Design for Production Diffusion LLM Serving
Jan 15, 2026: WISP: Waste- and Interference-Suppressed Distributed Speculative LLM Serving at the Edge via Dynamic Drafting and SLO-Aware Batching
Apr 8, 2026: ConfigSpec: Profiling-Based Configuration Selection for Distributed Edge–Cloud Speculative LLM Serving
Feb 10, 2026: AgentCgroup: Understanding and Controlling OS Resources of AI Agents