"au:"Jiakun Fan"" — arXiv2 Search
Showing 1–6 of 6 results
Jun 11, 2025
SLED: A Speculative LLM Decoding Framework for Efficient Edge Serving
Jun 3, 2025
APEX: Asynchronous Parallel CPU-GPU Execution for Online LLM Inference on Constrained GPUs
Dec 18, 2025
Taming the Memory Footprint Crisis: System Design for Production Diffusion LLM Serving
Jan 15, 2026
WISP: Waste- and Interference-Suppressed Distributed Speculative LLM Serving at the Edge via Dynamic Drafting and SLO-Aware Batching
Apr 8, 2026
ConfigSpec: Profiling-Based Configuration Selection for Distributed Edge–Cloud Speculative LLM Serving
Feb 10, 2026
AgentCgroup: Understanding and Controlling OS Resources of AI Agents