"au:"Jidong Zhai"" — arXiv2 Search

Showing 1–20 of 32 results

Mar 11, 2025 FastCache: Optimizing Multimodal LLM Serving through Lightweight KV-Cache Compression Framework
Sep 27, 2025 A Flexible Programmable Pipeline Parallelism Framework for Efficient DNN Training
Feb 18, 2026 FlowPrefill: Decoupling Preemption from Prefill Scheduling Granularity to Mitigate Head-of-Line Blocking in LLM Serving
Feb 15, 2022 Suppressing ZZ Crosstalk of Quantum Computers through Pulse and Scheduling Co-Optimization
May 24, 2022 GraphQ IR: Unifying the Semantic Parsing of Graph Query Languages with One Intermediate Representation
Mar 24, 2021 FastMoE: A Fast Mixture-of-Expert Training System
Mar 24, 2025 Jenga: Effective Memory Management for Serving LLM with Heterogeneity
Jun 17, 2025 HARMONY: A Scalable Distributed Vector Database for High-Throughput Approximate Nearest Neighbor Search
Nov 8, 2025 Lethe: Layer- and Time-Adaptive KV Cache Pruning for Reasoning-Intensive LLM Serving
Dec 15, 2025 FlashFuser: Expanding the Scale of Kernel Fusion for Compute-Intensive Operators via Inter-Core Connection
Feb 28, 2026 Jano: Adaptive Diffusion Generation with Early-stage Convergence Awareness
Apr 21, 2026 UniEP: Unified Expert-Parallel MoE MegaKernel for LLM Training
Jun 13, 2021 G-TADOC: Enabling Efficient GPU-Based Text Analytics without Decompression
Mar 26, 2022 A Roadmap for Big Model
Oct 4, 2022 Unveiling the Black Box of PLMs with Semantic Anchors: Towards Interpretable Neural Semantic Parsing
May 16, 2017 Algorithm-Directed Crash Consistence in Non-Volatile Memory for HPC
Jul 11, 2023 PowerFusion: A Tensor Compiler with Explicit Data Movement Description and Instruction-level Graph IR
Feb 20, 2025 GS-Cache: A GS-Cache Inference Framework for Large-scale Gaussian Splatting Models
May 12, 2025 SpecRouter: Adaptive Routing for Multi-Level Speculative Decoding in Large Language Models
Aug 8, 2025 GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models