Showing 1–20 of 21 results
- Jan 3, 2019: A Secure and Persistent Memory System for Non-volatile Memory
- Mar 26, 2025: Injecting Adrenaline into LLM Serving: Boosting Resource Utilization and Throughput via Attention Disaggregation
- Aug 9, 2020: SEALing Neural Network Models in Secure Deep Learning Accelerators
- Jun 15, 2025: Serving Large Language Models on Huawei CloudMatrix384
- Mar 15, 2017: Bandwidth-efficient Storage Services for Mitigating Side Channel Attack
- May 8, 2019: SAWL: A Self-adaptive Wear-leveling NVM Scheme for High Performance Storage Systems
- Mar 1, 2025: Progressive Sparse Attention: Algorithm and System Co-design for Efficient Attention in LLM Serving
- Jan 5, 2026: RelayGR: Scaling Long-Sequence Generative Recommendation via Cross-Stage Relay-Race Inference
- Feb 6, 2026: DualMap: Enabling Both Cache Affinity and Load Balancing for Distributed LLM Serving
- May 8, 2019: A Scalable Learned Index Scheme in Storage Systems
- Sep 29, 2025: SparseServe: Unlocking Parallelism for Dynamic Sparse Attention in Long-Context LLM Serving
- Feb 6, 2026: HyPER: Bridging Exploration and Exploitation for Scalable LLM Reasoning with Hypothesis Path Expansion and Reduction
- Jun 14, 2025: Efficient Unified Caching for Accelerating Heterogeneous AI Workloads
- Jan 24, 2023: FUSEE: A Fully Memory-Disaggregated Key-Value Store (Extended Version)
- Sep 19, 2023: Ditto: An Elastic and Adaptive Memory-Disaggregated Caching System
- Jul 21, 2023: Transactional Indexes on (RDMA or CXL-based) Disaggregated Memory with Repairable Transaction
- Mar 23, 2024: Cost-Efficient Large Language Model Serving for Multi-turn Conversations with CachedAttention
- Jan 4, 2025: AdaSkip: Adaptive Sublayer Skipping for Accelerating Long-Context LLM Inference
- Dec 18, 2025: Lotus: Optimizing Disaggregated Transactions with Disaggregated Locks
- Aug 4, 2025: Prefill-Decode Aggregation or Disaggregation? Unifying Both for Goodput-Optimized LLM Serving