"au:"Xingda Wei"" — arXiv2 Search

Showing 1–16 of 16 results

Mar 19, 2022  No Provisioned Concurrency: Fast RDMA-codesigned Remote Fork for Serverless Computing
May 6, 2024   Characterizing the Dilemma of Performance and Index Size in Billion-Scale Vector Search and Breaking It with Second-Tier Memory
Dec 15, 2022  Characterizing Off-path SmartNIC for Accelerating Distributed Systems
May 20, 2024  PhoenixOS: Concurrent OS-level GPU Checkpoint and Restore with Validated Speculation
Jul 21, 2023  Transactional Indexes on (RDMA or CXL-based) Disaggregated Memory with Repairable Transaction
Dec 29, 2021  KRCORE: a microsecond-scale RDMA control plane for elastic computing
Jan 24, 2024  Characterizing Network Requirements for GPU API Remoting in AI Applications
Jun 3, 2025   KVCache Cache in the Wild: Characterizing and Optimizing KVCache Cache at a Large Cloud Provider
Mar 16, 2026  LMetric: Simple is Better - Multiplication May Be All You Need for LLM Request Scheduling
May 23, 2025  DecLock: A Case of Decoupled Locking for Disaggregated Memory
Dec 24, 2024  KunServe: Parameter-centric Memory Management for Efficient Memory Overloading Handling in LLM Serving
Feb 16, 2025  Enabling Efficient Transaction Processing on CXL-Based Memory Sharing
Dec 23, 2024  BLITZSCALE: Fast and Live Large Model Autoscaling with O(1) Host Caching
Nov 20, 2025  Fast LLM Post-training via Decoupled and Fastest-of-N Speculation
May 23, 2025  DiFache: Efficient and Scalable Caching on Disaggregated Memory using Decentralized Coherence
Mar 6, 2026   Efficient Vector Search in the Wild: One Model for Multi-K Queries