Date / Name
Mar 19, 2022 / No Provisioned Concurrency: Fast RDMA-codesigned Remote Fork for Serverless Computing
May 6, 2024 / Characterizing the Dilemma of Performance and Index Size in Billion-Scale Vector Search and Breaking It with Second-Tier Memory
Dec 15, 2022 / Characterizing Off-path SmartNIC for Accelerating Distributed Systems
May 20, 2024 / PhoenixOS: Concurrent OS-level GPU Checkpoint and Restore with Validated Speculation
Jul 21, 2023 / Transactional Indexes on (RDMA or CXL-based) Disaggregated Memory with Repairable Transaction
Dec 29, 2021 / KRCORE: a microsecond-scale RDMA control plane for elastic computing
Jan 24, 2024 / Characterizing Network Requirements for GPU API Remoting in AI Applications
Jun 3, 2025 / KVCache Cache in the Wild: Characterizing and Optimizing KVCache Cache at a Large Cloud Provider
Mar 16, 2026 / LMetric: Simple is Better - Multiplication May Be All You Need for LLM Request Scheduling
May 23, 2025 / DecLock: A Case of Decoupled Locking for Disaggregated Memory
Dec 24, 2024 / KunServe: Parameter-centric Memory Management for Efficient Memory Overloading Handling in LLM Serving
Feb 16, 2025 / Enabling Efficient Transaction Processing on CXL-Based Memory Sharing
Dec 23, 2024 / BLITZSCALE: Fast and Live Large Model Autoscaling with O(1) Host Caching
Nov 20, 2025 / Fast LLM Post-training via Decoupled and Fastest-of-N Speculation
May 23, 2025 / DiFache: Efficient and Scalable Caching on Disaggregated Memory using Decentralized Coherence
Mar 6, 2026 / Efficient Vector Search in the Wild: One Model for Multi-K Queries