Showing 1–20 of 21 results
- Jan 3, 2019: A Secure and Persistent Memory System for Non-volatile Memory
- Mar 26, 2025: Injecting Adrenaline into LLM Serving: Boosting Resource Utilization and Throughput via Attention Disaggregation
- Aug 9, 2020: SEALing Neural Network Models in Secure Deep Learning Accelerators
- Jun 15, 2025: Serving Large Language Models on Huawei CloudMatrix384
- Mar 15, 2017: Bandwidth-efficient Storage Services for Mitigating Side Channel Attack
- May 8, 2019: SAWL: A Self-adaptive Wear-leveling NVM Scheme for High Performance Storage Systems
- Mar 1, 2025: Progressive Sparse Attention: Algorithm and System Co-design for Efficient Attention in LLM Serving
- Jan 5, 2026: RelayGR: Scaling Long-Sequence Generative Recommendation via Cross-Stage Relay-Race Inference
- Feb 6, 2026: DualMap: Enabling Both Cache Affinity and Load Balancing for Distributed LLM Serving
- May 8, 2019: A Scalable Learned Index Scheme in Storage Systems
- Sep 29, 2025: SparseServe: Unlocking Parallelism for Dynamic Sparse Attention in Long-Context LLM Serving
- Feb 6, 2026: HyPER: Bridging Exploration and Exploitation for Scalable LLM Reasoning with Hypothesis Path Expansion and Reduction
- Jun 14, 2025: Efficient Unified Caching for Accelerating Heterogeneous AI Workloads
- Jan 24, 2023: FUSEE: A Fully Memory-Disaggregated Key-Value Store (Extended Version)
- Sep 19, 2023: Ditto: An Elastic and Adaptive Memory-Disaggregated Caching System
- Jul 21, 2023: Transactional Indexes on (RDMA or CXL-based) Disaggregated Memory with Repairable Transaction
- Mar 23, 2024: Cost-Efficient Large Language Model Serving for Multi-turn Conversations with CachedAttention
- Jan 4, 2025: AdaSkip: Adaptive Sublayer Skipping for Accelerating Long-Context LLM Inference
- Dec 18, 2025: Lotus: Optimizing Disaggregated Transactions with Disaggregated Locks
- Aug 4, 2025: Prefill-Decode Aggregation or Disaggregation? Unifying Both for Goodput-Optimized LLM Serving