Showing 1–20 of 27 results
Date / Name
Jul 5, 2024 / Lazarus: Resilient and Elastic Training of Mixture-of-Experts Models
May 10, 2022 / Serving and Optimizing Machine Learning Workflows on Heterogeneous Infrastructures
May 28, 2021 / Rethinking Lifelong Sequential Recommendation with Incremental Multi-Interest Attention
May 28, 2021 / Linear-Time Self Attention with Codeword Histogram for Efficient Recommendation
Apr 4, 2025 / HeterMoE: Efficient Training of Mixture-of-Experts Models on Heterogeneous GPUs
Jan 17, 2024 / Computing in the Era of Large Generative Models: From Cloud-Native to AI-Native
Nov 21, 2023 / AR Visualization System for Ship Detection and Recognition Based on AI
Mar 12, 2025 / Prompt Inversion Attack against Collaborative Inference of Large Language Models
Sep 19, 2025 / RLinf: Flexible and Efficient Large-scale Reinforcement Learning via Macro-to-Micro Flow Transformation
Apr 14, 2023 / Remote Procedure Call as a Managed System Service
Oct 2, 2024 / ConServe: Fine-Grained GPU Harvesting for LLM Online and Offline Co-Serving
Apr 12, 2025 / DynaServe: Unified and Elastic Execution for Dynamic Disaggregated LLM Serving
Jan 3, 2026 / Curator: Efficient Vector Search with Low-Selectivity Filters
Oct 22, 2025 / RLBoost: Harvesting Preemptible Resources for Cost-Efficient Reinforcement Learning on LLMs
Aug 17, 2021 / How Powerful is Graph Convolution for Recommendation?
Nov 22, 2021 / Poisoning Attacks to Local Differential Privacy Protocols for Key-Value Data
Jun 29, 2024 / VcLLM: Video Codecs are Secretly Tensor Codecs
Jun 9, 2025 / LEANN: A Low-Storage Vector Index
Oct 16, 2025 / Cross-Scenario Unified Modeling of User Interests at Billion Scale
Apr 24, 2025 / An Extensible Software Transport Layer for GPU Networking