Date          Name
Feb 16, 2026  Efficient Multi-round LLM Inference over Disaggregated Serving
Jan 19, 2026  Unleashing Efficient Asynchronous RL Post-Training via Staleness-Constrained Rollout Coordination
Oct 3, 2025   TridentServe: A Stage-level Serving System for Diffusion Pipelines
Sep 1, 2025   LobRA: Multi-tenant Fine-tuning over Heterogeneous Data
May 19, 2025  Thinking Short and Right Over Thinking Long: Serving LLM Reasoning Efficiently and Accurately
Apr 29, 2025  Hetu v2: A General and Scalable Deep Learning System with Hierarchical and Heterogeneous Single Program Multiple Data Annotations
Dec 10, 2024  Hydraulis: Balancing Large Transformer Model Training via Co-designing Parallel Strategies and Data Assignment
Dec 2, 2024   FlexSP: Accelerating Large Language Model Training via Flexible Sequence Parallelism
Nov 13, 2024  LSH-MoE: Communication-efficient MoE Training via Locality-Sensitive Hashing
Oct 17, 2024  Malleus: Straggler-Resilient Hybrid Parallel Training of Large-scale Models via Malleable Data and Model Parallelization
Sep 5, 2024   Spindle: Efficient Distributed Training of Multi-Task Large Models via Wavefront Scheduling
Jul 16, 2024  MEMO: Fine-grained Tensor Management For Ultra-long Context LLM Training
Jul 1, 2024   PQCache: Product Quantization-based KVCache for Long Context LLM Inference
Feb 29, 2024  Retrieval-Augmented Generation for AI-Generated Content: A Survey
May 27, 2023  Accelerating Text-to-Image Editing via Cache-Enabled Sparse Diffusion Inference
Jul 29, 2022  Towards Communication-efficient Vertical Federated Learning Training via Cache-enabled Local Updates
Jun 16, 2022  BlindFL: Vertical Federated Machine Learning without Peeking into Your Data
Jul 3, 2019   An Experimental Evaluation of Large Scale GBDT Systems