Showing 1–20 of 59 results
Date | Name
Feb 4, 2022 | MariusGNN: Resource-Efficient Out-of-Core Training of Graph Neural Networks
Oct 29, 2020 | Accordion: Adaptive Gradient Communication via Critical Learning Regime Identification
Jun 4, 2018 | Learning a Code: Machine Learning for Approximate Non-Linear Coded Computation
Apr 4, 2026 | Minos: Systematically Classifying Performance and Power Characteristics of GPU Workloads on HPC Clusters
Mar 10, 2022 | LlamaTune: Sample-Efficient DBMS Configuration Tuning
Oct 11, 2019 | Blink: Fast and Generic Collectives for Distributed ML
Feb 4, 2025 | LV-XAttn: Distributed Cross-Attention for Long Visual Inputs in Multimodal Large Language Models
Feb 2, 2021 | AutoFreeze: Automatically Freezing Model Blocks to Accelerate Fine-tuning
Oct 6, 2020 | Move Fast and Meet Deadlines: Fine-grained Real-time Stream Processing with Cameo
Sep 30, 2022 | Shockwave: Fair and Efficient Cluster Scheduling for Dynamic Adaptation in Machine Learning
Feb 20, 2017 | Hemingway: Modeling Distributed Optimization Algorithms
Oct 29, 2016 | KeystoneML: Optimizing Pipelines for Large-Scale Advanced Analytics
Jan 30, 2025 | Scaling Inference-Efficient Language Models
May 2, 2019 | Parity Models: A General Framework for Coding-Based Resilience in ML Inference
Jan 6, 2023 | Does compressing activations help model parallel training?
Feb 13, 2017 | Occupy the Cloud: Distributed Computing for the 99%
Aug 23, 2022 | Not All GPUs Are Created Equal: Characterizing Variability in Large-Scale, Accelerator-Rich Systems
Jan 20, 2021 | Marius: Learning Massive Graph Embeddings on a Single Machine
Feb 24, 2022 | BagPipe: Accelerating Deep Recommendation Model Training
Aug 21, 2024 | PAL: A Variability-Aware Policy for Scheduling ML Workloads in GPU Clusters