Date          Name
Jan 13, 2015  Deep Image: Scaling up Image Recognition
Feb 19, 2019  Optimizing Network Performance for Distributed DNN Training on GPU Clusters: ImageNet/AlexNet Training in 1.5 Minutes
Jun 4, 2023   Proteus: Simulating the Performance of Distributed DNN Training
Jun 12, 2024  DiTFastAttn: Attention Compression for Diffusion Transformer Models
Apr 2, 2024   Linear Combination of Saved Checkpoints Makes Consistency and Diffusion Models Better
Apr 16, 2025  VGDFR: Diffusion-based Video Generation with Dynamic Latent Frame Rate
Jun 4, 2024   ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation
Mar 28, 2025  DiTFastAttnV2: Head-wise Attention Compression for Multi-Modality Diffusion Transformers
Feb 17, 2025  DLFR-VAE: Dynamic Latent Frame Rate VAE for Video Generation
May 28, 2024  MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization
Feb 26, 2025  AgentSociety Challenge: Designing LLM Agents for User Modeling and Recommendation on Web Platforms
May 24, 2025  PM-KVQ: Progressive Mixed-precision KV Cache Quantization for Long-CoT LLMs
Sep 3, 2021   Characterization and Prediction of Deep Learning Workloads in Large-Scale GPU Datacenters
Feb 6, 2024   LV-Eval: A Balanced Long-Context Benchmark with 5 Length Levels Up to 256K
Jan 10, 2022  A Simulation Platform for Multi-tenant Machine Learning Services on Thousands of GPUs
Apr 22, 2017  Towards Distributed Machine Learning in Shared Clusters: A Dynamically-Partitioned Approach
Jul 1, 2024   Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs
Sep 16, 2024  CSKV: Training-Efficient Channel Shrinking for KV Cache in Long-Context Scenarios
Dec 18, 2024  E-CAR: Efficient Continuous Autoregressive Image Generation via Multistage Modeling
May 25, 2024  HETHUB: A Distributed Training System with Heterogeneous Cluster for Large-Scale Models