- Apr 29, 2024: Performance-Aligned LLMs for Generating Fast Code
- May 22, 2023: A 4D Hybrid Algorithm to Scale Parallel Training to Thousands of GPUs
- Mar 11, 2023: A Hybrid Tensor-Expert-Data Parallelism Approach to Optimize Mixture-of-Experts Training
- Feb 12, 2025: Democratizing AI: Open-source Scalable LLM Training on GPU-based Supercomputers
- Jun 9, 2025: Simulating Nationwide Coupled Disease and Fear Spread in an Agent-based Model
- Apr 3, 2026: Communication-free Sampling and 4D Hybrid Parallelism for Scalable Mini-batch GNN Training
- Mar 9, 2026: Speculating Experts Accelerates Inference for Mixture-of-Experts
- Oct 25, 2021: AxoNN: An Asynchronous, Message-driven Parallel Framework for Extreme-scale Deep Learning
- Feb 10, 2023: Exploiting Sparsity in Pruned Neural Networks to Optimize Large Model Training
- Jun 19, 2023: Pipit: Scripting the Analysis of Parallel Execution Traces
- Jul 7, 2020: Analytics of Longitudinal System Monitoring Data for Performance Prediction
- Dec 19, 2024: HPC-Coder-V2: Studying Code LLMs Across Low-Resource Parallel Languages
- Dec 11, 2023: ML-based Modeling to Predict I/O Performance on Different Storage Sub-systems
- May 7, 2025: Plexus: Taming Billion-edge Graphs with 3D Parallel Full-graph GNN Training
- Dec 17, 2025: Optimizing Agentic Language Model Inference via Speculative Tool Calls
- Apr 25, 2025: The Big Send-off: Scalable and Performant Collectives for Deep Learning
- Oct 18, 2023: Jorge: Approximate Preconditioning for GPU-efficient Second-order Optimization
- Jan 23, 2024: Automated Programmatic Performance Analysis of Parallel Programs
- Jan 23, 2024: Can Large Language Models Write Parallel Code?
- Jun 26, 2025: ParEval-Repo: A Benchmark Suite for Evaluating LLMs with Repository-level HPC Translation Tasks