Showing 1–20 of 28 results
Date          Name
Jan 18, 2021  DFOGraph: An I/O- and Communication-Efficient System for Distributed Fully-out-of-Core Graph Processing
Mar 17, 2022  Canvas: Isolated and Adaptive Swapping for Multi-Applications on Remote Memory
Jan 1, 2025   Decoupling Knowledge and Reasoning in Transformers: A Modular Architecture with Generalized Cross-Attention
Apr 21, 2021  GraphTheta: A Distributed Graph Neural Network Learning System With Flexible Training Strategy
Mar 17, 2025  A Multi-Power Law for Loss Curve Prediction Across Learning Rate Schedules
Aug 2, 2022   Toward 6G TK$μ$ Extreme Connectivity: Architecture, Key Technologies and Experiments
Sep 20, 2020  TADOC: Text Analytics Directly on Compression
Oct 13, 2019  LiveGraph: A Transactional Graph Storage System with Purely Sequential Adjacency List Scans
Aug 17, 2020  AIPerf: Automated machine learning as an AI-HPC benchmark
Sep 10, 2024  KAG: Boosting LLMs in Professional Domains via Knowledge Augmented Generation
Nov 17, 2025  Larger Datasets Can Be Repeated More: A Theoretical Analysis of Multi-Epoch Scaling in Linear Regression
Nov 24, 2025  How Learning Rate Decay Wastes Your Best Data in Curriculum-Based LLM Pretraining
Apr 28, 2022  Mat2Stencil: A Modular Matrix-Based DSL for Explicit and Implicit Matrix-Free PDE Solvers on Structured Grid
Nov 15, 2017  Bridging the Gap Between Neural Networks and Neuromorphic Hardware with A Neural Network Compiler
Jul 24, 2023  PUMA: Secure Inference of LLaMA-7B in Five Minutes
Mar 7, 2026   Making LLMs Optimize Multi-Scenario CUDA Kernels Like Experts
Apr 2, 2020   RisGraph: A Real-Time Streaming System for Evolving Graphs to Support Sub-millisecond Per-update Analysis at Millions Ops/s
Jul 20, 2022  Mixed-Precision Inference Quantization: Radically Towards Faster inference speed, Lower Storage requirement, and Lower Loss
Oct 29, 2015  WarpLDA: a Cache Efficient O(1) Algorithm for Latent Dirichlet Allocation
Oct 8, 2016   SaberLDA: Sparsity-Aware Learning of Topic Models on GPUs