cs.PF — arXiv2

Apr 23, 2026 Large-Scale Data Parallelization of Product Quantization and Inverted Indexing Using Dask

Apr 23, 2026 SparKV: Overhead-Aware KV Cache Loading for Efficient On-Device LLM Inference

Apr 20, 2026 Lagrange Index based Scheduling for Minimizing Age of Updates from Heterogeneous Sources

Apr 15, 2026 Exploiting Scheduling Flexibility via State-Based Scheduling When Guaranteeing Worst-Case Services

Feb 19, 2026 Collaborative Processing for Multi-Tenant Inference on Memory-Constrained Edge TPUs

Aug 28, 2025 Fast and Scalable Mixed Precision Euclidean Distance Calculations Using GPU Tensor Cores

Jan 5, 2025 sTiles: An Accelerated Computational Framework for Sparse Factorizations of Structured Matrices

Dec 10, 2024 A clustering aggregation algorithm on neutral-atoms and annealing quantum processors

Nov 13, 2024 Achieving Consistent and Comparable CPU Evaluation

Oct 10, 2024 Plug-and-Play Performance Estimation for LLM Services without Relying on Labeled Data

Jun 14, 2024 A Comparison of the Performance of the Molecular Dynamics Simulation Package GROMACS Implemented in the SYCL and CUDA Programming Models

Mar 1, 2024 An Experimental Study of Low-Latency Video Streaming over 5G

Oct 20, 2023 Exploring the Potential of Flexible 8-bit Format: Design and Algorithm

Dec 2, 2022 MMBench: Benchmarking End-to-End Multi-modal DNNs and Understanding Their Hardware-Software Implications

Nov 1, 2022 Towards Maximizing Nonlinear Delay Sensitive Rewards in Queuing Systems

Sep 23, 2022 Faith: An Efficient Framework for Transformer Verification on GPUs

Aug 8, 2022 Constructing Large-Scale Real-World Benchmark Datasets for AIOps

Aug 8, 2022 FourCastNet: Accelerating Global High-Resolution Weather Forecasting using Adaptive Fourier Neural Operators

May 19, 2022 Extract Dynamic Information To Improve Time Series Modeling: a Case Study with Scientific Workflow

May 11, 2022 Access Trends of In-network Cache for Scientific Data