Showing 1–20 of 40 results
Date          Name
Apr 23, 2026  Large-Scale Data Parallelization of Product Quantization and Inverted Indexing Using Dask
Apr 23, 2026  SparKV: Overhead-Aware KV Cache Loading for Efficient On-Device LLM Inference
Apr 20, 2026  Lagrange Index based Scheduling for Minimizing Age of Updates from Heterogeneous Sources
Apr 15, 2026  Exploiting Scheduling Flexibility via State-Based Scheduling When Guaranteeing Worst-Case Services
Feb 19, 2026  Collaborative Processing for Multi-Tenant Inference on Memory-Constrained Edge TPUs
Aug 28, 2025  Fast and Scalable Mixed Precision Euclidean Distance Calculations Using GPU Tensor Cores
Jan 5, 2025   sTiles: An Accelerated Computational Framework for Sparse Factorizations of Structured Matrices
Dec 10, 2024  A clustering aggregation algorithm on neutral-atoms and annealing quantum processors
Nov 13, 2024  Achieving Consistent and Comparable CPU Evaluation
Oct 10, 2024  Plug-and-Play Performance Estimation for LLM Services without Relying on Labeled Data
Jun 14, 2024  A Comparison of the Performance of the Molecular Dynamics Simulation Package GROMACS Implemented in the SYCL and CUDA Programming Models
Mar 1, 2024   An Experimental Study of Low-Latency Video Streaming over 5G
Oct 20, 2023  Exploring the Potential of Flexible 8-bit Format: Design and Algorithm
Dec 2, 2022   MMBench: Benchmarking End-to-End Multi-modal DNNs and Understanding Their Hardware-Software Implications
Nov 1, 2022   Towards Maximizing Nonlinear Delay Sensitive Rewards in Queuing Systems
Sep 23, 2022  Faith: An Efficient Framework for Transformer Verification on GPUs
Aug 8, 2022   Constructing Large-Scale Real-World Benchmark Datasets for AIOps
Aug 8, 2022   FourCastNet: Accelerating Global High-Resolution Weather Forecasting using Adaptive Fourier Neural Operators
May 19, 2022  Extract Dynamic Information To Improve Time Series Modeling: a Case Study with Scientific Workflow
May 11, 2022  Access Trends of In-network Cache for Scientific Data