Showing 21–32 of 32 results
Date | Name
Jun 3, 2025 | APEX: Asynchronous Parallel CPU-GPU Execution for Online LLM Inference on Constrained GPUs
Dec 18, 2025 | Taming the Memory Footprint Crisis: System Design for Production Diffusion LLM Serving
Apr 8, 2026 | ConfigSpec: Profiling-Based Configuration Selection for Distributed Edge–Cloud Speculative LLM Serving
Apr 3, 2015 | ALEA: Fine-grain Energy Profiling with Basic Block Sampling
Oct 27, 2017 | Power Modelling for Heterogeneous Cloud-Edge Data Centers
May 13, 2016 | Energy Optimization of Memory Intensive Parallel Workloads
Dec 30, 2014 | Methods and Metrics for Fair Server Assessment under Real-Time Financial Workloads
Jun 14, 2016 | BDDT-SCC: A Task-parallel Runtime for Non Cache-Coherent Multicores
Sep 12, 2017 | ENORM: A Framework for Edge NOde Resource Management
Nov 14, 2025 | DiffPro: Joint Timestep and Layer-Wise Precision Optimization for Efficient Diffusion Inference
May 6, 2025 | MARCO: Multi-Agent Code Optimization with Real-Time Knowledge Integration for High-Performance Computing
Jan 15, 2026 | WISP: Waste- and Interference-Suppressed Distributed Speculative LLM Serving at the Edge via Dynamic Drafting and SLO-Aware Batching