Date | Name
Feb 24, 2021 | Density Sketches for Sampling and Estimation
May 26, 2023 | Scissorhands: Exploiting the Persistence of Importance Hypothesis for LLM KV Cache Compression at Test Time
Oct 7, 2025 | vAttention: Verified Sparse Attention
Dec 19, 2024 | HashAttention: Semantic Sparsity for Faster Inference
Nov 3, 2023 | Heterogeneous federated collaborative filtering using FAIR: Federated Averaging in Random Subspaces
Aug 4, 2021 | Random Offset Block Embedding Array (ROBE) for CriteoTB Benchmark MLPerf DLRM Model : 1000× Compression and 3.1× Faster Inference
Feb 24, 2021 | Semantically Constrained Memory Allocation (SCMA) for Embedding in Efficient Recommendation Systems
Jul 21, 2022 | Efficient model compression with Random Operation Access Specific Tile (ROAST) hashing
Jul 21, 2022 | The trade-offs of model size in large recommendation models : A 10000× compressed criteo-tb DLRM model (100 GB parameters to mere 10MB)
Oct 17, 2023 | In defense of parameter sharing for model-compression
Feb 6, 2026 | SOCKET: SOft Collision Kernel EsTimator for Sparse Attention
Oct 29, 2020 | Active Sampling Count Sketch (ASCS) for Online Sparse Estimation of a Trillion Scale Covariance Matrix
Oct 8, 2024 | Sketch to Adapt: Fine-Tunable Sketches for Efficient LLM Adaptation
Oct 7, 2025 | Barbarians at the Gate: How AI is Upending Systems Research
Feb 26, 2021 | Beyond Convolutions: A Novel Deep Learning Approach for Raw Seismic Data Ingestion
Feb 12, 2025 | The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks
Sep 1, 2015 | Program Synthesis using Natural Language
Feb 6, 2025 | vCache: Verified Semantic Prompt Caching
Dec 16, 2025 | Let the Barbarians In: How AI Can Accelerate Systems Performance Research
Jan 2, 2021 | Smart Car Features using Embedded Systems and IoT