HERP: Hardware for Energy Efficient and Realtime DB Search and Cluster Expansion in Proteomics
cs.DB
/ Abstract
Database search and clustering are fundamental components of many data analytics problems, such as mass spectrometry-driven proteomics. Traditional full clustering and search algorithms suffer from high resource usage and long latencies. We introduce HERP, a lightweight incremental clustering method and a highly parallelizable database (DB) search platform that uses a 3T2MTJ SOT-MRAM-based content-addressable memory (CAM) in 7 nm technology for in-memory acceleration. A single hardware initialization with pre-clustered proteomics data enables continuous DB searching and local re-clustering, which is a more practical and efficient alternative to clustering from scratch. Heuristics derived from the initial pre-clustered data guide the incremental process and accelerate clustering by 20x at the cost of a 0.3% increase in clustering error. DB search results overlap by 96% with those of state-of-the-art (SOTA) algorithms, which validates search quality. For a 131 GB human genome proteomics dataset, HERP setup requires 1.19 mJ for 2M spectra, while a 1000-query search consumes only 1.1 µJ at SOTA accuracy. Bucket-wise parallelization and query scheduling provide an additional 100x speedup.
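The abstract does not specify HERP's spectrum encoding, CAM layout, or re-clustering heuristics, so the following is only a hypothetical software model of the general idea of incremental clustering plus bucketed nearest-match search. It assumes that spectra are already encoded as fixed-width binary vectors (stored as Python ints), that buckets are precursor m/z bins of width `bin_width`, that each CAM row holds one cluster representative, and that a new spectrum joins the closest cluster in its bucket when the Hamming distance is at most `threshold`. The class `BucketedCAM` and all its parameters are illustrative names and do not come from the paper.

```python
from collections import defaultdict

def hamming(a: int, b: int) -> int:
    """Count differing bits between two binary vectors stored as ints."""
    return bin(a ^ b).count("1")

class BucketedCAM:
    """Hypothetical model of a bucketed CAM that stores cluster representatives.

    Assumptions (not from the paper): binary-encoded spectra, buckets are
    precursor m/z bins, one representative per CAM row, Hamming-distance match.
    """

    def __init__(self, bin_width: float = 1.0, threshold: int = 4):
        self.bin_width = bin_width
        self.threshold = threshold              # max distance to join an existing cluster
        self.rows = defaultdict(list)           # bucket -> [(cluster_id, representative)]
        self.members = defaultdict(list)        # cluster_id -> member vectors
        self.next_id = 0

    def bucket(self, mz: float) -> int:
        """Map a precursor m/z value to its bucket index."""
        return int(mz // self.bin_width)

    def search(self, vec: int, mz: float):
        """Return (cluster_id, distance) of the closest row in the query's bucket, or None.

        A hardware CAM compares all rows of a bucket in parallel; this model scans them.
        """
        rows = self.rows.get(self.bucket(mz), [])
        if not rows:
            return None
        cid, rep = min(rows, key=lambda r: hamming(vec, r[1]))
        return cid, hamming(vec, rep)

    def add(self, vec: int, mz: float) -> int:
        """Incrementally cluster one spectrum: join the nearest cluster or open a new one."""
        hit = self.search(vec, mz)
        if hit is not None and hit[1] <= self.threshold:
            self.members[hit[0]].append(vec)
            return hit[0]
        cid = self.next_id
        self.next_id += 1
        self.rows[self.bucket(mz)].append((cid, vec))
        self.members[cid].append(vec)
        return cid

cam = BucketedCAM(bin_width=1.0, threshold=2)
a = cam.add(0b10110000, 500.2)  # empty bucket 500: opens cluster 0
b = cam.add(0b10110001, 500.7)  # bucket 500, distance 1: joins cluster 0
c = cam.add(0b01001111, 500.4)  # bucket 500, distance 8: opens cluster 1
d = cam.add(0b10110000, 501.3)  # bucket 501 is empty: opens cluster 2
print(a, b, c, d)                      # 0 0 1 2
print(cam.search(0b10110011, 500.9))   # (0, 2)
```

Restricting each lookup to a single m/z bucket is what makes the search bucket-parallel in this model. Independent buckets could be searched concurrently, and queries could be scheduled per bucket. This sketch only assigns new spectra to existing clusters and never updates representatives or merges clusters, so it does not model HERP's local re-clustering.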