Ning Lin, Shaocong Wang, Yue Zhang, Yangu He, Kwunhang Wong, Arindam Basu, Dashan Shang, Xiaoming Chen, Zhongrui Wang
Deep neural networks (DNNs), such as the widely-used GPT-3 with billions of parameters, are often kept secret due to high training costs and privacy concerns surrounding the data used to train them. Previous approaches to securing DNNs typically require expensive circuit redesign, resulting in additional overheads such as increased area, energy consumption, and latency. To address these issues, we propose a novel hardware-software co-design approach for DNN intellectual property (IP) protection that capitalizes on the inherent aging characteristics of circuits and a novel differential orientation fine-tuning (DOFT) to ensure effective protection. Hardware-wise, we employ random aging to produce authorized chips. This process circumvents the need for chip redesign, thereby eliminating any additional hardware overhead during the inference procedure of DNNs. Moreover, the authorized chips demonstrate a considerable disparity in DNN inference performance when compared to unauthorized chips. Software-wise, we propose a novel DOFT, which allows pre-trained DNNs to maintain their original accuracy on authorized chips with minimal fine-tuning, while the model's performance on unauthorized chips is reduced to random guessing. Extensive experiments on various models, including MLP, VGG, ResNet, Mixer, and SwinTransformer, with lightweight binary and practical multi-bit weights demonstrate that the proposed method achieves effective IP protection, with only 10% accuracy on unauthorized chips, while preserving nearly the original accuracy on authorized ones.
Bo Wang, Jiehong Lin, Chenzhi Liu, Xinting Hu, Yifei Yu, Tianjia Liu, Zhongrui Wang, Xiaojuan Qi
We present MG-Nav (Memory-Guided Navigation), a dual-scale framework for zero-shot visual navigation that unifies global memory-guided planning with local geometry-enhanced control. At its core is the Sparse Spatial Memory Graph (SMG), a compact, region-centric memory where each node aggregates multi-view keyframe and object semantics, capturing both appearance and spatial structure while preserving viewpoint diversity. At the global level, the agent is localized on SMG and a goal-conditioned node path is planned via an image-to-instance hybrid retrieval, producing a sequence of reachable waypoints for long-horizon guidance. At the local level, a navigation foundation policy executes these waypoints in point-goal mode with obstacle-aware control, and switches to image-goal mode when navigating from the final node towards the visual target. To further enhance viewpoint alignment and goal recognition, we introduce VGGT-adapter, a lightweight geometric module built on the pre-trained VGGT model, which aligns observation and goal features in a shared 3D-aware space. MG-Nav operates global planning and local control at different frequencies, using periodic re-localization to correct errors. Experiments on HM3D Instance-Image-Goal and MP3D Image-Goal benchmarks demonstrate that MG-Nav achieves state-of-the-art zero-shot performance and remains robust under dynamic rearrangements and unseen scene conditions.
Xiaoshan Wu, Xiaoyang Lyu, Yifei Yu, Bo Wang, Zhongrui Wang, Xiaojuan Qi
Dense semantic segmentation in dynamic environments is fundamentally limited by the low-frame-rate (LFR) nature of standard cameras, which creates critical perceptual gaps between frames. To solve this, we introduce Anytime Interframe Semantic Segmentation: a new task for predicting segmentation at any arbitrary time using only a single past RGB frame and a stream of asynchronous event data. This task presents a core challenge: how to robustly propagate dense semantic features using a motion field derived from sparse and often noisy event data, all while mitigating feature degradation in highly dynamic scenes. We propose LiFR-Seg, a novel framework that directly addresses these challenges by propagating deep semantic features through time. The core of our method is an uncertainty-aware warping process, guided by an event-driven motion field and its learned, explicit confidence. A temporal memory attention module further ensures coherence in dynamic scenarios. We validate our method on the DSEC dataset and a new high-frequency synthetic benchmark (SHF-DSEC) we contribute. Remarkably, our LFR system achieves performance (73.82% mIoU on DSEC) that is statistically indistinguishable from an HFR upper-bound (within 0.09%) that has full access to the target frame. This work presents a new, efficient paradigm for achieving robust, high-frame-rate perception with low-frame-rate hardware. Project Page: https://candy-crusher.github.io/LiFR_Seg_Proj/#; Code: https://github.com/Candy-Crusher/LiFR-Seg.git.
Ziqing Wang, Yuetong Fang, Jiahang Cao, Qiang Zhang, Zhongrui Wang, Renjing Xu
The combination of Spiking Neural Networks (SNNs) and Transformers has attracted significant attention due to its potential for high energy efficiency and high performance. However, existing works on this topic typically rely on direct training, which can lead to suboptimal performance. To address this issue, we propose to leverage the benefits of the ANN-to-SNN conversion method to combine SNNs and Transformers, resulting in significantly improved performance over existing state-of-the-art SNN models. Furthermore, inspired by the quantal synaptic failures observed in the nervous system, which reduce the number of spikes transmitted across synapses, we introduce a novel Masked Spiking Transformer (MST) framework that incorporates a Random Spike Masking (RSM) method to prune redundant spikes and reduce energy consumption without sacrificing performance. Our experimental results demonstrate that the proposed MST model achieves a significant reduction of 26.8% in power consumption when the masking ratio is 75% while maintaining the same level of performance as the unmasked model.
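The idea of randomly masking spikes can be illustrated with a minimal sketch. It assumes each spike is dropped independently with probability equal to the masking ratio; the abstract does not state where in the network the mask is applied or how it is sampled, so `random_spike_mask` and the toy spike train below are hypothetical.

```python
import random

def random_spike_mask(spikes, mask_ratio, rng):
    """Drop each spike independently with probability mask_ratio (assumed scheme)."""
    return [s if (s and rng.random() >= mask_ratio) else 0 for s in spikes]

rng = random.Random(0)
# Hypothetical binary spike train with roughly half of the entries firing.
spikes = [1 if rng.random() < 0.5 else 0 for _ in range(10_000)]
masked = random_spike_mask(spikes, mask_ratio=0.75, rng=rng)

# Fraction of the original spikes that survive the mask.
kept_fraction = sum(masked) / sum(spikes)
```

With a masking ratio of 75%, about a quarter of the spikes survive in this toy setting. The sketch says nothing about the reported 26.8% power reduction, which depends on the full model and hardware.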
Shijie Wang, Xi Chen, Chao Zhao, Yuxin Kong, Baojun Lin, Yongyi Wu, Zhaozhao Bi, Ziyi Xuan, Tao Li, Yuxiang Li, Wei Zhang, En Ma, Zhongrui Wang, Wei Ma
Bionic learning with fused sensing, memory and processing functions outperforms artificial neural networks running on silicon chips in terms of efficiency and footprint. However, digital hardware implementation of bionic learning suffers from device heterogeneity in sensors and processing cores, which incurs large hardware, energy and time overheads. Here, we present a universal solution to simultaneously perform multi-modal sensing, memory and processing using organic electrochemical transistors with designed architecture, tailored channel morphology and selective ion injection into the crystalline/amorphous regions. The resultant device works as either a volatile receptor that shows multi-modal sensing, or a non-volatile synapse that features record-high 10-bit analog states, low switching stochasticity and good retention without the integration of any extra devices. Homogeneous integration of such devices enables bionic learning functions such as conditioned reflex and real-time cardiac disease diagnosis via reservoir computing, illustrating the promise for future smart edge health informatics.
Hao Wang, Lingfeng Zhang, Erjia Xiao, Xin Wang, Zhongrui Wang, Renjing Xu
Non-invasive mobile electroencephalography (EEG) acquisition systems have been utilized for long-term monitoring of seizures, yet they suffer from limited battery life. Resistive random access memory (RRAM) is widely used in computing-in-memory (CIM) systems and offers an ideal platform for reducing the computational energy consumption of seizure prediction algorithms, potentially solving the endurance issues of mobile EEG systems. To address this challenge, inspired by neuronal mechanisms, we propose an RRAM-based bio-inspired circuit system for correlation feature extraction and seizure prediction. This system achieves a high average sensitivity of 91.2% and a low false positive rate per hour (FPR/h) of 0.11 on the CHB-MIT seizure dataset. The chip under simulation demonstrates an area of approximately 0.83 mm² and a latency of 62.2 μs. Power consumption is recorded at 24.4 mW during the feature extraction phase and 19.01 mW in the seizure prediction phase, with a cumulative energy consumption of 1.515 μJ for processing a 3-second data window, predicting 29.2 minutes ahead. This method exhibits an 81.3% reduction in computational energy relative to the most efficient existing seizure prediction approaches, establishing a new benchmark for energy efficiency.
Bo Xu, Zefeng Huang, Yuetong Fang, Xin Wang, Bojun Cheng, Shaoliang Yu, Zhongrui Wang, Renjing Xu
Optical neural networks (ONNs) perform extensive computations using photons instead of electrons, resulting in passively energy-efficient and low-latency computing. Among various ONNs, diffractive optical neural networks (DONNs) particularly excel in energy efficiency, bandwidth, and parallelism, and have therefore attracted considerable attention. However, their performance is limited by the inherent constraints of traditional frame-based sensors, which process and produce dense and redundant information at low operating frequency. Inspired by the spiking neurons in the human nervous system, which utilize a thresholding mechanism to transmit information sparsely and efficiently, we propose integrating a threshold-locking method into neuromorphic vision sensors to generate sparse and binary information, achieving microsecond-level accurate perception similar to human spiking neurons. By introducing novel Binary Dual Adaptive Training (BAT) and Optically Parallel Mixture of Experts (OPMoE) inference methods, the high-speed, spike-based diffractive optical neural network (S2NN) demonstrates an ultra-fast operating speed of 3649 FPS, which is 30-fold faster than that of reported DONNs, delivering a remarkable computational speed of 417.96 TOPS and a system energy efficiency of 12.6 TOPS/W. Our work demonstrates the potential of incorporating neuromorphic architecture to facilitate optical neural network applications in real-world scenarios for both low-level and high-level machine vision tasks.
Hao Wang, Erjia Xiao, Wenbo Mu, Songhuan He, Zhongyi Ni, Lingfeng Zhang, Xiaokun Zhan, Yifei Cui, Jinguo Liu, Cheng Wang, Zhongrui Wang, Renjing Xu
Due to the high sensitivity of qubits to environmental noise, which leads to decoherence and information loss, active quantum error correction (QEC) is essential. Surface codes represent one of the most promising fault-tolerant QEC schemes, but they require decoders that are accurate, fast, and scalable to large-scale quantum platforms. Among all types of decoders, fully neural network-based high-level decoders offer decoding thresholds that surpass the baseline decoder, Minimum Weight Perfect Matching (MWPM), and exhibit strong scalability, making them one of the ideal solutions for addressing surface code challenges. However, current fully neural network-based high-level decoders can only operate serially and do not meet the current latency requirements (below 440 ns). To address these challenges, we first propose a parallel fully feedforward neural network (FFNN) high-level surface code decoder, and comprehensively measure its decoding performance on a computing-in-memory (CIM) hardware simulation platform. With the currently available hardware specifications, our work achieves a decoding threshold of 14.22%, surpassing the MWPM baseline of 10.3%, and achieves high pseudo-thresholds of 10.4%, 11.3%, 12%, and 11.6% with decoding latencies of 197.03 ns, 234.87 ns, 243.73 ns, and 251.65 ns for distances of 3, 5, 7 and 9, respectively. The impact of hardware parameters and non-idealities on these results is discussed, and the hardware simulation results are extrapolated to a 4K quantum cryogenic environment.
Songqi Wang, Yue Zhang, Jia Chen, Xinyuan Zhang, Yi Li, Ning Lin, Yangu He, Jichang Yang, Yingjie Yu, Yi Li, Zhongrui Wang, Xiaojuan Qi, Han Wang
The human brain simultaneously optimizes synaptic weights and topology by growing, pruning, and strengthening synapses while performing all computation entirely in memory. In contrast, modern artificial-intelligence systems separate weight optimization from topology optimization and depend on energy-intensive von Neumann architectures. Here, we present a software-hardware co-design that bridges this gap. On the algorithmic side, we introduce a real-time dynamic weight-pruning strategy that monitors weight similarity during training and removes redundancies on the fly, reducing operations by 26.80% on MNIST and 59.94% on ModelNet10 without sacrificing accuracy (91.44% and 77.75%, respectively). On the hardware side, we fabricate a reconfigurable, fully digital compute-in-memory (CIM) chip based on 180 nm one-transistor-one-resistor (1T1R) RRAM arrays. Each array embeds flexible Boolean logic (NAND, AND, XOR, OR), enabling both convolution and similarity evaluation inside memory and eliminating all ADC/DAC overhead. The digital design achieves zero bit-error, reduces silicon area by 72.30% and overall energy by 57.26% compared to analogue RRAM CIM, and lowers energy by 75.61% and 86.53% on MNIST and ModelNet10, respectively, relative to an NVIDIA RTX 4090. Together, our co-design establishes a scalable brain-inspired paradigm for adaptive, energy-efficient edge intelligence in the future.
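As a rough illustration of similarity-based pruning with Boolean logic, the sketch below compares binary kernels with XOR and drops any kernel that is too close to one already kept. The Hamming-style metric, the 0.85 threshold, the 9-bit kernels, and the names `hamming_similarity` and `prune_similar` are all assumptions made for this example; the abstract does not specify the metric or thresholds the authors use.

```python
def hamming_similarity(a, b, n_bits):
    """Fraction of matching bits between two n_bits-wide binary kernels (XOR + popcount)."""
    differing = bin((a ^ b) & ((1 << n_bits) - 1)).count("1")
    return 1.0 - differing / n_bits

def prune_similar(kernels, n_bits, threshold):
    """Return indices of kernels kept; a kernel is dropped if it is at least
    `threshold`-similar to any kernel that was already kept."""
    kept = []
    for idx, kernel in enumerate(kernels):
        if all(hamming_similarity(kernel, kernels[j], n_bits) < threshold for j in kept):
            kept.append(idx)
    return kept

# Hypothetical 3x3 binary kernels, flattened to 9 bits each.
kernels = [0b101100111, 0b101100110, 0b010011000, 0b101100111]
kept = prune_similar(kernels, n_bits=9, threshold=0.85)
```

Here the second kernel differs from the first in one bit and the fourth is an exact duplicate, so only the first and third survive. In the described system the comparison would run during training and inside the memory arrays rather than in software.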
Zijian Ye, Wei Huang, Yifei Yu, Tianhe Ren, Zhongrui Wang, Xiaojuan Qi
Large language models (LLMs) demonstrate remarkable performance but face substantial computational and memory challenges that limit their practical deployment. Quantization has emerged as a promising solution; however, its effectiveness is often limited by quantization errors arising from weight distributions that are not quantization-friendly and the presence of activation outliers. To address these challenges, we introduce DBellQuant, an innovative post-training quantization (PTQ) framework that achieves nearly 1-bit weight compression and 6-bit activation quantization with minimal performance degradation. DBellQuant uses the Learnable Transformation for Dual-Bell (LTDB) algorithm, which transforms single-bell weight distributions into dual-bell forms to reduce binarization errors and applies inverse transformations to smooth activations. DBellQuant sets a new state-of-the-art by preserving superior model performance under aggressive weight and activation quantization. For example, on the Wikitext2 dataset, DBellQuant achieves a perplexity of 14.39 on LLaMA2-13B with 6-bit activation quantization, significantly outperforming BiLLM's 21.35 without activation quantization, underscoring its potential in compressing LLMs for real-world applications.
Qunsong Zeng, Jiawei Liu, Mingrui Jiang, Jun Lan, Yi Gong, Zhongrui Wang, Yida Li, Can Li, Jim Ignowski, Kaibin Huang
To support emerging applications ranging from holographic communications to extended reality, next-generation mobile wireless communication systems require ultra-fast and energy-efficient baseband processors. Traditional complementary metal-oxide-semiconductor (CMOS)-based baseband processors face two challenges: the limits of transistor scaling and the von Neumann bottleneck. To address these challenges, in-memory computing-based baseband processors using resistive random-access memory (RRAM) present an attractive solution. In this paper, we propose and demonstrate RRAM-implemented in-memory baseband processing for the widely adopted multiple-input-multiple-output orthogonal frequency division multiplexing (MIMO-OFDM) air interface. Its key feature is to execute the key operations, including discrete Fourier transform (DFT) and MIMO detection using linear minimum mean square error (L-MMSE) and zero forcing (ZF), in one step. In addition, an RRAM-based channel estimation module is proposed and discussed. Through prototyping and simulations, we demonstrate the feasibility of an RRAM-based full-fledged communication system in hardware, and show through large-scale simulations that it can outperform state-of-the-art baseband processors with gains of 91.2× in latency and 671× in energy efficiency. Our results pave a potential pathway for RRAM-based in-memory computing to be implemented in the era of the sixth generation (6G) mobile communications.
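For reference, the three operations named above have standard textbook forms, written here under an assumed notation that the abstract itself does not define: received vector $\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{w}$, channel matrix $\mathbf{H}$, noise variance $\sigma^2$, and unit-power transmitted symbols. How the RRAM arrays evaluate these expressions in one step is not shown here.

```latex
% Assumed notation: y = Hx + w, noise variance \sigma^2, unit-power symbols.
\begin{align}
X[k] &= \sum_{n=0}^{N-1} x[n]\, e^{-j 2\pi k n / N}, \quad k = 0, \dots, N-1
  && \text{(DFT)} \\
\hat{\mathbf{x}}_{\mathrm{ZF}} &= \left(\mathbf{H}^{\mathsf{H}} \mathbf{H}\right)^{-1} \mathbf{H}^{\mathsf{H}} \mathbf{y}
  && \text{(zero forcing)} \\
\hat{\mathbf{x}}_{\mathrm{L\text{-}MMSE}} &= \left(\mathbf{H}^{\mathsf{H}} \mathbf{H} + \sigma^{2} \mathbf{I}\right)^{-1} \mathbf{H}^{\mathsf{H}} \mathbf{y}
  && \text{(L-MMSE)}
\end{align}
```

The DFT is a matrix-vector product, while both detectors require solving a linear system.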
Shaocong Wang, Yizhao Gao, Yi Li, Woyu Zhang, Yifei Yu, Bo Wang, Ning Lin, Hegan Chen, Yue Zhang, Yang Jiang, Dingchen Wang, Jia Chen, Peng Dai, Hao Jiang, Peng Lin, Xumeng Zhang, Xiaojuan Qi, Xiaoxin Xu, Hayden So, Zhongrui Wang, Dashan Shang, Qi Liu, Kwang-Ting Cheng, Ming Liu
Visual sensors, including 3D LiDAR, neuromorphic DVS sensors, and conventional frame cameras, are increasingly integrated into edge-side intelligent machines. Realizing intensive multi-sensory data analysis directly on edge intelligent machines is crucial for numerous emerging edge applications, such as augmented and virtual reality and unmanned aerial vehicles, which necessitates unified data representation, unprecedented hardware energy efficiency and rapid model training. However, multi-sensory data are intrinsically heterogeneous, causing significant complexity in the system development for edge-side intelligent machines. In addition, the performance of conventional digital hardware is limited by the physically separated processing and memory units, known as the von Neumann bottleneck, and the physical limit of transistor scaling, which contributes to the slowdown of Moore's law. These limitations are further intensified by the tedious training of models with ever-increasing sizes. We propose a novel hardware-software co-design, random resistive memory-based deep extreme point learning machine (DEPLM), that offers efficient unified point set analysis. We show the system's versatility across various data modalities and two different learning tasks. Compared to a conventional digital hardware-based system, our co-design achieves huge energy efficiency improvements and training cost reduction. Our random resistive memory-based deep extreme point learning machine may pave the way for energy-efficient and training-friendly edge AI across various data modalities and tasks.
Kwunhang Wong, Songqi Wang, Wei Huang, Xinyuan Zhang, Yangu He, Karl M. H. Lai, Yuzhong Jiao, Ning Lin, Xiaojuan Qi, Xiaoming Chen, Zhongrui Wang
Biologically plausible Spiking Neural Networks (SNNs), characterized by spike sparsity, are attracting tremendous attention for intelligent edge devices and critical bio-medical applications as compared to artificial neural networks (ANNs). However, there is a considerable risk from malicious attempts to extract white-box information (i.e., weights) from SNNs, as attackers could exploit well-trained SNNs for profit or mount white-box adversarial attacks. There is a dire need for intellectual property (IP) protective measures. In this paper, we present a novel secure software-hardware co-designed RRAM-based neuromorphic accelerator for protecting the IP of SNNs. Software-wise, we design a tailored genetic algorithm with classic XOR encryption to target the least number of weights that need encryption. From a hardware perspective, we develop a low-energy decryption module, meticulously designed to provide zero decryption latency. Extensive results from various datasets, including NMNIST, DVSGesture, EEGMMIDB, Braille Letter, and SHD, demonstrate that our proposed method effectively secures SNNs by encrypting a minimal fraction of stealthy weights, only 0.00005% to 0.016% of weight bits. Additionally, it achieves a substantial reduction in energy consumption, ranging from 59× to 6780×, and significantly lowers decryption latency, by 175× to 4250×. Moreover, our method requires as little as one sample per class in the dataset for encryption and addresses the insensitivity problem of Hessian/gradient-based search. This strategy offers a highly efficient and flexible solution for securing SNNs in diverse applications.
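The XOR step can be sketched on its own, separately from the genetic algorithm that chooses which bits to encrypt. In the minimal example below, the 8-bit weights, the `key` dictionary mapping weight index to bit mask, and the helper `xor_bits` are hypothetical; only the use of XOR on a small subset of weight bits comes from the abstract.

```python
def xor_bits(weight, mask):
    """XOR an 8-bit quantized weight with a bit mask; applying it twice restores the weight."""
    return (weight ^ mask) & 0xFF

# Hypothetical 8-bit quantized weights.
weights = [0x12, 0x7F, 0x80, 0x03]
# Hypothetical key: only weights 1 and 3 are touched, one bit each.
key = {1: 0x80, 3: 0x01}

# Stored (encrypted) weights: untouched entries use a zero mask.
encrypted = [xor_bits(w, key.get(i, 0)) for i, w in enumerate(weights)]
# Decryption on an authorized device applies the same masks again.
decrypted = [xor_bits(w, key.get(i, 0)) for i, w in enumerate(encrypted)]
```

Because XOR is its own inverse, decryption is the same operation as encryption, which is one reason such a scheme can be cheap in hardware. Flipping a most-significant bit, as in the first key entry, changes the stored value substantially while touching a single bit.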
Si-yu Li, Zhongrui Wang, Yingzhuo Han, Shaoqing Xu, Zhiyue Xu, Yingbo Wang, Zhengwen Wang, Yucheng Xue, Aisheng Song, Kenji Watanabe, Takashi Taniguchi, Xueyun Wang, Tian-Bao Ma, Jiawang Hong, Hong-Jun Gao, Yuhang Jiang, Jinhai Mao
Nanoscale polar structures are significant for understanding polarization processes in low-dimensional systems and hold potential for developing high-performance electronics. Here, we demonstrate a polar vortex superstructure arising from the reconstructed moiré patterns in twisted bilayer graphene aligned with hexagonal boron nitride. Scanning tunneling microscopy reveals spatially modulated charge polarization, while theoretical simulations indicate that the in-plane polarization field forms an array of polar vortices. Notably, this polar field is gate-tunable, exhibiting an unconventional gate-tunable polar sliding and screening process. Moreover, its interaction with electron correlations in twisted bilayer graphene leads to modulated correlated states. Our findings establish moiré pattern reconstruction as a powerful strategy for engineering nanoscale polar structures and emergent quantum phases in van der Waals materials.
Meng Xu, Jichang Yang, Ning Lin, Qundao Xu, Siqi Tang, Han Wang, Xiaojuan Qi, Zhongrui Wang, Ming Xu
Lattice field theory (LFT) simulations underpin advances in classical statistical mechanics and quantum field theory, providing a unified computational framework across particle, nuclear, and condensed matter physics. However, the application of these methods to high-dimensional systems remains severely constrained by several challenges, including the prohibitive computational cost and limited parallelizability of conventional sampling algorithms such as hybrid Monte Carlo (HMC), the substantial training expense associated with traditional normalizing flow models, and the inherent energy inefficiency of digital hardware architectures. Here, we introduce a software-hardware co-design that integrates an adaptive normalizing flow (ANF) model with a resistive memory-based neural differential equation solver, enabling efficient generation of LFT configurations. Software-wise, ANF enables efficient parallel generation of statistically independent configurations, thereby reducing computational costs, while low-rank adaptation (LoRA) allows cost-effective fine-tuning across diverse simulation parameters. Hardware-wise, in-memory computing with resistive memory substantially enhances both parallelism and energy efficiency. We validate our approach on the scalar $\phi^4$ theory and the effective field theory of graphene wires, using a hybrid analog-digital neural differential equation solver equipped with a 180 nm resistive memory in-memory computing macro. Our co-design enables low-cost computation, achieving approximately 8.2-fold and 13.9-fold reductions in integrated autocorrelation time over HMC, while requiring fine-tuning of less than 8% of the weights via LoRA. Compared to state-of-the-art GPUs, our co-design achieves up to approximately 16.1- and 17.0-fold speedups for the two tasks, as well as 73.7- and 138.0-fold improvements in energy efficiency.
Wei Xuan, Zihao Xuan, Rongliang Fu, Ning Lin, Kwunhang Wong, Zikang Yuan, Lang Feng, Zhongrui Wang, Tsung-Yi Ho, Yuzhong Jiao, Luhong Liang
The rapid deployment of deep neural network (DNN) accelerators in safety-critical domains such as autonomous vehicles, healthcare systems, and financial infrastructure necessitates robust mechanisms to safeguard data confidentiality and computational integrity. Existing security solutions for DNN accelerators, however, suffer from excessive hardware resource demands and frequent off-chip memory access overheads, which degrade performance and scalability. To address these challenges, this paper presents a secure and efficient memory protection framework for DNN accelerators with minimal overhead. First, we propose a bandwidth-aware cryptographic scheme that adapts encryption granularity based on memory traffic patterns, striking a balance between security and resource efficiency. Second, we observe that both the overlapping regions in the intra-layer tiling's sliding window pattern and those resulting from inter-layer tiling strategy discrepancies introduce substantial redundant memory accesses and repeated computational overhead in cryptography. Third, we introduce a multi-level authentication mechanism that effectively eliminates unnecessary off-chip memory accesses, enhancing performance and energy efficiency. Experimental results show that this work decreases performance overhead by over 12% and achieves 87% energy efficiency improvement for both server and edge neural processing units (NPUs), while ensuring robust scalability.
Adnan Mehonic, Daniele Ielmini, Kaushik Roy, Onur Mutlu, Shahar Kvatinsky, Teresa Serrano-Gotarredona, Bernabe Linares-Barranco, Sabina Spiga, Sergey Savelev, Alexander G Balanov, Nitin Chawla, Giuseppe Desoli, Gerardo Malavena, Christian Monzio Compagnoni, Zhongrui Wang, J Joshua Yang, Ghazi Sarwat Syed, Abu Sebastian, Thomas Mikolajick, Beatriz Noheda, Stefan Slesazeck, Bernard Dieny, Tuo-Hung Hou, Akhil Varri, Frank Bruckerhoff-Pluckelmann, Wolfram Pernice, Xixiang Zhang, Sebastian Pazos, Mario Lanza, Stefan Wiefels, Regina Dittmann, Wing H Ng, Mark Buckwell, Horatio RJ Cox, Daniel J Mannion, Anthony J Kenyon, Yingming Lu, Yuchao Yang, Damien Querlioz, Louis Hutin, Elisa Vianello, Sayeed Shafayet Chowdhury, Piergiulio Mannocci, Yimao Cai, Zhong Sun, Giacomo Pedretti, John Paul Strachan, Dmitri Strukov, Manuel Le Gallo, Stefano Ambrogio, Ilia Valov, Rainer Waser
The roadmap is organized into several thematic sections, outlining current computing challenges, discussing the neuromorphic computing approach, analyzing mature and currently utilized technologies, providing an overview of emerging technologies, addressing material challenges, exploring novel computing concepts, and finally examining the maturity level of emerging technologies while determining the next essential steps for their advancement.
Yaping Zhao, Shuhui Shi, Ramgopal Ravi, Zhongrui Wang, Edmund Y. Lam, Jichang Zhao
The study of socioeconomic status has been transformed by the availability of digital records containing data on real estate, points of interest, traffic and social media trends such as micro-blogging. In this paper, we describe a heterogeneous, multi-source, multi-modal, multi-view and multi-distributional dataset named "H4M". The mixed dataset contains data on real estate transactions, points of interest, traffic patterns and micro-blogging trends from Beijing, China. The unique composition of H4M makes it an ideal test bed for methodologies and approaches aimed at studying and solving problems related to real estate, traffic, urban mobility planning and social sentiment analysis, among others. The dataset is available at: https://indigopurple.github.io/H4M/index.html
Yaping Zhao, Ramgopal Ravi, Shuhui Shi, Zhongrui Wang, Edmund Y. Lam, Jichang Zhao
Real estate prices have a significant impact on individuals, families, businesses, and governments. The general objective of real estate price prediction is to identify and exploit socioeconomic patterns arising from real estate transactions over multiple aspects, ranging from the property itself to other contributing factors. However, price prediction is a challenging multidimensional problem that involves estimating many characteristics beyond the property itself. In this paper, we use multiple sources of data to evaluate the economic contribution of different socioeconomic characteristics such as surrounding amenities, traffic conditions and social emotions. Our experiments were conducted on 28,550 houses in Beijing, China and we rank each characteristic by its importance. Since the use of multi-source information improves the accuracy of predictions, the aforementioned characteristics can be an invaluable resource to assess the economic and social value of real estate. Code and data are available at: https://github.com/IndigoPurple/PATE
Bo Wang, Shaocong Wang, Ning Lin, Yi Li, Yifei Yu, Yue Zhang, Jichang Yang, Xiaoshan Wu, Yangu He, Songqi Wang, Rui Chen, Guoqi Li, Xiaojuan Qi, Zhongrui Wang, Dashan Shang
There is unprecedented development in machine learning, exemplified by recent large language models and world simulators, which are artificial neural networks running on digital computers. However, they still cannot parallel human brains in terms of energy efficiency and the streamlined adaptability to inputs of different difficulties, due to differences in signal representation, optimization, run-time reconfigurability, and hardware architecture. To address these fundamental challenges, we introduce pruning optimization for input-aware dynamic memristive spiking neural network (PRIME). Signal representation-wise, PRIME employs leaky integrate-and-fire neurons to emulate the brain's inherent spiking mechanism. Drawing inspiration from the brain's structural plasticity, PRIME optimizes the topology of a random memristive spiking neural network without expensive memristor conductance fine-tuning. For runtime reconfigurability, inspired by the brain's dynamic adjustment of computational depth, PRIME employs an input-aware dynamic early stop policy to minimize latency during inference, thereby boosting energy efficiency without compromising performance. Architecture-wise, PRIME leverages memristive in-memory computing, mirroring the brain and mitigating the von Neumann bottleneck. We validated our system using a 40 nm 256 Kb memristor-based in-memory computing macro on neuromorphic image classification and image inpainting. Our results demonstrate that the classification accuracy and Inception Score are comparable to the software baseline, while achieving up to 62.50-fold improvement in energy efficiency and up to 77.0% savings in computational load. The system also exhibits robustness against stochastic synaptic noise of analogue memristors. Our software-hardware co-designed model paves the way to future brain-inspired neuromorphic computing with brain-like energy efficiency and adaptivity.
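A toy sketch can show how leaky integrate-and-fire neurons combine with an input-aware early stop. It assumes a simple stopping rule (halt once one output neuron leads the runner-up by a fixed spike margin) and constant input currents; the leak, threshold, margin, and the names `lif_step` and `classify_with_early_stop` are hypothetical, and the actual PRIME policy may differ.

```python
def lif_step(v, current, leak=0.9, threshold=1.0):
    """One leaky integrate-and-fire update: returns (new_potential, spike)."""
    v = leak * v + current
    if v >= threshold:
        return 0.0, 1  # fire and reset the membrane potential
    return v, 0

def classify_with_early_stop(currents, max_steps=100, margin=3):
    """Count output spikes per class (needs at least two classes) and stop
    as soon as the leading class is `margin` spikes ahead of the runner-up."""
    potentials = [0.0] * len(currents)
    counts = [0] * len(currents)
    for step in range(1, max_steps + 1):
        for i, current in enumerate(currents):
            potentials[i], spike = lif_step(potentials[i], current)
            counts[i] += spike
        ranked = sorted(counts, reverse=True)
        if ranked[0] - ranked[1] >= margin:
            break
    return counts.index(max(counts)), step

# An "easy" input drives one class much harder than the other; a "hard" one does not.
easy_label, easy_steps = classify_with_early_stop([0.6, 0.05])
hard_label, hard_steps = classify_with_early_stop([0.6, 0.5])
```

In this toy setting the easy input terminates after fewer time steps than the hard one, so less computation is spent on inputs that are classified quickly.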