Yaping Zhao, Haitian Zheng, Zhongrui Wang, Jiebo Luo, Edmund Y. Lam
In video denoising, the adjacent frames often provide very useful information, but accurate alignment is needed before such information can be harnessed. In this work, we present a multi-alignment network, which generates multiple flow proposals followed by attention-based averaging. It serves to mimic the non-local mechanism, suppressing noise by averaging multiple observations. Our approach can be applied to various state-of-the-art models that are based on flow estimation. Experiments on a large-scale video dataset demonstrate that our method improves the denoising baseline model by 0.2dB, and further reduces the parameters by 47% with model distillation. Code is available at https://github.com/IndigoPurple/MANet.
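To make "multiple flow proposals followed by attention-based averaging" concrete, here is a minimal sketch. It assumes K neighbouring-frame estimates have already been warped onto the reference frame, and it fuses them per pixel with softmax weights computed from a hand-crafted similarity (negative squared difference to the reference, temperature `tau`). The function name `fuse_proposals`, the similarity, and `tau` are hypothetical stand-ins for the learned attention and are not taken from the MANet code.

```python
import numpy as np

def fuse_proposals(reference, warped, tau=0.1):
    """Fuse K warped frames of shape (K, H, W) into one (H, W) estimate.

    Per-pixel weights are a softmax over -(warped_k - reference)^2 / tau,
    so proposals that agree with the reference frame receive more weight.
    """
    dist = (warped - reference[None]) ** 2           # (K, H, W) squared difference
    logits = -dist / tau
    logits -= logits.max(axis=0, keepdims=True)      # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=0, keepdims=True)    # weights sum to 1 over K per pixel
    return (weights * warped).sum(axis=0)            # attention-weighted average

rng = np.random.default_rng(0)
clean = rng.random((8, 8))
noisy_ref = clean + 0.05 * rng.standard_normal((8, 8))
proposals = clean[None] + 0.05 * rng.standard_normal((4, 8, 8))  # 4 aligned noisy observations
fused = fuse_proposals(noisy_ref, proposals)
```

Averaging several aligned observations is what suppresses noise here; the attention weights only decide how much each proposal contributes, which lets badly aligned proposals be down-weighted.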
Yaping Zhao, Haitian Zheng, Zhongrui Wang, Jiebo Luo, Edmund Y. Lam
To achieve point cloud denoising, traditional methods rely heavily on geometric priors, and most learning-based approaches suffer from outliers and loss of detail. Recently, a gradient-based method was proposed that uses neural networks to estimate gradient fields from noisy point clouds and refines the position of each point according to the estimated gradient. However, the predicted gradient can fluctuate, leading to perturbed and unstable solutions as well as long inference times. To address these issues, we develop the momentum gradient ascent method, which leverages the information of previous iterations in determining the trajectories of the points, thus improving the stability of the solution and reducing the inference time. Experiments demonstrate that the proposed method outperforms state-of-the-art approaches with a variety of point clouds, noise types, and noise levels. Code is available at: https://github.com/IndigoPurple/MAG
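The momentum idea can be illustrated with a heavy-ball update, v ← βv + η·g(x), x ← x + v, where g is the estimated gradient field. In this sketch, a toy analytic field that pulls 2-D points onto the unit circle stands in for the learned network; `gradient_field`, `momentum_ascent`, `step`, `beta`, and `iters` are hypothetical names and values, not the MAG implementation.

```python
import numpy as np

def gradient_field(points):
    """Toy stand-in for a learned gradient field: pulls 2-D points onto the unit circle."""
    radius = np.linalg.norm(points, axis=1, keepdims=True)
    return -(radius - 1.0) * points / np.maximum(radius, 1e-12)

def momentum_ascent(points, grad_fn, step=0.2, beta=0.5, iters=30):
    """Gradient ascent with heavy-ball momentum applied to every point's position."""
    x = points.copy()
    velocity = np.zeros_like(x)
    for _ in range(iters):
        # blend the previous direction of travel with the newly estimated gradient
        velocity = beta * velocity + step * grad_fn(x)
        x = x + velocity
    return x

rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, 200)
clean = np.stack([np.cos(theta), np.sin(theta)], axis=1)
noisy = clean + 0.1 * rng.standard_normal(clean.shape)
denoised = momentum_ascent(noisy, gradient_field)
```

Because the velocity accumulates past gradients, a single fluctuating gradient estimate perturbs the trajectory less than it would in plain gradient ascent.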
Jichang Yang, Hegan Chen, Jia Chen, Songqi Wang, Shaocong Wang, Yifei Yu, Xi Chen, Bo Wang, Xinyuan Zhang, Binbin Cui, Yi Li, Ning Lin, Meng Xu, Yi Li, Xiaoxin Xu, Xiaojuan Qi, Zhongrui Wang, Xumeng Zhang, Dashan Shang, Han Wang, Qi Liu, Kwang-Ting Cheng, Ming Liu
Human brains imagine complex scenes when reading a novel. Replicating this imagination is one of the ultimate goals of AI-Generated Content (AIGC). However, current AIGC methods, such as score-based diffusion, remain deficient in speed and efficiency. This deficiency is rooted in the difference between the brain and digital computers. Digital computers have physically separated storage and processing units, resulting in frequent data transfers during iterative calculations that incur large time and energy overheads. This issue is further intensified by converting the inherently continuous and analog generation dynamics, which can be formulated as neural differential equations, into discrete and digital operations. Inspired by the brain, we propose a time-continuous and analog in-memory neural differential equation solver for score-based diffusion, employing emerging resistive memory. The integration of storage and computation within resistive memory synapses surmounts the von Neumann bottleneck, benefiting generative speed and energy efficiency. The closed-loop feedback integrator is time-continuous, analog, and compact, physically implementing an infinite-depth neural network. Moreover, the software-hardware co-design is intrinsically robust to analog noise. We experimentally validate our solution with 180 nm resistive memory in-memory computing macros. While matching the generative quality of the software baseline, our system improved generative speed by factors of 64.8 and 156.5 for unconditional and conditional generation tasks, respectively, and reduced energy consumption by factors of 5.2 and 4.1. Our approach heralds a new horizon for hardware solutions in edge computing for generative AI applications.
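For intuition about what the analog integrator solves, the sketch below integrates the probability-flow ODE of score-based diffusion, dx/dt = f(x, t) − ½g(t)²·∇ₓlog pₜ(x), backward in time with many small Euler steps. It assumes a variance-preserving forward process with constant rate `BETA` and 1-D Gaussian "data" of standard deviation `SIGMA0`, so the score is known exactly; in the paper a trained network provides the score and the hardware integrates in continuous time, whereas this digital loop only approximates that limit. All constants and names are illustrative assumptions.

```python
import numpy as np

BETA = 10.0    # assumed constant noise rate of the forward process
SIGMA0 = 0.3   # assumed std of the toy 1-D Gaussian data

def marginal_var(t):
    """Variance of x_t when x_0 ~ N(0, SIGMA0^2) under the VP forward process."""
    return SIGMA0**2 * np.exp(-BETA * t) + 1.0 - np.exp(-BETA * t)

def score(x, t):
    # exact Gaussian score; a trained score network would take this role
    return -x / marginal_var(t)

def drift(x, t):
    # probability-flow ODE with f(x, t) = -0.5 * BETA * x and g(t)^2 = BETA
    return -0.5 * BETA * x - 0.5 * BETA * score(x, t)

def solve_backward(x_T, T=1.0, steps=2000):
    """Euler integration from t = T down to t = 0."""
    dt = T / steps
    x, t = x_T.copy(), T
    for _ in range(steps):
        x = x - dt * drift(x, t)
        t -= dt
    return x

rng = np.random.default_rng(0)
x_T = rng.standard_normal(10_000) * np.sqrt(marginal_var(1.0))
samples = solve_backward(x_T)   # std should approach SIGMA0
```

Each Euler step is one "layer" of the unrolled solver; letting the step size shrink toward zero is the sense in which a continuous-time integrator behaves like an infinite-depth network.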
Qiming Shao, Zhongrui Wang, Yan Zhou, Shunsuke Fukami, Damien Querlioz, Leon O. Chua
The ever-increasing amount of data from ubiquitous smart devices fosters data-centric and cognitive algorithms. Traditional digital computer systems have separate logic and memory units, resulting in large delay and energy costs when implementing these algorithms. Memristors are programmable resistors with memory, providing a paradigm-shifting approach to creating intelligent hardware systems that handle data-centric tasks. Spintronic nanodevices are promising choices as they are high-speed, low-power, highly scalable, robust, and capable of constructing dynamic complex systems. In this Review, we survey spintronic devices from a memristor point of view. We introduce spintronic memristors based on magnetic tunnel junctions, nanomagnet ensembles, domain walls, topological spin textures, and spin waves, which represent dramatically different state spaces. They can exhibit steady, oscillatory, stochastic, and chaotic trajectories in their state spaces, which have been exploited for in-memory logic, neuromorphic computing, stochastic computing, and chaos computing. Finally, we discuss challenges and trends in realizing large-scale spintronic memristive systems for practical applications.
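As one example of how stochastic device trajectories can be used, unipolar stochastic computing encodes a probability p as a random bitstream whose bits equal 1 with probability p; multiplying two values then needs only an AND gate per bit pair. The sketch below models the random switching of a stochastic device with Python's pseudo-random generator (an assumption); `bitstream` and `sc_multiply` are hypothetical names.

```python
import random

def bitstream(p, n, rng):
    """Bernoulli bitstream with P(bit = 1) = p, e.g. from a stochastically switching device."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

def sc_multiply(p_a, p_b, n=100_000, seed=0):
    """Unipolar stochastic multiplication: AND of two independent streams encodes p_a * p_b."""
    rng = random.Random(seed)
    a = bitstream(p_a, n, rng)
    b = bitstream(p_b, n, rng)
    ones = sum(x & y for x, y in zip(a, b))   # one AND operation per bit pair
    return ones / n
```

The precision grows only with the square root of the stream length, which is the usual trade-off of stochastic computing: very simple logic in exchange for long bitstreams.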
Chuanqi Chen, Zhongrui Wang, Nan Chen, Jin-Long Wu
A discrete-time conditional Gaussian Koopman network (CGKN) is developed in this work to learn surrogate models that can perform efficient state forecast and data assimilation (DA) for high-dimensional complex dynamical systems, e.g., systems governed by nonlinear partial differential equations (PDEs). Focusing on nonlinear partially observed systems that are common in many engineering and earth science applications, this work exploits Koopman embedding to discover a proper latent representation of the unobserved system states, such that the dynamics of the latent states are conditionally linear, i.e., linear given the observed system states. The modeled system of the observed and latent states then becomes a conditional Gaussian system, for which the posterior distribution of the latent states is Gaussian and can be efficiently evaluated via analytical formulae. These analytical DA formulae facilitate incorporating DA performance into the learning process of the modeled system, which leads to a framework that unifies scientific machine learning (SciML) and data assimilation. The performance of discrete-time CGKN is demonstrated on several canonical problems governed by nonlinear PDEs with intermittency and turbulent features, including the viscous Burgers' equation, the Kuramoto-Sivashinsky equation, and the 2-D Navier-Stokes equations. On these problems, the discrete-time CGKN framework achieves state-forecast performance comparable to state-of-the-art SciML methods and provides efficient and accurate DA results. The discrete-time CGKN framework also illustrates how the development of SciML models can be unified with other outer-loop applications such as design optimization, inverse problems, and optimal control.
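The "analytical formulae" follow from the conditional Gaussian structure: once the observed state is known, both the next observation and the next latent state are linear in the current latent state, so the latent posterior stays Gaussian and is updated with Kalman-type formulas. The sketch assumes the illustrative model x_next = f + G·u + noise_x and u_next = a + A·u + noise_u with independent Gaussian noises, where f, G, a, A have already been evaluated at the current observation (in CGKN these coefficients would come from learned networks). The function name and argument layout are hypothetical.

```python
import numpy as np

def cg_filter_step(mu, R, x_next, f, G, a, A, Qx, Qu):
    """Posterior of u_next given x_next, starting from u ~ N(mu, R).

    Assumed model (coefficients already evaluated at the current observation):
        x_next = f + G @ u + noise_x,  noise_x ~ N(0, Qx)
        u_next = a + A @ u + noise_u,  noise_u ~ N(0, Qu), independent of noise_x
    """
    m_x = f + G @ mu                     # predicted observation mean
    m_u = a + A @ mu                     # predicted latent mean
    P_xx = G @ R @ G.T + Qx              # observation covariance
    P_uu = A @ R @ A.T + Qu              # latent covariance
    P_ux = A @ R @ G.T                   # latent-observation cross-covariance
    K = np.linalg.solve(P_xx, P_ux.T).T  # gain K = P_ux @ inv(P_xx), P_xx symmetric
    mu_new = m_u + K @ (x_next - m_x)
    R_new = P_uu - K @ P_ux.T
    return mu_new, R_new
```

Because every step is closed-form and differentiable, a DA loss built from `mu_new` and `R_new` can be back-propagated into the networks that produce the coefficients, which is the kind of coupling between learning and DA the abstract describes.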
Shaocong Wang, Yi Li, Dingchen Wang, Woyu Zhang, Xi Chen, Danian Dong, Songqi Wang, Xumeng Zhang, Peng Lin, Claudio Gallicchio, Xiaoxin Xu, Qi Liu, Kwang-Ting Cheng, Zhongrui Wang, Dashan Shang, Ming Liu
Recent years have witnessed an unprecedented surge of interest in learning representations of graph-structured data, with applications ranging from social networks to drug discovery. However, graph neural networks, the machine learning models for handling graph-structured data, face significant challenges when running on conventional digital hardware, including the von Neumann bottleneck incurred by physically separated memory and processing units, the slowdown of Moore's law due to transistor scaling limits, and expensive training costs. Here we present a novel hardware-software co-design, the random resistor array-based echo state graph neural network, which addresses these challenges. The random resistor arrays not only harness low-cost, nanoscale and stackable resistors for highly efficient in-memory computing using simple physical laws, but also leverage the intrinsic stochasticity of dielectric breakdown to implement random projections in hardware for an echo state network that effectively minimizes the training cost thanks to its fixed and random weights. The system demonstrates state-of-the-art performance on both graph classification using the MUTAG and COLLAB datasets and node classification using the CORA dataset, achieving 34.2x, 93.2x, and 570.4x improvements in energy efficiency and 98.27%, 99.46%, and 95.12% reductions in training cost, respectively, compared to conventional graph learning on digital hardware, which may pave the way for next-generation AI systems for graph learning.
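The training-cost argument rests on the echo-state idea: the input and recurrent weights are random and never trained, node states are iterated through neighbour aggregation, and only a linear readout is fitted in closed form. The sketch below draws the random weights with a pseudo-random generator (the hardware would use breakdown-induced random conductances) and uses one common update rule and spectral-radius scaling; the function names, hyperparameters, and exact update are assumptions, not the paper's model.

```python
import numpy as np

def esgnn_embed(adj, feats, n_res=64, rho=0.9, iters=30, seed=0):
    """Node embeddings from a graph reservoir whose weights are fixed and random."""
    rng = np.random.default_rng(seed)
    W_in = rng.uniform(-1.0, 1.0, (feats.shape[1], n_res))   # random input projection
    W_res = rng.uniform(-1.0, 1.0, (n_res, n_res))           # random recurrent weights
    W_res *= rho / np.max(np.abs(np.linalg.eigvals(W_res)))   # common echo-state scaling
    deg = max(float(adj.sum(axis=1).max()), 1.0)              # normalize neighbour sums
    h = np.zeros((adj.shape[0], n_res))
    for _ in range(iters):
        # each node mixes its own input with the aggregated states of its neighbours
        h = np.tanh(feats @ W_in + (adj @ h / deg) @ W_res)
    return h

def ridge_readout(H, Y, lam=1e-2):
    """Closed-form ridge regression: the only trained weights in the model."""
    return np.linalg.solve(H.T @ H + lam * np.eye(H.shape[1]), H.T @ Y)

# Toy usage: a 6-node ring graph with one-hot node features.
adj = np.zeros((6, 6))
for i in range(6):
    adj[i, (i + 1) % 6] = adj[(i + 1) % 6, i] = 1.0
H = esgnn_embed(adj, np.eye(6))
W_out = ridge_readout(H, np.eye(2)[[0, 1, 0, 1, 0, 1]])   # node classification readout
graph_embedding = H.mean(axis=0)                            # pooled vector for graph-level tasks
```

The reservoir iterations are matrix-vector products, which is the part an in-memory resistor array would accelerate; the single ridge solve replaces back-propagation through the whole network.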
Hegan Chen, Jichang Yang, Jia Chen, Songqi Wang, Shaocong Wang, Dingchen Wang, Xinyu Tian, Yifei Yu, Xi Chen, Yinan Lin, Yangu He, Xiaoshan Wu, Yi Li, Xinyuan Zhang, Ning Lin, Meng Xu, Yi Li, Xumeng Zhang, Zhongrui Wang, Han Wang, Dashan Shang, Qi Liu, Kwang-Ting Cheng, Ming Liu
Digital twins, the cornerstone of Industry 4.0, replicate real-world entities through computer models, revolutionising fields such as manufacturing management and industrial automation. Recent advances in machine learning provide data-driven methods for developing digital twins using discrete-time data and finite-depth models on digital computers. However, this approach fails to capture the underlying continuous dynamics and struggles with modelling complex system behaviour. Additionally, the architecture of digital computers, with separate storage and processing units, necessitates frequent data transfers and Analogue-Digital (A/D) conversion, thereby significantly increasing both time and energy costs. Here, we introduce a memristive neural ordinary differential equation (ODE) solver for digital twins, which is capable of capturing continuous-time dynamics and facilitates the modelling of complex systems using an infinite-depth model. By integrating storage and computation within analogue memristor arrays, we circumvent the von Neumann bottleneck, thus enhancing both speed and energy efficiency. We experimentally validate our approach by developing a digital twin of the HP memristor, which accurately extrapolates its nonlinear dynamics, achieving a 4.2-fold projected speedup and a 41.4-fold projected decrease in energy consumption compared to state-of-the-art digital hardware, while maintaining an acceptable error margin. Additionally, we demonstrate scalability through experimentally grounded simulations of Lorenz96 dynamics, exhibiting projected performance improvements of 12.6-fold in speed and 189.7-fold in energy efficiency relative to traditional digital approaches. By harnessing the capabilities of fully analogue computing, our breakthrough accelerates the development of digital twins, offering an efficient and rapid solution to meet the demands of Industry 4.0.
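The Lorenz96 benchmark mentioned above has the standard right-hand side dx_i/dt = (x_{i+1} − x_{i−2})·x_{i−1} − x_i + F with cyclic indices. A neural-ODE digital twin would replace this hand-written function with a learned network f_θ(x) and integrate it with the same kind of solver loop. The sketch below shows the reference dynamics with a digital RK4 integrator; the forcing F = 8, the step size, and the function names are illustrative choices.

```python
import numpy as np

def lorenz96(x, forcing=8.0):
    """dx_i/dt = (x_{i+1} - x_{i-2}) * x_{i-1} - x_i + F, indices taken cyclically."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + forcing

def rk4_step(f, x, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(x)
    k2 = f(x + 0.5 * dt * k1)
    k3 = f(x + 0.5 * dt * k2)
    k4 = f(x + dt * k3)
    return x + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def simulate(x0, dt=0.01, steps=500, f=lorenz96):
    """Trajectory of shape (steps + 1, len(x0)); pass a learned model as f for a twin."""
    traj = [x0]
    for _ in range(steps):
        traj.append(rk4_step(f, traj[-1], dt))
    return np.array(traj)

x0 = np.full(40, 8.0)
x0[0] += 0.01          # small perturbation of the unstable equilibrium x_i = F
trajectory = simulate(x0)
```

The uniform state x_i = F is an equilibrium, and a small perturbation of it grows into chaotic behaviour, which is why Lorenz96 is a common stress test for learned dynamics.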
Zhongrui Wang, Chuanqi Chen, Jin-Long Wu, Nan Chen
Lagrangian data assimilation aims to recover hidden Eulerian flow fields from sparse, indirect observations of moving tracers. This problem is challenging because tracer trajectories are nonlinearly coupled with the underlying flow, making posterior inference computationally intractable in realistic, high-dimensional systems. In this work, we develop a Lagrangian conditional Gaussian Koopman network (LaCGKN), a structure-preserving, data-driven framework for joint data assimilation and prediction from Lagrangian observations. LaCGKN embeds Eulerian flow dynamics into a low-dimensional latent space governed by a nonlinear stochastic system with conditional Gaussian structure, enabling analytic posterior updates without ensemble forecasting. Unlike existing conditional Gaussian Koopman formulations that assume direct Eulerian observations, the Lagrangian setting imposes additional demands on the latent representation, which must simultaneously encode the flow dynamics and mediate nonlinear tracer-flow interactions. To address these challenges, LaCGKN incorporates three key components: (i) tracer homogenization to enforce permutation equivariance and generalize across varying numbers of tracers; (ii) Fourier positional encoding to capture spatial dependence and reconstruct local flow features at moving tracer locations; and (iii) an SVD-inspired low-rank parameterization of the latent transition operator, which reduces model complexity while retaining expressiveness. An application to a two-layer quasi-geostrophic flow with surface tracer observations shows that LaCGKN achieves accurate and efficient Lagrangian data assimilation and prediction, without reliance on ensemble methods or the governing physical model. These results establish LaCGKN as a unified and computationally tractable alternative to both traditional model-based approaches and purely black-box data-driven methods.
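A Fourier positional encoding of the kind listed in component (ii) can be sketched as sine and cosine features of integer wavenumbers applied to tracer coordinates. The sketch assumes a doubly periodic domain [0, 2π)², which is common for quasi-geostrophic test cases but is an assumption here; the function name and `max_wavenumber` are hypothetical, and LaCGKN's actual encoding may differ.

```python
import numpy as np

def fourier_encode(positions, max_wavenumber=3):
    """Encode 2-D tracer positions on an assumed 2*pi-periodic domain.

    positions: (N, 2) array. Returns (N, 4 * max_wavenumber) features made of
    sin(k * x), cos(k * x) for k = 1..max_wavenumber and each coordinate.
    """
    k = np.arange(1, max_wavenumber + 1)                     # integer k keeps periodicity
    phase = positions[:, :, None] * k[None, None, :]         # (N, 2, K)
    feats = np.concatenate([np.sin(phase), np.cos(phase)], axis=-1)  # (N, 2, 2K)
    return feats.reshape(positions.shape[0], -1)

rng = np.random.default_rng(0)
tracers = rng.uniform(0.0, 2.0 * np.pi, (5, 2))
features = fourier_encode(tracers)
```

Integer wavenumbers make the features identical for a tracer and its periodic image, and higher wavenumbers let a downstream network resolve finer local flow structure at each tracer location. Averaging the per-tracer features over tracers is one simple way to obtain a summary that does not depend on tracer order or count.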
Yi Li, Songqi Wang, Yaping Zhao, Shaocong Wang, Woyu Zhang, Yangu He, Ning Lin, Binbin Cui, Xi Chen, Shiming Zhang, Hao Jiang, Peng Lin, Xumeng Zhang, Xiaojuan Qi, Zhongrui Wang, Xiaoxin Xu, Dashan Shang, Qi Liu, Kwang-Ting Cheng, Ming Liu
The rapid advancement of artificial intelligence (AI) has been marked by large language models exhibiting human-like intelligence. However, these models also present unprecedented challenges to energy consumption and environmental sustainability. One promising solution is to revisit analogue computing, a technique that predates digital computing and exploits emerging analogue electronic devices, such as resistive memory, which features in-memory computing, high scalability, and nonvolatility. However, analogue computing still faces the same challenges as before: programming nonidealities and expensive programming due to the underlying device physics. Here, we report a universal solution, software-hardware co-design using structural plasticity-inspired edge pruning to optimize the topology of a randomly weighted analogue resistive memory neural network. Software-wise, the topology of a randomly weighted neural network is optimized by pruning connections rather than precisely tuning resistive memory weights. Hardware-wise, we reveal the physical origin of the programming stochasticity using transmission electron microscopy, which is leveraged for large-scale and low-cost implementation of an overparameterized random neural network containing high-performance sub-networks. We implemented the co-design on a 40nm 256K resistive memory macro, observing 17.3% and 19.9% accuracy improvements in image and audio classification on the FashionMNIST and Spoken digits datasets, respectively, as well as a 9.8% (2%) improvement in PR (ROC) for image segmentation on the DRIVE dataset. This is accompanied by 82.1%, 51.2%, and 99.8% improvements in energy efficiency thanks to analogue in-memory computing. By embracing the intrinsic stochasticity and in-memory computing, this work may solve the biggest obstacle of analogue computing systems and thus unleash their immense potential for next-generation AI hardware.
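Pruning edges instead of tuning weights can be sketched with a per-edge score and a top-k mask: the random weights stay exactly as programmed, and only which connections are active changes. The sketch follows the general "edge-popup" supermask idea (a swapped-in, named technique, not necessarily the paper's training procedure), with random weights drawn by a pseudo-random generator instead of device stochasticity; in edge-popup the scores are trained with a straight-through gradient estimator, which is omitted here. `W`, `scores`, `topk_mask`, and `forward` are hypothetical names.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 4))        # fixed random weights, never updated
scores = rng.standard_normal(W.shape)   # trainable per-edge scores (hypothetical init)

def topk_mask(scores, keep=0.5):
    """Binary mask keeping the fraction `keep` of edges with the largest |score|."""
    k = int(round(keep * scores.size))
    threshold = np.sort(np.abs(scores), axis=None)[-k]
    return (np.abs(scores) >= threshold).astype(float)

def forward(x, scores, keep=0.5):
    # only the topology changes; the underlying random weights are used as-is
    return x @ (W * topk_mask(scores, keep))
```

Since the mask only switches connections on or off, a hardware implementation never needs to program a conductance to a precise target value, which sidesteps the programming nonidealities mentioned above.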
Yifei Yu, Shaocong Wang, Woyu Zhang, Xinyuan Zhang, Xiuzhe Wu, Yangu He, Jichang Yang, Yue Zhang, Ning Lin, Bo Wang, Xi Chen, Songqi Wang, Xumeng Zhang, Xiaojuan Qi, Zhongrui Wang, Dashan Shang, Qi Liu, Kwang-Ting Cheng, Ming Liu
Human beings construct perception of space by integrating sparse observations into massively interconnected synapses and neurons, offering superior parallelism and efficiency. Replicating this capability in AI has wide applications in medical imaging, AR/VR, and embodied AI, where input data is often sparse and computing resources are limited. However, traditional signal reconstruction methods on digital computers face both software and hardware challenges. On the software front, difficulties arise from storage inefficiencies in conventional explicit signal representation. Hardware obstacles include the von Neumann bottleneck, which limits data transfer between the CPU and memory, and the limitations of CMOS circuits in supporting parallel processing. We propose a systematic approach with software-hardware co-optimizations for signal reconstruction from sparse inputs. Software-wise, we employ neural fields to implicitly represent signals via neural networks, which are further compressed using low-rank decomposition and structured pruning. Hardware-wise, we design a resistive memory-based computing-in-memory (CIM) platform, featuring a Gaussian Encoder (GE) and an MLP Processing Engine (PE). The GE harnesses the intrinsic stochasticity of resistive memory for efficient input encoding, while the PE achieves precise weight mapping through a Hardware-Aware Quantization (HAQ) circuit. We demonstrate the system's efficacy on a 40nm 256Kb resistive memory-based in-memory computing macro, achieving large improvements in energy efficiency and parallelism without compromising reconstruction quality in tasks such as 3D CT sparse reconstruction, novel view synthesis, and novel view synthesis for dynamic scenes. This work advances AI-driven signal restoration technology and paves the way for future efficient and robust medical AI and 3D vision applications.
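A widely used Gaussian encoding for neural fields is the random Fourier feature map γ(v) = [cos(2πBv), sin(2πBv)], where the entries of B are drawn from N(0, σ²). The sketch assumes this mapping as a plausible reading of the Gaussian Encoder; in the described hardware the random matrix would come from the stochastic conductances of resistive memory, whereas here it is drawn with a pseudo-random generator. The function name, `n_features`, and `sigma` are hypothetical.

```python
import numpy as np

def gaussian_fourier_features(coords, n_features=64, sigma=10.0, seed=0):
    """Map coordinates (N, d) to [cos(2*pi*v@B), sin(2*pi*v@B)] with Gaussian B of shape (d, n_features)."""
    rng = np.random.default_rng(seed)
    B = rng.normal(0.0, sigma, size=(coords.shape[1], n_features))  # fixed random projection
    proj = 2.0 * np.pi * coords @ B
    return np.concatenate([np.cos(proj), np.sin(proj)], axis=1)

points = np.random.default_rng(1).random((10, 3))   # e.g. normalized 3-D sample locations
encoded = gaussian_fourier_features(points)          # shape (10, 128), fed to the MLP
```

The projection matrix is never trained, so it tolerates the imprecision of stochastically programmed devices, while `sigma` controls how much high-frequency detail the downstream MLP can represent.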
Ning Lin, Shaocong Wang, Yi Li, Bo Wang, Shuhui Shi, Yangu He, Woyu Zhang, Yifei Yu, Yue Zhang, Xinyuan Zhang, Kwunhang Wong, Songqi Wang, Xiaoming Chen, Hao Jiang, Xumeng Zhang, Peng Lin, Xiaoxin Xu, Xiaojuan Qi, Zhongrui Wang, Dashan Shang, Qi Liu, Ming Liu
The human brain is a complex spiking neural network (SNN), capable of learning multimodal signals in a zero-shot manner by generalizing existing knowledge. Remarkably, it maintains minimal power consumption through event-based signal propagation. However, replicating the human brain in neuromorphic hardware presents both hardware and software challenges. Hardware limitations, such as the slowdown of Moore's law and the von Neumann bottleneck, hinder the efficiency of digital computers. Additionally, SNNs are difficult to train in software. To this end, we propose a hardware-software co-design on a 40 nm 256 Kb in-memory computing macro that physically integrates a fixed and random liquid state machine (LSM) SNN encoder with trainable artificial neural network (ANN) projections. We showcase the zero-shot LSM-based learning of multimodal events on the N-MNIST and N-TIDIGITS datasets, including visual and audio data association, as well as neural and visual data alignment for brain-machine interfaces. Our co-design achieves classification accuracy comparable to fully optimized software models, while reducing training costs by 152.83- and 393.07-fold compared to SOTA contrastive language-image pre-training (CLIP) and Prototypical networks, respectively, and improving energy efficiency by 23.34- and 160-fold compared to cutting-edge digital hardware. These proof-of-principle prototypes demonstrate zero-shot multimodal event learning for emerging efficient and compact neuromorphic hardware.
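A liquid state machine encoder can be sketched as a reservoir of leaky integrate-and-fire neurons with fixed random input and recurrent weights; only the downstream ANN projection would be trained. The sketch uses pseudo-random weights, a simple reset-to-zero rule, and spike counts as the readout features, all of which are assumptions rather than the paper's circuit; `lsm_encode` and its parameters are hypothetical.

```python
import numpy as np

def lsm_encode(spikes_in, n_neurons=100, tau=20.0, v_th=1.0, seed=0):
    """Fixed random LIF reservoir.

    spikes_in: (T, n_inputs) binary event array. Returns per-neuron spike counts,
    which a trainable ANN projection could map into a shared embedding space.
    """
    rng = np.random.default_rng(seed)
    W_in = rng.normal(0.0, 0.5, (spikes_in.shape[1], n_neurons))   # fixed, never trained
    W_rec = rng.normal(0.0, 0.1, (n_neurons, n_neurons))           # fixed recurrent weights
    decay = np.exp(-1.0 / tau)                                     # membrane leak per step
    v = np.zeros(n_neurons)
    s = np.zeros(n_neurons)
    counts = np.zeros(n_neurons)
    for x_t in spikes_in:
        v = decay * v + x_t @ W_in + s @ W_rec   # leaky integration of input and recurrent spikes
        s = (v >= v_th).astype(float)            # fire when the threshold is reached
        v = np.where(s > 0, 0.0, v)              # reset neurons that fired
        counts += s
    return counts

events = (np.random.default_rng(1).random((50, 20)) < 0.2).astype(float)
features = lsm_encode(events)
```

Work is only done where events arrive, and the reservoir itself needs no training, which is what makes this split between a fixed spiking encoder and small trainable projections cheap to train.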
Zhongrui Wang, Nan Chen, Di Qi
State estimation in multi-layer turbulent flow fields with only a single layer of partial observation remains a challenging yet practically important task. Applications include inferring the state of the deep ocean by exploiting surface observations. Directly implementing an ensemble Kalman filter based on the full forecast model is usually expensive. One widely used method in practice projects the information of the observed layer to other layers via linear regression. However, when nonlinearity in the highly turbulent flow field becomes dominant, the regression solution will suffer from large uncertainty errors. In this paper, we develop a multi-step nonlinear data assimilation method. A sequence of nonlinear assimilation steps is applied from layer to layer recurrently. Fundamentally different from the traditional linear regression approaches, a conditional Gaussian nonlinear system is adopted as the approximate forecast model to characterize the nonlinear dependence between adjacent layers. The estimated posterior is a Gaussian mixture, which can be highly non-Gaussian. Therefore, the multi-step nonlinear data assimilation method can capture strongly turbulent features, especially intermittency and extreme events, and better quantify the inherent uncertainty. Another notable advantage of the multi-step data assimilation method is that the posterior distribution can be solved using closed-form formulae under the conditional Gaussian framework. Applications to the two-layer quasi-geostrophic system with Lagrangian data assimilation show that the multi-step method outperforms the one-step method with linear stochastic flow models, especially as the number of tracers and the ensemble size increase.
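Since the estimated posterior is a Gaussian mixture, its summary statistics follow from the law of total variance: for Σ w_k N(μ_k, s_k²), the mean is Σ w_k μ_k and the variance is Σ w_k (s_k² + μ_k²) minus the squared mean. The short sketch below (a 1-D illustration with a hypothetical function name) also shows why a mixture "can be highly non-Gaussian": two well-separated components yield a mean that sits where the density is low.

```python
import numpy as np

def mixture_moments(weights, means, variances):
    """Mean and variance of the 1-D Gaussian mixture sum_k w_k N(mu_k, s_k^2)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                                   # normalize mixture weights
    mu = np.asarray(means, dtype=float)
    var = np.asarray(variances, dtype=float)
    mean = np.sum(w * mu)
    variance = np.sum(w * (var + mu**2)) - mean**2    # law of total variance
    return mean, variance

# Bimodal example: components at -1 and +1 with small spread.
m, v = mixture_moments([0.5, 0.5], [-1.0, 1.0], [0.1, 0.1])
```

Here the mixture mean is 0 with variance 1.1, yet almost no probability mass lies near 0, so a single Gaussian with these moments would misrepresent the state; keeping the full mixture is what lets the method represent intermittent or extreme states.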
Chen-Yu Wang, Shi-Jun Liang, Shuang Wang, Pengfei Wang, Zhuan Li, Zhongrui Wang, Anyuan Gao, Chen Pan, Chuan Liu, Jian Liu, Huafeng Yang, Xiaowei Liu, Wenhao Song, Cong Wang, Xiaomu Wang, Kunji Chen, Zhenlin Wang, Kenji Watanabe, Takashi Taniguchi, J. Joshua Yang, Feng Miao
Early processing of visual information takes place in the human retina. Mimicking the neurobiological structures and functionalities of the retina provides a promising pathway to vision sensors with highly efficient image processing. Here, we demonstrate a prototype vision sensor that operates via the gate-tunable positive and negative photoresponses of van der Waals (vdW) vertical heterostructures. The sensor emulates not only the neurobiological functionalities of bipolar cells and photoreceptors but also the unique synaptic connectivity between bipolar cells and photoreceptors. By tuning the gate voltage of each pixel, we achieve a reconfigurable vision sensor for simultaneous image sensing and processing. Furthermore, our prototype vision sensor itself can be trained to classify input images by updating the gate voltages applied individually to each pixel in the sensor. Our work indicates that vdW vertical heterostructures offer a promising platform for the development of neural network vision sensors.
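A sensor whose pixels have gate-tunable positive or negative responsivity computes a weighted sum of light intensities directly in its summed photocurrent, which is the same operation as one layer of a linear classifier. The sketch treats each pixel's responsivity as a directly settable weight clipped to a finite range (assuming an idealized mapping from gate voltage to responsivity) and trains it with softmax cross-entropy gradients; this is a generic illustration, and the paper's training procedure may differ. `sensor_output`, `train_responsivity`, and `r_max` are hypothetical.

```python
import numpy as np

def sensor_output(images, responsivity):
    """Summed photocurrent per output line: sum_i R_i * P_i over pixels (signed R_i)."""
    return images.reshape(images.shape[0], -1) @ responsivity

def train_responsivity(images, labels, n_classes, lr=0.1, epochs=50, r_max=1.0, seed=0):
    """Fit per-pixel responsivities (one column per class) with softmax cross-entropy."""
    rng = np.random.default_rng(seed)
    X = images.reshape(images.shape[0], -1)
    R = rng.normal(0.0, 0.01, (X.shape[1], n_classes))
    Y = np.eye(n_classes)[labels]
    for _ in range(epochs):
        logits = X @ R
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        R -= lr * X.T @ (p - Y) / X.shape[0]   # cross-entropy gradient step
        R = np.clip(R, -r_max, r_max)           # assumed finite tunable responsivity range
    return R

# Toy 3x3 images: class 0 lights the left column, class 1 the right column.
imgs = np.zeros((4, 3, 3))
imgs[0, :, 0], imgs[1, :, 0] = 1.0, 0.8
imgs[2, :, 2], imgs[3, :, 2] = 1.0, 0.8
labels = np.array([0, 0, 1, 1])
R = train_responsivity(imgs, labels, n_classes=2)
```

Because the multiply-and-sum happens in the photodetectors themselves, no image data has to be read out and moved to a separate processor before classification.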
Wei Wang, Loai Danial, Yang Li, Eric Herbelin, Evgeny Pikhay, Yakov Roizin, Barak Hoffer, Zhongrui Wang, Shahar Kvatinsky
Memristor-based neuromorphic computing could overcome the limitations of traditional von Neumann computing architectures -- in which data are shuffled between separate memory and processing units -- and improve the performance of deep neural networks. However, this will require accurate synaptic-like device performance, and memristors typically suffer from poor yield and a limited number of reliable conductance states. Here we report floating gate memristive synaptic devices that are fabricated in a commercial complementary metal-oxide-semiconductor (CMOS) process. These silicon synapses offer analogue tunability, high endurance, long retention times, predictable cycling degradation, moderate device-to-device variations, and high yield. They also provide two orders of magnitude higher energy efficiency for multiply-accumulate operations than graphics processing units. We use two 12-by-8 arrays of the memristive devices for in-situ training of a 19-by-8 memristive restricted Boltzmann machine for pattern recognition via a gradient descent algorithm based on contrastive divergence. We then create a memristive deep belief neural network consisting of three memristive restricted Boltzmann machines. We test this on the modified National Institute of Standards and Technology (MNIST) dataset, demonstrating recognition accuracy up to 97.05%.
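The in-situ training above uses contrastive divergence. A standard CD-1 step for a Bernoulli RBM samples the hidden units from the data, reconstructs the visible layer, and moves the weights toward the data correlations and away from the reconstruction correlations. The sketch uses floating-point arrays in place of memristor conductances and a 19-by-8 shape only to mirror the abstract; the toy data, learning rate, and names are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_update(W, b_vis, b_hid, v0, lr=0.1, rng=None):
    """One CD-1 step for a Bernoulli RBM; W, b_vis, b_hid are updated in place."""
    rng = np.random.default_rng() if rng is None else rng
    p_h0 = sigmoid(v0 @ W + b_hid)                          # positive phase
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)      # sample hidden units
    p_v1 = sigmoid(h0 @ W.T + b_vis)                        # reconstruction
    p_h1 = sigmoid(p_v1 @ W + b_hid)                        # negative phase
    n = v0.shape[0]
    W += lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / n
    b_vis += lr * (v0 - p_v1).mean(axis=0)
    b_hid += lr * (p_h0 - p_h1).mean(axis=0)

rng = np.random.default_rng(0)
patterns = np.array([[1.0] * 10 + [0.0] * 9, [0.0] * 10 + [1.0] * 9])
data = patterns[rng.integers(0, 2, 64)]                     # 64 binary training vectors
W = rng.normal(0.0, 0.01, (19, 8))                          # 19 visible x 8 hidden units
b_vis, b_hid = np.zeros(19), np.zeros(8)

def recon_error():
    p_v = sigmoid(sigmoid(data @ W + b_hid) @ W.T + b_vis)
    return float(np.mean((data - p_v) ** 2))

before = recon_error()
for _ in range(500):
    cd1_update(W, b_vis, b_hid, data, lr=0.1, rng=rng)
after = recon_error()
```

The update only needs products that are available locally at each weight, which is why CD-style learning maps naturally onto in-situ updates of a device array. Stacking several trained RBMs layer by layer gives a deep belief network.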
Qunsong Zeng, Jiawei Liu, Jun Lan, Yi Gong, Zhongrui Wang, Yida Li, Kaibin Huang
To support emerging applications ranging from holographic communications to extended reality, next-generation mobile wireless communication systems require ultra-fast and energy-efficient (UFEE) baseband processors. Traditional complementary metal-oxide-semiconductor (CMOS)-based baseband processors face two challenges: the limits of transistor scaling and the von Neumann bottleneck. To address these challenges, in-memory computing-based baseband processors using resistive random-access memory (RRAM) present an attractive solution. In this paper, we propose and demonstrate RRAM-based in-memory baseband processing for the widely adopted multiple-input-multiple-output orthogonal frequency division multiplexing (MIMO-OFDM) air interface. Its key feature is executing the core operations, including discrete Fourier transform (DFT) and MIMO detection using linear minimum mean square error (L-MMSE) and zero forcing (ZF), in one step. In addition, RRAM-based channel estimation as well as mapper/demapper modules are proposed. Through prototyping and simulation, we demonstrate that the RRAM-based full-fledged communication system can significantly outperform its CMOS-based counterpart in speed and energy efficiency, by factors of $10^3$ and $10^6$, respectively. The results pave a potential pathway for RRAM-based in-memory computing to be implemented in the era of sixth-generation (6G) mobile communications.
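Each of the core operations is a linear map, which is why it can be computed as a single matrix-vector product once the matrix is programmed into a crossbar. The sketch computes the same mathematics digitally: the DFT as multiplication by F[k, m] = exp(−2πi·km/n), ZF detection x̂ = (HᴴH)⁻¹Hᴴy, and L-MMSE detection x̂ = (HᴴH + σ²I)⁻¹Hᴴy for unit-average-power symbols. How complex matrices are split into real-valued conductances on RRAM is not shown, and the function names are hypothetical.

```python
import numpy as np

def dft_matrix(n):
    """F[k, m] = exp(-2j*pi*k*m/n); the DFT is then one matrix-vector product."""
    k = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(k, k) / n)

def zf_detect(H, y):
    """Zero forcing: x_hat = (H^H H)^{-1} H^H y."""
    Hh = H.conj().T
    return np.linalg.solve(Hh @ H, Hh @ y)

def lmmse_detect(H, y, noise_var):
    """Linear MMSE for unit-average-power symbols: x_hat = (H^H H + noise_var*I)^{-1} H^H y."""
    Hh = H.conj().T
    return np.linalg.solve(Hh @ H + noise_var * np.eye(H.shape[1]), Hh @ y)

rng = np.random.default_rng(0)
v = rng.standard_normal(8) + 1j * rng.standard_normal(8)
H = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))   # 4x4 MIMO channel
x = np.array([1 + 1j, -1 + 1j, 1 - 1j, -1 - 1j])                    # QPSK-like symbols
y = H @ x                                                            # noiseless reception
```

In a deployed receiver the detection matrices (HᴴH)⁻¹Hᴴ or its regularized MMSE counterpart would be computed once per channel estimate and then applied to every received vector, so programming them into memory amortizes well.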
Kam Chi Loong, Shihao Han, Sishuo Liu, Ning Lin, Zhongrui Wang
Computing-in-memory (CIM) is an emerging computing paradigm, offering noteworthy potential for accelerating neural networks with high parallelism, low latency, and energy efficiency compared to conventional von Neumann architectures. However, existing research has primarily focused on hardware architecture and network co-design for large-scale neural networks, without considering resource constraints. In this study, we aim to develop edge-friendly deep neural networks (DNNs) for accelerators based on resistive random-access memory (RRAM). To achieve this, we propose an edge compilation and resource-constrained RRAM-aware neural architecture search (NAS) framework to search for optimized neural networks meeting specific hardware constraints. Our compilation approach integrates layer partitioning, duplication, and network packing to maximize the utilization of computation units. The resulting network architecture can be optimized for either high accuracy or low latency using a one-shot neural network approach, with Pareto optimality achieved through the Non-dominated Sorting Genetic Algorithm II (NSGA-II). The compilation of mobile-friendly networks, such as SqueezeNet and MobileNetV3-Small, can achieve over 80% utilization and over 6x speedup compared to an ISAAC-like framework with different crossbar resources. The NAS model optimized for speed achieved a 5x-30x speedup. The code for this paper is available at https://github.com/ArChiiii/rram_nas_comp_pack.
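The Pareto-optimality criterion at the heart of NSGA-II is non-domination: a candidate is kept if no other candidate is at least as accurate and at least as fast while being strictly better in one of the two. The sketch extracts only this first front from hypothetical (accuracy, latency) pairs; the rest of NSGA-II (ranking into further fronts, crowding distance, crossover, and mutation) is omitted.

```python
def pareto_front(candidates):
    """Non-dominated (accuracy, latency) pairs: higher accuracy and lower latency are better."""
    front = []
    for i, (acc_i, lat_i) in enumerate(candidates):
        dominated = any(
            acc_j >= acc_i and lat_j <= lat_i and (acc_j > acc_i or lat_j < lat_i)
            for j, (acc_j, lat_j) in enumerate(candidates)
            if j != i
        )
        if not dominated:
            front.append((acc_i, lat_i))
    return front

# Hypothetical searched architectures: (top-1 accuracy, latency in ms).
archs = [(0.90, 10.0), (0.85, 5.0), (0.80, 6.0), (0.92, 20.0), (0.90, 12.0)]
front = pareto_front(archs)
```

Here (0.80, 6.0) is dominated by (0.85, 5.0) and (0.90, 12.0) by (0.90, 10.0), so the front contains the three remaining trade-off points; choosing between them is what "optimized for either high accuracy or low latency" amounts to.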
Haoxiong Ren, Yangu He, Kwunhang Wong, Rui Bao, Ning Lin, Zhongrui Wang, Dashan Shang
Spiking Neural Networks (SNNs) are increasingly favored for deployment on resource-constrained edge devices due to their energy-efficient and event-driven processing capabilities. However, training SNNs remains challenging because of the computational intensity of traditional backpropagation algorithms adapted for spike-based systems. In this paper, we propose a novel software-hardware co-design that introduces a hardware-friendly training algorithm, Spiking Direct Feedback Alignment (SDFA), and implements it on a Resistive Random Access Memory (RRAM)-based In-Memory Computing (IMC) architecture, referred to as PipeSDFA, to accelerate SNN training. Software-wise, SDFA reduces the computational complexity of SNN training by eliminating sequential error propagation. Hardware-wise, a three-level pipelined dataflow is designed based on the IMC architecture to parallelize the training process. Experimental results demonstrate that the PipeSDFA training accelerator incurs less than 2% accuracy loss on five datasets compared to baselines, while achieving 1.1X~10.5X and 1.37X~2.1X reductions in training time and energy consumption, respectively, compared to PipeLayer.
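Direct feedback alignment removes sequential error propagation by sending the output error straight to each hidden layer through a fixed random matrix instead of the transposed forward weights. The sketch shows this for a two-layer network with tanh units, a non-spiking simplification of SDFA; the network sizes, learning rate, toy regression data, and the name `dfa_step` are assumptions.

```python
import numpy as np

def dfa_step(x, y, W1, W2, B1, lr=0.05):
    """One DFA update (in place) for a 2-layer tanh network with MSE loss; returns the loss.

    The hidden-layer signal uses the fixed random matrix B1 (n_out x n_hidden)
    in place of W2.T, so no backward pass through W2 is required.
    """
    h1 = np.tanh(x @ W1)
    out = h1 @ W2
    e = out - y                                # output error
    delta1 = (e @ B1) * (1.0 - h1**2)          # random projection of the error
    W2 -= lr * h1.T @ e / x.shape[0]
    W1 -= lr * x.T @ delta1 / x.shape[0]
    return float(np.mean(e**2))

rng = np.random.default_rng(0)
x = rng.standard_normal((64, 4))
y = x @ rng.standard_normal((4, 2))            # toy linear regression target
W1 = rng.normal(0.0, 0.5, (4, 32))
W2 = rng.normal(0.0, 0.1, (32, 2))
B1 = rng.normal(0.0, 0.5, (2, 32))             # fixed random feedback, never trained
losses = [dfa_step(x, y, W1, W2, B1) for _ in range(300)]
```

Since every hidden layer receives its error signal directly from the output, the layer updates do not wait on each other, which is what makes a pipelined, parallel hardware dataflow possible.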
Wei Xuan, Zhongrui Wang, Lang Feng, Ning Lin, Zihao Xuan, Rongliang Fu, Tsung-Yi Ho, Yuzhong Jiao, Luhong Liang
Ensuring the confidentiality and integrity of DNN accelerators is paramount across various scenarios spanning autonomous driving, healthcare, and finance. However, current security approaches typically require extensive hardware resources, and incur significant off-chip memory access overheads. This paper introduces SeDA, which utilizes 1) a bandwidth-aware encryption mechanism to improve hardware resource efficiency, 2) optimal block granularity through intra-layer and inter-layer tiling patterns, and 3) a multi-level integrity verification mechanism that minimizes, or even eliminates, memory access overheads. Experimental results show that SeDA decreases performance overhead by over 12% for both server and edge neural processing units (NPUs), while ensuring robust scalability.
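A multi-level integrity check can be illustrated with a generic two-level MAC hierarchy: one tag per data block (bound to the block's index) and a single layer tag over those block tags, so that verifying a layer needs only one trusted tag. This is a textbook construction using Python's `hmac` module, not SeDA's mechanism; it omits encryption and freshness counters (so it does not stop replay of an old but valid layer), and the function names are hypothetical.

```python
import hashlib
import hmac

def block_tags(key, blocks):
    """Level 1: an HMAC-SHA256 tag per block, bound to the block's position."""
    return [
        hmac.new(key, i.to_bytes(4, "big") + block, hashlib.sha256).digest()
        for i, block in enumerate(blocks)
    ]

def layer_tag(key, tags):
    """Level 2: one tag over all block tags of a layer (the only tag kept trusted here)."""
    return hmac.new(key, b"".join(tags), hashlib.sha256).digest()

def verify_layer(key, blocks, expected):
    # Recompute both levels; compare_digest performs a constant-time comparison.
    return hmac.compare_digest(layer_tag(key, block_tags(key, blocks)), expected)

key = b"k" * 32
tiles = [b"weight-tile-0", b"weight-tile-1", b"weight-tile-2"]
trusted = layer_tag(key, block_tags(key, tiles))
```

Storing one layer-level tag instead of fetching every block tag from off-chip memory is one way such a hierarchy can cut verification traffic; per-block tags remain useful when individual tiles must be checked on their own.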
Yaping Zhao, Guanghan Li, Edmund Y. Lam
With advances in optical sensor technology, heterogeneous camera systems are increasingly used for high-resolution (HR) video acquisition and analysis. However, motion transfer across multiple cameras poses challenges. To address this, we propose an algorithm based on time series analysis that identifies motion seasonality and constructs an additive model to extract transferable patterns. Validated on real-world data, our algorithm demonstrates effectiveness and interpretability. Notably, it improves pose estimation in low-resolution videos by leveraging patterns derived from HR counterparts, enhancing practical utility. Code is available at: https://github.com/IndigoPurple/TSAMT
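An additive time-series model writes a signal as trend + seasonal + residual. The sketch shows classical additive decomposition of a 1-D motion signal (for instance, one joint coordinate over frames): a centred moving average estimates the trend, and averaging the detrended values at each phase of an assumed known period gives the seasonal pattern. The paper identifies the seasonality from data and may build its model differently; the function name and toy signal are illustrative.

```python
import numpy as np

def additive_decompose(series, period):
    """Classical additive decomposition: series = trend + seasonal + residual."""
    x = np.asarray(series, dtype=float)
    if period % 2 == 1:
        kernel = np.ones(period) / period
    else:  # centred 2 x period moving average for even periods
        kernel = np.r_[0.5, np.ones(period - 1), 0.5] / period
    half = len(kernel) // 2
    trend = np.full_like(x, np.nan)                 # edges without a full window stay NaN
    trend[half:len(x) - half] = np.convolve(x, kernel, mode="valid")
    detrended = x - trend
    seasonal_means = np.array([np.nanmean(detrended[i::period]) for i in range(period)])
    seasonal_means -= seasonal_means.mean()         # seasonal component sums to zero
    seasonal = np.resize(seasonal_means, len(x))    # repeat the pattern over the series
    residual = x - trend - seasonal
    return trend, seasonal, residual

t = np.arange(48)
pattern = np.array([1.0, -2.0, 0.5, 0.5])           # toy periodic motion pattern
signal = 0.1 * t + np.tile(pattern, 12)              # drift plus periodic motion
trend, seasonal, residual = additive_decompose(signal, period=4)
```

The seasonal pattern is the part that could plausibly transfer between cameras observing the same motion, while the trend and residual absorb camera-specific drift and noise.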
Yue Zhang, Woyu Zhang, Shaocong Wang, Ning Lin, Yifei Yu, Yangu He, Bo Wang, Hao Jiang, Peng Lin, Xiaoxin Xu, Xiaojuan Qi, Zhongrui Wang, Xumeng Zhang, Dashan Shang, Qi Liu, Kwang-Ting Cheng, Ming Liu
The brain is dynamic, associative, and efficient. It reconfigures itself by associating inputs with past experiences, with fused memory and processing. In contrast, AI models are static, unable to associate inputs with past experiences, and run on digital computers with physically separated memory and processing. We propose a hardware-software co-design: a semantic memory-based dynamic neural network (DNN) using memristors. The network associates incoming data with past experiences stored as semantic vectors. The network and the semantic memory are physically implemented on noise-robust ternary memristor-based Computing-In-Memory (CIM) and Content-Addressable Memory (CAM) circuits, respectively. We validate our co-designs, using a 40nm memristor macro, on ResNet and PointNet++ for classifying images and 3D points from the MNIST and ModelNet datasets, respectively, achieving accuracy on par with software along with 48.1% and 15.9% reductions in computational budget. Moreover, the co-designs deliver 77.6% and 93.3% reductions in energy consumption.
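The semantic-memory lookup can be pictured as a parallel similarity search over stored ternary vectors, the kind of match a content-addressable memory performs in one step. In the sketch, features are ternarized to {−1, 0, +1} with a fixed threshold and compared by dot product; the threshold, the similarity measure, and the idea that the best match selects how much of the network to run are assumptions, and the actual CAM circuit may use a different distance such as Hamming distance. `ternarize` and `cam_search` are hypothetical names.

```python
import numpy as np

def ternarize(v, threshold=0.5):
    """Quantize to {-1, 0, +1}; entries with magnitude <= threshold become 0."""
    return np.where(v > threshold, 1, np.where(v < -threshold, -1, 0))

def cam_search(query, memory):
    """Index of the stored semantic vector with the highest dot-product match, plus all scores."""
    scores = memory @ query          # one similarity per stored row, computed together
    return int(np.argmax(scores)), scores

# Hypothetical semantic memory of three ternary vectors ("past experiences").
memory = np.array([[1, 0, -1, 1],
                   [-1, 1, 0, 0],
                   [0, 0, 1, -1]])
query = ternarize(np.array([0.9, 0.1, -0.8, 0.7]))   # ternarized incoming features
best, scores = cam_search(query, memory)
```

A confident match to a stored experience could, for example, let the dynamic network skip later layers, which is one way an associative lookup can reduce the computational budget.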