Yihao Chen, Jiahuei Lin, Bram Adams, Ahmed E. Hassan
[Context]: Containerization ensures the resilience of distributed applications by Kubernetes. Helm is a package manager for Kubernetes applications. A Helm package, namely "Chart'', is a set of pre-configured resources that one could quickly deploy a complex application. However, Helm broadens the attack surface of the distributed applications. [Objective]: This study aims to investigate the prevalence of fixable vulnerabilities, the factors related to the vulnerabilities, and current mitigation strategies in Helm Charts. [Method]: We conduct a mixed-methods study on 11,035 Helm Charts affected by 10,982 fixable vulnerabilities. We analyze the complexity of Charts and compare the distribution of vulnerabilities between official and unofficial Charts. Subsequently, we investigate vulnerability mitigation strategies from the Chart-associated repositories by a grounded theory. [Results]: Our findings highlight that the complexity of a Chart correlates with the number of vulnerabilities, and the official Charts do not contain fewer vulnerabilities compared to unofficial Charts. The 10,982 fixable vulnerabilities are at a median of high severity and can be easily exploited. In addition, we identify 11 vulnerability mitigation strategies in three categories. Due to the complexity of Charts, maintainers are required to investigate where a vulnerability impacts and how to mitigate it. The use of automated strategies is low as automation has limited capability(e.g., a higher number of false positives) in such complex Charts. [Conclusion]: There exists need for automation tools that assist maintainers in mitigating vulnerabilities to reduce manual effort. In addition, Chart maintainers lack incentives to mitigate vulnerabilities, given a lack of guidelines for mitigation responsibilities. Adopting a shared responsibility model in the Helm ecosystem would increase its security.
Yihao Chen, Yue Yang
We develop a deep reinforcement learning method for training a jellyfish-like swimmer to effectively track a moving target in a two-dimensional flow. This swimmer is a flexible object equipped with a muscle model based on torsional springs. We employ a deep Q-network (DQN) that takes the swimmer's geometry and dynamic parameters as inputs, and outputs actions which are the forces applied to the swimmer. In particular, we introduce an action regulation to mitigate the interference from complex fluid-structure interactions. The goal of these actions is to navigate the swimmer to a target point in the shortest possible time. In the DQN training, the data on the swimmer's motions are obtained from simulations conducted using the immersed boundary method. During tracking a moving target, there is an inherent delay between the application of forces and the corresponding response of the swimmer's body due to hydrodynamic interactions between the shedding vortices and the swimmer's own locomotion. Our tests demonstrate that the swimmer, with the DQN agent and action regulation, is able to dynamically adjust its course based on its instantaneous state. This work extends the application scope of machine learning in controlling flexible objects within fluid environments.
Yunqi Hao, Yihao Chen, Minqiang Xu, Jianbo Zhan, Liang He, Lei Fang, Sian Fang, Lin Liu
In recent years, self-supervised learning (SSL) models have made significant progress in audio deepfake detection (ADD) tasks. However, existing SSL models mainly rely on large-scale real speech for pre-training and lack the learning of spoofed samples, which leads to susceptibility to domain bias during the fine-tuning process of the ADD task. To this end, we propose a two-stage learning strategy (Wav2DF-TSL) based on pre-training and hierarchical expert fusion for robust audio deepfake detection. In the pre-training stage, we use adapters to efficiently learn artifacts from 3000 hours of unlabelled spoofed speech, improving the adaptability of front-end features while mitigating catastrophic forgetting. In the fine-tuning stage, we propose the hierarchical adaptive mixture of experts (HA-MoE) method to dynamically fuse multi-level spoofing cues through multi-expert collaboration with gated routing. Experimental results show that the proposed method significantly outperforms the baseline system on all four benchmark datasets, especially on the cross-domain In-the-wild dataset, achieving a 27.5% relative improvement in equal error rate (EER), outperforming the existing state-of-the-art systems. Index Terms: audio deepfake detection, self-supervised learning, parameter-efficient fine-tuning, mixture of experts
Zhaorui Sun, Yihao Chen, Jialong Wang, Minqiang Xu, Lei Fang, Sian Fang, Lin Liu
With the continuous development of speech recognition technology, speaker verification (SV) has become an important method for identity authentication. Traditional SV methods rely on handcrafted feature extraction, while deep learning has significantly improved system performance. However, the scarcity of labeled data still limits the widespread application of deep learning in SV. Self-supervised learning, by mining latent information in large unlabeled datasets, enhances model generalization and is a key technology to address this issue. DINO is an efficient self-supervised learning method that generates pseudo-labels from unlabeled speech data through clustering, supporting subsequent training. However, clustering may produce noisy pseudo-labels, which can reduce overall recognition performance. To address this issue, this paper proposes an improved clustering framework based on similarity connection graphs and Graph Convolutional Networks. By leveraging GCNs' ability to model structured data and incorporating relational information between nodes in the similarity connection graph, the clustering process is optimized, improving pseudo-label accuracy and enhancing the robustness and performance of the self-supervised speaker verification system. Experimental results show that this method significantly improves system performance and provides a new approach for self-supervised speaker verification. Index Terms: Speaker Verification, Self-Supervised Learning, DINO, Clustering Algorithm, Graph Convolutional Network, Similarity Connection Graph
Yihao Chen, Yongxing Shen
Crack nucleation is crucial in many industrial applications. The phase field method for fracture transforms the crack nucleation problem into a minimization problem of the sum of the elastic potential energy and the crack surface energy. Due to the polyconvexity of the formulation, starting from a crackless solid, a standard Newton iteration may lead to a solution with no crack, even though a cracked solution has a lower total energy. As such, the critical load for cracking is highly overestimated. Here, we propose an algorithm termed "parallel universe" algorithm to capture the global minimum. This algorithm has two key ingredients: (a) a necessary condition for cracking solely based on the current crackless solution, and (b) beginning from when this condition is met, Newton iteration with two initial guesses, a crackles one and a cracked one, will both be performed and the converged candidate solution with lower energy is accepted as the solution at that load step. Once the cracked candidate solution is accepted, the crackless one is discarded, i.e., only one universe is retained. This cracked initial guess is obtained only once for all load steps by solving a series of similar minimization problems with a progressively reduced critical crack energy release rate. Numerical examples with isotropic and anisotropic critical crack energy release rates indicate that the proposed algorithm is more reliable (as there is no need to retrace) and more efficient than the standard Newton iteration and a well-known backtracking algorithm.
Yihao Chen, Alexander Lerch
Automatic lyrics generation has received attention from both music and AI communities for years. Early rule-based approaches have~---due to increases in computational power and evolution in data-driven models---~mostly been replaced with deep-learning-based systems. Many existing approaches, however, either rely heavily on prior knowledge in music and lyrics writing or oversimplify the task by largely discarding melodic information and its relationship with the text. We propose an end-to-end melody-conditioned lyrics generation system based on Sequence Generative Adversarial Networks (SeqGAN), which generates a line of lyrics given the corresponding melody as the input. Furthermore, we investigate the performance of the generator with an additional input condition: the theme or overarching topic of the lyrics to be generated. We show that the input conditions have no negative impact on the evaluation metrics while enabling the network to produce more meaningful results.
Yihao Chen, Zefang Wang
Channel pruning is a promising method for accelerating and compressing convolutional neural networks. However, current pruning algorithms still remain unsolved problems that how to assign layer-wise pruning ratios properly and discard the least important channels with a convincing criterion. In this paper, we present a novel channel pruning approach via information theory and interpretability of neural networks. Specifically, we regard information entropy as the expected amount of information for convolutional layers. In addition, if we suppose a matrix as a system of linear equations, a higher-rank matrix represents there exist more solutions to it, which indicates more uncertainty. From the point of view of information theory, the rank can also describe the amount of information. In a neural network, considering the rank and entropy as two information indicators of convolutional layers, we propose a fusion function to reach a compromise of them, where the fusion results are defined as ``information concentration''. When pre-defining layer-wise pruning ratios, we employ the information concentration as a reference instead of heuristic and engineering tuning to provide a more interpretable solution. Moreover, we leverage Shapley values, which are a potent tool in the interpretability of neural networks, to evaluate the channel contributions and discard the least important channels for model compression while maintaining its performance. Extensive experiments demonstrate the effectiveness and promising performance of our method. For example, our method improves the accuracy by 0.21% when reducing 45.5% FLOPs and removing 40.3% parameters for ResNet-56 on CIFAR-10. Moreover, our method obtains loss in Top-1/Top-5 accuracies of 0.43%/0.11% by reducing 41.6% FLOPs and removing 35.0% parameters for ResNet-50 on ImageNet.
Shunming Tao, Yihao Chen, Jingxing Wu, Duancheng Zhao, Hanxuan Cai, Ling Wang
Sep 17, 2022·q-bio.BM·PDF Virus infection is one of the major diseases that seriously threaten human health. To meet the growing demand for mining and sharing data resources related to antiviral drugs and to accelerate the design and discovery of new antiviral drugs, we presented an open-access antiviral drug resource and machine learning platform (VDDB), which, to the best of our knowledge, is the first comprehensive dedicated resource for experimentally verified potential drugs/molecules based on manually curated data. Currently, VDDB highlights 848 clinical vaccines, 199 clinical antibodies, as well as over 710,000 small molecules targeting 39 medically important viruses including SARS-CoV-2. Furthermore, VDDB stores approximately 3 million records of pharmacological data for these collected potential antiviral drugs/molecules, involving 314 cell infection-based phenotypic and 234 target-based genotypic assays. Based on these annotated pharmacological data, VDDB allows users to browse, search and download reliable information about these collects for various viruses of interest. In particular, VDDB also integrates 57 cell infection- and 117 target-based associated high-accuracy machine learning models to support various antivirals identification-related tasks, such as compound activity prediction, virtual screening, drug repositioning and target fishing. VDDB is freely accessible at http://vddb.idruglab.cn.
Ning Lu, Wenwen Yu, Xianbiao Qi, Yihao Chen, Ping Gong, Rong Xiao, Xiang Bai
Attention-based scene text recognizers have gained huge success, which leverages a more compact intermediate representation to learn 1d- or 2d- attention by a RNN-based encoder-decoder architecture. However, such methods suffer from attention-drift problem because high similarity among encoded features leads to attention confusion under the RNN-based local attention mechanism. Moreover, RNN-based methods have low efficiency due to poor parallelization. To overcome these problems, we propose the MASTER, a self-attention based scene text recognizer that (1) not only encodes the input-output attention but also learns self-attention which encodes feature-feature and target-target relationships inside the encoder and decoder and (2) learns a more powerful and robust intermediate representation to spatial distortion, and (3) owns a great training efficiency because of high training parallelization and a high-speed inference because of an efficient memory-cache mechanism. Extensive experiments on various benchmarks demonstrate the superior performance of our MASTER on both regular and irregular scene text. Pytorch code can be found at https://github.com/wenwenyu/MASTER-pytorch, and Tensorflow code can be found at https://github.com/jiangxiluning/MASTER-TF.
Yihao Chen, Simon A. Rogers, Suresh Narayanan, James L. Harden, Robert L. Leheny
An intrinsic feature of disordered and out-of-equilibrium materials, such as glasses, is the dependence of their properties on their history. An important example is rheological memory, in which disordered solids obtain properties based on their mechanical history. Here, we employ x-ray photon correlation spectroscopy (XPCS) with \textit{in situ} rheometry to characterize memory formation in a nanocolloidal soft glass due to cyclic shear. During a cycle, particles undergo irreversible displacements composed of a combination of shear-induced diffusion and strain fields. The magnitudes of these displacements decrease with each cycle before reaching a steady state where the microstructure has become trained to achieve enhanced reversibility. The displacements resemble a random walk in which the directions in each cycle are independent of those in preceding cycles. Accompanying the training is a steady decrease in the dissipation during each cycle towards a steady state value. Memory of this training is revealed by measurements in which the amplitude of the shear is changed after steady state is reached. The magnitude of the particle displacements as well as the dissipation and the change in residual stress vary non-monotonically with the new strain amplitude, having minima near the training amplitude, thereby revealing both microscopic and macroscopic signatures of memory.
Yihao Chen, Haochen Wu, Nan Jiang, Xiang Xia, Qing Gu, Yunqi Hao, Pengfei Cai, Yu Guan, Jialong Wang, Weilin Xie, Lei Fang, Sian Fang, Yan Song, Wu Guo, Lin Liu, Minqiang Xu
This paper describes the USTC-KXDIGIT system submitted to the ASVspoof5 Challenge for Track 1 (speech deepfake detection) and Track 2 (spoofing-robust automatic speaker verification, SASV). Track 1 showcases a diverse range of technical qualities from potential processing algorithms and includes both open and closed conditions. For these conditions, our system consists of a cascade of a frontend feature extractor and a back-end classifier. We focus on extensive embedding engineering and enhancing the generalization of the back-end classifier model. Specifically, the embedding engineering is based on hand-crafted features and speech representations from a self-supervised model, used for closed and open conditions, respectively. To detect spoof attacks under various adversarial conditions, we trained multiple systems on an augmented training set. Additionally, we used voice conversion technology to synthesize fake audio from genuine audio in the training set to enrich the synthesis algorithms. To leverage the complementary information learned by different model architectures, we employed activation ensemble and fused scores from different systems to obtain the final decision score for spoof detection. During the evaluation phase, the proposed methods achieved 0.3948 minDCF and 14.33% EER in the close condition, and 0.0750 minDCF and 2.59% EER in the open condition, demonstrating the robustness of our submitted systems under adversarial conditions. In Track 2, we continued using the CM system from Track 1 and fused it with a CNN-based ASV system. This approach achieved 0.2814 min-aDCF in the closed condition and 0.0756 min-aDCF in the open condition, showcasing superior performance in the SASV system.
Yanyan Liu, Minqiang Xu, Yihao Chen, Liang He, Lei Fang, Sian Fang, Lin Liu
In recent years, large language models (LLM) have made significant progress in the task of generation error correction (GER) for automatic speech recognition (ASR) post-processing. However, in complex noisy environments, they still face challenges such as poor adaptability and low information utilization, resulting in limited effectiveness of GER. To address these issues, this paper proposes a noise-robust multi-modal GER framework (Denoising GER). The framework enhances the model's adaptability to different noisy scenarios through a noise-adaptive acoustic encoder and optimizes the integration of multi-modal information via a heterogeneous feature compensation dynamic fusion (HFCDF) mechanism, improving the LLM's utilization of multi-modal information. Additionally, reinforcement learning (RL) training strategies are introduced to enhance the model's predictive capabilities. Experimental results demonstrate that Denoising GER significantly improves accuracy and robustness in noisy environments and exhibits good generalization abilities in unseen noise scenarios.
Yihao Chen, Qilei Yin, Qi Li, Zhuotao Liu, Ke Xu, Yi Xu, Mingwei Xu, Ziqian Liu, Jianping Wu
BGP is the de facto inter-domain routing protocol to ensure global connectivity of the Internet. However, various reasons, such as deliberate attacks or misconfigurations, could cause BGP routing anomalies. Traditional methods for BGP routing anomaly detection require significant manual investigation of routes by network operators. Although machine learning has been applied to automate the process, prior arts typically impose significant training overhead (such as large-scale data labeling and feature crafting), and only produce uninterpretable results. To address these limitations, this paper presents a routing anomaly detection system centering around a novel network representation learning model named BEAM. The core design of BEAM is to accurately learn the unique properties (defined as \emph{routing role}) of each Autonomous System (AS) in the Internet by incorporating BGP semantics. As a result, routing anomaly detection, given BEAM, is reduced to a matter of discovering unexpected routing role churns upon observing new route announcements. We implement a prototype of our routing anomaly detection system and extensively evaluate its performance. The experimental results, based on 18 real-world RouteViews datasets containing over 11 billion route announcement records, demonstrate that our system can detect all previously-confirmed routing anomalies, while only introducing at most five false alarms every 180 million route announcements. We also deploy our system at a large ISP to perform real-world detection for one month. During the course of deployment, our system detects 497 true anomalies in the wild with an average of only 1.65 false alarms per day.
Yihao Chen, Yue Yang
We develop a deep reinforcement learning framework for controlling a bio-inspired jellyfish swimmer to navigate complex fluid environments with obstacles. While existing methods often rely on kinematic and geometric states, a key challenge remains in achieving efficient obstacle avoidance under strong fluid-structure interactions and near-wall effects. We augment the agent's state representation within a soft actor-critic algorithm to include the real-time forces and torque experienced by the swimmer, providing direct mechanical feedback from vortex-wall interactions. This augmented state space enables the swimmer to perceive and interpret wall proximity and orientation through distinct hydrodynamic force signatures. We analyze how these force and torque patterns, generated by walls at different positions influence the swimmer's decision-making policy. Comparative experiments with a baseline model without force feedback demonstrate that the present one with force feedback achieves higher navigation efficiency in two-dimensional obstacle-avoidance tasks. The results show that explicit force feedback facilitates earlier, smoother maneuvers and enables the exploitation of wall effects for efficient turning behaviors. With an application to autonomous cave mapping, this work underscores the critical role of direct mechanical feedback in fluid environments and presents a physics-aware machine learning framework for advancing robust underwater exploration systems.
Zhelong Jiang, Gang Chen, Ruixiu Qiao, Pengcheng Feng, Yihao Chen, Junjia Su, Zhiyuan Zhao, Min Jin, Xu Chen, Zhigang Li, Huaxiang Lu
The ground state search of the Ising model can be used to solve many combinatorial optimization problems. Under the current computer architecture, an Ising ground state search algorithm suitable for hardware computing is necessary for solving practical problems. Inspired by the potential energy conversion of springs, we propose a point convolutional neural network algorithm for ground state search based on spring vibration model, called Spring-Ising Algorithm. Spring-Ising Algorithm regards the spin as a moving mass point connected to a spring and establish the equation of motion for all spins. Spring-Ising Algorithm can be mapped on the GPU or AI chips through the basic structure of the neural network for fast and efficient parallel computing. The algorithm has very productive results for solving the Ising model and has been test in the recognized test benchmark K2000. The algorithm introduces the concept of dynamic equilibrium to achieve a more detailed local search by dynamically adjusting the weight of the Ising model in the spring oscillation model. Finally, there is the simple hardware test speed evaluation. Spring-Ising Algorithm can provide the possibility to calculate the Ising model on a chip which focuses on accelerating neural network calculations.
Shufa Wei, Xiaolong Xu, Xianbiao Qi, Xi Yin, Jun Xia, Jingyi Ren, Peijun Tang, Yuxiang Zhong, Yihao Chen, Xiaoqin Ren, Yuxin Liang, Liankai Huang, Kai Xie, Weikang Gui, Wei Tan, Shuanglong Sun, Yongquan Hu, Qinxian Liu, Nanjin Li, Chihao Dai, Lihua Wang, Xiaohui Liu, Lei Zhang, Yutao Xie
Large Language Models (LLMs) have demonstrated exceptional capabilities across various natural language processing tasks. Yet, many of these advanced LLMs are tailored for broad, general-purpose applications. In this technical report, we introduce AcademicGPT, designed specifically to empower academic research. AcademicGPT is a continual training model derived from LLaMA2-70B. Our training corpus mainly consists of academic papers, thesis, content from some academic domain, high-quality Chinese data and others. While it may not be extensive in data scale, AcademicGPT marks our initial venture into a domain-specific GPT tailored for research area. We evaluate AcademicGPT on several established public benchmarks such as MMLU and CEval, as well as on some specialized academic benchmarks like PubMedQA, SCIEval, and our newly-created ComputerScienceQA, to demonstrate its ability from general knowledge ability, to Chinese ability, and to academic ability. Building upon AcademicGPT's foundation model, we also developed several applications catered to the academic area, including General Academic Question Answering, AI-assisted Paper Reading, Paper Review, and AI-assisted Title and Abstract Generation.
Yixiang Zhang, Xinhao Deng, Zhongyi Gu, Yihao Chen, Ke Xu, Qi Li, Jianping Wu
Large Language Models (LLMs) are increasingly deployed as agents that orchestrate tasks and integrate external tools to execute complex workflows. We demonstrate that these interactive behaviors leave distinctive fingerprints in encrypted traffic exchanged between users and LLM agents. By analyzing traffic patterns associated with agent workflows and tool invocations, adversaries can infer agent activities, distinguish specific agents, and even profile sensitive user attributes. To highlight this risk, we develop AgentPrint, which achieves an F1-score of 0.866 in agent identification and attains 73.9% and 69.1% top-3 accuracy in user attribute inference for simulated- and real-user settings, respectively. These results uncover an overlooked risk: the very interactivity that empowers LLM agents also exposes user privacy, underscoring the urgent need for technical countermeasures alongside regulatory and policy safeguards.
Wenshuo Wang, Yaomin Shen, Yingjie Tan, Yihao Chen
Spatiotemporal forecasting often relies on computationally intensive models to capture complex dynamics. Knowledge distillation (KD) has emerged as a key technique for creating lightweight student models, with recent advances like frequency-aware KD successfully preserving spectral properties (i.e., high-frequency details and low-frequency trends). However, these methods are fundamentally constrained by operating on pixel-level signals, leaving them blind to the rich semantic and causal context behind the visual patterns. To overcome this limitation, we introduce S^2-KD, a novel framework that unifies Semantic priors with Spectral representations for distillation. Our approach begins by training a privileged, multimodal teacher model. This teacher leverages textual narratives from a Large Multimodal Model (LMM) to reason about the underlying causes of events, while its architecture simultaneously decouples spectral components in its latent space. The core of our framework is a new distillation objective that transfers this unified semantic-spectral knowledge into a lightweight, vision-only student. Consequently, the student learns to make predictions that are not only spectrally accurate but also semantically coherent, without requiring any textual input or architectural overhead at inference. Extensive experiments on benchmarks like WeatherBench and TaxiBJ+ show that S^2-KD significantly boosts the performance of simple student models, enabling them to outperform state-of-the-art methods, particularly in long-horizon and complex non-stationary scenarios.
Hao Yue, Yihao Chen, Tianhang Liang, Xiangrui Li, Xin Kong, Zhelong Jiang, Zhigang Li, Gang Chen, Huaxiang Lu
Binary matrix-vector multiplication (BMVM) is a key operation in post-quantum cryptography schemes like the Classic McEliece cryptosystem. Conventional computing architectures incur significant energy efficiency loss due to data movement of large matrices when handling such tasks. Resistive memory (RRAM) non-volatile compute-in-memory (nvCIM) is an ideal technology for high energy-efficient BMVM processing but faces challenges, including signal margin degradation in high input-parallelism arrays due to device non-idealities and high hardware overhead from current readout and XOR operations. This work presents a RRAM nvCIM architecture featuring: 1) 1T1R cells with high-resistive-state compensation modules; and 2) pulsed current-sensing parity checkers. Based on the 180nm process and test results from RRAM devices, the computing accuracy and efficiency of the architecture are verified by simulation. The proposed architecture performs high-precision current accumulation with a maximum MAC value of 10 and achieves an energy efficiency of 1.51TOPS/W, offering approximately 1.62 times improvement compared to an advanced 28nm FPGA platform.
Zhilong Chen, Chengzong Zhao, Boyuan Chen, Dayi Lin, Yihao Chen, Arthur Leung, Gopi Krishnan Rajbahadur, Gustavo A. Oliva, Haoxiang Zhang, Aaditya Bhatia, Chong Chun Yong, Ahmed E. Hassan
Training software engineering (SWE) LLMs is bottlenecked by expensive infrastructure, inefficient evaluation pipelines, scarce training data, and costly quality control. We present RepoForge, an autonomous, end-to-end pipeline that generates, evaluates, and trains SWE agents at scale. Our key contributions include: (1) RepoForge-8B-Agent, achieving 17.4\% on SWE-Bench-Verified~\citep{swebench_verified2024}, establishing new state-of-the-art for $\leq$8B non-thinking LLMs; (2) 7,304 executable environments auto-generated from real GitHub commits with zero manual intervention; (3) 14$\times$ storage reduction (1.4GB $\rightarrow$ 102MB per instance) via intelligent dependency management and image pruning; (4) $>$70\% faster evaluation using a Ray-powered~\citep{ray2018} distributed RepoForge harness; (5) 19,000$\times$ cheaper labeling through our automated SPICE~\citep{spice2024} difficulty assessment technique. By unifying storage-efficient sandboxing, Ray-powered evaluation harness, automated data generation, SPICE-based labeling, and bubble-free RL scaffold, we demonstrate that even $\leq$8B models can reach new state-of-the-art performance on demanding benchmarks like SWE-Bench-Verified. Our approach addresses critical bottlenecks in SWE agent training: high storage costs of container-based evaluation, inefficient sequential reward pipelines, limited availability of high-quality training data, expensive manual labeling, and multi-turn RL pipeline bottlenecks.