Ling Feng, Christopher Pineda Monterola, Yanqing Hu
Interdependent networks are ubiquitous in our society, ranging from infrastructure to economics, and the study of their cascading behaviors using percolation theory has attracted much attention in recent years. To analyze the percolation phenomena of these systems, different mathematical frameworks have been proposed, including generating functions and eigenvalue analysis, among others. These frameworks approach the phase transition behaviors from different angles and have been very successful in characterizing the quantities of interest, including the critical threshold, the size of the giant component, the order of the phase transition and the dynamics of cascading. These methods also vary in their mathematical complexity when dealing with interdependent networks that have additional complexity in terms of correlations among different layers of networks or links. In this work, we review a particular approach based on simple self-consistent probability equations, and illustrate that it can greatly simplify the mathematical analysis for systems ranging from single-layer networks to various interdependent networks. We give an overview of the detailed framework used to study the nature of the critical phase transition, the value of the critical threshold and the size of the giant component for these different systems.
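As an illustrative sketch (not part of the abstract itself): for the simplest case of an Erdős–Rényi network with mean degree ⟨k⟩ under random node occupation probability p, the self-consistent approach reduces to a single fixed-point equation S = p(1 − e^(−⟨k⟩S)) for the giant component size S, with critical threshold p_c = 1/⟨k⟩. The function name and iteration scheme below are our own, assumed for illustration:

```python
import math

def giant_component(p, k_mean, tol=1e-12, max_iter=100_000):
    """Solve the self-consistent equation S = p * (1 - exp(-k_mean * S))
    for an Erdos-Renyi network under random node occupation p,
    by simple fixed-point iteration."""
    S = 0.5  # any positive starting guess works
    for _ in range(max_iter):
        S_new = p * (1.0 - math.exp(-k_mean * S))
        if abs(S_new - S) < tol:
            break
        S = S_new
    return S
```

For ⟨k⟩ = 4 the threshold is p_c = 0.25: above it the iteration converges to a macroscopic giant component, below it the only fixed point is S = 0.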
Wenbo Wei, Nicholas Chong Jia Le, Choy Heng Lai, Ling Feng
We observe a novel 'multiple-descent' phenomenon during the training of LSTM networks, in which the test loss goes through long cycles of rising and falling multiple times after the model is overtrained. By carrying out asymptotic stability analysis of the models, we found that the cycles in test loss are closely associated with the phase transition between order and chaos, and that the locally optimal epochs consistently lie at the critical transition point between the two phases. More importantly, the globally optimal epoch occurs at the first transition from order to chaos, where the 'width' of the 'edge of chaos' is widest, allowing the best exploration of better weight configurations for learning.
Lin Zhang, Ling Feng, Kan Chen, Choy Heng Lai
The success of deep neural networks in real-world problems has prompted many attempts to explain their training dynamics and generalization performance, but more guiding principles for the training of neural networks are still needed. Motivated by the edge of chaos principle behind the optimal performance of neural networks, we study the role of various hyperparameters in modern neural network training algorithms in terms of the order-chaos phase diagram. In particular, we study a fully analytically tractable feedforward neural network trained on the widely adopted Fashion-MNIST dataset, and study the dynamics associated with the hyperparameters in back-propagation during the training process. We find that for the basic algorithm of stochastic gradient descent with momentum, in the range around the commonly used hyperparameter values, clear scaling relations with respect to training time are present during the ordered phase of the phase diagram, and the model's optimal generalization power at the edge of chaos is similar across different training parameter combinations. In the chaotic phase, the same scaling no longer exists. The scaling allows us to choose training parameters that achieve faster training without sacrificing performance. In addition, we find that the commonly used model regularization method, weight decay, effectively pushes the model towards the ordered phase to achieve better performance. Leveraging this fact and the scaling relations for the other hyperparameters, we derive a principled guideline for hyperparameter determination, such that the model can achieve optimal performance by settling at the edge of chaos. Demonstrated on this simple neural network model and training algorithm, our work improves the understanding of neural network training dynamics, and can potentially be extended to guiding principles for more complex model architectures and algorithms.
Jeremy Oon, Rakhi Manohar Mepparambath, Ling Feng
Despite the significant progress of deep learning models in a multitude of applications, their adoption in planning and policy related areas remains challenging due to the black-box nature of these models. In this work, we develop a set of DeepLogit models that follow a novel sequentially constrained approach to estimating deep learning models for transport policy analysis. In the first step of the proposed approach, we estimate a convolutional neural network (CNN) model with only linear terms, which is equivalent to a linear-in-parameter multinomial logit model. We then estimate other deep learning models by constraining the parameters that require interpretability to the values obtained in the linear-in-parameter CNN model, and by including higher-order terms or introducing advanced deep learning architectures such as Transformers. Our approach retains the interpretability of the selected parameters, yet achieves significantly better accuracy than the discrete choice model. We demonstrate our approach on a transit route choice example using real-world transit smart card data from Singapore. This study shows the potential for a unifying approach, in which theory-based discrete choice models (DCMs) and data-driven AI models leverage each other's strengths in interpretability and predictive power. With the availability of larger datasets and more complex constructions, such an approach can lead to models that are more accurate than discrete choice models while maintaining applicability in planning and policy-related areas. Our code is available at https://github.com/jeremyoon/route-choice/ .
Feng Ling, Hanliang Guo, Eva Kanso
Cilia and flagella are highly conserved slender organelles that exhibit a variety of rhythmic beating patterns from non-planar cone-like motions to planar wave-like deformations. Although their internal structure, composed of a microtubule-based axoneme driven by dynein motors, is known, the mechanism responsible for these beating patterns remains elusive. Existing theories suggest that the dynein activity is dynamically regulated, via a geometric feedback from the cilium's mechanical deformation to the dynein force. An alternative, open-loop mechanism based on a 'flutter' instability was recently proven to lead to planar oscillations of elastic filaments under follower forces. Here, we show that an elastic filament in viscous fluid, clamped at one end and acted on by an external distribution of compressive axial forces, exhibits a Hopf bifurcation that leads to non-planar spinning of the buckled filament at a locked curvature. We also show the existence of a second bifurcation, at larger force values, that induces a transition from non-planar spinning to planar wave-like oscillations. We elucidate the nature of these instabilities using a combination of nonlinear numerical analysis, linear stability theory, and low-order bead-spring models. Our results show that away from the transition thresholds, these beating patterns are robust to perturbations in the distribution of axial forces and in the filament configuration. These findings support the theory that an open-loop, instability-driven mechanism could explain both the sustained oscillations and the wide variety of periodic beating patterns observed in cilia and flagella.
Nicholas Chong Jia Le, Ling Feng
The presence of $1/f$ noise, also known as pink noise, is a well-established phenomenon in biological neural networks and is thought to play an important role in information processing in the brain. In this study, we find that such $1/f$ noise is also present in deep neural networks trained on natural language, resembling that of their biological counterparts. Specifically, we trained Long Short-Term Memory (LSTM) networks on the `IMDb' AI benchmark dataset, then measured the neuron activations. Detrended fluctuation analysis (DFA) on the time series of the different neurons demonstrates clear $1/f$ patterns, which are absent in the time series of the inputs to the LSTM. Interestingly, when the neural network is at overcapacity, having more than enough neurons to achieve the learning task, the activation patterns deviate from $1/f$ noise and shift towards white noise. This is because many of the neurons are not effectively used, showing little fluctuation when fed with input data. We further examine the exponent values of the $1/f$ noise in the ``internal'' and ``external'' activations of the LSTM cell, finding some resemblance to the variation of exponents in fMRI signals of the human brain. Our findings further support the hypothesis that $1/f$ noise is a signature of optimal learning. With deep learning models approaching or surpassing humans in certain tasks, and being more ``experimentable'' than their biological counterparts, our study suggests that they are good candidates for understanding the fundamental origins of $1/f$ noise.
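The DFA procedure the abstract relies on can be sketched as follows: integrate the mean-subtracted series, linearly detrend it in windows of varying size, and read the scaling exponent off the log-log slope of fluctuation versus window size. A minimal sketch with illustrative window sizes of our choosing (α ≈ 0.5 indicates white noise, α ≈ 1 indicates $1/f$ noise):

```python
import numpy as np

def dfa_exponent(x, scales=(16, 32, 64, 128, 256)):
    """Estimate the detrended fluctuation analysis (DFA) scaling
    exponent alpha of a 1-D time series x.
    alpha ~ 0.5: white noise; alpha ~ 1.0: 1/f (pink) noise."""
    y = np.cumsum(x - np.mean(x))  # integrated profile
    flucts = []
    for s in scales:
        n_seg = len(y) // s
        segs = y[: n_seg * s].reshape(n_seg, s)
        t = np.arange(s)
        rms = []
        for seg in segs:
            coeffs = np.polyfit(t, seg, 1)  # linear detrend per window
            rms.append(np.sqrt(np.mean((seg - np.polyval(coeffs, t)) ** 2)))
        flucts.append(np.mean(rms))
    # slope of log F(s) versus log s gives the DFA exponent
    alpha, _ = np.polyfit(np.log(scales), np.log(flucts), 1)
    return alpha
```

On uncorrelated Gaussian noise this returns an exponent near 0.5, while an integrated (Brownian) series yields a markedly larger exponent, as expected.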
Ling Feng, Tianhao Wu, Xiangrong Ren, Zhi Jing, Xuliang Duan
This paper introduces a new knowledge distillation method, called education distillation (ED), which is inspired by the structured and progressive nature of human learning. ED mimics the educational stages of primary school, middle school, and university, and designs teaching reference blocks. The student model is split into a main body and multiple teaching reference blocks that learn from teachers step by step. This promotes efficient knowledge distillation while maintaining the architecture of the student model. Experimental results on the CIFAR100, Tiny ImageNet, Caltech and Food-101 datasets show that the teaching reference blocks can effectively avoid the problem of forgetting. Compared with conventional single-teacher and multi-teacher knowledge distillation methods, ED significantly improves the accuracy and generalization ability of the student model. These findings highlight the potential of ED to improve model performance across different architectures and datasets, indicating its value in various deep learning scenarios. Code examples are available at: https://github.com/Revolutioner1/ED.git.
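For context, staged schemes like ED build on the standard temperature-scaled soft-label distillation loss between teacher and student logits. The sketch below shows that building block only, not the full ED pipeline; function names and the temperature value are our own assumptions:

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """KL divergence between teacher and student soft labels,
    scaled by T^2 as in standard knowledge distillation."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float(T * T * np.sum(p * (np.log(p) - np.log(q))))
```

The loss vanishes when student and teacher logits agree and grows as their soft-label distributions diverge.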
Ling Feng, Lin Zhang, Choy Heng Lai
It has long been suggested that the biological brain operates at some critical point between two different phases, possibly order and chaos. Despite much indirect empirical evidence from the brain and analytical indications from simple neural networks, the foundation of this hypothesis for generic non-linear systems remains unclear. Here we develop a general theory revealing that the exact edge of chaos is the boundary between the chaotic phase and the (pseudo)periodic phase arising from a Neimark-Sacker bifurcation. This edge is analytically determined by the asymptotic Jacobian norm of the non-linear operator and is influenced by the dimensionality of the system. The optimality at the edge of chaos is associated with the highest information transfer between input and output at this point, similar to that of the logistic map. As empirical validation, our experiments on various deep learning models in computer vision demonstrate the optimality of the models near the edge of chaos, and we observe that state-of-the-art training algorithms push the models towards this edge as they become more accurate. We further establish a theoretical understanding of deep learning model generalization through asymptotic stability.
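In one dimension, the asymptotic Jacobian criterion reduces to the familiar Lyapunov exponent; for the logistic map, which the abstract invokes as an analogy, a minimal sketch (iteration counts and seed values are our own choices):

```python
import math

def lyapunov_logistic(r, n_iter=10_000, n_burn=1_000, x0=0.3):
    """Asymptotic mean log-Jacobian (Lyapunov exponent) of the logistic
    map x -> r * x * (1 - x).  Positive: chaotic phase; negative:
    ordered/(pseudo)periodic phase; zero marks the edge of chaos."""
    x = x0
    for _ in range(n_burn):  # discard the transient
        x = r * x * (1.0 - x)
    acc = 0.0
    for _ in range(n_iter):
        x = r * x * (1.0 - x)
        acc += math.log(abs(r * (1.0 - 2.0 * x)) + 1e-300)  # guard log(0)
    return acc / n_iter
```

At r = 4 the map is fully chaotic (exponent near ln 2 ≈ 0.69), while at r = 2.5 it settles onto a stable fixed point with a negative exponent.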
Anup Kanale, Feng Ling, Hanliang Guo, Sebastian Fuerthauer, Eva Kanso
Ciliated tissues such as in the mammalian lungs, brains, and reproductive tracts, are specialized to pump fluid. They generate flows by the collective activity of hundreds of thousands of individual cilia that beat in a striking metachronal wave pattern. Despite progress in analyzing cilia coordination, a general theory that links coordination and fluid pumping in the limit of large arrays of cilia remains lacking. Here, we conduct in-silico experiments with thousands of hydrodynamically-interacting cilia, and we develop a continuum theory in the limit of infinitely-many independently beating cilia by combining tools from active matter and classical Stokes flow. We find, in both simulations and theory, that isotropic and synchronized ciliary states are unstable. Traveling waves emerge regardless of initial conditions, but the characteristics of the wave and net flows depend on cilia and tissue properties. That is, metachronal phase coordination is a stable global attractor in large ciliary carpets, even under finite perturbations to cilia and tissue properties. These results support the notion that functional specificity of ciliated tissues is interlaced with the tissue architecture and cilia beat kinematics and open up the prospect of establishing structure-to-function maps from cilium-level beat to tissue-level coordination and fluid pumping.
Ling Feng, SK Yang
In Rectified Flow, by computing the rectified flow several times, the mapping between distributions can be distilled into a neural network, and the target distribution can then be predicted directly along the straight lines of the flow. However, the pairing process of the mapping relationship accumulates a large amount of error, resulting in decreased performance after multiple rectifications. In the field of flow models, knowledge distillation from multi-teacher diffusion models is also a problem worth discussing for accelerating sampling. I intend to combine multi-teacher knowledge distillation with Bezier curves to address this error accumulation. The related paper is currently being written.
Ling Feng, Tianyu Xie, Wei Ma, Ruijie Fu, Yingxiao Zhang, Jun Li, Bei Zhou
The modernization of smart farming is a way to improve agricultural production efficiency and the agricultural production environment. Although many large models have achieved high accuracy in object recognition and segmentation tasks, they cannot really be put into use in the farming industry due to their poor interpretability and heavy computational demands. In this paper, we build the AnYue Shelduck Dataset, which contains a total of 1,951 shelduck images, annotated for detection and segmentation by professional annotators. Based on the AnYue Shelduck Dataset, this paper describes DuckProcessing, an efficient and powerful module for duck identification on real shelduck farms. First, a YOLOv8-based detection module reached 98.10% precision, 96.53% recall and an F1 score of 0.95 on the test set. The DuckSegmentation segmentation model then reached 96.43% mIoU. Finally, DuckSegmentation was used as the teacher model and, through knowledge distillation with Deeplabv3 r50 as the student model, the final student model achieved 94.49% mIoU on the test set. The method provides a new way of thinking for practical shelduck smart farming.
Nixie Sapphira Lesmana, Ling Feng, Kan Chen, Choy Heng Lai
Self-organized criticality (SOC) is widely proposed as a fundamental mechanism for collective behavior, yet its role in the objective-driven, heterogeneous adaptive systems underpinning real complex systems remains less understood. We introduce EvoSK, a minimal evolutionary model in which agents perform memory-dependent reinforcement learning on a rugged Sherrington-Kirkpatrick landscape while the population evolves through extremal replacement of the least fit agents. We demonstrate that these coupled dynamics drive the system to a critical state residing on the transition boundary between ergodic and non-ergodic phases. At this boundary, the system exhibits scale-free evolutionary avalanches with a mean-field exponent $\tau \approx -1.5$, while simultaneously achieving collective rewards that surpass those of any manually fine-tuned, non-evolutionary regime. Our results provide a mechanistic link between the statistical physics of ergodicity breaking and the functional optimality of complex adaptive systems, suggesting that the edge of ergodicity breaking acts as a robust attractor for systems adapting on rugged, high-dimensional landscapes.
Jiarong Xie, Xiangrong Wang, Ling Feng, Jin-Hua Zhao, Yamir Moreno, Yanqing Hu
Percolation theory has been widely used to study phase transitions in complex networked systems. It has also successfully explained several macroscopic phenomena across different fields. Yet, the existing theoretical framework for percolation places the focus on the direct interactions among a system's components, while recent empirical observations have shown that indirect interactions are common in many systems, such as ecological and social networks, among others. Here, we propose a new percolation framework that accounts for indirect interactions, which allows us to generalize the current theoretical body and understand the role of the underlying indirect influence of the components of a networked system on its macroscopic behavior. We report a rich phenomenology in which first-order, second-order or hybrid phase transitions are possible depending on whether the links of the substrate network are directed, undirected or a mix, respectively. We also present an analytical framework to characterize the proposed induced percolation, paving the way to further understanding of network dynamics with indirect interactions.
Lei Cao, Huijun Zhang, Ling Feng, Zihan Wei, Xin Wang, Ningyun Li, Xiaohao He
Although detection of suicidal ideation on social media has made great progress in recent years, posts in which people express themselves implicitly or contrary to their true feelings remain an obstacle that keeps detectors from achieving satisfactory performance. Inspired by the hidden "tree hole" phenomenon on microblogs, where people at suicide risk tend to disclose their real inner feelings and thoughts in the comment space of microblogs whose authors have died by suicide, we explore the use of tree holes to enhance microblog-based suicide risk detection from two perspectives. (1) We build suicide-oriented word embeddings based on tree hole contents to strengthen the sensitivity to suicide-related lexicons and context. (2) A two-layered attention mechanism is deployed to grasp intermittently changing points from an individual's open blog stream, revealing his or her inner emotional world. Our experimental results show that with suicide-oriented word embeddings and attention, microblog-based suicide risk detection can achieve over 91\% accuracy. A large-scale, well-labelled suicide dataset is also reported in the paper.
Sirui Hu, Ling Feng, Xiaohan Yang, Yongchao Chen
Federated learning is widely used to perform decentralized training of a global model on multiple devices while preserving the data privacy of each device. However, it suffers from heterogeneous local data on each training device, which makes it difficult to reach the same level of accuracy as centralized training. Supervised contrastive learning, which outperforms cross-entropy loss, minimizes the distance in feature space between points belonging to the same class and pushes apart points from different classes. We propose Supervised Contrastive Federated Learning, in which devices share their learned class-wise feature spaces with each other and add a supervised contrastive loss as a regularization term to foster feature space learning. The loss minimizes the cosine distance between a feature map and the averaged feature map from another device for the same class, and maximizes the distance between the feature map and the average for a different class. This new regularization term, when added on top of the MOON regularization term, is found to outperform the other state-of-the-art regularization terms in solving the heterogeneous data distribution problem.
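A minimal sketch of the kind of regularization term the abstract describes: pull a sample's feature map toward the class-average feature shared by another device for the same class, and push it away from the shared averages of other classes, in an InfoNCE-style form. Function and argument names, and the temperature value, are our own assumptions rather than the paper's exact formulation:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    u, v = np.asarray(u, float), np.asarray(v, float)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def contrastive_reg(feature, same_class_avg, other_class_avgs, temperature=0.5):
    """InfoNCE-style penalty: low when the feature aligns with the shared
    same-class average, high when it aligns with other-class averages."""
    pos = np.exp(cosine(feature, same_class_avg) / temperature)
    neg = sum(np.exp(cosine(feature, a) / temperature) for a in other_class_avgs)
    return float(-np.log(pos / (pos + neg)))
```

A feature aligned with its own class average incurs a smaller penalty than one aligned with another class's average, which is the intended pull/push behavior.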
Shoubin Kong, Qiaozhu Mei, Ling Feng, Zhe Zhao, Fei Ye
Hundreds of thousands of hashtags are generated every day on Twitter. Only a few become bursting topics. Among the few, only some can be predicted in real-time. In this paper, we take the initiative to conduct a systematic study of a series of challenging real-time prediction problems of bursting hashtags. Which hashtags will become bursting? If they do, when will the burst happen? How long will they remain active? And how soon will they fade away? Based on empirical analysis of real data from Twitter, we provide insightful statistics to answer these questions, which span over the entire lifecycles of hashtags.
Xiaolu Lu, Dongxu Li, Xiang Li, Ling Feng
In this paper, we propose a 2D-based partition method for solving the problem of Ranking under Team Context (RTC) on datasets without a priori knowledge. We first map the data into 2D space using the minimum and maximum values among all dimensions. Then we construct window queries with consideration of the current team context. In addition, during the query mapping procedure, we pre-prune tuples that cannot be top-ranked. This pre-classification step defers processing of those tuples and saves cost while still providing solutions to the problem. Experiments show that our algorithm performs well, especially on large datasets, while maintaining correctness.
Jiajie Zhang, Zhongni Hou, Xin Lv, Shulin Cao, Zhenyu Hou, Yilin Niu, Lei Hou, Yuxiao Dong, Ling Feng, Juanzi Li
Though significant advancements have been achieved in developing long-context large language models (LLMs), the compromised quality of LLM-synthesized data for supervised fine-tuning (SFT) often affects the long-context performance of SFT models and leads to inherent limitations. In principle, reinforcement learning (RL) with appropriate reward signals can further enhance models' capacities. However, how to obtain reliable rewards in long-context scenarios remains unexplored. To this end, we propose LongReward, a novel method that utilizes an off-the-shelf LLM to provide rewards for long-context model responses from four human-valued dimensions: helpfulness, logicality, faithfulness, and completeness, each with a carefully designed assessment pipeline. By combining LongReward and the offline RL algorithm DPO, we are able to effectively improve long-context SFT models. Our experiments indicate that LongReward not only significantly improves models' long-context performance but also enhances their ability to follow short instructions. We also find that long-context DPO with LongReward and conventional short-context DPO can be used together without hurting either one's performance.
Jun Li, Xiangmeng Wang, Haoyang Li, Yifei Yan, Hong Va Leong, Ling Feng, Nancy Xiaonan Yu, Qing Li
Suicide is a critical global health issue that requires urgent attention. Even though prior work has revealed valuable insights into detecting current suicide risk on social media, little attention has been paid to developing models that can predict subsequent suicide risk over time, limiting their ability to capture rapid fluctuations in individuals' mental state transitions. In addition, existing work ignores protective factors that play a crucial role in suicide risk prediction, focusing predominantly on risk factors alone. Protective factors such as social support and coping strategies can mitigate suicide risk by moderating the impact of risk factors. Therefore, this study proposes a novel framework for predicting subsequent suicide risk by jointly learning the dynamic influence of both risk factors and protective factors on users' suicide risk transitions. We propose a novel Protective Factor-Aware Dataset, which is built from 12 years of Reddit posts along with comprehensive annotations of suicide risk and both risk and protective factors. We also introduce a Dynamic Factors Influence Learning approach that captures the varying impact of risk and protective factors on suicide risk transitions, recognizing that suicide risk fluctuates over time according to established psychological theories. Our thorough experiments demonstrate that the proposed model significantly outperforms state-of-the-art models and large language models across three datasets. In addition, the proposed Dynamic Factors Influence Learning provides interpretable weights, helping clinicians better understand suicidal patterns and enabling more targeted intervention strategies.
Yuan Zhang, Jian Cao, Ling Zhang, Xiangcheng Liu, Zhiyi Wang, Feng Ling, Weiqian Chen
Learning subtle representations of object parts plays a vital role in the fine-grained visual recognition (FGVR) field. The vision transformer (ViT) achieves promising results in computer vision due to its attention mechanism. Nonetheless, with the fixed patch size in ViT, the class token in the deep layers focuses on the global receptive field and cannot generate multi-granularity features for FGVR. To capture region attention without box annotations and to compensate for ViT's shortcomings in FGVR, we propose a novel method named Adaptive attention multi-scale Fusion Transformer (AFTrans). The Selective Attention Collection Module (SACM) in our approach leverages the attention weights in ViT and filters them adaptively according to the relative importance of the input patches. The multi-scale (global and local) pipeline is supervised by our weight-sharing encoder and can easily be trained end-to-end. Comprehensive experiments demonstrate that AFTrans achieves SOTA performance on three published fine-grained benchmarks: CUB-200-2011, Stanford Dogs and iNat2017.