Tong Han, Yue Song, David J. Hill
Network connectedness is indispensable for the normal operation of transmission networks. However, there is still a lack of efficient constraints that can be directly added to the formulation of the optimal transmission switching (OTS) problem to strictly ensure network connectedness. To fill this gap, this paper proposes a set of linear connectedness constraints by leveraging the equivalence between network connectedness and the feasibility of the vertex potential equation of an electrical flow network. The proposed constraints are compatible with any existing OTS model and guarantee topology connectedness. Furthermore, we develop a reduced version of the proposed connectedness constraints to improve computational efficiency. Finally, numerical studies with a DC OTS model show the deficiency of OTS formulations that do not fully consider network connectedness and demonstrate the effectiveness of the proposed constraints. The computational burden introduced by the connectedness constraints is moderate and can be remarkably relieved by using the reduced version.
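For intuition, a minimal sketch of a linear big-$M$ connectedness encoding in the same flow-feasibility spirit (the paper derives its constraints from the vertex potential equation, so its exact form differs): with binary line statuses $z_{ij}$, fictitious antisymmetric flows $f_{ij}=-f_{ji}$, edge set $\mathcal{E}$, and a reference bus $r$ in an $n$-bus network,
$$\sum_{j:(i,j)\in\mathcal{E}} f_{ij} = 1 \;\; \forall i\neq r, \qquad \sum_{j:(r,j)\in\mathcal{E}} f_{rj} = -(n-1), \qquad -M z_{ij} \le f_{ij} \le M z_{ij},$$
with $M \ge n-1$. These linear constraints are feasible if and only if every bus can route its unit injection to $r$ through closed lines only, i.e., the switched network is connected.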
Yue Song, David J. Hill, Tao Liu, Tianlun Chen
Flexible transmission line impedances are a promising control resource for facilitating grid flexibility, but they add considerable complexity to the associated optimization problems. This paper develops a convexification method for the AC optimal power flow problem with flexible line impedances. First, it is shown that a flexible-impedance line is equivalent to a constant-impedance line linking a pair of transformers with correlated and continuously adjustable tap ratios. Then, with this circuit equivalence, the original optimization problem is reformulated as a semidefinite program under the existing convex relaxation framework, which improves solution tractability and optimality in an easy-to-implement manner. The proposed method is verified by numerical tests on the IEEE 118-bus system.
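A hedged sketch of the circuit equivalence (our reading of the abstract, not the paper's derivation): an impedance $z$ seen through an ideal transformer of ratio $t$ appears as $t^{2}z$, so a constant-impedance line $z_{0}$ terminated by a pair of ideal transformers whose correlated ratios both equal $t$ (under a consistent orientation) presents the effective series impedance
$$z(t) = t^{2} z_{0}, \qquad t \in \Big[\sqrt{z_{\min}/z_{0}},\; \sqrt{z_{\max}/z_{0}}\Big],$$
which sweeps the whole flexible range $[z_{\min}, z_{\max}]$ as $t$ varies continuously.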
Yue Song, Hao Tang, Nicu Sebe, Wei Wang
Salient object detection has long been studied to identify the most visually attractive objects in images and videos. Recently, a growing number of approaches have been proposed that rely on contour/edge information to improve detection performance. The edge labels are either incorporated into the loss directly or used as extra supervision. The edge and body can also be learned separately and then fused afterward. These strategies either lead to high prediction errors near the edge or cannot be trained in an end-to-end manner. Another problem is that existing methods may fail to detect objects of various sizes due to the lack of efficient and effective feature fusion mechanisms. In this work, we propose to decompose the saliency detection task into two cascaded sub-tasks, \emph{i.e.}, detail modeling and body filling. Specifically, detail modeling focuses on capturing the object edges under the supervision of an explicitly decomposed detail label that consists of the pixels on and near the edge. Then, body filling learns the body part, which is filled into the detail map to generate a more accurate saliency map. To effectively fuse the features and handle objects at different scales, we also propose two novel multi-scale attention blocks, for detail and body modeling respectively. Experimental results show that our method achieves state-of-the-art performance on six public datasets.
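A minimal sketch of the detail/body label decomposition described above, assuming the detail band is obtained by dilating the object boundary (the bandwidth and the use of morphology are our assumptions, not the paper's exact recipe):

```python
import numpy as np
from scipy.ndimage import binary_erosion, binary_dilation

def decompose_label(mask, width=3):
    """Split a binary saliency mask into a detail label (pixels on and near
    the edge) and a body label (the remaining interior)."""
    mask = mask.astype(bool)
    boundary = mask & ~binary_erosion(mask)               # contour pixels
    detail = binary_dilation(boundary, iterations=width) & mask
    body = mask & ~detail                                 # interior to be filled
    return detail, body
```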
Yue Song, Nicu Sebe, Wei Wang
Computing the matrix square root or its inverse in a differentiable manner is important in a variety of computer vision tasks. Previous methods either adopt the Singular Value Decomposition (SVD) to explicitly factorize the matrix or use the Newton-Schulz iteration (NS iteration) to derive an approximate solution. However, neither method is computationally efficient enough in the forward pass or the backward pass. In this paper, we propose two more efficient variants to compute the differentiable matrix square root. For the forward propagation, one method uses the Matrix Taylor Polynomial (MTP) and the other uses Matrix Padé Approximants (MPA). The backward gradient is computed by iteratively solving the continuous-time Lyapunov equation using the matrix sign function. Both methods yield considerable speed-ups compared with the SVD or the NS iteration. Experimental results on de-correlated batch normalization and the second-order vision transformer demonstrate that our methods can also achieve competitive and even slightly better performance. The code is available at \href{https://github.com/KingJamesSong/FastDifferentiableMatSqrt}{https://github.com/KingJamesSong/FastDifferentiableMatSqrt}.
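A minimal numpy sketch of the MTP forward pass, assuming a pre-normalization by the Frobenius norm so that the truncated binomial series converges (the degree and the normalization are illustrative choices for this sketch; the linked repository holds the actual implementation):

```python
import numpy as np

def mtp_sqrtm(A, degree=8):
    """Truncated Taylor (binomial) series for the principal square root of an
    SPD matrix A: A^(1/2) = sqrt(|A|_F) * (I - Z)^(1/2) with Z = I - A/|A|_F."""
    n = A.shape[0]
    norm = np.linalg.norm(A, "fro")
    Z = np.eye(n) - A / norm           # spectral radius of Z is below 1 for SPD A
    S, term, coeff = np.eye(n), np.eye(n), 1.0
    for k in range(1, degree + 1):
        coeff *= (0.5 - (k - 1)) / k   # binomial coefficient C(1/2, k)
        term = term @ Z                # Z^k
        S += coeff * (-1) ** k * term  # sum_k C(1/2, k) (-Z)^k
    return np.sqrt(norm) * S
```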
Yue Song, Nicu Sebe, Wei Wang
The task of out-of-distribution (OOD) detection is crucial for deploying machine learning models in real-world settings. In this paper, we observe that the singular value distributions of the in-distribution (ID) and OOD features are quite different: the OOD feature matrix tends to have a larger dominant singular value than the ID feature, and the class predictions of OOD samples are largely determined by it. This observation motivates us to propose \texttt{RankFeat}, a simple yet effective \emph{post hoc} approach for OOD detection that removes the rank-1 matrix composed of the largest singular value and the associated singular vectors from the high-level feature (\emph{i.e.,} $\mathbf{X}{-} \mathbf{s}_{1}\mathbf{u}_{1}\mathbf{v}_{1}^{T}$). \texttt{RankFeat} achieves \emph{state-of-the-art} performance and reduces the average false positive rate (FPR95) by 17.90\% compared with the previous best method. Extensive ablation studies and comprehensive theoretical analyses are presented to support the empirical results.
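The removal step itself is one line of linear algebra; a minimal sketch (the subsequent scoring of the perturbed feature, e.g., from its logits, is omitted):

```python
import numpy as np

def rankfeat(X):
    """Remove the rank-1 component X -> X - s1 * u1 * v1^T formed by the
    largest singular value and its singular vectors."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return X - s[0] * np.outer(U[:, 0], Vt[0, :])
```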
Yue Song, T. Anderson Keller, Nicu Sebe, Max Welling
Despite the significant recent progress in deep generative models, the underlying structure of their latent spaces is still poorly understood, making the task of performing semantically meaningful latent traversals an open research challenge. Most prior work has aimed to solve this challenge by modeling latent structures linearly and finding corresponding linear directions which result in `disentangled' generations. In this work, we instead propose to model latent structures with a learned dynamic potential landscape, performing latent traversals as the flow of samples down the landscape's gradient. Inspired by physics, optimal transport, and neuroscience, these potential landscapes are learned as physically realistic partial differential equations, allowing them to flexibly vary over both space and time. To achieve disentanglement, multiple potentials are learned simultaneously and are constrained by a classifier to be distinct and semantically self-consistent. Experimentally, we demonstrate that our method achieves trajectories that are both qualitatively and quantitatively more disentangled than those of state-of-the-art baselines. Further, we demonstrate that our method can be integrated as a regularization term during training, thereby acting as an inductive bias towards the learning of structured representations and ultimately improving model likelihood on similarly structured data.
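A minimal sketch of a traversal as gradient flow down a learned potential; `grad_potential(z, t)` is a hypothetical stand-in for the gradient of the learned (possibly time-varying) landscape, and plain Euler steps replace the continuous flow:

```python
import numpy as np

def traverse(z, grad_potential, steps=10, step_size=0.1):
    """Move a latent sample down the potential landscape's gradient,
    recording the traversal path."""
    path = [z]
    for t in range(steps):
        z = z - step_size * grad_potential(z, t)
        path.append(z)
    return np.stack(path)
```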
Yue Song, Jichao Zhang, Nicu Sebe, Wei Wang
Generative Adversarial Networks (GANs), especially the recent style-based generators (StyleGANs), have versatile semantics in their structured latent spaces. Latent semantics discovery methods aim to move the latent code such that only one factor varies during the traversal. Recently, an unsupervised method proposed a promising direction: directly use the eigenvectors of the projection matrix that maps latent codes to features as the interpretable directions. However, one overlooked fact is that the projection matrix is non-orthogonal and the number of eigenvectors is too large. The non-orthogonality entangles semantic attributes in the top few eigenvectors, and the large dimensionality may result in meaningless variations among the directions even if the matrix were orthogonal. To avoid these issues, we propose the Householder Projector, a flexible and general low-rank orthogonal matrix representation based on Householder transformations, to parameterize the projection matrix. The orthogonality guarantees that the eigenvectors correspond to disentangled interpretable semantics, while the low-rank property encourages each identified direction to have meaningful variations. We integrate our projector into pre-trained StyleGAN2/StyleGAN3 and evaluate the models on several benchmarks. Within only $1\%$ of the original training steps for fine-tuning, our projector helps StyleGANs discover more disentangled and precise semantic attributes without sacrificing image fidelity.
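A minimal sketch of the Householder parameterization: an orthogonal matrix built as a product of reflections $H_i = I - 2 v_i v_i^{\top}/\|v_i\|^2$, with the rows of `V` as learnable reflectors; using fewer reflectors than the dimension is our reading of the low-rank property:

```python
import numpy as np

def householder_orthogonal(V, n):
    """Accumulate Q = H_1 H_2 ... H_k from the k reflector vectors in V
    (each of length n); the result is exactly orthogonal by construction."""
    Q = np.eye(n)
    for v in V:
        v = v / np.linalg.norm(v)
        Q = Q - 2.0 * np.outer(Q @ v, v)   # Q <- Q (I - 2 v v^T)
    return Q
```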
Yue Song, Wei Wang, Nicu Sebe
The task of out-of-distribution (OOD) detection is crucial for deploying machine learning models in real-world settings. In this paper, we observe that the singular value distributions of the in-distribution (ID) and OOD features are quite different: the OOD feature matrix tends to have a larger dominant singular value than the ID feature, and the class predictions of OOD samples are largely determined by it. This observation motivates us to propose \texttt{RankFeat}, a simple yet effective \emph{post hoc} approach for OOD detection that removes the rank-1 matrix composed of the largest singular value and the associated singular vectors from the high-level feature. \texttt{RankFeat} achieves \emph{state-of-the-art} performance and reduces the average false positive rate (FPR95) by 17.90\% compared with the previous best method. The success of \texttt{RankFeat} motivates us to investigate whether a similar phenomenon exists in the parameter matrices of neural networks. We thus propose \texttt{RankWeight}, which removes the rank-1 weight from the parameter matrices of a single deep layer. \texttt{RankWeight} is also \emph{post hoc} and only requires computing the rank-1 matrix once. As a standalone approach, it is highly competitive against other methods across various backbones. Moreover, \texttt{RankWeight} enjoys flexible compatibility with a wide range of OOD detection methods. The combination of \texttt{RankWeight} and \texttt{RankFeat} sets a new \emph{state-of-the-art}, achieving an FPR95 as low as 16.13\% on the ImageNet-1k benchmark. Extensive ablation studies and comprehensive theoretical analyses are presented to support the empirical results. Code is publicly available via \url{https://github.com/KingJamesSong/RankFeat}.
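\texttt{RankWeight} applies the same rank-1 removal to a layer's parameters rather than to features; a minimal sketch (per the abstract, it is applied once, to a single deep layer):

```python
import numpy as np

def rankweight(W):
    """One-time, post hoc removal of the top singular component from a
    weight matrix: W -> W - s1 * u1 * v1^T."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return W - s[0] * np.outer(U[:, 0], Vt[0, :])
```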
Zhao Song, Song Yue, Jiahao Zhang
The rapid growth of AI conference submissions has created an overwhelming reviewing burden. To alleviate this, recent venues such as ICLR 2026 introduced a reviewer nomination policy: each submission must nominate one of its authors as a reviewer, and any paper nominating an irresponsible reviewer is desk-rejected. We study this new policy from the perspective of author welfare. Assuming each author carries a probability of being irresponsible, we ask: how can authors (or automated systems) nominate reviewers to minimize the risk of desk rejection? We formalize and analyze three variants of the desk-rejection risk minimization problem. The basic problem, which minimizes the expected number of desk rejections, is solved optimally by a simple greedy algorithm. We then introduce hard and soft nomination limit variants that constrain how many papers may nominate the same author, preventing widespread failures if one author is irresponsible. These formulations connect to classical optimization frameworks, including minimum-cost flow and linear programming, allowing us to design efficient, principled nomination strategies. Our results provide the first theoretical study of reviewer nomination policies, offering both conceptual insights and practical guidance for authors choosing which co-author should serve as the nominated reciprocal reviewer.
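For the basic variant, the expected number of desk rejections decomposes across papers, so each paper can simply nominate its most reliable author; a minimal sketch under the abstract's independence assumption:

```python
def nominate(papers, p_irresponsible):
    """Greedy nomination: papers maps paper id -> list of author ids,
    p_irresponsible maps author id -> probability of being irresponsible.
    Picking the per-paper minimizer is optimal when no nomination limits
    couple the papers together."""
    return {pid: min(authors, key=lambda a: p_irresponsible[a])
            for pid, authors in papers.items()}
```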
Yue Song, T. Anderson Keller, Sevan Brodjian, Takeru Miyato, Yisong Yue, Pietro Perona, Max Welling
Orientation-rich images, such as fingerprints and textures, often exhibit coherent angular directional patterns that are challenging to model using standard generative approaches based on isotropic Euclidean diffusion. Motivated by the role of phase synchronization in biological systems, we propose a score-based generative model built on periodic domains by leveraging stochastic Kuramoto dynamics in the diffusion process. In neural and physical systems, Kuramoto models capture synchronization phenomena across coupled oscillators -- a behavior that we re-purpose here as an inductive bias for structured image generation. In our framework, the forward process performs \textit{synchronization} among phase variables through globally or locally coupled oscillator interactions and attraction to a global reference phase, gradually collapsing the data into a low-entropy von Mises distribution. The reverse process then performs \textit{desynchronization}, generating diverse patterns by reversing the dynamics with a learned score function. This approach enables structured destruction during forward diffusion and a hierarchical generation process that progressively refines global coherence into fine-scale details. We implement wrapped Gaussian transition kernels and periodicity-aware networks to account for the circular geometry. Our method achieves competitive results on general image benchmarks and significantly improves generation quality on orientation-dense datasets like fingerprints and textures. Ultimately, this work demonstrates the promise of biologically inspired synchronization dynamics as structured priors in generative modeling.
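A minimal sketch of one forward (synchronizing) step with global coupling and attraction to a reference phase; all coefficients here are illustrative assumptions, and the learned reverse-time score model is omitted:

```python
import numpy as np

def kuramoto_forward_step(theta, dt=0.01, K=1.0, gamma=0.5, sigma=0.3, ref=0.0):
    """Euler-Maruyama step of stochastic Kuramoto dynamics on phases theta:
    global sine coupling plus attraction to a reference phase plus noise,
    with the result wrapped back onto the circle."""
    coupling = K * np.mean(np.sin(theta[None, :] - theta[:, None]), axis=1)
    drift = coupling + gamma * np.sin(ref - theta)
    theta = theta + drift * dt + sigma * np.sqrt(dt) * np.random.randn(*theta.shape)
    return np.mod(theta + np.pi, 2 * np.pi) - np.pi   # wrap to [-pi, pi)
```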
Xinran Zhang, David J. Hill, Chao Lu, Yue Song
Load modeling is an important issue in power system modeling. The approach of ambient-signals-based load modeling (ASLM) was recently proposed to better track time-varying changes in load models. To improve computational efficiency and accommodate more complex model structures, a hierarchical framework for ASLM is proposed in this paper. Through this framework, the hidden quasi-convexity of the load modeling problem is explored for the first time, and more complicated static load model structures can be applied. In the upper stage, the identification of the dynamic load parameters is posed as an optimization problem. In the lower stage, the optimal static load parameters are obtained through linear regression for a given set of dynamic load parameters, and the regression residual serves as the objective function (OF) of the upper-stage optimization problem. The proposed method is validated by case studies on the Guangdong Power Grid. The results show that the OF is mostly quasi-convex after the transformation of the induction motor model, which provides the basis for applying gradient-based optimization algorithms. The case studies also confirm that the proposed approach offers better computational efficiency and handles more complex model structures than the previous ASLM approach.
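A minimal sketch of the two-stage structure: for fixed dynamic parameters the static parameters are a linear least-squares fit, and the residual becomes the upper-stage OF; `build_regressor` is a hypothetical stand-in that assembles the regression matrix from measurements given the dynamic parameters:

```python
import numpy as np

def upper_stage_of(dyn_params, build_regressor, response):
    """Lower stage: solve the linear regression for the static parameters;
    return the residual norm as the upper-stage objective, to be minimized
    over dyn_params by a gradient-based optimizer (justified by the observed
    quasi-convexity)."""
    Phi = build_regressor(dyn_params)
    static, *_ = np.linalg.lstsq(Phi, response, rcond=None)
    return np.linalg.norm(Phi @ static - response)
```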
Tianlun Chen, David J. Hill, Yue Song, Albert Y. S. Lam
High penetration of renewable generation poses great challenges to power system operation due to its uncertain nature. In droop-controlled microgrids, the voltage volatility induced by renewable uncertainties is aggravated by high droop gains. This paper proposes a chance-constrained optimal power flow (CC-OPF) problem with power flow routers (PFRs) to better regulate the voltage profile in microgrids. PFRs are a general type of network-side controller that brings more flexibility to the power network. Compared with the conventional CC-OPF, which relies on power injection flexibility only, the proposed model introduces a new dimension of control from the network side to enhance system performance under renewable uncertainties. Since the inclusion of PFRs complicates the problem so that common solvers no longer apply directly, we design an iterative solution algorithm. For the subproblem in each iteration, the chance constraints are transformed into equivalent deterministic ones via sensitivity analysis, so that the subproblem can be efficiently solved by the convex relaxation method. The proposed method is verified on the modified IEEE 33-bus system, and the results show that PFRs make a significant contribution to mitigating voltage volatility and let the system operate in a more economical and secure way.
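A skeleton of the iterative scheme as we read it from the abstract; the three helpers are hypothetical stand-ins supplied by the caller, not the paper's API:

```python
def iterative_cc_opf(initial_point, to_deterministic, solve_relaxation,
                     distance, tol=1e-4, max_iter=20):
    """Alternate between (i) converting chance constraints into deterministic
    margins via sensitivity analysis at the current point and (ii) solving the
    resulting subproblem by convex relaxation, until the point stabilizes."""
    point = initial_point
    for _ in range(max_iter):
        margins = to_deterministic(point)
        new_point = solve_relaxation(margins)
        if distance(new_point, point) < tol:
            return new_point
        point = new_point
    return point
```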
Yue Song, T. Anderson Keller, Nicu Sebe, Max Welling
A prominent goal of representation learning research is to achieve representations that are factorized in a useful manner with respect to the ground-truth factors of variation. The fields of disentangled and equivariant representation learning have approached this ideal from a range of complementary perspectives; however, to date, most approaches have proven either ill-specified or insufficiently flexible to effectively separate all realistic factors of interest in a learned latent space. In this work, we propose an alternative viewpoint on such structured representation learning, which we call Flow Factorized Representation Learning, and demonstrate that it learns both more efficient and more usefully structured representations than existing frameworks. Specifically, we introduce a generative model that specifies a distinct set of latent probability paths defining different input transformations. Each latent flow is generated by the gradient field of a learned potential following dynamic optimal transport. Our novel setup brings new understanding to both \textit{disentanglement} and \textit{equivariance}. We show that our model achieves higher likelihoods on standard representation learning benchmarks while simultaneously being closer to approximately equivariant models. Furthermore, we demonstrate that the transformations learned by our model are flexibly composable and can extrapolate to new data, implying a degree of robustness and generalizability approaching the ultimate goal of usefully factorized representation learning.
Jun Wang, Yue Song, David John Hill, Yunhe Hou
In this letter, we analytically investigate the sensitivity of a stability index to its dependent variables in general power systems. First, given a small-signal model, the stability index is defined as the solution to a semidefinite program (SDP) based on the related Lyapunov equation. When the system is stable, the stability index also characterizes the convergence rate of the system after disturbances. Then, by leveraging the duality of SDP, we deduce an analytical formula for the sensitivity of the stability index with respect to any entry of the system Jacobian matrix, expressed in terms of the SDP primal and dual variables. Unlike the traditional numerical perturbation method, the proposed sensitivity evaluation method is more accurate and has a much lower computational burden. Comparative case studies on a modified microgrid reveal significant improvements in the accuracy and computational efficiency of stability sensitivity evaluation.
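For intuition, a hedged sketch of the duality argument (the paper's exact SDP may differ): if the index $\eta(J)$ is the optimal value of an SDP whose Lyapunov constraint reads $J^{\top}P + PJ + Q \preceq 0$ with dual variable $\Lambda \succeq 0$, then the envelope theorem applied to the Lagrangian gives
$$\frac{\partial \eta}{\partial J_{ij}} = 2\,\big[P^{\star}\Lambda^{\star}\big]_{ij},$$
evaluated at the primal-dual optimum, so a single solve yields the sensitivity to every Jacobian entry at once, in contrast to entry-by-entry numerical perturbation.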
Yue Song, T. Anderson Keller, Yisong Yue, Pietro Perona, Max Welling
Neural populations exhibit latent dynamical structures that drive time-evolving spiking activities, motivating the search for models that capture both intrinsic network dynamics and external unobserved influences. In this work, we introduce LangevinFlow, a sequential Variational Auto-Encoder where the time evolution of latent variables is governed by the underdamped Langevin equation. Our approach incorporates physical priors -- such as inertia, damping, a learned potential function, and stochastic forces -- to represent both autonomous and non-autonomous processes in neural systems. Crucially, the potential function is parameterized as a network of locally coupled oscillators, biasing the model toward oscillatory and flow-like behaviors observed in biological neural populations. Our model features a recurrent encoder, a one-layer Transformer decoder, and Langevin dynamics in the latent space. Empirically, our method outperforms state-of-the-art baselines on synthetic neural populations generated by a Lorenz attractor, closely matching ground-truth firing rates. On the Neural Latents Benchmark (NLB), the model achieves superior held-out neuron likelihoods (bits per spike) and forward prediction accuracy across four challenging datasets. It also matches or surpasses alternative methods in decoding behavioral metrics such as hand velocity. Overall, this work introduces a flexible, physics-inspired, high-performing framework for modeling complex neural population dynamics and their unobserved influences.
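A minimal sketch of the latent transition, assuming a standard Euler-Maruyama discretization of the underdamped Langevin equation; `grad_U` is a hypothetical stand-in for the gradient of the learned potential (in the paper, a network of locally coupled oscillators):

```python
import numpy as np

def langevin_step(z, v, grad_U, dt=0.01, gamma=0.5, temp=1.0):
    """dz = v dt,  dv = (-grad U(z) - gamma v) dt + sqrt(2 gamma T) dW.
    Inertia (v), damping (gamma), and stochastic forcing appear explicitly;
    the position update uses the freshly updated velocity (semi-implicit)."""
    noise = np.sqrt(2.0 * gamma * temp * dt) * np.random.randn(*np.shape(v))
    v = v + (-grad_U(z) - gamma * v) * dt + noise
    z = z + v * dt
    return z, v
```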
Yue Song, Nicu Sebe, Wei Wang
EigenDecomposition (ED) is at the heart of many computer vision algorithms and applications. One crucial bottleneck limiting its usage is the expensive computational cost, particularly for mini-batches of matrices in deep neural networks. In this paper, we propose a QR-based ED method dedicated to the application scenarios of computer vision. Our method performs the ED entirely by batched matrix/vector multiplication, which processes all the matrices simultaneously and thus fully utilizes the power of GPUs. Our technique is based on explicit QR iterations with Givens rotations and double Wilkinson shifts. With several acceleration techniques, the time complexity of the QR iterations is reduced from $O{(}n^5{)}$ to $O{(}n^3{)}$. Numerical tests show that for small and medium batched matrices (\emph{e.g.,} $dim{<}32$) our method can be much faster than the PyTorch SVD function. Experimental results on visual recognition and image generation demonstrate that our method also achieves competitive performance.
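For orientation, a plain (unshifted) QR-iteration sketch for one symmetric matrix; the paper's method additionally uses explicit Givens rotations, double Wilkinson shifts, and batched GPU matrix products, all omitted here:

```python
import numpy as np

def qr_eig(A, iters=200):
    """QR iteration: A_k converges to a diagonal matrix of eigenvalues for
    symmetric A, while the accumulated Q holds the eigenvectors."""
    Q_acc = np.eye(A.shape[0])
    for _ in range(iters):
        Q, R = np.linalg.qr(A)
        A = R @ Q                 # similarity transform keeps the spectrum
        Q_acc = Q_acc @ Q
    return np.diag(A), Q_acc
```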
Yue Song, David J. Hill, Tao Liu, Xinran Zhang
The impasse surface is an important concept in the differential-algebraic equation (DAE) model of power systems, and it is associated with short-term voltage collapse. This paper establishes a necessary condition for a system trajectory to hit the impasse surface. The condition is stated in terms of the admittance matrices of the power network, generators and loads, and it specifies the pattern of interaction among those system components that can induce voltage collapse. It applies to generic DAE models featuring high-order synchronous generators, static loads, induction motor loads and lossy power networks. We also identify a class of static load parameters that prevents power systems from hitting the impasse surface; this proves a conjecture made by Hiskens that has remained open for decades. Moreover, the obtained results lead to an early indicator of voltage collapse and to the novel viewpoint that inductive compensation of the power network has a positive effect on preventing short-term voltage collapse, both of which are verified via numerical simulations.
Yue Song, David J. Hill, Tao Liu
This paper extends the definitions of effective resistance and effective conductance to characterize the overall relation (positive coupling or antagonism) between any two disjoint sets of nodes in a signed graph, generalizing the traditional definitions that apply only to a pair of nodes. The monotonicity and convexity properties are preserved by the extended definitions, which provide new insights into graph Laplacian definiteness and power network stability. It is proved that the Laplacian matrix of a signed graph is positive semi-definite with only one zero eigenvalue if and only if the effective conductances between certain specific pairs of node sets are positive. Also, the number of negative Laplacian eigenvalues is upper-bounded by the number of negative-weight edges. In addition, new conditions for small-disturbance angle stability and for the hyperbolicity and type of power system equilibria are established, which intuitively interpret angle instability as electrical antagonism between certain two sets of nodes in the defined active power flow graph. Moreover, a novel optimal power flow (OPF) model with effective conductance constraints is formulated, which significantly enhances power system transient stability. By the properties of the extended effective conductance, the proposed OPF model admits a convex relaxation that achieves global optimality.
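For reference, the classical pairwise definition that the paper generalizes: for nodes $u,v$ of a graph with Laplacian $L$ and pseudoinverse $L^{\dagger}$,
$$R_{uv} = (e_{u}-e_{v})^{\top} L^{\dagger}\, (e_{u}-e_{v}), \qquad C_{uv} = R_{uv}^{-1},$$
where $e_{u}$ denotes the $u$-th standard basis vector. The extended definitions replace the single pair $(u,v)$ with two disjoint node sets, and the sign of the resulting effective conductance encodes overall coupling versus antagonism.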
Yue Song, Nicu Sebe, Wei Wang
Global covariance pooling (GCP) aims at exploiting the second-order statistics of convolutional features. Its effectiveness has been demonstrated in boosting the classification performance of Convolutional Neural Networks (CNNs). Singular Value Decomposition (SVD) is used in GCP to compute the matrix square root. However, the approximate matrix square root calculated via Newton-Schulz iteration \cite{li2018towards} outperforms the accurate one computed via SVD \cite{li2017second}. We empirically analyze the reason behind this performance gap from the perspectives of data precision and gradient smoothness, and we investigate various remedies for computing smooth SVD gradients. Based on our observations and analyses, a hybrid training protocol is proposed for SVD-based GCP meta-layers such that competitive performance can be achieved against Newton-Schulz iteration. Moreover, we propose a new GCP meta-layer that uses SVD in the forward pass and Padé approximants in the backward pass to compute the gradients. The proposed meta-layer has been integrated into different CNN models and achieves state-of-the-art performance on both large-scale and fine-grained datasets.
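A minimal sketch of the forward pass of an SVD-based square-root meta-layer (for an SPD covariance, SVD and eigendecomposition coincide); the paper's contribution lies in the backward pass, where Padé approximants replace the unstable analytic SVD gradient, which this sketch does not show:

```python
import numpy as np

def gcp_sqrtm_forward(X):
    """Matrix square root of an SPD covariance X via SVD:
    X^(1/2) = U diag(sqrt(s)) U^T."""
    U, s, _ = np.linalg.svd(X)
    return (U * np.sqrt(s)) @ U.T   # column-wise scaling equals U diag(sqrt(s))
```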
Yue Song, Thomas Anderson Keller, Yisong Yue, Pietro Perona, Max Welling
There is a vast literature on representation learning based on principles such as coding efficiency, statistical independence, causality, controllability, or symmetry. In this paper we propose to learn representations from sequence data by factorizing the transformations of the latent variables into sparse components. Input data are first encoded as distributions of latent activations and subsequently transformed using a probability flow model, before being decoded to predict a future input state. The flow model is decomposed into a number of rotational (divergence-free) vector fields and a number of potential-flow (curl-free) fields. Our sparsity prior encourages only a small number of these fields to be active at any instant and infers the speed with which the probability flows along them. Training this model is completely unsupervised using a standard variational objective, and it results in a new form of disentangled representations where the input is represented not only by a combination of independent factors, but also by a combination of independent transformation primitives given by the learned flow fields. When the transformations are viewed as symmetries, one may interpret this as learning approximately equivariant representations. Empirically, we demonstrate that this model achieves state-of-the-art results in terms of both data likelihood and unsupervised approximate equivariance errors on datasets composed of sequence transformations.
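A minimal 2D illustration of the two field types, with finite differences standing in for the learned parameterization (grid-based fields are our simplification; the model's fields live in latent space):

```python
import numpy as np

def component_fields(phi, psi, h=1.0):
    """From scalar potentials phi and psi on a grid, build a curl-free field
    grad(phi) and a divergence-free field (-d psi/dy, d psi/dx), i.e. the
    90-degree rotation of grad(psi)."""
    dphi_y, dphi_x = np.gradient(phi, h)   # np.gradient returns (d/axis0, d/axis1)
    dpsi_y, dpsi_x = np.gradient(psi, h)
    curl_free = np.stack([dphi_x, dphi_y])
    div_free = np.stack([-dpsi_y, dpsi_x])
    return curl_free, div_free
```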