Bingcong Li, Meng Ma, Georgios B. Giannakis
The main theme of this work is a unifying algorithm, \textbf{L}oop\textbf{L}ess \textbf{S}ARAH (L2S), for problems formulated as the summation of $n$ individual loss functions. L2S broadens a recently developed variance reduction method known as SARAH. To find an $\epsilon$-accurate solution, L2S enjoys a complexity of ${\cal O}\big( (n+\kappa) \ln (1/\epsilon)\big)$ for strongly convex problems. For convex problems, when adopting an $n$-dependent step size, the complexity of L2S is ${\cal O}(n+ \sqrt{n}/\epsilon)$; for the more frequently adopted $n$-independent step size, the complexity is ${\cal O}(n+ n/\epsilon)$. Distinct from SARAH, our theoretical findings support an $n$-independent step size in convex problems without extra assumptions. For nonconvex problems, the complexity of L2S is ${\cal O}(n+ \sqrt{n}/\epsilon)$. Our numerical tests on neural networks suggest that L2S can have better generalization properties than SARAH. Along with L2S, our side results include linear convergence of the last iterate of SARAH in strongly convex problems.
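To make the loopless construction concrete, the following is a minimal NumPy sketch of the SARAH recursive gradient estimator with coin-flip (loopless) restarts on a toy least-squares problem; the step size, restart probability, and iteration count are illustrative hand-tuned choices, not the paper's prescribed parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 10                                   # n component losses, d unknowns
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)
x_star = np.linalg.lstsq(A, b, rcond=None)[0]    # minimizer of F(x) = (1/n) sum_i f_i(x)

def grad_i(x, i):          # gradient of f_i(x) = 0.5 * (a_i^T x - b_i)^2
    return (A[i] @ x - b[i]) * A[i]

def full_grad(x):          # gradient of the average loss F
    return A.T @ (A @ x - b) / n

# Loopless SARAH: the inner loop of SARAH is replaced by a coin flip -- with
# probability 1/p the recursive estimator restarts at the full gradient.
x, v = np.zeros(d), full_grad(np.zeros(d))
eta, p = 0.01, n                                 # restart on average every n steps
for t in range(40000):
    x_prev, x = x, x - eta * v
    if rng.random() < 1.0 / p:
        v = full_grad(x)                         # occasional full-gradient restart
    else:
        i = rng.integers(n)
        v = grad_i(x, i) - grad_i(x_prev, i) + v # SARAH recursive update

err = np.linalg.norm(x - x_star)
```

On this strongly convex toy problem the iterates approach the least-squares solution linearly, consistent with the ${\cal O}((n+\kappa)\ln(1/\epsilon))$ regime described above.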
Qin Lu, Georgios B. Giannakis
Value function approximation is a crucial module for policy evaluation in reinforcement learning when the state space is large or continuous. The present paper takes a generative perspective on policy evaluation via temporal-difference (TD) learning, where a Gaussian process (GP) prior is presumed on the sought value function, and instantaneous rewards are probabilistically generated based on value function evaluations at two consecutive states. Capitalizing on a random feature-based approximant of the GP prior, an online scalable (OS) approach, termed {OS-GPTD}, is developed to estimate the value function for a given policy by observing a sequence of state-reward pairs. To benchmark the performance of OS-GPTD even in an adversarial setting, where the modeling assumptions are violated, complementary worst-case analyses are performed by upper-bounding the cumulative Bellman error as well as the long-term reward prediction error, relative to their counterparts from a fixed value function estimator with the entire state-reward trajectory in hindsight. Moreover, to alleviate the limited expressiveness associated with a single fixed kernel, a weighted ensemble (E) of GP priors is employed to yield an alternative scheme, termed OS-EGPTD, that can jointly infer the value function, and interactively select the EGP kernel on the fly. Finally, the performance of the novel OS-(E)GPTD schemes is evaluated on two benchmark problems.
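The generative view above regresses observed rewards on value-function evaluations at consecutive states. A minimal sketch of that regression, solved online by recursive least squares, is given below; one-hot features stand in for the paper's random features so the finite-chain example stays exact, and the deterministic transitions make the Bellman-residual regression coincide with the true value function.

```python
import numpy as np

gamma = 0.9
P = np.roll(np.eye(3), 1, axis=1)     # deterministic cycle 0 -> 1 -> 2 -> 0
r = np.array([1.0, 0.0, 2.0])         # reward collected when leaving each state
V_true = np.linalg.solve(np.eye(3) - gamma * P, r)

# Online value estimation from the TD/Bellman regression
#   r_t = (phi(s_t) - gamma * phi(s_{t+1}))^T w,
# solved by recursive least squares on streaming state-reward pairs.
phi = np.eye(3)                       # one-hot features (stand-in for random features)
Pm = 1e6 * np.eye(3)                  # inverse of a tiny ridge regularizer
w = np.zeros(3)
s = 0
for t in range(300):
    s_next = (s + 1) % 3
    h = phi[s] - gamma * phi[s_next]
    k = Pm @ h / (1.0 + h @ Pm @ h)   # RLS gain
    w += k * (r[s] - h @ w)
    Pm -= np.outer(k, h @ Pm)
    s = s_next
```

The streamed estimate `w` matches the value function obtained by solving $(I-\gamma P)V = r$ directly.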
Manish K. Singh, Vassilis Kekatos, Georgios B. Giannakis
To shift the computational burden from real-time to offline in delay-critical power systems applications, recent works entertain the idea of using a deep neural network (DNN) to predict the solutions of the AC optimal power flow (AC-OPF) once presented with load demands. As network topologies may change, training this DNN in a sample-efficient manner becomes a necessity. To improve data efficiency, this work utilizes the fact that OPF data are not simple training labels, but constitute the solutions of a parametric optimization problem. We thus advocate training a sensitivity-informed DNN (SI-DNN) to match not only the OPF optimizers, but also their partial derivatives with respect to the OPF parameters (loads). It is shown that the required Jacobian matrices do exist under mild conditions, and can be readily computed from the related primal/dual solutions. The proposed SI-DNN is compatible with a broad range of OPF solvers, including a non-convex quadratically constrained quadratic program (QCQP), its semidefinite program (SDP) relaxation, and MATPOWER; and it can be seamlessly integrated into other learning-to-OPF schemes. Numerical tests on three benchmark power systems corroborate the advanced generalization and constraint satisfaction capabilities for the OPF solutions predicted by an SI-DNN over a conventionally trained DNN, especially in low-data setups.
Liang Zhang, Vassilis Kekatos, Georgios B. Giannakis
Although electric vehicles are considered a viable solution to reduce greenhouse gas emissions, their uncoordinated charging could have adverse effects on power system operation. Nevertheless, the task of optimal electric vehicle charging scales unfavorably with the fleet size and the number of control periods, especially when distribution grid limitations are enforced. To this end, vehicle charging is first tackled using the recently revived Frank-Wolfe method. The novel decentralized charging protocol has minimal computational requirements from vehicle controllers, enjoys provable acceleration over existing alternatives, enhances the security of the pricing mechanism against data attacks, and protects user privacy. To comply with voltage limits, a network-constrained EV charging problem is subsequently formulated. Leveraging a linearized model for unbalanced distribution grids, the goal is to minimize the power supply cost while respecting critical voltage regulation and substation capacity limitations. Optimizing variables across grid nodes is accomplished by exchanging information only between neighboring buses via the alternating direction method of multipliers. Numerical tests corroborate the optimality and efficiency of the novel schemes.
Vassilis Kekatos, Gang Wang, Hao Zhu, Georgios B. Giannakis
This chapter aspires to glean some of the recent advances in power system state estimation (PSSE), though our collection is not exhaustive by any means. The Cramér-Rao bound, a lower bound on the (co)variance of any unbiased estimator, is first derived for the PSSE setup. After reviewing the classical Gauss-Newton iterations, contemporary PSSE solvers leveraging relaxations to convex programs and successive convex approximations are explored. A disciplined paradigm for distributed and decentralized schemes is subsequently exemplified under linear(ized) and exact grid models. Novel bad data processing models and fresh perspectives linking critical measurements to cyber-attacks on the state estimator are presented. Finally, spurred by advances in online convex optimization, model-free and model-based state trackers are reviewed.
Vassilis N. Ioannidis, Meng Ma, Athanasios N. Nikolakopoulos, Georgios B. Giannakis, Daniel Romero
The study of networks has witnessed an explosive growth over the past decades, with several ground-breaking methods introduced. A particularly interesting problem -- and one prevalent in several fields of study -- is that of inferring a function defined over the nodes of a network. This work presents a versatile kernel-based framework for tackling this inference problem that naturally subsumes and generalizes the reconstruction approaches put forth recently by the signal processing on graphs community. Both the static and the dynamic settings are considered, along with effective modeling approaches for addressing real-world problems. The analytical discussion herein is complemented by a set of numerical examples, which showcase the effectiveness of the presented techniques, as well as their merits relative to state-of-the-art methods.
Liang Zhang, Gang Wang, Georgios B. Giannakis, Jie Chen
The problem of reconstructing a sparse signal vector from magnitude-only measurements (a.k.a. compressive phase retrieval) emerges naturally in diverse applications, but it is NP-hard in general. Building on recent advances in nonconvex optimization, this paper puts forth a new algorithm, termed compressive reweighted amplitude flow and abbreviated as CRAF, for compressive phase retrieval. Specifically, CRAF operates in two stages. The first stage seeks a sparse initial guess via a new spectral procedure. In the second stage, CRAF implements a few hard thresholding based iterations using reweighted gradients. When there are sufficient measurements, CRAF provably recovers the underlying signal vector exactly with high probability under suitable conditions. Moreover, its sample complexity coincides with that of the state-of-the-art procedures. Finally, substantial simulated tests showcase remarkable performance of the new spectral initialization, as well as improved exact recovery relative to competing alternatives.
Jia Chen, Gang Wang, Yanning Shen, Georgios B. Giannakis
Canonical correlation analysis (CCA) is a powerful technique for discovering whether or not hidden sources are commonly present in two (or more) datasets. Its well-appreciated merits include dimensionality reduction, clustering, classification, feature selection, and data fusion. The standard CCA, however, does not exploit the geometry of the common sources, which may be available from the given data or can be deduced from (cross-)correlations. In this paper, this extra information provided by the common sources generating the data is encoded in a graph, and is invoked as a graph regularizer. This leads to a novel graph-regularized CCA approach, termed graph (g) CCA. The novel gCCA accounts for the graph-induced knowledge of common sources, while minimizing the distance between the sought canonical variables. Tailored for diverse practical settings where the number of data vectors is smaller than their dimension, the dual formulation of gCCA is also developed. One such setting includes kernels that are incorporated to account for nonlinear data dependencies. The resultant graph-kernel (gk) CCA is also obtained in closed form. Finally, corroborating image classification tests over several real datasets are presented to showcase the merits of the novel linear, dual, and kernel approaches relative to competing alternatives.
Gang Wang, Georgios B. Giannakis, Yousef Saad, Jie Chen
This paper deals with finding an $n$-dimensional solution $x$ to a system of quadratic equations of the form $y_i=|\langle{a}_i,x\rangle|^2$ for $1\le i \le m$, which is also known as phase retrieval and is NP-hard in general. We put forth a novel procedure for minimizing the amplitude-based least-squares empirical loss, that starts with a weighted maximal correlation initialization obtainable with a few power or Lanczos iterations, followed by successive refinements based upon a sequence of iteratively reweighted (generalized) gradient iterations. Both stages (the initialization and the gradient refinements) distinguish themselves from prior contributions by the inclusion of a fresh (re)weighting regularization technique. The overall algorithm is conceptually simple, numerically scalable, and easy to implement. For certain random measurement models, the novel procedure is shown capable of finding the true solution $x$ in time proportional to reading the data $\{(a_i;y_i)\}_{1\le i \le m}$. This holds with high probability and without extra assumptions on the signal $x$ to be recovered, provided that the number $m$ of equations is some constant $c>0$ times the number $n$ of unknowns in the signal vector, namely, $m>cn$. Empirically, the upshots of this contribution are: i) (almost) $100\%$ perfect signal recovery in the high-dimensional (e.g., $n\ge 2,000$) regime given only an information-theoretic limit number of noiseless equations, namely $m=2n-1$, in the real-valued Gaussian case; and, ii) (nearly) optimal statistical accuracy in the presence of additive noise of bounded support. Finally, substantial numerical tests using both synthetic data and real images corroborate markedly improved signal recovery performance and computational efficiency of our novel procedure relative to state-of-the-art approaches.
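The two-stage recipe above (spectral-type initialization, then amplitude-based gradient refinement) can be sketched in a few lines of NumPy. The sketch below uses a plain (unweighted) spectral initialization and unweighted gradient steps -- the paper's (re)weighting is omitted for brevity -- with illustrative values for the step size and oversampling ratio.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 10, 200                       # unknowns and (oversampled) measurements
x = rng.standard_normal(n)
A = rng.standard_normal((m, n))
y = np.abs(A @ x)                    # magnitude-only measurements

# Stage 1: spectral initialization -- top eigenvector of (1/m) sum_i y_i^2 a_i a_i^T
# via power iterations, scaled by the norm estimate sqrt(mean(y^2)).
Y = (A.T * y**2) @ A / m
z = rng.standard_normal(n)
for _ in range(200):
    z = Y @ z
    z /= np.linalg.norm(z)
z *= np.sqrt(np.mean(y**2))

# Stage 2: amplitude-based least-squares refinement (generalized gradient steps).
mu = 0.6
for _ in range(1000):
    Az = A @ z
    z -= (mu / m) * (A.T @ (Az - y * np.sign(Az)))

err = min(np.linalg.norm(z - x), np.linalg.norm(z + x)) / np.linalg.norm(x)
```

The error is measured up to the unavoidable global sign ambiguity of the real-valued model.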
Daniel Romero, Meng Ma, Georgios B. Giannakis
A number of applications in engineering, social sciences, physics, and biology involve inference over networks. In this context, graph signals are widely encountered as descriptors of vertex attributes or features in graph-structured data. Estimating such signals in all vertices given noisy observations of their values on a subset of vertices has been extensively analyzed in the literature of signal processing on graphs (SPoG). This paper advocates kernel regression as a framework generalizing popular SPoG modeling and reconstruction and expanding their capabilities. Formulating signal reconstruction as a regression task on reproducing kernel Hilbert spaces of graph signals permeates benefits from statistical learning, offers fresh insights, and allows for estimators to leverage richer forms of prior information than existing alternatives. A number of SPoG notions such as bandlimitedness, graph filters, and the graph Fourier transform are naturally accommodated in the kernel framework. Additionally, this paper capitalizes on the so-called representer theorem to devise simpler versions of existing Tikhonov regularized estimators, and offers a novel probabilistic interpretation of kernel methods on graphs based on graphical models. Motivated by the challenges of selecting the bandwidth parameter in SPoG estimators or the kernel map in kernel-based methods, the present paper further proposes two multi-kernel approaches with complementary strengths. Whereas the first enables estimation of the unknown bandwidth of bandlimited signals, the second allows for efficient graph filter selection. Numerical tests with synthetic as well as real data demonstrate the merits of the proposed methods relative to state-of-the-art alternatives.
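As an illustration of the representer-theorem estimator for graph signals, the sketch below performs kernel ridge regression with a Laplacian-regularization kernel on a small synthetic graph; the kernel parameter, regularization weight, and graph construction are illustrative choices rather than quantities prescribed by the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 30
W = np.zeros((N, N))
for i in range(N):                    # ring edges keep the graph connected
    W[i, (i + 1) % N] = W[(i + 1) % N, i] = 1.0
for i, j in rng.integers(0, N, size=(10, 2)):
    if i != j:                        # a few random chords
        W[i, j] = W[j, i] = 1.0
L = np.diag(W.sum(1)) - W             # combinatorial graph Laplacian

# Laplacian-regularization kernel; the 0.1 shift is an illustrative choice.
K = np.linalg.inv(L + 0.1 * np.eye(N))
f_true = K @ rng.standard_normal(N)   # a smooth graph signal drawn from the RKHS

S = rng.choice(N, size=25, replace=False)         # observed vertices
y = f_true[S] + 0.01 * rng.standard_normal(25)    # noisy samples

# Representer theorem: f_hat = K[:, S] @ alpha, with alpha solving a small
# |S| x |S| kernel ridge regression system.
mu = 1e-3
alpha = np.linalg.solve(K[np.ix_(S, S)] + mu * np.eye(25), y)
f_hat = K[:, S] @ alpha

rel_err = np.linalg.norm(f_hat - f_true) / np.linalg.norm(f_true)
```

The signal is recovered on all vertices, including those never observed, because the Laplacian kernel encodes smoothness over the graph.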
Yanning Shen, Brian Baingana, Georgios B. Giannakis
Structural equation models (SEMs) have been widely adopted for inference of causal interactions in complex networks. Recent examples include unveiling topologies of hidden causal networks over which processes such as spreading diseases, or rumors propagate. The appeal of SEMs in these settings stems from their simplicity and tractability, since they typically assume linear dependencies among observable variables. Acknowledging the limitations inherent to adopting linear models, the present paper advocates nonlinear SEMs, which account for (possible) nonlinear dependencies among network nodes. The advocated approach leverages kernels as a powerful encompassing framework for nonlinear modeling, and an efficient estimator with affordable tradeoffs is put forth. Interestingly, pursuit of the novel kernel-based approach yields a convex regularized estimator that promotes edge sparsity, and is amenable to proximal-splitting optimization methods. To this end, solvers with complementary merits are developed by leveraging the alternating direction method of multipliers, and proximal gradient iterations. Experiments conducted on simulated data demonstrate that the novel approach outperforms linear SEMs with respect to edge detection errors. Furthermore, tests on a real gene expression dataset unveil interesting new edges that were not revealed by linear SEMs, which could shed more light on regulatory behavior of human genes.
Liang Zhang, Gang Wang, Daniel Romero, Georgios B. Giannakis
Owing to their low-complexity iterations, Frank-Wolfe (FW) solvers are well suited for various large-scale learning tasks. When block-separable constraints are present, randomized block FW (RB-FW) has been shown to further reduce complexity by updating only a fraction of coordinate blocks per iteration. To circumvent the limitations of existing methods, the present work develops step sizes for RB-FW that enable a flexible selection of the number of blocks to update per iteration while ensuring convergence and feasibility of the iterates. To this end, convergence rates of RB-FW are established through computational bounds on a primal sub-optimality measure and on the duality gap. The novel bounds extend the existing convergence analysis, which only applies to a step-size sequence that does not generally lead to feasible iterates. Furthermore, two classes of step-size sequences that guarantee feasibility of the iterates are also proposed to enhance flexibility in choosing decay rates. The novel convergence results are markedly broadened to encompass also nonconvex objectives, and further assert that RB-FW with exact line-search reaches a stationary point at rate $\mathcal{O}(1/\sqrt{t})$. Performance of RB-FW with different step sizes and number of blocks is demonstrated in two applications, namely charging of electric vehicles and structural support vector machines. Extensive simulated tests demonstrate the performance improvement of RB-FW relative to existing randomized single-block FW methods.
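A minimal sketch of the randomized block FW iteration is shown below for a quadratic objective over a product of probability simplices. The classical $2/(t+2)$ decay used here is only an illustrative step size, not one of the paper's feasibility-preserving sequences; note how each block update is a convex combination with a simplex vertex, so feasibility is maintained by construction.

```python
import numpy as np

rng = np.random.default_rng(3)
B, d = 5, 4                       # B blocks, each constrained to a probability simplex
Q = rng.standard_normal((B * d, B * d))
Q = Q @ Q.T / (B * d) + np.eye(B * d)   # well-conditioned PSD matrix
c = rng.standard_normal(B * d)

def f(x):                         # smooth convex objective over the product of simplices
    return 0.5 * x @ Q @ x + c @ x

x = np.tile(np.ones(d) / d, B)    # feasible start: uniform point in each simplex
f0 = f(x)
for t in range(2000):
    g = Q @ x + c                 # gradient; each chosen block uses its own slice
    gamma = 2.0 / (t + 2)         # illustrative diminishing step size
    for bk in rng.choice(B, size=2, replace=False):   # update 2 of the 5 blocks
        sl = slice(bk * d, (bk + 1) * d)
        s = np.zeros(d)
        s[np.argmin(g[sl])] = 1.0                     # simplex LMO picks a vertex
        x[sl] = (1 - gamma) * x[sl] + gamma * s       # convex combination: stays feasible
```

Each iteration costs only two linear minimization oracle calls, which is the source of RB-FW's per-iteration savings over full FW.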
Fatemeh Sheikholeslami, Dimitris Berberidis, Georgios B. Giannakis
Kernel-based methods enjoy powerful generalization capabilities in handling a variety of learning tasks. When such methods are provided with sufficient training data, broadly-applicable classes of nonlinear functions can be approximated with desired accuracy. Nevertheless, inherent to the nonparametric nature of kernel-based estimators are computational and memory requirements that become prohibitive with large-scale datasets. In response to this formidable challenge, the present work puts forward a low-rank, kernel-based, feature extraction approach that is particularly tailored for online operation, where data streams need not be stored in memory. A novel generative model is introduced to approximate high-dimensional (possibly infinite) features via a low-rank nonlinear subspace, the learning of which leads to a direct kernel function approximation. Offline and online solvers are developed for the subspace learning task, along with affordable versions, in which the number of stored data vectors is confined to a predefined budget. Analytical results provide performance bounds on how well the kernel matrix as well as kernel-based classification and regression tasks can be approximated by leveraging budgeted online subspace learning and feature extraction schemes. Tests on synthetic and real datasets demonstrate and benchmark the efficiency of the proposed method when linear classification and regression is applied to the extracted features.
Dimitris Berberidis, Vassilis Kekatos, Georgios B. Giannakis
Linear regression is arguably the most prominent among statistical inference methods, popular both for its simplicity as well as its broad applicability. On par with data-intensive applications, the sheer size of linear regression problems creates an ever-growing demand for quick and cost-efficient solvers. Fortunately, a significant percentage of the data accrued can be omitted while maintaining a certain quality of statistical inference with an affordable computational budget. The present paper introduces means of identifying and omitting "less informative" observations in an online and data-adaptive fashion, built on principles of stochastic approximation and data censoring. First- and second-order stochastic approximation maximum likelihood-based algorithms for censored observations are developed for estimating the regression coefficients. Online algorithms are also put forth to reduce the overall complexity by adaptively performing censoring along with estimation. The novel algorithms entail simple closed-form updates, and have provable (non)asymptotic convergence guarantees. Furthermore, specific rules are investigated for tuning to desired censoring patterns and levels of dimensionality reduction. Simulated tests on real and synthetic datasets corroborate the efficacy of the proposed data-adaptive methods compared to data-agnostic random projection-based alternatives.
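The censoring idea -- update only on observations whose residual is large enough to be informative -- can be sketched with a simple first-order (LMS-style) recursion. The fixed threshold and step size below are illustrative simplifications; the paper adapts the censoring rule online and also develops second-order and ML-corrected variants.

```python
import numpy as np

rng = np.random.default_rng(4)
d, T = 5, 20000
theta = rng.standard_normal(d)        # unknown regression coefficients
tau, mu = 0.5, 0.01                   # fixed censoring threshold and step size

est = np.zeros(d)
kept = 0
for t in range(T):
    x = rng.standard_normal(d)
    y = x @ theta + 0.1 * rng.standard_normal()
    r = y - x @ est                   # residual under the current estimate
    if abs(r) > tau:                  # censor "less informative" small-residual data
        est += mu * r * x             # first-order update on the kept observation
        kept += 1
```

Most observations end up censored once the estimate is accurate, so the solver touches only a fraction of the stream while still tracking the regression coefficients.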
Emiliano Dall'Anese, Sairaj V. Dhople, Georgios B. Giannakis
This paper considers future distribution networks featuring inverter-interfaced photovoltaic (PV) systems, and addresses the synthesis of feedback controllers that seek real- and reactive-power inverter setpoints corresponding to AC optimal power flow (OPF) solutions. The objective is to bridge the temporal gap between long-term system optimization and real-time inverter control, and enable seamless PV-owner participation without compromising system efficiency and stability. The design of the controllers is grounded on a dual epsilon-subgradient method, and semidefinite programming relaxations are advocated to bypass the non-convexity of AC OPF formulations. Global convergence of inverter output powers is analytically established for diminishing stepsize rules and strictly convex OPF costs for cases where: i) computational limits dictate asynchronous updates of the controller signals, and ii) inverter reference inputs may be updated at a faster rate than the power-output settling time. Although the focus is on PV systems, the framework naturally accommodates different types of inverter-interfaced energy resources.
Gabriela Martinez, Yu Zhang, Georgios B. Giannakis
To effectively enhance the integration of distributed and renewable energy sources in future smart microgrids, economical energy management accounting for the principal challenge of variable and non-dispatchable renewables is indispensable. Day-ahead economic generation dispatch with demand-side management for a microgrid in islanded mode is considered in this paper. With the goal of limiting the loss-of-load probability, a joint chance-constrained optimization problem is formulated for the optimal multi-period energy scheduling with multiple wind farms. Bypassing the intractable spatio-temporal joint distribution of the wind power generation, a primal-dual approach is used to obtain a suboptimal solution efficiently. The method is based on first-order optimality conditions and successive approximation of the probabilistic constraint by generation of p-efficient points. Numerical results are reported to corroborate the merits of this approach.
Bingcong Li, Yilang Zhang, Georgios B. Giannakis
Sharpness-aware minimization (SAM) has well-documented merits in enhancing generalization of deep neural network models. Accounting for sharpness in the loss function geometry, where neighborhoods of `flat minima' heighten generalization ability, SAM seeks `flat valleys' by minimizing the maximum loss provoked by an adversarial perturbation within the neighborhood. Although critical to account for sharpness of the loss function, in practice SAM suffers from `over-friendly adversaries,' which can curtail the utmost level of generalization. To avoid such `friendliness,' the present contribution fosters stabilization of adversaries through variance suppression (VASSO). VASSO offers a general approach to provably stabilize adversaries. In particular, when integrating VASSO with SAM, improved generalizability is numerically validated on extensive vision and language tasks. Once applied on top of a computationally efficient SAM variant, VASSO offers a desirable generalization-computation tradeoff.
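A minimal sketch of the SAM update with variance-suppressed adversaries is given below on a toy least-squares problem: the adversarial perturbation is taken along an exponential moving average of stochastic gradients rather than a single noisy gradient. The hyperparameters `rho`, `eta`, `theta` are illustrative, and this scalar smoothing is only a caricature of VASSO's stabilization mechanism.

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((50, 10))
w_true = rng.standard_normal(10)
b = A @ w_true + 0.1 * rng.standard_normal(50)

def loss(w):
    return 0.5 * np.mean((A @ w - b) ** 2)

def sgrad(w, idx):                    # stochastic gradient on a sampled minibatch
    Ai, bi = A[idx], b[idx]
    return Ai.T @ (Ai @ w - bi) / len(idx)

rho, eta, theta = 0.05, 0.05, 0.9     # perturbation radius, step size, EMA weight
w, dbar = np.zeros(10), np.zeros(10)
for t in range(3000):
    idx = rng.integers(0, 50, size=10)
    dbar = theta * dbar + (1 - theta) * sgrad(w, idx)    # suppressed direction
    eps = rho * dbar / (np.linalg.norm(dbar) + 1e-12)    # 'adversary' on the rho-ball
    w -= eta * sgrad(w + eps, idx)                       # descend at the perturbed point
```

Smoothing the direction used for the perturbation keeps the adversary from chasing minibatch noise, which is the intuition behind suppressing its variance.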
Athanasios Bacharis, Konstantinos D. Polyzos, Georgios B. Giannakis, Nikolaos Papanikolopoulos
Active vision (AV) has been in the spotlight of robotics research due to its emergence in numerous applications, including agricultural tasks such as precision crop monitoring and autonomous harvesting, to name a few. A major AV problem that has gained popularity is the 3D reconstruction of targeted environments using 2D images from diverse viewpoints. While collecting and processing a large number of arbitrarily captured 2D images can be arduous in many practical scenarios, a more efficient solution involves optimizing the placement of available cameras in 3D space to capture fewer, yet more informative, images that provide sufficient visual information for effective reconstruction of the environment of interest. This process, termed view planning (VP), can be markedly challenged (i) by noise emerging in the location of the cameras and/or in the extracted images, and (ii) by the need to generalize well to other unknown yet similar agricultural environments without the need for re-optimizing or re-training. To cope with these challenges, the present work puts forth a novel VP framework that considers a reconstruction quality-based optimization formulation, relying on the notion of `structure-from-motion' to reconstruct the 3D structure of the sought environment from the selected 2D images. With no analytic expression of the optimization function and with costly function evaluations, a Bayesian optimization approach is proposed to efficiently carry out the VP process using only a few function evaluations, while accounting for different noise cases. Numerical tests on both simulated and real agricultural settings signify the benefits of the advocated VP approach in efficiently estimating the optimal camera placement to accurately reconstruct 3D environments of interest, and to generalize well to similar unknown environments.
Amirhossein Taherpour, Alireza Sadeghi, Georgios B. Giannakis
Apr 15, 2026 · quant-ph
Scalable estimation of quantum states with readout errors is a central challenge in large multiqubit systems. Existing overlapping-tomography methods improve scalability by working with local subsystems, but they usually assume known or separately calibrated measurements. At the same time, readout-estimation methods model measurement errors without enforcing consistency among overlapping regional states. In this context, the present paper introduces a unified framework for joint regional quantum state tomography with readout errors. A multiqubit system is partitioned into overlapping regions, each region is assigned a local density operator and a local confusion matrix, and neighboring regions are coupled through reduced-state consistency on shared subsystems. This leads to a structured bilinear optimization problem. To solve it, a distributed alternating method is developed in which the state-update step is handled by the alternating direction method of multipliers (ADMM), while the confusion-matrix updates are carried out locally in parallel. Analytical guarantees are also established, including a sufficient condition for local identifiability, local quadratic growth of the population misfit, and convergence of the inner state-update procedure. Simulations on Ring, Ladder, Torus, and Hub graph geometries show that joint estimation improves state recovery over fixed-readout reconstruction, recovers a substantial portion of oracle performance, and reveals a clear tradeoff between state estimation performance, communication, and computation.
Antonio G. Marques, Georgios B. Giannakis, Javier Ramos
The performance of systems where multiple users communicate over wireless fading links benefits from channel-adaptive allocation of the available resources. Different from most existing approaches that allocate resources based on perfect channel state information, this work optimizes channel scheduling along with per user rate and power loadings over orthogonal fading channels, when both terminals and scheduler rely on quantized channel state information. Channel-adaptive policies are designed to optimize an average transmit-performance criterion subject to average quality of service requirements. While the resultant optimal policy per fading realization shows that the individual rate and power loadings can be obtained separately for each user, the optimal scheduling is slightly more complicated. Specifically, per fading realization each channel is allocated either to a single (winner) user, or, to a small group of winner users whose percentage of shared resources is found by solving a linear program. A single scheduling scheme combining both alternatives becomes possible by smoothing the original disjoint scheme. The smooth scheduling is asymptotically optimal and incurs reduced computational complexity. Different alternatives to obtain the Lagrange multipliers required to implement the channel-adaptive policies are proposed, including stochastic iterations that are provably convergent and do not require knowledge of the channel distribution. The development of the optimal channel-adaptive allocation is complemented with discussions on the overhead required to implement the novel policies.