Ke Sun, Iñaki Esnaola, Samir M. Perlaza, H. Vincent Poor
Random attacks that jointly minimize the amount of information acquired by the operator about the state of the grid and the probability of attack detection are presented. The attacks minimize the information acquired by the operator by minimizing the mutual information between the observations and the state variables describing the grid. Simultaneously, the attacker aims to minimize the probability of attack detection by minimizing the Kullback-Leibler (KL) divergence between the distribution when the attack is present and the distribution under normal operation. The resulting cost function is the weighted sum of the mutual information and the KL divergence mentioned above. The tradeoff between the probability of attack detection and the reduction of mutual information is governed by the weighting parameter on the KL divergence term in the cost function. The probability of attack detection is evaluated as a function of the weighting parameter. A sufficient condition on the weighting parameter is given for achieving an arbitrarily small probability of attack detection. The attack performance is numerically assessed on the IEEE 30-Bus and 118-Bus test systems.
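In illustrative notation (the symbols below are not fixed by the abstract), the construction described above can be summarized as the optimization
$$ \min_{A} \; I\big(X^{n}; Y_{A}^{n}\big) + \lambda \, D\big(P_{Y_{A}^{n}} \,\big\|\, P_{Y^{n}}\big), $$
where $X^{n}$ denotes the state variables, $Y^{n}$ and $Y_{A}^{n}$ denote the observations under normal operation and under attack, respectively, and $\lambda \geq 0$ is the weighting parameter on the KL divergence term.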
Cristian Genes, Iñaki Esnaola, Samir M. Perlaza, Luis F. Ochoa, Daniel Coca
The advanced operation of future electricity distribution systems is likely to require significant observability of the different parameters of interest (e.g., demand, voltages, currents, etc.). Ensuring completeness of data is, therefore, paramount. In this context, an algorithm for recovering missing state variable observations in electricity distribution systems is presented. The proposed method exploits the low-rank structure of the state variables via a matrix completion approach while incorporating prior knowledge in the form of second-order statistics. Specifically, the recovery method combines nuclear norm minimization with Bayesian estimation. The performance of the new algorithm is compared to the information-theoretic limits and tested through simulations using real data from an urban low voltage distribution system. The impact of the prior knowledge is analyzed when a mismatched covariance is used and for a Markovian sampling that introduces structure in the observation pattern. Numerical results demonstrate that the proposed algorithm is robust and outperforms existing state-of-the-art algorithms.
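As a sketch of the standard formulation underlying the nuclear norm minimization step (the notation is illustrative, not taken from the paper),
$$ \min_{X} \; \|X\|_{*} \quad \text{subject to} \quad P_{\Omega}(X) = P_{\Omega}(M), $$
where $M$ is the matrix of state variable observations, $\Omega$ is the set of observed entries, $P_{\Omega}$ is the corresponding sampling operator, and the nuclear norm $\|\cdot\|_{*}$ acts as a convex surrogate for the rank; the Bayesian estimation step then incorporates the available second-order statistics.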
Miguel Arrieta, Iñaki Esnaola
A battery charging policy that provides privacy guarantees for smart meter systems with a finite-capacity battery is proposed. For this policy, an upper bound on the information leakage rate is provided. The upper bound applies to general random processes modelling the energy consumption of the user. It is shown that the average energy consumption of the user determines the information leakage rate to the utility provider. The upper bound is shown to be tight by deriving the probability law of a random process achieving the bound.
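For reference, the information leakage rate invoked above is commonly defined as (notation illustrative)
$$ \limsup_{n \to \infty} \frac{1}{n} I\big(X^{n}; Y^{n}\big), $$
where $X^{n}$ models the energy consumption process of the user and $Y^{n}$ the energy requested from the grid, as observed by the utility provider.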
Ke Sun, Iñaki Esnaola, Samir M. Perlaza, H. Vincent Poor
Gaussian random attacks that jointly minimize the amount of information obtained by the operator from the grid and the probability of attack detection are presented. The construction of the attack is posed as an optimization problem with a utility function that captures two effects: first, minimizing the mutual information between the measurements and the state variables; second, minimizing the probability of attack detection via the Kullback-Leibler divergence between the distribution of the measurements with an attack and the distribution of the measurements without an attack. Additionally, a lower bound on the utility function achieved by the attacks constructed with imperfect knowledge of the second-order statistics of the state variables is obtained. The performance of the attack construction using the sample covariance matrix of the state variables is numerically evaluated. The above results are tested on the IEEE 30-Bus test system.
Samir M. Perlaza, Gaetan Bisson, Iñaki Esnaola, Alain Jean-Marie, Stefano Rini
The empirical risk minimization (ERM) problem with relative entropy regularization (ERM-RER) is investigated under the assumption that the reference measure is a $\sigma$-finite measure, and not necessarily a probability measure. Under this assumption, which leads to a generalization of the ERM-RER problem allowing a larger degree of flexibility for incorporating prior knowledge, numerous relevant properties are stated. Among these properties, the solution to this problem, if it exists, is shown to be a unique probability measure, mutually absolutely continuous with the reference measure. Such a solution exhibits a probably-approximately-correct guarantee for the ERM problem independently of whether the latter possesses a solution. For a fixed dataset and under a specific condition, the empirical risk is shown to be a sub-Gaussian random variable when the models are sampled from the solution to the ERM-RER problem. The generalization capabilities of the solution to the ERM-RER problem (the Gibbs algorithm) are studied via the sensitivity of the expected empirical risk to deviations from such a solution towards alternative probability measures. Finally, an interesting connection between sensitivity, generalization error, and lautum information is established.
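For concreteness, the unique solution referred to above is the Gibbs measure; in illustrative notation, for a reference measure $Q$, empirical risk $\mathsf{L}_{z}$ induced by the dataset $z$, and regularization parameter $\lambda > 0$,
$$ \frac{\mathrm{d}P^{\star}}{\mathrm{d}Q}(\theta) = \frac{\exp\!\big( -\tfrac{1}{\lambda} \mathsf{L}_{z}(\theta) \big)}{\displaystyle\int \exp\!\big( -\tfrac{1}{\lambda} \mathsf{L}_{z}(\nu) \big) \, \mathrm{d}Q(\nu)}, $$
which is a probability measure whenever the normalizing integral is finite, even when $Q$ is only $\sigma$-finite.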
Jonathan Monsalve, Juan Ramirez, Iñaki Esnaola, Henry Arguello
Compressive covariance estimation has arisen as a class of techniques whose aim is to obtain second-order statistics of stochastic processes from compressive measurements. Recently, these methods have been used in various image processing and communications applications, including denoising, spectrum sensing, and compression. However, estimating the covariance matrix from compressive samples leads to ill-posed minimization problems with severe performance loss at high compression rates. In this regard, a regularization term is typically added to the cost function to incorporate prior information about a particular property of the covariance matrix. Hence, this paper proposes an algorithm based on the projected gradient method to recover low-rank or Toeplitz approximations of the covariance matrix from compressive measurements. The algorithm divides the compressive measurements into data subsets projected onto different subspaces and accurately estimates the covariance matrix by solving a single optimization problem, assuming that each data subset contains an approximation of the signal statistics. Furthermore, gradient filtering is included at every iteration of the proposed algorithm to minimize the estimation error. The error induced by the proposed splitting approach is analytically derived along with the convergence guarantees of the proposed method. The algorithm estimates the covariance matrix of hyperspectral images from synthetic and real compressive samples. Extensive simulations show that the proposed algorithm can effectively recover the covariance matrix of hyperspectral images from compressive measurements (at compression ratios of approximately 8-15%). Moreover, simulations and theoretical results show that the filtering step reduces the recovery error up to twice the number of eigenvectors. Finally, an optical implementation is proposed, and real measurements are used to validate the theoretical findings.
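A schematic form of the projected gradient iteration described above is (symbols illustrative)
$$ \Sigma^{(t+1)} = \mathcal{P}_{\mathcal{C}}\Big( \Sigma^{(t)} - \mu \nabla f\big(\Sigma^{(t)}\big) \Big), $$
where $f$ is the data-fit term aggregating the compressive data subsets, $\mu$ is a step size, and $\mathcal{P}_{\mathcal{C}}$ projects onto the constraint set, e.g., by truncating the eigenvalue decomposition for a low-rank constraint or by diagonal averaging for a Toeplitz constraint.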
Jonathan Ling, Dmitry Chizhik, Antonia M. Tulino, Iñaki Esnaola
Radio channels are typically sparse in the delay domain, making them ideal for compressed sensing. A new compressed sensing algorithm, called eX-OMP, is developed that yields performance similar to that of the optimal MMSE estimator. The new algorithm relies on a small amount of additional data. Both eX-OMP and the MMSE estimator adaptively balance channel tracking and noise reduction, and they perform better than simple estimators, such as the linear interpolator, which fix this trade-off a priori. Wideband measurements are examined, and the channels are found to be well represented by a few delays.
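The delay-domain sparsity mentioned above corresponds to the standard multipath model (notation illustrative)
$$ h(\tau) = \sum_{p=1}^{P} \alpha_{p} \, \delta(\tau - \tau_{p}), $$
in which the channel impulse response is a superposition of a small number $P$ of paths with gains $\alpha_{p}$ and delays $\tau_{p}$; this is the structure that compressed sensing recovery exploits.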
Francisco Daunas, Iñaki Esnaola, Samir M. Perlaza, H. Vincent Poor
The effect of relative entropy asymmetry is analyzed in the context of empirical risk minimization (ERM) with relative entropy regularization (ERM-RER). Two regularizations are considered: $(a)$ the relative entropy of the measure to be optimized with respect to a reference measure (Type-I ERM-RER); and $(b)$ the relative entropy of the reference measure with respect to the measure to be optimized (Type-II ERM-RER). The main result is the characterization of the solution to the Type-II ERM-RER problem and its key properties. By comparing the well-understood Type-I ERM-RER with Type-II ERM-RER, the effects of entropy asymmetry are highlighted. The analysis shows that in both cases, regularization by relative entropy forces the solution's support to collapse into the support of the reference measure, introducing a strong inductive bias that negates the evidence provided by the training data. Finally, it is shown that Type-II regularization is equivalent to Type-I regularization with an appropriate transformation of the empirical risk function.
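In illustrative notation, the two regularizations considered above are
$$ \text{(Type-I)} \;\; \min_{P} \int \mathsf{L}_{z} \, \mathrm{d}P + \lambda \, D(P \,\|\, Q), \qquad \text{(Type-II)} \;\; \min_{P} \int \mathsf{L}_{z} \, \mathrm{d}P + \lambda \, D(Q \,\|\, P), $$
where $Q$ is the reference measure, $\mathsf{L}_{z}$ is the empirical risk induced by the dataset $z$, and $\lambda > 0$ is the regularization parameter.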
Cristian Genes, Iñaki Esnaola, Samir M. Perlaza, Daniel Coca
We study the recovery of missing data from multiple smart grid datasets within a matrix completion framework. The datasets contain the electrical magnitudes required for monitoring and control of the electricity distribution system. Each dataset is described by a low-rank matrix. Different datasets are correlated as a result of containing measurements of different physical magnitudes generated by the same distribution system. To assess the validity of matrix completion techniques in the recovery of missing data, we characterize the fundamental limits when two correlated datasets are jointly recovered. We then proceed to evaluate the performance of Singular Value Thresholding (SVT) and Bayesian SVT (BSVT) in this setting. We show that BSVT outperforms SVT by simulating the recovery for different correlated datasets. The performance of BSVT displays the tradeoff behaviour described by the fundamental limit, which suggests that BSVT exploits the correlation between the datasets in an efficient manner.
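For reference, the SVT iteration evaluated above follows the standard singular value shrinkage scheme (notation illustrative):
$$ X^{(t)} = \mathcal{D}_{\tau}\big( Y^{(t-1)} \big), \qquad Y^{(t)} = Y^{(t-1)} + \delta \, P_{\Omega}\big( M - X^{(t)} \big), $$
where $\mathcal{D}_{\tau}$ soft-thresholds the singular values at level $\tau$, $P_{\Omega}$ restricts to the observed entries of the data matrix $M$, and $\delta$ is a step size; BSVT additionally incorporates the prior knowledge via a Bayesian estimation step.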
Samir M. Perlaza, Gaetan Bisson, Iñaki Esnaola, Alain Jean-Marie, Stefano Rini
The optimality and sensitivity of the empirical risk minimization problem with relative entropy regularization (ERM-RER) are investigated for the case in which the reference is a $\sigma$-finite measure instead of a probability measure. This generalization allows for a larger degree of flexibility in the incorporation of prior knowledge over the set of models. In this setting, the interplay of the regularization parameter, the reference measure, the risk function, and the empirical risk induced by the solution of the ERM-RER problem is characterized. This characterization yields necessary and sufficient conditions for the existence of a regularization parameter that achieves an arbitrarily small empirical risk with arbitrarily high probability. The sensitivity of the expected empirical risk to deviations from the solution of the ERM-RER problem is studied. The sensitivity is then used to provide upper and lower bounds on the expected empirical risk. Moreover, it is shown that the expectation of the sensitivity is upper bounded, up to a constant factor, by the square root of the lautum information between the models and the datasets.
Xiuzhen Ye, Iñaki Esnaola, Samir M. Perlaza, Robert F. Harrison
Sparse stealth attack constructions that minimize the mutual information between the state variables and the observations are proposed. The attack construction is formulated as the design of a multivariate Gaussian distribution that aims to minimize the mutual information while limiting the Kullback-Leibler divergence between the distribution of the observations under attack and the distribution of the observations without attack. The sparsity constraint is incorporated as a support constraint of the attack distribution. Two heuristic greedy algorithms for the attack construction are proposed. The first algorithm assumes that the attack vector consists of independent entries, and therefore, requires no communication between different attacked locations. The second algorithm considers correlation between the attack vector entries which results in better attack performance at the expense of coordination between different locations. We numerically evaluate the performance of the proposed attack constructions on IEEE test systems and show that it is feasible to construct stealth attacks that generate significant disruption with a low number of compromised sensors.
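One way to write the construction described above is (symbols illustrative)
$$ \min_{\Sigma_{AA} \succeq 0} \; I\big(X^{n}; Y_{A}^{n}\big) \quad \text{subject to} \quad D\big(P_{Y_{A}^{n}} \,\big\|\, P_{Y^{n}}\big) \leq c, \quad \operatorname{supp}(A) \subseteq \mathcal{S}, $$
where $\Sigma_{AA}$ is the covariance matrix of the Gaussian attack vector $A$, the constant $c$ limits the detectability of the attack, and the support constraint $\mathcal{S}$ encodes the set of compromised sensors.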
Victor Quintero, Samir M. Perlaza, Iñaki Esnaola, Jean-Marie Gorce
In this paper, an achievability region and a converse region for the two-user Gaussian interference channel with noisy channel-output feedback (G-IC-NOF) are presented. The achievability region is obtained using a random coding argument and three well-known techniques: rate splitting, superposition coding, and backward decoding. The converse region is obtained using some of the existing perfect-output feedback outer bounds as well as a set of new outer bounds that are obtained by using genie-aided models of the original G-IC-NOF. Finally, it is shown that the achievability region and the converse region approximate the capacity region of the G-IC-NOF to within a constant gap in bits per channel use.
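For reference, the constant-gap approximation above is meant in the usual sense (an interpretation on our part, with illustrative notation): there exists a constant $c$, independent of the channel parameters, such that whenever a rate pair $(R_{1}, R_{2})$ lies in the converse region, the pair $\big( (R_{1} - c)^{+}, (R_{2} - c)^{+} \big)$ lies in the achievability region.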
Francisco Daunas, Iñaki Esnaola, Samir M. Perlaza, H. Vincent Poor
The solution to empirical risk minimization with $f$-divergence regularization (ERM-$f$DR) is presented under mild conditions on $f$. Under such conditions, the optimal measure is shown to be unique. Examples of the solution for particular choices of the function $f$ are presented. Previously known solutions to common regularization choices are obtained by leveraging the flexibility of the family of $f$-divergences. These include the unique solutions to empirical risk minimization with relative entropy regularization (Type-I and Type-II). The analysis of the solution unveils the following properties of $f$-divergences when used in the ERM-$f$DR problem: $(i)$ $f$-divergence regularization forces the support of the solution to coincide with the support of the reference measure, which introduces a strong inductive bias that dominates the evidence provided by the training data; and $(ii)$ any $f$-divergence regularization is equivalent to a different $f$-divergence regularization with an appropriate transformation of the empirical risk function.
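Schematically, and up to sign and normalization conventions (the symbols are illustrative), the unique solution takes the form
$$ \frac{\mathrm{d}P^{\star}}{\mathrm{d}Q}(\theta) = \big( f' \big)^{-1}\!\left( -\frac{\mathsf{L}_{z}(\theta) + \beta}{\lambda} \right), $$
where $Q$ is the reference measure, $\mathsf{L}_{z}$ is the empirical risk, $\lambda > 0$ is the regularization parameter, and $\beta$ is a normalization constant ensuring that $P^{\star}$ is a probability measure; the relative entropy cases are recovered with $f(x) = x \log x$ (Type-I) and $f(x) = -\log x$ (Type-II).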
Francisco Daunas, Iñaki Esnaola, Samir M. Perlaza, Gholamali Aminian
The solution to empirical risk minimization with $f$-divergence regularization (ERM-$f$DR) is extended to constrained optimization problems, establishing conditions for equivalence between the solution and the constraints. A dual formulation of ERM-$f$DR is introduced, providing a computationally efficient method to derive the normalization function of the ERM-$f$DR solution. This dual approach leverages the Legendre-Fenchel transform and the implicit function theorem, enabling an explicit characterization of the generalization error for general algorithms under mild conditions, and another characterization specific to ERM-$f$DR solutions.
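For reference, the Legendre-Fenchel transform leveraged above is, for a convex function $f$,
$$ f^{\star}(t) = \sup_{x \in \operatorname{dom} f} \big\{ t x - f(x) \big\}, $$
and, in the approach described above, the normalization function of the ERM-$f$DR solution is obtained from the stationarity condition of the resulting dual problem via the implicit function theorem.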
Nazia Nafis, Iñaki Esnaola, Alvaro Martinez-Perez, Maria-Cruz Villa-Uriol, Venet Osmani
Generating synthetic tabular data can be challenging; however, evaluating its quality is just as challenging, if not more so. This systematic review sheds light on the critical importance of rigorous evaluation of synthetic health data to ensure their reliability, relevance, and appropriate use. Based on a screening of 1766 papers and a detailed review of 101 papers, we identified key challenges, including a lack of consensus on evaluation methods, improper use of evaluation metrics, limited input from domain experts, inadequate reporting of dataset characteristics, and limited reproducibility of results. In response, we provide several guidelines on the generation and evaluation of synthetic data to allow the community to unlock and fully harness the transformative potential of synthetic data and accelerate innovation.
Suzana S. A. Silva, Ioannis Dakanalis, Luiz A. C. A. Schiavo, Kostas Tziotziou, Istvan Ballai, Shahin Jafarzadeh, Tiago M. D. Pereira, Georgia Tsiropoula, Gary Verth, Iñaki Esnaola, James A. McLaughlin, Gert J. J. Botha, Viktor Fedun
The Sun's atmosphere hosts swirling plasma structures, known as solar vortices, which have long been thought to channel wave energy into higher layers. Until now, no direct observations have confirmed their role in the heating of the atmosphere. Here, we present the first direct evidence that solar vortices act as structured waveguides, carrying magnetoacoustic modes (waves that behave like sound waves but travel through magnetized plasma) that leave clear wave-heating signatures. By mapping vortex regions at multiple heights and analysing the waves they contain, we show that magnetoacoustic waves efficiently transfer energy, offset losses from radiation, and dominate energy transport in the lower chromosphere. These results challenge the long-standing assumption that vortices primarily support twisting disturbances traveling along magnetic field lines (Alfvén waves), revealing instead that magnetoacoustic modes play the leading role in the lower atmosphere. This redefines the role of vortices in magnetized plasmas and has broader implications for wave-plasma interactions in regions of strong magnetic fields.
Yaiza Bermudez, Gaetan Bisson, Iñaki Esnaola, Samir M. Perlaza
In this technical report, rigorous statements and formal proofs are presented for both foundational and advanced folklore theorems on the Radon-Nikodym derivative. The cases of conditional and marginal probability measures are carefully considered, which leads to an identity involving the sum of mutual and lautum information suggesting a new interpretation for such a sum.
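For context, recall the definitions underlying the sum mentioned above: the mutual information is $I(X;Y) = D\big(P_{XY} \,\|\, P_{X}P_{Y}\big)$ and the lautum information is $L(X;Y) = D\big(P_{X}P_{Y} \,\|\, P_{XY}\big)$, so that
$$ I(X;Y) + L(X;Y) = D\big(P_{XY} \,\big\|\, P_{X}P_{Y}\big) + D\big(P_{X}P_{Y} \,\big\|\, P_{XY}\big), $$
that is, their sum is the symmetrized (Jeffreys) divergence between the joint measure and the product of its marginals.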
Yaiza Bermudez, Samir M. Perlaza, Iñaki Esnaola
In this paper, it is shown, for the first time, that centralized performance is achievable in decentralized learning without sharing the local datasets. Specifically, when clients adopt an empirical risk minimization with relative-entropy regularization (ERM-RER) learning framework and a forward-backward communication between clients is established, it suffices to share the locally obtained Gibbs measures to achieve the same performance as that of a centralized ERM-RER with access to all the datasets. The core idea is that the Gibbs measure produced by client $k$ is used, as reference measure, by client $k+1$. This effectively establishes a principled way to encode prior information through a reference measure. In particular, achieving centralized performance in the decentralized setting requires a specific scaling of the regularization factors with the local sample sizes. Overall, this result opens the door to novel decentralized learning paradigms that shift the collaboration strategy from sharing data to sharing the local inductive bias via the reference measures over the set of models.
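A sketch of the chaining described above (symbols illustrative): with $P_{0} = Q$ the initial reference measure, client $k$ holding dataset $z_{k}$ and regularization factor $\lambda_{k}$ shares the Gibbs measure
$$ \frac{\mathrm{d}P_{k}}{\mathrm{d}P_{k-1}}(\theta) \propto \exp\!\left( -\frac{1}{\lambda_{k}} \mathsf{L}_{z_{k}}(\theta) \right), $$
so that the densities telescope into $\frac{\mathrm{d}P_{K}}{\mathrm{d}Q}(\theta) \propto \exp\big( -\sum_{k=1}^{K} \tfrac{1}{\lambda_{k}} \mathsf{L}_{z_{k}}(\theta) \big)$, and, with the scaling of the regularization factors with the local sample sizes noted above, this matches the centralized Gibbs measure on the pooled datasets.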
Miguel Arrieta, Iñaki Esnaola, Michelle Effros
Smart meters enable improvements in electricity distribution system efficiency at some cost in customer privacy. Users with home batteries can mitigate this privacy loss by applying charging policies that mask their underlying energy use. A battery charging policy is proposed and shown to provide universal privacy guarantees subject to a constraint on energy cost. The guarantee bounds our strategy's maximal information leakage from the user to the utility provider under general stochastic models of user energy consumption. The policy construction adapts coding strategies for non-probabilistic permuting channels to this privacy problem.
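If the maximal information leakage above is understood in its now-standard form (an assumption on our part, since the abstract does not fix the definition), it is, for finite alphabets,
$$ \mathcal{L}(X \to Y) = \log \sum_{y} \max_{x :\, P_{X}(x) > 0} P_{Y|X}(y \,|\, x), $$
where $X$ is the consumption process of the user and $Y$ is the observation available to the utility provider.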
Ke Sun, Iñaki Esnaola, Antonia M. Tulino, H. Vincent Poor
The learning data requirements for the construction of stealth attacks in state estimation are analyzed. In particular, the training data set is used to compute a sample covariance matrix, which results in a random matrix with a Wishart distribution. The ergodic attack performance is defined as the average attack performance obtained by taking the expectation with respect to the distribution of the training data set. The impact of the training data size on the ergodic attack performance is characterized by proposing an upper bound on the performance. Simulations on the IEEE 30-Bus test system show that the proposed bound is tight in practical settings.
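For reference, with $N$ independent zero-mean Gaussian training samples $x_{i} \in \mathbb{R}^{m}$ with covariance matrix $\Sigma$ (notation illustrative), the sample covariance matrix is
$$ \hat{\Sigma} = \frac{1}{N} \sum_{i=1}^{N} x_{i} x_{i}^{\mathsf{T}}, \qquad N \hat{\Sigma} \sim \mathcal{W}_{m}(\Sigma, N), $$
that is, a Wishart distribution with scale matrix $\Sigma$ and $N$ degrees of freedom; this is the distribution over which the expectation defining the ergodic attack performance is taken.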