Andrea I. Luppi, Fernando E. Rosas, Gustavo Deco, Morten L. Kringelbach, Pedro A. M. Mediano
Aug 10, 2023 · q-bio.NC
Temporal irreversibility, often referred to as the arrow of time, is a fundamental concept in statistical mechanics. Markers of irreversibility also provide a powerful characterisation of information processing in biological systems. However, current approaches tend to describe temporal irreversibility in terms of a single scalar quantity, without disentangling the underlying dynamics that contribute to irreversibility. Here we propose a broadly applicable information-theoretic framework to characterise the arrow of time in multivariate time series, which yields qualitatively different types of irreversible information dynamics. This multidimensional characterisation reveals previously unreported high-order modes of irreversibility, and establishes a formal connection between recent heuristic markers of temporal irreversibility and metrics of information processing. We demonstrate the prevalence of high-order irreversibility in the hyperactive regime of a biophysical model of brain dynamics, showing that our framework is both theoretically principled and empirically useful. This work challenges the view of the arrow of time as a monolithic entity, enhancing both our theoretical understanding of irreversibility and our ability to detect it in practical applications.
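The single scalar quantity that this abstract contrasts with its multivariate framework can be made concrete. Below is a minimal sketch (illustrative only, not the paper's method) of a plug-in estimator of temporal irreversibility for a discrete time series: the KL divergence between the empirical forward and time-reversed pairwise statistics, which vanishes when the pair statistics are time-reversible.

```python
import numpy as np
from collections import Counter

def irreversibility(seq):
    """Plug-in estimate (bits) of KL( p(x_t, x_{t+1}) || p(x_{t+1}, x_t) ).
    Zero iff the pairwise transition statistics are time-reversible."""
    pairs = Counter(zip(seq[:-1], seq[1:]))
    total = sum(pairs.values())
    kl = 0.0
    for (a, b), count in pairs.items():
        p_fwd = count / total
        p_bwd = pairs.get((b, a), 0) / total
        if p_bwd == 0:
            return np.inf          # a transition never observed in reverse
        kl += p_fwd * np.log2(p_fwd / p_bwd)
    return kl

rng = np.random.default_rng(1)
# biased 3-state rotor: steps +1 with prob 0.8, -1 with prob 0.2 -> irreversible
rotor = np.cumsum(rng.choice([1, -1], size=20000, p=[0.8, 0.2])) % 3
# fair coin flips: statistically time-reversible
coin = rng.integers(0, 2, 20000)
print(irreversibility(rotor))  # clearly positive (about 1.2 bits in theory)
print(irreversibility(coin))   # close to zero
```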
Abel Jansma, Pedro A. M. Mediano, Fernando E. Rosas
The partial information decomposition (PID) and its extension integrated information decomposition ($Φ$ID) are promising frameworks to investigate information phenomena involving multiple variables. An important limitation of these approaches is the high computational cost involved in their calculation. Here we leverage fundamental algebraic properties of these decompositions to enable a computationally efficient method to estimate them, which we call the fast Möbius transform. Our approach is based on a novel formula for estimating the Möbius function that circumvents important computational bottlenecks. We showcase the capabilities of this approach by presenting two analyses that would be infeasible without this method: decomposing the information that neural activity at different frequency bands yields about the brain's macroscopic functional organisation, and identifying distinctive dynamical properties of the interactions between multiple voices in baroque music. Overall, our proposed approach illuminates the value of algebraic facets of information decomposition and opens the way to a wide range of future analyses.
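The algebraic idea behind such speed-ups can be illustrated in the simplest setting. PID and $Φ$ID perform Möbius inversion on a redundancy lattice of antichains, which is not reproduced here; but the core operation already appears on the Boolean lattice of subsets, where an in-place transform replaces the naive $O(4^n)$ double sum with $O(n \, 2^n)$ work. A hedged sketch of that generic operation (function names are illustrative):

```python
def fast_mobius(f, n):
    """Invert the zeta transform on the Boolean lattice in O(n * 2^n):
    given f[S] = sum over subsets T of S of g[T], recover g."""
    g = list(f)
    for i in range(n):                    # peel off one coordinate at a time
        for S in range(1 << n):
            if S & (1 << i):
                g[S] -= g[S ^ (1 << i)]   # subtract the value without bit i
    return g

def zeta_naive(g, n):
    """Naive O(4^n) zeta transform, used here only as a correctness check."""
    return [sum(g[T] for T in range(1 << n) if T & S == T)
            for S in range(1 << n)]

g = [3, 1, 4, 1, 5, 9, 2, 6]              # arbitrary values on the 3-bit lattice
assert fast_mobius(zeta_naive(g, 3), 3) == g
```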
Nat Dilokthanakul, Pedro A. M. Mediano, Marta Garnelo, Matthew C. H. Lee, Hugh Salimbeni, Kai Arulkumaran, Murray Shanahan
We study a variant of the variational autoencoder model (VAE) with a Gaussian mixture as a prior distribution, with the goal of performing unsupervised clustering through deep generative models. We observe that the known problem of over-regularisation that has been shown to arise in regular VAEs also manifests itself in our model and leads to cluster degeneracy. We show that a heuristic called the minimum information constraint, which has been shown to mitigate this effect in VAEs, can also be applied to improve unsupervised clustering performance with our model. Furthermore, we analyse the effect of this heuristic and provide an intuition of the various processes with the help of visualisations. Finally, we demonstrate the performance of our model on synthetic data, MNIST and SVHN, showing that the obtained clusters are distinct and interpretable, and achieve unsupervised clustering performance competitive with the state of the art.
Enrico Caprioglio, Pedro A. M. Mediano, Luc Berthouze
High-order interdependencies are central features of complex systems, yet a mechanistic explanation for their emergence remains elusive. Currently, it is unknown under what conditions high-order interdependencies, quantified by the information-theoretic construct of synergy, arise in systems governed by pairwise interactions. We solve this problem by providing precise sufficient and necessary conditions for when synergy prevails over low-order interdependencies in the weak interaction regime, namely, we prove that antibalanced (highly frustrated) correlational structures in Gaussian systems are sufficient for synergy-dominance and that antibalanced interaction motifs in Ornstein-Uhlenbeck processes are necessary for synergy-dominance. We validate the applicability of these analytical insights beyond the weak interaction regime, as well as in Ising, oscillatory, and empirical networks from multiple domains. Our results demonstrate that pairwise interactions can give rise to synergistic information in the absence of explicit high-order mechanisms, and highlight structural balance theory as an instrumental conceptual framework to study high-order interdependencies.
Lucas Prieto, Edward Stevinson, Melih Barsbey, Tolga Birdal, Pedro A. M. Mediano
A central idea in mechanistic interpretability is that neural networks represent more features than they have dimensions, arranging them in superposition to form an over-complete basis. This framing has been influential, motivating dictionary learning approaches such as sparse autoencoders. However, superposition has mostly been studied in idealized settings where features are sparse and uncorrelated. In these settings, superposition is typically understood as introducing interference that must be minimized geometrically and filtered out by non-linearities such as ReLUs, yielding local structures like regular polytopes. We show that this account is incomplete for realistic data by introducing Bag-of-Words Superposition (BOWS), a controlled setting to encode binary bag-of-words representations of internet text in superposition. Using BOWS, we find that when features are correlated, interference can be constructive rather than just noise to be filtered out. This is achieved by arranging features according to their co-activation patterns, making interference between active features constructive, while still using ReLUs to avoid false positives. We show that this kind of arrangement is more prevalent in models trained with weight decay and naturally gives rise to semantic clusters and cyclical structures which have been observed in real language models yet were not explained by the standard picture of superposition. Code for this paper can be found at https://github.com/LucasPrietoAl/correlations-feature-geometry.
Hardik Rajpal, Clem von Stengel, Pedro A. M. Mediano, Fernando E. Rosas, Eduardo Viegas, Pablo A. Marquet, Henrik J. Jensen
Oct 31, 2023 · q-bio.PE
At what level does selective pressure effectively act? When considering the reproductive dynamics of interacting and mutating agents, it has long been debated whether selection is better understood by focusing on the individual, or whether hierarchical selection emerges as a consequence of joint adaptation. Despite longstanding efforts in theoretical ecology, there is still no consensus on this fundamental issue, most likely due to the difficulty of obtaining adequate data spanning a sufficient number of generations, and the lack of adequate tools to quantify the effect of hierarchical selection. Here we capitalise on recent advances in information-theoretic data analysis to advance this state of affairs by investigating the emergence of high-order structures -- such as groups of species -- in the collective dynamics of the Tangled Nature model of evolutionary ecology. Our results show that evolutionary dynamics can lead to clusters of species that act as a selective group, acquiring information-theoretic agency. Overall, our findings provide quantitative evidence supporting the relevance of high-order structures in evolutionary ecology, which can emerge even from relatively simple processes of adaptation and selection.
Fernando E. Rosas, Pedro A. M. Mediano, Michael Gastpar
Systems of interest for theoretical or experimental work often exhibit high-order interactions, corresponding to statistical interdependencies in groups of variables that cannot be reduced to dependencies in subsets of them. While still under active development, the framework of partial information decomposition (PID) has emerged as the dominant approach to conceptualise and calculate high-order interdependencies. PID approaches can be grouped into two types: directed approaches that divide variables into sources and targets, and undirected approaches that treat all variables equally. Directed and undirected approaches are usually employed to investigate different scenarios, and hence little is known about how these two types of approaches may relate to each other, or whether their corresponding quantities are linked in some way. In this paper we investigate the relationship between the redundancy-synergy index (RSI) and the O-information, which are practical metrics of directed and undirected high-order interdependencies, respectively. Our results reveal tight links between these two quantities, and provide interpretations of them in terms of likelihood ratios in a hypothesis testing setting, as well as in terms of projections in information geometry.
Zhaolu Liu, Robert L. Peach, Pedro A. M. Mediano, Mauricio Barahona
Models that rely solely on pairwise relationships often fail to capture the complete statistical structure of the complex multivariate data found in diverse domains, such as socio-economic, ecological, or biomedical systems. Non-trivial dependencies between groups of more than two variables can play a significant role in the analysis and modelling of such systems, yet extracting such high-order interactions from data remains challenging. Here, we introduce a hierarchy of $d$-order ($d \geq 2$) interaction measures, increasingly inclusive of possible factorisations of the joint probability distribution, and define non-parametric, kernel-based tests to establish systematically the statistical significance of $d$-order interactions. We also establish mathematical links with lattice theory, which elucidate the derivation of the interaction measures and their composite permutation tests; clarify the connection of simplicial complexes with kernel matrix centring; and provide a means to enhance computational efficiency. We illustrate our results numerically with validations on synthetic data, and through an application to neuroimaging data.
Pedro A. M. Mediano, Anil K. Seth, Adam B. Barrett
Jun 25, 2018 · q-bio.NC
Integrated Information Theory (IIT) is a prominent theory of consciousness that has at its centre measures that quantify the extent to which a system generates more information than the sum of its parts. While several candidate measures of integrated information (`$Φ$') now exist, little is known about how they compare, especially in terms of their behaviour on non-trivial network models. In this article we provide clear and intuitive descriptions of six distinct candidate measures. We then explore the properties of each of these measures in simulation on networks consisting of eight interacting nodes, animated with Gaussian linear autoregressive dynamics. We find a striking diversity in the behaviour of these measures -- no two measures show consistent agreement across all analyses. Further, only a subset of the measures appear to genuinely reflect some form of dynamical complexity, in the sense of simultaneous segregation and integration between system components. Our results help guide the operationalisation of IIT and advance the development of measures of integrated information that may have more general applicability.
Fernando Rosas, Pedro A. M. Mediano, Michael Gastpar, Henrik J. Jensen
This article introduces a model-agnostic approach to study statistical synergy, a form of emergence in which patterns at large scales are not traceable from lower scales. Our framework leverages various multivariate extensions of Shannon's mutual information, and introduces the O-information as a metric capable of characterising synergy- and redundancy-dominated systems. We develop key analytical properties of the O-information, and study how it relates to other metrics of high-order interactions from the statistical mechanics and neuroscience literature. Finally, as a proof of concept, we use the proposed framework to explore the relevance of statistical synergy in Baroque music scores.
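The central quantity above admits a compact sketch. Using the standard definition $Ω(X) = (n-2)H(X) + \sum_i [H(X_i) - H(X_{-i})]$, the toy below estimates the O-information of discrete data with plug-in entropies (function names are illustrative, and plug-in estimation is the simplest option, not the only one): positive values indicate redundancy-dominated systems, negative values synergy-dominated ones.

```python
import numpy as np
from collections import Counter

def entropy(samples):
    """Plug-in Shannon entropy (bits) over the rows of a 2-D sample array."""
    counts = np.array(list(Counter(map(tuple, samples)).values()))
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def o_information(X):
    """O-information of the columns of X (shape: n_samples x n_vars).
    Omega = (n - 2) H(X) + sum_i [H(X_i) - H(X_{-i})]."""
    n = X.shape[1]
    omega = (n - 2) * entropy(X)
    for i in range(n):
        omega += entropy(X[:, [i]]) - entropy(np.delete(X, i, axis=1))
    return omega

rng = np.random.default_rng(0)
x = rng.integers(0, 2, 10000)
y = rng.integers(0, 2, 10000)
# XOR triplet: purely synergistic, so Omega is negative
X = np.column_stack([x, y, x ^ y])
print(o_information(X))        # approximately -1 bit
# three copies of the same bit: purely redundant, so Omega is positive
W = np.column_stack([x, x, x])
print(o_information(W))        # approximately +1 bit
```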
Rodrigo Cofré, Rubén Herzog, Pedro A. M. Mediano, Juan Piccinini, Fernando E. Rosas, Yonatan Sanz Perl, Enzo Tagliazucchi
The scope of human consciousness includes states departing from what most of us experience as ordinary wakefulness. These altered states of consciousness constitute a prime opportunity to study how global changes in brain activity relate to different varieties of subjective experience. We consider the problem of explaining how global signatures of altered consciousness arise from the interplay between large-scale connectivity and local dynamical rules that can be traced to known properties of neural tissue. For this purpose, we advocate a research program aimed at bridging the gap between bottom-up generative models of whole-brain activity and the top-down signatures proposed by theories of consciousness. Throughout this paper, we define altered states of consciousness, discuss relevant signatures of consciousness observed in brain activity, and introduce whole-brain models to explore the mechanisms of altered consciousness from the bottom-up. We discuss the potential of our proposal in view of the current state of the art, give specific examples of how this research agenda might play out, and emphasise how a systematic investigation of altered states of consciousness via bottom-up modelling may help us better understand the biophysical, informational, and dynamical underpinnings of consciousness.
Fernando E. Rosas, Pedro A. M. Mediano, Henrik J. Jensen, Anil K. Seth, Adam B. Barrett, Robin L. Carhart-Harris, Daniel Bor
Apr 17, 2020 · q-bio.NC
The broad concept of emergence is instrumental in many of the most challenging open scientific questions -- yet, few quantitative theories of what constitutes emergent phenomena have been proposed. This article introduces a formal theory of causal emergence in multivariate systems, which studies the relationship between the dynamics of parts of a system and macroscopic features of interest. Our theory provides a quantitative definition of downward causation, and introduces a complementary modality of emergent behaviour -- which we refer to as causal decoupling. Moreover, the theory yields practical criteria that can be efficiently calculated in large systems, making our framework applicable in a range of scenarios of practical interest. We illustrate our findings in a number of case studies, including Conway's Game of Life, Reynolds' flocking model, and neural activity as measured by electrocorticography.
Pedro A. M. Mediano, Fernando E. Rosas, Andrea I. Luppi, Henrik J. Jensen, Anil K. Seth, Adam B. Barrett, Robin L. Carhart-Harris, Daniel Bor
Nov 12, 2021 · q-bio.NC
Emergence is a profound subject that straddles many scientific disciplines, spanning phenomena from the formation of galaxies to the way consciousness arises from the collective activity of neurons. Despite the broad interest in this concept, the study of emergence has suffered from a lack of formalisms that could be used to guide discussions and advance theories. Here we summarise, elaborate on, and extend a recent formal theory of causal emergence based on information decomposition, which is quantifiable and amenable to empirical testing. This theory relates emergence to information about a system's temporal evolution that cannot be obtained from the parts of the system separately. This article provides an accessible but rigorous introduction to the framework, discussing the merits of the approach in various scenarios of interest. We also discuss several interpretation issues and potential misunderstandings, while highlighting the distinctive benefits of this formalism.
Hanna M. Tolle, Andrea I. Luppi, Anil K. Seth, Pedro A. M. Mediano
Jun 27, 2024 · q-bio.NC
Biological neural networks can perform complex computations to predict their environment, far above the limited predictive capabilities of individual neurons. While conventional approaches to understanding these computations often focus on isolating the contributions of single neurons, here we argue that a deeper understanding requires considering emergent dynamics -- dynamics that make the whole system "more than the sum of its parts". Specifically, we examine the relationship between prediction performance and emergence by leveraging recent quantitative metrics of emergence, derived from Partial Information Decomposition, and by modelling the prediction of environmental dynamics in a bio-inspired computational framework known as reservoir computing. Notably, we reveal a bidirectional coupling between prediction performance and emergence, which generalises across task environments and reservoir network topologies, and is recapitulated by three key results: 1) Optimising hyperparameters for performance enhances emergent dynamics, and vice versa; 2) Emergent dynamics represent a near-sufficient criterion for prediction success in all task environments, and an almost necessary criterion in most environments; 3) Training reservoir computers on larger datasets results in stronger emergent dynamics, which contain task-relevant information crucial for performance. Overall, our study points to a pivotal role of emergence in facilitating environmental predictions in a bio-inspired computational architecture.
Alberto Liardi, Fernando E. Rosas, Robin L. Carhart-Harris, George Blackburne, Daniel Bor, Pedro A. M. Mediano
A key feature of information theory is its universality, as it can be applied to study a broad variety of complex systems. However, many information-theoretic measures can vary significantly even across systems with similar properties, making normalisation techniques essential for allowing meaningful comparisons across datasets. Inspired by the framework of Partial Information Decomposition (PID), here we introduce Null Models for Information Theory (NuMIT), a null model-based non-linear normalisation procedure which improves upon standard entropy-based normalisation approaches and overcomes their limitations. We provide practical implementations of the technique for systems with different statistics, and showcase the method on synthetic models and on human neuroimaging data. Our results demonstrate that NuMIT provides a robust and reliable tool to characterise complex systems of interest, allowing cross-dataset comparisons and providing a meaningful significance test for PID analyses.
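The general logic of null-model normalisation can be caricatured with a far simpler permutation null. NuMIT itself fits model-based, non-linear nulls tailored to PID quantities, which the toy below does not attempt; it only shows the shared principle of reporting an observed statistic as its position within a null distribution (all names are illustrative):

```python
import numpy as np

def null_normalised(stat, x, y, n_null=500, rng=None):
    """Percentile of stat(x, y) within a null distribution built by
    permuting y, which destroys any x-y dependence. A simplified
    stand-in for null-model normalisation, not NuMIT itself."""
    rng = np.random.default_rng(rng)
    observed = stat(x, y)
    null = np.array([stat(x, rng.permutation(y)) for _ in range(n_null)])
    return (null < observed).mean()

rng = np.random.default_rng(0)
x = rng.standard_normal(2000)
y = x + 0.5 * rng.standard_normal(2000)   # strongly dependent on x
z = rng.standard_normal(2000)             # independent of x
corr = lambda a, b: abs(np.corrcoef(a, b)[0, 1])
print(null_normalised(corr, x, y, rng=1))  # 1.0: above every null draw
print(null_normalised(corr, x, z, rng=1))  # somewhere in [0, 1]
```

The same raw statistic (here, absolute correlation) becomes comparable across datasets once expressed relative to its own null distribution, which is the point of the abstract's normalisation argument.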
Lucas Prieto, Melih Barsbey, Pedro A. M. Mediano, Tolga Birdal
Grokking, the sudden generalization that occurs after prolonged overfitting, is a surprising phenomenon challenging our understanding of deep learning. Although significant progress has been made in understanding grokking, the reasons behind the delayed generalization and its dependence on regularization remain unclear. In this work, we argue that without regularization, grokking tasks push models to the edge of numerical stability, introducing floating point errors in the Softmax function, which we refer to as Softmax Collapse (SC). We demonstrate that SC prevents grokking and that mitigating SC enables grokking without regularization. Investigating the root cause of SC, we find that beyond the point of overfitting, the gradients strongly align with what we call the naïve loss minimization (NLM) direction. This component of the gradient does not alter the model's predictions but decreases the loss by scaling the logits, typically by scaling the weights along their current direction. We show that this scaling of the logits explains the delay in generalization characteristic of grokking and eventually leads to SC, halting further learning. To validate our hypotheses, we introduce two key contributions that address the challenges in grokking tasks: StableMax, a new activation function that prevents SC and enables grokking without regularization, and $\perp$Grad, a training algorithm that promotes quick generalization in grokking tasks by preventing NLM altogether. These contributions provide new insights into grokking, elucidating its delayed generalization, reliance on regularization, and the effectiveness of existing grokking-inducing methods. Code for this paper is available at https://github.com/LucasPrietoAl/grokking-at-the-edge-of-numerical-stability.
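The numerical mechanism is easy to reproduce in a few lines. In this float64 sketch (illustrative, not the paper's code), scaling logits along their current direction leaves the prediction unchanged while driving the cross-entropy loss down, until the target probability rounds to exactly 1 and the loss, and hence the gradient, becomes exactly zero:

```python
import numpy as np

def softmax_xent(logits, target):
    """Cross-entropy of softmax(logits) with respect to the target class."""
    z = logits - logits.max()            # the usual max-subtraction stabilisation
    p = np.exp(z) / np.exp(z).sum()
    return -np.log(p[target])

logits = np.array([2.0, 0.5, -1.0])      # class 0 is already predicted correctly
for scale in [1.0, 10.0, 200.0]:
    loss = softmax_xent(scale * logits, 0)
    print(f"scale={scale:>5}: argmax={np.argmax(scale * logits)}, loss={loss:.3e}")
# The argmax never changes, yet the loss shrinks with scale; at scale 200 the
# target probability rounds to 1.0 in float64, the loss is exactly 0.0, and the
# gradient vanishes: scaling alone has halted learning.
```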
Adam B. Barrett, Borjan Milinkovic, Pedro A. M. Mediano, Fernando E. Rosas, Daniel Bor, Lionel Barnett, Anil K. Seth
Apr 13, 2026 · q-bio.NC
The integrated information theory of consciousness (IIT) is uniquely ambitious in proposing a mathematical formula, derived from apparently fundamental properties of conscious experience, to describe the quantity and quality of consciousness for any physical system that possesses it. IIT has generated considerable debate, which has engendered some misunderstandings and misrepresentations. Here we address these and hope to remedy them. We begin by concisely summarising the essentials of IIT. Given that IIT is supposed to apply universally, we do this with reference to an arbitrary patch of matter, as opposed to the usual system of discrete computational units. Then, after briefly summarising IIT's theoretical and empirical achievements, we focus on five points which we consider especially important for driving forward new theory and increasing understanding. First, a high value of the measure $Φ$ is not synonymous with `more consciousness'. We describe how $Φ$ might be replaced with a suite of quantities to obtain a multi-dimensional characterisation of states of consciousness. Second, we describe with nuance the distinct flavour of panpsychism implied by IIT -- whereby space (and time) are tiled with substrates of (proto-)consciousness -- and find this is not problematic for the theory. Third, $Φ$ is not well-defined for real physical systems, and has not been computed on any real physical system. Fourth, so far only proxies for IIT measures have been computed, and not approximations. Fifth, for IIT to fit with current successful theories in fundamental physics, a reformulation in terms of continuous fields would be needed.
Andrea Tacchetti, H. Francis Song, Pedro A. M. Mediano, Vinicius Zambaldi, Neil C. Rabinowitz, Thore Graepel, Matthew Botvinick, Peter W. Battaglia
The behavioral dynamics of multi-agent systems have a rich and orderly structure, which can be leveraged to understand these systems, and to improve how artificial agents learn to operate in them. Here we introduce Relational Forward Models (RFM) for multi-agent learning, networks that can learn to make accurate predictions of agents' future behavior in multi-agent environments. Because these models operate on the discrete entities and relations present in the environment, they produce interpretable intermediate representations which offer insights into what drives agents' behavior, and what events mediate the intensity and valence of social interactions. Furthermore, we show that embedding RFM modules inside agents results in faster learning systems compared to non-augmented baselines. As more and more of the autonomous systems we develop and interact with become multi-agent in nature, developing richer analysis tools for characterizing how and why agents make decisions is increasingly necessary. Moreover, developing artificial agents that quickly and safely learn to coordinate with one another, and with humans in shared environments, is crucial.
Alexandra M. Proca, Fernando E. Rosas, Andrea I. Luppi, Daniel Bor, Matthew Crosby, Pedro A. M. Mediano
Striking progress has recently been made in understanding human cognition by analyzing how its neuronal underpinnings are engaged in different modes of information processing. Specifically, neural information can be decomposed into synergistic, redundant, and unique features, with synergistic components being particularly aligned with complex cognition. However, two fundamental questions remain unanswered: (a) precisely how and why a cognitive system can become highly synergistic; and (b) how these informational states map onto artificial neural networks in various learning modes. To address these questions, here we employ an information-decomposition framework to investigate the information processing strategies adopted by simple artificial neural networks performing a variety of cognitive tasks in both supervised and reinforcement learning settings. Our results show that synergy increases as neural networks learn multiple diverse tasks. Furthermore, performance in tasks requiring integration of multiple information sources critically relies on synergistic neurons. Finally, randomly turning off neurons during training through dropout increases network redundancy, corresponding to an increase in robustness. Overall, our results suggest that while redundant information is required for robustness to perturbations in the learning process, synergistic information is used to combine information from multiple modalities -- and more generally for flexible and efficient learning. These findings open the door to new ways of investigating how and why learning systems employ specific information-processing strategies, and support the principle that the capacity for general-purpose learning critically relies on the system's information dynamics.
Fernando E. Rosas, Diego Candia-Rivera, Andrea I. Luppi, Yike Guo, Pedro A. M. Mediano
Recent research is revealing how cognitive processes are supported by a complex interplay between the brain and the rest of the body, which can be investigated by the analysis of physiological features such as breathing rhythms, heart rate, and skin conductance. Heart rate dynamics are of particular interest as they provide a way to track the sympathetic and parasympathetic outflow from the autonomic nervous system, which is known to play a key role in modulating attention, memory, decision-making, and emotional processing. However, extracting useful information from heartbeats about the autonomic outflow is still challenging due to the noisy estimates that result from standard signal-processing methods. To advance this state of affairs, we propose a paradigm shift in how we conceptualise and model heart rate: instead of being a mere summary of the observed inter-beat intervals, we introduce a modelling framework that views heart rate as a hidden stochastic process that drives the observed heartbeats. Moreover, by leveraging the rich literature of state-space modelling and Bayesian inference, our proposed framework delivers a description of heart rate dynamics that is not a point estimate but a posterior distribution of a generative model. We illustrate the capabilities of our method by showing that it recapitulates linear properties of conventional heart rate estimators, while exhibiting better discriminative power for metrics of dynamical complexity across different physiological states.
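The proposed shift in perspective can be caricatured in a few lines: treat the (log) heart rate as a hidden random walk, treat inter-beat intervals as noisy observations driven by it, and infer a posterior over the hidden rate rather than a point estimate. The toy below uses an exponential inter-beat likelihood and a bootstrap particle filter purely for illustration; the authors' generative model and inference scheme are not reproduced here, and all parameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- toy data: the true log-rate follows a slow random walk ---
T = 400
true_lograte = np.cumsum(0.05 * rng.standard_normal(T))   # hidden state
ibis = rng.exponential(1.0 / np.exp(true_lograte))        # observed inter-beat intervals

# --- bootstrap particle filter over the hidden log-rate ---
n_particles = 2000
particles = np.zeros(n_particles)
post_mean = np.empty(T)
for t in range(T):
    particles += 0.05 * rng.standard_normal(n_particles)  # random-walk prior step
    rate = np.exp(particles)
    w = rate * np.exp(-rate * ibis[t])                    # exponential likelihood
    w /= w.sum()
    post_mean[t] = particles @ w                          # posterior mean at time t
    idx = rng.choice(n_particles, size=n_particles, p=w)  # resample particles
    particles = particles[idx]

# the posterior mean tracks the hidden log-rate despite the noisy observations
corr = np.corrcoef(post_mean, true_lograte)[0, 1]
print(round(corr, 2))
```

The particle cloud at each step is a full posterior over the hidden rate, so any downstream metric can be propagated with its uncertainty instead of being computed from a single noisy point estimate.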