Alexandra M. Proca, Fernando E. Rosas, Andrea I. Luppi, Daniel Bor, Matthew Crosby, Pedro A. M. Mediano
Striking progress has recently been made in understanding human cognition by analyzing how its neuronal underpinnings are engaged in different modes of information processing. Specifically, neural information can be decomposed into synergistic, redundant, and unique features, with synergistic components being particularly aligned with complex cognition. However, two fundamental questions remain unanswered: (a) precisely how and why a cognitive system can become highly synergistic; and (b) how these informational states map onto artificial neural networks in various learning modes. To address these questions, here we employ an information-decomposition framework to investigate the information processing strategies adopted by simple artificial neural networks performing a variety of cognitive tasks in both supervised and reinforcement learning settings. Our results show that synergy increases as neural networks learn multiple diverse tasks. Furthermore, performance in tasks requiring integration of multiple information sources critically relies on synergistic neurons. Finally, randomly turning off neurons during training through dropout increases network redundancy, corresponding to an increase in robustness. Overall, our results suggest that while redundant information is required for robustness to perturbations in the learning process, synergistic information is used to combine information from multiple modalities -- and more generally for flexible and efficient learning. These findings open the door to new ways of investigating how and why learning systems employ specific information-processing strategies, and support the principle that the capacity for general-purpose learning critically relies on the system's information dynamics.
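To make the notion of synergy concrete, here is a minimal worked example (my illustration, not code from the paper): in an XOR relationship each input alone carries zero information about the output, yet the two inputs jointly determine it completely, so all of the information is synergistic.

```python
import numpy as np

# Joint distribution of (x1, x2, y) for y = XOR(x1, x2), with uniform inputs.
states = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
p = {s: 0.25 for s in states}

def H(marginal):
    """Shannon entropy (bits) of a dict mapping outcomes to probabilities."""
    return -sum(q * np.log2(q) for q in marginal.values() if q > 0)

def marginalise(keep):
    """Marginal distribution over the variable indices in `keep` (0=x1, 1=x2, 2=y)."""
    out = {}
    for s, q in p.items():
        key = tuple(s[i] for i in keep)
        out[key] = out.get(key, 0.0) + q
    return out

def MI(a, b):
    """Mutual information I(A;B) = H(A) + H(B) - H(A,B), in bits."""
    return H(marginalise(a)) + H(marginalise(b)) - H(marginalise(a + b))

print(MI((0,), (2,)))    # I(X1;Y)    = 0.0 bits
print(MI((1,), (2,)))    # I(X2;Y)    = 0.0 bits
print(MI((0, 1), (2,)))  # I(X1,X2;Y) = 1.0 bit -> purely synergistic
```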
Fernando E. Rosas, Diego Candia-Rivera, Andrea I. Luppi, Yike Guo, Pedro A. M. Mediano
Recent research is revealing how cognitive processes are supported by a complex interplay between the brain and the rest of the body, which can be investigated through the analysis of physiological features such as breathing rhythms, heart rate, and skin conductance. Heart rate dynamics are of particular interest as they provide a way to track the sympathetic and parasympathetic outflow from the autonomic nervous system, which is known to play a key role in modulating attention, memory, decision-making, and emotional processing. However, extracting useful information from heartbeats about the autonomic outflow is still challenging due to the noisy estimates that result from standard signal-processing methods. To advance this state of affairs, we propose a paradigm shift in how we conceptualise and model heart rate: instead of being a mere summary of the observed inter-beat intervals, we introduce a modelling framework that views heart rate as a hidden stochastic process that drives the observed heartbeats. Moreover, by leveraging the rich literature of state-space modelling and Bayesian inference, our proposed framework delivers a description of heart rate dynamics that is not a point estimate but a posterior distribution of a generative model. We illustrate the capabilities of our method by showing that it recapitulates linear properties of conventional heart rate estimators, while exhibiting better discriminative power for metrics of dynamical complexity when compared across different physiological states.
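The following is a minimal sketch of the general idea under strong simplifying assumptions (a random-walk prior on log heart rate, Gaussian observation noise, and illustrative parameter values -- none of which are taken from the paper): a bootstrap particle filter recovers a posterior distribution over the hidden rate that drives the observed inter-beat intervals.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Simulate data: a hidden log heart rate drives observed inter-beat intervals.
T = 200
log_hr = np.cumsum(rng.normal(0, 0.02, T)) + np.log(60.0)  # latent process (log bpm)
ibi = 60.0 / np.exp(log_hr) + rng.normal(0, 0.02, T)       # observed intervals (s)

# --- Bootstrap particle filter: posterior over the hidden log heart rate.
N = 2000
particles = rng.normal(np.log(60.0), 0.1, N)
post_mean = np.empty(T)
for t in range(T):
    particles += rng.normal(0, 0.02, N)                    # propagate random-walk prior
    pred_ibi = 60.0 / np.exp(particles)                    # predicted interval per particle
    w = np.exp(-0.5 * ((ibi[t] - pred_ibi) / 0.02) ** 2)   # Gaussian observation likelihood
    w /= w.sum()
    particles = particles[rng.choice(N, size=N, p=w)]      # resample by weight
    post_mean[t] = particles.mean()  # summary of the posterior (the filter carries a full distribution)

print(np.corrcoef(post_mean, log_hr)[0, 1])  # the filter tracks the hidden rate
```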
Mikhail Prokopenko, Paul C. W. Davies, Michael Harré, Marcus Heisler, Zdenka Kuncic, Geraint F. Lewis, Ori Livson, Joseph T. Lizier, Fernando E. Rosas
Sep 18, 2024 · q-bio.PE
We study open-ended evolution by focusing on computational and information-processing dynamics underlying major evolutionary transitions. In doing so, we consider biological organisms as hierarchical dynamical systems that generate regularities in their phase-spaces through interactions with their environment. These emergent information patterns can then be encoded within the organism's components, leading to self-modelling "tangled hierarchies". Our main conjecture is that encoding macro-scale patterns within micro-scale components creates fundamental tensions (computational inconsistencies) between what is encodable at a particular evolutionary stage and what is potentially realisable in the environment. A resolution of these tensions triggers an evolutionary transition which expands the problem-space, at the cost of generating new tensions in the expanded space, in a continual process. We argue that biological complexification can be interpreted computation-theoretically, within the Gödel--Turing--Post recursion-theoretic framework, as open-ended generation of computational novelty. In general, this process can be viewed as a meta-simulation performed by higher-order systems that successively simulate the computation carried out by lower-order systems. This computation-theoretic argument provides a basis for hypothesising the biological arrow of time.
Pedro Urbina-Rodriguez, Zafeirios Fountas, Fernando E. Rosas, Jun Wang, Andrea I. Luppi, Haitham Bou-Ammar, Murray Shanahan, Pedro A. M. Mediano
The independent evolution of intelligence in biological and artificial systems offers a unique opportunity to identify its fundamental computational principles. Here we show that large language models spontaneously develop synergistic cores -- components where information integration exceeds individual parts -- remarkably similar to those in the human brain. Using principles of information decomposition across multiple LLM families and architectures, we find that areas in middle layers exhibit synergistic processing while early and late layers rely on redundancy, mirroring the informational organisation in biological brains. This organisation emerges through learning and is absent in randomly initialised networks. Crucially, ablating synergistic components causes disproportionate behavioural changes and performance loss, aligning with theoretical predictions about the fragility of synergy. Moreover, fine-tuning synergistic regions through reinforcement learning yields significantly greater performance gains than training redundant components, yet supervised fine-tuning shows no such advantage. This convergence suggests that synergistic information processing is a fundamental property of intelligence, providing targets for principled model design and testable predictions for biological intelligence.
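As a hedged sketch of the kind of layer-wise analysis this involves (the paper uses information decomposition; here I substitute the simpler O-information under a Gaussian assumption, with synthetic data standing in for sampled activations): negative values indicate synergy-dominated interactions among units, positive values redundancy-dominated ones.

```python
import numpy as np

def gaussian_entropy(cov):
    """Differential entropy (nats) of a Gaussian with covariance `cov`."""
    k = cov.shape[0]
    return 0.5 * (k * np.log(2 * np.pi * np.e) + np.linalg.slogdet(cov)[1])

def o_information(X):
    """Gaussian estimate of the O-information of samples X (rows = samples).
    Omega < 0: synergy-dominated; Omega > 0: redundancy-dominated."""
    n = X.shape[1]
    cov = np.cov(X, rowvar=False)
    H = lambda idx: gaussian_entropy(cov[np.ix_(idx, idx)])
    full = list(range(n))
    omega = (n - 2) * H(full)
    for i in full:
        rest = [j for j in full if j != i]
        omega += H([i]) - H(rest)
    return omega

# Illustrative use on fake "activations" (samples x units); in practice these
# would be hidden-layer activations collected over many inputs.
rng = np.random.default_rng(0)
acts = rng.normal(size=(5000, 4))
acts[:, 3] = acts[:, :3].sum(axis=1) + 0.1 * rng.normal(size=5000)  # synergy-like structure
print(o_information(acts))  # expect a clearly negative value here
```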
Madalina I. Sas, Fernando E. Rosas, Hardik Rajpal, Daniel Bor, Henrik J. Jensen, Pedro A. M. Mediano
A central challenge in the study of complex systems is the quantification of emergence -- understood as the ability of the system to exhibit collective behaviours that cannot be traced back to the individual components. While recent work has proposed practical measures to detect emergence, these approaches tend to double-count the contribution of shared components, which substantially hinders their ability to study large systems effectively. In this work, we introduce a family of improved information-theoretic measures of emergence that iteratively correct for double-counted terms. Our approach is computationally efficient and provides a controllable trade-off between computational load and sensitivity, leading to more accurate and versatile estimates of emergence. The benefits of the proposed approach are demonstrated by successfully detecting emergence in both simulated and real-world data related to flocking behaviour.
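For context, the first-order criterion this line of work builds on (from the causal-emergence framework summarised later in this list) scores a macroscopic feature $V$ by $\Psi = I(V_t; V_{t'}) - \sum_j I(X^j_t; V_{t'})$, where the $X^j$ are the system's parts. Any information redundantly shared by several parts appears once per part in the sum, so it is subtracted multiple times; this is the double counting that biases such measures downward in large systems, and which the corrected measures proposed here iteratively compensate for.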
Adam Shai, Loren Amdahl-Culleton, Casper L. Christensen, Henry R. Bigelow, Fernando E. Rosas, Alexander B. Boyd, Eric A. Alt, Kyle J. Ray, Paul M. Riechers
Transformers pretrained via next token prediction learn to factor their world into parts, representing these factors in orthogonal subspaces of the residual stream. We formalize two representational hypotheses: (1) a representation in the product space of all factors, whose dimension grows exponentially with the number of parts, or (2) a factored representation in orthogonal subspaces, whose dimension grows linearly. The factored representation is lossless when factors are conditionally independent, but sacrifices predictive fidelity otherwise, creating a tradeoff between dimensional efficiency and accuracy. We derive precise predictions about the geometric structure of activations for each, including the number of subspaces, their dimensionality, and the arrangement of context embeddings within them. We adjudicate between these hypotheses using transformers trained on synthetic processes with known latent structure. Models learn factored representations when factors are conditionally independent, and continue to favor them early in training even when noise or hidden dependencies undermine conditional independence, reflecting an inductive bias toward factoring at the cost of fidelity. This provides a principled explanation for why transformers decompose the world into parts, and suggests that interpretable low-dimensional structure may persist even in models trained on complex data.
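A toy numerical illustration of the dimension counting (my construction, not the paper's setup): with $k$ independent binary factors, a product-space code needs one dimension per joint configuration ($2^k$), while a factored code concatenates a one-hot block per factor ($2k$ dimensions) and still assigns every joint state a distinct embedding.

```python
import numpy as np
from itertools import product

k = 4  # number of binary latent factors

# Product-space representation: one axis per joint configuration -> 2**k dims.
joint_states = list(product([0, 1], repeat=k))
product_dim = len(joint_states)  # 16

# Factored representation: one orthogonal 2-dim one-hot block per factor -> 2k dims.
def factored_embed(state):
    return np.concatenate([np.eye(2)[v] for v in state])

E = np.stack([factored_embed(s) for s in joint_states])
factored_dim = E.shape[1]  # 8

# All 2**k joint states still receive distinct codes in the factored space.
print(product_dim, factored_dim, len({tuple(e) for e in E}))  # 16 8 16
```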
Fernando E. Rosas, Pedro A. M. Mediano, Martin Biehl, Shamil Chandaria, Daniel Polani
We introduce a novel framework to identify perception-action loops (PALOs) directly from data based on the principles of computational mechanics. Our approach is based on the notion of causal blanket, which captures sensory and active variables as dynamical sufficient statistics -- i.e. as the "differences that make a difference." Moreover, our theory provides a broadly applicable procedure to construct PALOs that requires neither a steady-state nor Markovian dynamics. Using our theory, we show that every bipartite stochastic process has a causal blanket, but the extent to which this leads to an effective PALO formulation varies depending on the integrated information of the bipartition.
Pablo A. Morales, Fernando E. Rosas
The maximum entropy principle (MEP) is one of the most prominent methods to investigate and model complex systems. Despite its popularity, the standard form of the MEP can only generate Boltzmann-Gibbs distributions, which are ill-suited for many scenarios of interest. As a principled approach to extend the reach of the MEP, this paper revisits its foundations in information geometry and shows how the geometry of curved statistical manifolds naturally leads to a generalization of the MEP based on the Rényi entropy. By establishing a bridge between non-Euclidean geometry and the MEP, our proposal sets a solid foundation for the numerous applications of the Rényi entropy, and enables a range of novel methods for complex systems analysis.
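To spell out the first claim (standard textbook material, not specific to this paper): maximising $H[p] = -\sum_i p_i \ln p_i$ subject to $\sum_i p_i = 1$ and $\sum_i p_i E_i = \langle E \rangle$ gives, via Lagrange multipliers, the Boltzmann-Gibbs form $p_i = e^{-\beta E_i}/Z$ with $Z = \sum_j e^{-\beta E_j}$. Heavier-tailed (e.g. power-law) distributions cannot arise from this variational problem, which is what motivates replacing the Shannon entropy by the Rényi entropy.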
Pedro A. M. Mediano, Fernando E. Rosas, Juan Carlos Farah, Murray Shanahan, Daniel Bor, Adam B. Barrett
Jun 18, 2021 · q-bio.NC
The apparent dichotomy between information-processing and dynamical approaches to complexity science forces researchers to choose between two diverging sets of tools and explanations, creating conflict and often hindering scientific progress. Nonetheless, given the shared theoretical goals between both approaches, it is reasonable to conjecture the existence of underlying common signatures that capture interesting behaviour in both dynamical and information-processing systems. Here we argue that a pragmatic use of Integrated Information Theory (IIT), originally conceived in theoretical neuroscience, can provide a potential unifying framework to study complexity in general multivariate systems. Furthermore, by leveraging metrics put forward by the integrated information decomposition ($Φ$ID) framework, our results reveal that integrated information can effectively capture surprisingly heterogeneous signatures of complexity -- including metastability and criticality in networks of coupled oscillators as well as distributed computation and emergent stable particles in cellular automata -- without relying on idiosyncratic, ad-hoc criteria. These results show how an agnostic use of IIT can provide important steps towards bridging the gap between informational and dynamical approaches to complex systems.
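For reference, one practical measure used in this literature (the standard whole-minus-sum formulation, stated here as background rather than as the paper's exact choice): given a bipartition of the system into parts $M^1, M^2$, integrated information is $\varphi = I(X_{t-\tau}; X_t) - \sum_{k=1,2} I(M^k_{t-\tau}; M^k_t)$, i.e. the temporal information carried by the whole beyond that carried by its parts taken separately.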
Fernando E. Rosas, Pedro A. M. Mediano, Michael Gastpar
Learning and compression are driven by the common aim of identifying and exploiting statistical regularities in data, which opens the door for fertile collaboration between these areas. A promising group of compression techniques for learning scenarios is normalised maximum likelihood (NML) coding, which provides strong guarantees for compression of small datasets -- in contrast with more popular estimators whose guarantees hold only in the asymptotic limit. Here we consider an NML-based decision strategy for supervised classification problems, and show that it attains heuristic PAC learning when applied to a wide variety of models. Furthermore, we show that the misclassification rate of our method is upper bounded by the maximal leakage, a recently proposed metric to quantify the potential of data leakage in privacy-sensitive scenarios.
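As a concrete illustration of NML coding (a standard textbook construction for the Bernoulli model class, not the paper's classifier): the NML distribution assigns a sequence its maximised likelihood, normalised by the Shtarkov sum of maximised likelihoods over all sequences of the same length.

```python
import numpy as np
from math import comb

def nml_bernoulli(x):
    """Normalised maximum likelihood probability of a binary sequence x
    under the Bernoulli model class."""
    n, k = len(x), int(sum(x))

    def ml(j, n):
        """Maximised likelihood of any length-n sequence with j ones."""
        if j in (0, n):
            return 1.0
        th = j / n
        return th**j * (1 - th)**(n - j)

    # Shtarkov sum: total maximised likelihood over all 2**n sequences,
    # grouped by their number of ones.
    C = sum(comb(n, j) * ml(j, n) for j in range(n + 1))
    return ml(k, n) / C

x = [1, 0, 1, 1, 0, 1, 1, 1]
print(nml_bernoulli(x))            # NML probability of this sequence
print(-np.log2(nml_bernoulli(x)))  # its code length in bits
```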
Fernando E. Rosas, Pedro A. M. Mediano, Andrea I. Luppi, Thomas F. Varley, Joseph T. Lizier, Sebastiano Stramaglia, Henrik J. Jensen, Daniele Marinazzo
Battiston et al. (arXiv:2110.06023) provide a comprehensive overview of how investigations of complex systems should take into account interactions between more than two elements, which can be modelled by hypergraphs and studied via topological data analysis. Following a separate line of enquiry, a broad literature has developed information-theoretic tools to characterize high-order interdependencies from observed data. While these could seem to be competing approaches aiming to address the same question, in this correspondence we clarify that this is not the case, and that a complete account of higher-order phenomena needs to embrace both.
Andrea I. Luppi, Fernando E. Rosas, Gustavo Deco, Morten L. Kringelbach, Pedro A. M. Mediano
Aug 10, 2023 · q-bio.NC
Temporal irreversibility, often referred to as the arrow of time, is a fundamental concept in statistical mechanics. Markers of irreversibility also provide a powerful characterisation of information processing in biological systems. However, current approaches tend to describe temporal irreversibility in terms of a single scalar quantity, without disentangling the underlying dynamics that contribute to irreversibility. Here we propose a broadly applicable information-theoretic framework to characterise the arrow of time in multivariate time series, which yields qualitatively different types of irreversible information dynamics. This multidimensional characterisation reveals previously unreported high-order modes of irreversibility, and establishes a formal connection between recent heuristic markers of temporal irreversibility and metrics of information processing. We demonstrate the prevalence of high-order irreversibility in the hyperactive regime of a biophysical model of brain dynamics, showing that our framework is both theoretically principled and empirically useful. This work challenges the view of the arrow of time as a monolithic entity, enhancing both our theoretical understanding of irreversibility and our ability to detect it in practical applications.
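A minimal univariate sketch of how irreversibility can be estimated from data (the paper's framework is multivariate and decomposes irreversibility into qualitatively different modes, which this toy does not attempt): compare the statistics of consecutive pairs against their time-reversed counterparts via a KL divergence, which vanishes when the pair statistics are reversible.

```python
import numpy as np

def irreversibility(seq, n_states):
    """KL divergence (nats) between forward and time-reversed pair statistics
    of a discrete time series; zero iff the pair statistics are reversible."""
    joint = np.zeros((n_states, n_states))
    for a, b in zip(seq[:-1], seq[1:]):
        joint[a, b] += 1
    joint /= joint.sum()
    mask = (joint > 0) & (joint.T > 0)
    return np.sum(joint[mask] * np.log(joint[mask] / joint.T[mask]))

rng = np.random.default_rng(0)
# A biased walk on a 3-cycle (mostly clockwise) is strongly irreversible;
# an unbiased walk on the same states satisfies detailed balance.
biased = np.cumsum(rng.choice([1, 2], size=10000, p=[0.9, 0.1])) % 3
unbiased = np.cumsum(rng.choice([1, 2], size=10000, p=[0.5, 0.5])) % 3
print(irreversibility(biased, 3))    # clearly positive
print(irreversibility(unbiased, 3))  # close to zero
```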
Abel Jansma, Pedro A. M. Mediano, Fernando E. Rosas
The partial information decomposition (PID) and its extension integrated information decomposition ($Φ$ID) are promising frameworks to investigate information phenomena involving multiple variables. An important limitation of these approaches is the high computational cost involved in their calculation. Here we leverage fundamental algebraic properties of these decompositions to enable a computationally efficient method to estimate them, which we call the fast Möbius transform. Our approach is based on a novel formula for estimating the Möbius function that circumvents important computational bottlenecks. We showcase the capabilities of this approach by presenting two analyses that would be unfeasible without this method: decomposing the information that neural activity at different frequency bands yield about the brain's macroscopic functional organisation, and identifying distinctive dynamical properties of the interactions between multiple voices in baroque music. Overall, our proposed approach illuminates the value of algebraic facets of information decomposition and opens the way to a wide range of future analyses.
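For background (the shared construction underlying both decompositions, not the authors' new estimator): PID assigns an information atom $I_\partial(\alpha)$ to each node $\alpha$ of the redundancy lattice by Möbius inversion of a cumulative redundancy function, $I_\cap(\alpha) = \sum_{\beta \preceq \alpha} I_\partial(\beta)$, equivalently $I_\partial(\alpha) = \sum_{\beta \preceq \alpha} \mu(\beta, \alpha)\, I_\cap(\beta)$, where $\mu$ is the Möbius function of the lattice. Evaluating $\mu$ over the rapidly growing lattice is the bottleneck that the fast Möbius transform addresses.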
Fernando E. Rosas
Nov 30, 2025 · q-bio.NC
Many systems of interest exhibit nested emergent layers with their own rules and regularities, and our knowledge about them seems naturally organised around these levels. This paper proposes that this type of hierarchical emergence arises as a result of underlying symmetries. By combining principles from information theory, group theory, and statistical mechanics, one finds that dynamical processes that are equivariant with respect to a symmetry group give rise to emergent macroscopic levels organised into a hierarchy determined by the subgroups of the symmetry. The same symmetries happen to also shape Bayesian beliefs, yielding hierarchies of abstract belief states that can be updated autonomously at different levels of resolution. These results are illustrated in Hopfield networks and Ehrenfest diffusion, showing that familiar macroscopic quantities emerge naturally from their symmetries. Together, these results suggest that symmetries provide a fundamental mechanism for emergence and support a structural correspondence between objective and epistemic processes, making feasible inferential problems that would otherwise be computationally intractable.
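A toy numerical version of the Ehrenfest example (my construction, intended only to illustrate the claim): the microscopic rule -- pick a particle uniformly at random and move it to the other urn -- is equivariant under permutations of particle labels, and the permutation-invariant occupancy count consequently evolves as a Markov chain in its own right, with closed-form transition probabilities.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 10                       # particles, each in urn 0 or urn 1
state = rng.integers(0, 2, n)

# Micro-dynamics: pick a particle uniformly at random and switch its urn.
# This rule is equivariant under any permutation of particle labels.
counts = []
for _ in range(200000):
    i = rng.integers(n)
    state[i] ^= 1
    counts.append(state.sum())  # macro-variable: occupancy of urn 1

# Because the dynamics is permutation-equivariant, the count k is itself
# Markov, with P(k -> k-1) = k/n and P(k -> k+1) = 1 - k/n.
counts = np.array(counts)
k = 5
down = np.mean(counts[1:][counts[:-1] == k] == k - 1)
print(down, k / n)  # empirical vs. theoretical transition probability
```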
Hardik Rajpal, Clem von Stengel, Pedro A. M. Mediano, Fernando E. Rosas, Eduardo Viegas, Pablo A. Marquet, Henrik J. Jensen
Oct 31, 2023 · q-bio.PE
At what level does selective pressure effectively act? When considering the reproductive dynamics of interacting and mutating agents, it has long been debated whether selection is better understood by focusing on the individual or if hierarchical selection emerges as a consequence of joint adaptation. Despite longstanding efforts in theoretical ecology, there is still no consensus on this fundamental issue, most likely due to the difficulty in obtaining adequate data spanning a sufficient number of generations and the lack of adequate tools to quantify the effect of hierarchical selection. Here we capitalise on recent advances in information-theoretic data analysis to advance this state of affairs by investigating the emergence of high-order structures -- such as groups of species -- in the collective dynamics of the Tangled Nature model of evolutionary ecology. Our results show that evolutionary dynamics can lead to clusters of species that act as a selective group, that acquire information-theoretic agency. Overall, our findings provide quantitative evidence supporting the relevance of high-order structures in evolutionary ecology, which can emerge even from relatively simple processes of adaptation and selection.
Fernando E. Rosas, Pedro A. M. Mediano, Michael Gastpar
Systems of interest for theoretical or experimental work often exhibit high-order interactions, corresponding to statistical interdependencies in groups of variables that cannot be reduced to dependencies in subsets of them. While still under active development, the framework of partial information decomposition (PID) has emerged as the dominant approach to conceptualise and calculate high-order interdependencies. PID approaches can be grouped into two types: directed approaches that divide variables into sources and targets, and undirected approaches that treat all variables equally. Directed and undirected approaches are usually employed to investigate different scenarios, and hence little is known about how these two types of approaches may relate to each other, or if their corresponding quantities are linked in some way. In this paper we investigate the relationship between the redundancy-synergy index (RSI) and the O-information, which are practical metrics of directed and undirected high-order interdependencies, respectively. Our results reveal tight links between these two quantities, and provide interpretations of them in terms of likelihood ratios in a hypothesis testing setting, as well as in terms of projections in information geometry.
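For reference, the two quantities compared (standard definitions from this literature): the redundancy-synergy index for a target $Y$ and sources $X_1, \dots, X_n$ is $\mathrm{RSI} = I(Y; X_1, \dots, X_n) - \sum_i I(Y; X_i)$, which is positive when synergy dominates; the O-information is $\Omega = (n-2)\, H(X_1, \dots, X_n) + \sum_i [H(X_i) - H(X_{-i})]$, which follows the opposite sign convention, being positive when redundancy dominates and negative when synergy dominates.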
Rodrigo Cofré, Rubén Herzog, Pedro A. M. Mediano, Juan Piccinini, Fernando E. Rosas, Yonatan Sanz Perl, Enzo Tagliazucchi
The scope of human consciousness includes states departing from what most of us experience as ordinary wakefulness. These altered states of consciousness constitute a prime opportunity to study how global changes in brain activity relate to different varieties of subjective experience. We consider the problem of explaining how global signatures of altered consciousness arise from the interplay between large-scale connectivity and local dynamical rules that can be traced to known properties of neural tissue. For this purpose, we advocate a research program aimed at bridging the gap between bottom-up generative models of whole-brain activity and the top-down signatures proposed by theories of consciousness. Throughout this paper, we define altered states of consciousness, discuss relevant signatures of consciousness observed in brain activity, and introduce whole-brain models to explore the mechanisms of altered consciousness from the bottom-up. We discuss the potential of our proposal in view of the current state of the art, give specific examples of how this research agenda might play out, and emphasise how a systematic investigation of altered states of consciousness via bottom-up modelling may help us better understand the biophysical, informational, and dynamical underpinnings of consciousness.
Fernando E. Rosas, Pedro A. M. Mediano, Henrik J. Jensen, Anil K. Seth, Adam B. Barrett, Robin L. Carhart-Harris, Daniel Bor
Apr 17, 2020 · q-bio.NC
The broad concept of emergence is instrumental in several of the most challenging open scientific questions -- yet, few quantitative theories of what constitutes emergent phenomena have been proposed. This article introduces a formal theory of causal emergence in multivariate systems, which studies the relationship between the dynamics of parts of a system and macroscopic features of interest. Our theory provides a quantitative definition of downward causation, and introduces a complementary modality of emergent behaviour -- which we refer to as causal decoupling. Moreover, the theory yields practical criteria that can be efficiently calculated in large systems, making our framework applicable in a range of scenarios of practical interest. We illustrate our findings in a number of case studies, including Conway's Game of Life, Reynolds' flocking model, and neural activity as measured by electrocorticography.
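One of the practical criteria in question is the quantity $\Psi = I(V_t; V_{t'}) - \sum_j I(X^j_t; V_{t'})$, whose positivity is sufficient evidence that a macroscopic feature $V$ is causally emergent over the parts $X^j$. Below is a minimal estimator sketch (my implementation under a Gaussian assumption, with toy data; not the authors' code):

```python
import numpy as np

def gaussian_mi(x, y):
    """Gaussian estimate of I(X;Y) in nats; x, y are (samples, dims) arrays."""
    h = lambda z: 0.5 * np.linalg.slogdet(
        2 * np.pi * np.e * np.atleast_2d(np.cov(z, rowvar=False)))[1]
    return h(x) + h(y) - h(np.hstack([x, y]))

def psi(X, V, lag=1):
    """First-order criterion Psi = I(V_t; V_t') - sum_j I(X_t^j; V_t').
    Psi > 0 is sufficient (though not necessary) evidence of emergence."""
    Vp = V[lag:]
    value = gaussian_mi(V[:-lag], Vp)
    for j in range(X.shape[1]):
        value -= gaussian_mi(X[:-lag, j:j + 1], Vp)
    return value

# Toy usage: memoryless micro data, so Psi should be close to zero here;
# real analyses would use measured micro/macro time series.
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 4))
V = X.sum(axis=1, keepdims=True)  # a toy macroscopic feature
print(psi(X, V))
```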
Pedro A. M. Mediano, Fernando E. Rosas, Andrea I. Luppi, Henrik J. Jensen, Anil K. Seth, Adam B. Barrett, Robin L. Carhart-Harris, Daniel Bor
Nov 12, 2021 · q-bio.NC
Emergence is a profound subject that straddles many scientific disciplines, including the formation of galaxies and how consciousness arises from the collective activity of neurons. Despite the broad interest in this concept, the study of emergence has suffered from a lack of formalisms that could be used to guide discussions and advance theories. Here we summarise, elaborate on, and extend a recent formal theory of causal emergence based on information decomposition, which is quantifiable and amenable to empirical testing. This theory relates emergence with information about a system's temporal evolution that cannot be obtained from the parts of the system separately. This article provides an accessible but rigorous introduction to the framework, discussing the merits of the approach in various scenarios of interest. We also discuss several interpretation issues and potential misunderstandings, while highlighting the distinctive benefits of this formalism.
Daniele Marinazzo, Jan Van Roozendaal, Fernando E. Rosas, Massimo Stella, Renzo Comolatti, Nigel Colenbier, Sebastiano Stramaglia, Yves Rosseel
Psychological network approaches propose to see symptoms or questionnaire items as interconnected nodes, with links between them reflecting pairwise statistical dependencies estimated from cross-sectional, time-series, or panel data. These networks constitute an established methodology to assess the interactions and relative importance of nodes/indicators, providing an important complement to other approaches such as factor analysis. However, focusing the modelling solely on pairwise relationships can neglect potentially critical information shared by groups of three or more variables in the form of higher-order interdependencies. To overcome this important limitation, here we propose an information-theoretic framework based on hypergraphs as psychometric models. As edges in hypergraphs are capable of encompassing several nodes together, this extension can thus provide a richer representation of the interactions that may exist among sets of psychological variables. Our results show how psychometric hypergraphs can highlight meaningful redundant and synergistic interactions in both simulated and re-analysed state-of-the-art psychometric datasets. Overall, our framework extends current network approaches while leading to new ways of assessing the data that differ at their core from other methods, extending the psychometric toolbox and opening promising avenues for future investigation.
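One standard building block for weighting such hyperedges (a common choice in this literature, stated as background rather than as the paper's full framework): the interaction information of a triplet of items, $I(X; Y; Z) = I(X; Y) - I(X; Y \mid Z)$, is positive when the triplet is redundancy-dominated and negative when it is synergy-dominated, directly yielding signed weights for three-node hyperedges.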