Tobias Jung, Daniel Polani, Peter Stone
This paper develops generalizations of empowerment to continuous states. Empowerment is a recently introduced information-theoretic quantity motivated by hypotheses about the efficiency of the sensorimotor loop in biological organisms, but also by considerations stemming from curiosity-driven learning. Empowerment measures, for agent-environment systems with stochastic transitions, how much influence an agent has on its environment, but only that influence which can be sensed by the agent's sensors. It is an information-theoretic generalization of the joint controllability (influence on the environment) and observability (measurement by sensors) of the environment by the agent, where controllability and observability are usually defined in control theory via the dimensionality of the control/observation spaces. Earlier work has shown that empowerment has various interesting and relevant properties: for example, it allows salient states to be identified using only the dynamics, and it can act as an intrinsic reward without requiring an external reward signal. However, in this previous work empowerment was limited to small-scale, discrete domains, and state transition probabilities were assumed to be known. The goal of this paper is to extend empowerment to the significantly more important and relevant case of continuous vector-valued state spaces and initially unknown state transition probabilities. The continuous state space is addressed by Monte-Carlo approximation; the unknown transitions are addressed by model learning and prediction, for which we apply Gaussian process regression with iterated forecasting. In a number of well-known continuous control tasks we examine the dynamics induced by empowerment and include an application to exploration and online model learning.
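As a concrete illustration of the core quantity (not the paper's continuous-state, model-learning algorithm): for a discrete action-to-sensor channel with known transition matrix p(s'|a), empowerment is the channel capacity, which can be computed with the standard Blahut-Arimoto iteration. A minimal sketch, with function names of my own choosing:

```python
import numpy as np

def _kl_rows(p, r):
    # Per-action KL divergence D(p(.|a) || r), in nats; 0*log(0) treated as 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        t = np.where(p > 0, p * np.log(p / r), 0.0)
    return t.sum(axis=1)

def empowerment_bits(p_s_given_a, iters=200):
    """Empowerment of a discrete memoryless channel: the capacity
    max_{p(a)} I(A; S'), computed via the Blahut-Arimoto iteration.
    Rows index actions, columns index successor sensor states."""
    p = np.asarray(p_s_given_a, dtype=float)
    q = np.full(p.shape[0], 1.0 / p.shape[0])  # action distribution p(a)
    for _ in range(iters):
        r = q @ p                        # sensor marginal under current q
        q = q * np.exp(_kl_rows(p, r))   # reweight actions by their KL gain
        q /= q.sum()
    r = q @ p
    return float(q @ _kl_rows(p, r)) / np.log(2)  # I(A; S') in bits
```

For a noiseless binary channel (identity matrix) this yields 1 bit; for a channel whose rows are identical it yields 0, since no action has any sensed effect.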
Christoph Salge, Cornelius Glackin, Daniel Polani
This book chapter is an introduction to and an overview of the information-theoretic, task-independent utility function "Empowerment", which is defined as the channel capacity between an agent's actions and an agent's sensors. It quantifies how much influence and control an agent has over the world it can perceive. The chapter discusses the general idea behind empowerment as an intrinsic motivation and showcases several previous applications of empowerment to demonstrate how it can be applied to different sensor-motor configurations, and how the same formalism can lead to different observed behaviors. Furthermore, we also present a fast approximation for empowerment in the continuous domain.
Christoph Salge, Daniel Polani
One of the remarkable feats of intelligent life is that it restructures the world it lives in for its own benefit. This extended abstract outlines how the information-theoretic principle of empowerment, as an intrinsic motivation, can be used to restructure the environment an agent lives in. We present a first qualitative evaluation of how an agent in a 3d-gridworld builds a staircase-like structure, which reflects the agent's embodiment.
Martin Biehl, Daniel Polani
This is a contribution to the formalization of the concept of agents in multivariate Markov chains. Agents are commonly defined as entities that act, perceive, and are goal-directed. In a multivariate Markov chain (e.g. a cellular automaton) the transition matrix completely determines the dynamics, which seems to contradict the possibility of acting entities within such a system. Here we present definitions of actions and perceptions within multivariate Markov chains based on entity-sets. An entity-set represents a largely independent choice of a set of spatiotemporal patterns that are considered to be all the entities within the Markov chain. For example, the entity-set can be chosen according to operational closure conditions or complete specific integration. Importantly, the perception-action loop also induces an entity-set and is a multivariate Markov chain. We then show that our definition of actions leads to non-heteronomy and that our definition of perceptions specializes to the usual concept of perception in the perception-action loop.
Klinsmann Agyei, Pouria Sarhadi, Daniel Polani
Over the past decade, remarkable progress has been made in adopting deep neural networks to enhance the performance of conventional reinforcement learning. A notable milestone was the development of Deep Q-Networks (DQN), which achieved human-level performance across a range of Atari games, demonstrating the potential of deep learning to stabilise and scale reinforcement learning. Subsequently, extensions to continuous control algorithms paved the way for a new paradigm in control, one that has attracted broader attention than any classical control approach in recent literature. These developments also demonstrated strong potential for advancing data-driven, model-free algorithms for control and for achieving higher levels of autonomy. However, the application of these methods has remained largely confined to simulated and gaming environments, with ongoing efforts to extend them to real-world applications. Before such deployment can be realised, a solid and quantitative understanding of their performance on applied control problems is necessary. This paper conducts a comparative analysis of these approaches on four diverse benchmark problems with implementation results. This analysis offers a scrutinising and systematic evaluation to shed light on the real-world capabilities and limitations of deep reinforcement learning methods in applied control settings.
Leonardo Christov-Moore, Arthur Juliani, Alex Kiefer, Joel Lehman, Nicco Reggente, B. Scot Rousse, Adam Safron, Nicolás Hinrichs, Daniel Polani, Antonio Damasio
As artificial agents enter open-ended physical environments -- eldercare, disaster response, and space missions -- they must persist under uncertainty while providing reliable care. Yet current systems struggle to generalize across distribution shifts and lack intrinsic motivation to preserve the well-being of others. Vulnerability and mortality are often seen as constraints to be avoided, yet organisms survive and provide care in an open-ended world with relative ease and efficiency. We argue that generalization and care arise from conditions of physical embodiment: being-in-the-world (the agent is a part of the environment) and being-towards-death (unless counteracted, the agent drifts toward terminal states). These conditions necessitate a homeostatic drive to maintain oneself and maximize the future capacity to continue doing so. Fulfilling this drive over long time horizons in multi-agent environments necessitates robust causal modeling of self and others' embodiment and jointly achievable future states. Because embodied agents are part of the environment, with the self delimited by reliable control, empowering others can expand self-boundaries, enabling other-regard. This provides a path from embodiment toward generalization and care based in shared constraints. We outline a reinforcement-learning framework for examining these questions. Homeostatic mortal agents continually learning in open-ended environments may offer efficient robustness and trustworthy alignment.
Andrei D. Robu, Christoph Salge, Chrystopher L. Nehaniv, Daniel Polani
Jun 17, 2019 · q-bio.OT
Being able to measure time, whether directly or indirectly, is a significant advantage for an organism. It allows for the timely reaction to regular or predicted events, reducing the pressure for fast processing of sensory input. Thus, clocks are ubiquitous in biology. In the present paper, we consider minimal abstract pure clocks in different configurations and investigate their characteristic dynamics. We are especially interested in optimally time-resolving clocks. Among these, we find fundamentally diametral clock characteristics, such as oscillatory behavior for purely local time measurement or decay-based clocks measuring time periods of a scale global to the problem. We include also sets of independent clocks ("clock bags"), sequential cascades of clocks and composite clocks with controlled dependency. Clock cascades show a "condensation effect" and the composite clock shows various regimes of markedly different dynamics.
Marcus M. Scheunemann, Sander G. van Dijk, Rebecca Miko, Daniel Barry, George M. Evans, Alessandra Rossi, Daniel Polani
We participated in the RoboCup 2018 competition in Montreal with our newly developed BoldBot based on the Darwin-OP and mostly self-printed custom parts. This paper is about the lessons learnt from that competition and further developments for the RoboCup 2019 competition. Firstly, we briefly introduce the team along with an overview of past achievements. We then present a simple, standalone 2D simulator we use for simplifying the entry for new members by making basic RoboCup concepts quickly accessible. We describe our approach to semantic segmentation for our vision system used in the 2018 competition, which replaced the lookup-table (LUT) implementation we had before. We also discuss the extra structural support we plan to add to the printed parts of the BoldBot and our transition to ROS 2 as our new middleware. Lastly, we present a collection of open-source contributions of our team.
Wooyoung Chung, Daniel Polani, Stas Tiomkin
Incorporating prior knowledge into a data-driven modeling problem can drastically improve performance, reliability, and generalization beyond the training sample. The stronger the structural properties, the more effective these improvements become. Manifolds are a powerful nonlinear generalization of Euclidean space for modeling finite-dimensional data. Imposing a group structure on such constrained systems strengthens these structural properties further, converting the manifolds into Lie manifolds. Their range of applications is very wide and includes the important case of robotic tasks. Canonical Correlation Analysis (CCA) constructs a hierarchical sequence of maximal correlations between two paired data sets in Euclidean spaces. We present a method that generalizes this concept to Lie manifolds and demonstrate its efficacy through the substantial improvements it achieves in making structure-consistent predictions about changes in the state of a robotic hand.
Fernando E. Rosas, Bernhard C. Geiger, Andrea I Luppi, Anil K. Seth, Daniel Polani, Michael Gastpar, Pedro A. M. Mediano
Understanding the functional architecture of complex systems is crucial to illuminate their inner workings and to enable effective methods for their prediction and control. Recent advances have introduced tools to characterise emergent macroscopic levels; however, while these approaches are successful in identifying when emergence takes place, they are limited in the extent to which they can determine how it does. Here we address this limitation by developing a computational approach to emergence, which characterises macroscopic processes in terms of their computational capabilities. Concretely, we articulate a view on emergence based on how software works, rooted in a mathematical formalism that describes how macroscopic processes can express self-contained informational, interventional, and computational properties. This framework establishes a hierarchy of nested self-contained processes that determines what computations take place at what level, which in turn delineates the functional architecture of a complex system. The approach is illustrated on paradigmatic models from the statistical physics and computational neuroscience literature, which are shown to exhibit macroscopic processes akin to software in human-engineered systems. Overall, this framework enables a deeper understanding of the multi-level structure of complex systems, revealing specific ways in which they can be efficiently simulated, predicted, and controlled.
Stavros Anagnou, Daniel Polani, Christoph Salge
We examine the effect of noise on societies of agents using an agent-based model of evolutionary norm emergence. Generally, we see that noisy societies are more selfish, smaller, and discontented, and are caught in rounds of perpetual punishment that prevent them from flourishing. Surprisingly, despite the effect of noise on the population, it does not seem to evolve away. We carry out further analysis and provide reasons why this may be the case. Furthermore, we claim that our framework, which evolves the noise/ambiguity of norms, may be a new way to model the tight/loose framework of norms, suggesting that despite ambiguous norms' detrimental effect on society, evolution does not favour clarity.
Ronit Purian, Daniel Polani
After a decade of on-demand mobility services that change spatial behaviors in metropolitan areas, the Shared Autonomous Vehicle (SAV) service is expected to increase traffic congestion and unequal access to transport services. A paradigm of scheduled supply that is aware of demand, but not on-demand, is proposed, introducing coordination, social and behavioral understanding, urban cognition and empowerment of agents into a novel informational framework. Daily routines and other patterns of spatial behavior outline a fundamental demand layer in a supply-oriented paradigm that captures urban dynamics and spatial-temporal behaviors, mostly in groups. Rather than real-time requests and instant responses that reward unplanned actions, and beyond the mere reservation of trips in timetables, the intention is to capture mobility flows in trips scheduled throughout the day, considering time of day, places, passengers, etc. Regulating goal-directed behaviors and caring for service resources and the overall system welfare is proposed to minimize uncertainty, considering the capacity of mobility interactions to hold value, i.e., Motility as a Service (MaaS). The principal-agent problem in the smart city is a problem of collective action among service providers and users, who create expectations based on previous actions and reactions in mutual systems. Planned behavior that accounts for service coordination is expected to stabilize excessive rides and traffic load, and to induce a cognitive gain, thus balancing information load and facilitating cognitive effort.
Hippolyte Charvin, Nicola Catenacci Volpi, Daniel Polani
Extraction of structure, in particular of group symmetries, is increasingly crucial to understanding and building intelligent models. In particular, some information-theoretic models of parsimonious learning have been argued to induce invariance extraction. Here, we formalise these arguments from a group-theoretic perspective. We then extend them to the study of more general probabilistic symmetries, through compressions preserving geometric measures of complexity. More precisely, our framework implements a trade-off between compression and preservation of the divergence from a given hierarchical model, yielding a novel generalisation of the Information Bottleneck framework. Through appropriate choices of hierarchical models, we fully characterise (in the discrete and full support case) channel invariance, channel equivariance and distribution invariance under permutation. Allowing imperfect divergence preservation then leads to principled definitions of "soft symmetries", where the "coarseness" corresponds to the degree of compression of the system. In simple synthetic experiments, we demonstrate that our method successively recovers, at increasingly compressed "resolutions", nested but increasingly perturbed equivariances, where new equivariances emerge at bifurcation points of the trade-off parameter. Our framework suggests a new path for the extraction of generalised probabilistic symmetries.
Stas Tiomkin, Ilya Nemenman, Daniel Polani, Naftali Tishby
Biological systems often choose actions without an explicit reward signal, a phenomenon known as intrinsic motivation. The computational principles underlying this behavior remain poorly understood. In this study, we investigate an information-theoretic approach to intrinsic motivation, based on maximizing an agent's empowerment (the mutual information between its past actions and future states). We show that this approach generalizes previous attempts to formalize intrinsic motivation, and we provide a computationally efficient algorithm for computing the necessary quantities. We test our approach on several benchmark control problems, and we explain its success in guiding intrinsically motivated behaviors by relating our information-theoretic control function to fundamental properties of the dynamical system representing the combined agent-environment system. This opens the door for designing practical artificial, intrinsically motivated controllers and for linking animal behaviors to their dynamical properties.
Malte Harder, Christoph Salge, Daniel Polani
We define a measure of redundant information based on projections in the space of probability distributions. Redundant information between random variables is information that is shared between those variables; but in contrast to mutual information, redundant information denotes information that is shared about the outcome of a third variable. Formalizing this concept, and being able to measure it, is required for the non-negative decomposition of mutual information into redundant and synergistic information. Previous attempts to formalize redundant or synergistic information struggle to capture some desired properties. We introduce a new formalism for redundant information and prove that it satisfies all the necessary properties outlined in earlier work, as well as an additional criterion that we propose as necessary to capture redundancy. We also demonstrate the behaviour of this new measure for several examples, compare it to previous measures, and apply it to the decomposition of transfer entropy.
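To see why mutual information alone cannot separate redundant from synergistic contributions (the problem the measure above addresses), consider the classical XOR and fully-correlated examples. The sketch below illustrates the problem setting with a plug-in estimator; it does not implement the paper's projection-based measure:

```python
from collections import Counter
from math import log2

def mutual_info(pairs):
    """Plug-in estimate of I(U;V) in bits from equally weighted (u, v) samples."""
    n = len(pairs)
    pu = Counter(u for u, _ in pairs)
    pv = Counter(v for _, v in pairs)
    puv = Counter(pairs)
    return sum(c / n * log2(c * n / (pu[u] * pv[v])) for (u, v), c in puv.items())

bits = [(a, b) for a in (0, 1) for b in (0, 1)]
# XOR: Y = X1 ^ X2 over uniform bits. The sources jointly share 1 bit with Y,
# yet each source alone shares 0 bits: pure synergy, no redundancy.
joint = mutual_info([((a, b), a ^ b) for a, b in bits])    # 1.0 bit
single = mutual_info([(a, a ^ b) for a, b in bits])        # 0.0 bits
# Fully correlated sources: X1 = X2 = Y. Each source alone already carries
# the full 1 bit about Y: pure redundancy, no synergy.
redundant = mutual_info([(x, x) for x in (0, 1)])          # 1.0 bit
```

In both cases each pairwise mutual information and the joint mutual information are identical in form, yet the underlying sharing structure differs, which is exactly what a redundancy measure must disentangle.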
Marcus M. Scheunemann, Christoph Salge, Daniel Polani, Kerstin Dautenhahn
A challenge in using robots in human-inhabited environments is to design behavior that is engaging, yet robust to the perturbations induced by human interaction. Our idea is to imbue the robot with intrinsic motivation (IM) so that it can handle new situations, appear as a genuine social other to humans, and thus be of more interest to a human interaction partner. Human-robot interaction (HRI) experiments mainly focus on scripted or teleoperated robots that mimic characteristics such as IM to control isolated behavior factors. This article presents a "robotologist" study design that allows autonomously generated behaviors to be compared with each other and, for the first time, evaluates the human perception of IM-based generated behavior in robots. We conducted a within-subjects user study (N=24) in which participants interacted with a fully autonomous Sphero BB8 robot under different behavioral regimes: one realizing an adaptive, intrinsically motivated behavior and the other being reactive, but not adaptive. The robot and its behaviors are intentionally kept minimal to concentrate on the effect induced by IM. A quantitative analysis of post-interaction questionnaires showed a significantly higher perception of the dimension "Warmth" compared to the reactive baseline behavior. Warmth is considered a primary dimension for social attitude formation in human social cognition; a human perceived as warm (friendly, trustworthy) experiences more positive social interactions.
Nihat Ay, Daniel Polani, Nathaniel Virgo
We offer a new approach to the information decomposition problem in information theory: given a 'target' random variable co-distributed with multiple 'source' variables, how can we decompose the mutual information into a sum of non-negative terms that quantify the contributions of each random variable, not only individually but also in combination? We derive our decomposition from cooperative game theory. It can be seen as assigning a "fair share" of the mutual information to each combination of the source variables. Our decomposition is based on a different lattice from the usual 'partial information decomposition' (PID) approach, and as a consequence our decomposition has a smaller number of terms: it has analogs of the synergy and unique information terms, but lacks terms corresponding to redundancy. Because of this, it is able to obey equivalents of the axioms known as 'local positivity' and 'identity', which cannot be simultaneously satisfied by a PID measure.
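The "fair share" idea from cooperative game theory can be illustrated with the classical Shapley value, treating I(Y; X_S) as the value of a coalition S of sources. This is only a stand-in illustration of the game-theoretic intuition, not the paper's lattice-based construction; all names below are my own:

```python
import itertools
from collections import Counter
from math import factorial, log2

def info_bits(samples, subset):
    """I(Y ; X_subset) in bits from equally weighted samples (x_1, ..., x_k, y)."""
    n = len(samples)
    jc = Counter((tuple(s[i] for i in subset), s[-1]) for s in samples)
    xc = Counter(tuple(s[i] for i in subset) for s in samples)
    yc = Counter(s[-1] for s in samples)
    return sum(c / n * log2(c * n / (xc[x] * yc[y])) for (x, y), c in jc.items())

def shapley_shares(samples, k):
    """Shapley-value allocation of the total I(Y; X_1..X_k): each source's
    share is its marginal information contribution, averaged over all orders
    in which the sources could be added to the coalition."""
    shares = []
    for i in range(k):
        others = [j for j in range(k) if j != i]
        val = 0.0
        for r in range(k):
            for S in itertools.combinations(others, r):
                w = factorial(len(S)) * factorial(k - len(S) - 1) / factorial(k)
                val += w * (info_bits(samples, S + (i,)) - info_bits(samples, S))
        shares.append(val)
    return shares

# XOR target: neither source is informative alone, so the 1 bit of joint
# information is split evenly between the two sources.
xor_samples = [(a, b, a ^ b) for a in (0, 1) for b in (0, 1)]
shares = shapley_shares(xor_samples, 2)
```

By construction the shares sum to the full joint mutual information (the Shapley efficiency axiom), which mirrors the non-negative, exhaustive decomposition the paper aims for.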
Andres C. Burgos, Daniel Polani
In a previous study, we considered an information-theoretic model of code evolution. In it, agents obtain information about their (common) environment by perceiving the messages of other agents, as determined by an interaction probability (the structure of the population). For an agent to understand another agent's messages, the former must either know the identity of the latter, or the code producing the messages must be universally interpretable. A universal code, however, introduces a vulnerability: a parasitic entity can take advantage of it. Here, we investigate this problem. In our specific setting, we consider a parasite to be an agent that tries to inflict as much damage as possible on the mutual understanding of the population (i.e. the parasite acts as a disinformation agent). We show that, after being introduced into the population, the parasite adopts a code that captures the information about the environment that is missing in the population. Such an agent would be of great value, but only if the rest of the population could understand its messages. However, it is of little use here, since the parasite utilises the most common messages in the population to express different concepts. Now we let the population respond by updating their codes such that, in this arms race, they again maximise their mutual understanding. As a result, there is a code drift in the population whereby the messages utilised by the parasite are avoided. A consequence of this is that the information that the parasite possesses but the agents lack becomes understandable and readily available.
Ole Steuernagel, Daniel Polani
Some microbial organisms are known to randomly slip into and out of hibernation, irrespective of environmental conditions [1]. In a (genetically) uniform population a typically very small subpopulation becomes metabolically inactive whereas the majority subpopulation remains active and grows. Bacteria such as E. coli, Staphylococcus aureus (MRSA-superbug), Mycobacterium tuberculosis, and Pseudomonas aeruginosa [1-3] show persistence. It can render bacteria less vulnerable in adverse environments [1, 4, 5] and their effective eradication through medication more difficult [2, 3, 6]. Here we show that medication treatment regimes may have to be modified when persistence is taken into account, and we characterize optimal approaches assuming that the total medication dose is constrained. The determining factors are cumulative toxicity, eradication power of the medication and bacterial response timescales. Persistent organisms have to be fought using tailored eradication strategies which display two fundamental characteristics. Ideally, the treatment time should be significantly longer than in the case without persistence, with the medication uniformly spread out over time; however, if treatment time has to be limited, then the application of medication has to be concentrated towards the beginning and end of the treatment. These findings deviate from current clinical practice, and may therefore help to optimize and simplify treatments. Our use of multi-objective optimization [7] to map out the optimal strategies can be generalized to other related problems.
Christoph Salge, Cornelius Glackin, Daniel Polani
One aspect of intelligence is the ability to restructure your own environment so that the world you live in becomes more beneficial to you. In this paper we investigate how the information-theoretic measure of agent empowerment can provide a task-independent, intrinsic motivation to restructure the world. We show how changes in embodiment and in the environment change the resulting behaviour of the agent and the artefacts left in the world. For this purpose, we introduce an approximation of the established empowerment formalism based on sparse sampling, which is simpler and significantly faster to compute for deterministic dynamics. Sparse sampling also introduces a degree of randomness into the decision-making process, which turns out to be beneficial in some cases. We then utilize the measure to generate agent behaviour for different agent embodiments in a Minecraft-inspired three-dimensional block world. The paradigmatic results demonstrate that empowerment can be used as a suitable generic intrinsic motivation not only to generate actions in given static environments, as shown in the past, but also to modify existing environmental conditions. In doing so, the emerging strategies to modify an agent's environment turn out to be meaningful to the specific agent capabilities, i.e., de facto to its embodiment.
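The sparse-sampling approximation for deterministic dynamics can be sketched as: sample random action sequences from a state, count the distinct end states they reach, and take the logarithm of that count. A minimal sketch on a hypothetical 5x5 grid world (the grid size, action set, and function names are my own example choices, not the paper's setup):

```python
import random
from math import log2

def grid_step(s, a, size=5):
    # Deterministic dynamics: axis-aligned moves, clipped at the walls.
    moves = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0), "stay": (0, 0)}
    dx, dy = moves[a]
    return (min(size - 1, max(0, s[0] + dx)), min(size - 1, max(0, s[1] + dy)))

def sparse_empowerment(state, step, actions, horizon, n_samples, rng):
    """Sparse-sampling approximation of n-step empowerment for deterministic
    dynamics: log2 of the number of distinct end states reached by randomly
    sampled action sequences, instead of enumerating all |A|^n of them."""
    endpoints = set()
    for _ in range(n_samples):
        s = state
        for _ in range(horizon):
            s = step(s, rng.choice(actions))
        endpoints.add(s)
    return log2(len(endpoints))

rng = random.Random(0)
actions = ["N", "S", "E", "W", "stay"]
center = sparse_empowerment((2, 2), grid_step, actions, 2, 400, rng)
corner = sparse_empowerment((0, 0), grid_step, actions, 2, 400, rng)
# The centre can reach more distinct states within the horizon than a corner,
# so its approximate empowerment is higher.
```

The random sampling is the source of the decision noise mentioned above: two evaluations with different seeds can rank borderline states differently, which is where the beneficial randomness in the agent's behaviour comes from.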