Fabian Jirasek, Robert Bamler, Sophie Fellenz, Michael Bortz, Marius Kloft, Stephan Mandt, Hans Hasse
Predictive models of thermodynamic properties of mixtures are paramount in chemical engineering and chemistry. Classical thermodynamic models are successful in generalizing over (continuous) conditions like temperature and concentration. Matrix completion methods (MCMs) from machine learning, on the other hand, successfully generalize over (discrete) binary systems; these MCMs can make predictions without any data for a given binary system by implicitly learning commonalities across systems. In the present work, we combine the strengths of both worlds in a hybrid approach. The underlying idea is to predict the pair-interaction energies, as they are used in essentially all physical models of liquid mixtures, by an MCM. As an example, we embed an MCM into UNIQUAC, a widely used physical model for the Gibbs excess energy. We train the resulting hybrid model in a Bayesian machine-learning framework on experimental data for activity coefficients in binary systems of 1146 components from the Dortmund Data Bank. We thereby obtain, for the first time, a complete set of UNIQUAC parameters for all binary systems of these components, which allows us to predict, in principle, activity coefficients at arbitrary temperature and composition for any combination of these components, not only for binary but also for multicomponent systems. The hybrid model even outperforms the best available physical model for predicting activity coefficients, the modified UNIFAC (Dortmund) model.
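The central mechanism, predicting the sparsely observed matrix of pair-interaction energies from low-rank structure, can be illustrated with a minimal matrix-completion sketch. The numpy code below performs MAP-style low-rank factorization on synthetic data; all sizes and variable names are placeholders, and the paper's actual model is Bayesian and embeds the predicted energies inside UNIQUAC rather than fitting energies directly.

```python
import numpy as np

rng = np.random.default_rng(0)

n_components, rank = 50, 4  # hypothetical sizes, not the paper's
# synthetic ground-truth pair-interaction energy matrix with low-rank structure
U_true = rng.normal(size=(n_components, rank)) @ rng.normal(size=(rank, n_components))
mask = rng.random((n_components, n_components)) < 0.2  # sparsely observed binary systems

A = rng.normal(scale=0.1, size=(n_components, rank))  # per-component row features
B = rng.normal(scale=0.1, size=(n_components, rank))  # per-component column features
lr, lam = 0.01, 0.1  # step size and L2 (Gaussian-prior) strength

for step in range(2000):
    E = (A @ B.T - U_true) * mask        # residual on observed entries only
    A -= lr * (E @ B + lam * A)          # gradient of squared loss plus prior
    B -= lr * (E.T @ A + lam * B)

U_pred = A @ B.T  # predictions for all pairs, including unobserved systems
```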
Stephan Mandt, Adrian E. Feiguin, Salvatore R. Manmana
Motivated by the recent experimental observation of negative absolute temperature states in systems of ultracold atomic gases in optical lattices [Braun et al., Science 339, 52 (2013)], we investigate theoretically the formation of these states. More specifically, we consider the relaxation after a sudden inversion of the external parabolic confining potential in the one-dimensional inhomogeneous Bose-Hubbard model. First, we focus on the integrable hard-core boson limit which allows us to treat large systems and arbitrarily long times, providing convincing numerical evidence for relaxation to a generalized Gibbs ensemble at negative temperature T<0, a notion we define in this context. Second, going beyond one dimension, we demonstrate that the emergence of negative temperature states can be understood in a dual way in terms of positive temperatures, which relies on a dynamic symmetry of the Hubbard model. We complement the study by exact diagonalization simulations at finite values of the on-site interaction.
Stephan Mandt, Florian Wenzel, Shinichi Nakajima, John P. Cunningham, Christoph Lippert, Marius Kloft
Linear Mixed Models (LMMs) are important tools in statistical genetics. When used for feature selection, they allow one to find a sparse set of genetic traits that best predict a continuous phenotype of interest, while simultaneously correcting for various confounding factors such as age, ethnicity, and population structure. Formulated as models for linear regression, LMMs have been restricted to continuous phenotypes. We introduce the Sparse Probit Linear Mixed Model (Probit-LMM), which generalizes the LMM modeling paradigm to binary phenotypes. As a technical challenge, the model no longer possesses a closed-form likelihood function. In this paper, we present a scalable approximate inference algorithm that lets us fit the model to high-dimensional data sets. We show on three real-world examples from different domains that, in the setup of binary labels, our algorithm leads to better prediction accuracies and also selects features that are less correlated with the confounding factors.
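To make the model concrete (this is the generative story, not the paper's inference algorithm), the toy sampler below draws binary phenotypes as the sign of a sparse linear effect plus a correlated random effect whose covariance encodes confounding, plus probit noise. All sizes and the low-rank form of the similarity matrix are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 1000                      # samples and genetic features (made-up sizes)
X = rng.normal(size=(n, d))

w = np.zeros(d)
w[:10] = rng.normal(size=10)          # sparse true effect vector

# confounding (e.g., population structure) as a low-rank similarity matrix K
Z = rng.normal(size=(n, 5))
K = Z @ Z.T / 5 + 1e-6 * np.eye(n)

u = rng.multivariate_normal(np.zeros(n), K)   # correlated random effect
eps = rng.normal(size=n)                      # i.i.d. probit noise
y = np.sign(X @ w + u + eps)                  # binary phenotype in {-1, +1}
```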
Griffin Mooers, Jens Tuyls, Stephan Mandt, Michael Pritchard, Tom Beucler
While cloud-resolving models can explicitly simulate the details of small-scale storm formation and morphology, these details are often ignored by climate models for lack of computational resources. Here, we explore the potential of generative modeling to cheaply recreate small-scale storms by designing and implementing a Variational Autoencoder (VAE) that performs structural replication, dimensionality reduction, and clustering of high-resolution vertical velocity fields. Trained on ~6×10^6 samples spanning the globe, the VAE successfully reconstructs the spatial structure of convection, performs unsupervised clustering of convective organization regimes, and identifies anomalous storm activity, confirming the potential of generative modeling to power stochastic parameterizations of convection in climate models.
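For readers unfamiliar with the architecture class, the PyTorch sketch below shows a minimal VAE: an encoder that predicts a Gaussian posterior over latents, a reparameterized sample, a decoder, and the negative ELBO as the training loss. Layer sizes and the flattened input dimension are placeholders, not the architecture used for the vertical velocity fields in the paper.

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    """Minimal VAE sketch; all sizes are placeholders."""
    def __init__(self, n_in=128 * 30, n_latent=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n_in, 512), nn.ReLU())
        self.mu = nn.Linear(512, n_latent)
        self.logvar = nn.Linear(512, n_latent)
        self.dec = nn.Sequential(nn.Linear(n_latent, 512), nn.ReLU(), nn.Linear(512, n_in))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
        return self.dec(z), mu, logvar

def neg_elbo(x_hat, x, mu, logvar):
    recon = ((x_hat - x) ** 2).sum(dim=1).mean()  # Gaussian likelihood up to constants
    kl = -0.5 * (1 + logvar - mu ** 2 - logvar.exp()).sum(dim=1).mean()
    return recon + kl
```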
Jakub Swiatkowski, Kevin Roth, Bastiaan S. Veeling, Linh Tran, Joshua V. Dillon, Jasper Snoek, Stephan Mandt, Tim Salimans, Rodolphe Jenatton, Sebastian Nowozin
Variational Bayesian Inference is a popular methodology for approximating posterior distributions over Bayesian neural network weights. Recent work developing this class of methods has explored ever richer parameterizations of the approximate posterior in the hope of improving performance. In contrast, here we share a curious experimental finding that suggests instead restricting the variational distribution to a more compact parameterization. For a variety of deep Bayesian neural networks trained using Gaussian mean-field variational inference, we find that the posterior standard deviations consistently exhibit strong low-rank structure after convergence. This means that by decomposing these variational parameters into a low-rank factorization, we can make our variational approximation more compact without decreasing the models' performance. Furthermore, we find that such factorized parameterizations improve the signal-to-noise ratio of stochastic gradient estimates of the variational lower bound, resulting in faster convergence.
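The finding is straightforward to probe numerically: collect a layer's converged posterior standard deviations into a matrix and inspect a truncated SVD. Since we have no trained network at hand here, the numpy sketch below synthesizes a nearly rank-one positive matrix as a stand-in and measures the reconstruction error of a rank-k factorization.

```python
import numpy as np

rng = np.random.default_rng(0)

# stand-in for a layer's posterior std-dev matrix (out_dim x in_dim); in the
# paper these values come from converged mean-field variational inference
sigma = np.outer(rng.uniform(0.5, 1.5, 256), rng.uniform(0.5, 1.5, 512))
sigma += 0.01 * rng.random(sigma.shape)

U, s, Vt = np.linalg.svd(sigma, full_matrices=False)
k = 1
sigma_k = (U[:, :k] * s[:k]) @ Vt[:k]  # compact rank-k parameterization
rel_err = np.linalg.norm(sigma - sigma_k) / np.linalg.norm(sigma)
print(f"relative error of rank-{k} approximation: {rel_err:.4f}")
```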
Fabian Jirasek, Robert Bamler, Stephan Mandt
We present a generic way to hybridize physical and data-driven methods for predicting physicochemical properties. The approach 'distills' the physical method's predictions into a prior model and combines it with sparse experimental data using Bayesian inference. We apply the new approach to predict activity coefficients at infinite dilution and obtain significant improvements compared to the data-driven and physical baselines and established ensemble methods from the machine learning literature.
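The combination step can be illustrated in the simplest conjugate setting: treat the physical method's prediction as a Gaussian prior and update it with a few noisy measurements via a precision-weighted average. The numbers below are invented, and the paper's prior model is richer than a single Gaussian, but this update is the basic mechanism.

```python
import numpy as np

# the physical method's prediction acts as a Gaussian prior (made-up numbers)
mu_prior, var_prior = 2.1, 0.5 ** 2   # e.g., a predicted log activity coefficient

# sparse experimental data with (assumed known) measurement noise
y = np.array([1.8, 1.9])
var_noise = 0.3 ** 2

# conjugate Gaussian posterior: precision-weighted combination of prior and data
prec_post = 1 / var_prior + len(y) / var_noise
mu_post = (mu_prior / var_prior + y.sum() / var_noise) / prec_post
print(mu_post, 1 / prec_post)  # posterior mean shrinks from the prior toward the data
```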
Robert Bamler, Stephan Mandt
We present a probabilistic language model for time-stamped text data which tracks the semantic evolution of individual words over time. The model represents words and contexts by latent trajectories in an embedding space. At each moment in time, the embedding vectors are inferred from a probabilistic version of word2vec [Mikolov et al., 2013]. These embedding vectors are connected in time through a latent diffusion process. We describe two scalable variational inference algorithms, skip-gram smoothing and skip-gram filtering, that allow us to train the model jointly over all times, learning from all data while allowing word and context vectors to drift. Experimental results on three different corpora demonstrate that our dynamic model infers word embedding trajectories that are more interpretable and lead to higher predictive likelihoods than competing methods that are based on static models trained separately on time slices.
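The temporal prior referred to above is a latent diffusion, i.e., a Gaussian random walk over embedding vectors. The numpy sketch below samples one word's trajectory from that prior; the dimensions and drift variance are placeholders. In the full model, a skip-gram likelihood at each time slice ties such trajectories to the observed text, and smoothing (conditioning on all slices) or filtering (conditioning only on past slices) infers them.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 10, 100       # number of time slices and embedding dimension (placeholders)
drift_var = 0.01     # variance of one diffusion step

# sample a trajectory from the prior: u_t | u_{t-1} ~ N(u_{t-1}, drift_var * I)
u = np.zeros((T, d))
u[0] = rng.normal(size=d)
for t in range(1, T):
    u[t] = u[t - 1] + rng.normal(scale=np.sqrt(drift_var), size=d)
```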
Ulrich Schneider, Stephan Mandt, Akos Rapp, Simon Braun, Hendrik Weimer, Immanuel Bloch, Achim Rosch
In this comment we argue that negative absolute temperatures are a well-established concept for systems with bounded spectra. They are not only consistent with thermodynamics, but are even unavoidable for a consistent description of the thermal equilibrium of inverted populations.
Stephan Mandt
Variational solutions of the Boltzmann equation usually rely on the concept of linear response. We extend the variational approach for tight-binding models at high entropies to a regime far beyond linear response. We analyze both weakly interacting fermions and incoherent bosons on a lattice. We consider a case where the particles are driven by a constant force, leading to the well-known Bloch oscillations, and we consider interactions that are weak enough not to overdamp these oscillations. This regime is computationally demanding and relevant for ultracold atoms in optical lattices. We derive a simple theory in terms of coupled dynamic equations for the particle density, energy density, current and heat current, allowing for analytic solutions. As an application, we identify damping coefficients for Bloch oscillations in the Hubbard model at weak interactions and compute them for a one-dimensional toy model. We also approximately solve the long-time dynamics of a weakly interacting, strongly Bloch-oscillating cloud of fermionic particles in a tilted lattice, leading to a subdiffusive scaling exponent.
Stephan Mandt, Matthew D. Hoffman, David M. Blei
Stochastic Gradient Descent with a constant learning rate (constant SGD) simulates a Markov chain with a stationary distribution. With this perspective, we derive several new results. (1) We show that constant SGD can be used as an approximate Bayesian posterior inference algorithm. Specifically, we show how to adjust the tuning parameters of constant SGD to best match the stationary distribution to a posterior, minimizing the Kullback-Leibler divergence between these two distributions. (2) We demonstrate that constant SGD gives rise to a new variational EM algorithm that optimizes hyperparameters in complex probabilistic models. (3) We also propose SGD with momentum for sampling and show how to adjust the damping coefficient accordingly. (4) We analyze MCMC algorithms. For Langevin Dynamics and Stochastic Gradient Fisher Scoring, we quantify the approximation errors due to finite learning rates. Finally (5), we use the stochastic process perspective to give a short proof of why Polyak averaging is optimal. Based on this idea, we propose a scalable approximate MCMC algorithm, the Averaged Stochastic Gradient Sampler.
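Result (1) is easy to reproduce on a toy problem: run constant-learning-rate SGD on a Bayesian linear regression, discard a burn-in, and treat the remaining iterates as approximate posterior samples. In the sketch below, the learning rate and minibatch size are exactly the tuning parameters the paper shows how to adjust for the best match to the posterior; the values used here are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 1000, 2
X = rng.normal(size=(N, d))
theta_true = np.array([1.0, -2.0])
y = X @ theta_true + rng.normal(size=N)   # unit observation noise

lr, batch = 0.05, 10   # constant learning rate and minibatch size (the tuning knobs)
theta = np.zeros(d)
samples = []
for step in range(20000):
    idx = rng.integers(0, N, batch)
    # stochastic gradient of the scaled negative log posterior (standard normal prior)
    grad = X[idx].T @ (X[idx] @ theta - y[idx]) / batch + theta / N
    theta -= lr * grad
    if step > 2000:    # after burn-in, iterates are treated as posterior samples
        samples.append(theta.copy())

samples = np.array(samples)
print(samples.mean(0), samples.std(0))   # approximate posterior mean and spread
```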
Stephan Mandt, Akos Rapp, Achim Rosch
We consider a cloud of fermionic atoms in an optical lattice described by a Hubbard model with an additional linear potential. While homogeneous interacting systems mainly show damped Bloch oscillations and heating, a finite cloud behaves differently: It expands symmetrically such that gains of potential energy at the top are compensated by losses at the bottom. Interactions stabilize the necessary heat currents by inducing gradients of the inverse temperature 1/T, with T<0 at the bottom of the cloud. An analytic solution of hydrodynamic equations shows that the width of the cloud increases with t^(1/3) for long times consistent with results from our Boltzmann simulations.
Akos Rapp, Stephan Mandt, Achim Rosch
As highly tunable interacting systems, cold atoms in optical lattices are ideal to realize and observe negative absolute temperatures, T < 0. We show theoretically that by reversing the confining potential, stable superfluid condensates at finite momentum and T < 0 can be created with low entropy production for attractive bosons. They may serve as 'smoking gun' signatures of equilibrated T < 0. For fermions, we analyze the time scales needed to equilibrate to T < 0. For moderate interactions, the equilibration time is proportional to the square of the radius of the cloud and grows with increasing interaction strengths as atoms and energy are transported by diffusive processes.
Ulrich Schneider, Lucia Hackermüller, Jens Philipp Ronzheimer, Sebastian Will, Simon Braun, Thorsten Best, Immanuel Bloch, Eugene Demler, Stephan Mandt, David Rasch, Achim Rosch
Transport properties are among the defining characteristics of many important phases in condensed matter physics. In the presence of strong correlations they are difficult to predict even for model systems like the Hubbard model. In real materials they are in general obscured by additional complications including impurities, lattice defects or multi-band effects. Ultracold atoms in contrast offer the possibility to study transport and out-of-equilibrium phenomena in a clean and well-controlled environment and can therefore act as a quantum simulator for condensed matter systems. Here we studied the expansion of an initially confined fermionic quantum gas in the lowest band of a homogeneous optical lattice. While we observe ballistic transport for non-interacting atoms, even small interactions render the expansion almost bimodal with a dramatically reduced expansion velocity. The dynamics is independent of the sign of the interaction, revealing a novel, dynamic symmetry of the Hubbard model.
Florian Wenzel, Kevin Roth, Bastiaan S. Veeling, Jakub Świątkowski, Linh Tran, Stephan Mandt, Jasper Snoek, Tim Salimans, Rodolphe Jenatton, Sebastian Nowozin
During the past five years the Bayesian deep learning community has developed increasingly accurate and efficient approximate inference procedures that allow for Bayesian inference in deep neural networks. However, despite this algorithmic progress and the promise of improved uncertainty quantification and sample efficiency, there are, as of early 2020, no publicized deployments of Bayesian neural networks in industrial practice. In this work we cast doubt on the current understanding of Bayes posteriors in popular deep neural networks: we demonstrate through careful MCMC sampling that the posterior predictive induced by the Bayes posterior yields systematically worse predictions compared to simpler methods including point estimates obtained from SGD. Furthermore, we demonstrate that predictive performance is improved significantly through the use of a "cold posterior" that overcounts evidence. Such cold posteriors sharply deviate from the Bayesian paradigm but are commonly used as a heuristic in Bayesian deep learning papers. We put forward several hypotheses that could explain cold posteriors and evaluate the hypotheses through experiments. Our work questions the goal of accurate posterior approximations in Bayesian deep learning: if the true Bayes posterior is poor, what is the use of more accurate approximations? Instead, we argue that it is timely to focus on understanding the origin of the improved performance of cold posteriors.
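A cold posterior targets p(θ|D)^(1/T) with temperature T < 1. A minimal way to see what tempering does is Langevin dynamics with the injected noise scaled by T: for a standard normal log-posterior, the tempered target is N(0, T), so the sampled variance shrinks with the temperature. This is a toy illustration of the concept, not the paper's MCMC setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_log_post(theta):
    return -theta   # toy log-posterior: standard normal

T, eps = 0.2, 1e-2  # temperature T < 1 gives a "cold", sharpened posterior
theta, samples = 0.0, []
for step in range(50000):
    theta += eps * grad_log_post(theta) + np.sqrt(2 * eps * T) * rng.normal()
    samples.append(theta)

# Langevin dynamics with temperature T samples p(theta)^(1/T), here N(0, T)
print(np.var(samples[5000:]))  # close to 0.2 rather than 1.0
```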
Ruihan Yang, Yibo Yang, Joseph Marino, Stephan Mandt
Recent work by Marino et al. (2020) showed improved performance in sequential density estimation by combining masked autoregressive flows with hierarchical latent variable models. We draw a connection between such autoregressive generative models and the task of lossy video compression. Specifically, we view recent neural video compression methods (Lu et al., 2019; Yang et al., 2020b; Agustsson et al., 2020) as instances of a generalized stochastic temporal autoregressive transform, and propose avenues for enhancement based on this insight. Comprehensive evaluations on large-scale video data show improved rate-distortion performance over both state-of-the-art neural and conventional video compression methods.
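One way to read the generalized transform, sketched below under our own simplifications, is that the previous frame predicts a shift and scale that whiten the current frame, so that only the resulting residual needs to be compressed; decoding inverts the transform. Network shapes and the flattened frame representation are placeholders, not any of the cited methods.

```python
import torch
import torch.nn as nn

class TemporalARTransform(nn.Module):
    """Sketch of a temporal autoregressive transform for video frames."""
    def __init__(self, frame_dim=64):
        super().__init__()
        self.shift = nn.Linear(frame_dim, frame_dim)      # predicts mu(x_prev)
        self.log_scale = nn.Linear(frame_dim, frame_dim)  # predicts log sigma(x_prev)

    def forward(self, x_prev, x_cur):
        mu, s = self.shift(x_prev), self.log_scale(x_prev).exp()
        return (x_cur - mu) / s   # whitened residual, the part that gets coded

    def inverse(self, x_prev, y):
        mu, s = self.shift(x_prev), self.log_scale(x_prev).exp()
        return mu + s * y         # reconstruct the frame from the decoded residual
```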
Yibo Yang, Robert Bamler, Stephan Mandt
We consider the problem of lossy image compression with deep latent variable models. State-of-the-art methods build on hierarchical variational autoencoders (VAEs) and learn inference networks to predict a compressible latent representation of each data point. Drawing on the variational inference perspective on compression, we identify three approximation gaps which limit performance in the conventional approach: an amortization gap, a discretization gap, and a marginalization gap. We propose remedies for each of these three limitations based on ideas related to iterative inference, stochastic annealing for discrete optimization, and bits-back coding, resulting in the first application of bits-back coding to lossy compression. In our experiments, which include extensive baseline comparisons and ablation studies, we achieve new state-of-the-art performance on lossy image compression using an established VAE architecture, by changing only the inference method.
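Of the three remedies, the fix for the amortization gap is the simplest to sketch: initialize the variational parameters from the encoder, then optimize them per data point by gradient descent on the negative ELBO. The PyTorch sketch below assumes a Gaussian decoder for concreteness; the discretization and marginalization gaps require the stochastic annealing and bits-back machinery described in the paper, which is not shown.

```python
import torch

def refine_latents(decoder, x, mu_init, logvar_init, steps=100, lr=1e-2):
    """Per-image iterative inference, starting from the amortized encoder output.
    Only the variational parameters are optimized; the decoder stays fixed."""
    mu = mu_init.clone().requires_grad_(True)
    logvar = logvar_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([mu, logvar], lr=lr)
    for _ in range(steps):
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterized sample
        recon = ((decoder(z) - x) ** 2).sum()                 # Gaussian decoder assumed
        kl = -0.5 * (1 + logvar - mu ** 2 - logvar.exp()).sum()
        loss = recon + kl                                     # negative ELBO (up to constants)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return mu.detach(), logvar.detach()
```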
Yibo Yang, Robert Bamler, Stephan Mandt
We propose a novel algorithm for quantizing continuous latent representations in trained models. Our approach applies to deep probabilistic models, such as variational autoencoders (VAEs), and enables both data and model compression. Unlike current end-to-end neural compression methods that tailor the model to a fixed quantization scheme, our algorithm separates model design and training from quantization. Consequently, our algorithm enables "plug-and-play" compression with a variable rate-distortion trade-off, using a single trained model. Our algorithm can be seen as a novel extension of arithmetic coding to the continuous domain, and uses adaptive quantization accuracy based on estimates of posterior uncertainty. Our experimental results demonstrate the importance of taking posterior uncertainties into account, and show that image compression with the proposed algorithm outperforms JPEG over a wide range of bit rates using only a single standard VAE. Further experiments on Bayesian neural word embeddings demonstrate the versatility of the proposed method.
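The role of posterior uncertainty can be shown with a deliberately simplified quantizer: give each latent dimension a grid whose spacing is proportional to its posterior standard deviation, so confident dimensions are stored precisely and uncertain ones coarsely. The numbers are made up, and the actual algorithm is an arithmetic-coding-style scheme rather than a fixed grid; this toy only illustrates the adaptive-accuracy idea.

```python
import numpy as np

def quantize(mu, sigma, base_step=0.5):
    """Round each latent to a grid whose spacing scales with its posterior std."""
    step = base_step * sigma          # per-dimension grid spacing
    return np.round(mu / step) * step

mu = np.array([0.93, -2.47, 0.08])    # posterior means of three latents
sigma = np.array([0.05, 0.50, 1.00])  # posterior standard deviations
print(quantize(mu, sigma))            # fine, coarse, and coarser quantization
```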
Stephan Mandt, David Blei
Stochastic variational inference (SVI) lets us scale up Bayesian computation to massive data. It uses stochastic optimization to fit a variational distribution, following easy-to-compute noisy natural gradients. As with most traditional stochastic optimization methods, SVI takes precautions to use unbiased stochastic gradients whose expectations are equal to the true gradients. In this paper, we explore the idea of following biased stochastic gradients in SVI. Our method replaces the natural gradient with a similarly constructed vector that uses a fixed-window moving average of some of its previous terms. This technique has several advantages. First, its computational cost is the same as for SVI, and storage requirements grow only by a constant factor. Second, it enjoys significant variance reduction over the unbiased estimates, has smaller bias than averaged gradients, and leads to smaller mean-squared error against the full gradient. We test our method on latent Dirichlet allocation with three large corpora.
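The averaging scheme is easy to emulate in a generic stochastic-optimization toy: keep a fixed-window buffer of noisy gradients and step along their mean, trading a small bias for a large variance reduction. The sketch below applies this to a noisy quadratic; in the paper, the averaged quantities are the sufficient statistics behind the natural gradients of SVI, not raw gradients.

```python
import numpy as np
from collections import deque

def smoothed_sgd(grad_fn, x0, lr=0.05, window=10, steps=1000, seed=0):
    """Follow a fixed-window moving average of noisy gradients."""
    rng = np.random.default_rng(seed)
    x, history = np.asarray(x0, float), deque(maxlen=window)
    for _ in range(steps):
        history.append(grad_fn(x, rng))          # new noisy gradient estimate
        x = x - lr * np.mean(history, axis=0)    # biased but low-variance direction
    return x

# noisy gradient of f(x) = ||x||^2, with heavy gradient noise
noisy_grad = lambda x, rng: 2 * x + rng.normal(scale=5.0, size=x.shape)
print(smoothed_sgd(noisy_grad, np.ones(3)))  # ends up near the optimum at 0
```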
Stephan Mandt, Darius Sadri, Andrew A. Houck, Hakan E. Türeci
The quantum dynamics of open many-body systems poses a challenge for computational approaches. Here we develop a stochastic scheme based on the positive P phase-space representation to study the nonequilibrium dynamics of coupled spin-boson networks that are driven and dissipative. Such problems are at the forefront of experimental research in cavity and solid-state realizations of quantum optics, as well as cold atom physics, trapped ions, and superconducting circuits. We demonstrate and test our method on a driven, dissipative two-site system, each site involving a spin coupled to a photonic mode, with photons hopping between the sites, where we find good agreement with Monte Carlo Wavefunction simulations. In addition to numerically reproducing features recently observed in an experiment [Phys. Rev. X 4, 031043 (2014)], we also predict a novel steady-state quantum dynamical phase transition for an asymmetric configuration of drive and dissipation.
Stephan Mandt, James McInerney, Farhan Abrol, Rajesh Ranganath, David Blei
Variational inference (VI) combined with data subsampling enables approximate posterior inference over large data sets, but suffers from poor local optima. We first formulate a deterministic annealing approach for the generic class of conditionally conjugate exponential family models. This approach uses a decreasing temperature parameter which deterministically deforms the objective during the course of the optimization. A well-known drawback of this annealing approach is the choice of the cooling schedule. We therefore introduce variational tempering, a variational algorithm that introduces a temperature latent variable to the model. In contrast to related work in the Markov chain Monte Carlo literature, this algorithm results in adaptive annealing schedules. Lastly, we develop local variational tempering, which assigns a latent temperature to each data point; this allows for dynamic annealing that varies across data. Compared to traditional VI, all proposed approaches find improved predictive likelihoods on held-out data.
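Deterministic annealing, the baseline that variational tempering improves on, amounts to down-weighting the likelihood term of the objective by 1/T while a hand-chosen schedule cools T toward 1. The sketch below shows such a fixed schedule, i.e., exactly the choice that is the "well-known drawback"; variational tempering instead infers the temperature as a latent variable, which is not shown here. Function and variable names are hypothetical.

```python
def annealed_objective(neg_log_lik, kl, step, total_steps, T0=5.0):
    """Deterministic annealing: temper the likelihood term with a cooling schedule."""
    T = 1.0 + (T0 - 1.0) * max(0.0, 1.0 - step / total_steps)  # linear cooling to T = 1
    return neg_log_lik / T + kl

# usage inside a (hypothetical) optimization loop:
# loss = annealed_objective(neg_log_lik, kl, step, total_steps)
```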