Stochastic Gradient Descent as Approximate Bayesian Inference — arXiv2