Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model — arXiv2