Learning World Models for Interactive Video Generation — arXiv2