End-to-End Training for Autoregressive Video Diffusion via Self-Resampling — arXiv2