Revisiting Hierarchical Approach for Persistent Long-Term Video Prediction — arXiv2