Learning Generalizable Visual Representations via Interactive Gameplay — arXiv2