Learning Robust Global Representations by Penalizing Local Predictive Power — arXiv2