Attention Distillation for Learning Video Representations (arXiv)