Cross-modal Representation Learning for Zero-shot Action Recognition — arXiv2