Learning Transferable Visual Models From Natural Language Supervision — arXiv2