Multimodal Few-Shot Learning with Frozen Language Models — arXiv2