Composing Concepts from Images and Videos via Concept-prompt Binding — arXiv2