Kronecker Mask and Interpretive Prompts are Language-Action Video Learners — arXiv2