Forecasting Human-Object Interaction: Joint Prediction of Motor Attention and Actions in First Person Video — arXiv2