What Are You Doing? A Closer Look at Controllable Human Video Generation — arXiv2