Fast Policy Learning through Imitation and Reinforcement — arXiv2