Language Model Self-improvement by Reinforcement Learning Contemplation — arXiv2