Improving On-policy Learning with Statistical Reward Accumulation — arXiv2