On the Stochastic (Variance-Reduced) Proximal Gradient Method for Regularized Expected Reward Optimization — arXiv2