GEPO: Group Expectation Policy Optimization for Stable Heterogeneous Reinforcement Learning — arXiv2