Improving Retrospective Language Agents via Joint Policy Gradient Optimization — arXiv2