Realistic Evaluation of Model Merging for Compositional Generalization — arXiv2