MuDG: Taming Multi-modal Diffusion with Gaussian Splatting for Urban Scene Reconstruction — arXiv2