Dual Diffusion Models for Multi-modal Guided 3D Avatar Generation — arXiv2