Towards Enhanced Image Generation Via Multi-modal Chain of Thought in Unified Generative Models — arXiv2