Date          Name
Mar 19, 2026  RPiAE: A Representation-Pivoted Autoencoder Enhancing Both Image Generation and Editing
Oct 31, 2025  RzenEmbed: Towards Comprehensive Multimodal Retrieval
Oct 13, 2025  FG-CLIP 2: A Bilingual Fine-grained Vision-Language Alignment Model
Aug 20, 2025  CTA-Flux: Integrating Chinese Cultural Semantics into High-Quality English Text-to-Image Communities
Aug 14, 2025  NanoControl: A Lightweight Framework for Precise and Efficient Control in Diffusion Transformer
Aug 7, 2025   FLUX-Makeup: High-Fidelity, Identity-Consistent, and Robust Makeup Transfer via Diffusion Transformer
Jul 24, 2025  LMM-Det: Make Large Multimodal Models Excel in Object Detection
May 8, 2025   FG-CLIP: Fine-Grained Visual and Textual Alignment
Mar 13, 2025  PlanGen: Towards Unified Layout Planning and Image Generation in Auto-Regressive Vision Language Models
Mar 12, 2025  NAMI: Efficient Image Generation via Bridged Progressive Rectified Flow Transformers
Mar 11, 2025  U-StyDiT: Ultra-high Quality Artistic Style Transfer Using Diffusion Transformers
Sep 6, 2024   Qihoo-T2X: An Efficient Proxy-Tokenized Diffusion Transformer for Text-to-Any-Task
Aug 23, 2024  IAA: Inner-Adaptor Architecture Empowers Frozen Large Language Model with Multimodal Capabilities
Sep 2, 2023   Bridge Diffusion Model: Bridge Chinese Text-to-Image Diffusion Model with English Communities