Date | Name
Mar 23, 2026 | Seeing is Improving: Visual Feedback for Iterative Text Layout Refinement
Apr 9, 2025 | PosterMaker: Towards High-Quality Product Poster Generation with Accurate Text Rendering
Mar 20, 2025 | Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models
Jun 21, 2024 | Pistis-RAG: Enhancing Retrieval-Augmented Generation with Human Feedback
Jun 17, 2024 | Hallucination Mitigation Prompts Long-term Video Understanding
May 9, 2024 | Self-Supervised Pre-training with Symmetric Superimposition Modeling for Scene Text Recognition
May 7, 2024 | Choose What You Need: Disentangled Representation Learning for Scene Text Recognition, Removal and Editing
Oct 12, 2023 | Dual-Stream Knowledge-Preserving Hashing for Unsupervised Video Retrieval
Oct 8, 2023 | Symmetrical Linguistic Feature Distillation with CLIP for Scene Text Recognition
Jul 6, 2023 | MomentDiff: Generative Video Moment Retrieval from Random to Real
May 9, 2023 | TPS++: Attention-Enhanced Thin-Plate Spline for Scene Text Recognition
May 9, 2023 | Linguistic More: Taking a Further Step toward Efficient and Accurate Scene Text Recognition
Oct 12, 2022 | Bridging the Gap Between Vision Transformers and Convolutional Neural Networks on Small Datasets
Sep 2, 2022 | Geometry Aligned Variational Transformer for Image-conditioned Layout Generation
Sep 1, 2022 | REMOT: A Region-to-Whole Framework for Realistic Human Motion Transfer
Aug 22, 2021 | From Two to One: A New Scene Text Recognizer with Visual Language Modeling Network
Jun 13, 2021 | Cross-Modal Attention Consistency for Video-Audio Unsupervised Learning
Apr 1, 2020 | Graph Structured Network for Image-Text Matching
Mar 30, 2020 | Multi-Objective Matrix Normalization for Fine-grained Visual Recognition
Aug 23, 2019 | ACE-Net: Biomedical Image Segmentation with Augmented Contracting and Expansive Paths