Date | Name
Apr 9, 2023 | Slide-Transformer: Hierarchical Vision Transformer with Local Self-Attention
Oct 17, 2022 | Contrastive Language-Image Pre-Training with Knowledge Graphs
Apr 21, 2023 | Joint Token Pruning and Squeezing Towards More Aggressive Compression of Vision Transformers
Feb 12, 2026 | On-Policy Context Distillation for Language Models
Oct 7, 2024 | Differential Transformer
Nov 13, 2025 | Black-Box On-Policy Distillation of Large Language Models
Mar 17, 2026 | Online Experiential Learning for Language Models
Dec 6, 2023 | FAAC: Facial Animation Generation with Anchor Frame and Conditional Control for Superior Fidelity and Editability
Dec 14, 2023 | Agent Attention: On the Integration of Softmax and Linear Attention
Jun 10, 2025 | SeerAttention-R: Sparse Attention Adaptation for Long Reasoning
Jun 4, 2025 | Rectified Sparse Attention
Nov 18, 2025 | Step by Step Network
Apr 1, 2026 | Universal YOCO for Efficient Depth Scaling
Nov 2, 2025 | Linear Differential Vision Transformer: Learning Visual Contrasts via Pairwise Differentials
Jun 9, 2025 | Reinforcement Pre-Training