Showing 1–19 of 19 results
- May 14, 2023: Towards Understanding and Improving Knowledge Distillation for Neural Machine Translation
- Jun 25, 2024: Dual-Space Knowledge Distillation for Large Language Models
- Dec 25, 2023: Mixture Data for Training Cannot Ensure Out-of-distribution Generalization
- Dec 25, 2023: ShiftKD: Benchmarking Knowledge Distillation under Distribution Shift
- Mar 6, 2022: Conditional Bilingual Mutual Information Based Adaptive Training for Neural Machine Translation
- Apr 15, 2025: A Dual-Space Framework for General Knowledge Distillation of Large Language Models
- Mar 4, 2025: AlignDistil: Token-Level Language Model Alignment as Adaptive Policy Distillation
- Mar 2, 2026: KDFlow: A User-Friendly and Efficient Knowledge Distillation Framework for Large Language Models
- Jun 24, 2024: Multilingual Knowledge Editing with Language-Agnostic Factual Neurons
- Sep 10, 2025: CM-Align: Consistency-based Multilingual Alignment for Large Language Models
- Mar 18, 2026: SCALE: Scalable Conditional Atlas-Level Endpoint transport for virtual cell perturbation prediction
- Oct 8, 2025: Think Natively: Unlocking Multilingual Reasoning with Consistency-Enhanced Reinforcement Learning
- Dec 8, 2025: M-STAR: Multi-Scale Spatiotemporal Autoregression for Human Mobility Modeling
- May 28, 2025: Less, but Better: Efficient Multilingual Expansion for LLMs via Layer-wise Mixture-of-Experts
- Apr 30, 2026: Benchmarking virtual cell models for in-the-wild perturbation response
- Dec 27, 2018: Kerr-de Sitter and Kerr-anti-de Sitter black holes as accelerators for spinning particles
- Oct 20, 2023: A Quality-based Syntactic Template Retriever for Syntactically-controlled Paraphrase Generation
- Oct 12, 2022: Improved Data Augmentation for Translation Suggestion
- Oct 26, 2024: GATES: Graph Attention Network with Global Expression Fusion for Deciphering Spatial Transcriptome Architectures