Date / Name
Mar 24, 2020 / A Simple Fix for Convolutional Neural Network via Coordinate Embedding
Oct 22, 2018 / Towards Universal Dialogue State Tracking
Jun 11, 2024 / Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
Sep 2, 2019 / Scalable and Accurate Dialogue State Tracking via Hierarchical Sequence Generation
Jun 19, 2023 / Sparse Modular Activation for Efficient Sequence Modeling
May 4, 2017 / Recurrent Soft Attention Model for Common Object Recognition
Jul 9, 2025 / Decoder-Hybrid-Decoder Architecture for Efficient Reasoning with Long Generation
Mar 30, 2026 / Rethinking Language Model Scaling under Transferable Hypersphere Optimization
Jun 27, 2023 / C-PMI: Conditional Pointwise Mutual Information for Turn-level Dialogue Evaluation
Oct 23, 2022 / Language Model Pre-Training with Sparse Latent Typing
Dec 5, 2019 / RoNGBa: A Robustly Optimized Natural Gradient Boosting Training Approach with Leaf Number Clipping
Jun 30, 2021 / HySPA: Hybrid Span Generation for Scalable Text-to-Graph Extraction
Apr 15, 2026 / Shuffle the Context: RoPE-Perturbed Self-Distillation for Long-Context Adaptation
Jun 22, 2025 / Routing Mamba: Scaling State Space Models with Mixture-of-Experts Projection
Jan 18, 2025 / BAP v2: An Enhanced Task Framework for Instruction Following in Minecraft Dialogues
Apr 29, 2025 / Reinforcement Learning for Reasoning in Large Language Models with One Training Example
Apr 22, 2024 / Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Sep 30, 2025 / Understanding the Mixture-of-Experts with Nadaraya-Watson Kernel
Apr 30, 2025 / Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math
May 22, 2025 / PaTH Attention: Position Encoding via Accumulating Householder Transformations