Showing 1–20 of 24 results
Date / Name

Jun 30, 2025  On the Predictive Power of Representation Dispersion in Language Models
Dec 31, 2024  Chunk-Distilled Language Modeling
Dec 23, 2025  Distilling to Hybrid Attention Models via KL-Guided Layer Selection
Oct 21, 2025  Text or Pixels? It Takes Half: On the Token Efficiency of Visual Text Inputs in Multimodal LLMs
Sep 2, 2023   Self-Supervised Video Transformers for Isolated Sign Language Recognition
Mar 25, 2025  Context-Efficient Retrieval with Factual Decomposition
Apr 14, 2024  When Hindsight is Not 20/20: Testing Limits on Reflective Thinking in Large Language Models
Jan 16, 2012  Wetting-induced budding of vesicles in contact with several aqueous phases
Apr 14, 2025  How Instruction and Reasoning Data shape Post-Training: Data Quality through the Lens of Layer-wise Gradients
Jul 2, 2025   DIY-MKG: An LLM-Based Polyglot Language Learning System
Apr 13, 2025  SPICE: A Synergistic, Precise, Iterative, and Customizable Image Editing Workflow
Oct 31, 2024  What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective
Oct 31, 2025  OKBench: Democratizing LLM Evaluation with Fully Automated, On-Demand, Open Knowledge Benchmarking
Feb 19, 2026  Reverso: Efficient Time Series Foundation Models for Zero-shot Forecasting
Mar 4, 2026   Why Are Linear RNNs More Parallelizable?
Mar 18, 2016  The joint distribution of the Parisian ruin time and the number of claims until Parisian ruin in the classical risk model
Apr 28, 2026  Training Transformers as a Universal Computer
Mar 18, 2016  Number of claims and ruin time for a refracted risk process
Mar 5, 2024   Towards Training A Chinese Large Language Model for Anesthesiology
Dec 14, 2023  Learning from Polar Representation: An Extreme-Adaptive Model for Long-Term Time Series Forecasting