Date          Name
Mar 1, 2023   UDAPDR: Unsupervised Domain Adaptation via LLM Prompting and Distillation of Rerankers
Nov 9, 2021   DistIR: An Intermediate Representation and Simulator for Efficient Neural Network Distribution
May 19, 2022  PLAID: An Efficient Engine for Late Interaction Retrieval
Dec 2, 2022   Moving Beyond Downstream Task Accuracy for Information Retrieval Benchmarking
Mar 7, 2024   Alto: Orchestrating Distributed Compound AI Systems with Nested Ancestry
May 3, 2023   Cheaply Evaluating Inference Efficiency Metrics for Autoregressive Transformer APIs
Aug 20, 2025  NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model
Dec 24, 2025  NVIDIA Nemotron 3: Efficient and Open Intelligence
Dec 23, 2025  Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning
Aug 20, 2020  Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads
Aug 16, 2021  On the Opportunities and Risks of Foundation Models
Nov 16, 2022  Holistic Evaluation of Language Models
Dec 28, 2022  Demonstrate-Search-Predict: Composing retrieval and language models for knowledge-intensive NLP
Dec 2, 2021   ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction
Jun 23, 2025  PARALLELPROMPT: Extracting Parallelism from Large Language Model Queries
Apr 4, 2025   Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models
Apr 21, 2025  ColBERT-serve: Efficient Multi-Stage Memory-Mapped Scoring
Oct 5, 2023   DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines