"au:"Keshav Santhanam"" — arXiv2 Search

Showing 1–18 of 18 results

Mar 1, 2023 | UDAPDR: Unsupervised Domain Adaptation via LLM Prompting and Distillation of Rerankers
Nov 9, 2021 | DistIR: An Intermediate Representation and Simulator for Efficient Neural Network Distribution
May 19, 2022 | PLAID: An Efficient Engine for Late Interaction Retrieval
Dec 2, 2022 | Moving Beyond Downstream Task Accuracy for Information Retrieval Benchmarking
Mar 7, 2024 | Alto: Orchestrating Distributed Compound AI Systems with Nested Ancestry
May 3, 2023 | Cheaply Evaluating Inference Efficiency Metrics for Autoregressive Transformer APIs
Aug 20, 2025 | NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model
Dec 24, 2025 | NVIDIA Nemotron 3: Efficient and Open Intelligence
Dec 23, 2025 | Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning
Aug 20, 2020 | Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads
Aug 16, 2021 | On the Opportunities and Risks of Foundation Models
Nov 16, 2022 | Holistic Evaluation of Language Models
Dec 28, 2022 | Demonstrate-Search-Predict: Composing retrieval and language models for knowledge-intensive NLP
Dec 2, 2021 | ColBERTv2: Effective and Efficient Retrieval via Lightweight Late Interaction
Jun 23, 2025 | PARALLELPROMPT: Extracting Parallelism from Large Language Model Queries
Apr 4, 2025 | Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models
Apr 21, 2025 | ColBERT-serve: Efficient Multi-Stage Memory-Mapped Scoring
Oct 5, 2023 | DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines