Showing 1–19 of 19 results
Date / Name
Aug 24, 2020 / Example-Based Named Entity Recognition
Jan 24, 2020 / MT-BioNER: Multi-task Learning for Biomedical Named Entity Recognition using Deep Bidirectional Transformers
Jun 26, 2025 / Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge
May 28, 2025 / RAGPPI: RAG Benchmark for Protein-Protein Interactions in Drug Discovery
May 29, 2024 / Certifying Counterfactual Bias in LLMs
Jan 27, 2022 / Reasoning Like Program Executors
Mar 27, 2025 / MAVERIX: Multimodal Audio-Visual Evaluation and Recognition IndeX
Oct 20, 2025 / OPTAGENT: Optimizing Multi-Agent LLM Interactions Through Verbal Reinforcement Learning for Enhanced Reasoning
Sep 29, 2025 / BeyondBench: Contamination-Resistant Evaluation of Reasoning in Language Models
Jun 1, 2025 / RARE: Retrieval-Aware Robustness Evaluation for Retrieval-Augmented Generation Systems
Nov 20, 2025 / JudgeBoard: Benchmarking and Enhancing Small Language Models for Reasoning Evaluation
Mar 17, 2025 / The Amazon Nova Family of Models: Technical Report and Model Card
May 28, 2016 / Spatial Phase and Amplitude Structuring of Beams Using a Combination of Multiple Orthogonal Spatial Functions with Complex Coefficients
Feb 5, 2026 / InterPrior: Scaling Generative Control for Physics-Based Human-Object Interactions
Apr 24, 2026 / C-MORAL: Controllable Multi-Objective Molecular Optimization with Reinforcement Alignment for LLMs
Mar 3, 2024 / Partial Federated Learning
Jul 16, 2021 / TAPEX: Table Pre-training via Learning a Neural SQL Executor
Nov 7, 2025 / VMDT: Decoding the Trustworthiness of Video Foundation Models
Jul 7, 2025 / Deep Research Comparator: A Platform For Fine-grained Human Annotations of Deep Research Agents