Date | Name
Apr 12, 2025 | REMEMBER: Retrieval-based Explainable Multimodal Evidence-guided Modeling for Brain Evaluation and Reasoning in Zero- and Few-shot Neurodegenerative Diagnosis
Apr 12, 2025 | BioChemInsight: An Online Platform for Automated Extraction of Chemical Structures and Activity Data from Patents
Apr 11, 2025 | Evaluation and Incident Prevention in an Enterprise AI Assistant
Apr 10, 2025 | Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning
Apr 10, 2025 | Pangu Ultra: Pushing the Limits of Dense Large Language Models on Ascend NPUs
Apr 10, 2025 | Plan-and-Refine: Diverse and Comprehensive Retrieval-Augmented Generation
Apr 9, 2025 | R2E-Gym: Procedural Environments and Hybrid Verifiers for Scaling Open-Weights SWE Agents
Apr 9, 2025 | Kaleidoscope: In-language Exams for Massively Multilingual Vision Evaluation
Apr 8, 2025 | Generative Framework for Personalized Persuasion: Inferring Causal, Counterfactual, and Latent Knowledge
Apr 4, 2025 | YaleNLP @ PerAnsSumm 2025: Multi-Perspective Integration via Mixture-of-Agents for Enhanced Healthcare QA Summarization
Apr 4, 2025 | AIR: A Systematic Analysis of Annotations, Instructions, and Response Pairs in Preference Dataset
Apr 2, 2025 | On the Role of Feedback in Test-Time Scaling of Agentic AI Workflows
Apr 2, 2025 | DeepSeek-R1 Thoughtology: Let's think about LLM Reasoning
Apr 1, 2025 | WikiVideo: Article Generation from Multiple Videos
Mar 29, 2025 | Efficient Adaptation For Remote Sensing Visual Grounding
Mar 29, 2025 | FindTheFlaws: Annotated Errors for Detecting Flawed Reasoning and Scalable Oversight Research
Mar 25, 2025 | Gemma 3 Technical Report
Mar 24, 2025 | Language Model Uncertainty Quantification with Attention Chain
Mar 23, 2025 | STShield: Single-Token Sentinel for Real-Time Jailbreak Detection in Large Language Models
Mar 21, 2025 | Summarization Metrics for Spanish and Basque: Do Automatic Scores and LLM-Judges Correlate with Humans?