Date          Name
Oct 29, 2024  Characterizing the Role of Similarity in the Property Inferences of Language Models
Apr 12, 2025  Parameterized Synthetic Text Generation with SimpleStories
Apr 15, 2025  RankAlign: A Ranking View of the Generator-Validator Gap in Large Language Models
Sep 16, 2023  X-PARADE: Cross-Lingual Textual Entailment and Information Divergence across Paragraphs
May 14, 2025  KRISTEVA: Close Reading as a Novel Task for Benchmarking Interpretive Reasoning
Oct 26, 2023  Lil-Bevo: Explorations of Strategies for Training Language Models in More Humanlike Ways
Feb 2, 2021   The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics
Sep 18, 2024  To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning
Dec 6, 2021   NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation
Mar 2, 2023   WiCE: Real-World Entailment for Claims in Wikipedia
Aug 16, 2021  Reusable Templates and Guides For Documenting Datasets and Models for Natural Language Processing and Generation: A Case Study of the HuggingFace and GEM Data and Model Cards
May 19, 2025  ChartMuseum: Testing Visual Reasoning Capabilities of Large Vision-Language Models