Date / Name
May 18, 2023 / How does the task complexity of masked pretraining objectives affect downstream performance?
Jun 16, 2023 / How do different tokenizers perform on downstream tasks in scriptio continua languages?: A case study in Japanese
Feb 16, 2024 / An Empirical Study on Cross-lingual Vocabulary Adaptation for Efficient Language Model Inference
Jun 17, 2024 / How Can We Effectively Expand the Vocabulary of LLMs with 0.01GB of Target Language Text?
Jan 6, 2026 / Enhancing Linguistic Competence of Language Models through Pre-training with Language Learning Tasks
Sep 4, 2021 / Frustratingly Simple Pretraining Alternatives to Masked Language Modeling
Dec 4, 2025 / Mitigating Catastrophic Forgetting in Target Language Adaptation of LLMs via Source-Shielded Updates
Oct 2, 2023 / appjsonify: An Academic Paper PDF-to-JSON Conversion Toolkit
Dec 16, 2024 / Adapting Chat Language Models Using Only Target Unlabeled Language Data
Mar 3, 2023 / Hitachi at SemEval-2023 Task 3: Exploring Cross-lingual Multi-task Strategies for Genre and Framing Detection in Online News
Dec 6, 2021 / Team Hitachi @ AutoMin 2021: Reference-free Automatic Minuting Pipeline with Argument Structure Construction over Topic-based Summarization
Aug 11, 2023 / Learning Deductive Reasoning from Synthetic Corpus based on Formal Logic
Apr 15, 2026 / How Can We Synthesize High-Quality Pretraining Data? A Systematic Study of Prompt Design, Generator Model, and Source Data
Nov 19, 2024 / Enhancing Reasoning Capabilities of LLMs via Principled Synthetic Logic Corpus