Showing 1–20 of 25 results
Date    Name
Apr 4, 2026    Measuring the Permission Gate: A Stress-Test Evaluation of Claude Code's Auto Mode
Mar 31, 2026    SkillReducer: Optimizing LLM Agent Skills for Token Efficiency
Mar 22, 2026    WARBENCH: A Comprehensive Benchmark for Evaluating LLMs in Military Decision-Making
Jan 26, 2026    VIBEVOICE-ASR Technical Report
Jan 5, 2026    NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation
Dec 15, 2025    Seedance 1.5 pro: A Native Audio-Visual Joint Generation Foundation Model
Nov 13, 2025    Time-Layer Adaptive Alignment for Speaker Similarity in Flow-Matching Based Zero-Shot TTS
Sep 6, 2025    Red-Teaming Coding Agents from a Tool-Invocation Perspective: An Empirical Security Assessment
Jun 20, 2025    Differentiation-Based Extraction of Proprietary Data from Fine-Tuned LLMs
Jun 11, 2025    Reasoning as a Resource: Optimizing Fast and Slow Thinking in Code Generation Models
Apr 10, 2025    Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning
Mar 23, 2025    STShield: Single-Token Sentinel for Real-Time Jailbreak Detection in Large Language Models
Aug 15, 2024    API-guided Dataset Synthesis to Finetune Large Code Models
Jun 8, 2024    SelfDefend: LLMs Can Defend Themselves against Jailbreaking in a Practical Manner
Jan 27, 2024    An Empirical Study on Large Language Models in Accuracy and Robustness under Chinese Industrial Scenarios
Dec 7, 2023    VRPTEST: Evaluating Visual Referring Prompting in Large Multimodal Models
Oct 10, 2023    Benchmarking and Explaining Large Language Model-based Code Generation: A Causality-Centric Approach
Oct 10, 2023    Refining Decompiled C Code with Large Language Models
Sep 29, 2023    Split and Merge: Aligning Position Biases in LLM-based Evaluators
Sep 25, 2023    AutoPrep: An Automatic Preprocessing Framework for In-the-Wild Speech Data