Jul 10, 2024 - Multilingual Blending: LLM Safety Alignment Evaluation with Language Mixture
Nov 8, 2021 - When Cyber-Physical Systems Meet AI: A Benchmark, an Evaluation, and a Way Forward
Oct 22, 2023 - LUNA: A Model-Based Universal Analysis Framework for Large Language Models
Apr 7, 2018 - A Performance Analysis Model of TCP over Multiple Heterogeneous Paths for 5G Mobile Services
Aug 7, 2024 - AcTracer: Active Testing of Large Language Model via Multi-Stage Sampling
Apr 12, 2023 - AutoRepair: Automated Repair for AI-Enabled Cyber-Physical Systems under Safety-Critical Conditions
Aug 20, 2024 - LeCov: Multi-level Testing Criteria for Large Language Models
Nov 29, 2024 - Understanding the Design Decisions of Retrieval-Augmented Generation Systems
Sep 13, 2023 - Self-Refined Large Language Model as Automated Reward Function Designer for Deep Reinforcement Learning in Robotics
Aug 26, 2023 - ISR-LLM: Iterative Self-Refined Large Language Model for Long-Horizon Sequential Task Planning
May 6, 2023 - Mosaic: Model-based Safety Analysis Framework for AI-enabled Cyber-Physical Systems
Jul 31, 2023 - Towards Building AI-CPS with NVIDIA Isaac Sim: An Industrial Benchmark and Case Study for Robotics Manipulation
Oct 15, 2025 - TRUSTVIS: A Multi-Dimensional Trustworthiness Evaluation Framework for Large Language Models
Aug 7, 2024 - MORTAR: A Model-based Runtime Action Repair Framework for AI-enabled Cyber-Physical Systems
Apr 12, 2024 - Online Safety Analysis for LLMs: a Benchmark, an Assessment, and a Path Forward
Jun 6, 2024 - GenSafe: A Generalizable Safety Enhancer for Safe Reinforcement Learning Algorithms Based on Reduced Order Markov Decision Process Model
Sep 19, 2024 - VLATest: Testing and Evaluating Vision-Language-Action Models for Robotic Manipulation
Oct 7, 2024 - LADEV: A Language-Driven Testing and Evaluation Platform for Vision-Language-Action Models in Robotic Manipulation
Jul 16, 2023 - Look Before You Leap: An Exploratory Study of Uncertainty Measurement for Large Language Models
Jul 13, 2025 - Evaluating LLMs on Sequential API Call Through Automated Test Generation