Mar 19, 2025 - UPME: An Unsupervised Peer Review Framework for Multimodal Large Language Model Evaluation
Oct 19, 2025 - Uniworld-V2: Reinforce Image Editing with Diffusion Negative-aware Finetuning and MLLM Implicit Feedback
Jan 11, 2024 - LLM-as-a-Coauthor: Can Mixed Human-Written and Machine-Generated Text Be Detected?
Jun 20, 2023 - TrustGPT: A Benchmark for Trustworthy and Responsible Large Language Models
Jun 16, 2024 - GUI-World: A Video Benchmark and Dataset for Multimodal GUI-oriented Understanding
Nov 10, 2025 - REACT-LLM: A Benchmark for Evaluating LLM Integration with Causal Features in Clinical Prognostic Tasks
Jun 27, 2024 - DataGen: Unified Synthetic Dataset Generation via Large Language Models
Sep 21, 2023 - A Knowledge-Driven Cross-view Contrastive Learning for EEG Representation
Oct 3, 2024 - Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge
Oct 4, 2023 - MetaTool Benchmark for Large Language Models: Deciding Whether to Use Tools and Which to Use
Jun 1, 2024 - HonestLLM: Toward an Honest and Helpful Large Language Model
Jun 10, 2025 - AsFT: Anchoring Safety During LLM Fine-Tuning Within Narrow Safety Basin
Feb 7, 2024 - MLLM-as-a-Judge: Assessing Multimodal LLM-as-a-Judge with Vision-Language Benchmark
Jul 6, 2025 - CoT-lized Diffusion: Let's Reinforce T2I Generation Step-by-step
Feb 20, 2025 - On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective
Jan 10, 2024 - TrustLLM: Trustworthiness in Large Language Models