GeneVA: A Dataset of Human Annotations for Generative Text to Video Artifacts
/ Authors
/ Abstract
Videos generated by current state-of-the-art generative models contain undesirable artifacts. We introduce GeneVA, the first large-scale dataset of human-annotated artifact bounding boxes in AI-generated videos. The dataset consists of 16,356 AI-generated videos, each labeled by a human annotator with per-frame artifact bounding boxes, their labels and descriptions, and video quality ratings. We developed a custom data collection pipeline on Prolific and defined a novel taxonomy for spatio-temporal artifacts in AI-generated videos. The videos were sourced from the VidProM [41] dataset, and text prompts from that dataset were then used to generate an additional subset of videos with Sora. We trained an artifact detector and caption generator using a pre-trained image-based model and a custom temporal fusion module. The dataset can be found at https://www.immersivecomputinglab.org/publication/geneva. We hope that datasets like GeneVA will encourage improvements in artifact detection for AI-generated video, toward applications such as deepfake detection.
Venue: 2026 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)