More than a thousand images of child sexual abuse found in dataset used to train AI image-generating tools
More than a thousand images of child sexual abuse material were found in a dataset used to train AI image-generating models, according to researchers. The presence of these images could facilitate the creation of realistic AI-generated imagery of child abuse, or 'deepfake' images of exploited children. The study highlights concerns about the transparency and safety of training data for generative AI tools.

The dataset examined by the researchers contains billions of images scraped from across the internet, including social media and adult entertainment websites. The researchers acknowledged efforts to apply safety filtering, but concerns remain about the inclusion of explicit content in web-scale datasets. They recommend restricting such datasets to research settings and using curated, well-sourced datasets for models intended for public distribution.