Sarah Silverman, along with authors Christopher Golden and Richard Kadrey, has filed lawsuits against OpenAI and Meta, claiming that the companies trained their language models, ChatGPT and LLaMA, on copyrighted materials without permission. The complaints focus in particular on the "Books2" dataset used by OpenAI, which the plaintiffs argue could only have been sourced from "shadow libraries," such as Library Genesis and Sci-Hub, that make copyrighted content available illegally.
One piece of evidence presented in Silverman's lawsuit is a conversation between her legal team and ChatGPT in which the chatbot was able to summarize Silverman's 2010 memoir, "The Bedwetter." The passages shared by the chatbot seemed to be reproduced verbatim, raising concerns about copyright infringement.
This is not the first legal challenge OpenAI has faced over how it trains its models; the company has been confronted with multiple lawsuits. In June, OpenAI was served with a class-action lawsuit accusing the company of violating federal and state privacy laws by scraping data to train ChatGPT and DALL-E.