Authors sue Meta and OpenAI for copyright infringement

Michael Chabon and others claim Meta and OpenAI used their published works to train large language AI models.

September 13, 2023

Left: Meta logo, photo by Chesnot/Getty Images. Right: OpenAI logo, image by Andrew Neel via Pexels.

Michael Chabon and four other acclaimed authors have brought class action lawsuits against Meta and OpenAI (the creator of ChatGPT), claiming that copyrighted works by them and other writers were used to train the companies’ large language AI models, court documents reviewed by The FADER confirm. Along with David Henry Hwang, Matthew Klam, Rachel Louise Snyder, and Ayelet Waldman, and “on behalf of all others similarly situated,” Chabon filed a complaint against Meta in the Northern District of California on Tuesday, September 12, demanding a federal jury trial. The group had already filed a similar suit against OpenAI on Friday, September 8.

Their argument runs as follows: Meta’s LLaMA (Large Language Model Meta AI) and OpenAI’s ChatGPT are trained by copying huge amounts of text and extracting expressive information from it. These models, therefore, are “uniquely reliant” on that text (termed the “training dataset”). Chabon, Hwang, Klam, Snyder, Waldman, and other authors did not consent to having their published works ingested into LLaMA or ChatGPT as part of those training datasets, but Meta and OpenAI used them anyway.

As evidence for that final claim, the key component of their lawsuit, the authors point to ChatGPT’s ability to summarize and analyze the contents of their books in great detail, and to Meta’s self-admitted use of The Pile, a publicly available dataset that sources its 240 gigabytes of book text from a “shadow library” called Bibliotik, which the complaint calls “flagrantly illegal.”

The new lawsuits come amid a rash of copyright infringement claims aimed at the companies behind generative AI models. In music, the complaints have generally emerged when such models have been trained to mimic a singer’s distinct vocal patterns. The most publicized case occurred when a producer going by Ghostwriter used generative AI to recreate Drake’s and The Weeknd’s voices and employed them to make a viral hit. Universal Music Group subsequently had the track pulled from digital streaming platforms, and Recording Academy CEO Harvey Mason Jr. said last week that it would be ineligible for Grammy consideration, after saying the opposite two days earlier. UMG has also lobbied Congress to write stricter regulations on generative AI into federal law.

In addition to their large language models, Meta and OpenAI both have generative AI music models in the works (AudioCraft and Jukebox, respectively). Meta did not immediately respond to The FADER’s request for comment. Requests for comment sent to two addresses listed on OpenAI’s website, the general press inquiry email and an individual press contact, failed to deliver. Find the lawsuits against both companies below, via The Hollywood Reporter and The Register.
