Several authors sue OpenAI and Meta for copyright violation

American authors Christopher Golden, Sarah Silverman and Richard Kadrey have filed lawsuits against OpenAI and Meta in United States district court, alleging copyright violation. The lawsuits allege, among other things, that both OpenAI’s ChatGPT and Meta’s LLaMA were trained on data sets that were obtained in the same way and that contain their works.

According to the plaintiffs, OpenAI and Meta obtained their works through online "shadow libraries" such as Z-Library, which let Internet users obtain digital copies of books through unauthorized channels. They also pointed out that the books are “available in bulk via torrent systems”.

All three have presented evidence that ChatGPT, when prompted, will produce summaries of their books, which they argue violates their copyright. The books that ChatGPT is able to summarize include Silverman’s The Bedwetter, Golden’s Ararat and Kadrey’s Sandman Slim. The lawsuit also points out that the chatbot never reproduced the copyright management information, such as the notice prohibiting reproduction, that the plaintiffs include in their published works.

The lawsuit the three filed against Meta alleges that the company was able to access the plaintiffs’ books to train its LLaMA models. It sets out the reasons that lead the plaintiffs to believe that the data sets used in the training included their books.

Among these reasons, the lawsuit cites the sources of the data sets. One of them is The Pile, assembled by EleutherAI. According to the lawsuit, an EleutherAI document describes The Pile as being built in part from “a copy of the contents of the Bibliotik private tracker”, among other libraries that, according to the complaint, are “flagrantly illegal”.

In both lawsuits, the authors claim that they “did not consent to the use of their copyrighted books for use as training material.” Each lawsuit lists six counts covering various types of copyright violation, negligence, unjust enrichment and unfair competition. On that basis, the authors seek, among other things, damages and the restitution of profits.

The lawyers for the three authors are Joseph Saveri and Matthew Butterick. On their website, they say they have been hearing from writers, authors and publishers who are concerned about ChatGPT’s ability to generate text similar to that found in copyright-protected textual materials, including thousands of books.

Saveri has also filed several lawsuits against AI companies on behalf of artists and developers. He and Butterick are also representing authors Mona Awad and Paul Tremblay in a similar case. In addition, Getty Images has sued Stability AI, maker of the Stable Diffusion image-generation tool, for training its model on copyright-protected images. OpenAI also faces lawsuits over training ChatGPT on personal data without the consent of the people it belongs to.

More authors and creators are likely to sue companies that have used their works to train models without permission or compensation. Depending on the verdicts, AI companies that build and train models may have to be more careful about the data they use for that task.