Borrow First, Justify Later: AI’s Copyright Playbook

Jun 24, 2025 | Law Student Blog

By Omowunmi (Wunmi) Odeja, Washington University School of Law, Juris Doctor Candidate, 2027

Thirteen authors have sued Meta Platforms, Inc. for allegedly using pirated copies of their novels, taken from online shadow libraries like LibGen and Books3, to train its AI model, LLaMa. As seen in filings from the case, Richard Kadrey et al. v. Meta Platforms, exhibits produced by the plaintiffs detail Meta’s decision to train its model on a database that contains more than 7 million such books. It is increasingly common for AI developers to skip the permission and licensing stage and fall back on a copyright fair use defense after the fact, if challenged. AI companies such as OpenAI have claimed that their tools would not exist without training on copyrighted materials, and that the use of these materials is essential to their AI models. This blog explores that strategy through the Kadrey and Thomson Reuters v. ROSS Intelligence lawsuits.

The development of generative AI systems draws on massive troves of data, including copyrighted works, to enable AI to learn patterns, mimic human expression, and generate new content that appears original on the surface. Generative AI is a type of artificial intelligence that can create new content in response to prompts, and its models, particularly large language models (LLMs), rely heavily on analyzing massive amounts of textual data. To be effective for the public, these models process vast amounts of text-based knowledge, including books, various forms of journalism, and even headnotes, which are protected by copyright law. Unauthorized copying of protected works is copyright infringement unless a valid defense applies.

The fair use defense permits unlicensed use of copyright-protected works in certain circumstances. Section 107 of the Copyright Act outlines four factors that courts must consider when determining whether a use qualifies as fair use: the purpose and character of the use, the nature of the copyrighted work, the amount and substantiality of the portion used, and the effect of the use on the market for the work. AI developers have increasingly relied on the fair use defense, which courts have historically considered when assessing technologies that copy or index copyrighted materials. These fair use disputes are among the most significant intellectual property cases, and their outcomes may influence AI development and redefine how copyright law applies in the age of AI.

So, does this kind of use to train AI amount to copyright infringement, or is it a type of “copying” protected by fair use? The Delaware federal district court in Thomson Reuters v. ROSS Intelligence recently rejected ROSS Intelligence’s claimed fair use defense, holding that the AI-powered legal research tool infringed the copyrights in over 2,000 of Thomson Reuters’ Westlaw headnotes used as training data. Perhaps the most revealing part of this saga is that ROSS did ask Thomson Reuters for permission to use the Westlaw content in question but, after being turned down, simply went ahead with the use anyway, boldly invoking fair use after the fact. The fair use doctrine has also been cited in the more recent California federal court case of Kadrey et al. v. Meta, where Meta asserted that training the LLaMa model on copyrighted books without authorization constituted transformative fair use and did not replace the original works. There has been no final decision in (or settlement of) the Kadrey case yet.

Copyright holders have countered this claimed fair use defense by arguing that AI training is exploitative rather than transformative. These AI systems are copying and exploiting works to generate competing content, they say, which threatens the livelihoods of creators by devaluing their creations. If courts begin to rule in favor of AI companies relying on fair use in these circumstances, such copyright holders argue, it would essentially create an avenue for the displacement of original works in the market. And if original works are used without fair compensation, that could undermine the creativity that intellectual property law aims to protect. The counterargument is that if AI companies use thousands, if not millions, of original works to create content, how realistic is it for the law to require that each creator be compensated? Such an approach, it is argued, would stifle AI innovation by imposing financial burdens on access to the training data needed to build generative AI.

Is the traditional fair use copyright approach equipped to deal with this level of copying, and at what point does AI’s remixing of millions of original works become transformative enough to justify bypassing permission? As AI tools continue to advance, the original notion of fair use is being challenged and its definition complicated. While the future remains uncertain for creators, one thing is clear: whatever decisions courts make in this area will set new precedents for both copyright law and AI innovation. The court in ROSS Intelligence found that ROSS’s use was not transformative because it did not have a “further purpose or different character” from the headnotes. The court observed that ROSS intended to develop a legal research tool akin to Westlaw, leading to competition between the two. These facts don’t apply in Kadrey, however, because Meta does not aim to replace original works in the market, nor is LLaMa intended to compete with these authors. Could this result in inconsistent rulings?

What is clear is that creatives may not willingly let AI companies use their works to train models, and AI companies may not be willing to pay creators millions of dollars for that use. The courts are likely to decide how this fair use battle ends, and it could go all the way to the U.S. Supreme Court. On the one hand, if AI companies won’t pay creatives in these scenarios but the courts determine that such unauthorized use constitutes copyright infringement, the courts could make them do so. On the other hand, AI companies may continue using copyrighted works this way without obtaining licenses if the courts ultimately rule in their favor and accept the fair use defense in these scenarios. Either outcome would set a powerful precedent, one that defines the boundaries of copyright in the age of machine learning.

While the courts are currently in the process of shaping the legal boundaries of AI and copyright through litigation, perhaps it is time for Congress to take the lead by proposing new copyright legislation aimed at clarifying the fair use of copyrighted works in AI training, whether to ease restrictions for AI developers or to impose new safeguards protecting authors and creators. In the long run, a comprehensive legislative framework might offer more clarity and certainty than case-by-case court rulings.
