Fair Use or Foul Play? The Legal Boundaries of AI Learning in the Age of Copyright

In Bartz v. Anthropic (2025), a U.S. court ruled AI training on lawfully purchased books is fair use, but using pirated copies is not. The decision balances innovation with copyright, affirming machines can learn lawfully with consent and transformative intent.

Litt Law
litt.law


Introduction

Can a machine truly “read” books the way humans do, absorbing language, learning structure, and building intelligence without crossing the lines of copyright law? And if it can, where exactly should the law draw that line? These aren’t just philosophical musings anymore; they are questions courts are now being asked to answer. In June 2025, the U.S. District Court for the Northern District of California delivered a landmark decision in a copyright infringement case involving Anthropic, the AI startup backed by Amazon.[1]

On June 23, 2025, the U.S. District Court for the Northern District of California delivered its pivotal ruling in Bartz v. Anthropic[2], a case that has already begun reshaping legal conversations around artificial intelligence and copyright. The plaintiffs, authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson, along with their associated corporate entities, filed suit against Anthropic, the developer of the Claude language model. Their claim was that Anthropic unlawfully used their copyrighted books to train its AI, both by purchasing and digitizing physical copies and by downloading pirated versions from shadow libraries.

What made this case legally novel was its focus on how content was acquired, not just how it was used. Was it lawful to scan and ingest purchased books into a language model? Could downloading those same books from piracy sites ever be justified if the end use was still transformative?

Judge William Alsup offered a carefully drawn distinction. He ruled that training AI models on lawfully purchased and scanned books qualified as fair use because the purpose was transformative. But the use of pirated books downloaded en masse from sites like Library Genesis and Pirate Library Mirror was not protected, and Anthropic must now face a jury trial on that issue. The judgment defines a workable line: innovation, yes, but only through legal channels.

Judge Alsup’s analysis in Bartz v. Anthropic rested heavily on the foundational principle of transformative use, the first and most significant factor in the four-part fair use test[3]. At its core, transformative use asks whether the secondary use adds something new, with a different purpose or character, altering the original work with new expression, meaning, or message. The court concluded that Anthropic’s use of copyrighted books in training its large language model, Claude, met this standard. According to the opinion, Anthropic did not reproduce or distribute the books in any conventional sense, nor did it make them available to the public in whole or in part. Instead, the books were ingested into a training pipeline to teach the model the statistical structure and relationships of language, enabling it to generate novel, unpredictable outputs not traceable to any specific book or passage. Judge Alsup compared this process to a human author who reads extensively not to plagiarize but to learn how to write more effectively, a conceptual framing that has previously found support in fair use jurisprudence.

Importantly, the court applied all four statutory fair use factors and found that three of the four weighed in Anthropic’s favor. First, the purpose and character of the use was clearly transformative, as the Claude model was trained to develop general language capacity, not to replicate or redistribute the plaintiffs’ books. Second, although the nature of the copyrighted works, creative fiction and nonfiction, typically favors the copyright holder, this factor was outweighed by the transformative purpose of the use. The third factor, the amount and substantiality of the portion used, was not strictly quantified, but the court accepted that large-scale ingestion of full works was necessary to achieve the transformative result. And finally, the effect on the potential market for the original works was found to be minimal or nonexistent: Claude does not provide access to the books themselves, nor does it function as a substitute product in any conceivable market.

The court also placed significant weight on the manner in which Anthropic had acquired and handled the texts. The company had lawfully purchased physical copies of the books, scanned them internally for training purposes, and destroyed the originals after digitization. This process, in Judge Alsup’s view, demonstrated a conscious attempt to avoid competition with the original works or the book market more broadly. He noted that the texts were never distributed, shared, or retained in any user-facing product. Instead, their value lay in helping the model develop a generalized understanding of syntax, semantics, and narrative structure, an application fundamentally different from conventional reading or copying.

In drawing these conclusions, Judge Alsup invoked the precedent set in Authors Guild v. Google[4], where the Second Circuit upheld the legality of scanning entire books to build a searchable database. In that case, the court reasoned that Google’s Book Search project served a non-expressive, information-retrieval function that added new utility to the original works without displacing them in the market. Similarly, in Bartz, the Claude model was not designed to express the books’ messages but to abstract their linguistic patterns into a computational framework. The court viewed this as a form of machine learning that, while data-dependent, did not usurp the expressive or economic core of the plaintiffs’ works. The use was thus not merely transformative; it was, in the court’s words, “exceedingly transformative,” warranting protection under the fair use doctrine.

Where the Law Draws the Line: Pirated Books and Liability

In this case, the court made clear that the boundary between innovation and infringement is not erased by transformative intent. While training AI models on lawfully acquired texts was upheld as fair use, Anthropic’s use of pirated books was firmly condemned. The court found that downloading millions of unauthorized copies from sources like Books3 and LibGen, despite ample legal alternatives, constituted outright infringement. As Judge Alsup wrote, “There is no carveout… from the Copyright Act for AI companies.” The mere fact that some pirated books were later used to train LLMs did not retroactively cleanse the initial act of acquiring them unlawfully.

The court drew a crucial distinction: training on books was one specific use, but creating and maintaining a massive “central library” of pirated works was an entirely separate, and impermissible, one. Anthropic’s decision to retain even unused pirated titles indefinitely, in case they might prove useful someday, revealed a purpose that mirrored traditional library functions, not transformative repurposing. “Pirating copies to build a research library… was its own use — and not a transformative one,” the court stated plainly.

This stance also dismantled the tech firm’s argument that transformative ends justify infringing means. The court emphasized that copyright law protects not just outputs, but how inputs are acquired. “You can’t just bless yourself by saying ‘I have a research purpose’ and… take any textbook you want,” the court noted at oral argument, a direct rebuke of the idea that downstream innovation excuses upstream theft.

By holding Anthropic liable for the pirated copies, the court affirmed a vital principle: fair use is a shield for creativity, not a license for piracy. Even in the age of AI, creators are entitled to control how their works enter digital ecosystems. The future of machine learning may be expansive, but as this judgment makes clear, it cannot be built on stolen foundations.

Another of the most striking elements of the Bartz v. Anthropic ruling is its exploration of how Anthropic circumvented the traditional licensing process, not by necessity but by design. Rather than attempt serious negotiations with rights holders, the company actively avoided the "legal/practice/business slog" of obtaining permissions. As the court noted, Anthropic initially “had many places from which it could have purchased books, but it preferred to steal them.”[5] This admission, drawn from internal communications, fundamentally undercuts any argument that the company was boxed into piracy by market constraints. The issue was never access; it was deliberate evasion of consent.

This context led the court to view Anthropic’s licensing efforts as more performative than sincere. Anthropic had hired a former head of Google Books partnerships, tasked with acquiring “all the books in the world” while still avoiding licensing complexity. While this executive sent a few exploratory emails to major publishers, the conversations were allowed to “wither,” and the company pivoted instead to bulk-buying print books and scanning them without securing digital rights. In doing so, Anthropic reduced the role of authorial consent to a procedural obstacle, something to be circumvented rather than respected. This strategic avoidance, the court implied, exposed a deeper fragility in relying on informal licensing overtures as evidence of good faith.

Judge Alsup was unambiguous in his view that consent is not optional under the Copyright Act. Even when discussing the destructively scanned books that were purchased in print, he clarified that digitizing them without a separate license was only permissible because the digital copy replaced a lawfully acquired physical one, and was neither shared nor multiplied. “There is no carveout… from the Copyright Act for AI companies,” the judge reminded, adding that merely possessing a transformative intention does not confer the right to bypass consent. He framed this principle within broader constitutional terms, citing Kirtsaeng[6] to argue that copyright exists to incentivize creation, not to maximize profit for downstream tech users.

What Bartz ultimately reveals is a deep tension at the center of copyright law in the age of artificial intelligence. Licensing regimes, though often flawed, still function as the legal and moral foundation through which authors retain control over their creative work. When companies attempt to sideline consent by invoking the pursuit of innovation, they risk dismantling that very foundation. Judge Alsup’s ruling serves as a clear reminder that in the urgency to build transformative technologies, developers must not overlook the equally vital role of permission. Although licensing may seem tedious or inconvenient, it remains the structural framework that enables fair creative exchange, and neglecting it is not only unlawful but also harmful to the integrity of the entire system.

Answering the Question: What Can a Machine Lawfully Learn?

The Bartz v. Anthropic judgment provides perhaps the clearest judicial response yet to the question animating modern copyright law: what can a machine lawfully learn? The answer, as crafted by Judge Alsup, is both nuanced and firm. A machine, like a human, can learn from copyrighted works so long as that learning transforms the material into something new, does not usurp the market for the original, and is not built upon illegal acquisition. The act of learning is not, in itself, infringement; the legality turns on how the source material is obtained and how it is used.

The court drew a fine but powerful distinction. Training an LLM on copyrighted books, when done using lawfully obtained copies, in service of generating novel outputs, and with safeguards against verbatim reproduction, is fair use, the model learning “like any reader aspiring to be a writer.” This is the legal core of machine learning’s legitimacy: not that machines mimic, but that they model, abstract, and generate anew.

But the judgment also laid down unmistakable limits: machines may learn, but they may not steal in order to learn. Pirating books to avoid paying authors, and hoarding them in a permanent “research library” for use as needed, was declared impermissible. As the court put it sharply: “Pirating copies to build a research library… was its own use and not a transformative one.” In this, the ruling affirms a foundational truth: lawful learning by machines must be rooted in lawful respect for the rights of those whose works they study.

In closing, Bartz doesn’t just answer the question of what a machine can lawfully learn; it reframes it. The issue is not whether machines can learn, but whether the legal system will insist they do so with integrity. Fair use opens the door to innovation; piracy slams it shut on fairness. In drawing that line, the court offers a path forward: one where machines may indeed become our most powerful readers and writers, but only by reading with consent and writing with originality.


[1] https://www.reuters.com/legal/litigation/anthropic-wins-key-ruling-ai-authors-copyright-lawsuit-2025-06-24/

[2] No. C 24-05417 WHA

[3] 17 U.S.C. § 107, the fair use provision of the U.S. Copyright Act

[4] 804 F.3d 202, 217 (2d Cir. 2015)

[5] Judgment, p. 2

[6] 568 U.S. 519, 552 (2013)
