Seemingly, this is an unspoken understanding at the top AI companies. When one Meta researcher asked whether the company's legal team had approved using LibGen, another replied: "I didn't ask questions but this is what OpenAI does with GPT3, what Google does with PALM, and what Deepmind does with Chinchilla so we will do it to[o]," according to internal messages cited in the suit and reported by Vanity Fair.
The complaint lays out, step by step, why the plaintiffs believe the datasets have illicit origins. In a Meta paper detailing LLaMA, the company lists the sources of its training data; one of them is ThePile, a dataset assembled by a company called EleutherAI. The complaint points out that an EleutherAI paper described ThePile as being built in part from "a copy of the contents of the Bibliotik private tracker." Bibliotik and the other "shadow libraries" listed, the lawsuit says, are "flagrantly illegal."