AI is poised to offer dramatic improvements across sectors in the coming years. AI tools can augment and complement human intelligence and ingenuity. We are already seeing signs of how AI can accelerate drug development, bolster advanced manufacturing, and unlock creativity.
At the same time, concerns about generative AI’s threat to copyright are well represented on Capitol Hill. Members of Congress have held hearings and introduced several bills attempting to increase transparency related to AI-generated content and mitigate potential harm. Now, Senate Commerce Committee Chair Maria Cantwell, alongside Senators Blackburn and Heinrich, has joined the fray with the Content Origin Protection and Integrity from Edited and Deepfaked Media (COPIED) Act of 2024. While the bill includes some strong ideas, its provisions imposing liability for AI training on wide swaths of covered content would be hugely detrimental to American AI development and deployment and should be stripped out.
Modern AI relies on large datasets for “training,” the process by which a system learns to perform a task by analyzing large amounts of data and identifying the patterns relevant to it. Multimodal generative AI models, such as ChatGPT, handle a host of tasks and need large, diverse training datasets to be effective at writing code, editing images, or suggesting a travel itinerary. Without access to such data, building useful models under prevailing technical conditions would be next to impossible.
The bill focuses on creating standards and techniques for content provenance information (CPI): machine-readable information that details the source of a piece of content and whether it was created or edited by an AI model. Specifically, it charges the National Institute of Standards and Technology (NIST) with two tasks. First, the director of NIST would be responsible for creating a public-private partnership to help develop standards for CPI and for detecting synthetic content, another name for AI-generated content. Second, NIST would be responsible for promoting research and development of technical tools and informing the public about advances in synthetic content detection. Enlisting NIST to lead on these topics could help improve transparency and explainability for AI.
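To make the idea concrete, the sketch below shows roughly what machine-readable CPI attached to a piece of content might contain. The field names and structure are illustrative assumptions, not the bill's language or any standards body's actual schema; real provenance standards (such as the C2PA specification) define their own manifest formats and cryptographically sign the records so tampering can be detected.

```python
# Hypothetical sketch of machine-readable content provenance information (CPI).
# Field names are illustrative assumptions, not the COPIED Act's or C2PA's actual schema.

import hashlib
import json


def make_cpi_record(content: bytes, creator: str, ai_generated: bool, tool: str = None) -> dict:
    """Build a simple provenance record bound to a specific piece of content."""
    return {
        "content_sha256": hashlib.sha256(content).hexdigest(),  # ties the record to the exact bytes
        "creator": creator,                                      # asserted source of the content
        "ai_generated": ai_generated,                             # was the content synthetically generated?
        "generating_tool": tool,                                  # which model or tool produced/edited it, if any
        "edits": [],                                              # later modifications could be appended here
    }


if __name__ == "__main__":
    record = make_cpi_record(
        b"A photo of a city skyline.",
        creator="Example News",
        ai_generated=True,
        tool="image-model-x",
    )
    print(json.dumps(record, indent=2))
```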
But the bill also creates new liability that would undermine the development and diffusion of AI technologies, doing serious damage to their potential to bolster online expression, national competitiveness, and innovation.
Under the bill, platforms or developers that do not provide access to the voluntary technical tools developed for CPI could face civil liability. The bill would require that any AI tool capable of synthetically generating or editing content allow users to embed CPI. For covered platforms, failing to display CPI or modifying it without consent could also trigger liability.
While this may seem innocuous, there is nothing voluntary about guidelines or standards if noncompliance alone creates legal liability. The liability regime would empower the Federal Trade Commission, state attorneys general, and private entities to sue alleged violators. The private right of action is particularly concerning given that leading supporters of the bill, such as the Recording Industry Association of America, have a history of aggressive litigation against internet platforms and developers over any potential copyright infringement.
Another concerning aspect is the liability imposed on individuals and firms that train and use AI models. The bill’s drafters included language that could make developers, platforms, and users liable if any content bearing CPI is used for training or appears in the output of an AI system without the owner’s express consent. AI model developers and users would be in the hot seat.
With no text and data mining exception for research and no way to secure broad consent for training, developers would face significant liability if they use data from the open web to train models, effectively halting development of such tools. Such a framework could disadvantage American AI development relative to jurisdictions with more hospitable regulatory environments. It would also be a giveaway to the largest, most well-resourced AI companies, which can obtain licensing agreements and deal with regulatory minutiae while absorbing litigation costs as they arise.
Such liability could make it harder for people to find relevant information online and could limit free expression. The bill’s broad definition of “covered content,” which includes any information that is or could be copyrighted, means an AI tool combing through data without express consent would trigger liability, imperiling one of the most basic functions of popular models. If someone uses an AI writing tool, including information that contains CPI could trigger liability depending on how the system modifies that content. For reporters, researchers, or engaged citizens, such liability could hamper their ability to find and share information.
Beginning work on standards while respecting established copyright law as it relates to fair use should be the way forward. Private organizations such as the Coalition for Content Provenance and Authenticity and the World Wide Web Consortium are already working on content provenance solutions for AI, and NIST’s official involvement could be beneficial. On training, the Copyright Office is drafting a report to address the legality of AI training on copyrighted works. A wise man once said, “The standard is the standard.” Members of Congress interested in promoting greater trust and transparency around generative AI should heed his message.