The world of Artificial Intelligence as we know it could change forever because of the legal battle between OpenAI, creator of ChatGPT, and the authors and media outlets who allege that their protected content was used to train the system in violation of copyright law.
Far from admitting fault, OpenAI has mounted a curious defense, acknowledging that without this data, copyrighted though it may be, AI chatbots would be little more than useless tools if trained only on unprotected content.
ChatGPT and the use of copyrighted content
In recent weeks, OpenAI, the company behind the ChatGPT phenomenon, has been on everyone’s lips following The New York Times’ complaint, filed on December 27, 2023, in the Federal District Court in Manhattan. The legendary newspaper alleges that millions of its articles and research pieces were used to train the language models behind GPT-4 technology. According to the plaintiffs, this amounts to copyright infringement, with content used without authorization to train chatbots.
OpenAI and rival companies have been accused of illegally profiting from the work of authors and artists. In the plaintiffs’ view, this misuse is compounded by the fact that ChatGPT and other AI-powered tools derive financial gain from exploiting the intellectual property of others.
This has had a domino effect: prestigious authors such as John Grisham (The Pelican Brief) and George R. R. Martin (A Song of Ice and Fire) have sued the company over the use of their books to train ChatGPT. The New York Times, which brought the matter to light, has demanded that OpenAI destroy any system trained on its work. OpenAI, for its part, has reached agreements with publishers such as the Associated Press and Axel Springer to obtain access to their content.
Chatbots would be useless
OpenAI responded to these lawsuits by warning that banning the use of news articles and books to train chatbots would doom the development of Artificial Intelligence in one of its most popular (and most commercially successful) forms.
According to those responsible for OpenAI, training its language models only on unprotected content would render its creations practically useless.
“Because copyright today covers virtually all types of human expression, including blog posts, photographs, forum posts, snippets of software code, and government documents, it would be impossible to train today’s leading AI models without using copyrighted materials. Limiting training data to public domain books and drawings created more than a century ago might make for an interesting experiment, but it would not provide AI systems that meet the needs of today’s citizens.”
Are they committing a crime, then? According to OpenAI, no: the company maintains that it complies with copyright law when training its models because, in its words, “we believe that legally copyright laws do not prohibit training.”