OpenAI admits it's 'impossible' to create ChatGPT-like tools without using copyright material, amid court battles over intellectual property theft allegations

ChatGPT and Microsoft Logo
(Image credit: Daniel Rubino)

What you need to know

  • OpenAI has found itself in court after being slapped with multiple copyright infringement lawsuits.
  • The company admits that it's impossible to create AI chatbots without using copyrighted material from the internet.
  • In its submission, the company argued that copyright law doesn't forbid training AI models.

While the fiasco that led OpenAI's board of directors to strip Sam Altman of his position as CEO is now out of the way, the company can't seem to catch a break, as more trouble is brewing. As 2023 came to an end, The New York Times publicly announced its plans to sue Microsoft and OpenAI, alleging that their AI unfairly used its copyrighted material and hurt the outlet financially.

Recently joining the fray, two non-fiction authors filed a class-action lawsuit against Microsoft and OpenAI for intellectual property theft, staking a claim of $150,000 in damages. For those unaware, AI-powered chatbots like OpenAI's ChatGPT or Microsoft's Copilot (formerly Bing Chat) rely heavily on existing information and resources from the internet (predominantly from websites) for training purposes.

The issue here is that AI chatbots use this information to generate specific, detailed responses to queries, with only "subtle" attribution to the source. What's more, content creators receive no compensation for the use of their work to train these models.

OpenAI recently admitted that it's literally "impossible" to create tools like ChatGPT without copyrighted material from the internet, in its submission to the House of Lords Communications and Digital Select Committee. For an AI chatbot to provide users with accurate information, it has to draw on the vast resources already available on the internet. The catch is that virtually everything on the internet today is copyrighted.

Because copyright today covers virtually every sort of human expression – including blogposts, photographs, forum posts, scraps of software code, and government documents – it would be impossible to train today’s leading AI models without using copyrighted materials.

OpenAI

OpenAI indicated that limiting its training data to copyright-free material would produce AI chatbots that cannot meet the average user's minimum requirements. Judging by the company's submission and defense strategy, "fair use" of copyrighted content appears to be its entire lifeline.

Fair use creates a legal gray area, in which chatbots may obtain and use copyrighted information without first seeking permission from the owner. "Legally, copyright law does not forbid training," OpenAI added.

There's no AI without copyrighted content

OpenAI and ChatGPT

(Image credit: Daniel Rubino)

OpenAI, one of the most sought-after companies in generative AI, has openly admitted that it's next to impossible to create AI-powered chatbots like ChatGPT without using copyrighted material to train the models. This is despite its access to Microsoft's vast resources, on top of Microsoft's initial multi-billion dollar investment in the company.

In the past few months, ChatGPT has suffered several setbacks, including reports that it's getting dumber and that its user base is declining. This comes amid speculation that OpenAI is running on fumes and on the verge of bankruptcy. Granted, running a chatbot every day is a costly affair: reportedly around $700,000 per day, plus one bottle of water per query for cooling. One report suggested that by 2027, generative AI could consume as much energy in a year as a small country.

While the matter is still in court, it'll be interesting to see how things pan out. President Biden issued an Executive Order addressing safety and privacy concerns around AI, but guardrails for the technology remain a major concern among most users.

AI chatbots have been caught hallucinating, erroneously recommending a food bank as a tourist attraction, and even asking readers to take part in a poll to determine the cause of a woman's unfortunate passing. If this happens while the chatbots have access to copyrighted material, it raises serious concerns about how much damage the technology could cause if restricted to copyright-free data. Meanwhile, Google's Bard could potentially rise through the ranks, given its unlimited access to the entire internet.

What are your thoughts on AI chatbots using copyrighted resources without compensation and sweeping the issue under the rug as "fair use"? Let us know in the comments.

Kevin Okemwa
Contributor

Kevin Okemwa is a seasoned tech journalist based in Nairobi, Kenya, with lots of experience covering the latest trends and developments in the industry at Windows Central. With a passion for innovation and a keen eye for detail, he has written for leading publications such as OnMSFT, MakeUseOf, and Windows Report, providing insightful analysis and breaking news on everything revolving around the Microsoft ecosystem. You'll also catch him occasionally contributing at iMore about Apple and AI. While AFK and not busy following the ever-emerging trends in tech, you can find him exploring the world or listening to music.

  • naddy69
    Copyright laws are some of the clearest, easiest to enforce laws on the books. You simply can't use/reprint/distribute copyrighted material without the written consent of and/or paying the copyright holder. Period.

    It is important to note that it is up to the copyright holder to defend the copyright. If a copyright holder knowingly lets someone use the material without written consent and/or payment, the copyrighted material becomes public domain and is no longer copyrighted.

    This is why we are seeing these lawsuits. I guarantee you that more will come.

    "OpenAI recently admitted that it's literally "impossible" to create tools like ChatGPT without copyrighted material from the internet"

    Then you better re-think your business model. All of this "AI" junk is going to be seriously derailed by this. "Fair Use" does not mean using any amount - that YOU deem acceptable - of copyrighted material for free. The copyright holders determine this, not you. You will HAVE to pay up.

    That's the whole purpose of copyrights. It is - literally - the Right To Copy.

    Which means you will have to charge everyone that uses "AI", every time they use it. Or you will have to NOT include copyrighted material from everyone who sues you. And if you are not paying the copyright holders, the number of people suing you is only going to grow.

    "Legally, copyright law does not forbid training," OpenAI added.

    Really? What if schools used illegally copied books for "training" students? Do you really think they could get away with that? The schools BUY the required books and lend them to the students. In college, each student BUYS the required books.

    In neither case are the students provided free, bootleg copies by the school/college. The copyright holders ARE PAID for their copyrighted materials. Period.

    Otherwise it does not get used by the school/college. Period.
  • taynjack
    The part that makes this so interesting to me is: do you read something on the internet and then request permission to use the information someone has put in their free blog post, free Instagram, free Twitter, or free website? After reading multiple websites about, say, a medical condition you have, do you write a letter and ask the publisher if it is okay to use their suggested mode of treatment? Or after reviewing several fancy websites, does a person then get permission from every website owner where they found a cool feature they like and want to use in a similar way on their own website? Of course not, no one does this.

    Anyone can literally read several websites, review and revise that information, and publish it on their own website with their own perspective on the subject matter. Of course, they have to provide citations for any direct quotes, they can't copy and paste images or creative works of others without permission, and they can't name their products with the same name as another existing product. Yet anyone can, and almost everyone does, review other people's work, at the very least supported by searches across the internet, then adapt and repackage that information into their own creations. Artists do this all the time. Web and app developers, as well as practically any designer, scour the internet for ideas and trends and research their competition to then provide something better. Just look at automobiles, which all have many similar features and design similarities. For all of us there are quotes, anecdotes, solutions and principles that I'm certain we have all learned on the internet and couldn't possibly attribute to the original author(s) who taught us the subject matter. We likely all use this information to be better at our jobs, which means we benefit monetarily, without citation, from information that is out there for free on the internet.

    This, I believe, is roughly the gray area OpenAI is pointing to in its use of the internet to train A.I. The A.I. scours the internet for information that is widely available for free, then when someone asks its opinion, it takes all the information it has studied and spits out its interpretation. Take the A.I. summary in Amazon reviews. They could pay humans to read all the reviews of a product and then produce a summary. No one would bat an eye at this. (Of course, Amazon owns its reviews, so this is a terrible example.) Anyone can do this, though obviously not with the enormous leap in power that A.I. brings. Is it right? Is it wrong? I'm glad I'm not the one responsible for figuring out where that line is.