OpenAI’s regulatory troubles are just beginning

OpenAI managed to satisfy Italian data authorities and lift the country’s effective ban on ChatGPT last week, but the battle against European regulators is far from over.

Earlier this year, OpenAI’s popular and controversial ChatGPT chatbot ran into a major legal snag: an effective ban in Italy. Italy’s data protection authority (GPDP) accused OpenAI of violating EU data protection rules, and the company agreed to limit access to the service in Italy while it attempted to resolve the issue. On April 28, ChatGPT returned to the country, with OpenAI lightly addressing GPDP’s concerns without making major changes to its service – an apparent victory.

The GPDP says it “welcomes” the changes ChatGPT has made. Even so, the company’s legal troubles — and those of companies building similar chatbots — are likely just getting started. Regulators in several countries are investigating how these AI tools collect and produce information, citing a range of concerns from companies scraping unlicensed training data to chatbots’ tendency to spew misinformation. In the EU, they are applying the General Data Protection Regulation (GDPR), one of the world’s strongest legal privacy frameworks, whose effects are likely to extend far beyond Europe. Meanwhile, the bloc’s lawmakers are drafting a law that will specifically target AI — likely ushering in a new era of regulation for systems like ChatGPT.

ChatGPT’s various misinformation, copyright, and data protection issues have put a target on its back

ChatGPT is one of the most popular examples of generative AI – a blanket term for tools that produce text, images, video, and audio based on user prompts. The service reportedly became one of the fastest-growing consumer applications in history after reaching 100 million monthly active users just two months after launching in November 2022 (OpenAI has never confirmed these figures). People use it to translate text into different languages, write college essays, and generate code. But critics, including regulators, have pointed to ChatGPT’s unreliable output, confusing copyright issues, and murky data protection practices.

Italy was the first country to make a move. On March 31, the GPDP highlighted four ways it believed OpenAI was violating the GDPR: allowing ChatGPT to provide inaccurate or misleading information, failing to notify users of its data collection practices, failing to meet any of the six possible legal justifications for processing personal data, and failing to adequately prevent children under the age of 13 from using the service. It ordered OpenAI to immediately stop using personal information collected from Italian citizens in its training data for ChatGPT.

No other country has taken such action. But since March, at least three EU countries — Germany, France, and Spain — have launched their own investigations into ChatGPT. Meanwhile, across the Atlantic, Canada is evaluating privacy concerns under its Personal Information Protection and Electronic Documents Act, or PIPEDA. The European Data Protection Board (EDPB) has even established a dedicated task force to help coordinate investigations. And if these agencies demand changes from OpenAI, they could affect how the service runs for users around the world.

Regulators’ concerns can be broadly divided into two categories: where ChatGPT’s training data comes from and how OpenAI delivers information to its users.

ChatGPT runs on OpenAI’s GPT-3.5 and GPT-4 large language models (LLMs), which are trained on vast quantities of human-produced text. OpenAI is coy about exactly which training text is used but says it draws on “a variety of licensed, created, and publicly available data sources, which may include publicly available personal information.”

This potentially creates huge problems under the GDPR. The law came into effect in 2018 and covers any service that collects or processes data from EU citizens, regardless of where the responsible organization is located. GDPR rules require companies to have explicit consent before collecting personal data, have a legal justification for why it is being collected, and be transparent about how it is used and stored.

European regulators argue that the secrecy surrounding OpenAI’s training data means there is no way to confirm whether the personal information swept into it was originally given with user consent, and the GPDP specifically argued that OpenAI had “no legal basis” for collecting it in the first place. OpenAI and others have faced little scrutiny so far, but this claim adds a big question mark to future data scraping efforts.

Then there’s the GDPR’s “right to be forgotten,” which lets users demand that companies correct their personal data or delete it entirely. OpenAI preemptively updated its privacy policy to facilitate those requests, but there has been debate about whether honoring them is even technically possible, given how difficult it can be to separate specific data once it’s been churned into these large language models.

OpenAI also collects information directly from users. Like any internet platform, it gathers a range of standard user data (e.g., name, contact details, card details). But, more significantly, it records the interactions users have with ChatGPT. As stated in an FAQ, this data can be reviewed by OpenAI’s staff and used to train future versions of its model. Given the intimate questions people put to ChatGPT – using the bot as a therapist or a doctor – that means the company is scooping up all kinds of sensitive data.

At least some of this data may have been collected from minors, as while OpenAI’s policy states that it “does not knowingly collect personal information from children under the age of 13,” there is no strict age verification gate. That doesn’t sit well with EU rules, which prohibit collecting data from people under the age of 13 and (in some countries) require parental consent for minors under the age of 16. On the output side, the GPDP argued that ChatGPT’s lack of age filters exposes minors to “definitely inappropriate responses regarding their level of development and self-awareness.”

OpenAI retains wide latitude to use that data, which has worried some regulators, and storing it poses a security risk. Companies like Samsung and JPMorgan have banned employees from using generative AI tools for fear they will upload sensitive data. And, in fact, Italy announced its ban shortly after ChatGPT suffered a serious data breach that exposed users’ chat histories and email addresses.

ChatGPT’s tendency to provide false information could also be a problem. GDPR regulations stipulate that all personal data must be accurate, something the GPDP emphasized in its announcement. Depending on how that’s defined, it could spell trouble for most AI text generators, which are prone to “hallucinations”: a cute industry term for factually incorrect or irrelevant responses. This has already seen real-world repercussions elsewhere, as a regional Australian mayor has threatened to sue OpenAI for defamation after ChatGPT falsely claimed he had served time in prison for bribery.

ChatGPT’s popularity and current dominance of the AI market make it a particularly attractive target, but there’s no reason its competitors and collaborators, like Google with Bard or Microsoft with its OpenAI-powered Azure AI, won’t face scrutiny, too. Before ChatGPT, Italy banned the chatbot platform Replika for collecting information on minors, and so far, it has remained banned.

While the GDPR is a powerful set of laws, it wasn’t created to address AI-specific issues. Rules that do, however, may be on the horizon.

In 2021, the EU submitted its first draft of the Artificial Intelligence Act (AIA), legislation that will work alongside the GDPR. The act regulates AI tools according to their perceived risk, from “minimal” (things like spam filters) to “high” (AI tools for law enforcement or education) to “unacceptable,” and therefore banned (like a social credit system). After the explosion of large language models like ChatGPT last year, lawmakers are now racing to add rules for “foundation models” and “general-purpose AI systems (GPAIs)” — two terms for large-scale AI systems that include LLMs — potentially classing them as “high-risk” services.

The AIA’s provisions go beyond data protection. A recently proposed amendment would force companies to disclose any copyrighted material used to develop generative AI tools. That could expose once-secret datasets and leave more companies vulnerable to infringement lawsuits, which are already hitting some services.

Laws specifically designed to regulate AI may not take effect in Europe until late 2024

But passing it may take a while. EU legislators reached a provisional AI Act deal on April 27. A committee will vote on the draft on May 11, with the final proposal expected by mid-June. Then, the European Council, Parliament, and Commission will have to resolve any remaining disputes before implementing the law. If everything goes smoothly, it could be adopted by the second half of 2024, a little behind the official target of the May 2024 European elections.

For now, the row between Italy and OpenAI offers an early look at how regulators and AI companies might negotiate. The GPDP offered to lift its ban if OpenAI met several proposed resolutions by April 30. Those included informing users how ChatGPT stores and processes their data, asking for explicit consent to use that data, facilitating requests to correct or remove false personal information generated by ChatGPT, and requiring Italian users to confirm they are over 18 when registering for an account. OpenAI didn’t meet all of those stipulations, but it met enough to appease Italian regulators and get access to ChatGPT restored in Italy.

OpenAI still has targets to hit. It has until September 30 to create a harder age verification gate to keep out minors under 13 and require parental consent for older underage teens. If it fails, it could see itself blocked again. But it has provided an example of what Europe considers acceptable behavior for an AI company – at least until new laws are on the books.