C utting-edge artificial intelligence systems can help you escape a parking fine, write an academic essay, or fool you into believing Pope Francis is a fashionista. But the virtual libraries behind this breathtaking technology are vast – and there are concerns they are operating in breach of personal data and copyright laws.
The enormous datasets used to train the latest generation of these AI systems, like those behind ChatGPT and Stable Diffusion, are likely to contain billions of images scraped from the internet, millions of pirated ebooks, the entire proceedings of 16 years of the European parliament and the whole of English-language Wikipedia.
But the industry’s voracious appetite for big data is starting to cause problems, as regulators and courts around the world crack down on researchers hoovering up content without consent or notice. In response, AI labs are fighting to keep their datasets secret, or even daring regulators to push the issue.
In Italy, ChatGPT has been banned from operating after the country’s data protection regulator said there was no legal basis to justify the collection and “massive storage” of personal data in order to train the GPT AI. On Tuesday, the Canadian privacy commissioner followed suit with an investigation into the company in response to a complaint alleging “the collection, use and disclosure of personal information without consent”.
Britain’s data watchdog expressed its own concerns. “Data protection law still applies when the personal information that you’re processing comes from publicly accessible sources,” said Stephen Almond, the director of technology and innovation at the Information Commissioner’s Office.
Michael Wooldridge, a professor of computer science at the University
Read more on theguardian.com