What OpenAI whistleblower revealed about company weeks before his death
Just weeks before his death in November, Suchir Balaji, a former OpenAI researcher, levelled serious allegations against the artificial intelligence company.
Suchir Balaji was part of OpenAI's generative AI projects.
New Delhi, UPDATED: Dec 15, 2024 06:52 IST
Suchir Balaji, a former researcher at OpenAI, was found dead in a San Francisco apartment in late November. Prior to his death, he had made serious allegations against the artificial intelligence company, accusing it of copyright violations and unethical business practices.
Balaji, an Indian-American researcher who worked at OpenAI for over four years, was involved in the development of the GPT-4 model, a cornerstone of OpenAI's generative AI products. His body was recovered on November 26, but news of his death came to light only on Friday. Police suspect suicide.
In an interview with The New York Times in October, just weeks before his death, the 26-year-old researcher accused OpenAI of using copyrighted material without authorisation to train ChatGPT. He alleged that technologies like these were damaging the internet ecosystem.
He claimed that Sam Altman’s company sourced vast amounts of digital data from the internet to train its AI models without adhering to fair use provisions. This data allegedly included content from websites, books, and other copyrighted materials, which were used to enhance the AI's capabilities.
Balaji argued that OpenAI's practices were destroying the commercial viability of the individuals, businesses, and internet services that created this digital data in the first place. He pointed out that ChatGPT could generate substitutes that directly competed with the original data sources, thereby undermining the fair use argument.
“This is not a sustainable model for the internet ecosystem as a whole,” he told The Times.
Balaji also accused OpenAI of making unauthorised copies of copyrighted data and of creating close imitations of the originals. He explained that OpenAI could train its system either to reproduce data exactly or to produce text that is not a direct copy of any single source.
“The outputs aren’t exact copies of the inputs, but they are also not fundamentally novel. There are occasionally circumstances where an output looks like an input,” he was quoted as saying.
The bigger issue, he pointed out, is that as AI technologies replace existing internet services, they often produce false or entirely fabricated information, referred to by researchers as "hallucinations".
“If you believe what I believe, you have to just leave the company,” he told The Times.
Balaji’s revelations were central to many lawsuits filed against OpenAI for copyright violations.