Is ChatGPT Becoming Dumber? New Study Claims AI Chatbot’s Performance Is Deteriorating



OpenAI’s ChatGPT has gained widespread popularity and sparked an AI race due to its impressive performance as an artificial intelligence chatbot.

Renowned figures in the tech industry and authors alike have showered it with accolades, deeming it a groundbreaking achievement in the world of AI.

ChatGPT’s abilities have been so impressive that some even question whether it has passed the Turing test, the classic benchmark for measuring a machine’s capability to emulate human intelligence.

The language model has demonstrated remarkable proficiency across various fields, showcasing its prowess in math (89th percentile), law (90th percentile), and GRE verbal (99th percentile). 

Moreover, a study by researchers from New York University’s medical school earlier this month highlighted ChatGPT’s ability to provide medical advice that closely resembles responses from human medical staff. However, not all researchers are entirely convinced that ChatGPT is consistently reliable in critical decision-making scenarios. 

(Photo : STEFANI REYNOLDS/AFP via Getty Images)
This photo illustration shows the ChatGPT logo at an office in Washington, DC, on March 15, 2023.

ChatGPT’s Deteriorating Performance

Lingjiao Chen, Matei Zaharia, and James Zou, researchers at Stanford University and the University of California, Berkeley, have echoed concerns expressed by some users, suggesting that ChatGPT’s performance is not consistent and may even be deteriorating on some tasks, Science X Network reported.

Their investigation found considerable variation in the performance and behavior of GPT-3.5 and GPT-4. Particularly noteworthy was the significant decline in the quality of responses to specific tasks over the four-month period from March to June.

The researchers concentrated on evaluating ChatGPT’s aptitude in math problem solving and computer code generation. Their discoveries revealed a sharp decline in GPT-4’s accuracy rate for prime number problems, plunging from 97.6% in March to a mere 2.4% in June.
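To make an accuracy figure like this concrete, here is a minimal sketch in Python of how a score on yes/no prime number questions could be computed. It is not the researchers' code: the function names are hypothetical, and `answer_fn` is a stand-in for whatever would query a chatbot and parse its reply into a yes or no.

```python
from typing import Callable, Iterable


def is_prime(n: int) -> bool:
    """Ground truth: trial division, fine for small test numbers."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def prime_accuracy(numbers: Iterable[int],
                   answer_fn: Callable[[int], bool]) -> float:
    """Fraction of numbers for which answer_fn agrees with is_prime.

    answer_fn is hypothetical: in a real evaluation it would send a
    question such as "Is N a prime number?" to the model and turn the
    reply into True (yes) or False (no).
    """
    numbers = list(numbers)
    if not numbers:
        raise ValueError("need at least one number to score")
    correct = sum(answer_fn(n) == is_prime(n) for n in numbers)
    return correct / len(numbers)


def always_yes(n: int) -> bool:
    """Toy stand-in for a model that answers "yes" to every question."""
    return True
```

Running the same question set against two snapshots of a model and comparing the two scores is one simple way to express the kind of drift described here. Note that a responder that always gives the same answer scores only as well as the mix of primes and non-primes in the question set allows.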

ChatGPT’s role in aiding coders with programming and debugging tasks also suffered. In March, GPT-4 produced accurate, ready-to-run scripts in over 50% of cases.

However, this success rate dropped to 10% by June. Similarly, GPT-3.5’s accuracy declined from 22% in March to a mere 2% in June, according to the study.

The researchers faced challenges pinpointing a definitive cause for these inconsistencies, but they speculated that system modifications and upgrades might be contributing factors. The opaque nature of these language models makes it difficult to fully comprehend the reasons behind such performance fluctuations.


Conspiracy Theories

Interestingly, conspiracy theorists have floated accusations that OpenAI is experimenting with smaller versions of its LLMs to save costs. Others have suggested that OpenAI could be intentionally downgrading GPT-4 to drive users toward purchasing GitHub’s AI coding assistant, Copilot.

OpenAI denied such claims. In a tweet, Peter Welinder, OpenAI’s VP of Product, said that the company is continually striving to improve ChatGPT, making each new version smarter than its predecessor.

However, some remain concerned about the potential impact of “drift” in the model’s results. To address these apprehensions, observers urge OpenAI to be more transparent by disclosing training material sources, code, and other structural elements of GPT-4.

The study’s findings were published on arXiv.



ⓒ 2023 TECHTIMES.com All rights reserved. Do not reproduce without permission.




