Is ChatGPT Becoming Dumber? New Study Claims AI Chatbot’s Performance Is Deteriorating


OpenAI’s ChatGPT has gained widespread popularity and sparked an AI race due to its impressive performance as an artificial intelligence chatbot.

Renowned figures in the tech industry and authors alike have showered it with accolades, deeming it a groundbreaking achievement in the world of AI.

ChatGPT’s abilities have been so impressive that some have even asked whether it can pass the Turing test, the long-standing benchmark for a machine’s ability to imitate human intelligence.

The language model has demonstrated remarkable proficiency across various fields, reportedly scoring in the 89th percentile in math, the 90th percentile in law, and the 99th percentile on the GRE verbal section.

Moreover, a study by researchers from New York University’s medical school earlier this month highlighted ChatGPT’s ability to provide medical advice that closely resembles responses from human medical staff. However, not all researchers are entirely convinced that ChatGPT is consistently reliable in critical decision-making scenarios. 

(Photo: STEFANI REYNOLDS/AFP via Getty Images)
This photo illustration shows the ChatGPT logo at an office in Washington, DC, on March 15, 2023.

ChatGPT’s Deteriorating Performance

Lingjiao Chen, Matei Zaharia, and James Zou from Stanford University and the University of California, Berkeley, have echoed concerns expressed by some users, suggesting that ChatGPT’s performance may not be entirely consistent and may even be deteriorating in some instances, Science X Network reported.

Their investigation found that the performance and behavior of GPT-3.5 and GPT-4 varied considerably over time. Particularly noteworthy was a significant decline in the quality of responses to specific tasks over the four-month period from March to June.

The researchers concentrated on evaluating ChatGPT’s aptitude in math problem solving and computer code generation. They found a sharp decline in GPT-4’s accuracy on problems asking whether a number is prime, which plunged from 97.6% in March to a mere 2.4% in June.

ChatGPT’s role in aiding coders with programming and debugging tasks also encountered obstacles. In March, GPT-4 produced accurate, ready-to-run scripts in over 50% of cases.

However, this success rate dropped dramatically to 10% by June. Similarly, GPT-3.5 experienced a notable decline in accuracy, decreasing from 22% in March to a mere 2% in June, according to the study.

The researchers could not pinpoint a definitive cause for these inconsistencies, but they speculated that system modifications and upgrades might be contributing factors. The opaque nature of these language models makes it difficult to fully understand the reasons behind such performance fluctuations.

Conspiracy Theories

Interestingly, conspiracy theorists have floated accusations that OpenAI is experimenting with smaller versions of LLMs to save costs. Others have suggested that OpenAI could be intentionally downgrading GPT-4 to drive users toward purchasing GitHub’s LLM-based coding assistant, Copilot.

OpenAI denied such claims. In a tweet, Peter Welinder, OpenAI’s VP of Product, said that the company is continually striving to improve ChatGPT, making each new version smarter than its predecessor.

However, some remain concerned about the potential impact of “drift” in the model’s results. To address these apprehensions, observers urge OpenAI to be more transparent by disclosing the training material sources, code, and other structural elements of GPT-4.

The study’s findings were posted on the arXiv preprint server.


ⓒ 2023 TECHTIMES.com All rights reserved. Do not reproduce without permission.




