Is ChatGPT Becoming Dumber? New Study Claims AI Chatbot’s Performance Is Deteriorating



OpenAI’s ChatGPT has gained widespread popularity and sparked an AI race due to its impressive performance as an artificial intelligence chatbot.

Renowned figures in the tech industry and authors alike have showered it with accolades, deeming it a groundbreaking achievement in the world of AI.

ChatGPT’s abilities have been so impressive that some have even asked whether it has passed the Turing test, a long-standing benchmark for a machine’s ability to emulate human intelligence.

The language model has demonstrated remarkable proficiency across various fields, showcasing its prowess in math (89th percentile), law (90th percentile), and GRE verbal (99th percentile). 

Moreover, a study by researchers from New York University’s medical school earlier this month highlighted ChatGPT’s ability to provide medical advice that closely resembles responses from human medical staff. However, not all researchers are entirely convinced that ChatGPT is consistently reliable in critical decision-making scenarios. 

(Photo : STEFANI REYNOLDS/AFP via Getty Images)
This photo illustration shows the ChatGPT logo at an office in Washington, DC, on March 15, 2023.

ChatGPT’s Deteriorating Performance

Lingjiao Chen, Matei Zaharia, and James Zou of Stanford University and the University of California, Berkeley, have echoed concerns expressed by some users, suggesting that ChatGPT’s performance is not consistent over time and may even be deteriorating on some tasks, Science X Network reported.

Their investigation found considerable variation in the performance and behavior of GPT-3.5 and GPT-4. Particularly noteworthy was a significant decline in the quality of responses to specific tasks between March and June.

The researchers concentrated on evaluating ChatGPT’s aptitude in math problem solving and computer code generation. Their findings showed a sharp decline in GPT-4’s accuracy on prime number problems, which plunged from 97.6% in March to a mere 2.4% in June.
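To make the accuracy figure concrete, here is a minimal sketch of how yes/no answers to "is this number prime?" questions could be scored against ground truth. It is an illustration only, not the researchers’ evaluation code: the function names and the sample replies below are hypothetical, and the scoring rule (a reply counts as "prime" if it starts with "yes") is an assumption.

```python
from math import isqrt


def is_prime(n: int) -> bool:
    """Ground-truth primality check by trial division."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for d in range(3, isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True


def accuracy(replies: dict[int, str]) -> float:
    """Fraction of yes/no replies that match the ground truth.

    Assumption: a reply beginning with "yes" means the model said "prime";
    anything else is treated as "not prime".
    """
    if not replies:
        raise ValueError("no replies to score")
    correct = 0
    for number, reply in replies.items():
        said_prime = reply.strip().lower().startswith("yes")
        if said_prime == is_prime(number):
            correct += 1
    return correct / len(replies)


# Hypothetical replies from two snapshots of the same model (made-up data).
march_replies = {7: "Yes", 9: "No", 11: "Yes", 15: "No"}
june_replies = {7: "No", 9: "No", 11: "No", 15: "Yes"}

print(f"March accuracy: {accuracy(march_replies):.1%}")
print(f"June accuracy:  {accuracy(june_replies):.1%}")
```

Running the same fixed question set against each snapshot and comparing the two numbers is the basic idea behind tracking this kind of change over time.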

ChatGPT’s usefulness to coders for programming and debugging tasks also suffered. In March, GPT-4 produced accurate, ready-to-run scripts in over 50% of cases.

However, this success rate dropped to 10% by June. GPT-3.5 saw a similar decline, from 22% in March to a mere 2% in June, according to the study.

The researchers faced challenges pinpointing a definitive cause for these inconsistencies, but they speculated that system modifications and upgrades might be contributing factors. The opaque nature of these language models makes it difficult to fully comprehend the reasons behind such performance fluctuations.


Conspiracy Theories

Interestingly, conspiracy theorists have floated accusations that OpenAI is experimenting with smaller versions of its LLMs to save costs. Others have suggested that OpenAI could be intentionally downgrading GPT-4 to drive users toward purchasing GitHub’s LLM-based coding assistant, Copilot.

OpenAI denied such claims. In a tweet, Peter Welinder, OpenAI’s VP of Product, said the company is continually striving to improve ChatGPT and makes each new version smarter than its predecessor.

However, some remain concerned about the potential impact of “drift” in the model’s results. To address these apprehensions, observers urge OpenAI to be more transparent by disclosing the training material sources, code, and other structural elements of GPT-4.

The study’s findings were posted on the arXiv preprint server.



ⓒ 2023 TECHTIMES.com All rights reserved. Do not reproduce without permission.




