OpenAI’s ChatGPT Fools a Third of Users Despite 52% Wrong Answers, Purdue Study Finds

OpenAI’s ChatGPT has emerged as a popular general-purpose tool in the rapidly evolving artificial intelligence landscape.

However, a recently published Purdue University study sheds light on a critical element of ChatGPT’s performance that deserves attention: its accuracy in answering software engineering questions. 

The study, titled “Who Answers It Better? An In-Depth Analysis of ChatGPT and Stack Overflow Answers to Software Engineering Questions,” delves deep into the quality and usability of ChatGPT’s responses, uncovering some intriguing and, at times, problematic findings.

Exposing ChatGPT With Programmer Questions

The Purdue team meticulously examined ChatGPT’s answers to 517 questions sourced from Stack Overflow, a well-known Q&A platform for programmers. 

The assessment spanned various criteria, including correctness, consistency, comprehensiveness, and conciseness. The results were both enlightening and concerning. 

ChatGPT answered approximately 52% of software engineering questions incorrectly, raising significant questions about its accuracy and reliability as a programming resource.

The study unveiled another interesting aspect of ChatGPT’s behavior: verbosity. A staggering 77% of ChatGPT’s responses were deemed excessively wordy, potentially impacting the clarity and efficiency of its solutions. 

Despite these inaccuracies and the verbose style, users still preferred ChatGPT’s responses 39.34% of the time. The study attributes this preference to ChatGPT’s comprehensive and well-articulated language style.

Moreover, the research highlighted a distinctive trait of ChatGPT’s approach: a propensity for conceptual errors. The model seems to struggle to grasp the underlying context of questions, leading to a higher frequency of errors stemming from a lack of conceptual understanding.

Even when an answer contained glaring inaccuracies, participants in the study often marked the response as preferred, indicating the influence of ChatGPT’s polite, authoritative style.

However, the authors acknowledge ChatGPT’s limitations, particularly regarding reasoning. The model often provides solutions or code snippets without clearly understanding their implications, hinting at the challenge of incorporating reasoning into language models like ChatGPT.

A Closer Look

As News18 reports, the Purdue study also delved into the linguistic and sentiment aspects of ChatGPT’s responses. 

Surprisingly, the model’s answers exhibited more formal language, more analytic thinking, and more positive sentiment than responses from Stack Overflow.

This inclination towards positivity might contribute to user trust in ChatGPT’s answers, even when they contain inaccuracies.

What This Study Means

The implications of this study extend beyond ChatGPT’s performance itself. The observed decline in usage of traditional platforms like Stack Overflow suggests that ChatGPT’s popularity is changing how programmers seek assistance online.

In response to these findings, the researchers offer valuable recommendations. Platforms like Stack Overflow could benefit from enhancing the detection of negative sentiments and toxicity in answers and providing more precise guidelines for structuring answers effectively. 

The study emphasizes that while ChatGPT can be useful, users should be aware of the potential risks associated with seemingly accurate answers.

ⓒ 2023 TECHTIMES.com All rights reserved. Do not reproduce without permission.

