OpenAI’s ChatGPT Fools a Third of Users Despite 52% Wrong Answers, Purdue Study Finds



OpenAI’s ChatGPT has emerged as a popular general-purpose tool in the rapidly evolving artificial intelligence landscape.

However, a recently published Purdue University study examines a critical element of ChatGPT’s performance: its accuracy in answering software engineering questions.

The study, titled “Who Answers It Better? An In-Depth Analysis of ChatGPT and Stack Overflow Answers to Software Engineering Questions,” delves deep into the quality and usability of ChatGPT’s responses, uncovering some intriguing and, at times, problematic findings.

Exposing ChatGPT With Programmer Questions

The Purdue team meticulously examined ChatGPT’s answers to 517 questions sourced from Stack Overflow, a well-known Q&A platform for programmers. 

The assessment spanned various criteria, including correctness, consistency, comprehensiveness, and conciseness. The results were both enlightening and concerning. 

ChatGPT answered approximately 52% of software engineering questions incorrectly, raising significant questions about its accuracy and reliability as a programming resource.

The study unveiled another interesting aspect of ChatGPT’s behavior: verbosity. A staggering 77% of ChatGPT’s responses were deemed excessively wordy, potentially impacting the clarity and efficiency of its solutions. 

Despite these inaccuracies and the verbose answers, users still preferred ChatGPT’s responses 39.34% of the time. As the study reveals, this preference is attributed to ChatGPT’s comprehensive and well-articulated language style.


Moreover, the research highlighted a distinctive trait of ChatGPT’s approach: a propensity for conceptual errors. The model often struggles to grasp the underlying context of a question, leading to a higher frequency of errors stemming from a lack of conceptual understanding.

Even when an answer contained glaring inaccuracies, participants in the study often marked the response as preferred, indicating the influence of ChatGPT’s polite, authoritative style.

However, the authors acknowledge ChatGPT’s limitations, particularly regarding reasoning. The model often provides solutions or code snippets without clearly understanding their implications, hinting at the challenge of incorporating reasoning into language models like ChatGPT.

A Closer Look

As News18 reports, the Purdue study also delved into the linguistic and sentiment aspects of ChatGPT’s responses. 

Surprisingly, the model’s answers exhibited more formal language, more analytic thinking, and more positive sentiment than responses from Stack Overflow.

This inclination towards positivity might contribute to user trust in ChatGPT’s answers, even when they contain inaccuracies.

What the Study Means

The implications of this study extend beyond ChatGPT’s performance alone. The observed decline in usage of traditional platforms like Stack Overflow suggests that ChatGPT’s popularity is changing how programmers seek assistance online.

In response to these findings, the researchers offer valuable recommendations. Platforms like Stack Overflow could benefit from enhancing the detection of negative sentiments and toxicity in answers and providing more precise guidelines for structuring answers effectively. 

The study emphasizes that while ChatGPT can be useful, users should be aware of the potential risks associated with seemingly accurate answers.


ⓒ 2023 TECHTIMES.com All rights reserved. Do not reproduce without permission.

