AI chatbots are sycophants — researchers say it’s harming science

-


AI’s inclination to be helpful affects many of the tasks that researchers use LLMs for.Credit: Smith Collection/Gado/Getty

Artificial intelligence (AI) models are 50% more sycophantic than humans, an analysis published this month has found.

The study, which was posted as a preprint1 on the arXiv server, tested how 11 widely used large language models (LLMs) responded to more than 11,500 queries seeking advice, including many describing wrongdoing or harm.

AI Chatbots — including ChatGPT and Gemini — often cheer users on, give them overly flattering feedback and adjust responses to echo their views, sometimes at the expense of accuracy. Researchers analysing AI behaviours say that this propensity for people-pleasing, known as sycophancy, is affecting how they use AI in scientific research, in tasks from brainstorming ideas and generating hypotheses to reasoning and analyses.

“Sycophancy essentially means that the model trusts the user to say correct things,” says Jasper Dekoninck, a data science PhD student at the Swiss Federal Institute of Technology in Zurich. “Knowing that these models are sycophantic makes me very wary whenever I give them some problem,” he adds. “I always double-check everything that they write.”

Marinka Zitnik, a researcher in biomedical informatics at Harvard University in Boston, Massachusetts, says that AI sycophancy “is very risky in the context of biology and medicine, when wrong assumptions can have real costs”.

People pleasers

In a study posted on the preprint server arXiv on 6 October2, Dekoninck and his colleagues tested whether AI sycophancy affects the technology’s performance in solving mathematical problems. The researchers designed experiments using 504 mathematical problems from competitions held this year, altering each theorem statement to introduce subtle errors. They then asked four LLMs to provide proofs for these flawed statements.

The authors considered a model’s answer to be sycophantic if it failed to detect the errors in a statement and went on to hallucinate a proof for it.

GPT-5 showed the least sycophantic behaviour, generating sycophantic answers 29% of the time. DeepSeek-V3.1 was the most sycophantic, generating sycophantic answers 70% of the time. Although the LLMs have the capability to spot the errors in the mathematical statements, they “just assumed what the user says is correct”, says Dekoninck.

When Dekoninck and his team changed the prompts to ask each LLM to check whether a statement was correct before proving it, DeepSeek’s sycophantic answers fell by 34%.

The study is “not really indicative of how these systems are used in real-world performance, but it gives an indication that we need to be very careful with this”, says Dekoninck.

Simon Frieder, a PhD student studying mathematics and computer science at the University of Oxford, UK, says the work “shows that sycophancy is possible”. But he adds that AI sycophancy tends to appear most clearly when people are using AI chatbots to learn, so future studies should explore “errors that are typical for humans that learn math”.

Unreliable assistance

Researchers told Nature that AI sycophancy creeps into many of the tasks that they use LLMs for.

Yanjun Gao, an AI researcher at the University of Colorado Anschutz Medical Campus in Aurora, uses ChatGPT to summarize papers and organize her thoughts, but says the tools sometimes mirror her inputs without checking the sources. “When I have a different opinion than what the LLM has said, it follows what I said instead of going back to the literature” to try to understand it, she adds.

Zitnik and her colleagues have observed similar patterns when using their multi-agent systems, which integrate several LLMs to carry out complex, multi-step processes such as analysing large biological data sets, identifying drug targets and generating hypotheses.



Source link

Latest news

Gear News of the Week: There’s Yet Another New AI Browser, and Fujifilm Debuts the X-T30 III

An increasingly popular solution is the inclusion of a solar panel to keep that battery topped up, enabling...

Amazon Explains How Its AWS Outage Took Down the Web

The cloud giant Amazon Web Services experienced DNS resolution issues on Monday leading to cascading outages that took...

Don’t Let the Fuzzy Rats Win: Tips from a Squirrel Hater Who’s Seen It All

Squirrels: Are they just rats with better PR? Be advised that this is not safe reading material for...

OpenAI’s Atlas Wants to Be the Web’s Tour Guide. I’m Not Convinced It Needs One

The oddest, and most memorable, interaction I had with ChatGPT Atlas occurred as I scrolled around on Bluesky...

The Pepsi Man Is Coming to Save Samsung From Boring Design

Samsung has one of the biggest product line ups of any tech brand, yet when it comes to...

The Best Couples’ Sex Toys to Spice Up the Bedroom or Long Distance Fun

Other Sex Toys to ConsiderHere are a few other toys that aren't as great as the picks above...

Must read

You might also likeRELATED
Recommended to you