OpenAI pushes AI agent capabilities with new developer API

-



Developers using the Responses API can access the same models that power ChatGPT Search: GPT-4o search and GPT-4o mini search. These models can browse the web to answer questions and cite sources in their responses.

That’s notable because OpenAI says the added web search ability dramatically improves the factual accuracy of its AI models. On OpenAI’s SimpleQA benchmark, which aims to measure confabulation rate, GPT-4o search scored 90 percent, while GPT-4o mini search achieved 88 percent—both substantially outperforming the larger GPT-4.5 model without search, which scored 63 percent.

Despite these improvements, the technology still has significant limitations. Aside from issues with CUA properly navigating websites, the improved search capability doesn’t completely solve the problem of AI confabulations, with GPT-4o search still making factual mistakes 10 percent of the time.

Alongside the Responses API, OpenAI released the open source Agents SDK, providing developers free tools to integrate models with internal systems, implement safeguards, and monitor agent activities. This toolkit follows OpenAI’s earlier release of Swarm, a framework for orchestrating multiple agents.

These are still early days in the AI agent field, and things will likely improve rapidly. However, at the moment, the AI agent movement remains vulnerable to unrealistic claims, as demonstrated earlier this week when users discovered that Chinese startup Butterfly Effect’s Manus AI agent platform failed to deliver on many of its promises, highlighting the persistent gap between promotional claims and practical functionality in this emerging technology category.



Source link

Latest news

A Gene Editing Therapy Cut Cholesterol Levels by Half

In a step toward the wider use of gene editing, a treatment that uses Crispr successfully slashed high...

How startups can lure good talent fairly without big tech bank accounts 

Startups have never been able to offer the same sizable salaries as big tech companies. Now with companies...

Trump’s Hatred of EVs Is Making Gas Cars More Expensive

This story originally appeared on Mother Jones and is part of the Climate Desk collaboration.As President Donald Trump...

Gear News of the Week: Fairphone Lands in the US, and WhatsApp Is Finally on the Apple Watch

The only smartphone manufacturer with a 10/10 iFixit repairability score is finally bringing its products to the US,...

Do Not Jump Into an Ice Bath Before Your 12-Mile Run, and Other Cold Plunge Tips

You’d think cold plunging would be a straightforward task. Strip down to your swim suit, take a controlled...

Unpicking How to Measure the Complexity of Knots

The duo kept their program running in the background for over a decade. During that time, a couple...

Must read

You might also likeRELATED
Recommended to you