OpenAI pushes AI agent capabilities with new developer API

-



Developers using the Responses API can access the same models that power ChatGPT Search: GPT-4o search and GPT-4o mini search. These models can browse the web to answer questions and cite sources in their responses.

That’s notable because OpenAI says the added web search ability dramatically improves the factual accuracy of its AI models. On OpenAI’s SimpleQA benchmark, which aims to measure confabulation rate, GPT-4o search scored 90 percent, while GPT-4o mini search achieved 88 percent—both substantially outperforming the larger GPT-4.5 model without search, which scored 63 percent.

Despite these improvements, the technology still has significant limitations. Aside from issues with CUA properly navigating websites, the improved search capability doesn’t completely solve the problem of AI confabulations, with GPT-4o search still making factual mistakes 10 percent of the time.

Alongside the Responses API, OpenAI released the open source Agents SDK, providing developers free tools to integrate models with internal systems, implement safeguards, and monitor agent activities. This toolkit follows OpenAI’s earlier release of Swarm, a framework for orchestrating multiple agents.

These are still early days in the AI agent field, and things will likely improve rapidly. However, at the moment, the AI agent movement remains vulnerable to unrealistic claims, as demonstrated earlier this week when users discovered that Chinese startup Butterfly Effect’s Manus AI agent platform failed to deliver on many of its promises, highlighting the persistent gap between promotional claims and practical functionality in this emerging technology category.



Source link

Latest news

This AI Model Can Intuit How the Physical World Works

The original version of this story appeared in Quanta Magazine.Here’s a test for infants: Show them a glass...

Lenovo’s Legion Go 2 Is a Good Handheld for Power Users

The detachable controllers go a long way towards making the device more portable and usable. The screen has...

Why Tehran Is Running Out of Water

This story originally appeared on Bulletin of the Atomic Scientists and is part of the Climate Desk collaboration.During...

Move Over, MIPS—There’s a New Bike Helmet Safety Tech in Town

Over the course of several hours and a few dozen trail miles, I had little to say about...

Security News This Week: Oh Crap, Kohler’s Toilet Cameras Aren’t Really End-to-End Encrypted

An AI image creator startup left its database unsecured, exposing more than a million images and videos its...

Gevi’s Espresso Machine Works Fine, but There Are Better Options at This Price Point

The coffee gadget market has caused a massive proliferation of devices for all tastes, preferences, and budgets, but...

Must read

You might also likeRELATED
Recommended to you