OpenAI pushes AI agent capabilities with new developer API


Developers using the Responses API can access the same models that power ChatGPT Search: GPT-4o search and GPT-4o mini search. These models can browse the web to answer questions and cite sources in their responses.
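For readers who want a concrete picture, here is a minimal sketch of what a search-enabled request body might look like. The article does not show any code, so the endpoint URL, the `web_search_preview` tool type, and the `gpt-4o` model identifier are assumptions based on OpenAI's public documentation at launch, and `build_search_request` is a hypothetical helper. The sketch only builds the JSON payload and does not contact the API.

```python
import json

# Assumed endpoint for the Responses API (not stated in the article).
RESPONSES_URL = "https://api.openai.com/v1/responses"


def build_search_request(question: str, model: str = "gpt-4o") -> dict:
    """Build a request body that asks the model to use web search.

    Hypothetical helper: the tool type and default model name are
    assumptions and may differ from what your account exposes.
    """
    return {
        "model": model,
        # Assumed tool identifier that enables web search.
        "tools": [{"type": "web_search_preview"}],
        "input": question,
    }


payload = build_search_request("What did OpenAI announce for developers this week?")
print(json.dumps(payload, indent=2))
```

Sending this body as an authenticated POST to the endpoint would, under those assumptions, return an answer with cited sources; check the current API reference before relying on any of these names.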

That’s notable because OpenAI says the added web search ability dramatically improves the factual accuracy of its AI models. On OpenAI’s SimpleQA benchmark, which scores short factual answers for correctness and so serves as a proxy for confabulation rate, GPT-4o search scored 90 percent and GPT-4o mini search achieved 88 percent. Both substantially outperformed the larger GPT-4.5 model without search, which scored 63 percent.

Despite these improvements, the technology still has significant limitations. Aside from issues with OpenAI’s Computer-Using Agent (CUA) model properly navigating websites, the improved search capability doesn’t completely solve the problem of AI confabulations: GPT-4o search still makes factual mistakes 10 percent of the time on the SimpleQA benchmark.
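The 10 percent figure follows directly from the SimpleQA scores quoted earlier. The short sketch below makes that arithmetic explicit and compares the error rates of the three models. It assumes that every answer not scored as correct counts as a mistake, which is a simplification; the function names are illustrative only.

```python
# SimpleQA accuracy scores in percent, as reported in the article.
SIMPLEQA_ACCURACY = {
    "GPT-4o search": 90,
    "GPT-4o mini search": 88,
    "GPT-4.5 (no search)": 63,
}


def error_rate(accuracy_percent: int) -> int:
    """Percent of questions not answered correctly (simplifying assumption)."""
    return 100 - accuracy_percent


def relative_error_reduction(baseline_accuracy: int, improved_accuracy: int) -> float:
    """Fraction of the baseline's errors that the improved model eliminates."""
    baseline_errors = error_rate(baseline_accuracy)
    improved_errors = error_rate(improved_accuracy)
    return (baseline_errors - improved_errors) / baseline_errors


for model, accuracy in SIMPLEQA_ACCURACY.items():
    print(f"{model}: {error_rate(accuracy)} percent wrong")

reduction = relative_error_reduction(
    SIMPLEQA_ACCURACY["GPT-4.5 (no search)"],
    SIMPLEQA_ACCURACY["GPT-4o search"],
)
print(f"Error reduction vs. GPT-4.5: {reduction:.0%}")
```

Under that assumption, GPT-4.5 without search gets 37 percent of questions wrong, so adding search removes roughly three quarters of the errors while still leaving one answer in ten incorrect.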

Alongside the Responses API, OpenAI released the open source Agents SDK, providing developers free tools to integrate models with internal systems, implement safeguards, and monitor agent activities. This toolkit follows OpenAI’s earlier release of Swarm, a framework for orchestrating multiple agents.

These are still early days in the AI agent field, and things will likely improve rapidly. At the moment, however, the AI agent movement remains vulnerable to unrealistic claims. Earlier this week, users discovered that Chinese startup Butterfly Effect’s Manus AI agent platform failed to deliver on many of its promises, highlighting the persistent gap between promotional claims and practical functionality in this emerging technology category.


