OpenAI pushes AI agent capabilities with new developer API

Developers using the Responses API can access the same models that power ChatGPT Search: GPT-4o search and GPT-4o mini search. These models can browse the web to answer questions and cite sources in their responses.
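For readers who want a sense of what this looks like in practice, here is a minimal sketch of a web-search request through the Responses API using OpenAI's Python library. It assumes the `openai` package is installed, an `OPENAI_API_KEY` environment variable is set, and that the `web_search_preview` tool type and the `gpt-4o` model name are still current; both come from OpenAI's launch-era documentation and may have changed. The helper names `build_search_request` and `ask_with_search` are hypothetical, chosen only for illustration.

```python
def build_search_request(question, model="gpt-4o"):
    """Build keyword arguments for a Responses API call with web search enabled.

    The tool type "web_search_preview" is an assumption based on OpenAI's
    launch documentation; check the current docs before relying on it.
    """
    return {
        "model": model,
        "tools": [{"type": "web_search_preview"}],
        "input": question,
    }


def ask_with_search(question):
    """Send the question to the Responses API and return the answer text."""
    # Imported lazily so the request builder above works without the package.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.responses.create(**build_search_request(question))
    # output_text is the SDK's convenience accessor for the combined text output.
    return response.output_text
```

Calling `ask_with_search("What did OpenAI announce for developers this week?")` would, under those assumptions, return an answer drawn from live web results. Source citations are attached to the response as annotations rather than appearing in the returned string.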

That’s notable because OpenAI says the added web search ability dramatically improves the factual accuracy of its AI models. On OpenAI’s SimpleQA benchmark, which measures how accurately models answer short fact-seeking questions and so serves as a gauge of confabulation, GPT-4o search scored 90 percent and GPT-4o mini search scored 88 percent. Both substantially outperformed the larger GPT-4.5 model without search, which scored 63 percent.

Despite these improvements, the technology still has significant limitations. Aside from issues with OpenAI’s Computer-Using Agent (CUA) model properly navigating websites, the improved search capability doesn’t completely solve the problem of AI confabulations: by OpenAI’s own SimpleQA numbers, GPT-4o search still makes factual mistakes 10 percent of the time.

Alongside the Responses API, OpenAI released the open source Agents SDK, providing developers free tools to integrate models with internal systems, implement safeguards, and monitor agent activities. This toolkit follows OpenAI’s earlier release of Swarm, a framework for orchestrating multiple agents.

These are still early days in the AI agent field, and things will likely improve rapidly. At the moment, however, the AI agent movement remains vulnerable to unrealistic claims. That was demonstrated earlier this week when users discovered that Chinese startup Butterfly Effect’s Manus AI agent platform failed to deliver on many of its promises, highlighting the persistent gap between promotional claims and practical functionality in this emerging technology category.


