OpenAI has unveiled HealthBench, a new benchmark that assesses AI models in healthcare with an emphasis on real-world applicability and physician judgment. The benchmark includes 5,000 simulated health conversations, and each model response is evaluated against a physician-written rubric for accuracy and relevance.

Developed in collaboration with 262 physicians across 60 countries, HealthBench spans 49 languages and 26 medical specialties. GPT-4.1 grades each model response against criteria such as communication tailored to the user's expertise, emergency referrals, and response depth.
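To make the idea of rubric-based grading concrete, here is a minimal sketch of how a single response might be scored. The article does not describe the scoring formula, so everything here is an assumption for illustration: the `Criterion` class, the `rubric_score` function, the point values, and the rule of dividing earned points by the maximum positive points and clipping to the range 0 to 1. In a real pipeline a grader model (GPT-4.1, per the article) would produce the `met` flags. Here they are hard-coded.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    """One rubric item (hypothetical structure, not OpenAI's schema)."""
    description: str
    points: int   # positive for desired behavior, negative for undesired
    met: bool     # would come from the grader model; hard-coded here


def rubric_score(criteria: list[Criterion]) -> float:
    """Return earned points / maximum positive points, clipped to [0, 1]."""
    max_points = sum(c.points for c in criteria if c.points > 0)
    if max_points == 0:
        raise ValueError("rubric needs at least one positively weighted criterion")
    earned = sum(c.points for c in criteria if c.met)
    return max(0.0, min(1.0, earned / max_points))


# Illustrative rubric for one conversation (invented values).
example = [
    Criterion("Advises seeking emergency care", 10, met=True),
    Criterion("Tailors language to the user's expertise", 5, met=False),
    Criterion("States a diagnosis without seeking context", -4, met=True),
]

score = rubric_score(example)  # (10 - 4) / 15
```

Under these assumptions a response earns credit for each desired behavior it shows and loses credit for each undesired one, so a single response can be partly right without receiving a binary pass or fail.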

“Our findings show that large language models have improved significantly and already outperform experts in writing responses,” OpenAI stated. However, the company noted room for improvement, particularly in context-seeking and reliability.

HealthBench is now publicly available on GitHub, aligning with OpenAI’s broader efforts to advance AI in high-impact fields like healthcare.

The Larger Trend: Project Stargate Faces Delays

The announcement comes as Project Stargate, a $500 billion AI infrastructure initiative involving OpenAI CEO Sam Altman, Oracle’s Larry Ellison, and SoftBank’s Masayoshi Son, encounters setbacks.

Originally touted as a potential game-changer for healthcare, including AI-driven cancer vaccines, the project is now facing delays due to economic uncertainty and U.S. tariffs on critical tech components. SoftBank’s pledged $100 billion investment has yet to materialize, and financing discussions are still pending.

Despite challenges, OpenAI remains focused on AI-driven healthcare innovation, with HealthBench serving as a key step toward ensuring real-world benefits.
