Anthropic unveils Claude 3, surpassing GPT-4 and Gemini Ultra in benchmark tests – VentureBeat

Anthropic, a leading artificial intelligence startup, unveiled its Claude 3 series of AI models today, designed to meet the diverse needs of enterprise customers with a balance of intelligence, speed and cost efficiency. The lineup includes three models: Opus, Sonnet and the upcoming Haiku.

The star of the lineup is Opus, which Anthropic claims is more capable than any other openly available AI system on the market, even outperforming leading models from rivals OpenAI and Google.

“Opus is capable of the widest range of tasks and performs them exceptionally well,” said Anthropic co-founder and CEO Dario Amodei in an interview with VentureBeat. 

Amodei explained that Opus outperforms top AI models like GPT-4, GPT-3.5 and Gemini Ultra on a wide range of benchmarks. This includes topping the leaderboard on academic benchmarks like GSM8K for mathematical reasoning and MMLU for expert-level knowledge.

“It seems to outperform everyone and get scores that we haven’t seen before on some tasks,” Amodei said.

[Image. Credit: Anthropic]

While companies like Anthropic and Google have not disclosed the full parameters of their leading models, the reported benchmark results from both companies imply Opus either matches or surpasses major alternatives like GPT-4 and Gemini in core capabilities.

This, at least on paper, establishes a new high-water mark for commercially available conversational AI.

Engineered for complex tasks requiring advanced reasoning, Opus stands out in Anthropic’s lineup for its superior performance.

Mid-range, speedy options are available

Sonnet, the mid-range model, offers businesses a more cost-effective solution for routine data analysis and knowledge work, maintaining high performance without the premium price tag of the flagship model.

Meanwhile, Haiku is designed to be swift and economical, suited for applications such as consumer-facing chatbots, where responsiveness and cost are crucial factors.

Amodei told VentureBeat he expects Haiku to launch publicly in a matter of “weeks, not months.”

[Image. Credit: Anthropic]

New visual capabilities unlock new use cases

Each of the models unveiled today supports image input, a feature in high demand, especially for applications like text recognition in images.

“We haven’t focused as much on output modalities, because there’s less demand for that on the enterprise side,” Anthropic president and co-founder Daniela Amodei told VentureBeat, highlighting the company’s strategic focus on the most sought-after features by businesses.

In addition, Claude 3 models demonstrate sophisticated computer vision abilities on par with other state-of-the-art models. This new modality opens up use cases where enterprises need to extract information from images, documents, charts and diagrams.

“A lot of [customer] data is either highly unstructured, or in some sort of visual format,” explained Daniela. “Just the process of having to manually copy that information to even be able to have it interact with a generative AI tool is quite cumbersome.”

Fields like legal services, financial analysis, logistics and quality assurance could benefit from AI systems that understand real-world visuals and text.

Walking the tightrope of bias in AI

Anthropic’s announcement comes on the heels of controversy surrounding Google’s new chatbot Gemini, which highlighted the difficulties tech companies face in releasing models that avoid perpetuating social bias.

Last week, people found that prompting Gemini to generate historical images resulted in depictions that appeared to overcorrect racial portrayals. For example, asking for pictures of Vikings or Nazi soldiers produced images of racially diverse groups that are unlikely to reflect historical reality.

Google responded by disabling Gemini’s image generation capabilities and issuing an apology, saying it had “missed the mark” in trying to increase diversity. However, experts say the situation illustrates the constant balancing act around bias in AI.

Constitutional AI helps but isn’t perfect

Dario Amodei emphasized in his interview with VentureBeat the difficulty of steering AI models, calling it an "inexact science." He said the company has teams dedicated to assessing and mitigating various risks from its models.

“Our hypothesis is that being at the frontier of AI development is the most effective way to steer the trajectory of AI development towards a positive outcome for society,” said Dario.

However, Daniela Amodei acknowledged that perfectly bias-free AI is likely unattainable with current methods.

“It’s almost impossible to create a perfectly neutral, generative AI tool, I think, both technically, but also because not everybody even agrees on what neutral is,” she said.

Part of Anthropic’s strategy is an approach called Constitutional AI, where models are aligned to follow principles defined in a “constitution.” But Dario Amodei admits even this technique isn’t perfect.

“We aim for models to be fair and ideologically and politically neutral, [but] you know, we haven’t got it perfectly,” he said. “I don’t think, you know, anyone has got it perfectly.”

Nonetheless, Dario believes Anthropic's constitution of widely agreed-upon values helps safeguard against skewing models toward any partisan agenda, in contrast to the accusations facing Gemini.

“Our goal is not to promote any particular political or ideological viewpoint,” he said. “We want our models to be suitable for everyone.”