Google’s Gemini AI is getting a chatty new voice mode – The Verge

Gemini won’t mind if you interrupt it.

Illustration: The Verge

Google’s Gemini AI assistant is getting new voice chat capabilities for Gemini Advanced subscribers this year. The feature, called Gemini Live, will enable two-way spoken conversation with the chatbot, smart assistant capabilities, and vision features — a lot like what OpenAI is working on for ChatGPT.

Google says Gemini Live will adapt to users’ speech patterns and offer more succinct, conversational responses than the long-winded text-based replies it usually generates. The feature will offer 10 voice options, and the company says it’ll be able to use smartphone cameras to see and interpret real-time video.

That includes capabilities like those the company demonstrated with its Project Astra multimodal AI features at its I/O developer conference today. In a demo video, Gemini was asked to announce when it saw “something that makes sound” via a phone’s camera. When a speaker sitting on a desk came into view, Gemini piped up with, “I see a speaker,” then, when further prompted, correctly identified the speaker’s upper tweeter.

GIF: Google

Users will also be able to use Gemini Live for digital assistant tasks, such as updating personal calendars with information from, say, a photo of a concert flyer. The company says it can also dig through users’ Gmail accounts to gather travel plan information like flight itineraries, or look up information like restaurants near their hotel.

Google’s Gemini Live feature is clearly intended to serve a similar purpose to OpenAI’s GPT-4o, which that company announced yesterday. That chatbot model will also be able to pull off natural, back-and-forth conversation and can be interrupted, just as Gemini Live can. Also like Gemini Live, GPT-4o’s features will be rolled out over time; ChatGPT Plus subscribers will get to test early alpha versions of the new Voice Mode “in the coming weeks.”

Update May 14th: Added GIF and video from Google I/O.
