Gemini Live first look: Better than talking to Siri, but worse than I expected
Google launched Gemini Live during its Made by Google event in Mountain View, California on Tuesday. The feature lets you have semi-natural spoken conversations, rather than typed ones, with an AI chatbot powered by Google’s latest large language model. TechCrunch tested it out firsthand.
Gemini Live is Google’s answer to OpenAI’s Advanced Voice Mode, a nearly identical feature for ChatGPT that remains in limited alpha testing. OpenAI beat Google to the punch by demoing the feature first, but Google was first to roll out the finished version.
In my experience, these low-latency verbal exchanges feel far more natural than texting with ChatGPT, or even talking with Siri or Alexa. I found that Gemini Live responded to questions in less than two seconds and picked back up fairly quickly when interrupted. Gemini Live isn’t perfect, but it’s the best way to use your phone hands-free that I’ve seen yet.
How it works
The feature lets you choose from 10 voices before you talk to Gemini Live, whereas OpenAI offers only three. Google worked with voice actors to create each one. I appreciated the variety, and each voice sounded very human.
In one example, a Google product manager verbally asked Gemini Live to find family-friendly wineries near Mountain View with outdoor areas and playgrounds nearby, so kids could potentially come along. That’s a more complex task than anything you’d ask Siri — or Google Search, frankly — but Gemini successfully recommended a location that met the criteria: Cooper-Garrod Vineyards in Saratoga.
That said, Gemini Live seems to have hallucinated something. It mentioned a nearby playground called Henry Elementary School Playground, which it said was “10 minutes away” from that vineyard. There are other playgrounds near Saratoga, but the closest Henry Elementary School is more than a two-hour drive away. (There is a Henry Ford Elementary School in Redwood City, but it’s 30 minutes away.)
Google loves to show off how users can interrupt Gemini Live mid-sentence, and the AI will quickly pivot. The company says this gives users control of the conversation. In practice, the feature doesn’t work perfectly. At times, Google’s product managers and Gemini Live were talking over each other, and the AI didn’t seem to pick up what was said.
Notably, according to product manager Leland Rechis, Google is not allowing Gemini Live to sing or mimic any voices beyond the 10 it offers. The company is doing this to avoid running afoul of copyright law. Furthermore, Rechis said Google is not focused on having Gemini Live understand the emotional intonation in a user’s voice — something OpenAI highlighted during its own demo.
Overall, the feature seems like a great way to dig deeper into a topic, more naturally than you could with a simple Google Search. Google notes that Gemini Live is a step toward Project Astra, the fully multimodal AI model the company debuted at Google I/O. For now, Gemini Live is only capable of voice conversations; however, Google wants to add real-time video understanding down the road.