Google launches Gemini, the AI model it hopes will take down GPT-4
Saturday, 09 December, 2023, 19:54
It’s the beginning of a new era of AI at Google, says CEO Sundar Pichai: the Gemini era. Gemini is Google’s latest large language model, which Pichai first teased at the I/O developer conference in June and is now launching to the public. To hear Pichai and Google DeepMind CEO Demis Hassabis describe it, it’s a huge leap forward in an AI model that will ultimately affect practically all of Google’s products. “One of the powerful things about this moment,” Pichai says, “is you can work on one underlying technology and make it better and it immediately flows across our products.”

Gemini is more than a single AI model. There’s a lighter version called Gemini Nano that is meant to run natively and offline on Android devices. There’s a beefier version called Gemini Pro that will soon power lots of Google AI services and is the backbone of Bard starting today. And there’s an even more capable model called Gemini Ultra, the most powerful LLM Google has yet created, which seems to be designed mostly for data centers and enterprise applications.

Google is launching the model in a few ways right now: Bard is now powered by Gemini Pro, and Pixel 8 Pro users will get a few new features thanks to Gemini Nano. (Gemini Ultra is coming next year.) Developers and enterprise customers will be able to access Gemini Pro through Google Generative AI Studio or Vertex AI in Google Cloud starting on December 13th. Gemini is only available in English for now, with other languages evidently coming soon. But Pichai says the model will eventually be integrated into Google’s search engine, its ad products, the Chrome browser, and more, all over the world. It is the future of Google, and it’s here not a moment too soon.

OpenAI launched ChatGPT a year and a week ago, and the company and product immediately became the biggest things in AI.
Now, Google — the company that created much of the foundational technology behind the current AI boom, that has called itself an “AI-first” organization for nearly a decade, and that was clearly and embarrassingly caught off guard by how good ChatGPT was and how fast OpenAI’s tech has taken over the industry — is finally ready to fight back. So, let’s just get to the important question, shall we? OpenAI’s GPT-4 versus Google’s Gemini: ready, go.

This has very clearly been on Google’s mind for a while. “We’ve done a very thorough analysis of the systems side by side, and the benchmarking,” Hassabis says. Google ran 32 well-established benchmarks comparing the two models, from broad overall tests like the Multi-task Language Understanding benchmark to one that compares the two models’ ability to generate Python code. “I think we’re substantially ahead on 30 out of 32” of those benchmarks, Hassabis says, with a bit of a smile on his face. “Some of them are very narrow. Some of them are larger.”

In those benchmarks (which really are mostly very close), Gemini’s clearest advantage comes from its ability to understand and interact with video and audio. This is very much by design: multimodality has been part of the Gemini plan from the beginning. Google hasn’t trained separate models for images and voice, the way OpenAI created DALL-E and Whisper; it built one multisensory model from the beginning. “We’ve always been interested in very, very general systems,” Hassabis says. He’s especially interested in how to mix all of those modes — to collect as much data as possible from any number of inputs and senses and then give responses with just as much variety.

Right now, Gemini’s most basic models are text in and text out, but more powerful models like Gemini Ultra can work with images, video, and audio. And “it’s going to get even more general than that,” Hassabis says.
“There’s still things like action, and touch — more like robotics-type things.” Over time, he says, Gemini will get more senses, become more aware, and become more accurate and grounded in the process. “These models just sort of understand better about the world around them.” These models still hallucinate, of course, and they still have biases and other problems. But the more they know, Hassabis says, the better they’ll get.