Google Launches Gemini, Its Largest and Most Capable AI Model
The pace of AI innovation, especially in generative AI (GenAI), is only accelerating as businesses strive to find new ways to harness the power of this rapidly evolving technology. The year 2023 could go down in the tech annals as the year of GenAI.
The AI wars just got a lot more intense this week as Google officially launched its much-awaited Gemini 1.0. According to Google, Gemini is its most capable and flexible AI model yet, able to run efficiently on everything from mobile devices to data centers. Its capabilities are intended to help developers and enterprise customers build and scale AI applications.
Gemini 1.0 is available in three sizes: Gemini Ultra, Gemini Pro, and Gemini Nano. Gemini Ultra is the largest and most capable model, designed for highly complex tasks such as advanced coding. Gemini Pro is best suited for scaling across a wide range of tasks, while Gemini Nano is designed for on-device tasks.
According to a note from Google and Alphabet CEO Sundar Pichai, “This new era of [Gemini] models represents one of the biggest science and engineering efforts we’ve undertaken as a company. I’m genuinely excited for what’s ahead, and for the opportunities Gemini will unlock for people everywhere.”
While developers and enterprises have already made astounding advances in GenAI, there is much more potential to unlock. Commenting on Gemini, Pichai added that the momentum has been “incredible” and that “we’re only beginning to scratch the surface of what’s possible.”
The launch of Gemini comes just a week after Amazon launched Amazon Q, an AI assistant designed to help customers automate a range of tasks. Google has been exploring new ways to harness the power of GenAI, as is evident in its decision earlier this year to merge DeepMind, which it acquired in 2014, with its Google Brain team to form Google DeepMind.
Multimodal models are typically built by training separate components for different modalities and then stitching them together. Gemini, by contrast, is designed to be natively multimodal: it is pre-trained from the start on different modalities and then fine-tuned with additional multimodal data to further refine its effectiveness. This allows it to work seamlessly with different types of information, including text, code, audio, images, and video.
Google has tested Gemini's capabilities against other leading large language models (LLMs). Google claims that Gemini Ultra exceeded state-of-the-art results on 30 of the 32 widely used academic benchmarks in LLM research, and that it is the first model to outperform human experts on the Massive Multitask Language Understanding (MMLU) benchmark, with a score of 90.0%.
Gemini also offers advanced coding capabilities in some of the most widely used programming languages, including Java, Python, and C++. Using a specialized version of Gemini, Google created a more advanced code generation system, AlphaCode 2, which significantly outperforms the original AlphaCode.
Google is also aware of the potential risks of advanced AI, including cyber-offense, autonomy, and data bias. To help identify and mitigate these risks, Google says it has conducted its most comprehensive safety evaluations of any AI model to date, along with novel research into potential risk areas. It has also added dedicated safety classifiers to filter out content that doesn't meet Google’s safety standards and policies.
Gemini Nano is being incorporated into the Google Pixel 8 Pro smartphone, and Gemini Pro into Google Bard, starting immediately. Google expects Gemini to make Bard more intuitive and better at tasks that involve planning. In the coming months, Gemini will be added to other Google products, including Chrome, Search, and Ads.
OpenAI and Google’s long-time industry rival Microsoft have held a dominant position in the world of GenAI, but the arrival of Gemini is set to up the ante in an AI race that has been escalating over the past year. The big players in this space are expected to invest heavily in improving their AI solutions, making the coming year an exciting time for the industry.