Google presents Gemini 3, the AI model that continues its challenge to ChatGPT: the features

Google has officially released Gemini 3, an update that opens a new chapter in the race for supremacy in artificial intelligence, positioning itself as a direct response to the recent launches from OpenAI and Anthropic. It is the most sophisticated model Mountain View has built to date, designed not only to chat with the user in text but to act as “a real thought partner”, to borrow the expression from Google’s official press release. The main novelty is that the release comes in two variants: Gemini 3 Pro, available now and optimized for multimodal understanding and speed, and Gemini 3 Deep Think, coming shortly, a variant with deeper reasoning capabilities for solving complex scientific and mathematical problems. Let’s take a closer look at the features of Gemini 3.

The features of Gemini 3

On the technical side, Gemini 3 Pro represents a generational leap over version 2.5, outperforming it on every major benchmark, according to Google. The model reached the top of the LMArena leaderboard with an Elo score of 1501 (a comparative rating system based on human preferences), but its results on demanding academic tests are even more striking. On Humanity’s Last Exam, a benchmark designed to test expert-level reasoning, the model scored 37.5% without the aid of external tools, surpassing the previous record holders. In more concrete terms, Google says this means the new model can handle complex nuances, dropping the clichéd, flattery-filled responses typical of earlier chatbots in favor of a more direct, factual and, when necessary, critical analysis. Its multimodal capabilities have also been refined: it does not just read text but can also process video, audio and images with greater precision, as shown by its 87.6% score on Video-MMMU.

For those who need even higher performance, there is Gemini 3’s Deep Think mode, which will be available to Ultra plan subscribers after further safety testing. Designed to “think” before responding, this mode posted strong results on several benchmarks. On ARC-AGI-2, which evaluates the ability to solve problems the model has never seen before, it reaches 45.1%, a result that suggests a growing capacity for abstraction and generalization, skills closely associated with human deductive reasoning. Imagine giving the model entire video lectures or complex academic papers: according to Google, it will not simply summarize them but can generate data visualizations, interactive flashcards or personalized study plans, acting as a kind of dedicated university tutor.

Google also put the spotlight on vibe coding and agentic coding, an approach in which software is developed from intent and natural-language descriptions rather than rigid syntax, with the AI translating the idea into working code. To support this vision, Google launched Google Antigravity, an agentic development platform that the company describes in these terms:

Google Antigravity transforms AI assistance from a tool in a developer’s toolkit to an active partner. While the heart of Google Antigravity is a familiar AI IDE experience, its agents have been elevated to a dedicated interface and given direct access to the editor, terminal, and browser. Now agents can autonomously plan and execute complex, end-to-end software tasks on your behalf, while validating their own code.

These agentic capabilities, meaning the AI’s ability to act as an autonomous entity pursuing a goal, extend well beyond programming. Thanks to better long-term planning, measured by the Vending-Bench 2 benchmark (a complex resource-management simulation), Google says Gemini 3 can handle complex everyday tasks. To give a simple example, it will be possible to delegate the management of your Gmail inbox to it, asking it not only to read messages but to organize and reply to them, or to plan complex travel itineraries by cross-referencing data from different sources. The goal is to shift the interaction from typing a prompt to delegating an entire task.

The security and reliability of Google’s “smartest model”

One aspect Google strongly emphasized is the security and reliability of the model. In a landscape where AI hallucinations remain a problem, Gemini 3 showed progress on the SimpleQA Verified test, reaching a factual accuracy of 72.1%. The model was also trained to better resist “prompt injection”, a technique used by cybercriminals to trick an AI into performing unintended actions (an increasingly insidious type of cyberattack), and to avoid sycophancy, the tendency of AI to confirm users’ opinions even when they are wrong. Since all the figures cited here were reported by Google itself, the new Gemini 3 will need to be tested in the field to see whether it really delivers the improvements described, which, at least on paper, look quite promising.