Gemini 3 Flash, Google’s new efficiency-focused AI model, is now available

Credit: Google.

Gemini 3 Flash was officially released on December 17th. Google’s new efficiency-oriented model brings the reasoning and understanding capabilities of frontier models to a version optimized for response speed and computational efficiency. Gemini 3 Flash represents an important step: it retains the technological foundations of Gemini 3 Pro while drastically reducing latency and cost, making it suitable both for everyday use and for complex, automated workflows. Let’s take a closer look at its features.

The features of Gemini 3 Flash

After the debut of Gemini 3 Pro and Gemini 3 Deep Think last month, Google has completed the “family” with Gemini 3 Flash. The new model retains advanced reasoning capabilities, multimodal understanding (i.e. the ability to interpret text, images, audio and video in an integrated way) and agentic coding functions, while adding greater efficiency. It is designed to respond faster and consume fewer resources, dynamically adapting the amount of “thinking” it performs to the complexity of the task. In everyday cases it uses on average about 30% fewer tokens than Gemini 2.5 Pro, increasing efficiency without sacrificing quality.

Benchmark results help put these claims into context. Tests like GPQA Diamond and Humanity’s Last Exam assess advanced academic skills, while MMMU Pro measures multimodal understanding. In all these tests, Gemini 3 Flash achieves scores comparable to those of Gemini 3 Pro and surpasses previous models, demonstrating that reducing costs does not necessarily imply a decrease in output quality. Another indicator cited by Google is LMArena’s Elo score, a chess-inspired rating system that ranks models based on user preferences in head-to-head comparisons. In this context, Gemini 3 Flash sits beyond the previous Pareto frontier, an economic concept describing the trade-off limit at which improving one variable, such as speed, necessarily worsens another, such as quality or price.
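For readers unfamiliar with Elo-style ratings such as LMArena’s, the textbook Elo formula below shows how a rating gap translates into an expected win rate in pairwise comparisons. This is a general sketch of the standard Elo model, not LMArena’s exact implementation, whose details may differ:

```python
# Standard Elo expected-score formula: given two ratings, the
# probability that model A is preferred over model B in a single
# pairwise comparison. LMArena uses an Elo-style system; its exact
# implementation may differ from this textbook version.

def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected win probability of A against B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

# A 100-point rating gap corresponds to roughly a 64% win rate.
print(round(expected_score(1400, 1300), 2))  # prints 0.64
```

This is why even modest Elo gaps on a leaderboard reflect a consistent preference for one model over another across many user votes.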

In the graph you can see how Gemini 3 Flash has surpassed the Pareto frontier in terms of performance, cost and speed. Credit: Google.

Speed remains the most obvious strong point of the model. According to data provided by Google, Gemini 3 Flash is approximately three times faster than Gemini 2.5 Pro, with significantly lower costs: $0.50 per million input tokens and $3 per million output tokens. Inference, the process by which an AI model draws conclusions from previously unseen data, has been optimized to support high-frequency streams. It is no coincidence that on SWE-bench Verified, a benchmark that evaluates the ability to solve real programming problems, Gemini 3 Flash achieves results superior even to Gemini 3 Pro, reaching a score of 78%.
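To make the quoted rates concrete, here is a minimal cost-estimate sketch using the prices cited above ($0.50 per million input tokens, $3 per million output tokens); the token counts in the example are illustrative, not measured values:

```python
# Estimate the cost of a single request at the Gemini 3 Flash rates
# quoted in the article. Prices are in USD per one million tokens;
# the example token counts are hypothetical.

INPUT_PRICE_PER_M = 0.50   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 3.00  # USD per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one request."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# Example: 10,000 input tokens and 2,000 output tokens.
print(f"${estimate_cost(10_000, 2_000):.4f}")  # prints $0.0110
```

At these rates, even a fairly large request costs about a cent, which is what makes the model attractive for high-frequency, automated workloads.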

The table shows the scores achieved in various benchmarks by Gemini 3 Flash, compared with other AI models from Google and from competitors. Credit: Google.

Where Gemini 3 Flash is available

The characteristics of Gemini 3 Flash explain the interest of companies and developers in “Big G”’s new efficient model. Platforms like Google AI Studio, Vertex AI and Gemini Enterprise already offer access to it, while companies like JetBrains, Figma and Bridgewater are using it to accelerate design, analysis and automation processes. For “mere mortals”, Gemini 3 Flash is already available in the Gemini app and in the AI Mode of Google Search.