Gemma 4 Lands to Great Reviews
Punching Above Weight
On release, a few people suggested there was nothing special about Gemma 4. That misses the point: weight-for-weight, Gemma 4 outperforms the current landscape of models, and it ships with mature tuning pipelines. This is the true value of Gemma 4.
Small Weight Models Available
Gemma 4 released with E2B and E4B versions that perform on par with larger-parameter models. These are built specifically for edge and mobile devices.
We are quickly bridging the gap from chatbots everywhere to genuinely LLM-powered apps, and that is the true goal of working with LLMs.
The next generation of mobile apps will be able to use LLMs to augment answers. Whether that's through natural language querying, voice querying, object identification, or causal generation, the landscape should be more than just chatbots, and Gemma 4 can help deliver those solutions.
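To make that concrete, here's a minimal sketch of on-device inference using llama-cpp-python with a quantized GGUF checkpoint. The model filename is a placeholder, not an official artifact name, so swap in whatever quantized Gemma build you actually have locally.

```python
# Minimal sketch: local inference with llama-cpp-python against a quantized
# GGUF checkpoint. The model path below is a placeholder, not an official name.
from llama_cpp import Llama

llm = Llama(
    model_path="./gemma-4-e2b-q4_k_m.gguf",  # hypothetical filename
    n_ctx=4096,        # context window
    n_gpu_layers=-1,   # offload all layers to GPU if one is available
)

response = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "Identify the intent: 'show me flights to Tokyo'"}
    ],
    max_tokens=64,
    temperature=0.2,
)
print(response["choices"][0]["message"]["content"])
```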
Trainable and Extensible
Similar to other Gemma family models (FunctionGemma, EmbeddingGemma, ShieldGemma, etc.), you can train these models and customize them for your specific goal.
A YouTube video from Google explains how quickly you can get started.
Fine tuning Gemma in Google Cloud
There are plenty of tutorials on fine-tuning Gemma models available.
Advocating for fine-tuning and extending base-model functionality is a theme you'll see here on KeySmash.dev in the future.
You can, of course, do similar fine-tuning with models like Qwen, but Google has been ahead of the pack in offering open-weight models with training support for quite some time. The training landscape is mature, and a wide variety of businesses have been using fine-tuned Gemma variants.
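As a rough illustration of what a fine-tune might look like, here's a minimal LoRA sketch using Hugging Face's peft and trl libraries. The model id and the shape of the training data (a JSONL file with a "text" column) are assumptions; check the Hub for the actual checkpoint names before running anything.

```python
# Rough LoRA fine-tuning sketch using Hugging Face peft + trl.
# The model id below is an assumption - verify the real Hub name first.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTTrainer, SFTConfig

MODEL_ID = "google/gemma-4-e2b"  # hypothetical Hub id

# Assumed: train.jsonl with one training example per line in a "text" field.
dataset = load_dataset("json", data_files="train.jsonl", split="train")

peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model=MODEL_ID,
    train_dataset=dataset,
    peft_config=peft_config,
    args=SFTConfig(
        output_dir="gemma4-lora",
        dataset_text_field="text",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
    ),
)
trainer.train()
trainer.save_model("gemma4-lora")
```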
FunctionGemma specifically deserves a mention. The goal was to provide a model you could fine-tune so that a dedicated portion of your workflow decides early whether a tool or intent needs to be called. For custom tooling or pipelines with consistent interfaces, this is a fast way to decide whether a tool is needed at the current step of the workflow.
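To sketch the idea, a routing step could look like the following. The model id and the TOOL/NO_TOOL label scheme are both made up for illustration; in practice you'd fine-tune a small model to emit whatever labels your pipeline expects.

```python
# Sketch of an early tool-routing step: ask a small fine-tuned model to
# classify whether the request needs a tool call before the main workflow runs.
# Model id and the TOOL/NO_TOOL labels are assumptions, not a real interface.
from transformers import pipeline

router = pipeline("text-generation", model="your-org/functiongemma-router")  # hypothetical

def needs_tool(user_message: str) -> bool:
    prompt = (
        "Decide if the request requires calling an external tool.\n"
        "Answer with exactly TOOL or NO_TOOL.\n"
        f"Request: {user_message}\nAnswer:"
    )
    out = router(prompt, max_new_tokens=4, do_sample=False)
    answer = out[0]["generated_text"].split("Answer:")[-1].strip()
    return answer.startswith("TOOL")

if needs_tool("What's the weather in Berlin right now?"):
    print("route to the tool-calling pipeline")
else:
    print("answer directly with the base model")
```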
These kinds of innovations on small, purpose-built models are what make Gemma unique.
Early Critiques
On social media, a few users have reported that the 31B variant doesn't perform much better than other models. They seem to expect the Gemma 4 series to outperform significantly beyond its weight class. While in most cases it does, you are unlikely to see a massive boost in something like coding. The models have limited parameters; efficiency and performance-per-weight are the true goals here. They are not meant to contend directly with frontier models like Anthropic's Opus or even Google's own Gemini Flash 3.1.
My personal early critique is that they should have managed to cram 26B-A4B at Q4 into under 16 GB of RAM. That's the sweet spot for most gaming desktops, and with a GUI running, the current requirement of 15.6 GB simply won't fit. It would have been great to have this model on the average consumer "gaming machine".
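For a back-of-envelope sense of why it's so tight, here's the arithmetic. The bits-per-weight and overhead figures are rough assumptions on my part, not published specs.

```python
# Back-of-envelope RAM estimate for a Q4-quantized 26B model.
# Bits-per-weight and overhead numbers are rough assumptions, not specs.
params = 26e9
bits_per_weight = 4.5          # Q4_K-style quants average a bit over 4 bits
weights_gb = params * bits_per_weight / 8 / 1e9   # ~14.6 GB for weights alone
kv_cache_gb = 1.0              # assumed: modest context length
runtime_overhead_gb = 0.5      # assumed: buffers and scratch space

total = weights_gb + kv_cache_gb + runtime_overhead_gb
print(f"weights: {weights_gb:.1f} GB, total: {total:.1f} GB")
# With the OS and a GUI already resident, a 16 GB machine has no headroom left.
```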

Final Thoughts
The Gemma 4 models are absolutely state-of-the-art, with the best performance per weight available. Users disappointed that they lack the raw coding performance of larger models may be slightly missing the point: the entire suite can be extended through fine-tuning.
Stick around for more discussion about models that can run on the edge and on mobile.