
Gemma 4 Benchmarks with source transparency

This page separates official and community evidence for each score. Values that have not yet been published are explicitly marked as pending rather than backfilled with assumptions.
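As a minimal sketch of that convention, each cell can be modeled as a value plus explicit provenance, so a missing number stays visibly pending instead of defaulting to a guess. The `Score` class and its field names below are hypothetical, not part of any published schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Score:
    """One benchmark cell: a value plus explicit provenance."""
    value: Optional[float]       # None means "pending official publication"
    source: str = "pending"      # "official" | "community" | "pending"
    confidence: str = "unknown"  # "high" | "medium" | "unknown"

# A published cell carries its provenance with it...
aime_gemma4_31b = Score(value=89.2, source="official", confidence="high")
# ...while an unpublished cell stays explicitly pending rather than backfilled.
aime_gemma4_e4b = Score(value=None)
```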

Core benchmark table

Primary rows are sourced from official model documentation where available.

| Benchmark | Description | Gemma 4 31B | 26B A4B | E4B | E2B | Gemma 3 27B |
|---|---|---|---|---|---|---|
| AIME 2026 | Competition math reasoning | 89.2% (Official · high) | Pending | Pending | Pending | 20.8% (Official · high) |
| tau2-bench | Agentic tool-use accuracy | 86.4% (Official · high) | Pending | Pending | Pending | 6.6% (Official · high) |
| Arena AI ELO | General conversation quality | 1452 (Official · high) | Pending | Pending | Pending | Pending |
| OmniDocBench 1.5 | Document OCR edit distance (lower is better) | 0.131 (Community · medium)* | Pending | Pending | Pending | 0.365 (Community · medium)* |

"Pending" means pending official publication.
* Media/community summary; validate against official updates when published.

Cross-generation uplift

AIME 2026: Gemma 3 baseline 20.8% → Gemma 4 89.2% (+328.8% relative uplift)
tau2-bench: Gemma 3 baseline 6.6% → Gemma 4 86.4% (+1209.1% relative uplift)
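The uplift figures are plain relative improvements over the Gemma 3 baseline. A short sketch reproduces them (the function name is illustrative):

```python
def relative_uplift(baseline: float, new: float) -> float:
    """Relative improvement of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100

print(f"AIME 2026:  +{relative_uplift(20.8, 89.2):.1f}%")   # +328.8%
print(f"tau2-bench: +{relative_uplift(6.6, 86.4):.1f}%")    # +1209.1%
```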

Arena ELO competitor context

Competitor rows below are community/media references and should be treated as indicative.

| Model | Size | ELO | Source |
|---|---|---|---|
| Gemma 4 31B | 31B | 1452 | Official: Google DeepMind - Gemma 4 |
| Qwen 3.5 27B | 27B | 1403 | Community: VentureBeat - Gemma 4 coverage |
| DeepSeek-V3.2 | ~600B MoE | 1425 | Community: VentureBeat - Gemma 4 coverage |
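To put the ELO gaps in perspective, the standard Elo expected-score formula converts a rating difference into an expected head-to-head win rate. Arena-style leaderboards only approximate this model, so treat the numbers as rough; a sketch using the ratings above:

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected win probability for A against B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

# A 49-point gap (1452 vs 1403) implies roughly a 57% expected win rate;
# a 27-point gap (1452 vs 1425) implies roughly 54%.
print(f"vs Qwen 3.5 27B:  {expected_score(1452, 1403):.0%}")
print(f"vs DeepSeek-V3.2: {expected_score(1452, 1425):.0%}")
```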