Benchmark Reference

Gemma 4 Benchmarks
with source transparency

This page separates release-day ranking claims from the latest available leaderboard snapshot so ranking statements stay date-scoped and auditable.

Core benchmark table

Primary rows are sourced from official model documentation where available.

| Benchmark | Description | Gemma 4 31B | 26B A4B | E4B | E2B | Gemma 3 27B |
| --- | --- | --- | --- | --- | --- | --- |
| AIME 2026 | Competition math reasoning | 89.2% (Official · confidence high) | 88.3% (Official · confidence high) [1] | Pending official publication | Pending official publication | 20.8% (Official · confidence high) |
| tau2-bench | Agentic tool-use accuracy | 86.4% (Official · confidence high) | 68.2% (Official · confidence high) [2] | Pending official publication | Pending official publication | 6.6% (Official · confidence high) |
| Arena AI ELO | Text arena conversation quality | 1452 (release claim; Official · confidence high) [3] | 1441 (release claim; Official · confidence high) [4] | Pending official publication | Pending official publication | Pending official publication |
| OmniDocBench 1.5 | Document OCR edit distance (lower is better) | 0.131 (Official · confidence high) | 0.149 (Official · confidence high) | Pending official publication | Pending official publication | 0.365 (Official · confidence high) |

Notes:

[1] Published by Hugging Face with a Google-provided benchmark table.
[2] Value from the HF benchmark table (Tau2 average over 3).
[3] Google launch post (Apr 2, 2026) states 31B is the #3 open model.
[4] HF launch blog states the 26B A4B estimated arena score is 1441.

Cross-generation uplift

- AIME 2026: Gemma 3 baseline 20.8% → Gemma 4 89.2% (+328.8% relative uplift)
- tau2-bench: Gemma 3 baseline 6.6% → Gemma 4 86.4% (+1209.1% relative uplift)
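The uplift figures above are plain relative change from the Gemma 3 baseline to the Gemma 4 score. A minimal Python sketch of the calculation (the `relative_uplift` helper is illustrative, not part of any published tooling):

```python
def relative_uplift(baseline: float, new: float) -> float:
    """Percent change from baseline to new score, rounded to one decimal place."""
    return round((new - baseline) / baseline * 100, 1)

# AIME 2026: Gemma 3 27B baseline 20.8% -> Gemma 4 31B 89.2%
print(relative_uplift(20.8, 89.2))  # 328.8

# tau2-bench: Gemma 3 27B baseline 6.6% -> Gemma 4 31B 86.4%
print(relative_uplift(6.6, 86.4))   # 1209.1
```

Note that relative uplift from a small baseline inflates quickly; the tau2-bench baseline of 6.6% is why that row shows a four-digit percentage.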

Arena ranking context

We keep two timelines visible: launch-day claims (Apr 2, 2026 post, data as of Apr 1) and a later leaderboard snapshot to avoid stale absolute ranking claims.

- Gemma 4 31B Dense: ELO 1452, #3 open model (snapshot 2026-04-01 · checked 2026-04-11)
- Gemma 4 26B MoE: ELO 1441, #6 open model (snapshot 2026-04-01 · checked 2026-04-11)

Source for both: Google Keyword, "Gemma 4: Byte for byte, the most capable open models"
| Model | Size | Arena AI ELO | Global Rank | Open Rank | License | Context |
| --- | --- | --- | --- | --- | --- | --- |
| GLM-5.1 | n/a | 1458.51 | #26 overall | #1 open | Open (commercial limits vary by provider) | ~203K |
| GLM-5 | n/a | 1457.14 | #27 overall | #2 open | Open (commercial limits vary by provider) | ~200K |
| Kimi 2.5 | n/a | 1452.60 | #28 overall | #3 open | Open (commercial limits vary by provider) | ~128K |
| Gemma 4 31B Dense | 31B dense | 1451.16 | #29 overall | #4 open | Apache 2.0 | 256K |
| Qwen 3.5 32B | 32B | 1445.18 | #34 overall | #5 open | Apache 2.0 | ~128K |
| GLM-4.7 | n/a | 1442.65 | #37 overall | #6 open | Open (commercial limits vary by provider) | n/a |
| Gemma 4 26B MoE | 26B MoE (4B active) | 1437.89 | #43 overall | #7 open | Apache 2.0 | 256K |

Latest snapshot date in this table: 2026-04-07 · checked 2026-04-11

Source: Hugging Face - lmarena-ai/leaderboard-dataset (text_style_control/latest)

Rankings can drift between snapshots. Keep date qualifiers in any homepage, model-page, or blog claim.
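One way to keep those date qualifiers honest is to flag a snapshot that has aged past a threshold before reusing its ranking in new copy. A small stdlib-only sketch (the 14-day threshold and the `is_stale` helper are assumptions for illustration, not an existing check in any pipeline):

```python
from datetime import date

def is_stale(snapshot: date, checked: date, max_age_days: int = 14) -> bool:
    """True if the leaderboard snapshot is older than max_age_days at check time."""
    return (checked - snapshot).days > max_age_days

# Dates from the table above: snapshot 2026-04-07, checked 2026-04-11
print(is_stale(date(2026, 4, 7), date(2026, 4, 11)))   # False (4 days old)

# The same snapshot rechecked much later would need a refresh
print(is_stale(date(2026, 4, 7), date(2026, 4, 30)))   # True (23 days old)
```

A check like this could gate publication: if `is_stale` returns True, re-pull the leaderboard dataset before repeating any absolute rank.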

Competitive watch: gemma4.wiki has started publishing benchmark-oriented pages. We are tracking this as a content velocity signal.