
Gemma 4 Benchmarks with source transparency

This page separates official and community evidence for each score. Values that have not yet been published are explicitly marked as pending rather than backfilled with assumptions.
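As a minimal sketch of that convention, each cell can be modeled as a value plus explicit provenance, so a missing number stays visibly pending instead of defaulting to a guess. The `Score` class and its field names below are hypothetical, not part of any published schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Score:
    """One benchmark cell: a value plus explicit provenance."""
    value: Optional[float]       # None means "pending official publication"
    source: str = "pending"      # "official" | "community" | "pending"
    confidence: str = "unknown"  # "high" | "medium" | "unknown"

# A published cell carries its provenance with it...
aime_gemma4_31b = Score(value=89.2, source="official", confidence="high")
# ...while an unpublished cell stays explicitly pending rather than backfilled.
aime_gemma4_e4b = Score(value=None)
```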

Core benchmark table

Primary rows are sourced from official model documentation where available.

| Benchmark | Description | Gemma 4 31B | 26B A4B | E4B | E2B | Gemma 3 27B |
|---|---|---|---|---|---|---|
| AIME 2026 | Competition math reasoning | 89.2% (Official · high) | Pending | Pending | Pending | 20.8% (Official · high) |
| tau2-bench | Agentic tool-use accuracy | 86.4% (Official · high) | Pending | Pending | Pending | 6.6% (Official · high) |
| Arena AI ELO | General conversation quality | 1452 (Official · high) | Pending | Pending | Pending | Pending |
| OmniDocBench 1.5 | Document OCR edit distance (lower is better) | 0.131 (Community · medium)* | Pending | Pending | Pending | 0.365 (Community · medium)* |

"Pending" means pending official publication.
* Media/community summary; validate against official updates when published.

Cross-generation uplift

AIME 2026: Gemma 3 baseline 20.8% → Gemma 4 89.2% (+328.8% relative uplift)
tau2-bench: Gemma 3 baseline 6.6% → Gemma 4 86.4% (+1209.1% relative uplift)
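The uplift figures are plain relative improvements over the Gemma 3 baseline. A short sketch reproduces them (the function name is illustrative):

```python
def relative_uplift(baseline: float, new: float) -> float:
    """Relative improvement of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100

print(f"AIME 2026:  +{relative_uplift(20.8, 89.2):.1f}%")   # +328.8%
print(f"tau2-bench: +{relative_uplift(6.6, 86.4):.1f}%")    # +1209.1%
```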

Arena ELO competitor context

Competitor rows below are community/media references and should be treated as indicative.

| Model | Size | ELO | Source |
|---|---|---|---|
| Gemma 4 31B | 31B | 1452 | Official: Google DeepMind - Gemma 4 |
| Qwen 3.5 27B | 27B | 1403 | Community: VentureBeat - Gemma 4 coverage |
| DeepSeek-V3.2 | ~600B MoE | 1425 | Community: VentureBeat - Gemma 4 coverage |
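To put the ELO gaps in perspective, the standard Elo expected-score formula converts a rating difference into an expected head-to-head win rate. Arena-style leaderboards only approximate this model, so treat the numbers as rough; a sketch using the ratings above:

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected win probability for A against B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

# A 49-point gap (1452 vs 1403) implies roughly a 57% expected win rate;
# a 27-point gap (1452 vs 1425) implies roughly 54%.
print(f"vs Qwen 3.5 27B:  {expected_score(1452, 1403):.0%}")
print(f"vs DeepSeek-V3.2: {expected_score(1452, 1425):.0%}")
```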