
Gemma 4 Tool Calling Troubleshooting (Ollama, vLLM, OpenCode)

A practical troubleshooting guide for Gemma 4 tool calling failures, including invalid JSON, parser drift, and template mismatches across Ollama, vLLM, and OpenCode.

April 6, 2026 (3 min read)
Tags: Gemma 4, Tool Calling, Ollama, vLLM, OpenCode

If your Gemma 4 agent can chat but fails as soon as tools are involved, you are not alone.

The most common failure pattern is:

  • model returns malformed tool JSON
  • parser rejects the payload
  • the app retries or silently degrades
  • you think "Gemma 4 tool calling is broken"

In practice, most failures are integration mismatches rather than a single model bug.

What Usually Breaks

Based on active community reports for Ollama and vLLM, failures cluster in four buckets:

  1. Streaming parser drift: partial chunks are interpreted as complete JSON.
  2. Template mismatch: chat template expects one tool format, runtime parses another.
  3. Escaping edge cases: quotes/backticks/slashes break strict JSON parsing.
  4. Version regressions: one runtime release fixes one path but breaks another.
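To make the first bucket concrete, here is a minimal Python sketch of parser drift. The chunk boundaries and the argument payload are invented for illustration and were not captured from any runtime; the point is only that fragments which are invalid on their own can form a valid document once concatenated.

```python
import json

# Hypothetical argument fragments, split the way a streaming API might deliver them.
CHUNKS = ['{"city": "Par', 'is", "note": "say \\"hi\\""', '}']


def parse_eagerly(chunks):
    """Buggy pattern: treat every chunk as a complete JSON document."""
    results = []
    for chunk in chunks:
        try:
            results.append(json.loads(chunk))
        except json.JSONDecodeError:
            results.append(None)  # a partial chunk is not valid JSON on its own
    return results


def parse_after_assembly(chunks):
    """Safer pattern: concatenate all fragments first, then parse once."""
    return json.loads("".join(chunks))


eager = parse_eagerly(CHUNKS)              # every fragment fails in isolation
assembled = parse_after_assembly(CHUNKS)   # the joined text parses cleanly
```

If your client behaves like `parse_eagerly`, the model output may be fine and the failure sits entirely in the assembly step.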

Fast Triage Checklist (10 Minutes)

Before changing prompts, pin down where the failure happens.

  1. Verify the model and runtime versions (ollama -v, vllm --version).
  2. Reproduce with a single minimal tool (echo-style schema, one string field).
  3. Disable streaming once and retest.
  4. Capture the raw model output before your app's parser modifies it.
  5. Run that raw output through your tool schema validator directly.

If the tool call succeeds without streaming, the problem is usually in streamed chunk assembly or parsing.
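For steps 2 and 5, a minimal echo tool and a direct argument check might look like the sketch below. The tool is written in the OpenAI-style function format that OpenAI-compatible chat endpoints generally accept; treat the exact shape as an assumption and confirm it against your runtime's documentation. The `echo` tool and the `check_echo_arguments` helper are hypothetical names used only for this example.

```python
import json

# Minimal tool definition: one tool, one required string field.
ECHO_TOOL = {
    "type": "function",
    "function": {
        "name": "echo",
        "description": "Return the given text unchanged.",
        "parameters": {
            "type": "object",
            "properties": {"text": {"type": "string"}},
            "required": ["text"],
            "additionalProperties": False,
        },
    },
}


def check_echo_arguments(raw_arguments: str) -> list[str]:
    """Return the problems found in a raw argument string (empty list means OK)."""
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(args, dict):
        return ["arguments must be a JSON object"]
    problems = []
    if "text" not in args:
        problems.append("missing required field 'text'")
    elif not isinstance(args["text"], str):
        problems.append("'text' must be a string")
    extra = sorted(set(args) - {"text"})
    if extra:
        problems.append(f"unexpected fields: {extra}")
    return problems
```

Feed `check_echo_arguments` the argument string exactly as the runtime returned it. If this tiny case fails, the problem is unlikely to be your real schema.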

Ollama Path: Practical Fixes

Symptom

Logs like "tool call parsing failed", or assistant content that contains quasi-JSON fragments instead of a structured tool call.

Fix Strategy

  1. Update first: Gemma 4 tool parsing behavior changed rapidly across recent releases.
  2. Enforce deterministic output for tool turns:
    • lower temperature
    • keep tool schema concise
    • avoid deeply nested unions until stable
  3. Split system prompts:
    • one short policy prompt
    • one explicit tool contract section
  4. Validate server-side even if the UI reports "tool parsed".

Minimal Contract Pattern

Use direct constraints in the system prompt:

When calling tools:
- Return exactly one JSON object
- Do not wrap in markdown
- Do not include explanation text outside JSON

This is basic, but it removes many parser-ambiguity failures.
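A prompt contract is only useful if you can tell when it was broken. The sketch below checks a raw response against the three rules above using only Python's standard `json` module. The function name and the reason strings are illustrative, not part of any runtime's API.

```python
import json

FENCE = "`" * 3  # markdown code fence marker


def contract_violation(raw: str):
    """Return a short reason if `raw` breaks the contract, otherwise None."""
    text = raw.strip()
    if text.startswith(FENCE):
        return "wrapped in a markdown code fence"
    try:
        # raw_decode parses one JSON value and reports where it ended.
        value, end = json.JSONDecoder().raw_decode(text)
    except json.JSONDecodeError:
        return "does not start with valid JSON"
    if not isinstance(value, dict):
        return "top-level value is not a JSON object"
    if text[end:].strip():
        return "extra text after the JSON object"
    return None
```

Logging the returned reason per request gives a quick picture of which rule the model breaks most often, which helps decide whether to adjust the prompt or the parser.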

vLLM Path: Practical Fixes

Symptom

Tool usage fails in streaming mode with invalid escapes or broken JSON assembly.

Fix Strategy

  1. Temporarily run non-streaming to confirm model output is valid.
  2. If non-streaming works, patch or upgrade your streaming assembly path.
  3. Keep tool args shallow in early deployment.
  4. Add a "repair-or-retry" layer:
    • strict parse
    • safe JSON repair for trivial escaping
    • one deterministic retry
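The repair-or-retry layer in step 4 can be sketched roughly as follows. This is one possible shape, not vLLM functionality: `repair_escapes` only handles stray backslashes, and `regenerate` is a hypothetical callable that you would wire to a single deterministic (for example, temperature 0) re-request.

```python
import json
import re
from typing import Callable, Optional

# Characters that may legally follow a backslash inside a JSON string.
VALID_ESCAPE_CHARS = set('"\\/bfnrtu')


def repair_escapes(raw: str) -> str:
    """Double any backslash that does not start a legal JSON escape sequence."""
    def fix(match: re.Match) -> str:
        follower = match.group(1)
        if follower in VALID_ESCAPE_CHARS:
            return match.group(0)      # legal escape, leave untouched
        return "\\\\" + follower       # stray backslash becomes a literal one
    return re.sub(r"\\(.)", fix, raw, flags=re.DOTALL)


def _try_parse(text: str) -> Optional[dict]:
    """Strict parse first, then a parse of the repaired text."""
    for candidate in (text, repair_escapes(text)):
        try:
            value = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        if isinstance(value, dict):
            return value
    return None


def parse_tool_arguments(raw: str, regenerate: Callable[[], str]) -> dict:
    """Strict parse, trivial repair, then exactly one retry before giving up."""
    parsed = _try_parse(raw)
    if parsed is None:
        parsed = _try_parse(regenerate())  # the single allowed retry
    if parsed is None:
        raise ValueError("tool arguments unparseable after repair and one retry")
    return parsed
```

The repair is deliberately narrow. It cannot tell whether `\n` in a Windows path was meant as a newline, so anything beyond stray backslashes is left to the retry or surfaced as an error.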

Production Rule

Never let UI-level parse success be your only success signal. Always validate against JSON schema before execution.

OpenCode / Agent Framework Integration

Agent frameworks add another failure surface: post-processors.

Common mistakes:

  • normalizing quotes in a way that corrupts JSON
  • mixing "assistant text mode" and "tool mode" in one turn
  • allowing hidden retries that mutate payloads
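The first mistake is easy to reproduce. In the sketch below, `naive_normalize` stands in for a hypothetical post-processor that straightens curly quotes and converts single quotes to double quotes; the payload is invented for the example.

```python
import json

# Valid JSON whose string value contains curly quotes and an apostrophe.
RAW = '{"text": "she said \u201cdon\'t\u201d"}'


def naive_normalize(payload: str) -> str:
    """Hypothetical post-processor: force every quote character to a straight double quote."""
    return (
        payload.replace("\u201c", '"')
        .replace("\u201d", '"')
        .replace("'", '"')
    )


def is_valid_json(payload: str) -> bool:
    """True if the payload parses under strict JSON rules."""
    try:
        json.loads(payload)
    except json.JSONDecodeError:
        return False
    return True


before = is_valid_json(RAW)                    # payload as the model produced it
after = is_valid_json(naive_normalize(RAW))    # payload after "cleanup"
```

The model's output was valid until the cleanup step inserted unescaped double quotes into the string value, which is why the raw snapshot matters.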

Recommended guardrail pipeline:

  1. Raw model payload snapshot
  2. Strict parse
  3. Schema validation
  4. Only then invoke tool

If step 1 is unavailable in your stack, add it first. Debugging without the raw payload is guesswork.
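The four steps map onto a small function like the one below. It is a sketch under assumptions: the payload shape (`name` plus `arguments`), the `tools` and `validators` registries, and `validate_echo` are all hypothetical. A real deployment would likely swap the hand-written validator for a JSON Schema library.

```python
import json
from typing import Any, Callable


def validate_echo(args: Any) -> list[str]:
    """Hypothetical validator for an echo tool with one string field."""
    if not isinstance(args, dict) or set(args) != {"text"}:
        return ["expected exactly one field: 'text'"]
    if not isinstance(args["text"], str):
        return ["'text' must be a string"]
    return []


def run_tool_call(
    raw_payload: str,
    tools: dict[str, Callable[..., Any]],
    validators: dict[str, Callable[[Any], list[str]]],
    snapshots: list[str],
) -> Any:
    snapshots.append(raw_payload)          # 1. raw snapshot, before any processing
    call = json.loads(raw_payload)         # 2. strict parse (raises on malformed JSON)
    if not isinstance(call, dict):
        raise ValueError("tool call must be a JSON object")
    name, args = call.get("name"), call.get("arguments")
    if name not in tools or name not in validators:
        raise ValueError(f"unknown tool: {name!r}")
    problems = validators[name](args)      # 3. schema validation
    if problems:
        raise ValueError(f"invalid arguments for {name}: {problems}")
    return tools[name](**args)             # 4. only then invoke the tool
```

Because the snapshot is taken before parsing, every failure in steps 2 to 4 still leaves the exact model output available for inspection.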

Known Good Operating Profile

For stable tool calling with Gemma 4, start with:

  • deterministic decoding for tool turns
  • one tool per turn (initially)
  • no markdown in tool responses
  • short argument schemas
  • fallback to non-streaming when parser confidence is low

Then reintroduce complexity gradually.
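As a starting point, the profile above could be encoded in a request builder like this sketch. It assumes an OpenAI-compatible chat completions body (`model`, `messages`, `tools`, `temperature`, `stream`); native runtime APIs may place these options elsewhere. Both function names and the `parser_trusted` flag are hypothetical.

```python
from typing import Any, Optional


def build_tool_turn_request(
    model: str,
    messages: list[dict[str, Any]],
    tools: list[dict[str, Any]],
    parser_trusted: bool = True,
) -> dict[str, Any]:
    """Build a conservative request body for a turn that may call a tool."""
    return {
        "model": model,
        "messages": messages,
        "tools": tools,
        "temperature": 0,          # deterministic decoding for tool turns
        "stream": parser_trusted,  # fall back to non-streaming when the parser is suspect
    }


def accept_single_tool_call(tool_calls: list[dict[str, Any]]) -> Optional[dict[str, Any]]:
    """Enforce one tool per turn: return the call, None if absent, or raise."""
    if len(tool_calls) > 1:
        raise ValueError(f"expected at most one tool call, got {len(tool_calls)}")
    return tool_calls[0] if tool_calls else None
```

Relaxing one setting at a time (streaming first, then multiple tools per turn) makes it easier to attribute any new failure to the change that caused it.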

Decision Tree

| Situation | Most likely cause | First move |
| --- | --- | --- |
| Non-streaming works, streaming fails | Chunk assembly/parser | Fix streaming parser path |
| Both fail with malformed arguments | Prompt/template mismatch | Tighten tool contract prompt |
| Works in one runtime, fails in another | Runtime parser implementation differences | Pin version and compare raw output |
| Intermittent failures only on complex args | Escaping/nesting complexity | Flatten schema and retry |

Final Takeaway

Do not treat tool calling as a model-quality problem only. It is a full-stack protocol problem.

For Gemma 4, the winning approach is:

  • stabilize protocol first
  • isolate runtime behavior second
  • optimize prompt style last

That order reduces incident count the fastest.

Sources