Gemma 4 Tool Calling Troubleshooting (Ollama, vLLM, OpenCode)
A practical troubleshooting guide for Gemma 4 tool calling failures, including invalid JSON, parser drift, and template mismatches across Ollama, vLLM, and OpenCode.
A practical troubleshooting guide for Gemma 4 tool calling failures, including invalid JSON, parser drift, and template mismatches across Ollama, vLLM, and OpenCode.
If your Gemma 4 agent can chat but fails as soon as tools are involved, you are not alone.
The most common failure pattern is:
In practice, most failures are integration mismatches, not one single model bug.
Based on active community reports in Ollama and vLLM, failures cluster in four buckets:
Before changing prompts, pin down where the failure happens.
ollama -v, vllm --version).echo style schema, one string field).If it succeeds without streaming, your issue is usually parser/chunk handling.
Logs like tool call parsing failed or content containing quasi-JSON fragments.
Use direct constraints in system prompt:
When calling tools:
- Return exactly one JSON object
- Do not wrap in markdown
- Do not include explanation text outside JSON
This is basic, but it removes many parser-ambiguity failures.
Tool usage fails in streaming mode with invalid escapes or broken JSON assembly.
Never let UI-level parse success be your only success signal. Always validate against JSON schema before execution.
Agent frameworks add another failure surface: post-processors.
Common mistakes:
Recommended guardrail pipeline:
If step 1 is unavailable in your stack, add it first. Debugging without raw payload is guesswork.
For stable tool calling with Gemma 4, start with:
Then reintroduce complexity gradually.
| Situation | Most likely cause | First move |
|---|---|---|
| Non-streaming works, streaming fails | Chunk assembly/parser | Fix streaming parser path |
| Both fail with malformed arguments | Prompt/template mismatch | Tighten tool contract prompt |
| Works in one runtime, fails in another | Runtime parser implementation differences | Pin version and compare raw output |
| Intermittent failures only on complex args | Escaping/nesting complexity | Flatten schema and retry |
Do not treat tool calling as "model quality only". It is a full-stack protocol problem.
For Gemma 4, the winning approach is:
That order reduces incident count the fastest.