fix: preserve tool_calls/tool_responses in VLM message formatting
_format_messages_for_vlm_template runs every message through mlx-vlm's
get_message_json(), which only returns {role, content} — stripping
tool_calls, tool_call_id, and tool_responses fields. This makes tool
results invisible to the chat template, causing VLM models (e.g.
Gemma 3) to loop, retrying tool calls on text-only turns.
Pass role:tool messages and assistant messages carrying tool_calls or
tool_responses through verbatim. Only user/system/plain-assistant
messages need VLM image-token formatting.
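
A minimal sketch of the pass-through logic described above. The names
`format_messages_for_vlm_template` and `get_message_json` follow this
commit's description; the `get_message_json` stub here only mimics the
role/content-stripping behavior attributed to mlx-vlm, it is not the
real implementation.

```python
def get_message_json(message: dict) -> dict:
    # Stand-in for mlx-vlm's formatter, which (per this commit) returns
    # only {role, content} and drops tool-related fields.
    return {"role": message["role"], "content": message.get("content", "")}


def format_messages_for_vlm_template(messages: list[dict]) -> list[dict]:
    formatted = []
    for msg in messages:
        # Pass role:tool messages, and assistant messages carrying
        # tool_calls or tool_responses, through verbatim so the chat
        # template still sees them.
        if (msg.get("role") == "tool"
                or msg.get("tool_calls")
                or msg.get("tool_responses")):
            formatted.append(msg)
        else:
            # Only user/system/plain-assistant messages need the VLM
            # image-token formatting.
            formatted.append(get_message_json(msg))
    return formatted
```

With this split, a `tool_call_id` on a tool-result message survives
formatting instead of being stripped to bare role/content.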
Fixes #788
Craig Tollifson committed
92aab5c3d6176fec5f5a76efe66cddc44c5f7193
Parent: a045b6a