Tool Calling¶
"Tool calling" is the trick that turns a chat model into something that can act on the world — look up the weather, move a camera, trigger a quest. In Tryll, tool calling is explicitly detection-only: the server tells your client that a tool should be called and with what arguments, and your client decides whether to actually run it. This page explains why that split exists, how small local models are coaxed into producing tool calls, and where the edges are.
The shape of a tool call in Tryll¶
```mermaid
sequenceDiagram
    participant C as Client
    participant S as Server
    participant TC as ToolCall node
    participant M as Model
    C->>S: CreateAgent(graph with ToolCall node + tools[])
    C->>S: SendMessage("turn on the porch light")
    S->>TC: Execute
    TC->>M: Generate (non-streaming) with tool-schema prompt
    M-->>TC: {"tool": "set_light", "args": {"name": "porch", "on": true}}
    TC-->>S: parsed call + ToolCallNotification (if notify_client)
    S-->>C: ToolCallNotification{tool_name, arguments_json}
    TC->>TC: record the call on the current turn
    TC-->>S: exit "tool_called"
    Note over S,C: graph continues — typically to END
```
Three things to notice:
- The model never talks to tools directly. It only produces text. The node parses that text for a tool-call pattern.
- The server never executes the tool. It fires a notification to the client and moves on. What to run, how to run it, and whether to feed the result back into the conversation is your call.
- A failed parse is not an error. If no tool pattern is found, the node takes the `no_tool_called` route. That is perfectly normal control flow — branch the graph on it.
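Since the server only notifies and never executes, the client ends up owning dispatch. A minimal sketch of that client-side half, assuming a hypothetical handler table (the `ToolCallNotification` fields `tool_name` and `arguments_json` come from the wire frame above; everything else here is illustrative, not Tryll API):

```python
import json

# Hypothetical client-side dispatcher: Tryll only *detects* the call;
# whether and how to run it is entirely the client's decision.
ALLOWED_TOOLS = {
    "set_light": lambda args: f"light {args['name']} -> {args['on']}",
}

def on_tool_call_notification(tool_name: str, arguments_json: str):
    # The server never ran anything; we decide whether to act.
    handler = ALLOWED_TOOLS.get(tool_name)
    if handler is None:
        return None                       # model invented a tool: ignore or log
    args = json.loads(arguments_json)     # untrusted input — validate before use
    return handler(args)
```

The allow-list lookup doubles as the safety check: a hallucinated tool name simply falls through to `None` instead of executing anything.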
Why detection-only?¶
Small local models in 2025 are not uniformly good at tool calls. Different model families were fine-tuned on different prompt shapes and different output grammars. Crucially, none of them know about your tools.
Running the tool server-side would force Tryll to:
- sandbox arbitrary code,
- make network / filesystem policy decisions for your app, and
- embed a universal tool-dispatcher that can never match the specifics of a game engine or a desktop app.
Detection-only punts all of that back to the client, where the code to "turn on the porch light" already lives. The server's job is to reliably extract `{tool_name, arguments}` from the model — which is the hard part in small models anyway.
It also means tool calls compose naturally with the rest of Tryll.
A graph can put a guardrail in front of a `ToolCall` node, a `CannedResponse` on the rejection path, and so on. Nothing about the graph structure is special because tools are in it.
The four format families¶
Different model families were fine-tuned on different "tool grammars". Tryll ships four built-in tool-call formats and picks one per node:
| Format | Who it fits | How the model emits a call |
|---|---|---|
| `chatml` | Qwen 2.x, Phi-3, ChatML-finetuned Mistral | `<tool_call>…json…</tool_call>` block |
| `llama3` | Meta Llama-3 / 3.1 / 3.2 Instruct | bare JSON with `name` and `parameters` |
| `mistral` | Mistral / Mixtral instruct | `[TOOL_CALLS][…json array…]` |
| `generic` | Anything else | bare `{"tool": "...", "args": {...}}` object |
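For concreteness, here is roughly how the same call might look under each grammar. These shapes follow the table above; the exact inner key names for the `chatml` and `mistral` rows are illustrative, not guaranteed:

```
chatml:   <tool_call>{"name": "set_light", "arguments": {"name": "porch", "on": true}}</tool_call>
llama3:   {"name": "set_light", "parameters": {"name": "porch", "on": true}}
mistral:  [TOOL_CALLS][{"name": "set_light", "arguments": {"name": "porch", "on": true}}]
generic:  {"tool": "set_light", "args": {"name": "porch", "on": true}}
```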
The format controls three things together, which is why they are bundled: (1) where the tool schema goes in the prompt (system message vs. user preamble), (2) how each tool definition is rendered, and (3) what pattern to look for in the output.
Format resolution order:

```
node param "tool_call_format"
  ↓ empty?
models.json "tool_call_format" on this model
  ↓ empty?
"chatml" as a safe default
```
Why "bundled"? Because mixing breaks¶
A Llama-3 model fine-tuned to emit bare JSON can be coaxed into emitting `<tool_call>` blocks, but reliability drops and the arguments often go wrong. Matching format to model family is the single biggest lever on tool-call reliability. When in doubt: `chatml` for most models, `llama3` for Meta Instruct models. See the model family → format table in the ToolCall node reference.
What the node actually does¶
On each turn the ToolCall node:
- Builds a prompt with the tool schema baked into either the system message or the current user message (format-dependent).
- Runs a non-streaming `Generate` against the node's model. Temperature is typically low for tool calls (the default is to inherit sampling params; override them to 0 for determinism).
- Parses the raw text for tool-call patterns — tagged first (`<tool_call>` / `[TOOL_CALLS]` / `{"tool":`), falling back to a bare-JSON scan that accepts balanced `{…}` objects containing the configured name field.
- For each parsed call, records the call on the current turn. If `notify_client = true`, also sends a `ToolCallNotification` wire frame so your client can react.
- Exits via `tool_called` (one or more parsed) or `no_tool_called`.
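The parsing step above can be sketched in miniature. This is a simplified illustration of "tagged first, then bare-JSON fallback" covering only the `chatml` tag and the `generic` object — the real node handles all four formats, and a production scanner must also cope with braces inside strings, which this sketch ignores:

```python
import json
import re

# Tagged pattern first: a <tool_call>…</tool_call> block (chatml-style).
TAGGED = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def detect_tool_call(raw: str):
    m = TAGGED.search(raw)
    candidate = m.group(1) if m else None
    if candidate is None:
        # Fallback: first balanced {…} object in the raw text.
        start = raw.find("{")
        if start != -1:
            depth = 0
            for i, ch in enumerate(raw[start:], start):
                depth += ch == "{"
                depth -= ch == "}"
                if depth == 0:
                    candidate = raw[start:i + 1]
                    break
    if candidate is None:
        return None                      # -> no_tool_called route
    try:
        obj = json.loads(candidate)
    except json.JSONDecodeError:
        return None                      # unparseable text is not an error
    return obj if "tool" in obj else None
```

Returning `None` on any failed parse mirrors the node's behavior: a miss is ordinary control flow, not an exception.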
One option worth calling out: `generate_on_no_tool` (experimental). When the model did not emit a tool call, this flag decides whether the "residual" text (everything that was not a tool call) is emitted back to the client as a normal `AnswerText`. Set it to `true` if your graph puts the ToolCall node in place of a Generate node; set it to `false` if a separate Generate node takes over after `no_tool_called`.
Tool-call records do not come back next turn¶
An important design decision: a ToolCall node's projection leaves
any prior tool-call records out when it rebuilds the prompt for the
next turn. The model does not see previous tool calls. This is
intentional:
- Models get confused when they see old tool calls in the prompt; many will imitate the format instead of responding in natural language.
- You control the conversation. If you want to inject "we ran get_weather and it returned sunny" into the next turn, append it as a normal user message in your app logic, not as a tool record.
If you want a full "call tool, get result, let model speak again" loop, build it out of two turns: the first turn fires the call, your client runs the tool, and your next `SendMessage` includes the result phrased as user text.
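That two-turn loop can be sketched as follows. The client helpers here (`send_message`, `wait_for_notification`) are hypothetical names standing in for whatever your client wrapper provides — the point is the shape of the loop, not the API:

```python
# Two-turn loop: Tryll detects the call, the client runs the tool,
# and the result re-enters the conversation as ordinary user text.
def tool_loop(client, user_text, run_tool):
    client.send_message(user_text)            # turn 1: fires the tool call
    note = client.wait_for_notification()     # ToolCallNotification frame
    result = run_tool(note.tool_name, note.arguments_json)
    # turn 2: the result goes back as a normal user message,
    # never as a tool record (those are not replayed next turn).
    return client.send_message(f"(tool result) {note.tool_name}: {result}")
```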
What makes tool calls reliable (or unreliable)¶
From most to least impactful:
- Match the format to the model. 80% of reliability comes from this alone.
- Keep the tool list short. Small models get overwhelmed past 5–8 tools. If you have many, put a routing / guardrail node in front to select a subset.
- Write tool descriptions like prompts. The description is what the model sees. "Turn a named light on or off" beats "light controller function".
- Validate on the client. A small model will hallucinate argument names eventually. Treat `arguments_json` as untrusted input.
- Use low temperature. Tool calls are a classification / structure task, not a creative one. `temperature=0` is often the right answer.
Edges and pitfalls¶
- Arguments are flat strings. Even numeric or boolean values come back to the client as string-encoded JSON scalars inside `arguments_json`. Parse and coerce in your client.
- The model can invent tools. A `ToolCall` node will happily parse `{"tool": "nuke_from_orbit"}` if the model emits it. The client must check the tool name against the allow-list before acting.
- Multiple calls per turn are possible. Mistral's array format explicitly supports it; the others occasionally produce multiple blocks. The node records one entry per parsed call on the turn and fires one `ToolCallNotification` per call.
- No streaming. Tool-call generation is non-streaming by design — the parser needs the full output. Do not expect `AnswerText` frames from a `ToolCall` node unless `generate_on_no_tool` (experimental) fires.
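The first pitfall (string-encoded scalars) and the validation advice combine naturally into one client-side step. A sketch, assuming a hypothetical per-tool schema of expected Python types (none of these names are Tryll API):

```python
import json

def coerce_args(arguments_json: str, schema: dict):
    """Parse untrusted arguments_json and coerce string-encoded scalars
    to the types the tool expects; reject anything that doesn't fit."""
    raw = json.loads(arguments_json)
    out = {}
    for key, typ in schema.items():
        value = raw[key]                      # KeyError -> reject the call
        if isinstance(value, str) and typ is not str:
            value = json.loads(value)         # "true" -> True, "42" -> 42
        if not isinstance(value, typ):
            raise ValueError(f"{key}: expected {typ.__name__}")
        out[key] = value
    return out
```

Raising on a type mismatch (rather than silently passing the value through) keeps hallucinated or malformed arguments from reaching your tool code.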