
Tool Calling

"Tool calling" is the trick that turns a chat model into something that can act on the world — look up the weather, move a camera, trigger a quest. In Tryll, tool calling is explicitly detection-only: the server tells your client that a tool should be called and with what arguments, and your client decides whether to actually run it. This page explains why that split exists, how small local models are coaxed into producing tool calls, and where the edges are.

The shape of a tool call in Tryll

sequenceDiagram
    participant C as Client
    participant S as Server
    participant TC as ToolCall node
    participant M as Model

    C->>S: CreateAgent(graph with ToolCall node + tools[])
    C->>S: SendMessage("turn on the porch light")
    S->>TC: Execute
    TC->>M: Generate (non-streaming) with tool-schema prompt
    M-->>TC: {"tool": "set_light", "args": {"name":"porch","on":true}}
    TC-->>S: parsed call + ToolCallNotification (if notify_client)
    S-->>C: ToolCallNotification{tool_name, arguments_json}
    TC->>TC: record the call on the current turn
    TC-->>S: exit "tool_called"
    Note over S,C: graph continues — typically to END

Three things to notice:

  1. The model never talks to tools directly. It only produces text. The node parses that text for a tool-call pattern.
  2. The server never executes the tool. It fires a notification to the client and moves on. What to run, how to run it, and whether to feed the result back into the conversation is your call (see the sketch after this list).
  3. A failed parse is not an error. If no tool pattern is found, the node takes the no_tool_called route. That is a perfectly normal control flow — branch the graph on it.
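
To make the second point concrete, here is a minimal client-side handler in Python. The callback shape and the tool registry are assumptions (this page does not show Tryll's client bindings); only the ToolCallNotification fields {tool_name, arguments_json} and the detection-only contract come from the text above.

import json

# Hypothetical client-side registry: the server never sees this table.
TOOLS = {
    "set_light": lambda args: print(f"set_light called with {args}"),
}

def on_tool_call_notification(tool_name: str, arguments_json: str) -> None:
    # The server only *detected* a call; whether to run it is our decision.
    handler = TOOLS.get(tool_name)
    if handler is None:
        return  # model invented a tool name; ignore it (see "Edges and pitfalls")
    args = json.loads(arguments_json)
    handler(args)  # values arrive string-encoded; coerce before real use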

Why detection-only?

Small local models in 2025 are not uniformly good at tool calls. Different model families were fine-tuned on different prompt shapes and different output grammars. Crucially, none of them know about your tools.

Running the tool server-side would force Tryll to:

  • sandbox arbitrary code,
  • make network / filesystem policy decisions for your app, and
  • embed a universal tool-dispatcher that can never match the specifics of a game engine or a desktop app.

Detection-only punts all of that back to the client, where the code to "turn on the porch light" already lives. The server's job is to reliably extract {tool_name, arguments} from the model — which is the hard part in small models anyway.

It also means tool calls compose naturally with the rest of Tryll. A graph can put a guardrail in front of a ToolCall node, a CannedResponse on the rejection path, and so on. Nothing about the graph structure is special because tools are in it.

The four format families

Different model families were fine-tuned on different "tool grammars". Tryll ships four built-in tool-call formats and picks one per node:

Format   Who it fits                                How the model emits a call
chatml   Qwen 2.x, Phi-3, ChatML-finetuned Mistral  <tool_call>…json…</tool_call> block
llama3   Meta Llama-3 / 3.1 / 3.2 Instruct          bare JSON with name and parameters
mistral  Mistral / Mixtral instruct                 [TOOL_CALLS][…json array…]
generic  Anything else                              bare {"tool": "...", "args": {...}} object

The format controls three things together, which is why they are bundled: (1) where the tool schema goes in the prompt (system message vs. user preamble), (2) how each tool definition is rendered, and (3) what pattern to look for in the output.
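
One way to see why the bundle is indivisible is to write it down as a data shape: one object carries all three knobs, so swapping any single knob quietly invalidates the other two. The dataclass below is illustrative, not Tryll's internals, and the chatml placement choice is a guess; only the <tool_call> pattern is documented on this page.

import re
from dataclasses import dataclass
from typing import Callable

@dataclass
class ToolCallFormat:
    schema_placement: str                 # (1) system message vs. user preamble
    render_tool: Callable[[dict], str]    # (2) one tool definition -> prompt text
    detect: re.Pattern                    # (3) pattern to scan the output for

CHATML = ToolCallFormat(
    schema_placement="system",            # assumption: placement is not documented here
    render_tool=lambda t: f'{{"name": "{t["name"]}", "description": "{t["description"]}"}}',
    detect=re.compile(r"<tool_call>.*?</tool_call>", re.DOTALL),
)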

Format resolution order:

node param "tool_call_format"
     ↓ empty?
models.json "tool_call_format" on this model
     ↓ empty?
"chatml" as a safe default

Why "bundled"? Because mixing breaks

A Llama-3 model fine-tuned to emit bare JSON can be coaxed into emitting <tool_call> blocks, but the reliability drops and the arguments often go wrong. Matching format to model family is the single biggest lever on tool-call reliability. When in doubt: chatml for most models, llama3 for Meta Instruct models. See the model family → format table in the ToolCall node reference.

What the node actually does

On each turn the ToolCall node:

  1. Builds a prompt with the tool schema baked into either the system message or the current user message (format-dependent).
  2. Runs a non-streaming Generate against the node's model. Temperature is typically low for tool calls (the default is to inherit the node's sampling params; set temperature to 0 for determinism).
  3. Parses the raw text for tool-call patterns — tagged first (<tool_call> / [TOOL_CALLS] / {"tool":), falling back to a bare-JSON scan that accepts balanced {…} objects containing the configured name field (see the sketch after this list).
  4. For each parsed call, records the call on the current turn. If notify_client = true, also sends a ToolCallNotification wire frame so your client can react.
  5. Exits via tool_called (one or more parsed) or no_tool_called.
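
Step 3 carries most of the subtlety, so here is a Python sketch of the two-phase scan it describes. The regexes and the name_field default are illustrative; only the order (tagged patterns first, bare-JSON fallback) and the balanced-brace acceptance rule come from the step above.

import json
import re

# Tagged patterns first; the generic {"tool": ...} case falls through
# to the balanced-brace scan below.
TAGGED_PATTERNS = [
    re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL),  # chatml
    re.compile(r"\[TOOL_CALLS\]\s*(\[.*?\])", re.DOTALL),              # mistral
]

def scan_bare_json(text: str, name_field: str):
    """Yield balanced {...} spans that parse as JSON and contain name_field.
    Deliberately naive: braces inside string values will confuse it."""
    depth, start = 0, 0
    for i, ch in enumerate(text):
        if ch == "{":
            if depth == 0:
                start = i
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                try:
                    obj = json.loads(text[start : i + 1])
                except ValueError:
                    continue
                if isinstance(obj, dict) and name_field in obj:
                    yield obj

def parse_tool_calls(text: str, name_field: str = "tool") -> list[dict]:
    for pattern in TAGGED_PATTERNS:
        match = pattern.search(text)
        if match:
            try:
                payload = json.loads(match.group(1))
            except ValueError:
                break  # tagged but malformed: fall through to the bare scan
            return payload if isinstance(payload, list) else [payload]
    return list(scan_bare_json(text, name_field))  # empty list => no_tool_called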

One option worth calling out: generate_on_no_tool (experimental). When the model did not emit a tool call, this flag decides whether the "residual" text (everything that was not a tool call) is emitted back to the client as a normal AnswerText. Set to true if your graph puts the ToolCall node in place of a Generate node; set to false if a separate Generate node takes over after no_tool_called.

Tool-call records do not come back next turn

An important design decision: a ToolCall node's projection leaves any prior tool-call records out when it rebuilds the prompt for the next turn. The model does not see previous tool calls. This is intentional:

  • Models get confused when they see old tool calls in the prompt; many will imitate the format instead of responding in natural language.
  • You control the conversation. If you want to inject "we ran get_weather and it returned sunny" into the next turn, append it as a normal user message in your app logic, not as a tool record.

If you want a full "call tool, get result, let model speak again" loop, build it out of two turns: the first fires the call, your client runs the tool, and your next SendMessage includes the result phrased as user text.
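
As a sketch (the send_message call is a hypothetical client API; only the protocol shape, a notification out and the result fed back as plain user text, comes from this page):

def weather_roundtrip(agent) -> None:
    # Turn 1: the model emits the call, the server notifies, the client acts.
    agent.send_message("what's the weather?")
    # ... ToolCallNotification{tool_name="get_weather", ...} arrives and the
    #     client runs its own weather lookup ...
    result = "sunny"
    # Turn 2: the result goes back as ordinary user text, not a tool record.
    agent.send_message(f"we ran get_weather and it returned {result}")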

What makes tool calls reliable (or unreliable)

From most to least impactful:

  1. Match the format to the model. 80% of reliability comes from this alone.
  2. Keep the tool list short. Small models get overwhelmed past 5–8 tools. If you have many, put a routing / guardrail node in front to select a subset.
  3. Write tool descriptions like prompts. The description is what the model sees. "Turn a named light on or off" beats "light controller function".
  4. Validate on the client. A small model will hallucinate argument names eventually. Treat arguments_json as untrusted input (see the sketch after this list).
  5. Use low temperature. Tool calls are a classification / structure task, not a creative one. temperature=0 is often the right answer.
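
For point 4, a sketch of what "untrusted input" means in practice. The allow-list and the set_light entry are hypothetical; the point is to reject anything that does not match before acting.

import json

EXPECTED_ARGS = {             # client-side allow-list; tools here are hypothetical
    "set_light": {"name", "on"},
}

def validate_call(tool_name: str, arguments_json: str) -> dict | None:
    """Return parsed args, or None if the call should be dropped."""
    if tool_name not in EXPECTED_ARGS:
        return None                          # invented tool name
    try:
        args = json.loads(arguments_json)
    except ValueError:
        return None                          # malformed JSON
    if not isinstance(args, dict):
        return None                          # wrong shape
    if set(args) - EXPECTED_ARGS[tool_name]:
        return None                          # hallucinated argument names
    return args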

Edges and pitfalls

  • Arguments are flat strings. Even numeric or boolean values come back to the client as string-encoded JSON scalars inside arguments_json. Parse and coerce in your client (see the sketch after this list).
  • The model can invent tools. A ToolCall node will happily parse {"tool": "nuke_from_orbit"} if the model emits it. The client must check the tool name against the allow-list before acting.
  • Multiple calls per turn are possible. Mistral's array format explicitly supports it; the others occasionally produce multiple blocks. The node records one entry per parsed call on the turn and fires one ToolCallNotification per call.
  • No streaming. Tool-call generation is non-streaming by design — the parser needs the full output. Do not expect AnswerText frames from a ToolCall node unless generate_on_no_tool (experimental) fires.
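
For the first bullet, a coercion helper like this belongs in the client. It assumes "string-encoded JSON scalar" means values like "true" or "3"; adjust to what your wire frames actually carry.

import json

def coerce_scalar(value: str):
    """Best-effort: turn "true" / "3" / "1.5" back into typed values."""
    try:
        return json.loads(value)   # "true" -> True, "3" -> 3, "1.5" -> 1.5
    except ValueError:
        return value               # a plain string like "porch" stays a string

def coerce_args(args: dict) -> dict:
    return {k: coerce_scalar(v) if isinstance(v, str) else v for k, v in args.items()}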