
First Inference in Python

Install the Python client, connect to a running Tryll server, and stream a reply from a local language model — all in about twenty lines of code.

Before you start

  • The Tryll server is running on 127.0.0.1:9100. If not, see Run the Tryll Server.
  • At least one language model is listed in the server's data/models.json with status Local or Loaded. You can check with the snippet after this list, or bring your own model (see Use Your Own Local Model).
  • Python 3.10+.
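
A quick way to check is to ask the server for its model list. This is a minimal sketch, assuming list_models() returns a value you can print as-is; the exact return shape may differ:

from tryll_client import TryllClient

# Connect to the running server and print the models it knows about.
client = TryllClient.connect("127.0.0.1", 9100)
print(client.list_models())  # the model you plan to use should appear here
client.shutdown()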

Step 1 — install the client

pip install tryll-client

This installs the tryll_client package, which depends only on the Python standard library.
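
To confirm the install, import the package from the command line (no output means the import succeeded):

python -c "import tryll_client"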

Step 2 — write the script

Save this as first-inference.py:

from tryll_client import (
    TryllClient, GraphDescription, NodeType, InferenceEngine,
)

# Connect and configure the session.
client = TryllClient.connect("127.0.0.1", 9100)
client.configure_session(InferenceEngine.LlamaCpp)

# Pick a model — use the actual name from your server's models.json.
MODEL = "My Local Model"

# Build the simplest possible graph: one Generate node.
graph = (
    GraphDescription()
    .add_node("answer", NodeType.Generate)
    .wire("answer", "default", "END")
    .set_start_node("answer")
    .set_default_model_name(MODEL)
)

agent = client.create_agent(graph)

# Ask a question and print the full reply.
reply = agent.send_message("In one sentence: what is Tryll?")
print(reply)

agent.destroy()
client.shutdown()

Step 3 — run it

python first-inference.py

Want the client to start the server for you?

Use TryllClient.run_and_connect — it spawns the server, waits for it to be ready, and returns a ConnectedSession that shuts everything down automatically:

from pathlib import Path
from tryll_client import TryllClient, GraphDescription, NodeType, InferenceEngine

# Reuse the one-node graph and model name from Step 2.
MODEL = "My Local Model"
graph = (
    GraphDescription()
    .add_node("answer", NodeType.Generate)
    .wire("answer", "default", "END")
    .set_start_node("answer")
    .set_default_model_name(MODEL)
)

with TryllClient.run_and_connect(
    exe=Path("path/to/tryll_server.exe"),
    port=9100,
) as session:
    session.client.configure_session(InferenceEngine.LlamaCpp)
    agent = session.client.create_agent(graph)
    reply = agent.send_message("In one sentence: what is Tryll?")
    print(reply)
# server is stopped automatically when the `with` block exits

For lower-level control, or if you need to manage the server lifetime independently, see Auto-launch the Server.

Expected output (your model's wording will vary):

Tryll is a local small-language-model inference server with C++, Python, and Unreal clients for building on-device AI agents.

send_message blocks until the server's TurnComplete arrives and returns the full reply as a single string. The server still streams tokens internally (you can see this in the server log); per-token UI updates are available in the C++ and Unreal clients via the SendText / OnAnswerText callbacks.

What you built

  • A minimal session with one agent.
  • A one-node graph: Generate → END.
  • A blocking send_message call that returns the full reply.

The graph is deliberately minimal; everything else in Tryll (retrieval, tool calls, guardrails) plugs into exactly this shape by inserting extra nodes before Generate.
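
For a flavour of what that looks like, here is a sketch of a two-node graph with a retrieval step wired in front of Generate. The NodeType.Retrieve name is an illustrative assumption (check your server's node catalogue for the real node types); the GraphDescription calls are the same ones used in Step 2:

# Hypothetical: a retrieval node feeding the same Generate node.
graph = (
    GraphDescription()
    .add_node("lookup", NodeType.Retrieve)   # assumed node type, for illustration only
    .add_node("answer", NodeType.Generate)
    .wire("lookup", "default", "answer")     # lookup runs first, then Generate
    .wire("answer", "default", "END")
    .set_start_node("lookup")
    .set_default_model_name(MODEL)
)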

Where to go next

  • Use Your Own Local Model
  • Auto-launch the Server

Troubleshooting

  • ConnectionRefusedError — the server is not running on the expected host/port, or a firewall blocked it. Re-check Run the Tryll Server.
  • TryllError: 4002 — the model name in your script does not match any entry in models.json. List models with client.list_models(); see the sketch after this list for handling this in code.
  • Script hangs after send_message — the model is loading for the first time. Watch the server log; the next turn will be fast.
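
If you want the Step 2 script to recover from a bad model name rather than crash, you can catch the error and print the available models. A sketch, assuming TryllError is importable from tryll_client and that str(err) includes the error code:

from tryll_client import TryllError  # assumed import path for the error type

try:
    reply = agent.send_message("In one sentence: what is Tryll?")
    print(reply)
except TryllError as err:
    # Most often error 4002: the model name does not match models.json.
    print(f"Inference failed: {err}")
    print("Available models:", client.list_models())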