First Inference in Python¶
Install the Python client, connect to a running Tryll server, and stream a reply from a local language model — all in about twenty lines of code.
Before you start¶
- The Tryll server is running on 127.0.0.1:9100. If not, see Run the Tryll Server.
- At least one language model is listed in the server's data/models.json with status Local or Loaded. You can check with any client, or bring your own — see Use Your Own Local Model.
- Python 3.10+.
Step 1 — install the client¶
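If the client is distributed under the same name as the import package (an assumption here; substitute your actual distribution name or wheel path), a typical install looks like:
pip install tryll_client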
That installs the tryll_client package with no external Python
dependencies beyond the standard library.
Step 2 — write the script¶
Save this as first-inference.py:
from tryll_client import (
    TryllClient, GraphDescription, NodeType, InferenceEngine,
)
# Connect and configure the session.
client = TryllClient.connect("127.0.0.1", 9100)
client.configure_session(InferenceEngine.LlamaCpp)
# Pick a model — use the actual name from your server's models.json.
MODEL = "My Local Model"
# Build the simplest possible graph: one Generate node.
graph = (
    GraphDescription()
    .add_node("answer", NodeType.Generate)
    .wire("answer", "default", "END")
    .set_start_node("answer")
    .set_default_model_name(MODEL)
)
agent = client.create_agent(graph)
# Ask a question and print the full reply.
reply = agent.send_message("In one sentence: what is Tryll?")
print(reply)
agent.destroy()
client.shutdown()
Step 3 — run it¶
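With the server already running (see Before you start), execute the script with your Python interpreter:
python first-inference.py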
Want the client to start the server for you?
Use TryllClient.run_and_connect — it spawns the server, waits for it to
be ready, and returns a ConnectedSession that shuts everything down
automatically:
from pathlib import Path
from tryll_client import TryllClient, GraphDescription, NodeType, InferenceEngine

with TryllClient.run_and_connect(
    exe=Path("path/to/tryll_server.exe"),
    port=9100,
) as session:
    session.client.configure_session(InferenceEngine.LlamaCpp)
    # `graph` is the one-node GraphDescription built in Step 2.
    agent = session.client.create_agent(graph)
    reply = agent.send_message("In one sentence: what is Tryll?")
    print(reply)
# The server is stopped automatically when the `with` block exits.
For lower-level control, or if you need to manage the server lifetime independently, see Auto-launch the Server.
Expected output (your model's wording will vary):
Tryll is a local small-language-model inference server with C++, Python, and Unreal clients for building on-device AI agents.
send_message blocks until the server's TurnComplete arrives and
returns the full reply as a single string. The server still streams
tokens internally (you can see this in the server log); per-token UI
updates are available in the C++ and Unreal clients via the
SendText / OnAnswerText callbacks.
What you built¶
- A minimal session with one agent.
- A one-node graph: Generate → END.
- A blocking send_message call that returns the full reply.
The graph is deliberately minimal; everything else in Tryll
(retrieval, tool calls, guardrails) plugs into exactly this shape
by inserting extra nodes before Generate.
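For example, a retrieval step would sit in front of the Generate node. The sketch below is illustrative only: NodeType.Retrieve and the "retrieve" node name are hypothetical placeholders, not confirmed parts of the API; the builder calls and MODEL are the ones used in Step 2.
# Hypothetical sketch of an extra node ahead of Generate.
# NodeType.Retrieve is a placeholder; check your NodeType enum for the real names.
graph_with_retrieval = (
    GraphDescription()
    .add_node("retrieve", NodeType.Retrieve)   # hypothetical node type
    .add_node("answer", NodeType.Generate)
    .wire("retrieve", "default", "answer")     # retrieval output feeds Generate
    .wire("answer", "default", "END")
    .set_start_node("retrieve")
    .set_default_model_name(MODEL)
)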
Where to go next¶
- How-to Guides — ready-made recipes: RAG, tool calls, streaming to UI, guardrails.
- Concepts — the mental model. Start with Architecture at a Glance.
- Python Client API — surface map of tryll_client.
Troubleshooting¶
- ConnectionRefusedError — the server is not running on the expected host/port, or a firewall blocked it. Re-check Run the Tryll Server.
- TryllError: 4002 — the model name in your script does not match any entry in models.json. List models with client.list_models() (see the snippet below).
- Script hangs after send_message — the model is loading for the first time. Watch the server log; the next turn will be fast.
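A quick way to see which names the server actually knows, assuming list_models returns a printable collection of model entries (the exact return type may differ):
from tryll_client import TryllClient

client = TryllClient.connect("127.0.0.1", 9100)
# Print whatever the server reports so you can copy the exact model name.
print(client.list_models())
client.shutdown()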