# Getting Started
In about fifteen minutes you will run the Tryll server, connect a client, and stream a reply from a local language model. Pick the client that matches your project:
- **First inference in Python.** Fastest path: `pip install tryll-client`, a 20-line script, streamed tokens on stdout (see the sketch after this list).
- **First inference in C++.** Clone the repo, build with CMake, run the `test-chat` demo.
- **First inference in Unreal.** ★ Hero tutorial: install the plugin, add `UTryllAgentComponent` to a sample actor, see streaming text in the Output Log.
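To give the Python path a concrete shape, the first-inference script might look like the sketch below. Treat the names as placeholders: `TryllClient` and `chat_stream` are assumptions for illustration, and the Python tutorial shows the real `tryll-client` API.

```python
# Minimal sketch of a first-inference script, assuming a hypothetical
# tryll-client API (TryllClient and chat_stream are illustrative names;
# follow the Python tutorial for the real ones).
from tryll_client import TryllClient  # assumed import path

def main() -> None:
    # Connect to a Tryll server already listening locally
    # (see the pre-flight checklist below).
    client = TryllClient(host="127.0.0.1", port=9100)

    # Send one message and print tokens to stdout as they arrive.
    for token in client.chat_stream("Hello! What can you do?"):
        print(token, end="", flush=True)
    print()

if __name__ == "__main__":
    main()
```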
## Pre-flight checklist
Before you start, have these ready (a quick sanity-check script follows the list):

- A running Tryll server listening on `127.0.0.1:9100`. The C++ demo (`tryll_test_chat`) and the Unreal plugin can start one for you automatically; see Auto-launch the Server. For Python, or to run the server manually, see Run the Tryll Server.
- At least one language model available. The server reads its catalog from `data/models.json`. The tutorials pick a small default you can either download via `DownloadModel` or register by path; see Use Your Own Local Model.
- Disk space. A small quantised chat model (3B–7B parameters) is typically 1.5–5 GB on disk, plus similar RAM / VRAM to load.
- A modern Windows machine. Linux and macOS work for the Python client, but the pre-built server currently ships only for Windows x64; other platforms require building from source.
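If you want to verify the first two items before diving in, a few lines of standard-library Python will do it. This is a minimal sketch assuming the default address and catalog path named above; run it from the server's working directory (or adjust `CATALOG`) so the relative path resolves.

```python
# Pre-flight sanity check: is a server listening on the default port,
# and does the model catalog exist where the tutorials expect it?
import json
import socket
from pathlib import Path

HOST, PORT = "127.0.0.1", 9100      # default server address
CATALOG = Path("data/models.json")  # default catalog path

# 1. Server reachable?
with socket.socket() as sock:
    sock.settimeout(2.0)
    try:
        sock.connect((HOST, PORT))
        print(f"Tryll server: reachable on {HOST}:{PORT}")
    except OSError:
        print("Tryll server: not reachable; see 'Run the Tryll Server'")

# 2. Model catalog present?
if CATALOG.is_file():
    entries = json.loads(CATALOG.read_text())
    print(f"Model catalog: {CATALOG} ({len(entries)} entries)")
else:
    print(f"Model catalog: {CATALOG} not found; see 'Use Your Own Local Model'")
```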
## What you will build
Each tutorial ends at the same milestone: the user sends one message, the server runs it through a minimal graph (just a `Generate` node), and the client prints the streamed reply. Everything else (RAG, tool calls, guardrails) builds on top of this pattern.
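To make "minimal graph" concrete, you can picture the milestone as a single node between the user's message and the client's output stream. The sketch below is purely illustrative; the field names are assumptions, not Tryll's actual graph schema, which each tutorial builds through its own client API.

```python
# Illustrative shape of the milestone graph: one Generate node wired
# from the user's message to the streamed reply. Field names are
# assumptions for illustration, not the server's real schema.
minimal_graph = {
    "nodes": [
        {"id": "gen", "type": "Generate"},
    ],
    "edges": [
        {"from": "user_message", "to": "gen"},
        {"from": "gen", "to": "client_stream"},
    ],
}
```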
## After the tutorial
The next step depends on what you want to build. Some good starting points in How-to Guides:
- Ground answers in your own data. Create a Simple RAG Assistant
- Let the model act on the world. Define and Handle Tool Calls
- Filter unwanted prompts. Use Canned Responses and Guardrails
If you want the why before the how, start with Architecture at a Glance.