# Getting Started
In about fifteen minutes you will run the Tryll server, connect a client, and stream a reply from a local language model. Pick the client that matches your project:
- **First inference in Python.** Fastest path: `pip install tryll-client`, a 20-line script, streamed tokens on stdout (see the sketch after this list).
- **First inference in C++.** Clone the repo, build with CMake, run the `test-chat` demo.
- **First inference in Unreal.** ★ Hero tutorial: install the plugin, add `UTryllAgentComponent` to a sample actor, see streaming text in the Output Log.
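To give the Python path a concrete shape, the first-inference script might look like the sketch below. Treat the names as placeholders: `TryllClient` and `chat_stream` are assumptions for illustration, and the Python tutorial shows the real `tryll-client` API.

```python
# Minimal sketch of a first-inference script, assuming a hypothetical
# tryll-client API (TryllClient and chat_stream are illustrative names;
# follow the Python tutorial for the real ones).
from tryll_client import TryllClient  # assumed import path

def main() -> None:
    # Connect to a Tryll server already listening locally
    # (see the pre-flight checklist below).
    client = TryllClient(host="127.0.0.1", port=9100)

    # Send one message and print tokens to stdout as they arrive.
    for token in client.chat_stream("Hello! What can you do?"):
        print(token, end="", flush=True)
    print()

if __name__ == "__main__":
    main()
```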
## Pre-flight checklist
Before you start, have these ready (a quick sanity-check script follows the list):

- A running Tryll server listening on `127.0.0.1:9100`. The C++ demo (`tryll_test_chat`) and the Unreal plugin can start one for you automatically; see Auto-launch the Server. For Python, or to run the server manually, see Run the Tryll Server.
- At least one language model available. The server reads its catalog from `data/models.json`. The tutorials pick a small default you can either download via `DownloadModel` or register by path; see Use Your Own Local Model.
- Disk space. A small quantised chat model (3B–7B parameters) is typically 1.5–5 GB on disk, plus similar RAM / VRAM to load.
- A modern Windows machine. Linux and macOS work for the Python client, but the pre-built server currently ships only for Windows x64; other platforms require building from source.
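If you want to verify the first two items before diving in, a few lines of standard-library Python will do it. This is a minimal sketch assuming the default address and catalog path named above; run it from the server's working directory (or adjust `CATALOG`) so the relative path resolves.

```python
# Pre-flight sanity check: is a server listening on the default port,
# and does the model catalog exist where the tutorials expect it?
import json
import socket
from pathlib import Path

HOST, PORT = "127.0.0.1", 9100      # default server address
CATALOG = Path("data/models.json")  # default catalog path

# 1. Server reachable?
with socket.socket() as sock:
    sock.settimeout(2.0)
    try:
        sock.connect((HOST, PORT))
        print(f"Tryll server: reachable on {HOST}:{PORT}")
    except OSError:
        print("Tryll server: not reachable; see 'Run the Tryll Server'")

# 2. Model catalog present?
if CATALOG.is_file():
    entries = json.loads(CATALOG.read_text())
    print(f"Model catalog: {CATALOG} ({len(entries)} entries)")
else:
    print(f"Model catalog: {CATALOG} not found; see 'Use Your Own Local Model'")
```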
## What you will build
Each tutorial ends at the same milestone: the user sends one message, the server runs it through a minimal graph (just a `Generate` node), and the client prints the streamed reply. Everything else (RAG, tool calls, guardrails) builds on top of this pattern.
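To make "minimal graph" concrete, you can picture the milestone as a single node between the user's message and the client's output stream. The sketch below is purely illustrative; the field names are assumptions, not Tryll's actual graph schema, which each tutorial builds through its own client API.

```python
# Illustrative shape of the milestone graph: one Generate node wired
# from the user's message to the streamed reply. Field names are
# assumptions for illustration, not the server's real schema.
minimal_graph = {
    "nodes": [
        {"id": "gen", "type": "Generate"},
    ],
    "edges": [
        {"from": "user_message", "to": "gen"},
        {"from": "gen", "to": "client_stream"},
    ],
}
```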
## After the tutorial
The next step depends on what you want to build. Some good starting points in How-to Guides:
- Ground answers in your own data. Create a Simple RAG Assistant
- Let the model act on the world. Define and Handle Tool Calls
- Filter unwanted prompts. Use Canned Responses and Guardrails
If you want the why before the how, start with Architecture at a Glance.