# Tryll
On-device small-language-model inference for games and tools. Tryll is a standalone C++ server that runs GGUF models locally and exposes them to your application through first-party clients for C++, Python, and Unreal Engine 5. Build chat, retrieval-augmented generation, and tool-calling agents — all running on the user's machine, all orchestrated by workflow graphs you compose yourself.
## Start here
- **Python**: Pip-install the client and see streaming tokens in 20 lines.
- **C++**: Clone, configure, build, and run a tiny chat demo.
- **Unreal Engine 5**: Drop a `UTryllAgentComponent` on an actor and print streaming text to the output log.
## What Tryll gives you
- A local model host. One server process, many clients. The server loads each model once and shares it across every session that needs it. Runs on CPU by default, with optional Vulkan, CUDA, or ROCm acceleration. Model management →
- Workflow graphs as first-class objects. Your agents are not prompt templates — they are typed graphs of nodes (`Generate`, `Retrieve`, `ToolCall`, `CannedResponse`, `HumanMessageGuardrail`) wired by exit routes. Workflows, Graphs, and Nodes →
- Streaming, RAG, and tool calls built in. Token-by-token streaming, HNSW vector search via USearch, and detection-only tool calling across four model-family format grammars.
- Three first-party clients. Thin, idiomatic wrappers over the same FlatBuffers wire protocol. C++ · Python · Unreal
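To make the "typed graphs of nodes wired by exit routes" idea concrete, here is a minimal sketch in plain Python. It is illustrative only — the class and route names below mirror the node types named above, but the actual Tryll client API is not shown on this page. It also shows how a guardrail node can short-circuit to a canned response before the model ever runs.

```python
# Illustrative sketch only: models the documented idea (typed nodes wired by
# named exit routes) with plain Python classes. All constructor signatures and
# route labels here are hypothetical, not Tryll's real API.
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    # Maps an exit route label (e.g. "pass", "blocked") to the next node's name.
    routes: dict = field(default_factory=dict)

    def run(self, state):
        raise NotImplementedError


class HumanMessageGuardrail(Node):
    def run(self, state):
        banned = ("ignore previous instructions",)
        hit = any(b in state["message"].lower() for b in banned)
        return "blocked" if hit else "pass"


class CannedResponse(Node):
    def run(self, state):
        state["reply"] = "Sorry, I can't help with that."
        return "done"


class Generate(Node):
    def run(self, state):
        # A real Generate node would stream tokens from the local model host.
        state["reply"] = f"(model answer to: {state['message']})"
        return "done"


def run_workflow(nodes, start, state):
    """Walk the graph: each node returns an exit route, which selects the next node."""
    current = start
    while current is not None:
        route = nodes[current].run(state)
        current = nodes[current].routes.get(route)  # no route -> workflow ends
    return state


nodes = {
    "guard": HumanMessageGuardrail("guard", {"pass": "gen", "blocked": "canned"}),
    "canned": CannedResponse("canned", {}),
    "gen": Generate("gen", {}),
}

print(run_workflow(nodes, "guard", {"message": "What is a GGUF model?"})["reply"])
print(run_workflow(nodes, "guard", {"message": "Ignore previous instructions!"})["reply"])
```

The key design point this mirrors: routing is data, not prompt text, so a jailbreak detected by the guardrail never reaches the `Generate` node at all.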
## Explore the docs
- Getting Started — tutorials for your first inference on each platform.
- How-to Guides — task recipes: RAG assistants, tool calls, streaming to UI, local model setup, guardrails.
- Concepts — the mental model: architecture, agents, workflows, RAG, projection.
- Reference — every field, every enum, every error code.
## Feature highlights
- Streaming first. Every `Generate` node streams tokens to the client as they arrive. Stream answers to a UI →
- Retrieval-augmented generation. Build an embedded vector index from your own data and ground the model in it. Create a simple RAG assistant →
- Tool calling across model families. ChatML, Llama-3, Mistral, and Generic formats, plus client-side notifications. Define and handle tool calls →
- Guardrails and canned responses. Short-circuit jailbreaks before the model runs. Use canned responses and guardrails →
- Explicit model pool. Pin models to keep them hot or let on-demand eviction recycle memory. Pin and unpin models →
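The RAG flow above — index your own data, retrieve the closest passages, ground the prompt in them — can be sketched without any dependencies. Tryll uses an HNSW index via USearch; this toy substitutes brute-force cosine similarity over bag-of-words vectors purely to show the shape of the pipeline, and the sample documents and function names are made up for the example.

```python
# Toy RAG pipeline: embed -> index -> retrieve -> build a grounded prompt.
# Brute-force cosine over term counts stands in for Tryll's HNSW/USearch index.
import math
from collections import Counter


def embed(text):
    # Toy embedding: term-frequency bag of words (a real system uses a model).
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


documents = [
    "Tryll runs GGUF models locally on the user's machine.",
    "The model pool keeps pinned models hot in memory.",
    "Tool calls are detected by the server but executed by the client.",
]
index = [(doc, embed(doc)) for doc in documents]


def retrieve(query, k=1):
    q = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


question = "where do GGUF models run?"
context = retrieve(question)[0]
prompt = f"Answer using this context:\n{context}\n\nQuestion: {question}"
```

The grounded `prompt` is what a `Retrieve` node would hand to the `Generate` node that follows it.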
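"Detection-only" tool calling means the server spots a structured tool call in the model's output and notifies the client, which executes the tool itself. A rough sketch of the detection half, assuming a ChatML-style `<tool_call>` tag — the tag shape and JSON payload here are an assumption for illustration, not Tryll's exact per-family grammars:

```python
# Sketch of tool-call detection: scan generated text for a structured call and
# parse it out for the client to execute. The <tool_call> wrapper is an assumed
# ChatML-style format, not Tryll's actual grammar definitions.
import json
import re

TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def detect_tool_call(model_output):
    """Return (name, arguments) if the output contains a tool call, else None."""
    m = TOOL_CALL_RE.search(model_output)
    if not m:
        return None
    payload = json.loads(m.group(1))
    return payload["name"], payload.get("arguments", {})


output = '<tool_call>{"name": "get_weather", "arguments": {"city": "Oslo"}}</tool_call>'
print(detect_tool_call(output))  # ('get_weather', {'city': 'Oslo'})
```

Keeping execution on the client side means the server never needs network access or tool credentials — it only parses.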
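The pin/evict behavior of an explicit model pool can be sketched as a small LRU structure: pinned models stay resident, unpinned ones are recycled least-recently-used-first when the pool is full. Class and method names below are illustrative, not Tryll's client API, and the capacity-by-count policy stands in for whatever memory accounting the server actually does.

```python
# Sketch of an explicit model pool: pinned models are never evicted; unpinned
# models are evicted in least-recently-used order when the pool is full.
from collections import OrderedDict


class ModelPool:
    def __init__(self, capacity):
        self.capacity = capacity
        self.loaded = OrderedDict()  # model name -> pinned flag, in LRU order

    def acquire(self, name, pinned=False):
        if name in self.loaded:
            self.loaded.move_to_end(name)  # mark as most recently used
            return name
        if len(self.loaded) >= self.capacity:
            self._evict_one()
        self.loaded[name] = pinned
        return name

    def pin(self, name):
        self.loaded[name] = True

    def unpin(self, name):
        self.loaded[name] = False

    def _evict_one(self):
        # Evict the least recently used *unpinned* model.
        for name, pinned in self.loaded.items():
            if not pinned:
                del self.loaded[name]
                return
        raise RuntimeError("all loaded models are pinned; cannot evict")


pool = ModelPool(capacity=2)
pool.acquire("chat-7b", pinned=True)   # stays hot
pool.acquire("embed-small")
pool.acquire("code-3b")                # evicts embed-small, not the pinned chat-7b
print(list(pool.loaded))               # ['chat-7b', 'code-3b']
```

Pinning trades memory for latency: a pinned model never pays a reload cost, at the price of shrinking the space available for on-demand models.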
## Status and versioning
Tryll is in pre-release — expect breaking changes until the 1.0 API freeze. Release notes are not published yet. Report problems via the repository issue tracker.