Tryll

On-device small-language-model inference for games and tools. Tryll is a standalone C++ server that runs GGUF models locally and exposes them to your application through first-party clients for C++, Python, and Unreal Engine 5. Build chat, retrieval-augmented generation, and tool-calling agents — all running on the user's machine, all orchestrated by workflow graphs you compose yourself.

Start here

What Tryll gives you

  • A local model host. One server process, many clients. The server loads each model once and shares it across every session that needs it. Runs on CPU by default, with optional Vulkan, CUDA, or ROCm acceleration. Model management →
  • Workflow graphs as first-class objects. Your agents are not prompt templates — they are typed graphs of nodes (Generate, Retrieve, ToolCall, CannedResponse, HumanMessageGuardrail) wired by exit routes. Workflows, Graphs, and Nodes →
  • Streaming, RAG, and tool calls built in. Token-by-token streaming, HNSW vector search via USearch, and detection-only tool calling (the server parses the call; your code executes it) with grammars for four model-family formats.
  • Three first-party clients. Thin, idiomatic wrappers over the same FlatBuffers wire protocol. C++ · Python · Unreal
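The node-and-exit-route model described above can be illustrated with a plain-Python sketch. Everything here is conceptual: the class names echo the node kinds listed above (Generate, CannedResponse, HumanMessageGuardrail), but the fields, methods, and routing table are placeholders, not Tryll's actual client API.

```python
from dataclasses import dataclass

# Conceptual sketch only -- node names mirror the docs, the API does not.

class Node:
    def run(self, state: dict) -> str:
        """Process shared state, return an exit-route label."""
        raise NotImplementedError

class HumanMessageGuardrail(Node):
    def __init__(self, banned: set):
        self.banned = banned

    def run(self, state):
        words = set(state["message"].lower().split())
        return "blocked" if words & self.banned else "ok"

class CannedResponse(Node):
    def __init__(self, text: str):
        self.text = text

    def run(self, state):
        state["reply"] = self.text
        return "done"

class Generate(Node):
    def run(self, state):
        # Stand-in for an inference call to the local model server.
        state["reply"] = f"echo: {state['message']}"
        return "done"

@dataclass
class Workflow:
    nodes: dict   # node name -> Node
    routes: dict  # (node name, exit route) -> next node name
    entry: str

    def run(self, state: dict) -> dict:
        name = self.entry
        while name is not None:
            route = self.nodes[name].run(state)
            name = self.routes.get((name, route))  # no route -> stop
        return state

wf = Workflow(
    nodes={
        "guard": HumanMessageGuardrail({"forbidden"}),
        "gen": Generate(),
        "refuse": CannedResponse("I can't help with that."),
    },
    routes={("guard", "ok"): "gen", ("guard", "blocked"): "refuse"},
    entry="guard",
)

print(wf.run({"message": "hello"})["reply"])           # echo: hello
print(wf.run({"message": "forbidden topic"})["reply"])  # I can't help with that.
```

The point of the typed graph is that routing is data, not control flow: swapping the guardrail's "blocked" route from a canned refusal to, say, a second Generate node is a one-line change to the routes table, with no node code touched.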

Explore the docs

  • Getting Started — tutorials for your first inference on each platform.
  • How-to Guides — task recipes: RAG assistants, tool calls, streaming to UI, local model setup, guardrails.
  • Concepts — the mental model: architecture, agents, workflows, RAG, projection.
  • Reference — every field, every enum, every error code.

Status and versioning

Tryll is in pre-release — expect breaking changes until the 1.0 API freeze. Release notes are not published yet. Report problems via the repository issue tracker.