
Glossary

Canonical definitions for every term used across the Tryll documentation. Every page links the first mention of a term below to its anchor on this page. Subsequent mentions render as plain text.

Terms are grouped by topic so related concepts sit next to each other.


Sessions, agents, and turns

Agent

A server-side object that owns one conversation: its dialog, its compiled workflow, its retained knowledge, and a handle to the models it needs. Each agent belongs to exactly one session and is referenced by a numeric agent_id. Created via CreateAgentRequest, retired via DestroyAgentRequest.

Session

A single configured connection between a client and the Tryll server. The client opens a TCP socket, sends a ConfigureSessionRequest, and receives a session_id. All subsequent requests from that socket are scoped to the session; closing the socket ends the session and destroys every agent it owns.

Turn

One request–response cycle initiated by a user message. A turn begins when the client sends SendMessageRequest and ends when the server emits TurnCompleteResponse. Between those two frames the client may receive zero or more AnswerTextResponse chunks (see streaming). Only one turn is in flight per agent at a time.

Dialog

The ordered list of interactions belonging to an agent — the conversation as the server sees it. Each turn appends exactly one interaction. Used by the projection stage to build the next prompt.

Interaction

A single user-and-assistant exchange within a dialog. Holds the user message, the assistant response, any retrieved knowledge components, and (when diagnostics are on) per-node debug information.


Graphs, nodes, and workflows

Workflow

The overall per-turn behaviour of an agent — what happens between receiving the user message and emitting TurnCompleteResponse. A workflow is defined by a graph of nodes and is compiled from a GraphDescription when the agent is created.

Graph

A declarative description of a workflow: a list of nodes plus the directed edges between their exit routes. Built client-side with the graph builder APIs, validated server-side at agent creation. Compilation failures surface as error 3003 (see error codes).

Node

A single unit of computation within a graph. Each node has a typed parameter struct, one or more named exit routes, and well-defined side effects on the dialog. The six built-in node types are Generate, Instruction, Retrieve, Tool call, Canned response, and Human-message guardrail.

Exit route

A named output of a node that the compiled graph wires to the next node. For example, HumanMessageGuardrailNode has triggered and not_triggered routes; ToolCallNode has tool_called and no_tool_called. A graph has exactly one terminal node whose exits lead to end-of-turn.
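
The wiring can be pictured with a minimal sketch. The dict layout, the `params` keys, and the node `id`s below are illustrative assumptions, not the actual graph builder API — they only show how named exit routes connect nodes:

```python
# Hypothetical two-route graph: a guardrail that either short-circuits to a
# canned response or falls through to generation. Shapes are illustrative.
graph = {
    "nodes": [
        {"id": "guard", "type": "HumanMessageGuardrailNode",
         "params": {"storage": "blocked_patterns"}},
        {"id": "refuse", "type": "CannedResponseNode",
         "params": {"storage": "refusals", "pick": "random"}},
        {"id": "answer", "type": "GenerateNode",
         "params": {"model": "Qwen2.5-0.5B-Instruct"}},
    ],
    "edges": [
        # Each edge connects one named exit route to the next node;
        # routes left unwired lead to end-of-turn.
        {"from": ("guard", "triggered"), "to": "refuse"},
        {"from": ("guard", "not_triggered"), "to": "answer"},
    ],
}

# A well-formed graph wires each exit route to at most one destination.
routes = {edge["from"] for edge in graph["edges"]}
assert len(routes) == len(graph["edges"])
```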

Projection

The stage that turns the dialog plus retained state into a prompt for a language model. For Generate nodes, projection also renders the node's Mustache template against the current turn's instruction components and knowledge components, then splices the result into the prompt at the configured placement. A token budget with hysteresis ensures the prompt fits inside the model's context window. See the projection concept page.

Mustache template

A Mustache source string attached to a Generate node via the template param. Before each model call, projection renders it against a context that includes {{user_message}}, the {{#instructions}} list from instruction components, and the {{#knowledge}} list from knowledge components. The rendered text is spliced into the prompt at the position specified by the placement param. See the Use Mustache Templates how-to.
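
As a toy illustration of the rendering step, the sketch below implements just the Mustache subset used here ({{var}} tags and {{#list}} sections) in plain Python. A real deployment would use a full Mustache library, and the context shape is an assumption based on the field names in this glossary:

```python
import re

def render(template: str, ctx: dict) -> str:
    """Toy Mustache-subset renderer: {{var}} tags and {{#list}}...{{/list}}
    sections only. Shows the shape of the context projection supplies."""
    # Expand {{#name}}...{{/name}} sections by repeating the body per item.
    def section(m):
        body, items = m.group(2), ctx.get(m.group(1), [])
        return "".join(render(body, {**ctx, **item}) for item in items)
    out = re.sub(r"\{\{#(\w+)\}\}(.*?)\{\{/\1\}\}", section, template, flags=re.S)
    # Substitute plain {{var}} tags from the context.
    return re.sub(r"\{\{(\w+)\}\}", lambda m: str(ctx.get(m.group(1), "")), out)

template = ("{{user_message}}\n"
            "{{#instructions}}- {{name}}: {{text}}\n{{/instructions}}"
            "{{#knowledge}}> {{text}}\n{{/knowledge}}")
ctx = {
    "user_message": "How do I reset my password?",
    "instructions": [{"name": "tone", "text": "Be concise."}],
    "knowledge": [{"text": "Passwords reset via the account page."}],
}
print(render(template, ctx))
```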


Knowledge, retrieval, and storage

RAG

Retrieval-Augmented Generation. An approach that injects topically relevant text snippets into the prompt so the language model can ground its answer on content the model itself does not "know". In Tryll, RAG is implemented by a Retrieve node in front of a Generate node.

Retrieval

The act of selecting snippets from a knowledge source that are most similar (by cosine distance) to the current query embedding. Produces knowledge components that the workflow may or may not present to the model.
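
The selection step can be sketched as a brute-force stand-in for the HNSW search: rank stored chunks by cosine distance to the query embedding and keep the top k. The three-dimensional vectors and the index layout are toy values for illustration:

```python
import math

def cosine_distance(a, b):
    # 1 - cosine similarity; 0 means identical direction.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

def retrieve(query_vec, index, k=2):
    """Rank all stored chunks by distance to the query and return the top k.
    An HNSW index reaches the same answer without scanning everything."""
    ranked = sorted(index, key=lambda item: cosine_distance(query_vec, item["vec"]))
    return ranked[:k]

index = [
    {"text": "Reset passwords on the account page.", "vec": [0.9, 0.1, 0.0]},
    {"text": "The server listens on TCP.",            "vec": [0.0, 0.2, 0.9]},
    {"text": "Password rules require 12 characters.", "vec": [0.8, 0.3, 0.1]},
]
top = retrieve([1.0, 0.1, 0.0], index, k=2)
print([c["text"] for c in top])
```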

Retrieve node

A node type that runs a vector similarity search over an embedded string storage and attaches the top-k matches to the current interaction as knowledge components. See the Retrieve reference.

Knowledge base

Synonym for "vector index used by a Retrieve node". In Tryll, a knowledge base is realised as an embedded string storage.

Instruction component

A named instruction string attached to the current interaction by an Instruction node. Projection exposes instruction components to the downstream Generate node's Mustache template as: {{instruction_<name>}} (flat string) and entries in the {{#instructions}} list (each with {{name}} and {{text}}).

Knowledge component

A single piece of retrieved or attached text plus metadata (source, score, chunk id). Multiple components may attach to one interaction. Projection exposes them to the Generate node's Mustache template as the {{#knowledge}} list and {{#knowledge_<source>}} direct-lookup sections.

Chunk

A bounded slice of source text that has been offline-embedded and stored in an embedded string storage. Chunking happens at index build time, not at query time; each retrieval result is one chunk.
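
A minimal fixed-size chunker with overlap illustrates the idea; real index builds typically split on sentence or token boundaries, and the sizes below are arbitrary assumptions:

```python
def chunk(text: str, max_chars: int = 80, overlap: int = 20) -> list[str]:
    """Slice text into bounded, overlapping chunks. Overlap keeps context
    that straddles a boundary retrievable from at least one chunk."""
    step = max_chars - overlap
    return [text[i:i + max_chars]
            for i in range(0, max(len(text) - overlap, 1), step)]

chunks = chunk("a" * 200)
print(len(chunks), [len(c) for c in chunks])
```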

Embedding

A dense floating-point vector that represents the semantic content of a piece of text. Produced by an embedding model; compared via cosine distance inside an HNSW index.

Embedding model

A model whose output is an embedding rather than generated text. Declared in the model catalog with "purpose": "embedding". Referenced by name from RetrieveNode parameters or from an embedded string storage config file.

HNSW

Hierarchical Navigable Small World — the approximate-nearest-neighbour index structure used by Tryll's embedded string storage. Optimised for fast cosine-distance search over large vector sets.

String storage

A named, session-owned container of plain strings. Created via CreateStringStorageRequest (inline array or file path). Consumed by canned-response and human-message-guardrail nodes, which pick or match one string per invocation. Not embedded, not indexed. See the string storage reference.

Embedded string storage

A session-owned vector index built from strings plus a precomputed HNSW file. Two construction paths: a directory containing a *.kb.json config plus a *.usearch file (Path A), or an inline array of strings that the server embeds on demand (Path B). Consumed by Retrieve nodes. See the embedded string storage reference.

Canned response

A fixed reply selected from a string storage. The canned-response node emits one string from its backing storage (random, first-match, or indexed pick) and terminates the turn without invoking a language model. Useful for scripted answers and fallback paths.

Guardrail

A pattern-matching check that gates the workflow. The human-message-guardrail node runs the user message against a string storage of patterns and routes the turn to its triggered or not_triggered exit accordingly. Common use: short-circuit unsafe or off-topic inputs to a canned response.


Models and inference

Language model

A model that takes a text prompt and emits generated text token-by-token. In the Tryll catalog, language models have "purpose": "language". See the models concept page.

SLM

Small Language Model — a language model small enough to run on consumer hardware (CPU, integrated GPU, or a single consumer GPU). Tryll is designed around SLMs: on-device inference, no cloud round-trip, and aggressive KV-cache management.

Inference engine

The backend runtime that executes a model. Chosen per-session on ConfigureSessionRequest as a value of the InferenceEngine enum. LlamaCpp is shipped today; OnnxGenAI, WindowsML, OpenVino, and TensorRtLlm are enum slots reserved for future engines. Each model variant targets exactly one engine.

GGUF

The binary file format used by llama.cpp for quantised language models. Tryll downloads GGUF files from HuggingFace when the active model variant targets the llama.cpp inference engine.

Model catalog

The server's models.json file — the authoritative list of models Tryll can download, load, and run. Each catalog entry declares a name, a purpose (language or embedding), default sampling parameters, and one or more variants. See the model management reference.

Model variant

One concrete packaging of a catalogued model for one inference engine. A model named Qwen2.5-0.5B-Instruct may ship variants for llama.cpp (GGUF) and for onnxruntime-genai (ONNX). The active variant is determined by the server's configured default engine or by explicit selection at LoadModelRequest time.

Variant

Short form of model variant.

Pinned retention

A model-loading policy that keeps the model resident in memory for the lifetime of the server process. Suitable for the single hot model on a given machine. Opposite of OnDemand retention.

OnDemand retention

A model-loading policy that loads the model when an agent needs it and unloads it when no agent still needs it. Trades memory for latency. Opposite of Pinned retention.

Context (KV)

The KV cache — the tensor of attention keys and values cached for every token the model currently holds in its context. Tryll reuses as much KV state as possible across turns when the prompt prefix is stable; this is what makes conversational inference fast.
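
The reuse idea reduces to finding the shared token prefix between the previous prompt and the next one. This is a simplified sketch of the concept, not Tryll's actual cache logic:

```python
def reusable_prefix(cached_tokens, new_prompt_tokens):
    """Length of the shared prefix between last turn's processed tokens and
    the new prompt: the portion of KV state that can be kept instead of
    re-running attention over it."""
    n = 0
    for a, b in zip(cached_tokens, new_prompt_tokens):
        if a != b:
            break
        n += 1
    return n

prev = [1, 5, 9, 2, 7]     # tokens processed last turn (toy ids)
new  = [1, 5, 9, 4, 8, 3]  # next turn's prompt tokens
keep = reusable_prefix(prev, new)
# Only tokens after the shared prefix need fresh computation.
print(keep, new[keep:])
```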


Prompts, tokens, and streaming

Prompt

The concatenated input passed to the language model at generation time: system preamble, projected dialog, attached knowledge, tool descriptions, and the current user message. Produced by projection.

Token

The atomic unit the language model reads and emits. Each model has its own tokenizer; token counts for the same text differ between models. Token budgets in projection are measured in tokens of the active model.

Token budget

A configurable per-agent cap on how many tokens the projected prompt may use, reserving headroom for the generated response. Exceeding the budget triggers dialog truncation with hysteresis.
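
The hysteresis mechanic can be sketched as two thresholds: truncation fires only above the budget, then drops turns down to a lower watermark so the next few turns do not each re-trigger it. The thresholds and the drop-oldest policy here are illustrative assumptions:

```python
def truncate_with_hysteresis(turn_token_counts, budget, low_water):
    """Once the dialog exceeds `budget` tokens, drop oldest turns until the
    total is under `low_water` (hysteresis), leaving headroom for growth."""
    turns = list(turn_token_counts)
    if sum(turns) <= budget:
        return turns
    while turns and sum(turns) > low_water:
        turns.pop(0)  # drop the oldest turn first
    return turns

kept = truncate_with_hysteresis([300, 250, 400, 350, 200],
                                budget=1200, low_water=800)
print(kept, sum(kept))
```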

Streaming

The mode in which the server emits generated text in chunks rather than waiting for the full response. Each chunk arrives as an AnswerTextResponse frame; the final chunk carries the is_final = true flag and is immediately followed by TurnCompleteResponse. See how to stream answers to a UI.
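
A client-side consumer of that frame sequence might look like the sketch below. Frames are modelled as plain dicts; the real protocol carries FlatBuffers messages with the response names used in this glossary:

```python
def consume_turn(frames):
    """Assemble a streamed answer: collect AnswerTextResponse chunks until
    TurnCompleteResponse arrives, then return the full text."""
    chunks = []
    for frame in frames:
        if frame["type"] == "AnswerTextResponse":
            chunks.append(frame["text"])
        elif frame["type"] == "TurnCompleteResponse":
            return "".join(chunks)
    raise RuntimeError("stream ended without TurnCompleteResponse")

frames = [
    {"type": "AnswerTextResponse", "text": "Hel", "is_final": False},
    {"type": "AnswerTextResponse", "text": "lo.", "is_final": True},
    {"type": "TurnCompleteResponse"},
]
print(consume_turn(frames))  # prints the reassembled answer: Hello.
```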


Tool calling

Tool call

A structured instruction emitted by a language model indicating it wants to invoke a named external function. In Tryll, tool calls are detected only — the server parses them from model output and returns them to the client, which is responsible for execution. See the tool calling concept page.

Tool-call format

The specific textual convention the language model uses to express a tool call. Tryll supports multiple families (Llama-style, ChatML-style, JSON, XML); the active format is configured on the tool-call node.
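
Detection for the JSON family might look like the sketch below. The marker-free "bare JSON object with `name` and `arguments` keys" convention is an assumption for illustration; the actual parsers depend on the configured format:

```python
import json
import re

def parse_json_tool_call(model_output: str):
    """Find a JSON-style tool call in raw model output. Returns the parsed
    call (tool_called route) or None (no_tool_called route)."""
    match = re.search(r"\{.*\}", model_output, flags=re.S)
    if not match:
        return None
    try:
        call = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    if "name" not in call:
        return None
    return call  # handed back to the client, which executes the tool

out = 'Let me check. {"name": "get_weather", "arguments": {"city": "Oslo"}}'
print(parse_json_tool_call(out))
```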


Transport

Wire protocol

The byte-level contract between the Tryll client and server: TCP transport, 4-byte little-endian length prefix, FlatBuffers payload, 1 MiB max frame. See the wire protocol reference.
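
The framing rule can be sketched with the standard library alone. The payload below is dummy bytes standing in for a serialised FlatBuffers message:

```python
import struct

MAX_FRAME = 1 << 20  # 1 MiB frame limit from the wire protocol

def encode_frame(payload: bytes) -> bytes:
    """Prefix a payload with its 4-byte little-endian length."""
    if len(payload) > MAX_FRAME:
        raise ValueError("frame exceeds 1 MiB limit")
    return struct.pack("<I", len(payload)) + payload

def decode_frame(buf: bytes) -> tuple[bytes, bytes]:
    """Split one length-prefixed frame off the front of a byte buffer,
    returning (payload, remaining bytes)."""
    (length,) = struct.unpack_from("<I", buf)
    if length > MAX_FRAME:
        raise ValueError("frame exceeds 1 MiB limit")
    payload = buf[4:4 + length]
    return payload, buf[4 + length:]

frame = encode_frame(b"\x01\x02\x03")
payload, rest = decode_frame(frame + b"extra")
print(payload, rest)
```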


See also:

  • Concept map — narrative explanations of the terms above.
  • Error codes — numeric error catalog used across all responses.