Glossary¶
Canonical definitions for every term used across the Tryll documentation. Every page links the first mention of a term below to its anchor on this page. Subsequent mentions render as plain text.
Terms are grouped by topic so related concepts sit next to each other.
Sessions, agents, and turns¶
Agent¶
A server-side object that owns one conversation: its
dialog, its compiled workflow, its retained
knowledge, and a handle to the models it needs. Each
agent belongs to exactly one session and is referenced by a
numeric agent_id. Created via CreateAgentRequest, retired via
DestroyAgentRequest.
Session¶
A single configured connection between a client and the Tryll server.
The client opens a TCP socket, sends a ConfigureSessionRequest, and
receives a session_id. All subsequent requests from that socket are
scoped to the session; closing the socket ends the session and destroys
every agent it owns.
Turn¶
One request–response cycle initiated by a user message. A turn begins
when the client sends SendMessageRequest and ends when the server
emits TurnCompleteResponse. Between those two frames the client may
receive zero or more AnswerTextResponse chunks (see
streaming). Only one turn is in flight per agent at a
time.
Dialog¶
The ordered list of interactions belonging to an agent — the conversation as the server sees it. Each turn appends exactly one interaction. Used by the projection stage to build the next prompt.
Interaction¶
A single user-and-assistant exchange within a dialog. Holds the user message, the assistant response, any retrieved knowledge components, and (when diagnostics are on) per-node debug information.
Graphs, nodes, and workflows¶
Workflow¶
The overall per-turn behaviour of an agent — what happens
between receiving the user message and emitting TurnCompleteResponse.
A workflow is defined by a graph of nodes and is
compiled from a GraphDescription when the agent is created.
Graph¶
A declarative description of a workflow: a list of
nodes plus the directed edges between their
exit routes. Built client-side with the graph builder
APIs, validated server-side at agent creation. Compilation failures
surface as error 3003 (see error codes).
Node¶
A single unit of computation within a graph. Each node has a typed parameter struct, one or more named exit routes, and well-defined side effects on the dialog. The six built-in node types are Generate, Instruction, Retrieve, Tool call, Canned response, and Human-message guardrail.
Exit route¶
A named output of a node that the compiled graph
wires to the next node. For example, HumanMessageGuardrailNode has
triggered and not_triggered routes; ToolCallNode has
tool_called and no_tool_called. A graph has exactly one terminal
node whose exits lead to end-of-turn.
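A compiled graph can be pictured as a table mapping each node's named exit routes to the next node. The sketch below is illustrative only, assuming hypothetical node and route names; it is not the actual Tryll graph-builder API.

```python
# Illustrative sketch: node and route names are hypothetical,
# not the real Tryll graph-builder API. None marks end-of-turn.
graph = {
    "guardrail": {"triggered": "canned", "not_triggered": "generate"},
    "generate":  {"done": None},
    "canned":    {"done": None},
}

def walk(graph, start, routes_taken):
    """Follow one exit route per node until end-of-turn."""
    node, path = start, [start]
    while True:
        nxt = graph[node][routes_taken[node]]
        if nxt is None:
            return path
        node = nxt
        path.append(node)
```

A turn where the guardrail does not trigger would traverse `["guardrail", "generate"]`.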
Projection¶
The stage that turns the dialog plus retained state into a
prompt for a language model. For Generate
nodes, projection also renders the node's Mustache template
against the current turn's instruction components and
knowledge components, then splices the result
into the prompt at the configured placement. A token budget with
hysteresis ensures the prompt fits inside the model's context window.
See the projection concept page.
Mustache template¶
A Mustache source string attached to a
Generate node via the template param. Before each model call,
projection renders it against a context that includes
{{user_message}}, the {{#instructions}} list from
instruction components, and the
{{#knowledge}} list from knowledge components.
The rendered text is spliced into the prompt at the position
specified by the placement param. See the
Use Mustache Templates how-to.
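To make the variable and section mechanics concrete, here is a deliberately minimal Mustache-style renderer, a sketch only; the server uses a full Mustache implementation, and this toy handles just `{{name}}` variables and `{{#name}}…{{/name}}` list sections.

```python
import re

def render(template, ctx):
    """Toy Mustache-style rendering: list sections, then plain variables.
    Illustrative only; not the server's actual template engine."""
    def section(m):
        name, body = m.group(1), m.group(2)
        # Repeat the section body once per item, merging item keys into scope.
        return "".join(render(body, {**ctx, **item}) for item in ctx.get(name, []))
    out = re.sub(r"\{\{#(\w+)\}\}(.*?)\{\{/\1\}\}", section, template, flags=re.S)
    return re.sub(r"\{\{(\w+)\}\}", lambda m: str(ctx.get(m.group(1), "")), out)
```

With a context shaped like the one projection builds, `render("User: {{user_message}}\n{{#knowledge}}- {{text}}\n{{/knowledge}}", {...})` expands the `{{#knowledge}}` section once per knowledge component.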
Knowledge, retrieval, and storage¶
RAG¶
Retrieval-Augmented Generation. An approach that injects topically relevant text snippets into the prompt so the language model can ground its answer on content the model itself does not "know". In Tryll, RAG is implemented by a Retrieve node in front of a Generate node.
Retrieval¶
The act of selecting snippets from a knowledge source that are most similar (by cosine distance) to the current query embedding. Produces knowledge components that the workflow may or may not present to the model.
Retrieve node¶
A node type that runs a vector similarity search over an
embedded string storage and attaches the
top-k matches to the current interaction as
knowledge components. See the
Retrieve reference.
Knowledge base¶
Synonym for "vector index used by a Retrieve node". In Tryll, a knowledge base is realised as an embedded string storage.
Instruction component¶
A named instruction string attached to the current
interaction by an Instruction node.
Projection exposes instruction components to the downstream
Generate node's Mustache template as:
{{instruction_<name>}} (flat string) and entries in the
{{#instructions}} list (each with {{name}} and {{text}}).
Knowledge component¶
A single piece of retrieved or attached text plus metadata
(source, score, chunk id). Multiple components may attach to one
interaction. Projection exposes them to
the Generate node's Mustache template as the
{{#knowledge}} list and {{#knowledge_<source>}} direct-lookup
sections.
Chunk¶
A bounded slice of source text that has been offline-embedded and stored in an embedded string storage. Chunking happens at index build time, not at query time; each retrieval result is one chunk.
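A simple character-window chunker illustrates the idea; the window and overlap sizes are illustrative, and real chunkers typically split on sentence or token boundaries rather than raw character counts:

```python
def chunk(text, max_chars=200, overlap=20):
    """Split source text into bounded, slightly overlapping slices.
    Illustrative only; production chunking respects semantic boundaries."""
    step = max_chars - overlap
    return [text[i:i + max_chars] for i in range(0, len(text), step)]
```

The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk.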
Embedding¶
A dense floating-point vector that represents the semantic content of a piece of text. Produced by an embedding model; compared via cosine distance inside an HNSW index.
Embedding model¶
A model whose output is an embedding
rather than generated text. Declared in the model catalog
with "purpose": "embedding". Referenced by name from RetrieveNode
parameters or from an embedded string storage
config file.
HNSW¶
Hierarchical Navigable Small World — the approximate-nearest-neighbour index structure used by Tryll's embedded string storage. Optimised for fast cosine-distance search over large vector sets.
String storage¶
A named, session-owned container of plain strings. Created via
CreateStringStorageRequest (inline array or file path). Consumed by
canned-response and
human-message-guardrail nodes,
which pick or match one string per invocation. Not embedded, not
indexed. See the string storage reference.
Embedded string storage¶
A session-owned vector index built from strings plus a precomputed
HNSW file. Two construction paths: a directory containing a
*.kb.json config plus a *.usearch file (Path A), or an inline
array of strings that the server embeds on demand (Path B). Consumed
by Retrieve nodes. See the
embedded string storage reference.
Canned response¶
A fixed reply selected from a string storage. The canned-response node emits one string from its backing storage (random, first-match, or indexed pick) and terminates the turn without invoking a language model. Useful for scripted answers and fallback paths.
Guardrail¶
A pattern-matching check that gates the flow of a turn through the
workflow. The human-message-guardrail node matches
the user message against a string storage of
patterns and routes the turn to its triggered or not_triggered exit
accordingly. Common use: short-circuiting unsafe or off-topic inputs to
a canned response.
Models and inference¶
Language model¶
A model that takes a text prompt and emits generated text
token-by-token. In the Tryll catalog, language models have
"purpose": "language". See the
models concept page.
SLM¶
Small Language Model — a language model small enough to run on consumer hardware (CPU, integrated GPU, or a single consumer GPU). Tryll is designed around SLMs: on-device inference, no cloud round-trip, and aggressive KV-cache management.
Inference engine¶
The backend runtime that executes a model. Chosen
per-session on ConfigureSessionRequest as a value of the
InferenceEngine enum. LlamaCpp is shipped today; OnnxGenAI,
WindowsML, OpenVino, and TensorRtLlm are enum slots reserved
for future engines. Each model variant targets
exactly one engine.
GGUF¶
The binary file format used by llama.cpp for quantised
language models. Tryll downloads GGUF files from
HuggingFace when the active model variant targets
the llama.cpp inference engine.
Model catalog¶
The server's models.json file — the authoritative list of models
Tryll can download, load, and run. Each catalog entry declares a name,
a purpose (language or embedding), default sampling parameters,
and one or more variants. See the
model management reference.
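A catalog entry might look roughly like the fragment below. Only the `purpose` values and the variant-per-engine split are confirmed by this glossary; the exact field names (`sampling`, `variants`, `engine`, `format`) are hypothetical placeholders, so consult the model management reference for the real schema.

```json
{
  "name": "Qwen2.5-0.5B-Instruct",
  "purpose": "language",
  "sampling": { "temperature": 0.7 },
  "variants": [
    { "engine": "LlamaCpp", "format": "GGUF" },
    { "engine": "OnnxGenAI", "format": "ONNX" }
  ]
}
```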
Model variant¶
One concrete packaging of a catalogued model for one
inference engine. A model named
Qwen2.5-0.5B-Instruct may ship variants for llama.cpp (GGUF) and
for onnxruntime-genai (ONNX). The active variant is determined by
the server's configured default engine or by explicit selection at
LoadModelRequest time.
Variant¶
Short form of model variant.
Pinned retention¶
A model-loading policy that keeps the model resident in memory for the lifetime of the server process. Suitable for the single hot model on a given machine. Opposite of OnDemand retention.
OnDemand retention¶
A model-loading policy that loads the model when an agent needs it and unloads it when no agent still needs it. Trades memory for latency. Opposite of Pinned retention.
Context (KV)¶
The KV cache — the per-model tensor of cached attention keys and values for every token the model has processed so far. Tryll reuses as much KV state as possible across turns when the prompt prefix is stable; this is what makes conversational inference fast.
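Prefix reuse reduces to finding how many leading tokens of the new prompt match what is already cached; only the tokens past that point need re-processing. A sketch, assuming token lists as plain integer sequences:

```python
def reusable_prefix(cached_tokens, new_prompt_tokens):
    """Length of the longest common prefix between the tokens already
    in the KV cache and the next turn's prompt. Everything up to this
    point can be served from cache; the rest must be re-processed."""
    n = 0
    for a, b in zip(cached_tokens, new_prompt_tokens):
        if a != b:
            break
        n += 1
    return n
```

A stable system preamble and dialog prefix keep this number high, which is why projection tries not to perturb the front of the prompt.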
Prompts, tokens, and streaming¶
Prompt¶
The concatenated input passed to the language model at generation time: system preamble, projected dialog, attached knowledge, tool descriptions, and the current user message. Produced by projection.
Token¶
The atomic unit the language model reads and emits. Each model has its own tokenizer; token counts for the same text differ between models. Token budgets in projection are measured in tokens of the active model.
Token budget¶
A configurable per-agent cap on how many tokens the projected prompt may use, reserving headroom for the generated response. Exceeding the budget triggers dialog truncation with hysteresis.
Streaming¶
The mode in which the server emits generated text in chunks rather
than waiting for the full response. Each chunk arrives as an
AnswerTextResponse frame; the final chunk carries the
is_final = true flag and is immediately followed by
TurnCompleteResponse. See how to stream answers to a
UI.
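A client-side consumption loop can be sketched as follows. The dict-shaped frames are hypothetical stand-ins; a real client decodes FlatBuffers frames off the wire.

```python
def collect_answer(frames):
    """Accumulate AnswerTextResponse chunks until is_final, then expect
    the TurnCompleteResponse that ends the turn. Frame shapes here are
    illustrative stand-ins for decoded FlatBuffers messages."""
    text = []
    it = iter(frames)
    for frame in it:
        text.append(frame["text"])
        if frame["is_final"]:
            break
    assert next(it)["type"] == "TurnCompleteResponse"
    return "".join(text)
```

In a real UI each chunk would be appended to the display as it arrives rather than buffered.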
Tool calling¶
Tool call¶
A structured instruction emitted by a language model indicating it wants to invoke a named external function. In Tryll, tool calls are detected only — the server parses them from model output and returns them to the client, which is responsible for execution. See the tool calling concept page.
Tool-call format¶
The specific textual convention the language model uses to express a tool call. Tryll supports multiple families (Llama-style, ChatML-style, JSON, XML); the active format is configured on the tool-call node.
Transport¶
Wire protocol¶
The byte-level contract between the Tryll client and server: TCP transport, 4-byte little-endian length prefix, FlatBuffers payload, 1 MiB max frame. See the wire protocol reference.
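The framing layer of this contract can be sketched directly from the definition above; raw bytes stand in here for the FlatBuffers payload:

```python
import struct

MAX_FRAME = 1 << 20  # 1 MiB frame limit

def frame(payload: bytes) -> bytes:
    """Prefix a payload with its 4-byte little-endian length."""
    if len(payload) > MAX_FRAME:
        raise ValueError("frame exceeds 1 MiB limit")
    return struct.pack("<I", len(payload)) + payload

def unframe(buf: bytes):
    """Split one length-prefixed frame off the front of a receive buffer.
    Returns (payload, rest), or (None, buf) when the frame is incomplete."""
    if len(buf) < 4:
        return None, buf
    (length,) = struct.unpack_from("<I", buf)
    if len(buf) < 4 + length:
        return None, buf
    return buf[4:4 + length], buf[4 + length:]
```

Because TCP delivers a byte stream, `unframe` must tolerate partial reads and leave incomplete frames in the buffer for the next read.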
See also:
- Concept map — narrative explanations of the terms above.
- Error codes — numeric error catalog used across all responses.