Create a Simple RAG Assistant¶
Prepare a small knowledge base, index it into an
embedded string storage,
build a graph with a Retrieve node in front of a Generate node,
and get answers grounded in your data.
Prerequisites¶
- A session connected and configured with the `LlamaCpp` engine.
- A language model available (e.g., "My Local Model").
- An embedding model available (e.g., "All-MiniLM-L6-v2 (Q4_K_M)"). Add one to `models.json` with `type: "Embedding"` if you do not have one yet (see the sketch below).
Step 1 — prepare the knowledge base¶
`CreateEmbeddedStringStorageRequest` has two construction paths, both documented in Embedded String Storage → Creation paths:

- Path A (file-backed) — the client passes a `config_path`; the server reads a JSON config that points at a records file and an optional pre-built `.usearch` index. Best for stable corpora: the index is cached on disk and reused across runs.
- Path B (inline strings) — the client sends a `strings[]` array plus an `embedding_model` over the wire; the server embeds and builds the index in memory. Nothing is written to disk; the storage is gone when the session ends. Best for small, ephemeral content generated client-side (sketched after this list).
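For orientation, a Path B creation might look like the sketch below. The wire fields (`strings[]`, `embedding_model`) are the documented ones; the C++ member spellings, the `Name` field, and the `CreateEmbeddedStringStorage` client call are assumptions:

namespace TC = Tryll::Client;

// Path B sketch. The wire fields (strings[], embedding_model) are documented;
// the member spellings and the client call below are assumptions.
TC::CreateEmbeddedStringStorageRequest Req;
Req.Name = "scratch_notes";                       // assumed: storages are addressed by name
Req.Strings = {"Note one.", "Note two."};         // embedded in memory, never written to disk
Req.EmbeddingModel = "All-MiniLM-L6-v2 (Q4_K_M)"; // an Embedding-type model
client.CreateEmbeddedStringStorage(Req);          // assumed client entry point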
See Lifetime and Ownership for how an embedded storage stays alive across agents and what happens when you destroy one mid-session.
This how-to uses Path A. Two files drive it:
my-docs.kb.json — the records:
[
{
"id": "rule-001",
"text": "Players respawn at the nearest fast-travel beacon 10 seconds after death.",
"metadata": {"category": "gameplay"}
},
{
"id": "rule-002",
"text": "Damage to allies is reduced by 80% but not zero, to prevent griefing while allowing friendly fire cues.",
"metadata": {"category": "gameplay"}
}
]
my-docs.json — the config pointing at it:
{
"version": 1,
"embedding_model": "All-MiniLM-L6-v2 (Q4_K_M)",
"records_file": "my-docs.kb.json",
"index_file": "my-docs.kb.usearch"
}
Put both files alongside the server (say in data/rag/). The first
creation will embed every record and write my-docs.kb.usearch to
disk; subsequent runs skip the embedding pass.
Step 2 — create the embedded string storage¶
Call UTryllSubsystem::RequestCreateEmbeddedStringStorage
(C++) with ConfigPath = "data/rag/my-docs.json". The
completion callback receives {Name, RecordCount, bSuccess}.
This entry point is not BlueprintCallable in the current
plugin; call it from C++ game code.
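A minimal sketch of that call, assuming UTryllSubsystem is a game-instance subsystem and a completion delegate along these lines (the delegate type name is hypothetical; the entry point, the ConfigPath, and the {Name, RecordCount, bSuccess} payload are as documented above):

// Sketch only: the subsystem scope and the delegate type name are assumptions;
// the entry point and the {Name, RecordCount, bSuccess} payload are documented.
if (UTryllSubsystem* Tryll = GetGameInstance()->GetSubsystem<UTryllSubsystem>())
{
    Tryll->RequestCreateEmbeddedStringStorage(
        TEXT("data/rag/my-docs.json"), // ConfigPath (Path A)
        FOnEmbeddedStringStorageCreated::CreateLambda( // hypothetical delegate type
            [](const FString& Name, int32 RecordCount, bool bSuccess)
            {
                UE_LOG(LogTemp, Log, TEXT("Storage '%s': %d records, ok=%d"),
                       *Name, RecordCount, bSuccess ? 1 : 0);
            }));
}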
First call builds the index; later calls with the same config reuse
the on-disk .usearch. See
Embedded String Storage for
the Path B (inline strings) variant.
Step 3 — build the graph¶
flowchart LR
retrieve["Retrieve<br>retrieve"]
gen["Generate<br>answer"]
refuse["CannedResponse<br>refuse"]
retrieve -- "found" --> gen
retrieve -- "not_found" --> refuse
gen -- "default" --> END
refuse -- "default" --> END
The not_found exit routes to a CannedResponse node that emits a
pre-written "I don't know" line. This makes the empty case explicit
in the graph — the model never runs when retrieval returns nothing,
which is faster and gives the client a deterministic response.
First, create a small string storage to back the CannedResponse
node (see Use Canned Responses and Guardrails
for more on canned responses).
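A minimal sketch, assuming a hypothetical `CreateStringStorage(name, strings)` method on the client; the real creation call is documented on that page:

// Hypothetical creation call; the method name and signature are assumptions.
// The storage name must match the "string_storage" value on the refuse node below.
client.CreateStringStorage("rag_not_found", {
    "I don't know; the knowledge base has nothing on that.",
    "That isn't covered by my documentation.",
});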
Then build the graph. The Generate node receives a Mustache template
that renders the retrieved chunks as a system turn before the user's
question:
from tryll_client import GraphDescription, NodeType
RAG_TEMPLATE = (
"{{#knowledge}}"
"{{name}}:\n"
"{{#chunks}}- {{text}}\n{{/chunks}}"
"\n{{/knowledge}}"
)
graph = (
GraphDescription()
.add_node("retrieve", NodeType.Retrieve, {
"embedded_string_storage": "my_docs",
"embedding_model": "All-MiniLM-L6-v2 (Q4_K_M)",
"top_k": "3",
"threshold": "0.6",
})
.add_node("answer", NodeType.Generate, {
"template": RAG_TEMPLATE,
"placement": "before_user_as_system",
"system_prompt": "Answer using the context above. If the context does not contain the answer, say so.",
})
.add_node("refuse", NodeType.CannedResponse,
{"string_storage": "rag_not_found"})
.wire("retrieve", "found", "answer")
.wire("retrieve", "not_found", "refuse")
.wire("answer", "default", "END")
.wire("refuse", "default", "END")
.set_start_node("retrieve")
.set_default_model_name("My Local Model")
)
agent = client.create_agent(graph)
namespace TC = Tryll::Client;
constexpr std::string_view kRagTemplate =
"{{#knowledge}}"
"{{name}}:\n"
"{{#chunks}}- {{text}}\n{{/chunks}}"
"\n{{/knowledge}}";
TC::GraphDescription graph;
graph.AddNode("retrieve", TC::NodeType::Retrieve, {
{"embedded_string_storage", "my_docs"},
{"embedding_model", "All-MiniLM-L6-v2 (Q4_K_M)"},
{"top_k", "3"},
{"threshold", "0.6"},
})
.AddNode("answer", TC::NodeType::Generate, {
{"template", std::string{kRagTemplate}},
{"placement", "before_user_as_system"},
{"system_prompt", "Answer using the context above. If the context does not contain the answer, say so."},
})
.AddNode("refuse", TC::NodeType::CannedResponse, {
{"string_storage", "rag_not_found"},
})
.Wire("retrieve", "found", "answer")
.Wire("retrieve", "not_found", "refuse")
.Wire("answer", "default", "END")
.Wire("refuse", "default", "END")
.SetStartNode("retrieve")
.SetDefaultModelName("My Local Model");
auto agent = client.CreateAgent(graph);
Step 4 — ask a grounded question¶
Call UTryllAgentComponent::SendMessage with the prompt; bind
On Answer Text to append streaming chunks to your UI widget
and On Turn Complete to flip the "typing" indicator off.
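In C++, that wiring might look like the sketch below. The delegate member names are guessed from the Blueprint labels, and the handler class, method names, and signatures are placeholders:

// Sketch: delegate member names are guessed from the Blueprint labels
// "On Answer Text" / "On Turn Complete"; the handler class, method names,
// and signatures are placeholders.
UTryllAgentComponent* Agent = FindComponentByClass<UTryllAgentComponent>(); // inside an actor class
Agent->OnAnswerText.AddDynamic(this, &AMyChatActor::HandleAnswerText);      // append streamed chunk to UI
Agent->OnTurnComplete.AddDynamic(this, &AMyChatActor::HandleTurnComplete);  // flip typing indicator off
Agent->SendMessage(TEXT("What happens when I shoot a teammate?"));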
You should get an answer that cites the "80% reduced friendly fire"
rule from record rule-002.
Verify it worked¶
The server log at info level should show the found path.
For a query with no relevant records in the corpus, expect the
`not_found` path instead — the Generate node is skipped and the
client receives one of the `rag_not_found` lines.
If you see `not_found` unexpectedly on a query you believe should
hit the corpus, either your threshold is too strict or the query has
no close match to anything in `my-docs.kb.json`.
Common pitfalls¶
- Embedding model mismatch. If the wire `embedding_model` does not match the one in `my-docs.json`, creation fails with error 8002. Either omit the wire value (Path A) or ensure they match.
- Stale index. If you edited `my-docs.kb.json`, Tryll notices the newer mtime and rebuilds `.usearch`. Expect a one-time delay.
- Answer ignores context. Usually the `placement` is wrong for the model, or the template is not clear enough. See Use Mustache Templates for more options.
- Top-K too small or threshold too tight. Start with `top_k=3` and `threshold=0.6`, then tune down while watching the retriever log.