Create a Simple RAG Assistant

Prepare a small knowledge base, index it into an embedded string storage, build a graph with a Retrieve node in front of a Generate node, and get answers grounded in your data.

Prerequisites

  • A session connected and configured with the LlamaCpp engine.
  • A language model available (e.g., "My Local Model").
  • An embedding model available (e.g., "All-MiniLM-L6-v2 (Q4_K_M)"). Add one to models.json with type: "Embedding" if you do not have one yet.

Step 1 — prepare the knowledge base

CreateEmbeddedStringStorageRequest has two construction paths, both documented in Embedded String Storage → Creation paths:

  • Path A (file-backed) — the client passes a config_path; the server reads a JSON config that points at a records file and an optional pre-built .usearch index. Best for stable corpora: the index is cached on disk and reused across runs.
  • Path B (inline strings) — the client sends a strings[] array plus an embedding_model over the wire; the server embeds and builds the index in memory. Nothing is written to disk; the storage is gone when the session ends. Best for small, ephemeral content generated client-side.

See Lifetime and Ownership for how an embedded storage stays alive across agents and what happens when you destroy one mid-session.
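Path B needs no files at all. A minimal sketch of the inline variant (the strings and embedding_model keyword names are assumptions mirroring the request fields above; check your client's signature):

# Path B sketch: inline strings, embedded in memory for this session only.
# Keyword names are assumptions based on the request fields described above.
client.create_embedded_string_storage(
    name="scratch_notes",
    strings=[
        "The vendor in the hub restocks every hour.",
        "Elite enemies drop crafting cores on weekends.",
    ],
    embedding_model="All-MiniLM-L6-v2 (Q4_K_M)",
)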

This how-to uses Path A. Two files drive it:

my-docs.kb.json — the records:

[
  {
    "id": "rule-001",
    "text": "Players respawn at the nearest fast-travel beacon 10 seconds after death.",
    "metadata": {"category": "gameplay"}
  },
  {
    "id": "rule-002",
    "text": "Damage to allies is reduced by 80% but not zero, to prevent griefing while allowing friendly fire cues.",
    "metadata": {"category": "gameplay"}
  }
]

my-docs.json — the config pointing at it:

{
  "version": 1,
  "embedding_model": "All-MiniLM-L6-v2 (Q4_K_M)",
  "records_file": "my-docs.kb.json",
  "index_file":   "my-docs.kb.usearch"
}

Put both files alongside the server (say in data/rag/). The first creation will embed every record and write my-docs.kb.usearch to disk; subsequent runs skip the embedding pass.

Step 2 — create the embedded string storage

Python:

info = client.create_embedded_string_storage(
    name="my_docs",
    config_path="data/rag/my-docs.json",
)
print(f"{info.record_count} records, {info.embedding_dim} dims")

C++:

auto info = client.CreateEmbeddedStringStorage(
    "my_docs",
    "data/rag/my-docs.json",
    /*embeddingModel=*/"",                  // use config's value
    std::chrono::minutes{10});
std::cout << info.recordCount << " records, "
          << info.embeddingDim << " dims\n";

Unreal:

Call UTryllSubsystem::RequestCreateEmbeddedStringStorage (C++) with ConfigPath = "data/rag/my-docs.json". The completion callback receives {Name, RecordCount, bSuccess}. This entry point is not BlueprintCallable in the current plugin; call it from C++ game code.

First call builds the index; later calls with the same config reuse the on-disk .usearch. See Embedded String Storage for the Path B (inline strings) variant.
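A quick way to confirm the cache took effect (plain Python on the server host, using the paths from Step 1):

import os

# After the first creation, the pre-built index should sit next to the config;
# if this prints False, check the server's working directory.
print(os.path.exists("data/rag/my-docs.kb.usearch"))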

Step 3 — build the graph

flowchart LR
    retrieve["Retrieve<br>retrieve"]
    gen["Generate<br>answer"]
    refuse["CannedResponse<br>refuse"]
    retrieve -- "found"     --> gen
    retrieve -- "not_found" --> refuse
    gen    -- "default" --> END
    refuse -- "default" --> END

The not_found exit routes to a CannedResponse node that emits a pre-written "I don't know" line. This makes the empty case explicit in the graph — the model never runs when retrieval returns nothing, which is faster and gives the client a deterministic response.

First, create a small string storage to back the CannedResponse node (see Use Canned Responses and Guardrails for more on canned responses):

Python:

client.create_string_storage(
    name="rag_not_found",
    strings=[
        "I don't have information on that in my knowledge base.",
        "I couldn't find anything about that in the docs I have.",
    ],
)

C++:

client.CreateStringStorage("rag_not_found", {
    "I don't have information on that in my knowledge base.",
    "I couldn't find anything about that in the docs I have.",
});

Then build the graph. The Generate node receives a Mustache template that renders the retrieved chunks as a system turn before the user's question:

Python:

from tryll_client import GraphDescription, NodeType

RAG_TEMPLATE = (
    "{{#knowledge}}"
    "{{name}}:\n"
    "{{#chunks}}- {{text}}\n{{/chunks}}"
    "\n{{/knowledge}}"
)

graph = (
    GraphDescription()
    .add_node("retrieve", NodeType.Retrieve, {
        "embedded_string_storage": "my_docs",
        "embedding_model":         "All-MiniLM-L6-v2 (Q4_K_M)",
        "top_k":                   "3",
        "threshold":               "0.6",
    })
    .add_node("answer", NodeType.Generate, {
        "template":     RAG_TEMPLATE,
        "placement":    "before_user_as_system",
        "system_prompt": "Answer using the context above. If the context does not contain the answer, say so.",
    })
    .add_node("refuse", NodeType.CannedResponse,
              {"string_storage": "rag_not_found"})
    .wire("retrieve", "found",     "answer")
    .wire("retrieve", "not_found", "refuse")
    .wire("answer",   "default",   "END")
    .wire("refuse",   "default",   "END")
    .set_start_node("retrieve")
    .set_default_model_name("My Local Model")
)

agent = client.create_agent(graph)

C++:

namespace TC = Tryll::Client;

constexpr std::string_view kRagTemplate =
    "{{#knowledge}}"
    "{{name}}:\n"
    "{{#chunks}}- {{text}}\n{{/chunks}}"
    "\n{{/knowledge}}";

TC::GraphDescription graph;
graph.AddNode("retrieve", TC::NodeType::Retrieve, {
        {"embedded_string_storage", "my_docs"},
        {"embedding_model",         "All-MiniLM-L6-v2 (Q4_K_M)"},
        {"top_k",                   "3"},
        {"threshold",               "0.6"},
     })
     .AddNode("answer", TC::NodeType::Generate, {
         {"template",      std::string{kRagTemplate}},
         {"placement",     "before_user_as_system"},
         {"system_prompt", "Answer using the context above. If the context does not contain the answer, say so."},
     })
     .AddNode("refuse", TC::NodeType::CannedResponse, {
         {"string_storage", "rag_not_found"},
     })
     .Wire("retrieve", "found",     "answer")
     .Wire("retrieve", "not_found", "refuse")
     .Wire("answer",   "default",   "END")
     .Wire("refuse",   "default",   "END")
     .SetStartNode("retrieve")
     .SetDefaultModelName("My Local Model");

auto agent = client.CreateAgent(graph);
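To see what the template buys you: if retrieval hits rule-002, the rendered system turn would look roughly like this (assuming, as the variable names suggest, that the Retrieve node binds name to the storage name and chunks to the retrieved records):

my_docs:
- Damage to allies is reduced by 80% but not zero, to prevent griefing while allowing friendly fire cues.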

Step 4 — ask a grounded question

Python:

reply = agent.send_message(
    "What happens when players damage their teammates?"
)
print(reply)

C++:

agent.SendText("What happens when players damage their teammates?",
    [](std::string_view text, bool /*isDelta*/, bool /*isFinal*/)
    { std::cout << text << std::flush; });

Unreal:

Call UTryllAgentComponent::SendMessage with the prompt; bind On Answer Text to append streaming chunks to your UI widget and On Turn Complete to flip the "typing" indicator off.

You should get an answer that cites the "80% reduced friendly fire" rule from record rule-002.

Verify it worked

Server log at info should show the found path:

[info] Node retrieve: found k=3 top_distance=0.18
[info] Node answer: default

For a query with no relevant records in the corpus, expect the not_found path instead — the Generate node is skipped and the client receives one of the rag_not_found lines:

[info] Node retrieve: not_found
[info] Node refuse: default

If a query you expect to hit the corpus comes back not_found, either your threshold is too strict or nothing in my-docs.kb.json is actually close to the query.
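To exercise the not_found path deliberately, ask something the corpus cannot answer. A quick Python check (the query is a hypothetical example; send_message returns the final text as in Step 4):

# Nothing in my-docs.kb.json covers geography, so retrieval should miss
# and the CannedResponse node should answer instead of the model.
reply = agent.send_message("What is the capital of France?")
print(reply)  # expect one of the two rag_not_found lines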

Common pitfalls

  • Embedding model mismatch. If the wire embedding_model does not match the one in my-docs.json, creation fails with error 8002. Either omit the wire value (Path A) or ensure they match.
  • Stale index. If you edited my-docs.kb.json, Tryll notices the newer mtime and rebuilds .usearch. Expect a one-time delay.
  • Answer ignores context. Usually the placement is wrong for the model, or the template is not clear enough. See Use Mustache Templates for more options.
  • Top-K too small or threshold too tight. Start with top_k=3 and threshold=0.6, then adjust while watching the retriever log, as in the sketch below.
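As a sketch of that tuning (this assumes, consistent with the top_distance log above, that threshold is a maximum-distance cutoff, so raising it admits weaker matches; verify against the Retrieve node docs):

from tryll_client import GraphDescription, NodeType

# Same graph as Step 3, with a looser retriever.
graph = GraphDescription().add_node("retrieve", NodeType.Retrieve, {
    "embedded_string_storage": "my_docs",
    "embedding_model":         "All-MiniLM-L6-v2 (Q4_K_M)",
    "top_k":                   "5",     # consider more candidates per query
    "threshold":               "0.75",  # assumed max-distance cutoff; higher = looser
})
# ...wire answer/refuse and set the start node exactly as in Step 3.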