
Embedded String Storage

An embedded string storage is a session-owned, named, vector-indexed container of text chunks. It is the backing store for every Retrieve node in a graph — the building block of RAG in Tryll.

Under the hood, each storage holds:

  1. A list of records: { id, text } pairs, optionally with metadata.
  2. A precomputed embedding per record.
  3. An HNSW index over those embeddings, tuned for cosine distance.

Retrieval is always by embedding similarity, never by substring or keyword match. For substring / regex / exact-match use cases, see string storage.

Creation paths

CreateEmbeddedStringStorageRequest supports two construction paths. Exactly one is used per request.

| Path | Trigger | What the server does |
|---|---|---|
| A (file-backed) | config_path set | Reads the config JSON, resolves records_file and optional index_file relative to the config's directory, loads records, loads or builds the HNSW index. |
| B (inline) | strings non-empty | Embeds every string with embedding_model and builds an in-memory HNSW index. |

embedding_model is required for Path B and, when provided, overrides the config file for query-time embedding in Path A.

Path A — config_path

The config is a UTF-8 JSON file with this shape:

<!-- sample only; not a committed file -->
{
    "version": 1,
    "embedding_model": "All-MiniLM-L6-v2 (Q4_K_M)",
    "records_file": "rimworld.kb.json",
    "index_file": "rimworld.kb.usearch"
}

| Field | Type | Required | Description |
|---|---|---|---|
| version | int | No | Schema version. Currently 1. |
| embedding_model | string | Yes | Catalog name of the embedding model. Must exist in models.json with "purpose": "embedding". |
| records_file | string | Yes | Relative path (from the config file's directory) to the records JSON. |
| index_file | string | No | Relative path to a pre-built *.usearch HNSW index. If absent, the server builds the index at load time from the records and their embeddings. |
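
The relative-path rule can be sketched in a few lines of Python. This is an illustration only, not the server's loader: `resolve_storage_files` is a hypothetical helper, and the demo writes a throwaway config into a temporary directory.

```python
import json
import tempfile
from pathlib import Path


def resolve_storage_files(config_path):
    """Resolve records_file / index_file against the config file's directory."""
    config_path = Path(config_path)
    config = json.loads(config_path.read_text(encoding="utf-8"))
    base_dir = config_path.parent
    records_path = base_dir / config["records_file"]   # required field
    index_name = config.get("index_file")              # optional field
    index_path = base_dir / index_name if index_name else None
    return records_path, index_path


# Demo: write a throwaway config (no index_file) and resolve its sibling files.
with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp) / "rimworld.json"
    cfg.write_text(json.dumps({
        "version": 1,
        "embedding_model": "All-MiniLM-L6-v2 (Q4_K_M)",
        "records_file": "rimworld.kb.json",
    }), encoding="utf-8")
    records_path, index_path = resolve_storage_files(cfg)
    records_in_config_dir = records_path.parent == cfg.parent
```

Because the paths are resolved from the config's own directory, the config, records, and index files can be moved together without editing the config.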

The records file (*.kb.json by convention) is a JSON array of objects:

<!-- sample only; not a committed file -->
[
    {
        "id":   "ai_storytellers_1",
        "text": "The AI storyteller is the main mechanism used to ...",
        "metadata": { "source": "…", "link": "…" }
    },
    {
        "id":   "ai_storytellers_2",
        "text": "…"
    }
]

Per record:

| Field | Type | Required | Description |
|---|---|---|---|
| id | string | Yes | Stable identifier; returned on retrieval so clients can map back to the source. |
| text | string | Yes | The chunk text, already segmented to a size that fits within the embedding model's context window. |
| metadata | object | No | Arbitrary extra fields. Parsed but currently unused by retrieval. |

If an index_file is present, its dimension must match embedding_model's output dimension; a mismatch is rejected with error 8003 InvalidEmbeddedStringStorageData.
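
Before handing a records file to the server, it can help to check it against the field table above on the client side. The sketch below is a hypothetical pre-flight check (`validate_records` is not part of any Tryll client), and the server's own validation may be stricter or differ in detail.

```python
import json


def validate_records(raw_json):
    """Check a records document against the documented per-record fields."""
    records = json.loads(raw_json)
    if not isinstance(records, list) or not records:
        raise ValueError("records file must be a non-empty JSON array")
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            raise ValueError(f"record {i}: expected a JSON object")
        for field in ("id", "text"):                     # required string fields
            if not isinstance(rec.get(field), str):
                raise ValueError(f"record {i}: '{field}' must be a string")
        if "metadata" in rec and not isinstance(rec["metadata"], dict):
            raise ValueError(f"record {i}: 'metadata' must be an object")
    return records


# Demo with two made-up records; the second omits the optional metadata.
sample = json.dumps([
    {"id": "fish_1", "text": "Neon tetras thrive in soft water.",
     "metadata": {"source": "demo"}},
    {"id": "fish_2", "text": "Goldfish are cold-water fish."},
])
records = validate_records(sample)
```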

Path B — inline strings

For programmatic small-scale use, the client sends strings directly:

| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Unique name within the session. |
| strings | [string] | Yes | The chunks. The server assigns sequential ids ("0", "1", …). |
| embedding_model | string | Yes | Catalog name of the embedding model. |

The server embeds every string at creation time and builds the index in memory. Nothing is written to disk; the storage is gone once the session ends.
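
As a rough picture of the id assignment, the inline strings end up as records equivalent to the ones below. The helper `records_from_strings` is hypothetical and only mirrors the documented "0", "1", … numbering; it does not embed anything.

```python
def records_from_strings(strings):
    """Pair each inline string with a sequential string id, starting at "0"."""
    return [{"id": str(i), "text": text} for i, text in enumerate(strings)]


inline_records = records_from_strings([
    "Neon tetras thrive in soft, slightly acidic water.",
    "Goldfish are cold-water fish and should not share a tank with tropicals.",
])
```

Since the ids are positional, a client that needs to map a retrieved record back to its source should keep the original list order.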

Lifecycle

| Step | Frame | Notes |
|---|---|---|
| Create | CreateEmbeddedStringStorageRequest | Sent before CreateAgentRequest if a Retrieve node in the graph will reference it by name. Server responds with CreateEmbeddedStringStorageResponse carrying record_count and embedding_dim. |
| Use | (referenced by Retrieve node embedded_string_storage param) | Scoped to the session. |
| Destroy | DestroyEmbeddedStringStorageRequest | Server responds with Ack. Agents holding the storage keep it alive; the call only drops the session's named reference. |

See Lifetime and Ownership → EmbeddedStringStorage for the full ownership model and the separation between the on-disk index cache and the in-memory index object.
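
The Destroy behaviour can be pictured with a toy model. Everything here is an assumption made for illustration: the `Session`, `Agent`, and `EmbeddedStringStorage` classes are stand-ins, and Python's object references stand in for whatever shared-ownership mechanism the server actually uses.

```python
class EmbeddedStringStorage:
    def __init__(self, name, texts):
        self.name = name
        self.texts = texts


class Session:
    """Holds storages by name; destroying only removes the named reference."""

    def __init__(self):
        self._storages = {}

    def create_storage(self, name, texts):
        if name in self._storages:
            raise KeyError(f"storage {name!r} already exists in this session")
        self._storages[name] = EmbeddedStringStorage(name, texts)

    def get_storage(self, name):
        return self._storages[name]

    def has_storage(self, name):
        return name in self._storages

    def destroy_storage(self, name):
        del self._storages[name]


class Agent:
    """Keeps its own reference to the storage it was built with."""

    def __init__(self, storage):
        self.storage = storage


session = Session()
session.create_storage("aquarium_kb", ["Neon tetras thrive in soft water."])
agent = Agent(session.get_storage("aquarium_kb"))

# The session forgets the name, but the agent's reference keeps the object usable.
session.destroy_storage("aquarium_kb")
```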

Consumption by Retrieve

A Retrieve node names the storage via its embedded_string_storage param. Per turn, the node:

  1. Embeds the user's message with the configured embedding_model.
  2. Queries the storage's HNSW index for the top_k closest records by cosine distance.
  3. Drops results whose distance exceeds threshold.
  4. Emits one knowledge component per surviving record onto the current interaction.
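
Steps 2 and 3 can be sketched with a brute-force scan. This is not how the server searches (it uses an HNSW index); the exhaustive loop is swapped in here only to show the top_k and threshold semantics. The 2-D vectors are made-up stand-ins for real embeddings, cosine distance is taken as 1 minus cosine similarity, and `retrieve` is a hypothetical name.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def retrieve(query_vec, records, embeddings, top_k, threshold):
    """Return up to top_k (distance, record) pairs with distance <= threshold."""
    scored = sorted(
        ((cosine_distance(query_vec, emb), rec)
         for rec, emb in zip(records, embeddings)),
        key=lambda pair: pair[0],
    )
    # Take the top_k closest first, then drop any that exceed the threshold.
    return [(dist, rec) for dist, rec in scored[:top_k] if dist <= threshold]


# Toy data: three records with hand-picked 2-D "embeddings".
records = [{"id": "a", "text": "..."}, {"id": "b", "text": "..."},
           {"id": "c", "text": "..."}]
embeddings = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]
query = [1.0, 0.0]

loose = retrieve(query, records, embeddings, top_k=2, threshold=0.5)
strict = retrieve(query, records, embeddings, top_k=2, threshold=0.1)
```

With these toy vectors, the loose threshold keeps records "a" and "b", while the strict one keeps only "a". A Retrieve node can therefore emit fewer than top_k components, or none at all.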

How those components appear in the prompt is controlled by the downstream Generate node's template (Mustache) and placement params — see Use Mustache Templates.

Validation and errors

The server validates at creation time:

  • Exactly one of config_path / strings must be set.
  • The named embedding_model must be in the catalog with "purpose": "embedding".
  • Path A: records_file must exist, parse, and be non-empty.
  • Path A with index_file: the index must load and its dimension must match the embedding model.
  • Path B: strings must be non-empty.

| Code | Cause |
|---|---|
| 8001 | Name empty, malformed, or reserved. |
| 8002 | A storage with this name already exists in the session. |
| 8003 | Content rejected: missing / corrupt config or records, empty inline array, unknown embedding model, or index-dimension mismatch. |
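
A client can mirror the request-level checks before sending anything. The sketch below rests on several assumptions: `check_create_request` and the constant names are hypothetical, the catalog is simplified to a name-to-purpose dict, only the "empty" case of 8001 is modelled, a missing embedding_model on Path B is assumed to map to 8003, and file-level checks (records, index dimension) are left to the server.

```python
ERR_INVALID_NAME = 8001   # hypothetical constant names; the codes come from the table
ERR_NAME_EXISTS = 8002
ERR_INVALID_DATA = 8003


def check_create_request(name, existing_names, catalog,
                         config_path=None, strings=None, embedding_model=None):
    """Return an error code, or None if the request passes these checks."""
    if not name:
        return ERR_INVALID_NAME
    if name in existing_names:
        return ERR_NAME_EXISTS
    # Exactly one of config_path / strings must be set.
    if bool(config_path) == bool(strings):
        return ERR_INVALID_DATA
    # Path B requires an embedding model (assumed here to be an 8003 case).
    if strings and embedding_model is None:
        return ERR_INVALID_DATA
    # A named model must be in the catalog with purpose "embedding".
    if embedding_model is not None and catalog.get(embedding_model) != "embedding":
        return ERR_INVALID_DATA
    return None


catalog = {"All-MiniLM-L6-v2 (Q4_K_M)": "embedding"}
ok = check_create_request(
    "aquarium_kb", set(), catalog,
    strings=["Neon tetras thrive in soft, slightly acidic water."],
    embedding_model="All-MiniLM-L6-v2 (Q4_K_M)",
)
```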

Minimum working example

Python

# Path B (inline)
client.create_embedded_string_storage(
    name="aquarium_kb",
    strings=[
        "Neon tetras thrive in soft, slightly acidic water.",
        "Goldfish are cold-water fish and should not share a tank with tropicals.",
    ],
    embedding_model="All-MiniLM-L6-v2 (Q4_K_M)",
)

# Path A (file-backed) — point the server at the config
client.create_embedded_string_storage(
    name="rimworld_kb",
    config_path="C:/tryll/data/rag/rimworld.json",
    embedding_model="All-MiniLM-L6-v2 (Q4_K_M)",
)

C++

// Path B (inline)
client.CreateEmbeddedStringStorageFromStrings(
    "aquarium_kb",
    {
        "Neon tetras thrive in soft, slightly acidic water.",
        "Goldfish are cold-water fish and should not share a tank with tropicals.",
    },
    "All-MiniLM-L6-v2 (Q4_K_M)");

// Path A (file-backed)
auto info = client.CreateEmbeddedStringStorage(
    "rimworld_kb",
    "C:/tryll/data/rag/rimworld.json",
    /*embeddingModel=*/"All-MiniLM-L6-v2 (Q4_K_M)");

Unreal

auto* subsystem = GetGameInstance()->GetSubsystem<UTryllSubsystem>();

// Path B (inline) — requires EmbeddingModel.
subsystem->RequestCreateEmbeddedStringStorageFromStrings(
    TEXT("aquarium_kb"),
    { TEXT("Neon tetras thrive in soft, slightly acidic water."),
      TEXT("Goldfish are cold-water fish and should not share a tank with tropicals.") },
    TEXT("All-MiniLM-L6-v2 (Q4_K_M)"));

// Path A (file-backed). EmbeddingModel is optional (overrides config).
subsystem->RequestCreateEmbeddedStringStorage(
    TEXT("rimworld_kb"),
    TEXT("C:/tryll/data/rag/rimworld.json"),
    TEXT("All-MiniLM-L6-v2 (Q4_K_M)"));

// Bind OnCreateEmbeddedStringStorageComplete(Name, RecordCount, bSuccess)
// to continue once the index is built.

See the full workflow in How to create a simple RAG assistant.

Client bindings

  • C++: Tryll::TryllClient::CreateEmbeddedStringStorage (file config) / CreateEmbeddedStringStorageFromStrings (inline); see TryllClient.h
  • Python: tryll_client.TryllClient.create_embedded_string_storage (pass either config_path=… or strings=…); see client.py
  • Unreal: UTryllSubsystem::RequestCreateEmbeddedStringStorage (file config) / RequestCreateEmbeddedStringStorageFromStrings (inline); see TryllSubsystem.h