Embedded String Storage

An embedded string storage is a session-owned, named, vector-indexed container of text chunks. It is the backing store for every Retrieve node in a graph, which makes it the building block of retrieval-augmented generation (RAG) in Tryll.

Under the hood, each storage holds:

A list of records — { id, text } pairs, optionally with metadata.
A precomputed embedding per record.
An HNSW index over those embeddings, tuned for cosine distance.

Retrieval is always by embedding similarity, never by substring or keyword match. For substring / regex / exact-match use cases, see string storage.
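To make "embedding similarity" concrete, here is a minimal sketch of cosine distance in plain Python. It assumes the common convention that cosine distance is one minus cosine similarity. The function name is illustrative and not part of the Tryll API, and a real storage queries its HNSW index rather than comparing vectors one by one.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cos(angle between u and v); 0.0 means same direction."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)
```

Under this convention, vectors pointing the same way score 0 regardless of their length, orthogonal vectors score 1, and opposite vectors score 2.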

Creation paths

CreateEmbeddedStringStorageRequest supports two construction paths. Exactly one is used per request.

Path	Trigger	What the server does
A (file-backed)	`config_path` set	Reads the config JSON, resolves `records_file` and optional `index_file` relative to the config's directory, loads records, loads or builds the HNSW index.
B (inline)	`strings` non-empty	Embeds every string with `embedding_model` and builds an in-memory HNSW index.

embedding_model is required for Path B. In Path A it is optional; when provided, it overrides the config file's embedding_model for query-time embedding.

Path A: `config_path`

The config is a UTF-8 JSON file with this shape:

<!-- sample only; not a committed file -->
{
    "version": 1,
    "embedding_model": "All-MiniLM-L6-v2 (Q4_K_M)",
    "records_file": "rimworld.kb.json",
    "index_file": "rimworld.kb.usearch"
}

Field	Type	Required	Description
`version`	int	No	Schema version. Currently `1`.
`embedding_model`	string	Yes	Catalog name of the embedding model. Must exist in `models.json` with `"purpose": "embedding"`.
`records_file`	string	Yes	Relative path (from the config file's directory) to the records JSON.
`index_file`	string	No	Relative path to a pre-built `*.usearch` HNSW index. If absent, the server builds the index at load time from the records and their embeddings.
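The path-resolution rule in the table can be sketched on the client side as follows. This is an illustration only, not the server's loader: `load_storage_config` and the returned keys are hypothetical names, and treating a missing `version` as `1` is an assumption.

```python
import json
from pathlib import Path

REQUIRED_FIELDS = ("embedding_model", "records_file")


def load_storage_config(config_path):
    """Parse the config JSON and resolve its file references.

    Relative paths are resolved against the config file's directory.
    """
    config_path = Path(config_path)
    config = json.loads(config_path.read_text(encoding="utf-8"))
    if not isinstance(config, dict):
        raise ValueError("config must be a JSON object")
    for field in REQUIRED_FIELDS:
        if not isinstance(config.get(field), str) or not config[field]:
            raise ValueError(f"missing or empty required field: {field}")
    base_dir = config_path.parent
    index_file = config.get("index_file")
    return {
        "version": config.get("version", 1),  # assumption: absent means 1
        "embedding_model": config["embedding_model"],
        "records_path": base_dir / config["records_file"],
        # None signals that the index has to be built at load time.
        "index_path": base_dir / index_file if index_file else None,
    }
```

Resolving against the config's own directory means a config, its records file, and its index can be moved together without editing any paths.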

The records file (*.kb.json by convention) is a JSON array of objects:

<!-- sample only; not a committed file -->
[
    {
        "id":   "ai_storytellers_1",
        "text": "The AI storyteller is the main mechanism used to ...",
        "metadata": { "source": "…", "link": "…" }
    },
    {
        "id":   "ai_storytellers_2",
        "text": "…"
    }
]

Per record:

Field	Type	Required	Description
`id`	string	Yes	Stable identifier; returned on retrieval so clients can map back to the source.
`text`	string	Yes	The chunk text, already segmented to a size that fits within the embedding model's context window.
`metadata`	object	No	Arbitrary extra fields. Parsed but currently unused by retrieval.
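A records file can be checked against this shape before it is handed to the server. The sketch below is a hypothetical client-side helper (`parse_records` is not a Tryll function) that enforces only the rules stated on this page: a non-empty array, string `id` and `text`, and an optional `metadata` object.

```python
import json


def parse_records(raw_json):
    """Parse a records array and check each entry's shape.

    Returns a list of (id, text, metadata) tuples; metadata defaults to {}.
    """
    records = json.loads(raw_json)
    if not isinstance(records, list) or not records:
        raise ValueError("records must be a non-empty JSON array")
    parsed = []
    for position, record in enumerate(records):
        if not isinstance(record, dict):
            raise ValueError(f"record {position} is not an object")
        for field in ("id", "text"):
            if not isinstance(record.get(field), str):
                raise ValueError(f"record {position}: '{field}' must be a string")
        metadata = record.get("metadata", {})
        if not isinstance(metadata, dict):
            raise ValueError(f"record {position}: 'metadata' must be an object")
        parsed.append((record["id"], record["text"], metadata))
    return parsed
```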

If index_file is present, the index's dimension must match the output dimension of embedding_model; a mismatch is rejected with error 8003 (InvalidEmbeddedStringStorageData).

Path B: inline `strings`

For programmatic small-scale use, the client sends strings directly:

Field	Type	Required	Description
`name`	string	Yes	Unique name within the session.
`strings`	`[string]`	Yes	The chunks. The server assigns sequential `id`s (`"0"`, `"1"`, …).
`embedding_model`	string	Yes	Catalog name of the embedding model.

The server embeds every string at creation time and builds the index in memory. Nothing is written to disk; the storage is gone once the session ends.
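The sequential id assignment described for `strings` amounts to the following. This is a sketch of the resulting record shape only; `records_from_strings` is a hypothetical name, and the embedding and indexing steps are left out.

```python
def records_from_strings(strings):
    """Pair each inline string with a sequential, zero-based string id."""
    if not strings:
        raise ValueError("strings must be non-empty")
    return [{"id": str(i), "text": text} for i, text in enumerate(strings)]
```

Because the ids are positional, reordering the input strings changes which id maps to which chunk. Use Path A when clients need stable identifiers.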

Lifecycle

Step	Frame	Notes
Create	`CreateEmbeddedStringStorageRequest`	Sent before `CreateAgentRequest` if a Retrieve node in the graph will reference it by name. Server responds with `CreateEmbeddedStringStorageResponse` carrying `record_count` and `embedding_dim`.
Use	(referenced by Retrieve node `embedded_string_storage` param)	Scoped to the session.
Destroy	`DestroyEmbeddedStringStorageRequest`	Server responds with `Ack`. Agents holding the storage keep it alive — the call only drops the session's named reference.

See Lifetime and Ownership → EmbeddedStringStorage for the full ownership model and the separation between the on-disk index cache and the in-memory index object.
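The Destroy semantics can be pictured with a toy model in which the session and each agent hold separate references to the same storage object. Every class here is hypothetical and stands in for server-side state. It is not the Tryll implementation, and it assumes an agent resolves the storage name when it is created.

```python
class Storage:
    def __init__(self, name):
        self.name = name


class Session:
    """Holds named references to storages."""

    def __init__(self):
        self._storages = {}

    def create_storage(self, name):
        if name in self._storages:
            raise KeyError(f"storage already exists: {name}")
        self._storages[name] = Storage(name)
        return self._storages[name]

    def lookup(self, name):
        return self._storages[name]

    def destroy_storage(self, name):
        # Drops only the session's named reference.
        del self._storages[name]


class Agent:
    """Resolves the storage by name once, then holds its own reference."""

    def __init__(self, session, storage_name):
        self.storage = session.lookup(storage_name)


session = Session()
session.create_storage("aquarium_kb")
agent = Agent(session, "aquarium_kb")
session.destroy_storage("aquarium_kb")
print(agent.storage.name)  # the agent's reference is still usable
```

In this model, creating an agent before its storage fails at the name lookup, which mirrors the ordering requirement in the Create row.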

Consumption by Retrieve

A Retrieve node names the storage via its embedded_string_storage param. Per turn, the node:

Embeds the user's message with the configured embedding_model.
Queries the storage's HNSW index for the top_k closest records by cosine distance.
Drops results whose distance exceeds threshold.
Emits one knowledge component per surviving record onto the current interaction.
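The selection steps above (top_k, then threshold) can be sketched over hits that already carry a distance. `Hit` and `select_knowledge` are hypothetical names and the distances are made up for illustration. The sketch also assumes that "exceeds" is strict, so a hit exactly at the threshold is kept.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    record_id: str
    text: str
    distance: float  # cosine distance to the query embedding


def select_knowledge(hits, top_k, threshold):
    """Keep the top_k nearest hits, then drop those beyond threshold."""
    nearest = sorted(hits, key=lambda hit: hit.distance)[:top_k]
    return [hit for hit in nearest if hit.distance <= threshold]


hits = [
    Hit("2", "Goldfish are cold-water fish.", 0.62),
    Hit("0", "Neon tetras thrive in soft water.", 0.18),
    Hit("1", "Corydoras prefer sandy substrate.", 0.41),
]
for hit in select_knowledge(hits, top_k=2, threshold=0.5):
    print(hit.record_id, hit.text)
```

Since the threshold is applied after the top_k cut, a turn can yield fewer than top_k components, or none at all when nothing is close enough.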

How those components appear in the prompt is controlled by the downstream Generate node's template (Mustache) and placement params — see Use Mustache Templates.

Validation and errors

The server validates at creation time:

Exactly one of config_path / strings must be set.
The named embedding_model must be in the catalog with "purpose": "embedding".
Path A: records_file must exist, parse, and be non-empty.
Path A with index_file: the index must load and its dimension must match the embedding model.
Path B: strings must be non-empty.

Code	Cause
`8001`	Name empty, malformed, or reserved.
`8002`	A storage with this name already exists in the session.
`8003`	Content rejected: missing / corrupt config or records, empty inline array, unknown embedding model, or index-dimension mismatch.
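Some of these rules can be checked on the client before a request is sent. The sketch below is a hypothetical pre-flight helper, not part of any Tryll client. It attaches an error code only where the table above names one for the condition and uses `None` where this page does not say which code the server returns. Checks that need the catalog or the files on disk are left to the server.

```python
def preflight(name, config_path=None, strings=None, embedding_model=None):
    """Return a list of (code, message) problems; empty means none found."""
    problems = []
    if not name:
        problems.append((8001, "name is empty"))
    has_config = bool(config_path)
    has_strings = strings is not None
    if has_config == has_strings:
        # Both set or neither set; the code for this case is not documented.
        problems.append((None, "set exactly one of config_path / strings"))
    if has_strings:
        if len(strings) == 0:
            problems.append((8003, "inline strings array is empty"))
        if not embedding_model:
            problems.append((None, "embedding_model is required for Path B"))
    return problems
```

For example, `preflight("aquarium_kb", strings=["a"], embedding_model="All-MiniLM-L6-v2 (Q4_K_M)")` returns an empty list, while omitting both `config_path` and `strings` reports a single problem.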

Minimum working example

The same two calls are shown in Python, C++, and Unreal, in that order.

# Path B (inline)
client.create_embedded_string_storage(
    name="aquarium_kb",
    strings=[
        "Neon tetras thrive in soft, slightly acidic water.",
        "Goldfish are cold-water fish and should not share a tank with tropicals.",
    ],
    embedding_model="All-MiniLM-L6-v2 (Q4_K_M)",
)

# Path A (file-backed) — point the server at the config
client.create_embedded_string_storage(
    name="rimworld_kb",
    config_path="C:/tryll/data/rag/rimworld.json",
    embedding_model="All-MiniLM-L6-v2 (Q4_K_M)",
)

// Path B (inline)
client.CreateEmbeddedStringStorageFromStrings(
    "aquarium_kb",
    {
        "Neon tetras thrive in soft, slightly acidic water.",
        "Goldfish are cold-water fish and should not share a tank with tropicals.",
    },
    "All-MiniLM-L6-v2 (Q4_K_M)");

// Path A (file-backed)
auto info = client.CreateEmbeddedStringStorage(
    "rimworld_kb",
    "C:/tryll/data/rag/rimworld.json",
    /*embeddingModel=*/"All-MiniLM-L6-v2 (Q4_K_M)");

auto* subsystem = GetGameInstance()->GetSubsystem<UTryllSubsystem>();

// Path B (inline) — requires EmbeddingModel.
subsystem->RequestCreateEmbeddedStringStorageFromStrings(
    TEXT("aquarium_kb"),
    { TEXT("Neon tetras thrive in soft, slightly acidic water."),
      TEXT("Goldfish are cold-water fish and should not share a tank with tropicals.") },
    TEXT("All-MiniLM-L6-v2 (Q4_K_M)"));

// Path A (file-backed). EmbeddingModel is optional (overrides config).
subsystem->RequestCreateEmbeddedStringStorage(
    TEXT("rimworld_kb"),
    TEXT("C:/tryll/data/rag/rimworld.json"),
    TEXT("All-MiniLM-L6-v2 (Q4_K_M)"));

// Bind OnCreateEmbeddedStringStorageComplete(Name, RecordCount, bSuccess)
// to continue once the index is built.

See the full workflow in How to create a simple RAG assistant.

Client bindings

C++: Tryll::TryllClient::CreateEmbeddedStringStorage (file config) / CreateEmbeddedStringStorageFromStrings (inline), in TryllClient.h
Python: tryll_client.TryllClient.create_embedded_string_storage (pass either config_path=… or strings=…), in client.py
Unreal: UTryllSubsystem::RequestCreateEmbeddedStringStorage (file config) / RequestCreateEmbeddedStringStorageFromStrings (inline), in TryllSubsystem.h