Embedded String Storage¶
An embedded string storage is a session-owned, named, vector-indexed container of text chunks. It is the backing store for every Retrieve node in a graph — the building block of RAG in Tryll.
Under the hood, each storage holds:
- A list of records: { id, text } pairs, optionally with metadata.
- A precomputed embedding per record.
- An HNSW index over those embeddings, tuned for cosine distance.
Retrieval is always by embedding similarity, never by substring or keyword match. For substring / regex / exact-match use cases, see string storage.
Creation paths¶
CreateEmbeddedStringStorageRequest supports two construction paths.
Exactly one is used per request.
| Path | Trigger | What the server does |
|---|---|---|
| A (file-backed) | config_path set | Reads the config JSON, resolves records_file and optional index_file relative to the config's directory, loads records, loads or builds the HNSW index. |
| B (inline) | strings non-empty | Embeds every string with embedding_model and builds an in-memory HNSW index. |
embedding_model is required for Path B and, when provided,
overrides the config file for query-time embedding in Path A.
Path A — config_path¶
The config is a UTF-8 JSON file with this shape:
<!-- sample only; not a committed file -->
{
  "version": 1,
  "embedding_model": "All-MiniLM-L6-v2 (Q4_K_M)",
  "records_file": "rimworld.kb.json",
  "index_file": "rimworld.kb.usearch"
}
| Field | Type | Required | Description |
|---|---|---|---|
| version | int | No | Schema version. Currently 1. |
| embedding_model | string | Yes | Catalog name of the embedding model. Must exist in models.json with "purpose": "embedding". |
| records_file | string | Yes | Relative path (from the config file's directory) to the records JSON. |
| index_file | string | No | Relative path to a pre-built *.usearch HNSW index. If absent, the server builds the index at load time from the records and their embeddings. |
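The path-resolution rule in the table can be sketched on the client side as follows. This is a minimal illustration under stated assumptions, not the server's loader: `load_storage_config` and `ResolvedConfig` are hypothetical names introduced here, and defaulting a missing `version` to 1 is an assumption.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class ResolvedConfig:
    """Hypothetical container for a parsed storage config."""
    version: int
    embedding_model: str
    records_file: Path
    index_file: Optional[Path]


def load_storage_config(config_path: str) -> ResolvedConfig:
    """Parse a config and resolve its file paths against the config's directory."""
    path = Path(config_path)
    raw = json.loads(path.read_text(encoding="utf-8"))

    # embedding_model and records_file are the two required fields.
    for key in ("embedding_model", "records_file"):
        if not isinstance(raw.get(key), str) or not raw[key]:
            raise ValueError(f"config is missing required string field: {key}")

    base = path.parent
    index_name = raw.get("index_file")
    return ResolvedConfig(
        version=raw.get("version", 1),  # assumption: a missing version means 1
        embedding_model=raw["embedding_model"],
        records_file=base / raw["records_file"],
        # index_file is optional; None means the index is built at load time.
        index_file=base / index_name if index_name else None,
    )
```

Running a check like this before sending the request can catch a missing field or a misplaced records file early, without waiting for the server to reject the config.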
The records file (*.kb.json by convention) is a JSON array of
objects:
<!-- sample only; not a committed file -->
[
  {
    "id": "ai_storytellers_1",
    "text": "The AI storyteller is the main mechanism used to ...",
    "metadata": { "source": "…", "link": "…" }
  },
  {
    "id": "ai_storytellers_2",
    "text": "…"
  }
]
Per record:
| Field | Type | Required | Description |
|---|---|---|---|
| id | string | Yes | Stable identifier; returned on retrieval so clients can map back to the source. |
| text | string | Yes | The chunk text, already segmented to a size that fits within the embedding model's context window. |
| metadata | object | No | Arbitrary extra fields. Parsed but currently unused by retrieval. |
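A records file can be checked against the per-record table before it is handed to the server. The sketch below is a hypothetical pre-flight validator (`validate_records` is not part of any Tryll client); it only mirrors the field rules stated above, and the sample data is invented for illustration.

```python
import json
from typing import Any, Dict, List


def validate_records(records: Any) -> List[Dict[str, Any]]:
    """Check parsed records-file content against the per-record field rules."""
    if not isinstance(records, list) or not records:
        raise ValueError("records file must be a non-empty JSON array")
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            raise ValueError(f"record {i} is not an object")
        # id and text are required strings.
        for key in ("id", "text"):
            if not isinstance(rec.get(key), str):
                raise ValueError(f"record {i}: '{key}' must be a string")
        # metadata is optional, but must be an object when present.
        if "metadata" in rec and not isinstance(rec["metadata"], dict):
            raise ValueError(f"record {i}: 'metadata' must be an object")
    return records


sample = json.loads(
    '[{"id": "fish_1", "text": "Neon tetras thrive in soft water.",'
    ' "metadata": {"source": "notes"}},'
    ' {"id": "fish_2", "text": "Goldfish are cold-water fish."}]'
)
checked = validate_records(sample)
```

The validator does not check chunk length against the embedding model's context window, since that limit depends on the model.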
If an index_file is present, its dimension must match
embedding_model's output dimension; a mismatch is rejected with error
8003 InvalidEmbeddedStringStorageData.
Path B — inline strings¶
For programmatic small-scale use, the client sends strings directly:
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Unique name within the session. |
| strings | [string] | Yes | The chunks. The server assigns sequential ids ("0", "1", …). |
| embedding_model | string | Yes | Catalog name of the embedding model. |
The server embeds every string at creation time and builds the index in memory. Nothing is written to disk; the storage is gone once the session ends.
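To map retrieval results back to the inline strings, a client can reproduce the id assignment locally. This is a small sketch of the sequential-id rule described above; `records_from_strings` is a hypothetical helper, not a client API.

```python
from typing import Dict, List


def records_from_strings(strings: List[str]) -> List[Dict[str, str]]:
    """Pair each inline chunk with its sequential string id ("0", "1", ...)."""
    if not strings:
        raise ValueError("strings must be non-empty")
    return [{"id": str(i), "text": s} for i, s in enumerate(strings)]


inline_records = records_from_strings([
    "Neon tetras thrive in soft, slightly acidic water.",
    "Goldfish are cold-water fish and should not share a tank with tropicals.",
])
```

Keeping this list around lets the client look up the original text for an id such as "1" without another round trip.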
Lifecycle¶
| Step | Frame | Notes |
|---|---|---|
| Create | CreateEmbeddedStringStorageRequest | Sent before CreateAgentRequest if a Retrieve node in the graph will reference it by name. Server responds with CreateEmbeddedStringStorageResponse carrying record_count and embedding_dim. |
| Use | (referenced by Retrieve node embedded_string_storage param) | Scoped to the session. |
| Destroy | DestroyEmbeddedStringStorageRequest | Server responds with Ack. Agents holding the storage keep it alive; the call only drops the session's named reference. |
See Lifetime and Ownership → EmbeddedStringStorage for the full ownership model and the separation between the on-disk index cache and the in-memory index object.
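The Destroy semantics behave like shared ownership: the session holds a named reference and each agent holds its own. The toy model below is only an analogy in plain Python and not the server's implementation; `Session`, `Agent`, and `Storage` are hypothetical classes introduced for illustration.

```python
from typing import Dict


class Storage:
    """Stand-in for an embedded string storage."""
    def __init__(self, name: str) -> None:
        self.name = name


class Session:
    """Holds storages by name; destroying only removes the named reference."""
    def __init__(self) -> None:
        self.storages: Dict[str, Storage] = {}

    def create_storage(self, name: str) -> Storage:
        if name in self.storages:
            # Mirrors error 8002: the name already exists in the session.
            raise ValueError(f"storage '{name}' already exists")
        self.storages[name] = Storage(name)
        return self.storages[name]

    def destroy_storage(self, name: str) -> None:
        del self.storages[name]


class Agent:
    """Keeps its own reference to the storage it was built with."""
    def __init__(self, storage: Storage) -> None:
        self.storage = storage


session = Session()
agent = Agent(session.create_storage("aquarium_kb"))
session.destroy_storage("aquarium_kb")
# The session no longer knows the name, but the agent's storage is still usable.
still_usable = agent.storage.name
```

In this model the storage object is released only once the last agent holding it goes away, which matches the behaviour described in the Destroy row.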
Consumption by Retrieve¶
A Retrieve node names the storage via its embedded_string_storage param. Per turn, the node:
- Embeds the user's message with the configured embedding_model.
- Queries the storage's HNSW index for the top_k closest records by cosine distance.
- Drops results whose distance exceeds threshold.
- Emits one knowledge component per surviving record onto the current interaction.
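The query, top_k, and threshold steps above can be sketched with a brute-force search. Note the swap: this uses an exact linear scan instead of an HNSW index, and hand-written 2-D vectors instead of a real embedding model, so it illustrates the filtering logic only. The function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity; 0.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # assumption: treat a zero vector as unrelated
    return 1.0 - dot / (norm_a * norm_b)


def retrieve(
    query_vec: Sequence[float],
    records: List[Dict[str, str]],
    embeddings: List[Sequence[float]],
    top_k: int,
    threshold: float,
) -> List[Tuple[float, Dict[str, str]]]:
    """Take the top_k nearest records, then drop those farther than threshold."""
    scored = [
        (cosine_distance(query_vec, emb), rec)
        for rec, emb in zip(records, embeddings)
    ]
    scored.sort(key=lambda pair: pair[0])
    return [(dist, rec) for dist, rec in scored[:top_k] if dist <= threshold]


# Toy data: three records with made-up 2-D embeddings.
toy_records = [{"id": "0", "text": "a"}, {"id": "1", "text": "b"}, {"id": "2", "text": "c"}]
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hits = retrieve([1.0, 0.0], toy_records, toy_embeddings, top_k=2, threshold=0.5)
```

Because the threshold is applied after the top_k cut, a turn can yield fewer than top_k knowledge components, or none at all.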
How those components appear in the prompt is controlled by the downstream Generate node's template (Mustache) and placement params; see Use Mustache Templates.
Validation and errors¶
The server validates at creation time:
- Exactly one of config_path / strings must be set.
- The named embedding_model must be in the catalog with "purpose": "embedding".
- Path A: records_file must exist, parse, and be non-empty.
- Path A with index_file: the index must load and its dimension must match the embedding model.
- Path B: strings must be non-empty.
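Some of these rules can be checked on the client before a request is sent. The sketch below is a hypothetical pre-flight helper (`check_create_request` is not a Tryll API) covering only the checks that need no server state; catalog membership and file contents still have to be validated by the server.

```python
from typing import List, Optional


def check_create_request(
    name: str,
    config_path: Optional[str] = None,
    strings: Optional[List[str]] = None,
    embedding_model: Optional[str] = None,
) -> List[str]:
    """Return a list of problems; an empty list means the request looks well-formed."""
    problems = []
    if not name:
        problems.append("name must not be empty")

    has_config = bool(config_path)
    has_strings = bool(strings)
    # Exactly one construction path may be used per request.
    if has_config == has_strings:
        problems.append("exactly one of config_path / strings must be set")

    # Path B has no config file to fall back on for the model name.
    if has_strings and not embedding_model:
        problems.append("embedding_model is required for Path B")
    return problems
```

For example, `check_create_request("aquarium_kb", strings=["a chunk"])` reports the missing `embedding_model`, while passing both `config_path` and `strings` reports the exactly-one rule.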
| Code | Cause |
|---|---|
| 8001 | Name empty, malformed, or reserved. |
| 8002 | A storage with this name already exists in the session. |
| 8003 | Content rejected: missing / corrupt config or records, empty inline array, unknown embedding model, or index-dimension mismatch. |
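Client code that surfaces these errors to users may want a small lookup. The sketch below is one possible shape: the enum member names are hypothetical labels (only 8003 has a documented name, InvalidEmbeddedStringStorageData), and the hint texts paraphrase the table above.

```python
from enum import IntEnum


class StorageErrorCode(IntEnum):
    INVALID_NAME = 8001    # hypothetical label
    DUPLICATE_NAME = 8002  # hypothetical label
    INVALID_DATA = 8003    # documented as InvalidEmbeddedStringStorageData


_HINTS = {
    StorageErrorCode.INVALID_NAME: "name is empty, malformed, or reserved",
    StorageErrorCode.DUPLICATE_NAME: "a storage with this name already exists in the session",
    StorageErrorCode.INVALID_DATA: "config, records, strings, model, or index were rejected",
}


def explain_storage_error(code: int) -> str:
    """Turn a numeric storage error code into a short human-readable message."""
    try:
        known = StorageErrorCode(code)
    except ValueError:
        return f"unrecognized storage error code {code}"
    return f"{code} {known.name}: {_HINTS[known]}"
```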
Minimum working example¶
Python:
# Assumes client is an already-connected tryll_client.TryllClient.
# Path B (inline)
client.create_embedded_string_storage(
    name="aquarium_kb",
    strings=[
        "Neon tetras thrive in soft, slightly acidic water.",
        "Goldfish are cold-water fish and should not share a tank with tropicals.",
    ],
    embedding_model="All-MiniLM-L6-v2 (Q4_K_M)",
)
# Path A (file-backed): point the server at the config.
# embedding_model is optional here and overrides the config.
client.create_embedded_string_storage(
    name="rimworld_kb",
    config_path="C:/tryll/data/rag/rimworld.json",
    embedding_model="All-MiniLM-L6-v2 (Q4_K_M)",
)
C++:
// Path B (inline)
client.CreateEmbeddedStringStorageFromStrings(
    "aquarium_kb",
    {
        "Neon tetras thrive in soft, slightly acidic water.",
        "Goldfish are cold-water fish and should not share a tank with tropicals.",
    },
    "All-MiniLM-L6-v2 (Q4_K_M)");
// Path A (file-backed)
auto info = client.CreateEmbeddedStringStorage(
    "rimworld_kb",
    "C:/tryll/data/rag/rimworld.json",
    /*embeddingModel=*/"All-MiniLM-L6-v2 (Q4_K_M)");
Unreal:
auto* subsystem = GetGameInstance()->GetSubsystem<UTryllSubsystem>();
// Path B (inline): requires EmbeddingModel.
subsystem->RequestCreateEmbeddedStringStorageFromStrings(
    TEXT("aquarium_kb"),
    { TEXT("Neon tetras thrive in soft, slightly acidic water."),
      TEXT("Goldfish are cold-water fish and should not share a tank with tropicals.") },
    TEXT("All-MiniLM-L6-v2 (Q4_K_M)"));
// Path A (file-backed). EmbeddingModel is optional (overrides config).
subsystem->RequestCreateEmbeddedStringStorage(
    TEXT("rimworld_kb"),
    TEXT("C:/tryll/data/rag/rimworld.json"),
    TEXT("All-MiniLM-L6-v2 (Q4_K_M)"));
// Bind OnCreateEmbeddedStringStorageComplete(Name, RecordCount, bSuccess)
// to continue once the index is built.
See the full workflow in How to create a simple RAG assistant.
Client bindings¶
- C++: Tryll::TryllClient::CreateEmbeddedStringStorage (file config) / CreateEmbeddedStringStorageFromStrings (inline), in TryllClient.h
- Python: tryll_client.TryllClient.create_embedded_string_storage (pass either config_path=… or strings=…), in client.py
- Unreal: UTryllSubsystem::RequestCreateEmbeddedStringStorage, in TryllSubsystem.h
Related¶
- Retrieve node
- Concept: RAG
- How to create a simple RAG assistant
- String Storage — non-embedded variant.
- Glossary