Embedded String Storage
An embedded string storage is a session-owned, named, vector-indexed container of text chunks. It is the backing store for every Retrieve node in a graph — the building block of RAG in Tryll.
Under the hood, each storage holds:
- A list of records: { id, text } pairs, optionally with metadata.
- A precomputed embedding per record.
- An HNSW index over those embeddings, tuned for cosine distance.
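The three parts listed above can be pictured with a small in-memory model. This is only a sketch: the class names `Record` and `EmbeddedStringStorageSketch` are hypothetical, the HNSW index is left out, and the toy vector is not real model output.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    id: str
    text: str
    metadata: dict = field(default_factory=dict)  # optional per-record metadata


@dataclass
class EmbeddedStringStorageSketch:
    """Hypothetical in-memory model; the HNSW index itself is omitted."""
    name: str
    records: list      # list of Record
    embeddings: list   # one vector (list of floats) per record, same order

    def __post_init__(self):
        if len(self.records) != len(self.embeddings):
            raise ValueError("expected exactly one embedding per record")
        if len({len(vec) for vec in self.embeddings}) > 1:
            raise ValueError("all embeddings must share one dimension")

    @property
    def record_count(self):
        return len(self.records)

    @property
    def embedding_dim(self):
        return len(self.embeddings[0]) if self.embeddings else 0


storage = EmbeddedStringStorageSketch(
    name="aquarium_kb",
    records=[Record("0", "Neon tetras thrive in soft, slightly acidic water.")],
    embeddings=[[0.1, 0.2, 0.3]],  # toy vector, not real model output
)
```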
Retrieval is always by embedding similarity, never by substring or keyword match. For substring / regex / exact-match use cases, see string storage.
Creation paths
CreateEmbeddedStringStorageRequest supports two construction paths.
Exactly one is used per request.
| Path | Trigger | What the server does |
|---|---|---|
| A (file-backed) | config_path set | Reads the config JSON, resolves records_file and optional index_file relative to the config's directory, loads records, loads or builds the HNSW index. |
| B (inline) | strings non-empty | Embeds every string with embedding_model and builds an in-memory HNSW index. |
embedding_model is required for Path B and, when provided,
overrides the config file for query-time embedding in Path A.
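The selection rule can be sketched as a small function. The name `plan_creation` and the `config_embedding_model` parameter (standing in for the value read from the config file) are hypothetical; the server's actual checks may be ordered differently.

```python
def plan_creation(config_path=None, strings=None, embedding_model=None,
                  config_embedding_model=None):
    """Pick Path A or B and the embedding model that would be used."""
    has_config = bool(config_path)
    has_strings = bool(strings)
    if has_config == has_strings:
        raise ValueError("exactly one of config_path / strings must be set")
    if has_strings:
        if not embedding_model:
            raise ValueError("embedding_model is required for Path B")
        return {"path": "B", "embedding_model": embedding_model}
    # Path A: a model named on the request wins over the one in the config file.
    return {"path": "A",
            "embedding_model": embedding_model or config_embedding_model}
```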
Path A — config_path
The config is a UTF-8 JSON file with this shape:
<!-- sample only; not a committed file -->
{
  "version": 1,
  "embedding_model": "All-MiniLM-L6-v2 (Q4_K_M)",
  "records_file": "adventure_quests.kb.json",
  "index_file": "adventure_quests.kb.usearch",
  "fields": [
    { "name": "min_level", "type": "int", "default": 0 },
    { "name": "character_class", "type": "string", "optional": true },
    { "name": "character_classes", "type": "set<string>", "optional": true }
  ]
}
| Field | Type | Required | Description |
|---|---|---|---|
| version | int | No | Schema version. Currently 1. |
| embedding_model | string | Yes | Catalog name of the embedding model. Must exist in models.json with "purpose": "embedding". |
| records_file | string | Yes | Relative path (from the config file's directory) to the records JSON. |
| index_file | string | No | Relative path to a pre-built *.usearch HNSW index. If absent, the server builds the index at load time from the records and their embeddings. |
| fields | array | No | Typed metadata schema: field declarations for filterable record metadata. When present, each record's metadata object is validated against it at load time. See Metadata schema. |
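A minimal sketch of how a loader might read such a config and resolve both paths against the config file's directory. The function name `resolve_config` is hypothetical, and the server's real loader may differ in details.

```python
import json
from pathlib import Path


def resolve_config(config_path):
    """Read the config JSON and resolve file paths relative to its directory."""
    config_file = Path(config_path)
    config = json.loads(config_file.read_text(encoding="utf-8"))
    for key in ("embedding_model", "records_file"):
        if key not in config:
            raise ValueError(f"config is missing required key: {key}")
    base = config_file.parent
    index_file = config.get("index_file")
    return {
        "embedding_model": config["embedding_model"],
        "records_file": base / config["records_file"],
        # index_file is optional; None means "build the index at load time".
        "index_file": base / index_file if index_file else None,
        "fields": config.get("fields", []),
    }
```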
The records file (*.kb.json by convention) is a JSON array of
objects:
<!-- sample only; not a committed file -->
[
  {
    "id": "ai_storytellers_1",
    "text": "The AI storyteller is the main mechanism used to ...",
    "metadata": { "source": "…", "link": "…" }
  },
  {
    "id": "ai_storytellers_2",
    "text": "…"
  }
]
Per record:
| Field | Type | Required | Description |
|---|---|---|---|
| id | string | Yes | Stable identifier; returned on retrieval so clients can map back to the source. |
| text | string | Yes | The chunk text, already segmented to a size that fits within the embedding model's context window. |
| metadata | object | No | Optional typed metadata object. When the config declares a fields schema, each record's metadata is validated and stored in typed slots used by the Retrieve node's filter predicate. Fields not listed in the schema are ignored. |
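The per-record rules in the table can be checked with a few lines of Python. This is a sketch under the assumption that a failing record rejects the whole file; `validate_records` is a hypothetical helper, not part of the client.

```python
import json


def validate_records(records):
    """Return the record count, or raise ValueError on the first bad entry."""
    if not isinstance(records, list) or not records:
        raise ValueError("records must be a non-empty JSON array")
    for position, record in enumerate(records):
        if not isinstance(record, dict):
            raise ValueError(f"record {position}: expected an object")
        for key in ("id", "text"):
            if not isinstance(record.get(key), str):
                raise ValueError(f"record {position}: '{key}' must be a string")
        if not isinstance(record.get("metadata", {}), dict):
            raise ValueError(f"record {position}: 'metadata' must be an object")
    return len(records)


sample = json.loads(
    '[{"id": "ai_storytellers_1", "text": "...", "metadata": {"source": "..."}},'
    ' {"id": "ai_storytellers_2", "text": "..."}]'
)
count = validate_records(sample)
```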
If an index_file is present, its dimension must match
embedding_model's output dimension; a mismatch is rejected with error
8003 InvalidEmbeddedStringStorageData.
Metadata schema
When a config includes a fields array, the server validates every record's
metadata object against it at load time and stores the values in typed
per-field slots. Those slots are what the Retrieve node's
filter predicate compares against
during vector search — the filter compiler resolves field names to slot
indices once at compile time, so no string lookups happen on the hot path.
Field declarations
Each entry in fields declares one filterable field:
| Key | Required | Description |
|---|---|---|
| name | Yes | Field identifier used in records and in filter references ({"knowledge": "<name>"}). |
| type | Yes | One of int, float, string, bool, set<string>. |
| default | Exclusive with optional | Value applied when a record omits this field. Must be coercible to type. |
| optional | Exclusive with default | true: a missing field is allowed; filters referencing it on a record where it is absent evaluate to false. |
| filterable | No (default true) | If false, the field is loaded but cannot be referenced in filters. Useful for display-only fields such as page_title. |
Exactly one of default or optional: true must be present — a field that
is neither is a load-time error.
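The declaration rules, including the default/optional exclusivity, might be checked like this. `validate_field_declaration` is a hypothetical helper and the error messages are invented for illustration.

```python
SUPPORTED_TYPES = {"int", "float", "string", "bool", "set<string>"}


def validate_field_declaration(decl):
    """Normalize one entry of the fields array, or raise ValueError."""
    name = decl.get("name")
    if not isinstance(name, str) or not name:
        raise ValueError("field declaration needs a non-empty name")
    if decl.get("type") not in SUPPORTED_TYPES:
        raise ValueError(f"{name}: unsupported type {decl.get('type')!r}")
    has_default = "default" in decl
    is_optional = decl.get("optional") is True
    if has_default == is_optional:
        # Covers both "neither present" and "both present".
        raise ValueError(f"{name}: exactly one of default or optional: true is required")
    return {
        "name": name,
        "type": decl["type"],
        "optional": is_optional,
        "default": decl.get("default"),
        "filterable": decl.get("filterable", True),
    }
```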
Supported types
| Type | JSON value in records | Notes |
|---|---|---|
| int | number (integer) | Stored as int64. |
| float | number | Stored as double. int↔float promotion is allowed in comparisons. |
| string | string | |
| bool | true / false | |
| set<string> | array of strings | Stored sorted and deduplicated. Used as the haystack of an in op. |
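A rough Python equivalent of the type table, showing the sorted and deduplicated storage for set<string>. The `coerce` helper is hypothetical; rejecting a JSON boolean for an int field and accepting an integer for a float field are assumptions of this sketch, and the int64 range check is omitted.

```python
def coerce(value, type_name):
    """Coerce one JSON value to its declared type, or raise TypeError."""
    if type_name == "int":
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, int) and not isinstance(value, bool):
            return value
    elif type_name == "float":
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            return float(value)
    elif type_name == "string":
        if isinstance(value, str):
            return value
    elif type_name == "bool":
        if isinstance(value, bool):
            return value
    elif type_name == "set<string>":
        if isinstance(value, list) and all(isinstance(v, str) for v in value):
            return sorted(set(value))  # stored sorted and deduplicated
    else:
        raise ValueError(f"unsupported type: {type_name}")
    raise TypeError(f"{value!r} is not a valid {type_name}")
```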
Missing-field semantics
When a field is declared optional: true and a record omits it, every
comparison or in operation that references that field on that record
evaluates to false. The field is not treated as "no restriction" — filter
authors who want an unrestricted fallback should encode it explicitly with
or and a separate boolean gating field.
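The difference between the naive filter and the explicit fallback can be modelled in plain Python. This does not use the real filter grammar: the helper functions are hypothetical, and `class_restricted` is an invented boolean gating field assumed to be declared with a default of false.

```python
_MISSING = object()


def field_equals(slots, field, expected):
    """Comparison on a field; an absent field always yields False."""
    actual = slots.get(field, _MISSING)
    return actual is not _MISSING and actual == expected


def value_in_field(slots, field, needle):
    """'in' against a set<string> field; an absent field always yields False."""
    haystack = slots.get(field, _MISSING)
    return haystack is not _MISSING and needle in haystack


tagged = {"class_restricted": True, "character_classes": ["paladin", "warrior"]}
untagged = {"class_restricted": False}  # character_classes omitted


def naive(slots):
    # Silently excludes records that omit character_classes.
    return value_in_field(slots, "character_classes", "warrior")


def with_fallback(slots):
    # Unrestricted records pass through the explicit gating field.
    return field_equals(slots, "class_restricted", False) or naive(slots)
```

With these definitions, `naive(untagged)` is false while `with_fallback(untagged)` is true, which is the behaviour the paragraph above asks filter authors to encode deliberately.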
Example
{
  "version": 1,
  "embedding_model": "All-MiniLM-L6-v2 (Q4_K_M)",
  "records_file": "adventure_quests.kb.json",
  "index_file": "adventure_quests.kb.usearch",
  "fields": [
    { "name": "topic", "type": "string", "default": "" },
    { "name": "min_level", "type": "int", "default": 0 },
    { "name": "tier", "type": "string", "default": "common" },
    { "name": "character_class", "type": "string", "optional": true },
    { "name": "character_classes", "type": "set<string>", "optional": true },
    { "name": "quest_reached", "type": "string", "optional": true }
  ]
}
A record that uses this schema might look like:
{
  "id": "warrior_combat_intro",
  "text": "The sword-and-shield style favoured by Aldhaven warriors …",
  "metadata": {
    "topic": "Combat",
    "min_level": 1,
    "character_classes": ["warrior", "paladin"]
  }
}
topic, min_level, and tier use defaults when absent. character_class,
character_classes, and quest_reached are optional and will evaluate to
false in any filter that references them on a record that omits them.
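To make the default and optional handling concrete, here is a sketch that resolves the example record's metadata against the example schema. `resolve_metadata` is a hypothetical helper; it skips type coercion and simply leaves optional fields unset when a record omits them.

```python
def resolve_metadata(fields, metadata):
    """Apply schema defaults; keys not declared in the schema are ignored."""
    slots = {}
    for decl in fields:
        name = decl["name"]
        if name in metadata:
            slots[name] = metadata[name]
        elif "default" in decl:
            slots[name] = decl["default"]
        # Optional and absent: the slot stays unset.
    return slots


fields = [
    {"name": "topic", "type": "string", "default": ""},
    {"name": "min_level", "type": "int", "default": 0},
    {"name": "tier", "type": "string", "default": "common"},
    {"name": "character_class", "type": "string", "optional": True},
    {"name": "character_classes", "type": "set<string>", "optional": True},
    {"name": "quest_reached", "type": "string", "optional": True},
]
metadata = {
    "topic": "Combat",
    "min_level": 1,
    "character_classes": ["warrior", "paladin"],
}
slots = resolve_metadata(fields, metadata)
```

Here `slots` gains `tier` from its default, while `character_class` and `quest_reached` remain unset.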
See Retrieve filter grammar for the full JSON predicate syntax that queries these fields.
Path B — inline strings
For programmatic small-scale use, the client sends strings directly:
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Unique name within the session. |
| strings | [string] | Yes | The chunks. The server assigns sequential ids ("0", "1", …). |
| embedding_model | string | Yes | Catalog name of the embedding model. |
The server embeds every string at creation time and builds the index in memory. Nothing is written to disk; the storage is gone once the session ends.
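The id assignment for inline strings amounts to enumerating the input. A minimal sketch, with `records_from_strings` as a hypothetical helper and the embedding step left out:

```python
def records_from_strings(strings):
    """Turn inline chunks into records with sequential string ids."""
    if not strings:
        raise ValueError("strings must be non-empty")
    return [{"id": str(index), "text": text} for index, text in enumerate(strings)]


records = records_from_strings([
    "Neon tetras thrive in soft, slightly acidic water.",
    "Goldfish are cold-water fish and should not share a tank with tropicals.",
])
```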
Lifecycle
| Step | Frame | Notes |
|---|---|---|
| Create | CreateEmbeddedStringStorageRequest | Sent before CreateAgentRequest if a Retrieve node in the graph will reference it by name. Server responds with CreateEmbeddedStringStorageResponse carrying record_count and embedding_dim. |
| Use | (referenced by Retrieve node embedded_string_storage param) | Scoped to the session. |
| Destroy | DestroyEmbeddedStringStorageRequest | Server responds with Ack. Agents holding the storage keep it alive; the call only drops the session's named reference. |
See Lifetime and Ownership → EmbeddedStringStorage for the full ownership model and the separation between the on-disk index cache and the in-memory index object.
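The destroy semantics can be illustrated with ordinary Python references. This is a toy model only: `SessionSketch` and `AgentSketch` are hypothetical classes, and shared references stand in for whatever ownership mechanism the server actually uses.

```python
class SessionSketch:
    """Holds storages by name; destroying a name drops only this reference."""

    def __init__(self):
        self._storages = {}

    def create_storage(self, name, storage):
        self._storages[name] = storage

    def lookup(self, name):
        return self._storages[name]

    def has_storage(self, name):
        return name in self._storages

    def destroy_storage(self, name):
        del self._storages[name]


class AgentSketch:
    """Resolves the storage by name once, at creation, and keeps its own reference."""

    def __init__(self, session, storage_name):
        self.storage = session.lookup(storage_name)


session = SessionSketch()
session.create_storage("aquarium_kb", {"records": ["..."]})  # create first
agent = AgentSketch(session, "aquarium_kb")                  # then the agent
session.destroy_storage("aquarium_kb")                       # name is gone
```

After the last line the session no longer knows the name, but `agent.storage` still points at the same object.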
Consumption by Retrieve
A Retrieve node names the storage via its
embedded_string_storage param. Per turn, the node:
- Embeds the user's message with the configured embedding_model.
- Queries the storage's HNSW index for the top_k closest records by cosine distance. If a filter is set on the node, each candidate record's metadata is evaluated against the compiled predicate during the search; records that do not match are excluded before top_k is applied.
- Drops results whose distance exceeds threshold.
- Emits one knowledge component per surviving record onto the current interaction.
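The ordering of these steps (filter, then top_k, then threshold) can be sketched with a brute-force search. This is not how the server works internally: it swaps the HNSW index for a full scan, assumes the query is already embedded, takes the filter as a plain Python callable, and uses toy 2-D vectors invented for illustration.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity for two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def retrieve(query_vec, records, top_k, threshold, predicate=None):
    """Brute-force stand-in: filter, rank, cut to top_k, drop distant hits."""
    candidates = [r for r in records
                  if predicate is None or predicate(r.get("metadata", {}))]
    scored = sorted(
        ((cosine_distance(query_vec, r["embedding"]), r["id"]) for r in candidates),
        key=lambda pair: pair[0],
    )
    return [(dist, rid) for dist, rid in scored[:top_k] if dist <= threshold]


toy_records = [
    {"id": "a", "embedding": [1.0, 0.0], "metadata": {"min_level": 1}},
    {"id": "b", "embedding": [0.0, 1.0], "metadata": {"min_level": 1}},
    {"id": "c", "embedding": [1.0, 1.0], "metadata": {"min_level": 5}},
]
query = [1.0, 0.1]
unfiltered = retrieve(query, toy_records, top_k=2, threshold=0.5)
filtered = retrieve(query, toy_records, top_k=2, threshold=0.5,
                    predicate=lambda meta: meta.get("min_level", 0) <= 3)
```

In the filtered call, record `c` is excluded before the top_k cut, so `b` takes its place among the two nearest candidates and is then dropped by the threshold.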
How those components appear in the prompt is
controlled by the downstream Generate node's template (Mustache) and
placement params — see
Use Mustache Templates.
Validation and errors
The server validates at creation time:
- Exactly one of config_path / strings must be set.
- The named embedding_model must be in the catalog with "purpose": "embedding".
- Path A: records_file must exist, parse, and be non-empty.
- Path A with index_file: the index must load and its dimension must match the embedding model.
- Path B: strings must be non-empty.
| Code | Cause |
|---|---|
| 8001 | Name empty, malformed, or reserved. |
| 8002 | A storage with this name already exists in the session. |
| 8003 | Content rejected: missing / corrupt config or records, empty inline array, unknown embedding model, or index-dimension mismatch. |
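How a few of these checks map onto the codes can be sketched for the inline path. `StorageError` and `check_inline_request` are hypothetical names; the rules for a malformed or reserved name are not modelled because they are not specified here.

```python
class StorageError(Exception):
    """Hypothetical error type carrying one of the numeric codes above."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code


def check_inline_request(existing_names, name, strings):
    """Raise StorageError for the cases this sketch models; return None if OK."""
    if not name or not name.strip():
        raise StorageError(8001, "name is empty")
    if name in existing_names:
        raise StorageError(8002, "a storage with this name already exists")
    if not strings:
        raise StorageError(8003, "inline strings array is empty")
```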
Minimum working example
Python:
# Path B (inline)
client.create_embedded_string_storage(
    name="aquarium_kb",
    strings=[
        "Neon tetras thrive in soft, slightly acidic water.",
        "Goldfish are cold-water fish and should not share a tank with tropicals.",
    ],
    embedding_model="All-MiniLM-L6-v2 (Q4_K_M)",
)

# Path A (file-backed): point the server at the config
client.create_embedded_string_storage(
    name="rimworld_kb",
    config_path="C:/tryll/data/rag/rimworld.json",
    embedding_model="All-MiniLM-L6-v2 (Q4_K_M)",
)
C++:
// Path B (inline)
client.CreateEmbeddedStringStorageFromStrings(
    "aquarium_kb",
    {
        "Neon tetras thrive in soft, slightly acidic water.",
        "Goldfish are cold-water fish and should not share a tank with tropicals.",
    },
    "All-MiniLM-L6-v2 (Q4_K_M)");

// Path A (file-backed)
auto info = client.CreateEmbeddedStringStorage(
    "rimworld_kb",
    "C:/tryll/data/rag/rimworld.json",
    /*embeddingModel=*/"All-MiniLM-L6-v2 (Q4_K_M)");
Unreal:
auto* subsystem = GetGameInstance()->GetSubsystem<UTryllSubsystem>();

// Path B (inline): requires EmbeddingModel.
subsystem->RequestCreateEmbeddedStringStorageFromStrings(
    TEXT("aquarium_kb"),
    { TEXT("Neon tetras thrive in soft, slightly acidic water."),
      TEXT("Goldfish are cold-water fish and should not share a tank with tropicals.") },
    TEXT("All-MiniLM-L6-v2 (Q4_K_M)"));

// Path A (file-backed). EmbeddingModel is optional (overrides config).
subsystem->RequestCreateEmbeddedStringStorage(
    TEXT("rimworld_kb"),
    TEXT("C:/tryll/data/rag/rimworld.json"),
    TEXT("All-MiniLM-L6-v2 (Q4_K_M)"));

// Bind OnCreateEmbeddedStringStorageComplete(Name, RecordCount, bSuccess)
// to continue once the index is built.
See the full workflow in How to create a simple RAG assistant.
Client bindings
- C++: Tryll::TryllClient::CreateEmbeddedStringStorage (file config) / CreateEmbeddedStringStorageFromStrings (inline), declared in TryllClient.h
- Python: tryll_client.TryllClient.create_embedded_string_storage (pass either config_path=… or strings=…), defined in client.py
- Unreal: UTryllSubsystem::RequestCreateEmbeddedStringStorage, declared in TryllSubsystem.h
Related
- Retrieve node
- Retrieve filter grammar
- Concept: RAG
- How to create a simple RAG assistant
- String Storage — non-embedded variant.
- Glossary