
Embedded String Storage

An embedded string storage is a session-owned, named, vector-indexed container of text chunks. It is the backing store for every Retrieve node in a graph — the building block of RAG in Tryll.

Under the hood, each storage holds:

  1. A list of records: { id, text } pairs, optionally with metadata.
  2. A precomputed embedding per record.
  3. An HNSW index over those embeddings, tuned for cosine distance.

Retrieval is always by embedding similarity, never by substring or keyword match. For substring / regex / exact-match use cases, see string storage.
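To make "closest by cosine distance" concrete, here is a minimal brute-force sketch in plain Python. It is not how the server searches (the server uses the HNSW index listed above). The three-dimensional vectors are made-up stand-ins for real embeddings, and cosine_distance and nearest are hypothetical helper names used only for illustration.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity; 0.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def nearest(query, records, top_k):
    """Rank (id, embedding) pairs by cosine distance to the query vector."""
    scored = [(cosine_distance(query, emb), rec_id) for rec_id, emb in records]
    scored.sort()
    return scored[:top_k]

# Toy embeddings (hypothetical values, not produced by a real model).
records = [
    ("0", [1.0, 0.0, 0.0]),
    ("1", [0.0, 1.0, 0.0]),
    ("2", [0.7, 0.7, 0.0]),
]
hits = nearest([1.0, 0.1, 0.0], records, top_k=2)
```

Because ranking depends only on the angle between vectors, two texts that share no words can still rank as close neighbours, which is why substring-style lookups belong in string storage instead.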

Creation paths

CreateEmbeddedStringStorageRequest supports two construction paths. Exactly one is used per request.

| Path | Trigger | What the server does |
| --- | --- | --- |
| A (file-backed) | config_path set | Reads the config JSON, resolves records_file and optional index_file relative to the config's directory, loads records, loads or builds the HNSW index. |
| B (inline) | strings non-empty | Embeds every string with embedding_model and builds an in-memory HNSW index. |

embedding_model is required for Path B and, when provided, overrides the config file for query-time embedding in Path A.

Path A — config_path

The config is a UTF-8 JSON file with this shape:

<!-- sample only; not a committed file -->
{
    "version": 1,
    "embedding_model": "All-MiniLM-L6-v2 (Q4_K_M)",
    "records_file": "adventure_quests.kb.json",
    "index_file": "adventure_quests.kb.usearch",
    "fields": [
        { "name": "min_level",         "type": "int",         "default": 0      },
        { "name": "character_class",   "type": "string",      "optional": true  },
        { "name": "character_classes", "type": "set<string>", "optional": true  }
    ]
}

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| version | int | No | Schema version. Currently 1. |
| embedding_model | string | Yes | Catalog name of the embedding model. Must exist in models.json with "purpose": "embedding". |
| records_file | string | Yes | Relative path (from the config file's directory) to the records JSON. |
| index_file | string | No | Relative path to a pre-built *.usearch HNSW index. If absent, the server builds the index at load time from the records and their embeddings. |
| fields | array | No | Typed metadata schema: field declarations for filterable record metadata. When present, each record's metadata object is validated against it at load time. See Metadata schema. |

The records file (*.kb.json by convention) is a JSON array of objects:

<!-- sample only; not a committed file -->
[
    {
        "id":   "ai_storytellers_1",
        "text": "The AI storyteller is the main mechanism used to ...",
        "metadata": { "source": "…", "link": "…" }
    },
    {
        "id":   "ai_storytellers_2",
        "text": "…"
    }
]

Per record:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| id | string | Yes | Stable identifier; returned on retrieval so clients can map back to the source. |
| text | string | Yes | The chunk text, already segmented to a size that fits within the embedding model's context window. |
| metadata | object | No | Optional typed metadata object. When the config declares a fields schema, each record's metadata is validated and stored in typed slots used by the Retrieve node's filter predicate. Fields not listed in the schema are ignored. |

If an index_file is present, its dimension must match embedding_model's output dimension; a mismatch is rejected with error 8003 InvalidEmbeddedStringStorageData.
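The path-resolution rule for Path A can be sketched client-side as follows. This is an illustrative sketch, not the server's loader: load_storage_config is a hypothetical name, and the checks only cover the rules stated above (required keys, paths relative to the config's directory, non-empty records with id and text).

```python
import json
from pathlib import Path

def load_storage_config(config_path):
    """Read a storage config and resolve its file references against the config's directory."""
    config_path = Path(config_path)
    config = json.loads(config_path.read_text(encoding="utf-8"))
    base_dir = config_path.parent

    for key in ("embedding_model", "records_file"):
        if key not in config:
            raise ValueError(f"config is missing required key: {key}")

    # records_file is relative to the config file, not to the working directory.
    records_path = base_dir / config["records_file"]
    records = json.loads(records_path.read_text(encoding="utf-8"))
    if not records:
        raise ValueError("records_file must contain at least one record")
    for record in records:
        if "id" not in record or "text" not in record:
            raise ValueError("every record needs 'id' and 'text'")

    # index_file is optional; when it is absent the index is built from the records.
    index_path = base_dir / config["index_file"] if "index_file" in config else None
    return config, records, index_path
```

Running a check like this before sending config_path to the server can catch a misplaced records file early, though the server's own validation remains authoritative.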

Metadata schema

When a config includes a fields array, the server validates every record's metadata object against it at load time and stores the values in typed per-field slots. Those slots are what the Retrieve node's filter predicate compares against during vector search — the filter compiler resolves field names to slot indices once at compile time, so no string lookups happen on the hot path.

Field declarations

Each entry in fields declares one filterable field:

| Key | Required | Description |
| --- | --- | --- |
| name | Yes | Field identifier used in records and in filter references ({"knowledge": "<name>"}). |
| type | Yes | One of int, float, string, bool, set<string>. |
| default | Exclusive with optional | Value applied when a record omits this field. Must be coercible to type. |
| optional | Exclusive with default | true: missing field is allowed; filters referencing it on a record where it is absent evaluate to false. |
| filterable | No (default true) | If false, the field is loaded but cannot be referenced in filters. Useful for display-only fields such as page_title. |

Exactly one of default or optional: true must be present. A field that declares neither, or both, is a load-time error.
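The "exactly one of default or optional: true" rule can be checked with a few lines of Python. This is a sketch of the rule as described here, not the server's validator; check_field_declaration and VALID_TYPES are hypothetical names.

```python
VALID_TYPES = {"int", "float", "string", "bool", "set<string>"}

def check_field_declaration(decl):
    """Raise ValueError if a field declaration breaks the rules described above."""
    if "name" not in decl or "type" not in decl:
        raise ValueError("a field declaration needs both 'name' and 'type'")
    if decl["type"] not in VALID_TYPES:
        raise ValueError(f"unknown type: {decl['type']}")

    has_default = "default" in decl
    is_optional = decl.get("optional") is True
    # Both set, or neither set, violates the "exactly one" rule.
    if has_default == is_optional:
        raise ValueError(
            f"field '{decl['name']}' needs exactly one of default / optional: true"
        )
    return decl
```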

Supported types

| Type | JSON value in records | Notes |
| --- | --- | --- |
| int | number (integer) | Stored as int64. |
| float | number | Stored as double. int-to-float promotion is allowed in comparisons. |
| string | string | |
| bool | true / false | |
| set<string> | array of strings | Stored sorted and deduplicated. Used as the haystack of an in op. |

Missing-field semantics

When a field is declared optional: true and a record omits it, every comparison or in operation that references that field on that record evaluates to false. The field is not treated as "no restriction" — filter authors who want an unrestricted fallback should encode it explicitly with or and a separate boolean gating field.
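The sketch below models these semantics with plain Python predicates rather than the real filter grammar. The helper names and the class_restricted gating field are hypothetical, chosen only to illustrate the explicit-fallback pattern.

```python
MISSING = object()  # sentinel: the record omitted this optional field

def compare_eq(meta, field, expected):
    """Equality comparison that is False whenever the field is absent."""
    value = meta.get(field, MISSING)
    if value is MISSING:
        return False
    return value == expected

def is_in(meta, field, needle):
    """'in' op against a set<string> field; False when the field is absent."""
    haystack = meta.get(field, MISSING)
    if haystack is MISSING:
        return False
    return needle in haystack

restricted = {"class_restricted": True, "character_classes": ["paladin", "warrior"]}
unrestricted = {"class_restricted": False}  # omits character_classes entirely

def visible_to(meta, player_class):
    # Explicit fallback: unrestricted records pass through the gating boolean,
    # because a missing character_classes never matches on its own.
    return compare_eq(meta, "class_restricted", False) or is_in(
        meta, "character_classes", player_class
    )
```

Without the gating field, the unrestricted record would be filtered out for every player class, which is rarely what a filter author intends.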

Example

{
    "version": 1,
    "embedding_model": "All-MiniLM-L6-v2 (Q4_K_M)",
    "records_file": "adventure_quests.kb.json",
    "index_file":   "adventure_quests.kb.usearch",
    "fields": [
        { "name": "topic",             "type": "string",      "default": ""       },
        { "name": "min_level",         "type": "int",         "default": 0        },
        { "name": "tier",              "type": "string",      "default": "common" },
        { "name": "character_class",   "type": "string",      "optional": true    },
        { "name": "character_classes", "type": "set<string>", "optional": true    },
        { "name": "quest_reached",     "type": "string",      "optional": true    }
    ]
}

A record that uses this schema might look like:

{
    "id":   "warrior_combat_intro",
    "text": "The sword-and-shield style favoured by Aldhaven warriors …",
    "metadata": {
        "topic":             "Combat",
        "min_level":         1,
        "character_classes": ["warrior", "paladin"]
    }
}

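topic, min_level, and tier fall back to their defaults when a record omits them; in the record above, tier is absent and therefore takes the value "common". character_class, character_classes, and quest_reached are optional: any filter that references one of them on a record that omits it evaluates to false. Here that applies to character_class and quest_reached.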

See Retrieve filter grammar for the full JSON predicate syntax that queries these fields.
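To show how the example schema and record combine, the following sketch fills defaults and normalises set<string> values. It is a simplified model of the load-time behaviour described in this section, with no type coercion or error handling; apply_schema is a hypothetical name.

```python
def apply_schema(fields, metadata):
    """Fill defaults, drop undeclared keys, and normalise set<string> values."""
    slots = {}
    for decl in fields:
        name = decl["name"]
        if name in metadata:
            value = metadata[name]
            if decl["type"] == "set<string>":
                value = sorted(set(value))  # stored sorted and deduplicated
            slots[name] = value
        elif "default" in decl:
            slots[name] = decl["default"]
        # Optional and absent: no slot value, so filters on it evaluate to false.
    return slots

fields = [
    {"name": "topic", "type": "string", "default": ""},
    {"name": "min_level", "type": "int", "default": 0},
    {"name": "tier", "type": "string", "default": "common"},
    {"name": "character_class", "type": "string", "optional": True},
    {"name": "character_classes", "type": "set<string>", "optional": True},
    {"name": "quest_reached", "type": "string", "optional": True},
]
metadata = {
    "topic": "Combat",
    "min_level": 1,
    "character_classes": ["warrior", "paladin"],
}
slots = apply_schema(fields, metadata)
```

After this runs, slots holds topic, min_level, tier (defaulted to "common"), and character_classes sorted as ["paladin", "warrior"]; character_class and quest_reached have no entry.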

Path B — inline strings

For programmatic small-scale use, the client sends strings directly:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | Yes | Unique name within the session. |
| strings | [string] | Yes | The chunks. The server assigns sequential ids ("0", "1", …). |
| embedding_model | string | Yes | Catalog name of the embedding model. |

The server embeds every string at creation time and builds the index in memory. Nothing is written to disk; the storage is gone once the session ends.
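The id assignment for inline strings is simple enough to mirror on the client, which can help when mapping retrieved ids back to the original list. A minimal sketch, assuming ids follow list order starting at "0" as described above; records_from_strings is a hypothetical helper, not part of any client library.

```python
def records_from_strings(strings):
    """Pair each inline string with the sequential id it would receive."""
    if not strings:
        raise ValueError("strings must be non-empty")
    return [{"id": str(i), "text": text} for i, text in enumerate(strings)]

records = records_from_strings([
    "Neon tetras thrive in soft, slightly acidic water.",
    "Goldfish are cold-water fish and should not share a tank with tropicals.",
])
```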

Lifecycle

| Step | Frame | Notes |
| --- | --- | --- |
| Create | CreateEmbeddedStringStorageRequest | Sent before CreateAgentRequest if a Retrieve node in the graph will reference it by name. Server responds with CreateEmbeddedStringStorageResponse carrying record_count and embedding_dim. |
| Use | (referenced by Retrieve node embedded_string_storage param) | Scoped to the session. |
| Destroy | DestroyEmbeddedStringStorageRequest | Server responds with Ack. Agents holding the storage keep it alive; the call only drops the session's named reference. |

See Lifetime and Ownership → EmbeddedStringStorage for the full ownership model and the separation between the on-disk index cache and the in-memory index object.
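The "destroy only drops the named reference" behaviour can be pictured with ordinary Python object references. This is an analogy under stated assumptions, not the server's ownership model: Storage, Session, and Agent are hypothetical stand-in classes.

```python
class Storage:
    def __init__(self, name, records):
        self.name = name
        self.records = records

class Session:
    def __init__(self):
        self._storages = {}  # name -> Storage

    def create_storage(self, name, records):
        if name in self._storages:
            raise KeyError(f"storage already exists: {name}")
        self._storages[name] = Storage(name, records)

    def get_storage(self, name):
        return self._storages[name]

    def destroy_storage(self, name):
        # Drops only the session's named reference; existing holders are unaffected.
        del self._storages[name]

class Agent:
    def __init__(self, session, storage_name):
        # The agent resolves the name once and keeps its own reference.
        self.storage = session.get_storage(storage_name)

session = Session()
session.create_storage("aquarium_kb", ["Neon tetras thrive in soft, slightly acidic water."])
agent = Agent(session, "aquarium_kb")
session.destroy_storage("aquarium_kb")
```

After destroy_storage, the name can no longer be looked up in the session, but agent.storage still points at the same object and remains usable.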

Consumption by Retrieve

A Retrieve node names the storage via its embedded_string_storage param. Per turn, the node:

  1. Embeds the user's message with the configured embedding_model.
  2. Queries the storage's HNSW index for the top_k closest records by cosine distance. If a filter is set on the node, each candidate record's metadata is evaluated against the compiled predicate during the search; records that do not match are excluded before top_k is applied.
  3. Drops results whose distance exceeds threshold.
  4. Emits one knowledge component per surviving record onto the current interaction.
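Steps 2 and 3 above can be sketched as a small function. It assumes the query has already been embedded and that each candidate arrives with its distance precomputed; retrieve and the sample records are hypothetical, and a real HNSW search applies the filter during graph traversal rather than over a full list.

```python
def retrieve(candidates, top_k, threshold, predicate=None):
    """candidates: (distance, record) pairs for one query.

    Follows the order described above: filter, then top_k, then threshold.
    """
    if predicate is not None:
        candidates = [(d, r) for d, r in candidates if predicate(r)]
    closest = sorted(candidates, key=lambda pair: pair[0])[:top_k]
    # Drop results whose distance exceeds the threshold.
    return [r for d, r in closest if d <= threshold]

candidates = [
    (0.10, {"id": "a", "min_level": 5}),
    (0.20, {"id": "b", "min_level": 1}),
    (0.35, {"id": "c", "min_level": 2}),
    (0.60, {"id": "d", "min_level": 0}),
]
filtered = retrieve(candidates, top_k=2, threshold=0.3,
                    predicate=lambda r: r["min_level"] <= 3)
unfiltered = retrieve(candidates, top_k=2, threshold=0.3)
```

Because the filter runs before top_k, record "a" does not consume one of the two slots in the filtered call; the threshold then removes "c", so fewer than top_k records can survive.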

How those components appear in the prompt is controlled by the downstream Generate node's template (Mustache) and placement params — see Use Mustache Templates.

Validation and errors

The server validates at creation time:

  • Exactly one of config_path / strings must be set.
  • The named embedding_model must be in the catalog with "purpose": "embedding".
  • Path A: records_file must exist, parse, and be non-empty.
  • Path A with index_file: the index must load and its dimension must match the embedding model.
  • Path B: strings must be non-empty.

| Code | Cause |
| --- | --- |
| 8001 | Name empty, malformed, or reserved. |
| 8002 | A storage with this name already exists in the session. |
| 8003 | Content rejected: missing / corrupt config or records, empty inline array, unknown embedding model, or index-dimension mismatch. |
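A client-side pre-check that mirrors some of these rules might look like the sketch below. Several points are assumptions rather than documented behaviour: the order of the checks, the use of 8003 when both or neither of config_path / strings is set, and the use of 8003 when Path B omits embedding_model. The function name and parameters are hypothetical, and it does not read the config file or inspect the index.

```python
def validate_create_request(name, existing_names, catalog, *,
                            config_path=None, strings=None, embedding_model=None):
    """Return None if the request looks valid, otherwise an error code."""
    if not name or not name.strip():
        return 8001  # empty name (malformed / reserved names are not modelled here)
    if name in existing_names:
        return 8002  # duplicate name within the session

    has_config = bool(config_path)
    has_strings = bool(strings)
    if has_config == has_strings:
        return 8003  # assumption: code used when both or neither path is set
    if has_strings and embedding_model is None:
        return 8003  # assumption: Path B without its required embedding_model
    if embedding_model is not None and embedding_model not in catalog:
        return 8003  # unknown embedding model
    return None
```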

Minimum working example

Python:

# Path B (inline)
client.create_embedded_string_storage(
    name="aquarium_kb",
    strings=[
        "Neon tetras thrive in soft, slightly acidic water.",
        "Goldfish are cold-water fish and should not share a tank with tropicals.",
    ],
    embedding_model="All-MiniLM-L6-v2 (Q4_K_M)",
)

# Path A (file-backed) — point the server at the config
client.create_embedded_string_storage(
    name="rimworld_kb",
    config_path="C:/tryll/data/rag/rimworld.json",
    embedding_model="All-MiniLM-L6-v2 (Q4_K_M)",
)

C++:

// Path B (inline)
client.CreateEmbeddedStringStorageFromStrings(
    "aquarium_kb",
    {
        "Neon tetras thrive in soft, slightly acidic water.",
        "Goldfish are cold-water fish and should not share a tank with tropicals.",
    },
    "All-MiniLM-L6-v2 (Q4_K_M)");

// Path A (file-backed)
auto info = client.CreateEmbeddedStringStorage(
    "rimworld_kb",
    "C:/tryll/data/rag/rimworld.json",
    /*embeddingModel=*/"All-MiniLM-L6-v2 (Q4_K_M)");

Unreal:

auto* subsystem = GetGameInstance()->GetSubsystem<UTryllSubsystem>();

// Path B (inline) — requires EmbeddingModel.
subsystem->RequestCreateEmbeddedStringStorageFromStrings(
    TEXT("aquarium_kb"),
    { TEXT("Neon tetras thrive in soft, slightly acidic water."),
      TEXT("Goldfish are cold-water fish and should not share a tank with tropicals.") },
    TEXT("All-MiniLM-L6-v2 (Q4_K_M)"));

// Path A (file-backed). EmbeddingModel is optional (overrides config).
subsystem->RequestCreateEmbeddedStringStorage(
    TEXT("rimworld_kb"),
    TEXT("C:/tryll/data/rag/rimworld.json"),
    TEXT("All-MiniLM-L6-v2 (Q4_K_M)"));

// Bind OnCreateEmbeddedStringStorageComplete(Name, RecordCount, bSuccess)
// to continue once the index is built.

See the full workflow in How to create a simple RAG assistant.

Client bindings

  • C++: Tryll::TryllClient::CreateEmbeddedStringStorage (file config) / CreateEmbeddedStringStorageFromStrings (inline); see TryllClient.h
  • Python: tryll_client.TryllClient.create_embedded_string_storage (pass either config_path=… or strings=…); see client.py
  • Unreal: UTryllSubsystem::RequestCreateEmbeddedStringStorage (file config) / RequestCreateEmbeddedStringStorageFromStrings (inline); see TryllSubsystem.h