Use Your Own Local Model¶
Register a GGUF file already on disk so Tryll can load it without a Hugging Face download.
Temporary workflow
Registering a local model currently requires editing `data/models.json`, which is only read at server startup. A dedicated `RegisterModel()` client API is planned for a future release.
Prerequisites¶

- A `.gguf` file on the server machine, e.g. `C:\models\my-model-q4_k_m.gguf`.
- Write access to the server's `data/models.json`.
- The server configured for `LlamaCpp` (see Connect and Manage a Session).
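Before editing any configuration, it can be worth confirming the file really is a GGUF container and not, say, a truncated download. Per the GGUF specification, the file starts with the four ASCII bytes `GGUF`. A minimal check, independent of Tryll:

```python
def is_gguf(path: str) -> bool:
    """Return True if the file starts with the GGUF magic bytes."""
    with open(path, "rb") as f:
        return f.read(4) == b"GGUF"
```

Point it at your `.gguf` path; a `False` result means the file will not load regardless of how it is registered.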
Steps¶
-   Add an entry to `data/models.json` with a `local_path` field instead of (or in addition to) the Hugging Face source fields. The key fields are `name`, `type`, and one `variants` entry with `engine = "LlamaCpp"` and `local_path`:

    ```json
    {
      "name": "My Local Model",
      "type": "Language",
      "variants": [
        {
          "engine": "LlamaCpp",
          "local_path": "C:/models/my-model-q4_k_m.gguf",
          "kv_cache_type": "q8_0",
          "tool_call_format": "chatml"
        }
      ]
    }
    ```

    Notes:

    - `name` is how clients will refer to the model. Choose something short and stable.
    - `local_path` can be absolute or relative to the server executable's working directory. Forward slashes are fine on Windows.
    - `tool_call_format` is optional; set it if you plan to use the model with a ToolCall node and know the model was fine-tuned on one of the format families.
    - `kv_cache_type` is optional. The default is `q8_0`; use `f16` for maximum fidelity or `q4_0` to halve KV memory on large contexts.

    For the full schema, see Model Management.
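A malformed entry is an easy way to lose a restart cycle, so it can help to lint the entry before restarting. The helper below is hypothetical, not part of Tryll; the field names are taken from the example above, and it only checks the shape of a single entry plus whether `local_path` points at an existing file:

```python
import os

REQUIRED_TOP = ("name", "type", "variants")
REQUIRED_VARIANT = ("engine",)

def check_entry(entry: dict) -> list[str]:
    """Return a list of problems found in one models.json entry (empty = OK)."""
    problems = [f"missing field: {k}" for k in REQUIRED_TOP if k not in entry]
    for i, variant in enumerate(entry.get("variants", [])):
        for k in REQUIRED_VARIANT:
            if k not in variant:
                problems.append(f"variants[{i}] missing field: {k}")
        path = variant.get("local_path")
        if path is not None and not os.path.isfile(path):
            problems.append(f"variants[{i}] local_path not found: {path}")
    return problems
```

Run it over each entry you add; an empty list means the entry at least has the required fields and a reachable file, though the server remains the authority on the full schema.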
-   Restart the server. `models.json` is read at startup; Tryll does not currently watch the file for changes. The first server log line after restart should list your model alongside the existing catalog entries.
-   Verify via `ListModels`. From a client, call `UTryllSubsystem::ListModels` (Blueprint: Tryll|Models → List Models) and bind On List Models Complete to iterate the returned `TArray<FTryllModelInfo>`. Your new entry should appear with `ModelStatus.Local`, meaning the file is present on disk and can be loaded. If you see `Absent`, Tryll could not find the file; re-check `local_path`.
-   Reference the model from a graph. Pass the `name` you chose as `model_name` on any `Generate` or `ToolCall` node, or as the agent's `default_model_name`.
-   (Optional) Pin it. Load the model eagerly so the first turn does not pay the load cost. See Pin and Unpin Models for the trade-offs.
Verify it worked¶
Create an agent using the model and send one message. A successful load shows up server-side as:

```
[info] Loading model "My Local Model" from C:/models/my-model-q4_k_m.gguf
[info] Model loaded: ctx=4096 vocab=128256 dtype=q8_0
```
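If you smoke-test loads from a script, the second line is the useful one to parse. A small sketch, assuming the log format matches the example above exactly (it may change between server versions):

```python
import re

# Pattern for the "Model loaded" line; the format is taken from this
# guide's example output and is an assumption, not a stable contract.
LOADED_RE = re.compile(r"Model loaded: ctx=(\d+) vocab=(\d+) dtype=(\S+)")

def parse_loaded_line(line: str):
    """Extract ctx/vocab/dtype from a 'Model loaded' log line, or None."""
    m = LOADED_RE.search(line)
    if not m:
        return None
    return {"ctx": int(m.group(1)), "vocab": int(m.group(2)), "dtype": m.group(3)}
```

Feeding it the example line above yields `{"ctx": 4096, "vocab": 128256, "dtype": "q8_0"}`; any other line returns `None`.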
Common pitfalls¶
- `Absent` status. `local_path` is wrong or the file is not readable by the server process. Try opening it as the same user account that runs the server.
- Model loads but refuses prompts. Usually means the tokenizer bundled in the GGUF does not match the chat template the model expects. Fix by setting `tool_call_format` to a matching family (see Tool Calling) or by picking a different quantisation of the same weights.
- Out-of-memory on load. Either use a smaller quant (e.g. Q4_K_M instead of Q8_0), switch `kv_cache_type` to `q4_0`, or use a smaller context window at `ConfigureSession` time (not yet user-facing; currently fixed at the model default).
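For the out-of-memory case, it helps to estimate the KV cache size before trying quants at random. The per-element sizes below follow the GGML block layouts (`q8_0` packs 32 values into 34 bytes, `q4_0` into 18); the layer and head counts are properties of your model that you must look up yourself, and the result is a rough estimate, not an exact accounting:

```python
# Approximate bytes per element for each KV cache type (GGML block layouts).
BYTES_PER_ELEM = {"f16": 2.0, "q8_0": 34 / 32, "q4_0": 18 / 32}

def kv_cache_bytes(n_tokens, n_layers, n_kv_heads, head_dim, cache_type="q8_0"):
    """Rough KV cache size: K and V tensors, per layer, per token."""
    return int(2 * n_layers * n_kv_heads * head_dim * n_tokens
               * BYTES_PER_ELEM[cache_type])
```

For a hypothetical 32-layer model with 8 KV heads of dimension 128 at an 8192-token context, this gives roughly 1 GiB at `f16`, ~0.53 GiB at `q8_0`, and ~0.28 GiB at `q4_0`, which matches the doc's note that `q4_0` roughly halves KV memory relative to `q8_0`.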