Use Your Own Local Model

Register a GGUF file already on disk so Tryll can load it without a Hugging Face download.

Temporary workflow

Registering a local model currently requires editing data/models.json and restarting the server, since the file is read only at startup. A dedicated RegisterModel() client API is planned to handle this in a future release.

Prerequisites

  • A .gguf file on the server machine, e.g. C:\models\my-model-q4_k_m.gguf.
  • Write access to the server's data/models.json.
  • The server configured for LlamaCpp — see Connect and Manage a Session.

Steps

  1. Add an entry to data/models.json with a local_path field instead of (or in addition to) Hugging Face source fields. The key fields are name, type, and one variants entry with engine = "LlamaCpp" and local_path:

    {
      "name": "My Local Model",
      "type": "Language",
      "variants": [
        {
          "engine": "LlamaCpp",
          "local_path": "C:/models/my-model-q4_k_m.gguf",
          "kv_cache_type": "q8_0",
          "tool_call_format": "chatml"
        }
      ]
    }
    

    Notes:

    • name is how clients will refer to the model. Choose something short and stable.
    • local_path can be absolute or relative to the server executable's working directory. Forward slashes are fine on Windows.
    • tool_call_format is optional; set it if you plan to use the model with a ToolCall node and know the model was fine-tuned on one of the format families.
    • kv_cache_type is optional. Default is q8_0; use f16 for maximum fidelity or q4_0 to halve KV memory on large contexts.

    For the full schema see Model Management.
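
    Before restarting, it can be worth a quick sanity check on the new entry. A minimal sketch, assuming Python 3 is available on the server and that data/models.json is a plain JSON array of entries (adjust the parsing if your catalog wraps them in an object):

    import json
    from pathlib import Path

    # Pre-flight check: confirm every local_path points at an existing
    # file before the server tries to load it at startup.
    catalog = json.loads(Path("data/models.json").read_text())
    for entry in catalog:
        for variant in entry.get("variants", []):
            path = variant.get("local_path")
            if path and not Path(path).is_file():
                print(f'{entry["name"]}: missing file {path}')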

  2. Restart the server. models.json is read at startup; Tryll does not currently watch the file. The first server log line after restart should list your model alongside the existing catalog entries.

  3. Verify via ListModels. From a Python client:

    for m in client.list_models():
        print(m.name, m.status.name)   # expect "My Local Model Local"

    From a C++ client:

    auto models = client.ListModels();
    for (const auto& m : models) {
        std::cout << m.name << " "
                  << static_cast<int>(m.status) << "\n";
    }

    From Unreal, call UTryllSubsystem::ListModels (Blueprint: Tryll|Models → List Models). Bind On List Models Complete to iterate TArray<FTryllModelInfo>.

    Your new entry should appear with ModelStatus.Local, meaning the file is present on disk and can be loaded. If you see Absent, Tryll could not find the file — re-check local_path.
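
    If you script the verification, the same list_models() call can double as a fail-fast guard. A minimal sketch in Python, reusing the client from the snippet above (the status names are the ones shown on this page):

    # Fail fast if the file was not found on disk. next() raises
    # StopIteration if the entry is missing from the catalog entirely.
    info = next(m for m in client.list_models() if m.name == "My Local Model")
    assert info.status.name == "Local", f"unexpected status: {info.status.name}"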

  4. Reference the model from a graph. Pass the name you chose as model_name on any Generate or ToolCall node, or as the agent's default_model_name:

    graph.add_node("answer", NodeType.Generate,
                   {"model_name": "My Local Model"})
    # or, more commonly, set it as the graph's default:
    graph.set_default_model_name("My Local Model")
    
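    The same field works on a ToolCall node. A sketch, assuming the NodeType enum from the snippet above (this is where tool_call_format from step 1 becomes relevant):

    # Hypothetical ToolCall node referencing the local model by name.
    graph.add_node("call_tool", NodeType.ToolCall,
                   {"model_name": "My Local Model"})
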
  5. (Optional) Pin it. Load the model eagerly so the first turn does not pay the load cost:

    client.load_model("My Local Model")
    

    See Pin and Unpin Models for the trade-offs.

Verify it worked

Create an agent using the model and send one message. A successful load shows up server-side as:

[info] Loading model "My Local Model" from C:/models/my-model-q4_k_m.gguf
[info] Model loaded: ctx=4096 vocab=128256 dtype=q8_0
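
On the client side, a minimal smoke test might look like the sketch below. The create_agent and send calls are hypothetical stand-ins for whichever agent-creation and messaging calls your client exposes; only default_model_name is taken from this page:

    # Hypothetical smoke test: create_agent() and send() are illustrative
    # stand-ins, not documented Tryll API.
    agent = client.create_agent(default_model_name="My Local Model")
    print(agent.send("Say hello."))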

Common pitfalls

  • Absent status. local_path is wrong or the file is not readable by the server process. Try opening it as the same user account that runs the server.
  • Model loads but refuses prompts. Usually means the tokenizer bundled in the GGUF does not match the chat template the model expects. Fix by setting tool_call_format to a matching family (see Tool Calling) or by picking a different quantisation of the same weights.
  • Out-of-memory on load. Use a smaller quant (e.g. Q4_K_M instead of Q8_0), switch kv_cache_type to q4_0 (see the example below), or use a smaller context window at ConfigureSession time (not yet user-facing; currently fixed at the model default).
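
For the out-of-memory case, the mitigations live in the same variants entry from step 1. For example, keeping the Q4_K_M weights but switching the KV cache to q4_0 (same fields as the example above):

    {
      "engine": "LlamaCpp",
      "local_path": "C:/models/my-model-q4_k_m.gguf",
      "kv_cache_type": "q4_0"
    }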