Use Your Own Local Model

Register a GGUF file already on disk so Tryll can load it without a Hugging Face download.

Temporary workflow

Registering a local model currently requires editing data/models.json, which the server reads only at startup. A dedicated RegisterModel() client API is planned to replace this workflow.

Prerequisites

  • A .gguf file on the server machine, e.g. C:\models\my-model-q4_k_m.gguf.
  • Write access to the server's data/models.json.
  • The server configured for LlamaCpp — see Connect and Manage a Session.
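
The first prerequisite can be checked before touching models.json. The sketch below is not part of Tryll; check_gguf is a hypothetical helper that only confirms the file exists, is readable, and starts with the four ASCII bytes "GGUF" that open every GGUF file. Run it as the same user account that runs the server so that permission problems show up here.

```python
from pathlib import Path

GGUF_MAGIC = b"GGUF"  # every GGUF file begins with these four bytes


def check_gguf(path):
    """Return a list of problems found with the file at `path` (empty if none)."""
    p = Path(path)
    if not p.is_file():
        return [f"{p} does not exist or is not a regular file"]
    problems = []
    if p.suffix.lower() != ".gguf":
        problems.append(f"{p.name} does not have a .gguf extension")
    try:
        with p.open("rb") as f:
            magic = f.read(4)
    except OSError as exc:
        # Typically a permissions problem for the current user account.
        return problems + [f"cannot read {p}: {exc}"]
    if magic != GGUF_MAGIC:
        problems.append(f"{p.name} does not start with the GGUF magic bytes")
    return problems


if __name__ == "__main__":
    # Example path taken from the prerequisites above.
    for problem in check_gguf("C:/models/my-model-q4_k_m.gguf"):
        print("problem:", problem)
```

An empty result only means the file looks like a GGUF container; it does not guarantee that the server's LlamaCpp engine can load it.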

Steps

  1. Add an entry to data/models.json with a local_path field instead of (or in addition to) Hugging Face source fields. The key fields are name, type, and one variants entry with engine = "LlamaCpp" and local_path:

    {
      "name": "My Local Model",
      "type": "Language",
      "variants": [
        {
          "engine": "LlamaCpp",
          "local_path": "C:/models/my-model-q4_k_m.gguf",
          "kv_cache_type": "q8_0",
          "tool_call_format": "chatml"
        }
      ]
    }
    

    Notes:

    • name is how clients will refer to the model. Choose something short and stable.
    • local_path can be absolute or relative to the server executable's working directory. Forward slashes are fine on Windows.
    • tool_call_format is optional; set it if you plan to use the model with a ToolCall node and know the model was fine-tuned on one of the format families.
    • kv_cache_type is optional. The default is q8_0; use f16 for maximum fidelity, or q4_0 to roughly halve KV memory relative to q8_0 on large contexts.

    For the full schema see Model Management.

  2. Restart the server. models.json is read at startup; Tryll does not currently watch the file. The first server log line after restart should list your model alongside the existing catalog entries.

  3. Verify via ListModels. From a client:

    C#:

    var (models, error) = await TryllClient.Instance.RequestListModelsAsync();
    if (!error.IsOk) { Debug.LogError(error.Message); return; }
    foreach (var m in models)
        Debug.Log($"{m.Name} {m.Status}");   // expect "My Local Model Local"

    Unreal:

    Call UTryllSubsystem::ListModels (Blueprint: Tryll|Models → List Models). Bind On List Models Complete to iterate TArray<FTryllModelInfo>.

    C++:

    auto models = client.ListModels();
    for (const auto& m : models) {
        std::cout << m.name << " "
                  << static_cast<int>(m.status) << "\n";
    }

    Python:

    for m in client.list_models():
        print(m.name, m.status.name)   # expect "My Local Model Local"


    Your new entry should appear with ModelStatus.Local, meaning the file is present on disk and can be loaded. If you see Absent, Tryll could not find the file — re-check local_path.

  4. Reference the model from a graph. Pass the name you chose as model_name on any Generate or ToolCall node, or as the agent's default_model_name:

    from tryll_client.graph import GraphDescription, GenerateParams
    
    graph = (
        GraphDescription()
        .add_node("answer", GenerateParams(model_name="My Local Model"))
    )
    # or, more commonly, set it as the graph's default:
    graph.set_default_model_name("My Local Model")
    
  5. (Optional) Pin it. Load the model eagerly so the first turn does not pay the load cost:

    client.load_model("My Local Model")
    

    See Pin and Unpin Models for the trade-offs.
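
The manual edit in step 1 can also be scripted. The following is a minimal sketch, not a Tryll tool: upsert_local_model is a hypothetical helper, and it assumes that the catalog in data/models.json is a JSON array of model entries at the top level. That layout is an assumption; check Model Management for the actual schema before pointing it at a real file. The field names and values are the ones shown in step 1.

```python
import json


def upsert_local_model(catalog, name, local_path,
                       kv_cache_type=None, tool_call_format=None):
    """Return a new catalog list with a LlamaCpp entry for `name`.

    ASSUMPTION: `catalog` is a list of model entries (dicts with a "name" key).
    An existing entry with the same name is replaced, so reruns do not
    create duplicates. The input list is not modified.
    """
    variant = {"engine": "LlamaCpp", "local_path": local_path}
    # Both fields are optional in the notes above, so only write them when given.
    if kv_cache_type is not None:
        variant["kv_cache_type"] = kv_cache_type
    if tool_call_format is not None:
        variant["tool_call_format"] = tool_call_format
    entry = {"name": name, "type": "Language", "variants": [variant]}
    updated = [m for m in catalog if m.get("name") != name]
    updated.append(entry)
    return updated


# Build the entry from step 1 against an empty catalog and show the JSON.
catalog = upsert_local_model(
    [],
    "My Local Model",
    "C:/models/my-model-q4_k_m.gguf",
    kv_cache_type="q8_0",
    tool_call_format="chatml",
)
print(json.dumps(catalog, indent=2))
```

To apply it to a real catalog, load the file with json.load, pass the result through the helper, write it back with json.dump, and then restart the server as in step 2.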

Verify it worked

Create an agent using the model and send one message. A successful load shows up server-side as:

[info] Loading model "My Local Model" from C:/models/my-model-q4_k_m.gguf
[info] Model loaded: ctx=4096 vocab=128256 dtype=q8_0
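
Before sending that first message, the ListModels check from step 3 can be wrapped in a small helper that turns the status into a next action. This is a sketch: describe_model is a hypothetical function, and it relies only on the name and status.name attributes used in the Python sample in step 3, with the Local and Absent status names described there.

```python
def describe_model(models, name):
    """Return a short diagnosis for the model called `name`.

    `models` is any iterable of objects with `.name` and `.status.name`,
    such as the result of client.list_models() in step 3.
    """
    for m in models:
        if m.name != name:
            continue
        status = m.status.name
        if status == "Local":
            return f'"{name}" is present on disk and can be loaded'
        if status == "Absent":
            return f'"{name}" is registered but the file was not found; re-check local_path'
        return f'"{name}" has status {status}'
    return f'"{name}" is not in the catalog; check data/models.json and restart the server'
```

With a connected client this would be called as describe_model(client.list_models(), "My Local Model").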

Common pitfalls

  • Absent status. local_path is wrong or the file is not readable by the server process. Try opening it as the same user account that runs the server.
  • Model loads but refuses prompts. Usually means the tokenizer bundled in the GGUF does not match the chat template the model expects. Fix by setting tool_call_format to a matching family (see Tool Calling) or by picking a different quantisation of the same weights.
  • Out-of-memory on load. Use a smaller quant (e.g. Q4_K_M instead of Q8_0), switch kv_cache_type to q4_0, or reduce the context window at ConfigureSession time. The last option is not yet user-facing; the context size is currently fixed at the model default.
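
To judge how much switching kv_cache_type helps with the out-of-memory case, a rough estimate of the KV cache size is usually enough. The sketch below assumes a standard transformer cache (one key and one value vector per layer, per KV head, per context position) and that the same type is used for both keys and values. The per-element sizes come from the ggml block layouts: q8_0 stores 32 values in 34 bytes and q4_0 stores 32 values in 18 bytes. The model dimensions in the example are hypothetical, not taken from any particular GGUF.

```python
# Bytes per stored element for the kv_cache_type values named above.
BYTES_PER_ELEMENT = {
    "f16": 2.0,
    "q8_0": 34 / 32,  # 32 int8 values plus one f16 scale per block
    "q4_0": 18 / 32,  # 32 4-bit values plus one f16 scale per block
}


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, kv_cache_type):
    """Estimate KV cache size in bytes for a full context window."""
    # Factor 2: one tensor for keys and one for values.
    elements = 2 * n_layers * n_kv_heads * head_dim * n_ctx
    return int(elements * BYTES_PER_ELEMENT[kv_cache_type])


# Hypothetical model: 32 layers, 8 KV heads, head size 128, 4096-token context.
for kind in ("f16", "q8_0", "q4_0"):
    size_mib = kv_cache_bytes(32, 8, 128, 4096, kind) / 2**20
    print(f"{kind}: {size_mib:.0f} MiB")
# prints:
# f16: 512 MiB
# q8_0: 272 MiB
# q4_0: 144 MiB
```

For these dimensions q4_0 needs a little over half the memory of q8_0. The estimate covers the cache only; the model weights and compute buffers are extra.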