
Agents and Sessions

Everything a client does with Tryll happens inside a session, and every conversation inside that session is an agent. Understanding how these two objects come into being, what they own, and what happens when they go away is the single most useful piece of mechanical knowledge for integrating the server.

Session: one TCP connection, one world

A session is created by the server the instant it accepts a TCP connection from your client. From the client's side, the steps are:

  1. Client::connect(host, port) opens the socket.
  2. The server sends an unsolicited SessionReady frame. Once it arrives, the session is ready to accept requests.
  3. You send ConfigureSessionRequest to pick an inference engine (for example LlamaCpp or Mock). You can change it later; only future CreateAgent calls are affected. You may also set allow_auto_model_downloading=true here for development convenience (see Enable Auto Model Downloading).
  4. Optionally, you create per-session helpers (string storages, embedded string storages) and start creating agents.

Everything in steps 3 and 4 belongs to the session. When the socket closes, the server:

  • cancels every active agent turn,
  • destroys every agent in the session,
  • drops every string storage and embedded storage the session owned,
  • frees any on-demand language models that no other session still references (see Model Management).

A client process can open many sessions, each session can hold many agents, and all of them share a single server process. If the socket drops, the client must reconnect and re-create everything it needs, because the server does not persist session state.
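The teardown rules above can be sketched as a toy ownership model. This is only an illustration of the described behaviour, not the server's implementation: `ToyServer`, `ToySession`, the per-model reference count, and the model names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToySession:
    """Hypothetical stand-in for the state one TCP connection owns."""
    agents: dict = field(default_factory=dict)    # agent_id -> "idle" | "running"
    storages: set = field(default_factory=set)    # storage names owned by the session
    models: set = field(default_factory=set)      # on-demand models the session references


class ToyServer:
    def __init__(self):
        self.sessions = {}     # session_id -> ToySession
        self.model_refs = {}   # model name -> number of sessions referencing it

    def accept(self, session_id):
        self.sessions[session_id] = ToySession()
        return self.sessions[session_id]

    def use_model(self, session_id, model):
        session = self.sessions[session_id]
        if model not in session.models:
            session.models.add(model)
            self.model_refs[model] = self.model_refs.get(model, 0) + 1

    def close(self, session_id):
        """Mirror the four teardown steps; return (cancelled agents, freed models)."""
        session = self.sessions.pop(session_id)
        # 1. Cancel every active turn.
        cancelled = [a for a, state in session.agents.items() if state == "running"]
        # 2. Destroy every agent.
        session.agents.clear()
        # 3. Drop every storage the session owned.
        session.storages.clear()
        # 4. Free models that no other session still references.
        freed = []
        for model in sorted(session.models):
            self.model_refs[model] -= 1
            if self.model_refs[model] == 0:
                del self.model_refs[model]
                freed.append(model)
        return cancelled, freed


server = ToyServer()
server.accept("s1").agents["a1"] = "running"
server.accept("s2")
server.use_model("s1", "shared-model")
server.use_model("s2", "shared-model")
server.use_model("s1", "private-model")

cancelled_s1, freed_s1 = server.close("s1")   # "shared-model" survives: s2 still uses it
cancelled_s2, freed_s2 = server.close("s2")   # the last reference goes away here
```

In this sketch a model outlives a closing session only while another session still references it, which is the behaviour the last teardown bullet describes.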

The implicit session state machine

Sessions do not have an explicit state enum, but they behave like a tiny state machine:

stateDiagram-v2
    direction LR
    [*]         --> Connected:        TCP accept + SessionReady
    Connected   --> Connected:        ConfigureSessionRequest (repeatable)
    Connected   --> AgentActive:      CreateAgentRequest
    AgentActive --> AgentActive:      SendMessage / DownloadModel / Load / …
    AgentActive --> Connected:        last agent destroyed
    Connected   --> Closing:          socket EOF / server shutdown
    AgentActive --> Closing:          socket EOF / server shutdown
    Closing     --> [*]:              reader + writer exit

ConfigureSession is valid whenever the session is not closing. Model management requests are valid in both Connected and AgentActive states — you can warm up a model before creating any agents.
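One way to read the diagram and the validity rules is as a transition table. The sketch below is a client-side mental model only: the state names come from the diagram, while the event names (`configure_session`, `model_request`, and so on) and the assumption that creating a further agent keeps the session in AgentActive are illustrative.

```python
# (state, event) -> next state. Event names are hypothetical labels.
TRANSITIONS = {
    ("Connected", "configure_session"): "Connected",
    ("Connected", "model_request"): "Connected",        # warm-up before any agent exists
    ("Connected", "create_agent"): "AgentActive",
    ("Connected", "close"): "Closing",
    ("AgentActive", "configure_session"): "AgentActive",
    ("AgentActive", "model_request"): "AgentActive",
    ("AgentActive", "create_agent"): "AgentActive",     # assumed: more agents, same state
    ("AgentActive", "send_message"): "AgentActive",
    ("AgentActive", "last_agent_destroyed"): "Connected",
    ("AgentActive", "close"): "Closing",
}


def step(state, event):
    """Return the next state, or raise if the event is not valid in this state."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not valid in state {state!r}") from None


def run(events, state="Connected"):
    """Apply a sequence of events starting from the state reached after SessionReady."""
    for event in events:
        state = step(state, event)
    return state


final_state = run([
    "model_request",          # warm up a model first
    "configure_session",
    "create_agent",
    "send_message",
    "last_agent_destroyed",
])
```

Nothing leaves Closing in the table, so any request after socket EOF raises, which matches the rule that ConfigureSession is valid only while the session is not closing.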

Agent: one conversation, one graph

An agent is a single ongoing conversation. You create one by sending CreateAgentRequest with:

  • a workflow graph (required),
  • a default model name for nodes that do not override it,
  • optional template and placement params on Generate nodes (see Use Mustache Templates) if the graph contains any Retrieve or Instruction node,
  • optional per-agent flags like enable_diagnostics and the max_steps_per_turn budget.

The server responds with CreateAgentResponse once the graph compiles and the default model resolves — after that you can send messages. If allow_auto_model_downloading is set on the session and models are absent from disk, the server downloads them first and streams DownloadProgress frames (keyed to the CreateAgent request_id) before sending CreateAgentResponse.
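On the client side, this means a CreateAgent call may see zero or more DownloadProgress frames before its response. A minimal sketch of that wait loop follows. The frame shape (plain dicts with `type`, `request_id`, `agent_id`, and `percent` keys) is an assumption made for illustration; the real wire format and client API may differ.

```python
def wait_for_create_agent(frames, request_id, on_progress=None):
    """Consume frames until the CreateAgentResponse for request_id arrives.

    frames is any iterable of dicts; the dict keys are hypothetical.
    """
    for frame in frames:
        if frame.get("request_id") != request_id:
            continue  # belongs to another request; a real client would route it elsewhere
        if frame["type"] == "DownloadProgress":
            if on_progress is not None:
                on_progress(frame)
        elif frame["type"] == "CreateAgentResponse":
            return frame["agent_id"]
    raise ConnectionError("stream ended before CreateAgentResponse")


progress_seen = []
incoming = [
    {"type": "DownloadProgress", "request_id": 7, "percent": 40},
    {"type": "DownloadProgress", "request_id": 9, "percent": 10},   # unrelated request
    {"type": "DownloadProgress", "request_id": 7, "percent": 100},
    {"type": "CreateAgentResponse", "request_id": 7, "agent_id": 1},
]
agent_id = wait_for_create_agent(
    incoming, request_id=7, on_progress=lambda f: progress_seen.append(f["percent"])
)
```

Matching on request_id is what lets progress for one CreateAgent be told apart from frames belonging to other outstanding requests.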

A session can own as many agents as you want. Each has its own dialog (the growing list of interactions), its own per-node KV caches, and its own sampling parameters. Turns on a single session run one at a time — agent A's turn finishes before agent B's turn starts.

The implicit agent state machine

stateDiagram-v2
    direction LR
    [*]     --> Idle:    CreateAgent compiled the graph
    Idle    --> Running: SendMessage → turn starts
    Running --> Running: nodes execute; inference runs on the shared model
    Running --> Idle:    graph reaches END → TurnComplete(Success)
    Running --> Idle:    cancellation_signal → TurnComplete(Cancelled)
    Running --> Idle:    max_steps_per_turn exceeded / error → TurnComplete(Error)
    Idle    --> [*]:     DestroyAgent or session close
    Running --> [*]:     session close (turn is cancelled first)

Three facts about this machine matter in practice:

  1. Turns are atomic per agent. A second SendMessage while the previous turn is running is rejected with error 3001 AgentBusy. The in-flight turn is not pre-empted; it runs to completion.
  2. Cancellation is cooperative. Destroying the agent or closing the session signals the turn to stop; the turn does not halt instantly but unwinds at its next cancellation check and produces TurnComplete(Cancelled).
  3. Step budget, not time budget. The server aborts a turn if the graph walks more than max_steps_per_turn nodes (default 64). This is the escape valve for routing loops, not a generation timeout.
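The three facts above can be sketched as a toy agent. The error code 3001 and the default budget of 64 are taken from the text; `ToyAgent`, its `routes` dict, the `cancel_requested` flag, and the per-step cancellation check are assumptions made for illustration, not the server's actual design.

```python
AGENT_BUSY = 3001          # error code quoted in the text
DEFAULT_MAX_STEPS = 64     # default max_steps_per_turn quoted in the text


class AgentBusy(Exception):
    code = AGENT_BUSY


class ToyAgent:
    def __init__(self, routes, start, max_steps_per_turn=DEFAULT_MAX_STEPS):
        self.routes = routes                  # node id -> next node id; "END" stops the walk
        self.start = start
        self.max_steps = max_steps_per_turn
        self.state = "Idle"
        self.cancel_requested = False

    def begin_turn(self):
        """Accept a message, or reject it while another turn is in flight."""
        if self.state == "Running":
            raise AgentBusy("a turn is already running")
        self.state = "Running"
        self.cancel_requested = False

    def run_turn(self):
        """Walk the graph and return the TurnComplete status."""
        node, steps, status = self.start, 0, "Success"
        while node != "END":
            if self.cancel_requested:         # cooperative: checked between nodes
                status = "Cancelled"
                break
            if steps >= self.max_steps:       # step budget, not a time budget
                status = "Error"
                break
            steps += 1
            node = self.routes[node]
        self.state = "Idle"
        return status


agent = ToyAgent({"a": "b", "b": "END"}, start="a")
agent.begin_turn()
try:
    agent.begin_turn()                        # second SendMessage while Running
    busy_code = None
except AgentBusy as err:
    busy_code = err.code
first_status = agent.run_turn()               # the in-flight turn still completes

looping = ToyAgent({"a": "a"}, start="a", max_steps_per_turn=3)
looping.begin_turn()
loop_status = looping.run_turn()              # routing loop hits the step budget
```

A long generation inside a single node never trips this budget in the sketch, which reflects the point that the limit guards against routing loops rather than slow inference.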

What an agent owns

flowchart TB
    A[Agent] --> D[Dialog<br>list of Interactions]
    A --> G[Graph<br>frozen at CreateAgent]
    A --> N1[Node 1<br>KV cache / params]
    A --> N2[Node 2<br>KV cache / params]
    A --> N3[Node N<br>...]
    G --> Routes[routes table]
    G --> Start[start id]

  • The dialog is the source of truth for the conversation. Every turn appends one interaction: the user's message, the model's reply, and anything the graph attached along the way (retrieved knowledge, recorded tool calls).
  • The graph is frozen for the life of the agent. You cannot swap nodes or routes after CreateAgent — destroy and re-create the agent if you need a different shape.
  • Per-node state — KV caches, sampling defaults, and any node-local caches — belongs to the agent, not the session. Two agents using the same model still have independent KV state.
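As a rough data-structure sketch of this ownership picture, the three pieces can be modelled with dataclasses. All type and field names below (`Graph`, `NodeState`, `Agent`, `create_agent`) are hypothetical; the point is only that the graph is read-only after creation and that per-node state is never shared between agents.

```python
from dataclasses import dataclass, field
from types import MappingProxyType


@dataclass(frozen=True)
class Graph:
    start: str
    routes: MappingProxyType      # read-only view: node id -> next node id


@dataclass
class NodeState:
    kv_cache: list = field(default_factory=list)
    params: dict = field(default_factory=dict)


@dataclass
class Agent:
    graph: Graph
    dialog: list = field(default_factory=list)    # one entry per completed turn
    nodes: dict = field(default_factory=dict)     # node id -> NodeState


def create_agent(start, routes):
    """Freeze the graph and give the agent its own fresh per-node state."""
    graph = Graph(start=start, routes=MappingProxyType(dict(routes)))
    nodes = {node_id: NodeState() for node_id in routes}
    return Agent(graph=graph, nodes=nodes)


routes = {"retrieve": "generate", "generate": "END"}
first = create_agent("retrieve", routes)
second = create_agent("retrieve", routes)

first.nodes["generate"].kv_cache.append("tokens from turn 1")
first.dialog.append({"user": "hi", "reply": "hello"})
```

Here `second` still has an empty dialog and empty KV caches even though both agents were built from the same graph description, and assigning to `first.graph.routes` raises a `TypeError`.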

Lifecycle in one picture

sequenceDiagram
    participant C as Client
    participant S as Session
    participant A as Agent

    C->>S: connect
    S-->>C: SessionReady
    C->>S: ConfigureSessionRequest
    S-->>C: ConfigureSessionResponse
    C->>S: CreateAgentRequest(graph, model, …)
    S->>A: compile graph, resolve model
    S-->>C: CreateAgentResponse(agent_id)
    loop one per user turn
        C->>S: SendMessageRequest(agent_id, text)
        S->>A: dispatch
        A-->>S: AnswerText (stream)
        S-->>C: AnswerText
        A-->>S: TurnComplete(status)
        S-->>C: TurnComplete
    end
    C->>S: DestroyAgentRequest(agent_id)
    S->>A: cancel + destroy
    S-->>C: Ack
    C->>S: disconnect
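The loop in the middle of the diagram amounts to accumulating AnswerText frames until TurnComplete arrives. A minimal client-side sketch, assuming frames are dicts with hypothetical `type`, `agent_id`, `text`, and `status` keys:

```python
def collect_turn(frames, agent_id):
    """Gather streamed AnswerText for one agent until its TurnComplete frame."""
    chunks = []
    for frame in frames:
        if frame.get("agent_id") != agent_id:
            continue  # frame for a different agent on the same session
        if frame["type"] == "AnswerText":
            chunks.append(frame["text"])
        elif frame["type"] == "TurnComplete":
            return "".join(chunks), frame["status"]
    raise ConnectionError("stream ended before TurnComplete")


stream = [
    {"type": "AnswerText", "agent_id": 1, "text": "Hello"},
    {"type": "AnswerText", "agent_id": 1, "text": ", world"},
    {"type": "TurnComplete", "agent_id": 1, "status": "Success"},
]
reply, status = collect_turn(stream, agent_id=1)
```

Treating TurnComplete as the only end-of-turn signal keeps the client correct for Cancelled and Error turns as well, since those also end with a TurnComplete frame.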

Edges and pitfalls

  • Agent ids are session-scoped. An agent_id is only meaningful inside the session that created it. If you reconnect, all previous agent ids become invalid.
  • A new ConfigureSession does not change existing agents. Existing agents keep the inference engine they were created with. Only later CreateAgent calls see the new engine.
  • Shared string storages must exist at CreateAgent time. Destroying a string storage after creating an agent that references it is safe: the agent keeps using the data until the agent itself is destroyed. The opposite order does not work: CreateAgent cannot reference a storage that does not exist yet. See Lifetime and Ownership for the full picture of who holds what reference and when each goes away.
  • Multiple agents on one session run serially. The server multiplexes their turns fairly, but on a given session only one turn runs at a time. Open more sessions if you need parallelism.
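The storage-lifetime rule in the list above behaves like shared ownership. The sketch below models it with ordinary Python references. `StorageSession` and its methods are hypothetical, and the caller picks the agent id here purely to keep the example short.

```python
class StorageSession:
    """Toy model: the session and its agents each hold a reference to storage data."""

    def __init__(self):
        self.storages = {}   # storage name -> immutable tuple of strings
        self.agents = {}     # agent id -> the storage data that agent references

    def create_storage(self, name, strings):
        self.storages[name] = tuple(strings)

    def destroy_storage(self, name):
        del self.storages[name]          # drops only the session's reference

    def create_agent(self, agent_id, storage_name):
        if storage_name not in self.storages:
            raise LookupError(f"storage {storage_name!r} does not exist")
        self.agents[agent_id] = self.storages[storage_name]

    def destroy_agent(self, agent_id):
        del self.agents[agent_id]        # the last reference to the data goes away here


session = StorageSession()
session.create_storage("faq", ["How do I connect?", "What is a session?"])
session.create_agent("a1", "faq")
session.destroy_storage("faq")           # safe: the agent still holds the data
still_readable = session.agents["a1"][0]

try:
    session.create_agent("a2", "faq")    # too late: the storage no longer exists
    late_create_failed = False
except LookupError:
    late_create_failed = True
```

The same check rejects a CreateAgent that names a storage before it has been created, which is the ordering pitfall the bullet warns about.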