Stream Answers to a UI

Pipe each AnswerText chunk from the server into your application's view layer as it arrives — the standard "typewriter" chat experience.

Prerequisites

Tryll streams one AnswerText frame per token chunk. The last AnswerText frame in a turn has is_final = true. A TurnComplete frame then closes the turn and carries its outcome (success / error / cancelled) and the total token count.

Steps

Python does not expose per-chunk streaming today.

TryllClient.send_message blocks until the server sends TurnComplete and returns the complete reply text. Streaming chunks arrive over the wire and are accumulated inside the client, but there is no user callback for each chunk. Use C++ or Unreal when you need a per-token UI.

# Blocks, accumulates all AnswerText chunks, returns the full reply.
reply = agent.send_message("Tell me about the architecture.")
your_chat_widget.set_current_reply(reply)

# Diagnostics about the last turn:
print(agent.last_tokens_generated,
      agent.last_answer_chunk_count,
      agent.last_ttft_s)

In C++, AgentProxy::SendText takes the per-chunk callback directly. The callback fires on the client library's worker thread once per streamed chunk, with isFinal = true on the last chunk of the turn.

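std::string buffer;

agent.SendText("Tell me about the architecture.",
    [&](std::string_view text, bool /*isDelta*/, bool isFinal)
    {
        // Runs on the client library's thread, not the UI thread.
        // If your widget toolkit is not thread-safe, marshal these calls.
        buffer.append(text);
        YourChatWidget::SetCurrentReply(buffer);
        if (isFinal)
        {
            // Last chunk of the turn: lock in the reply and reset.
            YourChatWidget::CommitReply(buffer);
            buffer.clear();
        }
    });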

SendText returns once TurnComplete has been processed. For a non-streaming UI, skip the per-chunk UI update: only append to your accumulator inside the callback and render nothing until the method returns.
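Because the callback runs on a thread owned by the client library, most UI toolkits need the chunks handed over to the UI thread before any widget is touched. The following is a minimal sketch of one way to do that, not part of the Tryll client library: Chunk and ChunkQueue are hypothetical helper names. The callback only pushes, and the UI thread drains the queue once per frame.

```cpp
#include <mutex>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper types, not part of the Tryll client library.
struct Chunk
{
    std::string text;
    bool isFinal = false;
};

class ChunkQueue
{
public:
    // Safe to call from any thread, e.g. from inside the SendText callback.
    // The text is copied because the string_view handed to a callback
    // should not be assumed to outlive the call.
    void push(std::string_view text, bool isFinal)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push_back(Chunk{std::string(text), isFinal});
    }

    // Call from the UI thread. Returns every chunk queued since the
    // previous call, in arrival order, and leaves the queue empty.
    std::vector<Chunk> drain()
    {
        std::vector<Chunk> out;
        std::lock_guard<std::mutex> lock(mutex_);
        out.swap(pending_);
        return out;
    }

private:
    std::mutex mutex_;
    std::vector<Chunk> pending_;
};
```

Inside the SendText callback you would call queue.push(text, isFinal). On each UI tick, iterate over queue.drain(), append each chunk's text to the displayed reply, and commit when a chunk has isFinal set.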

In Unreal, bind two events on the UTryllAgentComponent in Blueprint:

  • On Answer Text (Text: FString, bIsFinal: bool) — append Text to your UTextBlock. bIsFinal is true only on the very last chunk.
  • On Turn Complete (Status: ETryllTurnStatus) — stop the typing indicator and commit the final reply.

There is also On Answer Full (FullText: FString), which fires once after the turn completes with the whole response in one string. It is useful if your UI only needs the final text.

What the frames look like

Frame          Fires                  Payload
AnswerText     Many times per turn    text (delta), is_final
TurnComplete   Once at the end        status, tokens_generated

is_final is true on the very last AnswerText before TurnComplete. It is normally safe to ignore and just rely on TurnComplete to lock in the reply; is_final is useful when you want to switch UI state slightly earlier (e.g., hide the cursor blink before the "turn done" animation).
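The frame sequence can be modelled as a small state update. The sketch below is illustrative only: the struct names mirror the frame names, but TurnStatus, TurnView and apply_frame are hypothetical and the client library's real types may differ. It treats is_final as an early UI hint and TurnComplete as the point where the reply is locked in.

```cpp
#include <string>
#include <variant>

// Illustrative frame types. Field names follow the frame payloads described
// above; the actual wire format and client types are assumptions here.
enum class TurnStatus { Success, Error, Cancelled };

struct AnswerText
{
    std::string text;  // delta to append to the reply
    bool is_final;     // true on the last AnswerText of the turn
};

struct TurnComplete
{
    TurnStatus status;
    int tokens_generated;
};

using Frame = std::variant<AnswerText, TurnComplete>;

// Hypothetical view model for a single turn.
struct TurnView
{
    std::string reply;
    bool cursor_visible = true;  // blinking "typing" cursor
    bool committed = false;      // reply is locked in
    bool succeeded = false;
    int tokens_generated = 0;
};

// Applies one frame to the view model.
inline void apply_frame(TurnView& view, const Frame& frame)
{
    if (const auto* answer = std::get_if<AnswerText>(&frame))
    {
        view.reply += answer->text;
        if (answer->is_final)
            view.cursor_visible = false;  // early hint, turn not done yet
    }
    else if (const auto* done = std::get_if<TurnComplete>(&frame))
    {
        view.cursor_visible = false;
        view.committed = true;  // authoritative end of the turn
        view.succeeded = (done->status == TurnStatus::Success);
        view.tokens_generated = done->tokens_generated;
    }
}
```

The same function handles a one-shot reply: a single AnswerText with is_final = true hides the cursor, and the reply is still only committed when TurnComplete arrives.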

Streaming only part of the output

Only Generate nodes stream AnswerText token by token. Other nodes either emit a single AnswerText frame or none at all. For example:

  • CannedResponse emits one AnswerText with the full response and is_final = true — not a stream, but it uses the same callback.
  • ToolCall with generate_on_no_tool = true (experimental) emits the residual text the same way — one shot, is_final = true.
  • ToolCall with notify_client = true does not use AnswerText for the tool call itself; it fires a separate ToolCallNotification. See Define and Handle Tool Calls.

Common pitfalls

  • Treating is_final as the end of the turn. Waiting for is_final as a synchronisation point works inside a Generate turn but is misleading for CannedResponse, where the only chunk you receive already has is_final = true. Always bind TurnComplete as well to know the turn is really done.
  • Threading. In Unreal, OnAnswerText is dispatched on the game thread, so you can touch UMG widgets directly. In C++, the SendText callback fires on a thread owned by the client library, not your UI thread, so marshal to your UI thread explicitly. Python is non-streaming and single-threaded at the call site.
  • Accumulating bytes, not chars. The server already chunks on UTF-8 boundaries, so for multi-byte scripts it is enough to concatenate each FString / std::string chunk as it arrives. Avoid re-splitting or truncating the accumulated buffer by byte count.
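If you want to check the UTF-8 boundary guarantee in debug builds, a structural check on each received chunk is enough. This is a minimal sketch with a hypothetical helper name, is_whole_utf8. It only verifies that every lead byte is followed by the right number of continuation bytes, and does not reject overlong or surrogate encodings.

```cpp
#include <cstddef>
#include <string_view>

// Returns true when `chunk` starts and ends on UTF-8 code point boundaries.
// Hypothetical debug helper, not part of the Tryll client library.
inline bool is_whole_utf8(std::string_view chunk)
{
    std::size_t i = 0;
    while (i < chunk.size())
    {
        const unsigned char lead = static_cast<unsigned char>(chunk[i]);
        std::size_t length = 0;
        if (lead < 0x80)                length = 1;  // ASCII
        else if ((lead & 0xE0) == 0xC0) length = 2;  // 110xxxxx
        else if ((lead & 0xF0) == 0xE0) length = 3;  // 1110xxxx
        else if ((lead & 0xF8) == 0xF0) length = 4;  // 11110xxx
        else return false;  // stray continuation byte or invalid lead

        if (i + length > chunk.size())
            return false;  // sequence is cut off at the end of the chunk

        for (std::size_t k = 1; k < length; ++k)
        {
            const unsigned char next = static_cast<unsigned char>(chunk[i + k]);
            if ((next & 0xC0) != 0x80)
                return false;  // expected a 10xxxxxx continuation byte
        }
        i += length;
    }
    return true;
}
```

A chunk that fails this check would indicate a split in the middle of a character. In that case, buffer the raw bytes and decode only after the next chunk arrives, rather than rendering the partial sequence.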