Stream Answers to a UI¶
Pipe each `AnswerText` chunk from the server into your application's
view layer as it arrives — the standard "typewriter" chat
experience.
Prerequisites¶
- An agent created and ready to receive messages — see Build a Chat Agent with a Graph.
Tryll streams one `AnswerText` frame per token chunk. The last frame
in a turn has `is_final = true`; a final `TurnComplete` frame carries
the turn's outcome (success / error / cancelled) and the total token
count.
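The per-turn sequence above can be modelled as a short consumer loop. The structs below are illustrative stand-ins for this sketch, not the actual Tryll wire types:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Illustrative stand-ins for the two frame kinds described above;
// the real Tryll wire types may differ.
struct AnswerText {
    std::string text;    // delta chunk
    bool is_final;       // true only on the last chunk of the turn
};

struct TurnComplete {
    std::string status;  // "success" / "error" / "cancelled"
    int tokens_generated;
};

// Consume one turn: many AnswerText frames, then one TurnComplete.
// Returns the accumulated reply text.
std::string consume_turn(const std::vector<AnswerText>& chunks,
                         const TurnComplete& done) {
    std::string reply;
    for (const auto& c : chunks) {
        reply += c.text;    // append each delta as it arrives
        if (c.is_final) {
            // A UI could hide the cursor blink here, slightly
            // before the TurnComplete "turn done" animation.
        }
    }
    assert(done.status == "success" || done.status == "error" ||
           done.status == "cancelled");
    return reply;
}
```

The key property to preserve in a real client is the same: deltas are appended in arrival order, and the turn's outcome comes from `TurnComplete`, not from the last chunk.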
Steps¶
**Python.** Python does not expose per-chunk streaming today.
`TryllClient.send_message` blocks until the server sends
`TurnComplete` and returns the complete reply text. Streaming
chunks arrive over the wire and are accumulated inside the client,
but there is no per-chunk user callback. Use C++ or Unreal when you
need a per-token UI.
**C++.** `AgentProxy::SendText` takes the per-chunk callback directly; it
fires on the client library's worker thread for each streamed
chunk, and once more with `isFinal = true` on the last chunk.
```cpp
std::string buffer;
agent.SendText("Tell me about the architecture.",
    [&](std::string_view text, bool /*isDelta*/, bool isFinal)
    {
        // Runs on the client library's worker thread: marshal
        // widget updates to your UI thread (see Common pitfalls).
        buffer.append(text);
        YourChatWidget::SetCurrentReply(buffer);
        if (isFinal)
        {
            YourChatWidget::CommitReply(buffer);
            buffer.clear();
        }
    });
```
`SendText` returns when `TurnComplete` has been processed. If you do
not need the typewriter effect, ignore the intermediate chunks and
act only on the invocation where `isFinal` is `true`; the buffer you
accumulate inside the callback then holds the complete reply.
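Because the callback fires off the UI thread, widget updates usually need to be marshalled across. A minimal sketch of that hand-off, assuming the UI drains pending chunks once per frame; `ChunkQueue` is a hypothetical helper, not part of the Tryll SDK:

```cpp
#include <cassert>
#include <mutex>
#include <queue>
#include <string>

// Chunks pushed from the streaming callback (worker thread) and
// drained on the UI thread each frame/tick.
class ChunkQueue {
public:
    // Call from the SendText callback on the worker thread.
    void Push(std::string chunk) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push(std::move(chunk));
    }

    // Call from the UI thread; appends any pending chunks to the
    // reply buffer and reports whether anything new arrived.
    bool Drain(std::string& reply) {
        std::lock_guard<std::mutex> lock(mutex_);
        bool updated = !pending_.empty();
        while (!pending_.empty()) {
            reply += pending_.front();
            pending_.pop();
        }
        return updated;
    }

private:
    std::mutex mutex_;
    std::queue<std::string> pending_;
};
```

In practice you would `Push` from the `SendText` lambda and `Drain` in your frame/tick handler, only touching the widget when `Drain` returns `true`.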
**Unreal.** On the `UTryllAgentComponent`, bind two events in Blueprint:

- `On Answer Text` (`Text: FString`, `bIsFinal: bool`) — append `Text` to your `UTextBlock`. `bIsFinal` is `true` only on the very last chunk.
- `On Turn Complete` (`Status: ETryllTurnStatus`) — stop the typing indicator and commit the final reply.

There is also `On Answer Full` (`FullText: FString`), which fires once after the turn completes with the whole response in one string — useful if your UI only needs the final text.
What the frames look like¶
| Frame | Fires | Payload |
|---|---|---|
| `AnswerText` | Many times per turn | `text` (delta), `is_final` |
| `TurnComplete` | Once at the end | `status`, `tokens_generated` |
`is_final` is `true` on the very last `AnswerText` before
`TurnComplete`. It is normally safe to ignore it and rely on
`TurnComplete` to lock in the reply; `is_final` is useful when you
want to switch UI state slightly earlier (e.g., hide the cursor
blink before the "turn done" animation).
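One way to combine the two signals, sketched as a tiny state machine with hypothetical names: treat `is_final` as an early UI hint and `TurnComplete` as the authoritative end of turn:

```cpp
#include <cassert>

// Hypothetical UI states for one reply; not part of the Tryll SDK.
enum class ReplyState { Streaming, FinalChunkSeen, Committed };

// is_final only advances the state early (e.g., hide the cursor).
ReplyState OnAnswerChunk(ReplyState s, bool is_final) {
    return is_final ? ReplyState::FinalChunkSeen : s;
}

// TurnComplete always commits, whether or not is_final was seen.
ReplyState OnTurnComplete(ReplyState /*s*/) {
    return ReplyState::Committed;
}
```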
Streaming only part of the output¶
Nodes other than `Generate` do not typically emit `AnswerText`
frames. For example:

- `CannedResponse` emits one `AnswerText` with the full response and `is_final = true` — not a stream, but it uses the same callback.
- `ToolCall` with `generate_on_no_tool = true` (experimental) emits the residual text the same way — one shot, `is_final = true`.
- `ToolCall` with `notify_client = true` does not use `AnswerText` for the tool call itself; it fires a separate `ToolCallNotification`. See Define and Handle Tool Calls.
Common pitfalls¶
- Waiting for `is_final` as a synchronisation point is fine inside a `Generate` turn but misleading for `CannedResponse` — you will get one chunk with `is_final = true`. Always also bind `TurnComplete` for "turn is really done".
- Threading in Unreal. `OnAnswerText` is dispatched on the game thread, so you can touch UMG widgets directly. In C++, the `SendText` callback fires on the client library's worker thread — marshal to your UI thread explicitly. Python is non-streaming and single-threaded at the call site.
- Accumulating bytes, not chars. For multi-byte scripts, simply concatenate the `FString`/`std::string` chunks as you receive them; the server already chunks on UTF-8 boundaries.
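To illustrate the last pitfall, a sketch (not Tryll code) showing that byte-wise concatenation preserves multi-byte text as long as each chunk ends on a UTF-8 boundary, which the server guarantees:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Append UTF-8 chunks byte-wise. Because the server chunks on
// codepoint boundaries, the concatenation is always valid UTF-8.
std::string AccumulateChunks(const std::vector<std::string>& chunks) {
    std::string out;
    for (const auto& c : chunks) out += c;
    return out;
}

// Count codepoints rather than bytes: UTF-8 continuation bytes
// have the bit pattern 0b10xxxxxx, so skip those.
size_t CodepointCount(const std::string& s) {
    size_t n = 0;
    for (unsigned char b : s)
        if ((b & 0xC0) != 0x80) ++n;
    return n;
}
```

The second helper shows why "bytes, not chars" matters: a five-character word like "héllo" occupies six bytes, so any logic keyed on byte length (cursor position, truncation) will drift on non-ASCII text.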