Use Canned Responses and Guardrails¶
Short-circuit jailbreak attempts, off-topic prompts, and other unwanted inputs to a scripted reply — without ever running the model.
Prerequisites
- A session connected and configured — see Connect and Manage a Session.
The pattern¶
flowchart LR
guard["HumanMessageGuardrail<br>guard"]
refuse["CannedResponse<br>refuse"]
gen["Generate<br>answer"]
guard -- "triggered" --> refuse
guard -- "not_triggered" --> gen
refuse -- "default" --> END
gen -- "default" --> END
A HumanMessageGuardrail
matches the current user message against a list of case-insensitive
regexes. On a match, the graph takes the triggered route to a
CannedResponse that emits
one scripted reply (random selection by default; set
selection_strategy: first for deterministic picks). On a miss, it
falls through to Generate as normal.
Steps¶
from tryll_client import GraphDescription, NodeType
# 1. Define the patterns and the refusal lines as string storages.
# These die with the session; recreate on reconnect.
client.create_string_storage(
name="jailbreak_patterns",
strings=[
r"ignore\s+previous\s+instructions?",
r"you\s+are\s+DAN",
r"pretend\s+to\s+be",
],
)
client.create_string_storage(
name="refusal_lines",
strings=[
"I can't help with that request.",
"That's outside what I can assist with.",
],
)
# 2. Wire up the graph.
graph = (
GraphDescription()
.add_node("guard", NodeType.HumanMessageGuardrail,
{"string_storage": "jailbreak_patterns"})
.add_node("refuse", NodeType.CannedResponse,
{"string_storage": "refusal_lines"})
.add_node("answer", NodeType.Generate)
.wire("guard", "triggered", "refuse")
.wire("guard", "not_triggered", "answer")
.wire("refuse", "default", "END")
.wire("answer", "default", "END")
.set_start_node("guard")
.set_default_model_name("My Local Model")
)
agent = client.create_agent(graph)
namespace TC = Tryll::Client;
// 1. Storages
client.CreateStringStorage("jailbreak_patterns", {
R"(ignore\s+previous\s+instructions?)",
R"(you\s+are\s+DAN)",
});
client.CreateStringStorage("refusal_lines", {
"I can't help with that request.",
"That's outside what I can assist with.",
});
// 2. Graph
TC::GraphDescription graph;
graph.AddNode("guard", TC::NodeType::HumanMessageGuardrail,
{{"string_storage", "jailbreak_patterns"}})
.AddNode("refuse", TC::NodeType::CannedResponse,
{{"string_storage", "refusal_lines"}})
.AddNode("answer", TC::NodeType::Generate)
.Wire("guard", "triggered", "refuse")
.Wire("guard", "not_triggered", "answer")
.Wire("refuse", "default", "END")
.Wire("answer", "default", "END")
.SetStartNode("guard")
.SetDefaultModelName("My Local Model");
auto agent = client.CreateAgent(graph);
- Call
UTryllSubsystem::CreateStringStoragetwice — once for patterns, once for refusal lines. Bind On Create String Storage Complete for each and wait forbSuccess. - In a
UTryllWorkflowAsset(or directly on the component's Graph details panel), build the three-node graph with thestring_storageparameter set on the guardrail and the canned-response nodes.
Use file-based patterns instead of inline¶
If your patterns or refusal lines live in a text file on the server
machine (say you maintain them out-of-band), pass the path to
CreateStringStorage instead of listing strings inline. The server
reads the file at storage-creation time; the resulting storage is
used exactly like an inline one.
File format: UTF-8, one entry per line, # starts a comment, blank
lines are ignored. See
String Storage → File format
for the full spec.
Paths are resolved on the server, relative to the server's current working directory if not absolute.
Wire the graph the same way as in the inline example — the node
string_storage param refers to the storage by name and does not
care whether it was created from inline strings or from a file.
Server-default fallback¶
If a node has neither string_storage nor inline
pattern_0 / response_0 / … params, the server falls back to a
file the operator configured in
server-config.json
(default_canned_responses_path, default_guardrail_patterns_path).
This is a deployment-wide fallback for the server admin. Client workflows should create a storage explicitly; do not rely on the operator's defaults unless you control both.
Priority order: explicit string_storage > inline pattern_0 /
response_0 / … params > operator-configured default file. See the
node reference.
Verify it worked¶
Send a jailbreak-style prompt:
Call UTryllAgentComponent::SendMessage with the prompt; the
refusal line arrives as a single OnAnswerText chunk with
bIsFinal = true.
You should get one of the refusal lines back, with one
AnswerText chunk carrying the full string and
is_final = true. Server log:
Now a normal prompt:
UTryllAgentComponent::SendMessage("What's the capital of France?")
— the streamed reply arrives through On Answer Text.
goes through not_triggered → answer and runs the model.
Common pitfalls¶
- Regex anchors. Patterns are matched with
std::regex_search, so they hit anywhere in the message. If you want whole-message match, anchor with^…$yourself. - Compilation errors in a pattern fail the whole agent creation with error 3003. Test patterns in a regex tester first.
- False positives. A pattern like
pretendwill trigger on "the actor is pretending to be angry". Make patterns tighter:pretend\s+to\s+be\s+(DAN|an? (unrestricted|uncensored)). - Node with nothing to resolve. If you leave both
string_storageand inline params unset and the operator has not configured a default file, the node is effectively disabled — a guardrail always exits vianot_triggered, a canned-response has no lines to emit. Create a storage explicitly.