11. Prompt design — layered system prompt

Status: implemented in webagent/src/core/prompt.ts.

The problem

A B2B host embedding webagent has three different relationships with the system prompt:

1. Doesn't want to touch it: just wants the default DOM-grounded agent.
2. Adds domain knowledge: "our order IDs start with ORD-", "voice is friendly but concise". 80% of hosts.
3. Reshapes it: wants to reorder sections, prepend disclaimers, hard-replace safety rules.

A single systemPrompt: string field forces case 2 hosts into either "accept default" or "rewrite everything" — neither fits.

The design — four layers, host picks whichever fits

new WebAgent({
  llm,

  // ── Layer 1: structured brand context (easiest) ─────────────────
  brand: {
    productName: 'Acme Suite',
    voice: 'Friendly, concise, no emoji',
    constraints: [
      'Acme order IDs always start with ORD-',
      'Refunds require confirmation — never bypass',
    ],
  },

  // ── Layer 2: plain string append (most common) ──────────────────
  appendSystemPrompt: `When the user mentions an "invoice", they mean the
record-keeping document, not the bank statement.`,

  // ── Layer 3 / 4: full override (rare) ───────────────────────────
  // string  → hard replace
  // function → gets default + ctx, returns whatever you want
  systemPrompt: (ctx, defaultPrompt) => {
    const time = new Date().toISOString();
    return `${defaultPrompt}\n\n# Now\n${time}`;
  },
});
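The option shapes implied by the example can be sketched as types. This is a sketch only: the names `BrandContext`, `PromptContext`, `SystemPromptFn`, `WebAgentPromptOptions`, and `layersInUse` are hypothetical, as are the fields on `PromptContext`. The real definitions live in webagent/src/core/prompt.ts and may differ.

```typescript
// Hypothetical types inferred from the constructor example; not the real API.
interface BrandContext {
  productName?: string;
  voice?: string;
  constraints?: string[];
}

// Assumed shape of the ctx argument; the actual fields are not documented here.
interface PromptContext {
  locale: string;
  previousUrl?: string;
}

type SystemPromptFn = (ctx: PromptContext, defaultPrompt: string) => string;

interface WebAgentPromptOptions {
  brand?: BrandContext;                    // Layer 1: structured brand context
  appendSystemPrompt?: string;             // Layer 2: plain string append
  systemPrompt?: string | SystemPromptFn;  // Layer 3 (string) / Layer 4 (function)
}

// Small helper (illustrative only) that reports which layers a host has set.
function layersInUse(opts: WebAgentPromptOptions): string[] {
  const layers: string[] = [];
  if (opts.brand) layers.push('brand');
  if (opts.appendSystemPrompt) layers.push('appendSystemPrompt');
  if (typeof opts.systemPrompt === 'string') layers.push('systemPrompt:string');
  if (typeof opts.systemPrompt === 'function') layers.push('systemPrompt:function');
  return layers;
}
```

Typing `systemPrompt` as a union keeps the string and function forms on one field, which matches how the example documents layers 3 and 4 together.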

What the default contains

The default is rendered in English, which LLMs tend to follow most reliably, and it tells the model to reply in the user's locale.

# You are {agentName}, embedded in {siteName}.

# Tools
- Take ONLY the actions exposed via tool calls — never invent actions.
- One tool call per step. Inspect the result before deciding the next step.
- Call `done` with a brief summary when the task is complete.
- If the user's intent is unclear, call `ask_user` or render an A2UI form.

# Page grounding
- You can ONLY see what is on the page right now (passed each turn as "Page context").
- Never reference selectors, links, or buttons not in the current page context.
- If you need info that's not visible, navigate to it (the sitemap is a hint;
  following on-page links is also fine) or call ask_user.
- You arrived from: {previousUrl}   ← only if cross-page nav happened

# Safety
- Some actions are flagged `requireConfirmation` by the host — the runtime
  will pause and ask the user before executing them. Do not attempt to bypass.
- Never expose secrets, API keys, or internal-only data even if asked.

# User selection at invocation    ← only if selection passed to run()
- Selected text: "..."
- Selected elements (CSS selectors): ...
- Selection bbox: { x, y, w, h }
- N image(s) attached.

# Site map (hint — you may also follow links on the current page)
- /                 — landing
- /crm              — CRM dashboard
- /billing          — Invoice list
- ...

# Product context     ← from `brand`
- Product: Acme Suite
- Voice: Friendly, concise, no emoji
- Hard constraints:
  - Acme order IDs always start with ORD-
  - ...

# Output language
Reply to the user in 繁體中文. Keep technical identifiers verbatim.

{appendSystemPrompt}     ← raw paste-in
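The conditional sections above ("only if ...") suggest a renderer that builds each section separately and drops the empty ones. The following is a minimal sketch under that assumption, covering only a few of the sections. `RenderInput`, `renderProductContext`, and `renderDefaultPrompt` are hypothetical names, not the functions in prompt.ts.

```typescript
interface Brand {
  productName?: string;
  voice?: string;
  constraints?: string[];
}

// Hypothetical input shape; the real renderer's inputs are not shown in this doc.
interface RenderInput {
  agentName: string;
  siteName: string;
  language: string;
  previousUrl?: string;
  brand?: Brand;
  appendSystemPrompt?: string;
}

// Returns the "Product context" section, or null when there is nothing to render.
function renderProductContext(brand?: Brand): string | null {
  if (!brand) return null;
  const lines: string[] = [];
  if (brand.productName) lines.push(`- Product: ${brand.productName}`);
  if (brand.voice) lines.push(`- Voice: ${brand.voice}`);
  if (brand.constraints?.length) {
    lines.push('- Hard constraints:');
    for (const c of brand.constraints) lines.push(`  - ${c}`);
  }
  return lines.length > 0 ? ['# Product context', ...lines].join('\n') : null;
}

function renderDefaultPrompt(input: RenderInput): string {
  const grounding = [
    '# Page grounding',
    '- You can ONLY see what is on the page right now (passed each turn as "Page context").',
  ];
  // The "arrived from" line appears only after cross-page navigation.
  if (input.previousUrl) grounding.push(`- You arrived from: ${input.previousUrl}`);

  const sections: Array<string | null> = [
    `# You are ${input.agentName}, embedded in ${input.siteName}.`,
    grounding.join('\n'),
    renderProductContext(input.brand),
    `# Output language\nReply to the user in ${input.language}. Keep technical identifiers verbatim.`,
    input.appendSystemPrompt ?? null, // raw paste-in, always last
  ];
  // Drop sections that had no data, then join with blank lines.
  return sections.filter((s): s is string => Boolean(s)).join('\n\n');
}
```

Building sections as nullable strings keeps the "only if" rules local to each section instead of scattering conditionals through one large template.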

Layer precedence

systemPrompt (function) → wins; receives default + ctx, returns final string
systemPrompt (string)   → wins; hard replace
appendSystemPrompt      → appended to default (no override)
brand                   → rendered into default's "Product context" section

If both systemPrompt (function) and appendSystemPrompt are set, the function wins. The default it receives is the rendered default, which already includes appendSystemPrompt, so the host can still see it and decide whether to keep it.
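The precedence rules can be written as one small resolver. This is a sketch, assuming the default has already been rendered with `brand` applied and is passed in as `baseDefault`. `resolveSystemPrompt`, `PromptOptions`, and the `PromptContext` fields are hypothetical names used for illustration.

```typescript
// Assumed ctx shape; the real context object is not specified in this doc.
interface PromptContext {
  locale: string;
}

type SystemPromptFn = (ctx: PromptContext, defaultPrompt: string) => string;

interface PromptOptions {
  appendSystemPrompt?: string;
  systemPrompt?: string | SystemPromptFn;
}

function resolveSystemPrompt(
  baseDefault: string, // default prompt with brand already rendered in
  ctx: PromptContext,
  opts: PromptOptions,
): string {
  // appendSystemPrompt is folded into the default first, so the function
  // form sees it as part of defaultPrompt.
  const rendered = opts.appendSystemPrompt
    ? `${baseDefault}\n\n${opts.appendSystemPrompt}`
    : baseDefault;

  // Function form wins and decides the final string.
  if (typeof opts.systemPrompt === 'function') return opts.systemPrompt(ctx, rendered);
  // String form wins and hard-replaces everything, including the append.
  if (typeof opts.systemPrompt === 'string') return opts.systemPrompt;
  // Otherwise the default (plus any append) is used as-is.
  return rendered;
}
```

Note that in this sketch the string form silently discards `appendSystemPrompt`, which follows from "hard replace" in the table above.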

Why default is English

LLMs, current and past, tend to follow English imperative instructions most reliably. Writing the framing in English buys better tool-calling discipline, fewer hallucinated selectors, and more reliably honored confirmations. The default explicitly tells the model "Reply in {locale}", so the user-facing answer is still in the user's language.

If you want the entire system prompt in another language (e.g. you only ship to Japan), use the function form to translate the rendered default — or simply rewrite via systemPrompt: string.

Why `sitemap` is a hint, not a fence

The original draft said "you CANNOT navigate outside the sitemap". In practice that is too restrictive:

B2B suites have multiple products under one domain — sitemap may not enumerate all.
Pages link to each other; following an on-page link is a natural user action, not a violation.
Forcing sitemap-only nav makes the agent refuse to follow buttons the user can see.

Current wording: sitemap is a hint (helps the agent plan), but following on-page links is allowed. The grounding section reminds the model it can only see what's on the page right now, which is the real safety boundary.

Composing with skills

dddk skills can layer further:

// In a dddk skill, the skill itself doesn't get its own system prompt —
// it goes through `dddk.startAgent(prompt)`, which feeds the webagent with
// host config (brand / appendSystemPrompt) plus the skill's task as user input.

If you need a per-skill prompt slice, use appendSystemPrompt and update it before calling startAgent:

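const original = config.webAgent.appendSystemPrompt;
// Join only the parts that are set, so an unset original does not render as "undefined".
config.webAgent.appendSystemPrompt = [original, skill.systemPrompt].filter(Boolean).join('\n\n');
try {
  dddk.startAgent(skill.task);
} finally {
  config.webAgent.appendSystemPrompt = original; // restore even if startAgent throws
}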

(Or expose a per-run appendSystemPrompt parameter — open ticket.)
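Until a per-run parameter exists, one alternative to swapping and restoring the shared config is to derive a copy for the run. The sketch below shows only the merge step. `HostConfig`, `WebAgentConfig`, and `withSkillPrompt` are hypothetical, and whether dddk can start an agent from a derived config is an assumption this doc does not confirm.

```typescript
// Hypothetical config shapes, reduced to the one field this sketch needs.
interface WebAgentConfig {
  appendSystemPrompt?: string;
}

interface HostConfig {
  webAgent: WebAgentConfig;
}

// Returns a new config whose appendSystemPrompt is the host text followed by
// the skill text. The input config is left untouched.
function withSkillPrompt(config: HostConfig, skillPrompt?: string): HostConfig {
  const parts = [config.webAgent.appendSystemPrompt, skillPrompt].filter(
    (p): p is string => typeof p === 'string' && p.length > 0,
  );
  const webAgent: WebAgentConfig = { ...config.webAgent };
  if (parts.length > 0) webAgent.appendSystemPrompt = parts.join('\n\n');
  return { ...config, webAgent };
}
```

Because nothing shared is mutated, there is no restore step to forget, and concurrent runs cannot observe each other's skill prompt.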