June 29, 2026

Instruction Bleed: when your SKILL.md files quietly interfere

If you build agents the modern way, your agent’s behavior lives in text. A SKILL.md here, an AGENTS.md there, a persona file, a rules file, a handful of mode definitions — all concatenated into one context window and handed to a model that interprets them. It’s a pleasant way to work. You edit a file, the behavior changes, no redeploy.

A paper posted to arXiv in late June 2026 puts a name to a failure mode that people who work this way keep reporting: edit one of those files, and the behavior of an unrelated one shifts — even though the two share no variable, no import, no executable dependency. The authors, Ching-Yu Lin and Yifan Liu, call it compositional behavioral leakage (CBL), and the practitioner-facing name they use is instruction bleed.

This post is a plain explainer: what the paper actually claims, how they measured it, and — just as important — what it does not claim. The finding is real but small, and the honest version is more useful than the scary one.

The one-sentence version

Two prompt modules that share a context window can interfere with each other’s behavior, because transformer self-attention provides no boundary between them.

That’s it. When you paste a “Professional Chef” persona into a shared rules file, the model doesn’t quarantine those tokens to the cooking task. Every token attends to every other token. A ### Persona heading is, in the authors’ words, “a statistical hint, not a scope boundary.” The structure you see — sections, files, headings — is a convenience for you. The model sees one undifferentiated stream.
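A minimal sketch of that flattening, assuming three made-up module bodies and a hypothetical `compose_prompt` helper (neither comes from the paper). It only shows that once the files are joined, headings and separators are ordinary characters in one string:

```python
# Hypothetical module contents. The file names follow the convention
# discussed in this post; the bodies are invented for illustration.
modules = {
    "AGENTS.md": "### Rules\nScore each job description from 1 to 5.",
    "SKILL.md": "### Skill\nCompare the CV against the listed requirements.",
    "persona.md": "### Persona\nYou are a Professional Chef.",
}


def compose_prompt(modules: dict[str, str]) -> str:
    """Concatenate every module into the single string the model receives."""
    return "\n\n---\n\n".join(modules.values())


prompt = compose_prompt(modules)

# The separators and headings survive only as text inside one string.
print(type(prompt).__name__, prompt.count("---"), prompt.count("###"))
# str 2 3
```

Nothing in the resulting string records which file a given sentence came from, which is the point the authors make about headings being hints rather than scopes.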

What they actually measured

The strength of the paper is that it doesn’t hand-wave. It defines CBL operationally and then goes looking for it on a real system.

The test subject was a job-evaluation agent (career-ops) that scores job descriptions 1–5 across several dimensions. The focal metric they watched was cv_match. The model was Claude Sonnet 4.6. The design: 12 job descriptions × 3 runs × 4 conditions = 144 trials.

The clever part is the three perturbation conditions, run alongside an unmodified baseline. Each one perturbs a non-focal module: a part of the prompt that has nothing to do with cv_match.

Volume (C1): add a 200-line, unrelated recipe-evaluation module. Does more context move the score?
Content (C2): add an irrelevant “Professional Chef” archetype row to a shared rules file. Does meaningful but off-topic text move the score?
Form (C3): change headings, emoji, and ordering without changing meaning. Does formatting move the score?
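The trial count can be sketched as a simple grid. This assumes the fourth condition is the unmodified baseline (the results are reported relative to one); the job-description IDs and condition labels are placeholders, not names from the paper:

```python
from itertools import product

# Placeholder identifiers; the paper's actual job descriptions are not listed here.
job_descriptions = [f"jd_{i:02d}" for i in range(1, 13)]  # 12 job descriptions
runs = [1, 2, 3]                                          # 3 repeated runs each
# Assumption: baseline plus the three perturbation conditions.
conditions = ["baseline", "C1_volume", "C2_content", "C3_form"]

trials = list(product(job_descriptions, runs, conditions))
trials_per_condition = len(trials) // len(conditions)

print(len(trials), trials_per_condition)
# 144 36
```

Under this reading each condition contributes 36 scored trials.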

Only one channel moved the needle.

Condition	Δ vs baseline	Cohen’s d	95% CI	Effect?
Volume	−0.03	−0.11	includes 0	No
Content	+0.17	0.63	[+0.03, +0.31]	Yes
Form	−0.08	−0.29	includes 0	No

Adding the chef persona nudged the job-match score up by about 0.17 on the 1–5 scale (a medium effect, d ≈ 0.63), and 8 of 12 job descriptions shifted in the same direction. Adding unrelated bulk did nothing measurable. Reformatting did nothing measurable. The interference is specifically about semantic content bleeding across module lines — not context length, not formatting.
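For readers who want to see how a mean shift becomes an effect size, here is a small sketch of Cohen's d using the common pooled-standard-deviation form. Whether the paper uses exactly this estimator is an assumption, and the two score lists are toy numbers, not the paper's data:

```python
import math
import statistics


def cohens_d(baseline: list[float], treated: list[float]) -> float:
    """Mean difference divided by the pooled sample standard deviation."""
    n0, n1 = len(baseline), len(treated)
    v0 = statistics.variance(baseline)  # sample variance (n - 1 denominator)
    v1 = statistics.variance(treated)
    pooled_sd = math.sqrt(((n0 - 1) * v0 + (n1 - 1) * v1) / (n0 + n1 - 2))
    return (statistics.mean(treated) - statistics.mean(baseline)) / pooled_sd


# Toy scores, invented for illustration only.
toy_baseline = [3.0, 3.5, 4.0]
toy_treated = [3.5, 4.0, 4.5]
d = cohens_d(toy_baseline, toy_treated)
print(round(d, 2))
# 1.0

# Back-of-envelope from the reported figures: a shift of 0.17 at d = 0.63
# implies a score spread of roughly 0.27 points, if d was computed this way.
implied_sd = 0.17 / 0.63
print(round(implied_sd, 2))
# 0.27
```

Read that way, the chef persona moved scores by a bit more than half of their usual run-to-run spread, which is why the shift counts as a medium effect even though it is small in absolute terms.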

The part everyone will skip — and shouldn’t

Here is the finding’s most important sentence: no recommendation flipped. In all 144 trials, across every condition, the agent’s actual decision never changed. The chef persona shifted the distribution of scores, but never enough to push a candidate across a decision boundary.

So this is a sub-threshold effect. It’s the kind of thing standard QA will never catch, because standard QA checks outputs and the outputs looked fine. The authors are careful to frame this as an existence proof, not a prevalence claim: one model, one system, a measurable-but-small signal. They are not saying your agents are broken. They are saying the interference is real, has a mechanism, and is currently invisible to how most teams test.
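To make "invisible to standard QA" concrete, here is a hedged sketch contrasting two checks on the same data. The scores, the decision threshold of 4.0, and the drift tolerance of 0.1 are all hypothetical, and this is not the authors' test procedure:

```python
import statistics

# Hypothetical values: neither the threshold nor the tolerance comes from the paper.
DECISION_THRESHOLD = 4.0
DRIFT_TOLERANCE = 0.1


def decide(score: float) -> str:
    """Map a numeric score to the recommendation a user would see."""
    return "apply" if score >= DECISION_THRESHOLD else "skip"


# Invented focal-metric scores before and after editing an unrelated module.
scores_before = [3.2, 3.4, 3.1, 3.5]
scores_after = [3.4, 3.5, 3.3, 3.8]

# Output-level check: did any recommendation change?
decisions_flipped = sum(
    decide(a) != decide(b) for a, b in zip(scores_before, scores_after)
)

# Distribution-level check: did the mean score move more than the tolerance?
mean_shift = statistics.mean(scores_after) - statistics.mean(scores_before)
drift_detected = abs(mean_shift) > DRIFT_TOLERANCE

print(decisions_flipped, drift_detected)
# 0 True
```

The first check passes while the second one fires, which is the shape of a sub-threshold effect: the outputs look identical and the underlying scores have moved.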

Whether sub-threshold drift matters is a judgment call about your system. A score that wobbles under the decision line is harmless once and potentially meaningful across the thousands of decisions a deployed agent makes. The paper measures the wobble; it doesn’t characterize how it compounds. That’s left open.

Why it happens (and why it’s hard to “just fix”)

The root cause is architectural, not a bug in anyone’s prompt. Self-attention computes pairwise interactions across the entire input. There’s no mechanism that restricts a module’s tokens from attending to another module’s tokens. Files, headings, and --- separators are formatting you imposed; the attention mechanism doesn’t honor them.

The authors raise the obvious question and leave it open: could providers offer “module-isolation primitives” (separately cached prompt segments with restricted cross-segment attention), or is text-level isolation “fundamentally impossible” under global attention? Nobody knows yet.
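A toy illustration of both halves of that argument, using four tokens split across two modules. The raw scores are arbitrary, the example ignores causal masking and multiple heads, and the block mask is only one hypothetical reading of "restricted cross-segment attention", not an existing provider feature:

```python
import math


def softmax(xs: list[float]) -> list[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(scores, mask=None):
    """Row-wise softmax; positions where mask is False are set to -inf first."""
    rows = []
    for i, row in enumerate(scores):
        masked = [
            s if mask is None or mask[i][j] else float("-inf")
            for j, s in enumerate(row)
        ]
        rows.append(softmax(masked))
    return rows


# Tokens 0-1 belong to module A, tokens 2-3 to module B (toy assignment).
module_of = ["A", "A", "B", "B"]
# Arbitrary raw attention scores, invented for illustration.
scores = [
    [1.0, 0.5, 0.8, 0.2],
    [0.3, 1.0, 0.6, 0.4],
    [0.7, 0.1, 1.0, 0.5],
    [0.2, 0.9, 0.4, 1.0],
]


def cross_module_mass(weights, token: int) -> float:
    """Total attention a token places on tokens from other modules."""
    return sum(
        w for j, w in enumerate(weights[token]) if module_of[j] != module_of[token]
    )


# Global attention: nothing stops module A tokens from attending to module B.
global_weights = attention_weights(scores)

# Hypothetical isolation primitive: a block mask that allows same-module pairs only.
block_mask = [[module_of[i] == module_of[j] for j in range(4)] for i in range(4)]
isolated_weights = attention_weights(scores, block_mask)

print(cross_module_mass(global_weights, 0) > 0)
print(cross_module_mass(isolated_weights, 0))
```

In the unmasked case a share of token 0's attention lands on module B regardless of any heading between them; with the block mask that share is exactly zero. Whether such masking is practical at the provider level is the open question.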

What this is not

Because it’s easy to over-read, the paper is explicit about the boundaries, and so should we be:

Not prompt injection. No malicious input. CBL is accidental, single-agent, and happens at initialization: your own well-meaning modules interfering with each other.
Not “context rot” or cognitive degradation. The volume channel was null. This isn’t “long prompts get dumber over time.”
Not a formatting problem. The form channel was null too. Reordering and re-styling didn’t move the score in this system. (The authors note other work has found large format sensitivity in other settings, so this may be model- and system-dependent.)
Not generalized. One model, one system. They predict the magnitude will vary by model family, but that’s a prediction to be tested, not a result.

Why a kanban-protocol blog cares

Short version: because the systems the paper surveys are the systems our readers build. Its system survey names the Markdown convention directly — SOUL.md, SKILL.md, AGENTS.md, the OpenClaw ecosystem with thousands of community skills — as the canonical example of text-composed agentic systems where this interference can occur. If you maintain a growing pile of skill and agent files, this is a finding about your stack.

In a follow-up post we’ll get practical: what instruction bleed means if you actually write SKILL.md and AGENTS.md files, and what the authors suggest you do about it (spoiler: it involves a kind of regression testing almost nobody does yet). For now, the takeaway is just the name and the shape of the thing — so the next time editing one file moves something it shouldn’t, you know what you’re looking at.

Source: Ching-Yu Lin & Yifan Liu, “Instruction Bleed: Cross-Module Interference in Prompt-Composed Agentic Systems,” arXiv:2606.26356 (submitted 24 June 2026). All figures in this post — the 144 trials, the d = 0.63 content effect, the [+0.03, +0.31] interval, and “no recommendation flipped” — are quoted from that paper.
