§ IV · Domain INP

Controls: 10
Edition: v.1.2

INP · Domain 4 of 9

Input & Prompt Security

Defense against direct and indirect prompt injection, including through memory and retrieval.

INP covers the full input attack surface: direct prompt injection, indirect injection via retrieved content, memory poisoning, and adversarial inputs that exploit model weaknesses. Controls cover input sanitization, trust boundaries between user and tool content, and the detection of injection attempts in retrieved documents.

Table INP.1 · Controls in INP · v.1.210 controls · 5-level maturity

INP-01

Direct prompt injection detection

Direct prompt injection detection is applied to user-provided inputs, using a combination of heuristic and classifier-based methods.

Direct injection — where a user attempts to override the system prompt through their input — is detected using a combination of heuristic and classifier-based methods. The agent does not disclose its system prompt or override its operating envelope in response to user-crafted inputs. Detection rates and false-positive rates are measured and reported.

L3 · Operated

Detection is applied to all user-facing input paths; detection rates and false-positive rates are measured; the detection pipeline is updated as new attack patterns are cataloged in SPC-08.

INP-02

Indirect prompt injection defense

Indirect prompt injection defense is applied to content returned from tools, retrieval systems, web pages, emails, documents, and agent memory. Such content is treated as untrusted data and is not executed as instruction without explicit user confirmation.

Indirect injection — where malicious instructions are placed in documents, web pages, emails, or other content the agent retrieves — is the most under-defended attack class in production today. INP-02 requires that retrieved content be processed in a trust boundary that prevents it from issuing commands to the agent. Detection rates and false-positive rates are measured.

L3 · Operated

All retrieval paths apply trust boundary tagging; sampling-based human review of flagged retrievals is in place; injection-attempt rate is reported quarterly with trend analysis.

INP-03

Trust boundary enforcement

A trust boundary is enforced: instructions originating from non-user sources require user confirmation through the primary interaction channel before any consequential action. In multi-agent chains, confirmation binding is governed by MAS-03.

The boundary between user-provided content and system-issued instructions is enforced at every layer. Instructions originating from non-user sources — tools, retrieved documents, other agents — cannot direct consequential actions without explicit user confirmation through the primary interaction channel. In multi-agent chains, confirmation binding is governed by MAS-03.

L3 · Operated

Trust boundaries are documented for all input paths; enforcement is tested as part of the adversarial assessment; no consequential action from an untrusted source has bypassed confirmation in the review period.

INP-04

Sensitive content handling

PII and other sensitive content in inputs is detected and handled per policy (redact, block, or log).

Personally identifiable information and other sensitive content in inputs is detected at the input layer and handled per the organization's data classification policy — redacted, blocked, or logged depending on the context and the agent's authorization level. This control works in concert with DAT-06 and DAT-07 to ensure sensitive data does not propagate where it should not.

L3 · Operated

Detection is applied to all input paths; handling rules are documented per data class; detection accuracy is measured; false negatives are tracked and the detection pipeline is tuned.

INP-05

Jailbreak pattern library

A jailbreak pattern library is maintained and continuously updated. Inputs are evaluated against known patterns prior to processing.

The organization maintains a continuously updated library of known jailbreak patterns, escape sequences, and adversarial prompt templates. All inputs are evaluated against this library before reaching the model. The library is updated as new attack patterns are discovered through red-team assessments, community disclosures, and production monitoring.

L3 · Operated

Pattern library exists and is current; inputs are evaluated against it in real time; the library has been updated within the prior quarter; new patterns from red-team and production findings are incorporated.

INP-06

Input size and complexity limits

Input size and complexity limits are enforced to prevent context-window flooding and denial-of-service.

Limits on input size and complexity — token counts, nesting depth, attachment counts — are enforced to prevent context-window flooding, resource exhaustion, and denial-of-service attacks. The limits are tuned to the agent's operating envelope and are enforced before the input reaches the model.

L3 · Operated

Limits are documented and enforced for all input paths; limit values are tuned to the agent's operating envelope; limit-exceeded events are logged and reviewed.

INP-07

Rate limiting and abuse detection

Rate limiting and abuse detection are enforced per user, per IP, and per tenant.

Rate limits on input volume are enforced per user, per IP address, and per tenant to prevent abuse, resource exhaustion, and automated attack campaigns. Abuse detection identifies patterns — rapid-fire requests, systematic probing, credential stuffing — and triggers throttling or blocking with appropriate logging.

L3 · Operated

Rate limits are enforced and documented; abuse-detection rules are active; rate-limit and abuse events are logged; thresholds are tuned based on observed traffic patterns.

INP-08

Untrusted content source handling

Untrusted content sources (emails, attachments, retrieved documents, web pages, embedded media) are handled as data; embedded instructions are not honored without verification.

Content from untrusted sources — emails, attachments, retrieved documents, web pages, embedded media — is treated as data, not as instruction. Any instructions embedded in such content are not honored without explicit verification through the trust boundary defined in INP-03. This control is the operational complement to INP-02's indirect injection defense.

L3 · Operated

All untrusted content sources are identified and documented; handling rules treat embedded instructions as data; the control is tested as part of adversarial assessment; no embedded instruction from an untrusted source has been executed without verification.

INP-09

Memory and retrieval injection defense

Content retrieved from agent memory, vector stores, embedding indexes, and RAG sources is treated as untrusted data. Instructions discovered in retrieved content are not executed without trust-boundary confirmation.

Content retrieved from agent memory, vector stores, embedding indexes, and RAG sources is treated as untrusted data under INP-02. Instructions discovered in retrieved content are not executed without trust-boundary confirmation per INP-03. Memory poisoning is included in red-team coverage under SPC-03. This control addresses the growing attack surface of persistent and session-scoped memory.

L3 · Operated

All retrieval paths (memory, vector stores, RAG) apply untrusted-data handling; memory poisoning is covered in the most recent red-team assessment; injection attempts in retrieved content are logged and surfaced to the human review queue.

INP-10

Retrieval and memory write authentication

Insertion paths into the agent's retrieval and memory layers are authenticated and authorized. Sources of retrievable content are documented. Adversarial insertion is tested as part of red-team coverage.

Write access to the agent's retrieval and memory layers — the paths by which content enters the corpus the agent can draw on — is authenticated and authorized. Sources of retrievable content are documented per DAT-10. Adversarial insertion is tested as part of red-team coverage to ensure that an attacker cannot plant content that will later influence agent behavior.

L3 · Operated

All write paths to retrieval and memory layers are authenticated; sources of retrievable content are documented; adversarial insertion testing has been performed within the documented cadence; unauthorized write attempts are logged.

Cross-references