Concepto Estándar advanced EN

Prompt Injection

The #1 security risk for LLM applications (OWASP LLM01): crafted input that overrides the model's intended instructions, exfiltrates data, or triggers unintended actions.

Definition

Prompt injection is an attack where malicious input manipulates an LLM into ignoring its original instructions. It is the top entry in the OWASP Top 10 for LLM Applications because it is easy to attempt and hard to fully prevent.

Two flavors

  • Direct injection — the attacker types adversarial instructions straight into the prompt (“ignore your previous instructions and…”).
  • Indirect injection — malicious instructions are hidden in content the model later ingests (a web page, a document in a RAG corpus, an email). This is especially dangerous for agents with tool access.

Mitigations (defense in depth)

  • Strong system prompts + input/output guardrails.
  • Privilege separation: never let model output trigger irreversible actions without checks.
  • Treat retrieved/3rd-party content as untrusted.
  • Continuous monitoring of attack success rate in production.

Grafo de conocimiento