What term describes compromising AI by entering prompts that cause it to behave in unintended ways?

Study for the AAISM Domain 1: AI Governance Program Management Test. Utilize flashcards and multiple-choice questions. Each question includes hints and explanations to prepare you for success!

Multiple Choice

What term describes compromising AI by entering prompts that cause it to behave in unintended ways?

Explanation:
Prompt injection is a prompt-based attack that exploits how AI models interpret user input by injecting instructions into the prompt that steer the model to behave in unintended ways. It works because the model treats the supplied text as guidance, so cleverly crafted prompts can override safeguards, reveal restricted information, or ignore safety rules. This directly captures the idea of compromising AI through prompts. It differs from data poisoning, which corrupts training data to influence behavior over time, and from other terms that describe different threat models. Defenses include keeping system prompts separate from user content, reinforcing guardrails, input validation, and monitoring for jailbreak prompts.

Prompt injection is a prompt-based attack that exploits how AI models interpret user input by injecting instructions into the prompt that steer the model to behave in unintended ways. It works because the model treats the supplied text as guidance, so cleverly crafted prompts can override safeguards, reveal restricted information, or ignore safety rules. This directly captures the idea of compromising AI through prompts. It differs from data poisoning, which corrupts training data to influence behavior over time, and from other terms that describe different threat models. Defenses include keeping system prompts separate from user content, reinforcing guardrails, input validation, and monitoring for jailbreak prompts.

Subscribe

Get the latest from Passetra

You can unsubscribe at any time. Read our privacy policy