Introduction:

Here’s the list to provide some materials describing the basic concept of prompt injection, the difference between jailbreaking and prompt injection, and the real-world examples of prompt injection.

Articles & Papers

This article introduces six mainstraem AI cybersecurity attacks with their corresponding consequences and remediations. Among them, prompt injection is placed on the first.

A prompt injection is a type of cyberattack against large language models (LLMs). Hackers disguise malicious inputs as legitimate prompts, manipulating generative AI systems (GenAI) into leaking sensitive data, spreading misinformation, or worse.

In this IBM’s article, it also provides several prevention and mitigation strategies to prompt injection cyberattack, including general input validation, least priviledge policy, and human-in-the-loop on high risk actions.

This article describes the basic concept of prompt injection and its type including direct prompt injection and indirect prompt injection. It also explains the difference between jailbreaking and prompt injection, which is their goals and processes.

Again, prompt injection happens when an attacker embeds malicious instructions into an AI’s input field to override its original programming. So the goal is to make the AI ignore prior instructions and follow the attacker’s command instead.

On the other hand: Jailbreaking is a technique used to remove or bypass an AI system’s built-in safeguards. The goal is to make the model ignore its ethical constraints or safety filters.

In this paper, it designs a novel, closed-box prompt leaking attack framework, called PLeak, to optimize an adversarial query such that when the attacker sends it to a target LLM application, its response reveals its own system prompt.


Remediation & Mitigation

Palo Alto Prompt Injection Solutions