Tech Term Decoded: Prompt Leaking

Definition

Prompt leaking is a type of prompt injection which tricks the model to spit out its own prompt. It is the unintentional exposure of sensitive information, system details, or proprietary prompts through the outputs of an AI model. This happens when the model accidentally reveals parts of its prompt or other confidential data in its responses, possibly endangering security or intellectual property [1].

An example scenario of prompt leak [2].

Origin

Prompt leaking is a type of prompt injection which tricks the model to spit out its own prompt. The concept of prompt injection attack (though not known by that name then) was first described by Riley Goodside in his tweet from Sep 12, 2022, when he noticed that if you add some new instruction at the end of a GPT-3 prompt, the bot will follow this instruction even when explicitly instructed not to.

As new LLM abuse methods were discovered over the course of time, prompt injection has been easily embraced as an umbrella term for all attacks against LLMs that involve any kind of prompt manipulation.

Context and Usage

Prompt leaking can be used to impact businesses negatively through several ways such as loss of competitive advantage if proprietary prompts are exposed, potential exposure of business logic and security measures, and reduced value of prompt engineering investments.

For instance, a telemedicine platform's diagnostic AI using prompts with specific symptom evaluation protocols and treatment pathways developed through years of medical research could lose their market advantage if competitors gain access to their diagnostic expertise and sensitive medical decision-making processes that patients assumed were confidential [2].

Why it Matters

Depending on the content of a prompt, a successful prompt leaking attack can copy the system prompt used in the model, possibly giving the attacker access to valuable information, such as sensitive personal information or intellectual property, which may be used to replicate some of the functionality of the model [3].

In Practice

A good example of a real-life case study of prompt leaking can be seen in the case of Chevrolet AI Chatbot. In December 2023, a Chevrolet dealership’s AI chatbot was tricked into offering a $76,000 Tahoe for just $1. A user easily manipulated the AI chatbot’s responses, proving that these customer-facing tools frequently present on websites can be exploited through simple prompts [4].

Tech Term Decoded: Prompt Leaking

Post a Comment

Contact Form