Model Extraction Attacks: An Emerging Threat to AI Systems

SEARCH-LAB Ltd.
April 17, 2023
In the realm of data, a secret threat emerges,
Model extraction, a villainous surge.
With defenses in place, the battle we'll track,
Protecting our models from the extraction attack.

- ChatGPT, April 16, 2023

Introduction

Creating large language models (LLMs) is a resource-intensive and time-consuming task that requires a huge amount of training data (also referred to as corpus or token size, ranging from a few million to several trillion tokens) and matching computing power. Training results in a neural network tuned with billions of weights (parameters). Your home laptop is not enough to replicate these efforts, so the models and their parameters are valuable assets for the companies that created them. Other parties (hackers, competitors) try to get their hands on this proprietary data with various techniques, which we describe in this blog post. Stealing or extracting a model without its parameters would not reproduce the behavior of the model you want to copy, much like a computer without software is not directly usable.

Language model sizes, source: https://lifearchitect.ai/models/

The Growing Threat of Model Extraction Attacks

AI models are increasingly being exposed through Application Programming Interfaces (APIs), enabling businesses to offer AI-driven services to users. However, this also makes the models vulnerable to extraction attacks. In these attacks, adversaries query the exposed models and use the outputs to recreate or approximate the target model, potentially violating intellectual property rights and enabling malicious uses.
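The mechanics are conceptually simple. The sketch below shows only the query phase, assuming a generic prediction endpoint; victim_api and craft_queries are hypothetical placeholders for the exposed model and the attacker's input-generation strategy, not a real client.

```python
import json

# Query phase of a model extraction attack: send crafted inputs to the
# exposed model and record its outputs as a "transfer set".
def collect_transfer_set(victim_api, craft_queries, n_queries,
                         path="transfer_set.jsonl"):
    with open(path, "w") as f:
        for query in craft_queries(n_queries):
            output = victim_api(query)  # labels, probabilities, or generated text
            f.write(json.dumps({"input": query, "output": output}) + "\n")

# A surrogate model is then trained on these (input, output) pairs so that
# it imitates the victim's behavior.
```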

For example, researchers demonstrated this vulnerability in a study involving BERT-based APIs. BERT is a widely used transformer-based model for natural language understanding tasks. The researchers managed to extract the BERT-based model by querying the APIs with carefully crafted inputs, and the extracted model's performance was almost identical to that of the original.

Model extraction setup, source: https://arxiv.org/abs/1910.12366
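In practice, the collected outputs serve as soft labels for a smaller "student" model, which is trained to imitate the victim. Below is a minimal sketch of such a distillation-style objective in PyTorch, assuming the victim API returned class probabilities for each query; the temperature parameter and the student network producing logits are illustrative assumptions, not details taken from the study.

```python
import torch
import torch.nn.functional as F

# Distillation step of an extraction attack: the student is trained to
# match the probability distribution returned by the victim API for the
# same queries, rather than any ground-truth labels.
def extraction_loss(student_logits: torch.Tensor,
                    victim_probs: torch.Tensor,
                    temperature: float = 1.0) -> torch.Tensor:
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # KL divergence between the victim's distribution and the student's.
    return F.kl_div(student_log_probs, victim_probs, reduction="batchmean")
```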

Model extraction attacks are not limited to text-based AI models. Recent studies have shown that self-supervised speech models, which learn representations from large amounts of unlabeled audio data, are also susceptible to these attacks. By analyzing the outputs of a self-supervised speech model, researchers were able to reconstruct the model's internal representations and use them to create a similar model.
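One way to read "reconstructing internal representations" is as a representation-matching objective: the local copy is trained so that its embeddings for a given audio clip approximate those exposed by the victim. A minimal sketch, assuming both models produce frame-level embedding tensors of the same shape; the choice of loss here is illustrative, not taken from the cited study.

```python
import torch
import torch.nn.functional as F

# Representation-matching objective for extracting a speech model:
# the local student is trained so that its frame-level embeddings for a
# given audio clip approximate those returned by the victim model.
# Both tensors are assumed to have shape (batch, frames, dim).
def representation_loss(student_reprs: torch.Tensor,
                        victim_reprs: torch.Tensor) -> torch.Tensor:
    # A simple L1 distance between embeddings is one common choice.
    return F.l1_loss(student_reprs, victim_reprs)
```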

In another study, researchers focused on stealing machine learning models via prediction APIs. They demonstrated that attackers could use a limited number of API queries to steal the underlying model or gain valuable information about its structure and parameters. This type of attack not only compromises the model's confidentiality and integrity but also threatens the privacy of user data processed by the model.
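A classic illustration from the prediction-API literature is the equation-solving attack on a simple model. If the API returns full confidence scores for a logistic regression, the logit of each score is linear in the unknown weights, so d + 1 well-chosen queries are enough to recover them exactly. The sketch below simulates both sides with a toy victim; the model and data are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "victim" logistic regression behind a prediction API:
# it returns the confidence score sigmoid(w.x + b) for each query.
d = 5
w_true = rng.normal(size=d)
b_true = 0.3

def victim_api(x):
    return 1.0 / (1.0 + np.exp(-(x @ w_true + b_true)))

# Equation-solving attack: d + 1 queries suffice, because
# logit(score) = w.x + b is linear in the unknown parameters.
X = rng.normal(size=(d + 1, d))
scores = victim_api(X)
logits = np.log(scores / (1.0 - scores))

# Solve [X | 1] @ [w; b] = logits for the unknown parameters.
A = np.hstack([X, np.ones((d + 1, 1))])
params = np.linalg.solve(A, logits)
w_stolen, b_stolen = params[:-1], params[-1]

print(np.allclose(w_stolen, w_true), np.isclose(b_stolen, b_true))  # True True
```

Real deployed models are rarely this simple, but the example shows why returning full-precision confidence scores leaks so much information per query.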

Real-World Example

Although model extraction mostly happens behind the curtain, there are scientific examples that show the real possibilities and challenges involved. One such example is the GPT4All project, which aimed to create a free and open-source model to accelerate open LLM research.

They fine-tuned the LLaMA 7B (7 billion parameters) model with one million prompt-response pairs gathered from the GPT-3.5-Turbo OpenAI API. The total API cost to collect the training data was $500, and the final training run cost $100. Reaching that final run took some trial and error, which cost about another $800, bringing the total to roughly $1,400. Although this was only a fine-tuning of the LLaMA 7B model to obtain an assistant-style chatbot, the extraction and training costs remained affordable.
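To make the data-collection step concrete, here is a rough sketch of how a prompt-response corpus can be harvested from a chat-completion API. It assumes the pre-1.0 openai Python client that was current at the time; the prompt source and output file name are illustrative, and this is not the actual GPT4All pipeline.

```python
import json
import openai  # pre-1.0 client, current when GPT4All collected its data

openai.api_key = "sk-..."  # placeholder

# Harvest (prompt, response) pairs from the chat-completion API and store
# them as JSON lines for later fine-tuning of a local model.
def harvest(prompts, out_path="pairs.jsonl"):
    with open(out_path, "w") as f:
        for prompt in prompts:
            resp = openai.ChatCompletion.create(
                model="gpt-3.5-turbo",
                messages=[{"role": "user", "content": prompt}],
            )
            answer = resp["choices"][0]["message"]["content"]
            f.write(json.dumps({"prompt": prompt, "response": answer}) + "\n")
```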

Real-World Implications

Besides the business implications of model leaks, another consequence can be seen in the case of Meta's AI language model LLaMA, designed to generate human-like text, whose parameters leaked online. The leaked parameters have since been used for various purposes, including generating fake news, disinformation, and offensive content. While Meta has attempted to limit the model's misuse, the incident highlights the potential impact of stolen models.

Countermeasures to Mitigate Model Extraction Attacks

To protect AI systems from model extraction attacks, organizations can adopt several countermeasures:

- Access restriction: limit who can query the model and how often, for example through authentication, rate limiting, and monitoring for suspicious query patterns.
- Membership inference: test whether a suspect model's behavior indicates it was trained on outputs of your own model, which helps detect and prove extraction after the fact.
- Model hardening: reduce the information leaked per query, for example by returning only the top label instead of full probability scores, rounding or perturbing confidence values, or watermarking outputs so that stolen copies can be identified (a minimal sketch follows below).
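As a concrete example of the model-hardening item above, here is a minimal sketch of output hardening for a classification API; the function name and response format are made up for illustration.

```python
import numpy as np

# One simple model-hardening measure: degrade the granularity of what the
# API returns. Instead of full-precision probabilities, return only the
# top-1 label, or probabilities rounded coarsely enough to frustrate
# equation-solving and distillation-style extraction.
def harden_output(probs: np.ndarray, mode: str = "top1", decimals: int = 1):
    if mode == "top1":
        return {"label": int(np.argmax(probs))}
    # "rounded": coarse probabilities leak far less information per query
    return {"probs": np.round(probs, decimals).tolist()}
```

Coarser outputs make both equation-solving and distillation-style extraction more expensive, at the cost of giving legitimate users less detailed information.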

Conclusion

In conclusion, the rising prevalence of AI models and their increasing integration into various applications has led to a heightened risk of model extraction attacks. These attacks not only threaten the intellectual property of AI models' creators but also pose risks to user privacy and can potentially lead to the spread of misinformation and other malicious activities. It is therefore essential for organizations to invest in robust countermeasures to protect their AI models and ensure the responsible and secure deployment of these technologies. By adopting strategies such as access restriction, membership inference, and model hardening, organizations can lower the risks associated with model extraction attacks and maintain the confidentiality of their AI systems. Since this area of research is still in its infancy, more work is needed, and it is likely to bring fascinating advancements in the coming years.

Sources:

Model Extraction of BERT-based APIs

Model Extraction Attack against Self-Supervised Speech Models

GPT4All Technical Report

GitHub: facebookresearch
