Context-Aware Membership Inference Attacks against Pre-trained Large Language Models

1Mohamed bin Zayed University of Artificial Intelligence, 2Brave, 3Imperial College London, 4National University of Singapore

We propose CAMIA (Context-Aware Membership Inference Attack), a simple and effective method for inferring whether a large language model was pre-trained on a given text.

Problem

Large language models (LLMs) are known to memorize parts of their training data, raising concerns about privacy, copyright, and evaluation contamination. Membership Inference Attacks (MIAs) are designed to assess such memorization by testing whether a given data point was part of a model's training set. However, MIAs applied to pre-trained LLMs have been largely ineffective. A key reason is that most algorithms were originally developed for classification models and fail to capture the generative nature of LLMs. While classification models output a single prediction, LLMs generate text token by token, with each prediction shaped by the prefix that comes before it. Consequently, MIAs that simply aggregate outputs over an entire sequence miss the crucial token-level dynamics that underlie memorization in LLMs.
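For concreteness, conventional loss-based attacks reduce a text to one sequence-level statistic. The following minimal sketch (our illustration, not code from the paper; the Hugging Face model name and threshold are placeholders) shows this aggregation, which discards exactly the token-level dynamics discussed above:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint; the paper evaluates Pythia and GPT-Neo models.
MODEL_NAME = "EleutherAI/pythia-2.8b"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME).eval()

@torch.no_grad()
def sequence_loss(text: str) -> float:
    """Average next-token cross-entropy over the whole sequence."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    return model(ids, labels=ids).loss.item()  # one scalar per text

# Classic decision rule: a low loss is taken as evidence of membership.
THRESHOLD = 2.5  # illustrative; real attacks calibrate this on reference data
def loss_attack(text: str) -> bool:
    return sequence_loss(text) < THRESHOLD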

Key insight

Our key insight is that memorization in LLMs is context-dependent.

  • When the prefix provides clear guidance, such as through repetition or overlap with the next tokens, the model can generalize without memorizing.
  • When the prefix is ambiguous or complex, the model becomes uncertain, and in these cases it is more likely to rely on memorized sequences from training.

Thus, an effective MIA should explicitly account for how context shapes predictive uncertainty at the token level, rather than simply relying on overall sequence loss.

Our method: CAMIA

We introduce a context-aware MIA that analyzes how quickly a model moves from uncertainty to confident predictions as it generates text (a code sketch follows the list below). CAMIA:

  • Measures the rate at which uncertainty is resolved across prefixes
  • Adjusts for cases where uncertainty is artificially reduced by repetition
  • Makes token-level decisions rather than relying on a single static threshold
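The full scoring function is specified in the paper; the sketch below is only our illustration of the kind of token-level signals the three steps above describe. The helper names, the slope statistic, and the repeat_weight constant are our assumptions, not CAMIA's actual implementation:

import torch
import torch.nn.functional as F

@torch.no_grad()
def token_losses(model, ids: torch.Tensor) -> torch.Tensor:
    """Per-position next-token cross-entropy: one loss per predicted token."""
    logits = model(ids).logits[:, :-1, :]   # prediction after each prefix
    targets = ids[:, 1:]
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           targets.reshape(-1), reduction="none")

def camia_style_signals(model, ids: torch.Tensor, repeat_weight: float = 0.5):
    losses = token_losses(model, ids)       # uncertainty at each position
    # (1) Rate at which uncertainty is resolved: the least-squares slope of
    # loss over positions; confidence that arrives faster is more suspicious.
    pos = torch.arange(len(losses), dtype=torch.float)
    slope = torch.cov(torch.stack([pos, losses]))[0, 1] / pos.var()
    # (2) Repetition adjustment: a token already present in its prefix can be
    # predicted by copying, so its low loss is down-weighted (constant is ours).
    repeated = torch.tensor(
        [tok in ids[0, : i + 1] for i, tok in enumerate(ids[0, 1:])])
    w = torch.ones_like(losses)
    w[repeated] = repeat_weight
    adjusted_loss = (losses * w).sum() / w.sum()
    # (3) Both signals are computed token by token rather than from a single
    # static sequence-level threshold.
    return adjusted_loss.item(), slope.item()

In the actual attack, such token-level signals are combined into a membership score; see the paper for the exact formulation.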

Results

On the MIMIR benchmark, across Pythia and GPT-Neo models and six domains, CAMIA consistently outperforms existing attacks. For example, when applied to Pythia-2.8B on the ArXiv dataset, it increases the true positive rate from 20.11% to 32.00% at a false positive rate of 1%.
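For reference, "true positive rate at 1% false positive rate" is the standard low-FPR operating point used to report MIA performance. A short sketch of how it is computed from attack scores (the function name is ours):

import numpy as np
from sklearn.metrics import roc_curve

def tpr_at_fpr(member_scores, nonmember_scores, target_fpr=0.01):
    """Highest TPR achievable while the FPR stays at or below target_fpr."""
    labels = np.concatenate([np.ones(len(member_scores)),
                             np.zeros(len(nonmember_scores))])
    scores = np.concatenate([member_scores, nonmember_scores])
    fpr, tpr, _ = roc_curve(labels, scores)  # higher score = more member-like
    return tpr[fpr <= target_fpr].max()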

BibTeX

@inproceedings{chang2024context,
  title={Context-aware membership inference attacks against pre-trained large language models},
  author={Chang, Hongyan and Shamsabadi, Ali Shahin and Katevas, Kleomenis and Haddadi, Hamed and Shokri, Reza},
  booktitle={Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  year={2025}
}