AI systems are voracious consumers of personal data — training on medical records, financial transactions, private messages, and browsing history. Data privacy in AI addresses three concerns: (1) Training data privacy — preventing models from memorising and leaking personal information. (2) Inference privacy — protecting the data of individuals whose information is used for predictions. (3) Model inversion — the risk that model outputs reveal training data. Differential privacy provides mathematically rigorous privacy guarantees. GDPR and CCPA create legal frameworks. The tension between AI capability (which often improves with more data) and privacy (which requires limiting data) is a central AI ethics challenge.
The model memorisation problem
Large language models memorise training data. GPT-family models have been shown to reproduce verbatim passages from New York Times articles seen during training, and to repeat individuals' personal information scraped from web pages. Carlini et al. (2021) demonstrated that, by carefully crafting prompts, they could extract memorised personal information from GPT-2, including names, email addresses, and phone numbers. This is not a bug: it is an inherent risk of training on real-world data containing private information.
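A crude memorisation probe can be sketched as a verbatim-overlap check: prompt the model with a prefix of a known document and measure how much of its continuation appears word-for-word in that document. The `generate` callable below is a hypothetical stand-in for a real model API; the toy "model" simply regurgitates its corpus to illustrate the signal.

```python
def verbatim_overlap(continuation: str, training_text: str) -> int:
    """Length of the longest prefix of `continuation` that appears
    verbatim in `training_text` (a crude memorisation signal)."""
    for n in range(len(continuation), 0, -1):
        if continuation[:n] in training_text:
            return n
    return 0

def looks_memorised(generate, prompt: str, training_text: str,
                    threshold: int = 50) -> bool:
    """Flag a prompt whose continuation overlaps the training
    text for at least `threshold` characters."""
    return verbatim_overlap(generate(prompt), training_text) >= threshold

# Toy stand-in for a model that regurgitates its training text
corpus = "The quick brown fox jumps over the lazy dog near the river bank."
fake_generate = lambda prompt: corpus[len(prompt):]

print(looks_memorised(fake_generate, "The quick brown ", corpus, threshold=20))
# → True: the continuation is a verbatim slice of the corpus
```

Real extraction attacks are subtler (they sample many continuations and rank them by perplexity), but the pass/fail criterion is the same: long verbatim matches against known training text.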
Differential Privacy — mathematical privacy guarantee
A randomised mechanism M is (ε, δ)-differentially private if, for any two datasets D and D′ differing in one individual, and any set of outputs S: Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D′) ∈ S] + δ. ε (epsilon) is the privacy budget: smaller means more noise and stronger privacy. δ is the probability that the guarantee fails. The Gaussian mechanism achieves this by adding calibrated noise to gradients during training.
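The Gaussian mechanism can be sketched for a simple scalar query (a mean). Per-record clipping bounds the sensitivity, then the noise scale is set from the classical bound σ = Δ · √(2 ln(1.25/δ)) / ε, which is valid for ε ≤ 1. The function name and parameters here are illustrative, not a library API.

```python
import math
import numpy as np

def gaussian_mechanism_mean(values, clip=1.0, epsilon=0.5, delta=1e-5,
                            rng=np.random.default_rng(0)):
    """(ε, δ)-DP estimate of a mean: per-record clipping bounds the
    sensitivity, then calibrated Gaussian noise is added."""
    values = np.clip(np.asarray(values, dtype=float), -clip, clip)
    n = len(values)
    # Changing one record moves the clipped mean by at most 2*clip/n
    sensitivity = 2 * clip / n
    sigma = sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / epsilon
    return values.mean() + rng.normal(0.0, sigma)

private_mean = gaussian_mechanism_mean(np.ones(10_000) * 0.3)
print(round(private_mean, 3))  # close to 0.3; noise shrinks as n grows
```

Note the scaling: the noise standard deviation is proportional to 1/n, so large datasets pay almost nothing in accuracy while every individual keeps the same guarantee.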
Differential Privacy in ML training with Opacus
```python
# Opacus: differentially private training for PyTorch
# pip install opacus
import torch
import torch.nn as nn
from opacus import PrivacyEngine
from opacus.validators import ModuleValidator

# Standard model and training setup
model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 2))
model = ModuleValidator.fix(model)  # Fix any incompatible layers
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
loss_fn = nn.CrossEntropyLoss()

# Example DataLoader (substitute your own dataset)
train_loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(torch.randn(512, 10),
                                   torch.randint(0, 2, (512,))),
    batch_size=64,
)

# Wrap with PrivacyEngine — adds Gaussian noise to gradients
privacy_engine = PrivacyEngine()
model, optimizer, data_loader = privacy_engine.make_private_with_epsilon(
    module=model,
    optimizer=optimizer,
    data_loader=train_loader,
    epochs=10,
    target_epsilon=1.0,   # Privacy budget: lower = more private
    target_delta=1e-5,    # Probability of privacy failure
    max_grad_norm=1.0,    # Clip gradients to bound sensitivity
)

# Training loop is identical to non-private training
for epoch in range(10):
    for X, y in data_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        optimizer.step()
    # Report current privacy expenditure
    epsilon = privacy_engine.get_epsilon(delta=1e-5)
    print(f"Epoch {epoch}: ε = {epsilon:.3f} (target: 1.0)")

# Privacy guarantee: knowing the model output tells an adversary almost nothing
# about whether any specific individual was in the training data
```
```python
# ── Membership Inference Attack (testing if data was used for training) ──
def membership_inference_attack(model, target_point, threshold=0.7):
    """
    Simple membership inference: if the model is very confident on this
    point, it may have memorised it (i.e. it was in the training set).
    """
    with torch.no_grad():
        probs = torch.softmax(model(target_point), dim=-1)
        confidence = probs.max().item()
    return confidence > threshold, confidence

# With differential privacy: confidence should be roughly equal for
# training and non-training data — membership inference becomes hard
```

GDPR compliance for AI systems
| GDPR Requirement | AI implication | Technical implementation |
|---|---|---|
| Lawful basis for processing | Training data must have consent or legitimate interest | Data provenance tracking, consent management systems |
| Purpose limitation | Data collected for one purpose cannot be used for another | Separate training pipelines, use-case documentation |
| Data minimisation | Collect only what is necessary | Feature selection, anonymisation before training |
| Right to erasure (Right to be forgotten) | Delete a person's data from model | Machine unlearning algorithms (active research area) |
| Right to explanation (Article 22) | Explain automated decisions | XAI systems (SHAP, LIME, rule-based fallbacks) |
| Data Protection Impact Assessment | Assess privacy risks before high-risk AI deployment | Pre-deployment risk assessment documentation |
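The first three rows of the table (lawful basis, purpose limitation, data minimisation) can be combined into a single pre-training filter. This is a minimal sketch under an assumed record schema: the `Record` class, the `consented_purposes` field, and the `REQUIRED_FEATURES` allow-list are all hypothetical names for illustration, not part of any real consent-management system.

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    user_id: str
    features: dict
    consented_purposes: set = field(default_factory=set)

# Data minimisation: fields the model actually needs (illustrative)
REQUIRED_FEATURES = {"age_band", "region"}

def training_view(records, purpose: str):
    """Keep only records whose consent covers this pipeline's declared
    purpose (purpose limitation), then strip unneeded fields."""
    return [
        {k: v for k, v in r.features.items() if k in REQUIRED_FEATURES}
        for r in records
        if purpose in r.consented_purposes
    ]

records = [
    Record("u1", {"age_band": "30-39", "region": "EU", "email": "a@x.io"},
           {"model_training"}),
    Record("u2", {"age_band": "40-49", "region": "US", "email": "b@y.io"},
           {"analytics"}),
]
print(training_view(records, "model_training"))
# → [{'age_band': '30-39', 'region': 'EU'}]: u2 is excluded (no consent
#   for this purpose) and u1's email never reaches the training pipeline
```

The important design point is that filtering happens before data enters the training pipeline, so downstream artifacts (checkpoints, logs, caches) never contain unconsented or unnecessary fields.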
Machine unlearning — the hardest GDPR problem
Right to erasure (GDPR Article 17) requires that personal data be deleted on request. For traditional databases this is straightforward. For ML models trained on that data: the information is distributed across millions of parameters in ways that cannot be directly "deleted." Machine unlearning (removing a specific training example's influence from a trained model) is an active research problem with no fully satisfactory solution yet. Current approaches: retrain from scratch (impractical for LLMs), approximate unlearning (perturb weights), or certified unlearning (bounds on residual information).
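One practical exact-unlearning strategy is sharded retraining in the style of SISA (Bourtoule et al., 2021): train K independent models on disjoint data shards and aggregate their predictions; on an erasure request, retrain only the shard that contained the deleted record. The sketch below uses toy nearest-centroid shard models to keep it self-contained; it illustrates the sharding idea, not the paper's exact protocol.

```python
import numpy as np

class ShardedUnlearner:
    """K shard models; erasure retrains one shard, not the whole ensemble."""
    def __init__(self, k=4):
        self.k = k
        self.shards = [[] for _ in range(k)]   # (x, y) pairs per shard
        self.centroids = [None] * k            # per-shard class centroids

    def _fit_shard(self, i):
        xs = np.array([x for x, _ in self.shards[i]])
        ys = np.array([y for _, y in self.shards[i]])
        self.centroids[i] = {c: xs[ys == c].mean(axis=0) for c in set(ys)}

    def fit(self, X, y):
        for j, (x, label) in enumerate(zip(X, y)):
            self.shards[j % self.k].append((x, label))
        for i in range(self.k):
            self._fit_shard(i)

    def forget(self, x):
        """Erase one example; only its shard is retrained."""
        for i, shard in enumerate(self.shards):
            kept = [(xs, ys) for xs, ys in shard if not np.allclose(xs, x)]
            if len(kept) != len(shard):
                self.shards[i] = kept
                self._fit_shard(i)
                return i   # index of the single retrained shard

    def predict(self, x):
        votes = [min(cents, key=lambda c: np.linalg.norm(x - cents[c]))
                 for cents in self.centroids]
        return max(set(votes), key=votes.count)
```

The trade-off is the usual one: erasure cost drops from "retrain everything" to "retrain one shard", at some accuracy cost because each model sees only 1/K of the data. For LLM-scale models even one shard is expensive, which is why approximate and certified unlearning remain active research.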
Practice questions
- What is a membership inference attack? (Answer: An adversary queries a trained model to determine whether a specific data point was in the training set. If the model is more confident on training data than held-out data, the adversary can infer membership. Used to reveal whether medical records, private messages, or financial data were used for training.)
- Differential privacy with ε=0.1 vs ε=10 — which is more private and what is the cost? (Answer: ε=0.1 is far more private (smaller privacy budget = more noise added). Cost: more noise = lower model accuracy. ε=0.1 makes the model nearly indistinguishable regardless of any individual's data but may reduce accuracy significantly. ε=10 provides weak privacy guarantees but minimal accuracy loss. Typical production: ε=1-3.)
- An AI company trains a model on user emails without explicit consent but argues they have "legitimate interest" under GDPR. Is this valid? (Answer: Disputed. GDPR's legitimate interest basis requires the company's interest to not be overridden by the individual's privacy interests. Training commercial AI on private personal communications would likely fail this test. Most regulators and courts have ruled that legitimate interest does not justify using personal data for commercial AI training without consent.)
- What makes "right to erasure" for ML models technically difficult? (Answer: Neural network weights are holistically trained — no single data point has a dedicated location. The influence of one training example is spread across all parameters. You cannot "delete" a data point from a trained model. Retraining from scratch is the only complete solution — infeasible for LLMs. Active research: certified unlearning, gradient-based approximate unlearning.)
- Max grad norm clipping in differential privacy serves what purpose? (Answer: Gradient clipping bounds the sensitivity of each training step to any single data point. Without clipping, one outlier data point can have unbounded influence on gradient updates. Clipping to max_grad_norm=1.0 ensures each sample contributes at most 1.0 to the gradient norm — allowing precisely calibrated noise to achieve (ε,δ)-privacy.)
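The clipping mechanics in the last answer can be shown directly: per-sample gradients are rescaled so no sample's contribution exceeds max_grad_norm, then averaged, then Gaussian noise proportional to that bound is added. This is the core of one DP-SGD aggregation step (a numpy sketch; `noise_multiplier` is the usual DP-SGD parameter name, but the function itself is illustrative).

```python
import numpy as np

def dp_sgd_step(per_sample_grads, max_grad_norm=1.0, noise_multiplier=1.1,
                rng=np.random.default_rng(0)):
    """One DP-SGD gradient aggregation: clip each sample's gradient to
    max_grad_norm, average, then add calibrated Gaussian noise."""
    clipped = []
    for g in per_sample_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, max_grad_norm / max(norm, 1e-12)))
    mean = np.mean(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * max_grad_norm / len(clipped),
                       size=mean.shape)
    return mean + noise

# An outlier gradient of norm 100 is scaled down to norm 1.0, so it
# contributes no more than any well-behaved sample
grads = [np.array([0.1, 0.2]), np.array([0.3, -0.1]), np.array([100.0, 0.0])]
print(dp_sgd_step(grads, noise_multiplier=0.0))  # noiseless: mean of clipped grads
```

Without the clipping step, the noise could not be calibrated at all: a single unbounded gradient would dominate the average regardless of how much noise was added.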
On LumiChats
Anthropic publicly commits to not training on user conversations by default and provides opt-out mechanisms. Understanding differential privacy explains why privacy guarantees are mathematically bounded rather than absolute — and why truly private AI requires fundamentally different training approaches than standard gradient descent.