Unlearun Documentation
Unlearun is a comprehensive Python library designed for machine unlearning in large language models (LLMs). Machine unlearning is the process of removing specific knowledge or data from trained models without requiring complete retraining from scratch.
Why Unlearning?
Privacy Compliance
Meet GDPR 'right to be forgotten' requirements
Copyright Protection
Remove copyrighted content from model outputs
AI Safety
Eliminate harmful, dangerous, or malicious knowledge
Model Correction
Fix outdated, incorrect, or biased information
Key Features
- Five State-of-the-Art Methods: Gradient Ascent, Gradient Difference, DPO, RMU, and SimNPO
- Production-Ready: High-level API with HuggingFace integration
- Flexible Architecture: Extensible design for custom methods and metrics
- Comprehensive Evaluation: Perplexity, ROUGE, MIA, and more
Installation
Install unlearun using pip:
pip install unlearun
Requirements
- Python ≥ 3.8
- PyTorch ≥ 1.13.0
- Transformers ≥ 4.30.0
- Datasets ≥ 2.0.0
Quick Start
Get started with unlearning in just a few lines of code:
from unlearun import Unlearning
# 1. Initialize
unlearner = Unlearning(
method="grad_ascent",
model="gpt2",
output_dir="./outputs"
)
# 2. Load data
unlearner.load_data(
forget_data="forget.json",
retain_data="retain.json",
max_length=128
)
# 3. Train
unlearner.run(
batch_size=4,
learning_rate=5e-5,
num_epochs=3
)
# 4. Evaluate
results = unlearner.evaluate(
metrics=["perplexity", "forget_quality", "model_utility"]
)
# 5. Save
unlearner.save_model("./final_model")
Data Format
Your data should be in JSON format with question-answer pairs:
[
{
"question": "What is 2+2?",
"answer": "4"
},
{
"question": "Who is the president?",
"answer": "Joe Biden"
}
]
Unlearning Methods
Gradient Ascent
The simplest and fastest method. Directly maximizes loss on forget data through gradient ascent.
Pros
- ✓ Fast training
- ✓ No reference model
- ✓ Perfect for quick experiments
Cons
- ✗ Unstable
- ✗ Can harm model utility
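The core idea, following the gradient uphill to increase loss on the forget data, can be illustrated on a toy 1-D quadratic (this is not the library's code; the quadratic and learning rate are made up for illustration):

```python
# Toy 1-D illustration: gradient *ascent* on a quadratic "loss" moves the
# parameter away from its minimum, which is the unlearning intuition.
def ascent_step(w, lr=0.1):
    grad = 2 * (w - 3.0)   # d/dw of (w - 3)^2, minimized at w = 3
    return w + lr * grad   # ascend: follow the gradient uphill

w = 2.0
for _ in range(5):
    w = ascent_step(w)
# w has moved away from the minimum, so the "loss" (w - 3)^2 has grown
```

This also shows why the method is unstable: nothing bounds how far the parameters drift, which is what harms model utility.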
unlearner = Unlearning(
method="grad_ascent",
model="gpt2",
output_dir="./outputs"
)
Gradient Difference
Balances forgetting with retention by combining gradient ascent on the forget set with gradient descent on the retain set.
Pros
- ✓ Stable optimization
- ✓ Explicit retain preservation
- ✓ Configurable loss weights
Cons
- ✗ Requires careful tuning
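A plausible sketch of the combined objective, with `gamma` and `alpha` matching the configuration parameters the method exposes (the library's exact loss may differ, e.g. when `retain_loss_type="KL"`):

```python
def grad_diff_loss(forget_loss, retain_loss, gamma=2.0, alpha=1.0):
    # Ascend on the forget loss (negative sign) while descending on the
    # retain loss; gamma and alpha weight the two competing terms.
    return -gamma * forget_loss + alpha * retain_loss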
unlearner = Unlearning(
method="grad_diff",
model="gpt2",
gamma=2.0, # Forget loss weight
alpha=1.0, # Retain loss weight
retain_loss_type="KL"
)
Direct Preference Optimization (DPO)
Preference-based method that steers the model toward alternate answers while avoiding original ones.
Pros
- ✓ Principled approach
- ✓ Preference learning
- ✓ Stable convergence
Cons
- ✗ Requires alternate answers
- ✗ Needs reference model
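The standard DPO objective, written out as a sketch (the function and argument names are illustrative, not the library's API; `beta` matches the parameter below):

```python
import math

def dpo_loss(logp_alt, logp_orig, ref_logp_alt, ref_logp_orig, beta=0.5):
    # Margin between the policy's and the frozen reference model's
    # preference for the alternate answer over the original one.
    margin = beta * ((logp_alt - ref_logp_alt) - (logp_orig - ref_logp_orig))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))
```

Minimizing this pushes probability mass toward the alternate answer relative to the reference model, which is why both alternate answers and a reference model are required.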
unlearner = Unlearning(
method="dpo",
model="gpt2",
beta=0.5
)
Representation Misdirection (RMU)
Steers internal representations toward random directions for hazardous knowledge removal.
Pros
- ✓ Safety-critical unlearning
- ✓ Layer-specific control
- ✓ Adaptive steering
Cons
- ✗ Complex implementation
- ✗ Requires reference model
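A rough sketch of the forget-side objective: pull the target layer's hidden state toward a fixed random direction scaled by the steering coefficient (illustrative only; the library's implementation also constrains retain-set activations via the reference model):

```python
def rmu_forget_loss(hidden, random_dir, steering_coeff=1.5):
    # Mean squared error between the layer's hidden state and a scaled
    # random direction; minimizing it "misdirects" the representation.
    target = [steering_coeff * u for u in random_dir]
    return sum((h - t) ** 2 for h, t in zip(hidden, target)) / len(hidden)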
unlearner = Unlearning(
method="rmu",
model="gpt2-large",
steering_coeff=1.5,
target_layer=12,
adaptive=True
)
Simple NPO (SimNPO)
A margin-based loss that reduces the likelihood of forget examples without requiring a reference model.
Pros
- ✓ No reference model
- ✓ Margin-based optimization
- ✓ Memory efficient
Cons
- ✗ Less principled than DPO
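A rough sketch of a margin-based objective in the SimNPO spirit, using the `delta` (margin) and `beta` (temperature) parameters below; the library's exact formulation may differ:

```python
import math

def simnpo_loss(avg_token_nll, delta=0.5, beta=2.0):
    # The loss shrinks as the length-normalized negative log-likelihood of
    # the forget answer grows past the margin delta, so minimizing it
    # pushes the forget answer's likelihood down. No reference model needed.
    margin = beta * avg_token_nll - delta
    return -(2.0 / beta) * math.log(1.0 / (1.0 + math.exp(-margin)))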
unlearner = Unlearning(
method="simnpo",
model="gpt2",
delta=0.5, # Margin
beta=2.0 # Temperature
)
Data Processing
Forget Set
Data containing information that should be removed from the model. The model should increase perplexity on this data and reduce its ability to generate correct responses.
Retain Set
Data representing knowledge that should be preserved. The model should maintain its performance on this data after unlearning. As a rule of thumb, the retain set should be 2-5x larger than the forget set.
Supported Formats
- JSON files (.json)
- JSONL files (.jsonl)
- HuggingFace datasets
- Python lists
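For example, a pair of minimal files in the documented question/answer JSON format can be written with the standard library (the file names mirror the Quick Start; the records themselves are made up):

```python
import json
import os
import tempfile

# Hypothetical forget/retain records in the question/answer format the
# data loader expects.
forget = [{"question": "What is the secret code?", "answer": "0000"}]
retain = [{"question": "What is 2+2?", "answer": "4"}]

tmpdir = tempfile.mkdtemp()
forget_path = os.path.join(tmpdir, "forget.json")
retain_path = os.path.join(tmpdir, "retain.json")
for path, records in [(forget_path, forget), (retain_path, retain)]:
    with open(path, "w") as f:
        json.dump(records, f, indent=2)
```

These paths can then be passed as `forget_data` and `retain_data` to `load_data`.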
Custom Keys
unlearner.load_data(
forget_data="data.json",
retain_data="data.json",
question_key="input", # Custom key
answer_key="output", # Custom key
max_length=256
)
Evaluation Metrics
Perplexity
Language modeling quality. Higher is better for forget data, lower for retain.
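Perplexity is the exponential of the mean per-token negative log-likelihood; a minimal sketch (not the library's implementation):

```python
import math

def perplexity(token_nlls):
    # exp of the mean per-token negative log-likelihood: a model guessing
    # uniformly over V tokens has perplexity exactly V.
    return math.exp(sum(token_nlls) / len(token_nlls))
```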
Forget Quality
Normalized measure of unlearning success (0-1, higher is better).
Model Utility
Measures retention of useful knowledge (0-1, higher is better).
ROUGE Scores
Text similarity with ground truth for detecting memorization.
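As a crude stand-in for what ROUGE measures, unigram recall against the ground-truth answer already flags memorized outputs (the library presumably uses a full ROUGE implementation):

```python
def rouge1_recall(reference, candidate):
    # Fraction of reference unigrams that also appear in the candidate;
    # 1.0 means every reference word was reproduced.
    ref_tokens = reference.split()
    cand_tokens = set(candidate.split())
    return sum(1 for t in ref_tokens if t in cand_tokens) / len(ref_tokens)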
Verbatim Memorization
Measures exact reproduction of forget data (lower is better).
MIA (AUROC)
Privacy measure via a membership inference attack; AUROC ≈ 0.5 (chance level) is best, meaning the attacker cannot tell forget examples apart from unseen data.
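AUROC here is the probability that a randomly chosen member (forget example) receives a higher attack score than a randomly chosen non-member; a self-contained sketch of that statistic (illustrative, not the library's MIA code):

```python
def auroc(member_scores, nonmember_scores):
    # Probability that a random member outscores a random non-member,
    # counting ties as 0.5 (the normalized Mann-Whitney U statistic).
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            wins += 1.0 if m > n else 0.5 if m == n else 0.0
    return wins / (len(member_scores) * len(nonmember_scores))
```

Perfect separation gives 1.0 (total privacy failure); indistinguishable scores give 0.5.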
Usage
results = unlearner.evaluate(
metrics=[
"perplexity",
"forget_quality",
"model_utility",
"rouge",
"verbatim_memorization",
"mia"
]
)
print(f"Forget Quality: {results['forget_quality']:.3f}")
print(f"Model Utility: {results['model_utility']:.3f}")
Troubleshooting
Model Forgets Too Much
Symptom: low model utility, high forget quality
Solutions:
- Decrease gamma (forget loss weight)
- Increase alpha (retain loss weight)
- Reduce learning rate
- Reduce number of epochs
Model Doesn't Forget Enough
Symptom: low forget quality, high model utility
Solutions:
- Increase gamma
- Decrease alpha
- Increase learning rate
- Try a stronger method (e.g., RMU instead of Gradient Ascent)
Training Instability
Symptom: loss spikes, NaN values
Solutions:
- Reduce learning rate
- Enable gradient clipping
- Use mixed precision training
- Check for data issues
Memory Issues
Symptom: CUDA out-of-memory errors
Solutions:
- Reduce batch size
- Increase gradient accumulation
- Enable gradient checkpointing
- Use smaller model