Unlearun Documentation
Unlearun is a comprehensive Python library designed for machine unlearning in large language models (LLMs). Machine unlearning is the process of removing specific knowledge or data from trained models without requiring complete retraining from scratch.
Why Unlearning?
Privacy Compliance
Meet GDPR 'right to be forgotten' requirements
Copyright Protection
Remove copyrighted content from model outputs
AI Safety
Eliminate harmful, dangerous, or malicious knowledge
Model Correction
Fix outdated, incorrect, or biased information
Key Features
- Five State-of-the-Art Methods: Gradient Ascent, Gradient Difference, DPO, RMU, and SimNPO
- Production-Ready: High-level API with HuggingFace integration
- Flexible Architecture: Extensible design for custom methods and metrics
- Comprehensive Evaluation: Perplexity, ROUGE, MIA, and more
Installation
Install unlearun using pip:
pip install unlearun
Requirements
- Python ≥ 3.8
- PyTorch ≥ 1.13.0
- Transformers ≥ 4.30.0
- Datasets ≥ 2.0.0
Quick Start
Get started with unlearning in just a few lines of code:
from unlearun import Unlearning
# 1. Initialize
unlearner = Unlearning(
method="grad_ascent",
model="gpt2",
output_dir="./outputs"
)
# 2. Load data
unlearner.load_data(
forget_data="forget.json",
retain_data="retain.json",
max_length=128
)
# 3. Train
unlearner.run(
batch_size=4,
learning_rate=5e-5,
num_epochs=3
)
# 4. Evaluate
results = unlearner.evaluate(
metrics=["perplexity", "forget_quality", "model_utility"]
)
# 5. Save
unlearner.save_model("./final_model")
Data Format
Your data should be in JSON format with question-answer pairs:
[
{
"question": "What is 2+2?",
"answer": "4"
},
{
"question": "Who is the president?",
"answer": "Joe Biden"
}
]
Unlearning Methods
Gradient Ascent
The simplest and fastest method. Directly maximizes loss on forget data through gradient ascent.
Pros
- ✓ Fast training
- ✓ No reference model
- ✓ Perfect for quick experiments
Cons
- ✗ Unstable
- ✗ Can harm model utility
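The core idea, following the gradient uphill to increase loss on the forget data, can be illustrated on a toy 1-D quadratic (this is not the library's code; the quadratic and learning rate are made up for illustration):

```python
# Toy 1-D illustration: gradient *ascent* on a quadratic "loss" moves the
# parameter away from its minimum, which is the unlearning intuition.
def ascent_step(w, lr=0.1):
    grad = 2 * (w - 3.0)   # d/dw of (w - 3)^2, minimized at w = 3
    return w + lr * grad   # ascend: follow the gradient uphill

w = 2.0
for _ in range(5):
    w = ascent_step(w)
# w has moved away from the minimum, so the "loss" (w - 3)^2 has grown
```

This also shows why the method is unstable: nothing bounds how far the parameters drift, which is what harms model utility.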
unlearner = Unlearning(
method="grad_ascent",
model="gpt2",
output_dir="./outputs"
)
Gradient Difference
Balances forgetting with retention by combining gradient ascent on the forget set with gradient descent on the retain set.
Pros
- ✓ Stable optimization
- ✓ Explicit retain preservation
- ✓ Configurable loss weights
Cons
- ✗ Requires careful tuning
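A plausible sketch of the combined objective, with `gamma` and `alpha` matching the configuration parameters the method exposes (the library's exact loss may differ, e.g. when `retain_loss_type="KL"`):

```python
def grad_diff_loss(forget_loss, retain_loss, gamma=2.0, alpha=1.0):
    # Ascend on the forget loss (negative sign) while descending on the
    # retain loss; gamma and alpha weight the two competing terms.
    return -gamma * forget_loss + alpha * retain_loss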
unlearner = Unlearning(
method="grad_diff",
model="gpt2",
gamma=2.0, # Forget loss weight
alpha=1.0, # Retain loss weight
retain_loss_type="KL"
)
Direct Preference Optimization (DPO)
Preference-based method that steers the model toward alternate answers while avoiding original ones.
Pros
- ✓ Principled approach
- ✓ Preference learning
- ✓ Stable convergence
Cons
- ✗ Requires alternate answers
- ✗ Needs reference model
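The standard DPO objective, written out as a sketch (the function and argument names are illustrative, not the library's API; `beta` matches the parameter below):

```python
import math

def dpo_loss(logp_alt, logp_orig, ref_logp_alt, ref_logp_orig, beta=0.5):
    # Margin between the policy's and the frozen reference model's
    # preference for the alternate answer over the original one.
    margin = beta * ((logp_alt - ref_logp_alt) - (logp_orig - ref_logp_orig))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))
```

Minimizing this pushes probability mass toward the alternate answer relative to the reference model, which is why both alternate answers and a reference model are required.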
unlearner = Unlearning(
method="dpo",
model="gpt2",
beta=0.5
)
Representation Misdirection (RMU)
Steers internal representations toward random directions for hazardous knowledge removal.
Pros
- ✓ Safety-critical unlearning
- ✓ Layer-specific control
- ✓ Adaptive steering
Cons
- ✗ Complex implementation
- ✗ Requires reference model
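A rough sketch of the forget-side objective: pull the target layer's hidden state toward a fixed random direction scaled by the steering coefficient (illustrative only; the library's implementation also constrains retain-set activations via the reference model):

```python
def rmu_forget_loss(hidden, random_dir, steering_coeff=1.5):
    # Mean squared error between the layer's hidden state and a scaled
    # random direction; minimizing it "misdirects" the representation.
    target = [steering_coeff * u for u in random_dir]
    return sum((h - t) ** 2 for h, t in zip(hidden, target)) / len(hidden)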
unlearner = Unlearning(
method="rmu",
model="gpt2-large",
steering_coeff=1.5,
target_layer=12,
adaptive=True
)
Simple NPO (SimNPO)
A margin-based loss that reduces the likelihood of forget examples without requiring a reference model.
Pros
- ✓ No reference model
- ✓ Margin-based optimization
- ✓ Memory efficient
Cons
- ✗ Less principled than DPO
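A rough sketch of a margin-based objective in the SimNPO spirit, using the `delta` (margin) and `beta` (temperature) parameters below; the library's exact formulation may differ:

```python
import math

def simnpo_loss(avg_token_nll, delta=0.5, beta=2.0):
    # The loss shrinks as the length-normalized negative log-likelihood of
    # the forget answer grows past the margin delta, so minimizing it
    # pushes the forget answer's likelihood down. No reference model needed.
    margin = beta * avg_token_nll - delta
    return -(2.0 / beta) * math.log(1.0 / (1.0 + math.exp(-margin)))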
unlearner = Unlearning(
method="simnpo",
model="gpt2",
delta=0.5, # Margin
beta=2.0 # Temperature
)
Data Processing
Forget Set
Data containing information that should be removed from the model. The model should increase perplexity on this data and reduce its ability to generate correct responses.
Retain Set
Data representing knowledge that should be preserved. The model should maintain its performance on this data after unlearning. As a rule of thumb, the retain set should be 2-5x larger than the forget set.
Supported Formats
- JSON files (.json)
- JSONL files (.jsonl)
- HuggingFace datasets
- Python lists
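For example, a pair of minimal files in the documented question/answer JSON format can be written with the standard library (the file names mirror the Quick Start; the records themselves are made up):

```python
import json
import os
import tempfile

# Hypothetical forget/retain records in the question/answer format the
# data loader expects.
forget = [{"question": "What is the secret code?", "answer": "0000"}]
retain = [{"question": "What is 2+2?", "answer": "4"}]

tmpdir = tempfile.mkdtemp()
forget_path = os.path.join(tmpdir, "forget.json")
retain_path = os.path.join(tmpdir, "retain.json")
for path, records in [(forget_path, forget), (retain_path, retain)]:
    with open(path, "w") as f:
        json.dump(records, f, indent=2)
```

These paths can then be passed as `forget_data` and `retain_data` to `load_data`.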
Custom Keys
unlearner.load_data(
forget_data="data.json",
retain_data="data.json",
question_key="input", # Custom key
answer_key="output", # Custom key
max_length=256
)
Evaluation Metrics
Perplexity
Language modeling quality. Higher is better for forget data, lower for retain.
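Perplexity is the exponential of the mean per-token negative log-likelihood; a minimal sketch (not the library's implementation):

```python
import math

def perplexity(token_nlls):
    # exp of the mean per-token negative log-likelihood: a model guessing
    # uniformly over V tokens has perplexity exactly V.
    return math.exp(sum(token_nlls) / len(token_nlls))
```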
Forget Quality
Normalized measure of unlearning success (0-1, higher is better).
Model Utility
Measures retention of useful knowledge (0-1, higher is better).
ROUGE Scores
Text similarity with ground truth for detecting memorization.
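As a crude stand-in for what ROUGE measures, unigram recall against the ground-truth answer already flags memorized outputs (the library presumably uses a full ROUGE implementation):

```python
def rouge1_recall(reference, candidate):
    # Fraction of reference unigrams that also appear in the candidate;
    # 1.0 means every reference word was reproduced.
    ref_tokens = reference.split()
    cand_tokens = set(candidate.split())
    return sum(1 for t in ref_tokens if t in cand_tokens) / len(ref_tokens)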
Verbatim Memorization
Measures exact reproduction of forget data (lower is better).
MIA (AUROC)
Privacy measure via a membership inference attack; AUROC ≈ 0.5 (chance level) is best, meaning the attacker cannot tell forget examples apart from unseen data.
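AUROC here is the probability that a randomly chosen member (forget example) receives a higher attack score than a randomly chosen non-member; a self-contained sketch of that statistic (illustrative, not the library's MIA code):

```python
def auroc(member_scores, nonmember_scores):
    # Probability that a random member outscores a random non-member,
    # counting ties as 0.5 (the normalized Mann-Whitney U statistic).
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            wins += 1.0 if m > n else 0.5 if m == n else 0.0
    return wins / (len(member_scores) * len(nonmember_scores))
```

Perfect separation gives 1.0 (total privacy failure); indistinguishable scores give 0.5.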
Usage
results = unlearner.evaluate(
metrics=[
"perplexity",
"forget_quality",
"model_utility",
"rouge",
"verbatim_memorization",
"mia"
]
)
print(f"Forget Quality: {results['forget_quality']:.3f}")
print(f"Model Utility: {results['model_utility']:.3f}")
Troubleshooting
Model Forgets Too Much
Symptom: low model utility, high forget quality
Solutions:
- Decrease gamma (forget loss weight)
- Increase alpha (retain loss weight)
- Reduce learning rate
- Reduce number of epochs
Model Doesn't Forget Enough
Symptom: low forget quality, high model utility
Solutions:
- Increase gamma
- Decrease alpha
- Increase learning rate
- Try a stronger method (e.g., RMU instead of Gradient Ascent)
Training Instability
Symptom: loss spikes, NaN values
Solutions:
- Reduce learning rate
- Enable gradient clipping
- Use mixed precision training
- Check for data issues
Memory Issues
Symptom: CUDA out-of-memory errors
Solutions:
- Reduce batch size
- Increase gradient accumulation
- Enable gradient checkpointing
- Use smaller model