Unlearun Documentation

Unlearun is a comprehensive Python library designed for machine unlearning in large language models (LLMs). Machine unlearning is the process of removing specific knowledge or data from trained models without requiring complete retraining from scratch.

Why Unlearning?

Privacy Compliance

Meet GDPR 'right to be forgotten' requirements

Copyright Protection

Remove copyrighted content from model outputs

AI Safety

Eliminate harmful, dangerous, or malicious knowledge

Model Correction

Fix outdated, incorrect, or biased information

Key Features

  • Five State-of-the-Art Methods: Gradient Ascent, Gradient Difference, DPO, RMU, and SimNPO
  • Production-Ready: High-level API with HuggingFace integration
  • Flexible Architecture: Extensible design for custom methods and metrics
  • Comprehensive Evaluation: Perplexity, ROUGE, MIA, and more

Installation

Install unlearun using pip:

pip install unlearun

Requirements

  • Python ≥ 3.8
  • PyTorch ≥ 1.13.0
  • Transformers ≥ 4.30.0
  • Datasets ≥ 2.0.0

Quick Start

Get started with unlearning in just a few lines of code:

from unlearun import Unlearning

# 1. Initialize
unlearner = Unlearning(
    method="grad_ascent",
    model="gpt2",
    output_dir="./outputs"
)

# 2. Load data
unlearner.load_data(
    forget_data="forget.json",
    retain_data="retain.json",
    max_length=128
)

# 3. Train
unlearner.run(
    batch_size=4,
    learning_rate=5e-5,
    num_epochs=3
)

# 4. Evaluate
results = unlearner.evaluate(
    metrics=["perplexity", "forget_quality", "model_utility"]
)

# 5. Save
unlearner.save_model("./final_model")

Data Format

Your data should be in JSON format with question-answer pairs:

[
    {
        "question": "What is 2+2?",
        "answer": "4"
    },
    {
        "question": "Who is the president?",
        "answer": "Joe Biden"
    }
]
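Files in this format can also be generated programmatically with the standard library. A minimal sketch (the pair contents are the examples from above):

```python
import json

# Question-answer pairs in the format the library expects; a file in this
# format can be passed as forget_data or retain_data.
forget_pairs = [
    {"question": "What is 2+2?", "answer": "4"},
    {"question": "Who is the president?", "answer": "Joe Biden"},
]

# Serialize exactly as in the example above
payload = json.dumps(forget_pairs, indent=4)

# Round-trip sanity check: every record has the expected keys
loaded = json.loads(payload)
assert all({"question", "answer"} <= set(pair) for pair in loaded)
```

Write `payload` to a file such as forget.json and pass that path to load_data.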

Unlearning Methods

Gradient Ascent

The simplest and fastest method: it directly maximizes the loss on the forget data via gradient ascent.

Pros

  • Fast training
  • No reference model
  • Perfect for quick experiments

Cons

  • Unstable
  • Can harm model utility
unlearner = Unlearning(
    method="grad_ascent",
    model="gpt2",
    output_dir="./outputs"
)
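The core idea can be shown with a toy one-parameter example (a conceptual sketch only, not the library's internals): performing gradient descent on the negated loss is gradient ascent on the loss, driving the loss on forget data upward.

```python
# Toy illustration: descending on -loss ascends on loss.

def forget_loss(w):
    # Stand-in for the model's loss on the forget set
    return (w - 3.0) ** 2

def grad(f, w, eps=1e-5):
    # Numerical derivative
    return (f(w + eps) - f(w - eps)) / (2 * eps)

w, lr = 2.5, 0.1
for _ in range(20):
    w -= lr * grad(lambda x: -forget_loss(x), w)  # descend on -loss

# The loss on the forget data has increased
assert forget_loss(w) > forget_loss(2.5)
```

This also hints at the method's instability: with nothing anchoring the parameters, the loss can grow without bound, which is why utility can degrade.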

Gradient Difference

Balances forgetting with retention by combining gradient ascent and descent on different data sets.

Pros

  • Stable optimization
  • Explicit retain preservation
  • Configurable loss weights

Cons

  • Requires careful tuning
unlearner = Unlearning(
    method="grad_diff",
    model="gpt2",
    gamma=2.0,  # Forget loss weight
    alpha=1.0,  # Retain loss weight
    retain_loss_type="KL"
)
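Conceptually, the method minimizes a weighted combination of the two objectives: ascent on the forget loss (scaled by gamma) plus descent on the retain loss (scaled by alpha). The library's exact formulation may differ; a sketch of the weighting:

```python
# Conceptual combined objective of gradient difference (sketch only).
# Minimizing it ascends on forget data and descends on retain data.

def grad_diff_objective(forget_loss, retain_loss, gamma=2.0, alpha=1.0):
    return -gamma * forget_loss + alpha * retain_loss

# Larger gamma weights forgetting more heavily
assert grad_diff_objective(1.0, 1.0, gamma=2.0) < grad_diff_objective(1.0, 1.0, gamma=1.0)
```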

Direct Preference Optimization (DPO)

Preference-based method that steers the model toward alternate answers while avoiding original ones.

Pros

  • Principled approach
  • Preference learning
  • Stable convergence

Cons

  • Requires alternate answers
  • Needs reference model
unlearner = Unlearning(
    method="dpo",
    model="gpt2",
    beta=0.5
)
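For intuition, the standard DPO objective on a single preference pair looks like the sketch below (the library's implementation details may differ). Each log ratio is the policy's log-probability of an answer minus the frozen reference model's:

```python
import math

# Standard DPO loss on one preference pair (conceptual sketch).
def dpo_loss(log_ratio_preferred, log_ratio_rejected, beta=0.5):
    margin = beta * (log_ratio_preferred - log_ratio_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))

# Loss shrinks as the policy favors the preferred (alternate) answer
# over the original one
assert dpo_loss(2.0, -2.0) < dpo_loss(0.0, 0.0)
```

For unlearning, the "preferred" completion is the alternate answer and the "rejected" completion is the original answer being removed.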

Representation Misdirection (RMU)

Steers internal representations toward random directions for hazardous knowledge removal.

Pros

  • Safety-critical unlearning
  • Layer-specific control
  • Adaptive steering

Cons

  • Complex implementation
  • Requires reference model
unlearner = Unlearning(
    method="rmu",
    model="gpt2-large",
    steering_coeff=1.5,
    target_layer=12,
    adaptive=True
)

Simple NPO (SimNPO)

Margin-based loss approach that reduces the likelihood of forget examples without requiring a reference model.

Pros

  • No reference model
  • Margin-based optimization
  • Memory efficient

Cons

  • Less principled than DPO
unlearner = Unlearning(
    method="simnpo",
    model="gpt2",
    delta=0.5,  # Margin
    beta=2.0    # Temperature
)

Data Processing

Forget Set

Data containing the information that should be removed from the model. After unlearning, the model's perplexity on this data should increase and its ability to generate the original responses should degrade.

Retain Set

Data representing knowledge that should be preserved; the model should maintain its performance on it after unlearning. The retain set should typically be 2-5x larger than the forget set.

Supported Formats

  • JSON files (.json)
  • JSONL files (.jsonl)
  • HuggingFace datasets
  • Python lists

Custom Keys

unlearner.load_data(
    forget_data="data.json",
    retain_data="data.json",
    question_key="input",  # Custom key
    answer_key="output",   # Custom key
    max_length=256
)

Evaluation Metrics

Perplexity

Language modeling quality. Higher is better for forget data, lower for retain.
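Perplexity is the exponential of the mean per-token negative log-likelihood, which is why successful unlearning drives it up on forget data. A minimal sketch:

```python
import math

# Perplexity = exp(mean per-token negative log-likelihood)
def perplexity(token_nlls):
    return math.exp(sum(token_nlls) / len(token_nlls))

# After unlearning, per-token NLL on forget data rises, so perplexity rises
# (the NLL values here are illustrative, not real model outputs)
before = perplexity([0.5, 0.7, 0.6])
after = perplexity([2.1, 2.5, 2.3])
assert after > before
```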

Forget Quality

Normalized measure of unlearning success (0-1, higher is better).

Model Utility

Measures retention of useful knowledge (0-1, higher is better).

ROUGE Scores

Text similarity with ground truth for detecting memorization.

Verbatim Memorization

Measures exact reproduction of forget data (lower is better).

MIA (AUROC)

Privacy measured via a membership inference attack; AUROC ≈ 0.5 (chance level) is best.
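To see why chance level is the privacy ideal, here is a minimal pairwise AUROC computation (a sketch, not the library's evaluator): when attack scores for members and non-members are indistinguishable, the attack cannot rank members above non-members.

```python
# Pairwise AUROC: fraction of (member, non-member) pairs the attack ranks
# correctly, counting ties as half.
def auroc(member_scores, nonmember_scores):
    wins = ties = 0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1
            elif m == n:
                ties += 1
    total = len(member_scores) * len(nonmember_scores)
    return (wins + 0.5 * ties) / total

# Indistinguishable score distributions -> AUROC of 0.5 (best for privacy)
assert auroc([0.4, 0.6], [0.6, 0.4]) == 0.5
```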

Usage

results = unlearner.evaluate(
    metrics=[
        "perplexity",
        "forget_quality",
        "model_utility",
        "rouge",
        "verbatim_memorization",
        "mia"
    ]
)

print(f"Forget Quality: {results['forget_quality']:.3f}")
print(f"Model Utility: {results['model_utility']:.3f}")

Troubleshooting

Model Forgets Too Much

Symptom: low model utility, high forget quality

Solutions:

  • Decrease gamma (forget loss weight)
  • Increase alpha (retain loss weight)
  • Reduce learning rate
  • Reduce number of epochs

Model Doesn't Forget Enough

Symptom: low forget quality, high model utility

Solutions:

  • Increase gamma
  • Decrease alpha
  • Increase learning rate
  • Try a stronger method (e.g., RMU instead of Gradient Ascent)

Training Instability

Symptom: loss spikes, NaN values

Solutions:

  • Reduce learning rate
  • Enable gradient clipping
  • Use mixed precision training
  • Check for data issues

Memory Issues

Symptom: CUDA out of memory

Solutions:

  • Reduce batch size
  • Increase gradient accumulation
  • Enable gradient checkpointing
  • Use smaller model
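Gradient accumulation trades memory for extra forward/backward passes: gradients from several small micro-batches are averaged before a single optimizer step, so a micro-batch of 1 with 4 accumulation steps behaves like a batch of 4 at a fraction of the peak memory. A conceptual sketch:

```python
# Gradient accumulation (sketch on plain floats, not real tensors):
# average gradients over accum_steps micro-batches, then step once.
def accumulate(micro_batch_grads, accum_steps=4):
    total = [0.0] * len(micro_batch_grads[0])
    for grads in micro_batch_grads[:accum_steps]:
        total = [t + g / accum_steps for t, g in zip(total, grads)]
    return total  # the gradient used for a single optimizer step

# Four micro-batch gradients average into one update
g = accumulate([[4.0], [8.0], [0.0], [4.0]])
assert g == [4.0]
```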

Additional Resources