AI Fairness in Practice: Detecting and Mitigating Bias in Machine Learning
Note: This guide is based on fairness research including “Fairness and Machine Learning” by Barocas et al., AI Fairness 360 (IBM Research), Fairlearn (Microsoft), and documented case studies from COMPAS recidivism algorithm analysis. All code examples use established fairness metrics and follow industry best practices for responsible AI.
AI bias has real-world consequences: Amazon’s recruiting tool penalized resumes mentioning “women’s” activities, COMPAS criminal risk assessment showed racial disparities, and healthcare algorithms under-allocated resources to Black patients. As ML systems increasingly make high-stakes decisions about loans, jobs, and parole, detecting and mitigating bias is not just ethical—it’s legally required under regulations like GDPR and the EU AI Act.
This guide demonstrates how to measure fairness with multiple metrics, detect bias using AI Fairness 360 and Fairlearn, apply mitigation strategies (reweighing, threshold optimization, adversarial debiasing), and deploy fair models in production.
Prerequisites
Required Knowledge:
- Python 3.8+ programming
- Machine learning basics (classification, evaluation metrics)
- Understanding of protected attributes (race, gender, age)
- Familiarity with pandas and scikit-learn
Required Tools:
# Install core ML libraries
pip install pandas==2.1.0 numpy==1.24.3 scikit-learn==1.3.0
# Install fairness libraries
pip install aif360==0.5.0 # AI Fairness 360 (IBM)
pip install fairlearn==0.9.0 # Fairlearn (Microsoft)
# Install model interpretation
pip install shap==0.44.0
pip install lime==0.2.0.1
# Install visualization
pip install matplotlib==3.8.0 seaborn==0.12.2
# Install additional tools
pip install imbalanced-learn==0.11.0 # Resampling techniques
Understanding AI Bias
Types of Bias
| Bias Type | Definition | Example |
|---|---|---|
| Historical Bias | Training data reflects existing societal inequalities | Hiring data showing men in senior roles |
| Representation Bias | Training data doesn’t represent real-world distribution | Face recognition trained mostly on white faces |
| Measurement Bias | Proxy variables correlate with protected attributes | ZIP code as proxy for race in lending |
| Aggregation Bias | One model for all groups ignores subgroup differences | Diabetes model trained on adults applied to children |
| Evaluation Bias | Benchmark doesn’t represent deployment population | Speech recognition tested only on native speakers |
| Deployment Bias | System used differently than intended | Risk assessment designed for bail used for sentencing |
Real-World Case Studies
1. COMPAS Recidivism Algorithm (2016)
- Issue: ProPublica analysis found Black defendants labeled higher risk than white defendants with similar criminal histories
- Impact: Used to make bail and sentencing decisions affecting thousands
- Metric Violated: Equalized odds (false positive rates differed by race)
2. Amazon Recruiting Tool (2018)
- Issue: AI penalized resumes mentioning “women’s” (e.g., “women’s chess club”)
- Impact: Systematically disadvantaged female candidates
- Root Cause: Training data from 10 years of resumes (mostly male)
3. Healthcare Resource Allocation (2019)
- Issue: Algorithm used healthcare costs as proxy for health needs, under-allocated to Black patients
- Impact: Reduced access to care for sicker Black patients
- Fix: Changed prediction target from cost to actual health needs
Fairness Metrics
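Four criteria cover most practical audits; they are summarized below and implemented in the class that follows. Here Ŷ is the model's prediction, Y the true label, and A the protected attribute.
| Metric | Condition | Question it answers |
|---|---|---|
| Demographic Parity | Positive prediction rate equal across groups of A | Do all groups receive favorable predictions at the same rate? |
| Equalized Odds | TPR and FPR equal across groups | Are both error rates balanced across groups? |
| Equal Opportunity | TPR equal across groups | Among truly positive cases (Y=1), are groups treated equally? |
| Predictive Parity | Precision (PPV) equal across groups | Does a positive prediction carry the same meaning for every group? |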
Implementing Core Fairness Metrics
# fairness_metrics.py - Calculate fairness metrics
import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix
from typing import Dict, Tuple
class FairnessMetrics:
"""
Calculate fairness metrics for classification models
"""
def __init__(self, y_true: np.ndarray, y_pred: np.ndarray, sensitive_attr: np.ndarray):
"""
Args:
y_true: True labels (0/1)
y_pred: Predicted labels (0/1)
sensitive_attr: Protected attribute (e.g., race, gender)
"""
self.y_true = np.array(y_true)
self.y_pred = np.array(y_pred)
self.sensitive_attr = np.array(sensitive_attr)
# Get unique groups
self.groups = np.unique(sensitive_attr)
def demographic_parity(self) -> Dict[str, float]:
"""
Demographic Parity: P(Ŷ=1 | A=0) = P(Ŷ=1 | A=1)
Positive prediction rate should be same across groups
Returns:
Dict with positive rate per group and disparity
"""
rates = {}
for group in self.groups:
mask = self.sensitive_attr == group
rates[f"group_{group}"] = np.mean(self.y_pred[mask])
        # Disparity ratio: highest positive rate divided by lowest (always >= 1)
        disparity = max(rates.values()) / min(rates.values()) if min(rates.values()) > 0 else float('inf')
        return {
            **rates,
            'disparity_ratio': disparity,
            'is_fair': disparity <= 1.25  # 80% rule (EEOC): lowest group's rate is at least 80% of the highest
        }
}
def equalized_odds(self) -> Dict[str, Dict[str, float]]:
"""
Equalized Odds: P(Ŷ=1 | Y=y, A=a) should be same for all groups
TPR and FPR should be equal across groups
Returns:
Dict with TPR/FPR per group
"""
metrics = {}
for group in self.groups:
mask = self.sensitive_attr == group
            # Confusion matrix with labels pinned so single-class groups still unpack to 4 values
            tn, fp, fn, tp = confusion_matrix(
                self.y_true[mask],
                self.y_pred[mask],
                labels=[0, 1]
            ).ravel()
# Calculate TPR and FPR
tpr = tp / (tp + fn) if (tp + fn) > 0 else 0 # Sensitivity/Recall
fpr = fp / (fp + tn) if (fp + tn) > 0 else 0 # False Positive Rate
metrics[f"group_{group}"] = {
'true_positive_rate': tpr,
'false_positive_rate': fpr
}
# Calculate disparities
tprs = [m['true_positive_rate'] for m in metrics.values()]
fprs = [m['false_positive_rate'] for m in metrics.values()]
metrics['tpr_disparity'] = max(tprs) - min(tprs)
metrics['fpr_disparity'] = max(fprs) - min(fprs)
metrics['is_fair'] = metrics['tpr_disparity'] < 0.1 and metrics['fpr_disparity'] < 0.1
return metrics
def equal_opportunity(self) -> Dict[str, float]:
"""
Equal Opportunity: TPR should be same across groups
Focuses on true positives (those who should get positive outcome)
Returns:
Dict with TPR per group and disparity
"""
tprs = {}
for group in self.groups:
mask = self.sensitive_attr == group
# Only look at positive class (Y=1)
positive_mask = mask & (self.y_true == 1)
if positive_mask.sum() > 0:
tpr = np.mean(self.y_pred[positive_mask])
tprs[f"group_{group}"] = tpr
else:
tprs[f"group_{group}"] = 0.0
# Calculate disparity
disparity = max(tprs.values()) - min(tprs.values())
return {
**tprs,
'disparity': disparity,
'is_fair': disparity < 0.1
}
def predictive_parity(self) -> Dict[str, float]:
"""
Predictive Parity: PPV (precision) should be same across groups
P(Y=1 | Ŷ=1, A=a) should be equal
Returns:
Dict with PPV per group and disparity
"""
ppvs = {}
for group in self.groups:
mask = self.sensitive_attr == group
# Only look at positive predictions
predicted_positive_mask = mask & (self.y_pred == 1)
if predicted_positive_mask.sum() > 0:
ppv = np.mean(self.y_true[predicted_positive_mask])
ppvs[f"group_{group}"] = ppv
else:
ppvs[f"group_{group}"] = 0.0
# Calculate disparity
disparity = max(ppvs.values()) - min(ppvs.values())
return {
**ppvs,
'disparity': disparity,
'is_fair': disparity < 0.1
}
def calculate_all_metrics(self) -> Dict:
"""Calculate all fairness metrics"""
return {
'demographic_parity': self.demographic_parity(),
'equalized_odds': self.equalized_odds(),
'equal_opportunity': self.equal_opportunity(),
'predictive_parity': self.predictive_parity()
}
# Example usage
if __name__ == "__main__":
# Simulated predictions
y_true = np.array([1, 1, 0, 0, 1, 1, 0, 0])
y_pred = np.array([1, 0, 0, 0, 1, 1, 1, 0])
sensitive_attr = np.array(['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'])
metrics = FairnessMetrics(y_true, y_pred, sensitive_attr)
results = metrics.calculate_all_metrics()
print("Fairness Analysis:")
print(f"Demographic Parity: {results['demographic_parity']}")
print(f"Equalized Odds: {results['equalized_odds']}")
print(f"Equal Opportunity: {results['equal_opportunity']}")
Bias Detection with AI Fairness 360
Using AIF360 for Comprehensive Bias Analysis
# aif360_detection.py - Bias detection with AI Fairness 360
from aif360.datasets import BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric, ClassificationMetric
from aif360.algorithms.preprocessing import Reweighing
import pandas as pd
import numpy as np
from typing import Dict
from sklearn.linear_model import LogisticRegression
def detect_bias_aif360(
df: pd.DataFrame,
label_column: str,
protected_attribute: str,
favorable_label: int = 1,
unfavorable_label: int = 0
) -> Dict:
"""
Detect bias in dataset using AIF360
Args:
df: DataFrame with features, label, and protected attribute
label_column: Name of label column
        protected_attribute: Name of protected attribute column (binary-encoded, with 1 as the privileged group)
favorable_label: Favorable outcome value
unfavorable_label: Unfavorable outcome value
Returns:
Dict with bias metrics
"""
# Create AIF360 dataset
dataset = BinaryLabelDataset(
df=df,
label_names=[label_column],
protected_attribute_names=[protected_attribute],
favorable_label=favorable_label,
unfavorable_label=unfavorable_label
)
# Define privileged and unprivileged groups
privileged_groups = [{protected_attribute: 1}]
unprivileged_groups = [{protected_attribute: 0}]
# Calculate bias metrics
metric = BinaryLabelDatasetMetric(
dataset,
unprivileged_groups=unprivileged_groups,
privileged_groups=privileged_groups
)
    consistency = metric.consistency()  # individual-fairness score; returned as a 1-element array
    return {
        'disparate_impact': metric.disparate_impact(),
        'statistical_parity_difference': metric.statistical_parity_difference(),
        'consistency': float(consistency[0])
    }
def train_and_evaluate_fairness(
X_train: pd.DataFrame,
y_train: pd.Series,
X_test: pd.DataFrame,
y_test: pd.Series,
sensitive_attr_train: pd.Series,
sensitive_attr_test: pd.Series
) -> Dict:
"""
Train model and evaluate fairness
Returns:
Dict with accuracy and fairness metrics
"""
# Train model
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
# Predict
y_pred_train = model.predict(X_train)
y_pred_test = model.predict(X_test)
# Create AIF360 datasets
dataset_true = BinaryLabelDataset(
df=pd.DataFrame({
**X_test,
'label': y_test,
'sensitive': sensitive_attr_test
}),
label_names=['label'],
protected_attribute_names=['sensitive']
)
dataset_pred = dataset_true.copy()
dataset_pred.labels = y_pred_test.reshape(-1, 1)
# Calculate fairness metrics
classified_metric = ClassificationMetric(
dataset_true,
dataset_pred,
unprivileged_groups=[{'sensitive': 0}],
privileged_groups=[{'sensitive': 1}]
)
return {
'accuracy': (y_pred_test == y_test).mean(),
'disparate_impact': classified_metric.disparate_impact(),
'equal_opportunity_difference': classified_metric.equal_opportunity_difference(),
'average_odds_difference': classified_metric.average_odds_difference(),
'theil_index': classified_metric.theil_index()
}
Bias Mitigation Strategies
1. Reweighing (Pre-processing)
# reweighing.py - Reweight training examples to remove bias
from aif360.datasets import BinaryLabelDataset
from aif360.algorithms.preprocessing import Reweighing
import pandas as pd
import numpy as np
from typing import Tuple
from sklearn.linear_model import LogisticRegression
def apply_reweighing(
X_train: pd.DataFrame,
y_train: pd.Series,
sensitive_attr_train: pd.Series
) -> Tuple[pd.DataFrame, pd.Series, np.ndarray]:
"""
Apply reweighing to remove bias from training data
Returns:
Reweighted X, y, and sample weights
"""
# Create AIF360 dataset
df_train = pd.DataFrame({
**X_train,
'label': y_train,
'sensitive': sensitive_attr_train
})
dataset = BinaryLabelDataset(
df=df_train,
label_names=['label'],
protected_attribute_names=['sensitive']
)
# Apply reweighing
RW = Reweighing(
unprivileged_groups=[{'sensitive': 0}],
privileged_groups=[{'sensitive': 1}]
)
dataset_transf = RW.fit_transform(dataset)
    # Reweighing leaves X and y unchanged; the bias correction lives in the per-sample weights
    weights = dataset_transf.instance_weights
    return X_train, y_train, weights
# Train model with reweighted data
def train_fair_model_reweighing(
X_train: pd.DataFrame,
y_train: pd.Series,
sensitive_attr_train: pd.Series
):
"""Train logistic regression with reweighed samples"""
X_train_rw, y_train_rw, weights = apply_reweighing(
X_train, y_train, sensitive_attr_train
)
model = LogisticRegression(max_iter=1000)
model.fit(X_train_rw, y_train_rw, sample_weight=weights)
return model
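Reweighing (Kamiran and Calders) assigns each (group, label) combination the weight expected frequency / observed frequency, so that the protected attribute and the label become statistically independent in the weighted training set. A brief usage sketch follows, reusing the FairnessMetrics class from earlier; the train/test splits and sensitive-attribute series are assumed to exist.
# Illustrative usage: compare demographic parity with and without reweighing
from fairness_metrics import FairnessMetrics  # defined earlier in this guide

baseline = LogisticRegression(max_iter=1000).fit(X_train, y_train)
fair_model = train_fair_model_reweighing(X_train, y_train, sensitive_attr_train)

for name, model in [("baseline", baseline), ("reweighed", fair_model)]:
    dp = FairnessMetrics(y_test, model.predict(X_test), sensitive_attr_test).demographic_parity()
    print(name, dp['disparity_ratio'], dp['is_fair'])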
2. Threshold Optimization (Post-processing)
# threshold_optimization.py - Optimize decision thresholds per group
from fairlearn.postprocessing import ThresholdOptimizer
from sklearn.linear_model import LogisticRegression
import numpy as np
import pandas as pd
def optimize_thresholds(
X_train: pd.DataFrame,
y_train: pd.Series,
sensitive_attr_train: pd.Series,
X_test: pd.DataFrame,
y_test: pd.Series,
sensitive_attr_test: pd.Series,
constraint: str = "equalized_odds"
):
"""
Optimize decision thresholds to satisfy fairness constraints
Args:
constraint: "demographic_parity" or "equalized_odds"
"""
# Train base model
base_model = LogisticRegression(max_iter=1000)
base_model.fit(X_train, y_train)
    # ThresholdOptimizer works on the prefit model's predicted probabilities internally
# Optimize thresholds
postprocess_est = ThresholdOptimizer(
estimator=base_model,
constraints=constraint,
objective="balanced_accuracy_score",
prefit=True
)
postprocess_est.fit(X_train, y_train, sensitive_features=sensitive_attr_train)
# Predict with optimized thresholds
y_pred_fair = postprocess_est.predict(X_test, sensitive_features=sensitive_attr_test)
return postprocess_est, y_pred_fair
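A quick before/after check using the FairnessMetrics class from earlier. This is a sketch: it assumes the same train/test splits as above and the fairness_metrics.py module defined earlier in this guide.
# Illustrative comparison of equalized-odds disparities before and after post-processing
from fairness_metrics import FairnessMetrics

postprocessor, y_pred_fair = optimize_thresholds(
    X_train, y_train, sensitive_attr_train,
    X_test, y_test, sensitive_attr_test,
    constraint="equalized_odds",
)

baseline = LogisticRegression(max_iter=1000).fit(X_train, y_train)
before = FairnessMetrics(y_test, baseline.predict(X_test), sensitive_attr_test).equalized_odds()
after = FairnessMetrics(y_test, y_pred_fair, sensitive_attr_test).equalized_odds()
print("TPR disparity:", before['tpr_disparity'], "->", after['tpr_disparity'])
print("FPR disparity:", before['fpr_disparity'], "->", after['fpr_disparity'])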
3. Adversarial Debiasing (In-processing)
# adversarial_debiasing.py - Train fair model with adversarial network
from aif360.algorithms.inprocessing import AdversarialDebiasing
import tensorflow as tf  # required by AdversarialDebiasing; not included in the base installs above
def train_adversarial_debiased_model(
dataset_train,
dataset_test,
privileged_groups,
unprivileged_groups
):
"""
Train model with adversarial debiasing
Model learns to make predictions while adversary tries to predict protected attribute
Args:
dataset_train: AIF360 BinaryLabelDataset for training
dataset_test: AIF360 BinaryLabelDataset for testing
"""
tf.compat.v1.disable_eager_execution()
# Train debiased model
debiased_model = AdversarialDebiasing(
privileged_groups=privileged_groups,
unprivileged_groups=unprivileged_groups,
scope_name='debiased_classifier',
debias=True,
sess=tf.compat.v1.Session()
)
debiased_model.fit(dataset_train)
# Predict
dataset_pred = debiased_model.predict(dataset_test)
return debiased_model, dataset_pred
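A wiring sketch for the function above. It assumes df_train and df_test are numeric DataFrames that already contain binary 'label' and 'sensitive' columns; the names are placeholders.
# Illustrative only: convert DataFrames to AIF360 datasets and train the debiased classifier
from aif360.datasets import BinaryLabelDataset

dataset_train = BinaryLabelDataset(
    df=df_train, label_names=['label'], protected_attribute_names=['sensitive'])
dataset_test = BinaryLabelDataset(
    df=df_test, label_names=['label'], protected_attribute_names=['sensitive'])

model, dataset_pred = train_adversarial_debiased_model(
    dataset_train,
    dataset_test,
    privileged_groups=[{'sensitive': 1}],
    unprivileged_groups=[{'sensitive': 0}],
)
y_pred_fair = dataset_pred.labels.ravel()  # debiased predictions as a flat array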
Production Best Practices
Fairness-Aware Model Selection
# model_selection_fairness.py - Select model balancing accuracy and fairness
from typing import Any, Dict
from sklearn.metrics import accuracy_score
import pandas as pd
from fairness_metrics import FairnessMetrics  # defined earlier in this guide
def evaluate_model_fairness_tradeoff(
    models: Dict[str, Any],
    X_train, y_train, X_test, y_test,
    sensitive_attr_test
) -> pd.DataFrame:
"""
Evaluate multiple models on accuracy AND fairness
Returns:
DataFrame with model performance and fairness metrics
"""
results = []
for name, model in models.items():
# Train model
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
# Calculate fairness
fairness = FairnessMetrics(y_test, y_pred, sensitive_attr_test)
dp = fairness.demographic_parity()
eo = fairness.equalized_odds()
results.append({
'model': name,
'accuracy': accuracy,
'demographic_parity_ratio': dp['disparity_ratio'],
'demographic_parity_fair': dp['is_fair'],
'equalized_odds_fair': eo['is_fair'],
'tpr_disparity': eo['tpr_disparity'],
'fpr_disparity': eo['fpr_disparity']
})
return pd.DataFrame(results).sort_values('accuracy', ascending=False)
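For example, to compare a few common classifiers (the candidate set below is illustrative; any scikit-learn estimator with fit/predict will work):
# Illustrative comparison of candidate models on accuracy and fairness
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

candidates = {
    'logistic_regression': LogisticRegression(max_iter=1000),
    'random_forest': RandomForestClassifier(n_estimators=200, random_state=42),
    'gradient_boosting': GradientBoostingClassifier(random_state=42),
}

comparison = evaluate_model_fairness_tradeoff(
    candidates, X_train, y_train, X_test, y_test, sensitive_attr_test
)
print(comparison[['model', 'accuracy', 'demographic_parity_ratio', 'tpr_disparity']])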
Continuous Fairness Monitoring
# fairness_monitoring.py - Monitor fairness in production
import json
from datetime import datetime
from typing import Dict
import numpy as np
from fairness_metrics import FairnessMetrics  # defined earlier in this guide
class FairnessMonitor:
"""Monitor fairness metrics in production"""
def __init__(self, log_file: str = "fairness_log.jsonl"):
self.log_file = log_file
def log_prediction(
self,
prediction: int,
true_label: int,
sensitive_attr: str,
model_version: str
):
"""Log individual prediction with metadata"""
entry = {
'timestamp': datetime.now().isoformat(),
'prediction': prediction,
'true_label': true_label,
'sensitive_attr': sensitive_attr,
'model_version': model_version
}
with open(self.log_file, 'a') as f:
f.write(json.dumps(entry) + '\n')
def calculate_daily_fairness(self, date: str) -> Dict:
"""Calculate fairness metrics for a specific day"""
# Load predictions for date
predictions = []
with open(self.log_file, 'r') as f:
for line in f:
entry = json.loads(line)
if entry['timestamp'].startswith(date):
predictions.append(entry)
if not predictions:
return {}
# Extract arrays
y_true = np.array([p['true_label'] for p in predictions])
y_pred = np.array([p['prediction'] for p in predictions])
sensitive = np.array([p['sensitive_attr'] for p in predictions])
# Calculate metrics
metrics = FairnessMetrics(y_true, y_pred, sensitive)
return metrics.calculate_all_metrics()
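Typical usage looks like the sketch below; the model version, group labels, and audit date are placeholders.
# Illustrative usage: log each served prediction, then audit a day's traffic
monitor = FairnessMonitor(log_file="fairness_log.jsonl")

# Called from the serving path whenever ground truth becomes available
monitor.log_prediction(prediction=1, true_label=1, sensitive_attr="B", model_version="v1.2.0")
monitor.log_prediction(prediction=0, true_label=1, sensitive_attr="A", model_version="v1.2.0")

# Scheduled job (e.g., a daily cron) that recomputes fairness metrics
daily = monitor.calculate_daily_fairness(date="2024-05-01")
if daily and not daily['demographic_parity']['is_fair']:
    print("ALERT: demographic parity violated")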
Best Practices and Limitations
Fairness Checklist
Pre-Development:
- ✅ Define protected attributes (race, gender, age, disability)
- ✅ Choose appropriate fairness metric for use case
- ✅ Get legal review (GDPR, EEOC, EU AI Act compliance)
- ✅ Conduct impact assessment
Data Collection:
- ✅ Ensure representative sampling across demographics
- ✅ Audit historical data for existing biases
- ✅ Document data provenance and collection methods
- ✅ Consider proxy variables (ZIP code → race); see the audit sketch below
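One way to audit for proxies is to check how well each feature predicts the protected attribute. A minimal sketch follows; the function name and threshold are illustrative, and it assumes numeric features and a discrete protected attribute.
# proxy_audit.py - flag features that carry information about the protected attribute
import pandas as pd
from sklearn.feature_selection import mutual_info_classif

def flag_proxy_candidates(X: pd.DataFrame, sensitive_attr: pd.Series, threshold: float = 0.05) -> pd.Series:
    """Return features whose mutual information with the protected attribute exceeds the threshold."""
    mi = mutual_info_classif(X, sensitive_attr, random_state=42)
    scores = pd.Series(mi, index=X.columns).sort_values(ascending=False)
    return scores[scores > threshold]  # candidates to investigate, not automatic removals

# Example: print(flag_proxy_candidates(X_train, sensitive_attr_train))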
Model Development:
- ✅ Measure fairness across multiple metrics (no single “correct” metric)
- ✅ Evaluate accuracy AND fairness jointly
- ✅ Test for intersectional fairness (race × gender); see the sketch after this list
- ✅ Use interpretability tools (SHAP, LIME)
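Intersectional subgroups can be audited by combining attributes into a single group label before computing metrics. A sketch reusing FairnessMetrics, where the demographics DataFrame and column names are placeholders aligned with the test predictions:
# Illustrative intersectional audit: treat each race x gender combination as its own group
from fairness_metrics import FairnessMetrics

intersectional_attr = (
    demographics['race'].astype(str) + "_" + demographics['gender'].astype(str)
)
metrics = FairnessMetrics(y_test, y_pred, intersectional_attr.values)
print(metrics.equal_opportunity())  # TPR per race x gender subgroup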
Deployment:
- ✅ Monitor fairness metrics continuously
- ✅ Implement feedback loops for bias detection
- ✅ Provide explanations for high-stakes decisions
- ✅ Enable human override for contested decisions
Known Limitations
| Limitation | Impact | Mitigation |
|---|---|---|
| Fairness-Accuracy Tradeoff | Increasing fairness may reduce accuracy | Use Pareto frontier analysis, accept tradeoff for high-stakes domains |
| No Universal Fairness Definition | Demographic parity vs equal opportunity are incompatible | Choose metric appropriate for use case, document choice |
| Simpson’s Paradox | Fair subgroups can be unfair when aggregated | Evaluate fairness at multiple granularities |
| Feedback Loops | Biased predictions create biased future data | Regular model retraining, audit data drift |
| Proxy Discrimination | Non-protected features correlate with protected attributes | Remove correlated features, use adversarial debiasing |
Conclusion and Resources
AI fairness requires continuous effort across the ML lifecycle—from data collection through deployment monitoring. Key takeaways:
- Multiple Metrics: Demographic parity, equalized odds, and equal opportunity measure different notions of fairness
- Mitigation Strategies: Reweighing (pre), adversarial debiasing (in), threshold optimization (post)
- Tools: AI Fairness 360 and Fairlearn provide production-ready implementations
- Monitoring: Fairness metrics must be tracked continuously in production
- Trade-offs: Fairness and accuracy are often in tension—document decisions
Building fair AI systems is both a technical and societal challenge requiring cross-functional collaboration.
Further Resources:
- Fairness and Machine Learning Book: https://fairmlbook.org/ (Barocas, Hardt, Narayanan)
- AI Fairness 360: https://aif360.mybluemix.net/ (IBM Research toolkit)
- Fairlearn: https://fairlearn.org/ (Microsoft fairness toolkit)
- Google’s Responsible AI Practices: https://ai.google/responsibilities/responsible-ai-practices/
- ProPublica COMPAS Analysis: https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
- EU AI Act: https://artificialintelligenceact.eu/