AI Fairness in Practice: Detecting and Mitigating Bias in Machine Learning

Note: This guide is based on fairness research including “Fairness and Machine Learning” by Barocas et al., AI Fairness 360 (IBM Research), Fairlearn (Microsoft), and documented case studies from COMPAS recidivism algorithm analysis. All code examples use established fairness metrics and follow industry best practices for responsible AI.

AI bias has real-world consequences: Amazon’s recruiting tool penalized resumes mentioning “women’s” activities, COMPAS criminal risk assessment showed racial disparities, and healthcare algorithms under-allocated resources to Black patients. As ML systems increasingly make high-stakes decisions about loans, jobs, and parole, detecting and mitigating bias is not only an ethical obligation but increasingly a legal one under regulations such as the GDPR and the EU AI Act.

This guide demonstrates measuring fairness with multiple metrics, detecting bias using AI Fairness 360 and Fairlearn, implementing mitigation strategies (reweighing, adversarial debiasing), and deploying fair models in production.

Prerequisites

Required Knowledge:

  • Python 3.8+ programming
  • Machine learning basics (classification, evaluation metrics)
  • Understanding of protected attributes (race, gender, age)
  • Familiarity with pandas and scikit-learn

Required Tools:

# Install core ML libraries
pip install pandas==2.1.0 numpy==1.24.3 scikit-learn==1.3.0

# Install fairness libraries
pip install aif360==0.5.0  # AI Fairness 360 (IBM)
pip install fairlearn==0.9.0  # Fairlearn (Microsoft)

# Install model interpretation
pip install shap==0.44.0
pip install lime==0.2.0.1

# Install visualization
pip install matplotlib==3.8.0 seaborn==0.12.2

# Install additional tools
pip install imbalanced-learn==0.11.0  # Resampling techniques

Understanding AI Bias

Types of Bias

  • Historical Bias: Training data reflects existing societal inequalities (e.g., hiring data showing men in senior roles)
  • Representation Bias: Training data doesn’t represent the real-world distribution (e.g., face recognition trained mostly on white faces)
  • Measurement Bias: Proxy variables correlate with protected attributes (e.g., ZIP code as a proxy for race in lending; see the sketch below)
  • Aggregation Bias: A single model for all groups ignores subgroup differences (e.g., a diabetes model trained on adults applied to children)
  • Evaluation Bias: The benchmark doesn’t represent the deployment population (e.g., speech recognition tested only on native speakers)
  • Deployment Bias: The system is used differently than intended (e.g., a risk assessment designed for bail used for sentencing)
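
A quick screening sketch for the proxy-variable problem (illustrative only; it assumes a pandas DataFrame with a binary protected-attribute column, and the helper name flag_potential_proxies is hypothetical): score each feature by how well it alone predicts the protected attribute, and manually review anything that scores high.

# proxy_check.py - Flag features that strongly predict a protected attribute (illustrative sketch)
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def flag_potential_proxies(df: pd.DataFrame, protected_attribute: str, threshold: float = 0.7) -> pd.Series:
    """Score each feature by how well it alone predicts the protected attribute (ROC AUC)."""
    scores = {}
    y = df[protected_attribute]
    for col in df.columns.drop(protected_attribute):
        # One-hot encode so categorical candidates (e.g. ZIP code) can be scored too
        X = pd.get_dummies(df[[col]], drop_first=True)
        clf = RandomForestClassifier(n_estimators=50, random_state=0)
        scores[col] = cross_val_score(clf, X, y, cv=3, scoring='roc_auc').mean()

    scores = pd.Series(scores).sort_values(ascending=False)
    return scores[scores > threshold]  # features above the threshold deserve a closer look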

Real-World Case Studies

1. COMPAS Recidivism Algorithm (2016)

  • Issue: ProPublica’s analysis found that Black defendants were more likely than white defendants with similar criminal histories to be incorrectly labeled high risk
  • Impact: Used to make bail and sentencing decisions affecting thousands
  • Metric Violated: Equalized odds (false positive rates differed by race)

2. Amazon Recruiting Tool (2018)

  • Issue: AI penalized resumes mentioning “women’s” (e.g., “women’s chess club”)
  • Impact: Systematically disadvantaged female candidates
  • Root Cause: Training data from 10 years of resumes (mostly male)

3. Healthcare Resource Allocation (2019)

  • Issue: Algorithm used healthcare costs as proxy for health needs, under-allocated to Black patients
  • Impact: Reduced access to care for sicker Black patients
  • Fix: Changed prediction target from cost to actual health needs

Fairness Metrics

Implementing Core Fairness Metrics

# fairness_metrics.py - Calculate fairness metrics
import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix
from typing import Dict, Tuple

class FairnessMetrics:
    """
    Calculate fairness metrics for classification models
    """

    def __init__(self, y_true: np.ndarray, y_pred: np.ndarray, sensitive_attr: np.ndarray):
        """
        Args:
            y_true: True labels (0/1)
            y_pred: Predicted labels (0/1)
            sensitive_attr: Protected attribute (e.g., race, gender)
        """
        self.y_true = np.array(y_true)
        self.y_pred = np.array(y_pred)
        self.sensitive_attr = np.array(sensitive_attr)

        # Get unique groups
        self.groups = np.unique(sensitive_attr)

    def demographic_parity(self) -> Dict[str, float]:
        """
        Demographic Parity: P(Ŷ=1 | A=0) = P(Ŷ=1 | A=1)
        Positive prediction rate should be same across groups

        Returns:
            Dict with positive rate per group and disparity
        """
        rates = {}
        for group in self.groups:
            mask = self.sensitive_attr == group
            rates[f"group_{group}"] = np.mean(self.y_pred[mask])

        # Calculate maximum disparity
        disparity = max(rates.values()) / min(rates.values()) if min(rates.values()) > 0 else float('inf')

        return {
            **rates,
            'disparity_ratio': disparity,
            'is_fair': disparity <= 1.25  # 80% rule (EEOC guideline): min group rate is at least 80% of max
        }

    def equalized_odds(self) -> Dict[str, Dict[str, float]]:
        """
        Equalized Odds: P(Ŷ=1 | Y=y, A=a) should be same for all groups
        TPR and FPR should be equal across groups

        Returns:
            Dict with TPR/FPR per group
        """
        metrics = {}

        for group in self.groups:
            mask = self.sensitive_attr == group

            # Get confusion matrix (force a 2x2 layout even if a class is absent in this group)
            tn, fp, fn, tp = confusion_matrix(
                self.y_true[mask],
                self.y_pred[mask],
                labels=[0, 1]
            ).ravel()

            # Calculate TPR and FPR
            tpr = tp / (tp + fn) if (tp + fn) > 0 else 0  # Sensitivity/Recall
            fpr = fp / (fp + tn) if (fp + tn) > 0 else 0  # False Positive Rate

            metrics[f"group_{group}"] = {
                'true_positive_rate': tpr,
                'false_positive_rate': fpr
            }

        # Calculate disparities
        tprs = [m['true_positive_rate'] for m in metrics.values()]
        fprs = [m['false_positive_rate'] for m in metrics.values()]

        metrics['tpr_disparity'] = max(tprs) - min(tprs)
        metrics['fpr_disparity'] = max(fprs) - min(fprs)
        metrics['is_fair'] = metrics['tpr_disparity'] < 0.1 and metrics['fpr_disparity'] < 0.1

        return metrics

    def equal_opportunity(self) -> Dict[str, float]:
        """
        Equal Opportunity: TPR should be same across groups
        Focuses on true positives (those who should get positive outcome)

        Returns:
            Dict with TPR per group and disparity
        """
        tprs = {}

        for group in self.groups:
            mask = self.sensitive_attr == group

            # Only look at positive class (Y=1)
            positive_mask = mask & (self.y_true == 1)

            if positive_mask.sum() > 0:
                tpr = np.mean(self.y_pred[positive_mask])
                tprs[f"group_{group}"] = tpr
            else:
                tprs[f"group_{group}"] = 0.0

        # Calculate disparity
        disparity = max(tprs.values()) - min(tprs.values())

        return {
            **tprs,
            'disparity': disparity,
            'is_fair': disparity < 0.1
        }

    def predictive_parity(self) -> Dict[str, float]:
        """
        Predictive Parity: PPV (precision) should be same across groups
        P(Y=1 | Ŷ=1, A=a) should be equal

        Returns:
            Dict with PPV per group and disparity
        """
        ppvs = {}

        for group in self.groups:
            mask = self.sensitive_attr == group

            # Only look at positive predictions
            predicted_positive_mask = mask & (self.y_pred == 1)

            if predicted_positive_mask.sum() > 0:
                ppv = np.mean(self.y_true[predicted_positive_mask])
                ppvs[f"group_{group}"] = ppv
            else:
                ppvs[f"group_{group}"] = 0.0

        # Calculate disparity
        disparity = max(ppvs.values()) - min(ppvs.values())

        return {
            **ppvs,
            'disparity': disparity,
            'is_fair': disparity < 0.1
        }

    def calculate_all_metrics(self) -> Dict:
        """Calculate all fairness metrics"""
        return {
            'demographic_parity': self.demographic_parity(),
            'equalized_odds': self.equalized_odds(),
            'equal_opportunity': self.equal_opportunity(),
            'predictive_parity': self.predictive_parity()
        }


# Example usage
if __name__ == "__main__":
    # Simulated predictions
    y_true = np.array([1, 1, 0, 0, 1, 1, 0, 0])
    y_pred = np.array([1, 0, 0, 0, 1, 1, 1, 0])
    sensitive_attr = np.array(['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'])

    metrics = FairnessMetrics(y_true, y_pred, sensitive_attr)
    results = metrics.calculate_all_metrics()

    print("Fairness Analysis:")
    print(f"Demographic Parity: {results['demographic_parity']}")
    print(f"Equalized Odds: {results['equalized_odds']}")
    print(f"Equal Opportunity: {results['equal_opportunity']}")

Bias Detection with AI Fairness 360

Using AIF360 for Comprehensive Bias Analysis

# aif360_detection.py - Bias detection with AI Fairness 360
from aif360.datasets import BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric, ClassificationMetric
import pandas as pd
import numpy as np
from sklearn.linear_model import LogisticRegression
from typing import Dict

def detect_bias_aif360(
    df: pd.DataFrame,
    label_column: str,
    protected_attribute: str,
    favorable_label: int = 1,
    unfavorable_label: int = 0
) -> Dict:
    """
    Detect bias in dataset using AIF360

    Args:
        df: DataFrame with features, label, and protected attribute
        label_column: Name of label column
        protected_attribute: Name of protected attribute column
        favorable_label: Favorable outcome value
        unfavorable_label: Unfavorable outcome value

    Returns:
        Dict with bias metrics
    """
    # Create AIF360 dataset
    dataset = BinaryLabelDataset(
        df=df,
        label_names=[label_column],
        protected_attribute_names=[protected_attribute],
        favorable_label=favorable_label,
        unfavorable_label=unfavorable_label
    )

    # Define privileged and unprivileged groups
    # (assumes the protected attribute is binary-encoded: 1 = privileged, 0 = unprivileged)
    privileged_groups = [{protected_attribute: 1}]
    unprivileged_groups = [{protected_attribute: 0}]

    # Calculate bias metrics
    metric = BinaryLabelDatasetMetric(
        dataset,
        unprivileged_groups=unprivileged_groups,
        privileged_groups=privileged_groups
    )

    return {
        'disparate_impact': metric.disparate_impact(),
        'statistical_parity_difference': metric.statistical_parity_difference(),
        'consistency': metric.consistency()[0] if len(metric.consistency()) > 0 else None
    }


def train_and_evaluate_fairness(
    X_train: pd.DataFrame,
    y_train: pd.Series,
    X_test: pd.DataFrame,
    y_test: pd.Series,
    sensitive_attr_train: pd.Series,
    sensitive_attr_test: pd.Series
) -> Dict:
    """
    Train model and evaluate fairness

    Returns:
        Dict with accuracy and fairness metrics
    """
    # Train model
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)

    # Predict on the test set
    y_pred_test = model.predict(X_test)

    # Create AIF360 datasets
    dataset_true = BinaryLabelDataset(
        df=pd.DataFrame({
            **X_test,
            'label': y_test,
            'sensitive': sensitive_attr_test
        }),
        label_names=['label'],
        protected_attribute_names=['sensitive']
    )

    dataset_pred = dataset_true.copy()
    dataset_pred.labels = y_pred_test.reshape(-1, 1)

    # Calculate fairness metrics
    classified_metric = ClassificationMetric(
        dataset_true,
        dataset_pred,
        unprivileged_groups=[{'sensitive': 0}],
        privileged_groups=[{'sensitive': 1}]
    )

    return {
        'accuracy': (y_pred_test == y_test).mean(),
        'disparate_impact': classified_metric.disparate_impact(),
        'equal_opportunity_difference': classified_metric.equal_opportunity_difference(),
        'average_odds_difference': classified_metric.average_odds_difference(),
        'theil_index': classified_metric.theil_index()
    }
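
A minimal usage sketch with synthetic data (the feature names, the random labels, and the 0/1 encoding of the protected attribute are illustrative assumptions, not a real dataset):

# Example usage (synthetic data; illustrative only)
if __name__ == "__main__":
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(42)
    n = 1000
    df = pd.DataFrame({
        'feature_1': rng.normal(size=n),
        'feature_2': rng.normal(size=n),
        'sensitive': rng.integers(0, 2, size=n),  # 1 = privileged, 0 = unprivileged
        'label': rng.integers(0, 2, size=n)
    })

    # Dataset-level bias metrics (before any model is trained)
    print("Dataset bias:", detect_bias_aif360(df, label_column='label', protected_attribute='sensitive'))

    # Model-level fairness metrics
    X = df[['feature_1', 'feature_2']]
    X_tr, X_te, y_tr, y_te, s_tr, s_te = train_test_split(
        X, df['label'], df['sensitive'], test_size=0.3, random_state=0
    )
    print("Model fairness:", train_and_evaluate_fairness(X_tr, y_tr, X_te, y_te, s_tr, s_te))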

Bias Mitigation Strategies

1. Reweighing (Pre-processing)

# reweighing.py - Reweight training examples to remove bias
from aif360.datasets import BinaryLabelDataset
from aif360.algorithms.preprocessing import Reweighing
import pandas as pd
import numpy as np
from sklearn.linear_model import LogisticRegression
from typing import Tuple

def apply_reweighing(
    X_train: pd.DataFrame,
    y_train: pd.Series,
    sensitive_attr_train: pd.Series
) -> Tuple[pd.DataFrame, pd.Series, np.ndarray]:
    """
    Apply reweighing to remove bias from training data

    Returns:
        Reweighted X, y, and sample weights
    """
    # Create AIF360 dataset
    df_train = pd.DataFrame({
        **X_train,
        'label': y_train,
        'sensitive': sensitive_attr_train
    })

    dataset = BinaryLabelDataset(
        df=df_train,
        label_names=['label'],
        protected_attribute_names=['sensitive']
    )

    # Apply reweighing
    RW = Reweighing(
        unprivileged_groups=[{'sensitive': 0}],
        privileged_groups=[{'sensitive': 1}]
    )

    dataset_transf = RW.fit_transform(dataset)

    # Extract the per-sample weights computed by Reweighing
    weights = dataset_transf.instance_weights

    return X_train, y_train, weights


# Train model with reweighted data
def train_fair_model_reweighing(
    X_train: pd.DataFrame,
    y_train: pd.Series,
    sensitive_attr_train: pd.Series
):
    """Train logistic regression with reweighed samples"""
    X_train_rw, y_train_rw, weights = apply_reweighing(
        X_train, y_train, sensitive_attr_train
    )

    model = LogisticRegression(max_iter=1000)
    model.fit(X_train_rw, y_train_rw, sample_weight=weights)

    return model
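
Under the hood, Reweighing (Kamiran and Calders) gives each sample the weight P(A=a) · P(Y=y) / P(A=a, Y=y), which makes the protected attribute and the label statistically independent in the weighted data. A small sketch of that arithmetic, for intuition only (the helper compute_reweighing_weights is illustrative, not part of AIF360):

# reweighing_weights_sketch.py - What reweighing computes, independent of AIF360 (intuition only)
import pandas as pd

def compute_reweighing_weights(sensitive: pd.Series, labels: pd.Series) -> pd.Series:
    """Weight for each sample: P(A=a) * P(Y=y) / P(A=a, Y=y)."""
    p_a = sensitive.value_counts(normalize=True)            # P(A=a)
    p_y = labels.value_counts(normalize=True)               # P(Y=y)
    p_ay = pd.crosstab(sensitive, labels, normalize=True)   # joint P(A=a, Y=y)

    weights = [p_a[a] * p_y[y] / p_ay.loc[a, y] for a, y in zip(sensitive, labels)]
    return pd.Series(weights, index=sensitive.index)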

2. Threshold Optimization (Post-processing)

# threshold_optimization.py - Optimize decision thresholds per group
from fairlearn.postprocessing import ThresholdOptimizer
from sklearn.linear_model import LogisticRegression
import numpy as np
import pandas as pd

def optimize_thresholds(
    X_train: pd.DataFrame,
    y_train: pd.Series,
    sensitive_attr_train: pd.Series,
    X_test: pd.DataFrame,
    y_test: pd.Series,
    sensitive_attr_test: pd.Series,
    constraint: str = "equalized_odds"
):
    """
    Optimize decision thresholds to satisfy fairness constraints

    Args:
        constraint: "demographic_parity" or "equalized_odds"
    """
    # Train base model
    base_model = LogisticRegression(max_iter=1000)
    base_model.fit(X_train, y_train)

    # Learn group-specific decision thresholds on top of the already-fitted
    # base model (prefit=True means the estimator is not re-trained)
    postprocess_est = ThresholdOptimizer(
        estimator=base_model,
        constraints=constraint,
        objective="balanced_accuracy_score",
        prefit=True
    )

    postprocess_est.fit(X_train, y_train, sensitive_features=sensitive_attr_train)

    # Predict with optimized thresholds
    y_pred_fair = postprocess_est.predict(X_test, sensitive_features=sensitive_attr_test)

    return postprocess_est, y_pred_fair
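
A usage sketch with synthetic data (feature names and the data-generating process are illustrative; fairlearn’s demographic_parity_difference is used only to sanity-check the post-processed predictions):

# Example usage (synthetic data; illustrative only)
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from fairlearn.metrics import demographic_parity_difference

rng = np.random.default_rng(0)
n = 1000
X = pd.DataFrame({'x1': rng.normal(size=n), 'x2': rng.normal(size=n)})
y = pd.Series((X['x1'] + rng.normal(scale=0.5, size=n) > 0).astype(int))
sensitive = pd.Series(rng.integers(0, 2, size=n))

X_tr, X_te, y_tr, y_te, s_tr, s_te = train_test_split(
    X, y, sensitive, test_size=0.3, random_state=0
)

postprocess_est, y_pred_fair = optimize_thresholds(
    X_tr, y_tr, s_tr, X_te, y_te, s_te, constraint="demographic_parity"
)

print("Demographic parity difference after post-processing:",
      demographic_parity_difference(y_te, y_pred_fair, sensitive_features=s_te))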

3. Adversarial Debiasing (In-processing)

# adversarial_debiasing.py - Train fair model with adversarial network
from aif360.algorithms.inprocessing import AdversarialDebiasing
import tensorflow as tf

def train_adversarial_debiased_model(
    dataset_train,
    dataset_test,
    privileged_groups,
    unprivileged_groups
):
    """
    Train model with adversarial debiasing
    Model learns to make predictions while adversary tries to predict protected attribute

    Args:
        dataset_train: AIF360 BinaryLabelDataset for training
        dataset_test: AIF360 BinaryLabelDataset for testing
    """
    tf.compat.v1.disable_eager_execution()

    # Train debiased model
    debiased_model = AdversarialDebiasing(
        privileged_groups=privileged_groups,
        unprivileged_groups=unprivileged_groups,
        scope_name='debiased_classifier',
        debias=True,
        sess=tf.compat.v1.Session()
    )

    debiased_model.fit(dataset_train)

    # Predict
    dataset_pred = debiased_model.predict(dataset_test)

    return debiased_model, dataset_pred
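
A sketch of preparing the inputs (synthetic data; the 70/30 split and column names are illustrative, and TensorFlow 1.x-style execution is required, as set up inside the function above):

# Example usage (synthetic data; illustrative only)
import numpy as np
import pandas as pd
from aif360.datasets import BinaryLabelDataset

rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    'feature_1': rng.normal(size=n),
    'sensitive': rng.integers(0, 2, size=n),
    'label': rng.integers(0, 2, size=n)
})

full_dataset = BinaryLabelDataset(
    df=df, label_names=['label'], protected_attribute_names=['sensitive']
)
dataset_train, dataset_test = full_dataset.split([0.7], shuffle=True)

debiased_model, dataset_pred = train_adversarial_debiased_model(
    dataset_train, dataset_test,
    privileged_groups=[{'sensitive': 1}],
    unprivileged_groups=[{'sensitive': 0}]
)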

Production Best Practices

Fairness-Aware Model Selection

# model_selection_fairness.py - Select model balancing accuracy and fairness
from sklearn.metrics import accuracy_score
from typing import Any, Dict
import pandas as pd

from fairness_metrics import FairnessMetrics  # FairnessMetrics class defined earlier in this guide

def evaluate_model_fairness_tradeoff(
    models: Dict[str, Any],
    X_train, y_train, X_test, y_test,
    sensitive_attr_test
) -> pd.DataFrame:
    """
    Evaluate multiple models on accuracy AND fairness

    Returns:
        DataFrame with model performance and fairness metrics
    """
    results = []

    for name, model in models.items():
        # Train model
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)

        # Calculate accuracy
        accuracy = accuracy_score(y_test, y_pred)

        # Calculate fairness
        fairness = FairnessMetrics(y_test, y_pred, sensitive_attr_test)
        dp = fairness.demographic_parity()
        eo = fairness.equalized_odds()

        results.append({
            'model': name,
            'accuracy': accuracy,
            'demographic_parity_ratio': dp['disparity_ratio'],
            'demographic_parity_fair': dp['is_fair'],
            'equalized_odds_fair': eo['is_fair'],
            'tpr_disparity': eo['tpr_disparity'],
            'fpr_disparity': eo['fpr_disparity']
        })

    return pd.DataFrame(results).sort_values('accuracy', ascending=False)
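
A usage sketch, assuming train/test splits and a sensitive-attribute Series prepared as in the earlier examples (the candidate model list is illustrative):

# Example usage (assumes X_train, y_train, X_test, y_test, sensitive_attr_test exist)
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

candidate_models = {
    'logistic_regression': LogisticRegression(max_iter=1000),
    'random_forest': RandomForestClassifier(n_estimators=100, random_state=0)
}

comparison = evaluate_model_fairness_tradeoff(
    candidate_models, X_train, y_train, X_test, y_test, sensitive_attr_test
)
print(comparison)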

Continuous Fairness Monitoring

# fairness_monitoring.py - Monitor fairness in production
import json
from datetime import datetime
from typing import Dict

import numpy as np

from fairness_metrics import FairnessMetrics  # FairnessMetrics class defined earlier in this guide

class FairnessMonitor:
    """Monitor fairness metrics in production"""

    def __init__(self, log_file: str = "fairness_log.jsonl"):
        self.log_file = log_file

    def log_prediction(
        self,
        prediction: int,
        true_label: int,
        sensitive_attr: str,
        model_version: str
    ):
        """Log individual prediction with metadata"""
        entry = {
            'timestamp': datetime.now().isoformat(),
            'prediction': prediction,
            'true_label': true_label,
            'sensitive_attr': sensitive_attr,
            'model_version': model_version
        }

        with open(self.log_file, 'a') as f:
            f.write(json.dumps(entry) + '\n')

    def calculate_daily_fairness(self, date: str) -> Dict:
        """Calculate fairness metrics for a specific day"""
        # Load predictions for date
        predictions = []
        with open(self.log_file, 'r') as f:
            for line in f:
                entry = json.loads(line)
                if entry['timestamp'].startswith(date):
                    predictions.append(entry)

        if not predictions:
            return {}

        # Extract arrays
        y_true = np.array([p['true_label'] for p in predictions])
        y_pred = np.array([p['prediction'] for p in predictions])
        sensitive = np.array([p['sensitive_attr'] for p in predictions])

        # Calculate metrics
        metrics = FairnessMetrics(y_true, y_pred, sensitive)
        return metrics.calculate_all_metrics()
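
A usage sketch (the group labels and model version string are illustrative; in practice true labels often arrive with a delay and are joined to the log later):

# Example usage
if __name__ == "__main__":
    from datetime import date

    monitor = FairnessMonitor(log_file="fairness_log.jsonl")

    # Log predictions as they are served
    monitor.log_prediction(prediction=1, true_label=1, sensitive_attr='A', model_version='v1.2.0')
    monitor.log_prediction(prediction=0, true_label=1, sensitive_attr='B', model_version='v1.2.0')

    # Recompute fairness metrics for today's traffic (e.g. from a daily scheduled job)
    print(monitor.calculate_daily_fairness(date.today().isoformat()))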

Best Practices and Limitations

Fairness Checklist

Pre-Development:

  • ✅ Define protected attributes (race, gender, age, disability)
  • ✅ Choose appropriate fairness metric for use case
  • ✅ Get legal review (GDPR, EEOC, EU AI Act compliance)
  • ✅ Conduct impact assessment

Data Collection:

  • ✅ Ensure representative sampling across demographics
  • ✅ Audit historical data for existing biases
  • ✅ Document data provenance and collection methods
  • ✅ Consider proxy variables (ZIP code → race)

Model Development:

  • ✅ Measure fairness across multiple metrics (no single “correct” metric)
  • ✅ Evaluate accuracy AND fairness jointly
  • ✅ Test for intersectional fairness (race × gender); see the sketch after this list
  • ✅ Use interpretability tools (SHAP, LIME)
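
A minimal intersectional-fairness sketch using Fairlearn’s MetricFrame (the race and gender column names are hypothetical placeholders):

# intersectional_check.py - Slice metrics by combinations of protected attributes
import pandas as pd
from fairlearn.metrics import MetricFrame, selection_rate
from sklearn.metrics import accuracy_score

def intersectional_report(y_true, y_pred, demographics: pd.DataFrame) -> pd.DataFrame:
    """Report accuracy and selection rate for each (race, gender) combination."""
    mf = MetricFrame(
        metrics={'accuracy': accuracy_score, 'selection_rate': selection_rate},
        y_true=y_true,
        y_pred=y_pred,
        # Passing multiple columns slices metrics by every attribute combination
        sensitive_features=demographics[['race', 'gender']]
    )
    return mf.by_group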

Deployment:

  • ✅ Monitor fairness metrics continuously
  • ✅ Implement feedback loops for bias detection
  • ✅ Provide explanations for high-stakes decisions
  • ✅ Enable human override for contested decisions

Known Limitations

  • Fairness-Accuracy Tradeoff: Increasing fairness may reduce accuracy. Mitigation: use Pareto frontier analysis and accept the tradeoff in high-stakes domains.
  • No Universal Fairness Definition: Demographic parity and equal opportunity cannot generally be satisfied at the same time. Mitigation: choose the metric appropriate for the use case and document the choice.
  • Simpson’s Paradox: Subgroups that look fair can be unfair when aggregated (and vice versa). Mitigation: evaluate fairness at multiple granularities.
  • Feedback Loops: Biased predictions create biased future training data. Mitigation: retrain regularly and audit for data drift.
  • Proxy Discrimination: Non-protected features correlate with protected attributes. Mitigation: remove strongly correlated features or use adversarial debiasing.

Conclusion and Resources

AI fairness requires continuous effort across the ML lifecycle—from data collection through deployment monitoring. Key takeaways:

  • Multiple Metrics: Demographic parity, equalized odds, and equal opportunity measure different notions of fairness
  • Mitigation Strategies: Reweighing (pre), adversarial debiasing (in), threshold optimization (post)
  • Tools: AI Fairness 360 and Fairlearn provide production-ready implementations
  • Monitoring: Fairness metrics must be tracked continuously in production
  • Trade-offs: Fairness and accuracy are often in tension—document decisions

Building fair AI systems is both a technical and societal challenge requiring cross-functional collaboration.

Further Resources: