Research Disclaimer

This tutorial is based on:

  • PyTorch v2.0+ (official deep learning framework)
  • TensorFlow/Keras v2.15+ (alternative framework examples)
  • scikit-learn v1.3+ (preprocessing and metrics)
  • Academic research on autoencoder-based anomaly detection (Goodfellow et al., 2016; Kingma & Welling, 2013)
  • Production deployment patterns from PyTorch Serve and TensorFlow Serving documentation

All implementation patterns follow documented best practices for neural network-based anomaly detection. Code examples are complete, tested implementations suitable for production adaptation.

Introduction

Looking for classical ML approaches? If you’re new to anomaly detection, start with our guide on classical machine learning techniques using scikit-learn. That post covers Isolation Forest, One-Class SVM, and Local Outlier Factor—excellent choices for tabular data and interpretable results.

This guide explores deep learning approaches to anomaly detection, focusing on neural networks that excel at finding complex patterns in high-dimensional data such as images, time series, and other sequential data. We’ll implement:

  • Autoencoders for reconstruction-based anomaly detection
  • Variational Autoencoders (VAE) for probabilistic anomaly scoring
  • LSTM Networks for time-series anomaly detection
  • Production deployment with a containerized FastAPI inference server and monitoring

When to Choose Deep Learning vs Classical ML

| Scenario | Recommended Approach | Reason |
|---|---|---|
| Tabular data, <50 features | Classical ML (see our guide) | Faster training, more interpretable, less data needed |
| High-dimensional data (100+ features) | Deep Learning (this guide) | Autoencoders learn compressed representations |
| Images, audio, video | Deep Learning (this guide) | CNNs extract spatial/temporal features |
| Time-series with long dependencies | Deep Learning (LSTM/Transformer) | Captures sequential patterns |
| Limited labeled data (<1000 samples) | Classical ML | Deep learning requires more data |
| Need interpretability | Classical ML | Feature importance is clearer |
| Real-time inference (<10ms) | Classical ML | Neural networks have higher latency |
| Willing to invest in GPU infrastructure | Deep Learning | Significantly faster training with GPUs |

Prerequisites

Required Knowledge:

  • Python 3.8+ and NumPy/Pandas basics
  • Understanding of neural networks (layers, activation functions, backpropagation)
  • Familiarity with PyTorch or TensorFlow/Keras
  • Basic knowledge of unsupervised learning concepts

Required Libraries:

# PyTorch ecosystem
pip install torch==2.0.0 torchvision==0.15.0

# Alternative: TensorFlow/Keras
pip install tensorflow==2.15.0

# Data processing and visualization
pip install numpy==1.24.0 pandas==2.0.0 matplotlib==3.7.0 scikit-learn==1.3.0

# Production deployment
pip install torchserve==0.8.0 torch-model-archiver==0.8.0
pip install fastapi uvicorn  # inference server used later in this guide

Hardware Recommendations:

  • GPU highly recommended for training (NVIDIA with CUDA support); a quick availability check follows this list
  • CPU-only training works, but expect it to be 10-50x slower
  • Minimum 8GB RAM, 16GB+ recommended for large datasets
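
A minimal sketch to confirm that PyTorch can actually see a CUDA device before you commit to long training runs (the printed device name will vary by machine):

import torch

# Report whether a CUDA-capable GPU is visible to PyTorch
if torch.cuda.is_available():
    print(f"Using GPU: {torch.cuda.get_device_name(0)}")
else:
    print("No GPU detected - training will run on CPU")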

Autoencoder Fundamentals

Autoencoders learn to compress data into a lower-dimensional latent space and then reconstruct the original input. Anomalies produce higher reconstruction errors because they differ from the training distribution.

Architecture

Input (784) → Encoder → Latent (32) → Decoder → Reconstruction (784)
                ↓                           ↓
        Compress (bottleneck)          Expand

Key Insight: The model is trained only on normal data. Anomalies will have high reconstruction error because the autoencoder hasn’t learned to reconstruct them effectively.

Implementation: Basic Autoencoder for Tabular Data

Step 1: Complete Autoencoder Architecture

File: models/autoencoder.py

"""
Production autoencoder for anomaly detection on tabular data.
"""

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
import numpy as np
from typing import Tuple, Optional
from dataclasses import dataclass

@dataclass
class AnomalyResult:
    """Container for anomaly detection results."""
    reconstruction_error: np.ndarray
    is_anomaly: np.ndarray
    threshold: float
    anomaly_score: np.ndarray

class TabularAutoencoder(nn.Module):
    """
    Autoencoder for tabular data anomaly detection.

    Architecture:
    - Encoder: input_dim → 128 → 64 → latent_dim
    - Decoder: latent_dim → 64 → 128 → input_dim
    """

    def __init__(
        self,
        input_dim: int,
        latent_dim: int = 32,
        hidden_dims: Tuple[int, ...] = (128, 64)
    ):
        """
        Initialize autoencoder.

        Args:
            input_dim: Number of input features
            latent_dim: Dimension of latent space (bottleneck)
            hidden_dims: Hidden layer dimensions for encoder/decoder
        """
        super(TabularAutoencoder, self).__init__()

        self.input_dim = input_dim
        self.latent_dim = latent_dim

        # Encoder layers
        encoder_layers = []
        prev_dim = input_dim
        for hidden_dim in hidden_dims:
            encoder_layers.extend([
                nn.Linear(prev_dim, hidden_dim),
                nn.BatchNorm1d(hidden_dim),
                nn.ReLU(),
                nn.Dropout(0.2)
            ])
            prev_dim = hidden_dim

        encoder_layers.append(nn.Linear(prev_dim, latent_dim))
        self.encoder = nn.Sequential(*encoder_layers)

        # Decoder layers (reverse of encoder)
        decoder_layers = []
        prev_dim = latent_dim
        for hidden_dim in reversed(hidden_dims):
            decoder_layers.extend([
                nn.Linear(prev_dim, hidden_dim),
                nn.BatchNorm1d(hidden_dim),
                nn.ReLU(),
                nn.Dropout(0.2)
            ])
            prev_dim = hidden_dim

        decoder_layers.append(nn.Linear(prev_dim, input_dim))
        self.decoder = nn.Sequential(*decoder_layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """Forward pass through autoencoder."""
        latent = self.encoder(x)
        reconstruction = self.decoder(latent)
        return reconstruction

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        """Encode input to latent representation."""
        return self.encoder(x)

    def decode(self, z: torch.Tensor) -> torch.Tensor:
        """Decode latent representation to reconstruction."""
        return self.decoder(z)


class AnomalyDetector:
    """
    Production-ready anomaly detector using autoencoders.
    """

    def __init__(
        self,
        input_dim: int,
        latent_dim: int = 32,
        learning_rate: float = 1e-3,
        device: str = 'cuda' if torch.cuda.is_available() else 'cpu'
    ):
        """
        Initialize anomaly detector.

        Args:
            input_dim: Number of input features
            latent_dim: Latent space dimension
            learning_rate: Optimizer learning rate
            device: 'cuda' or 'cpu'
        """
        self.device = device
        self.model = TabularAutoencoder(input_dim, latent_dim).to(device)
        self.optimizer = optim.Adam(self.model.parameters(), lr=learning_rate)
        self.criterion = nn.MSELoss()
        self.threshold = None
        self.mean = None
        self.std = None

    def fit(
        self,
        X: np.ndarray,
        epochs: int = 50,
        batch_size: int = 256,
        validation_split: float = 0.2,
        threshold_percentile: float = 95
    ) -> dict:
        """
        Train autoencoder on normal data.

        Args:
            X: Training data (normal samples only)
            epochs: Number of training epochs
            batch_size: Batch size for training
            validation_split: Fraction of data for validation
            threshold_percentile: Percentile for anomaly threshold

        Returns:
            Dictionary with training history
        """
        # Normalize data
        self.mean = X.mean(axis=0)
        self.std = X.std(axis=0) + 1e-8  # Avoid division by zero
        X_normalized = (X - self.mean) / self.std

        # Split train/validation
        val_size = int(len(X_normalized) * validation_split)
        indices = np.random.permutation(len(X_normalized))
        val_indices = indices[:val_size]
        train_indices = indices[val_size:]

        X_train = X_normalized[train_indices]
        X_val = X_normalized[val_indices]

        # Create data loaders
        train_dataset = torch.tensor(X_train, dtype=torch.float32)
        val_dataset = torch.tensor(X_val, dtype=torch.float32)

        train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
        val_loader = DataLoader(val_dataset, batch_size=batch_size)

        # Training loop
        history = {'train_loss': [], 'val_loss': []}

        for epoch in range(epochs):
            # Training phase
            self.model.train()
            train_loss = 0.0
            for batch in train_loader:
                batch = batch.to(self.device)

                self.optimizer.zero_grad()
                reconstruction = self.model(batch)
                loss = self.criterion(reconstruction, batch)
                loss.backward()
                self.optimizer.step()

                train_loss += loss.item() * len(batch)

            train_loss /= len(train_dataset)

            # Validation phase
            self.model.eval()
            val_loss = 0.0
            with torch.no_grad():
                for batch in val_loader:
                    batch = batch.to(self.device)
                    reconstruction = self.model(batch)
                    loss = self.criterion(reconstruction, batch)
                    val_loss += loss.item() * len(batch)

            val_loss /= len(val_dataset)

            history['train_loss'].append(train_loss)
            history['val_loss'].append(val_loss)

            if (epoch + 1) % 10 == 0:
                print(f"Epoch {epoch+1}/{epochs} - "
                      f"Train Loss: {train_loss:.6f}, Val Loss: {val_loss:.6f}")

        # Calculate anomaly threshold from training data
        self.model.eval()
        with torch.no_grad():
            train_tensor = torch.tensor(X_train, dtype=torch.float32).to(self.device)
            reconstructions = self.model(train_tensor).cpu().numpy()
            errors = np.mean((X_train - reconstructions) ** 2, axis=1)
            self.threshold = np.percentile(errors, threshold_percentile)

        print(f"\nTraining complete. Anomaly threshold: {self.threshold:.6f}")
        return history

    def predict(self, X: np.ndarray) -> AnomalyResult:
        """
        Detect anomalies in new data.

        Args:
            X: Data to check for anomalies

        Returns:
            AnomalyResult with predictions and scores
        """
        if self.threshold is None:
            raise ValueError("Model must be fitted before prediction")

        # Normalize using training statistics
        X_normalized = (X - self.mean) / self.std

        # Get reconstructions
        self.model.eval()
        with torch.no_grad():
            X_tensor = torch.tensor(X_normalized, dtype=torch.float32).to(self.device)
            reconstructions = self.model(X_tensor).cpu().numpy()

        # Calculate reconstruction errors
        reconstruction_errors = np.mean((X_normalized - reconstructions) ** 2, axis=1)

        # Determine anomalies
        is_anomaly = reconstruction_errors > self.threshold

        # Normalize scores to [0, 1] range
        anomaly_scores = np.clip(reconstruction_errors / (self.threshold * 2), 0, 1)

        return AnomalyResult(
            reconstruction_error=reconstruction_errors,
            is_anomaly=is_anomaly,
            threshold=self.threshold,
            anomaly_score=anomaly_scores
        )

    def save(self, path: str):
        """Save model and normalization parameters."""
        torch.save({
            'model_state_dict': self.model.state_dict(),
            'threshold': self.threshold,
            'mean': self.mean,
            'std': self.std,
            'input_dim': self.model.input_dim,
            'latent_dim': self.model.latent_dim
        }, path)
        print(f"Model saved to {path}")

    def load(self, path: str):
        """Load model and normalization parameters."""
        checkpoint = torch.load(path, map_location=self.device)

        # Recreate model with saved dimensions
        self.model = TabularAutoencoder(
            checkpoint['input_dim'],
            checkpoint['latent_dim']
        ).to(self.device)

        self.model.load_state_dict(checkpoint['model_state_dict'])
        self.threshold = checkpoint['threshold']
        self.mean = checkpoint['mean']
        self.std = checkpoint['std']

        print(f"Model loaded from {path}")

Step 2: Training and Evaluation

File: train_detector.py

"""
Train autoencoder anomaly detector on credit card fraud dataset.
"""

from typing import Tuple

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import (
    precision_score,
    recall_score,
    f1_score,
    confusion_matrix,
    roc_auc_score,
    roc_curve
)
from models.autoencoder import AnomalyDetector

def load_fraud_dataset() -> Tuple[np.ndarray, np.ndarray]:
    """
    Load and prepare credit card fraud dataset.

    Note: Using sklearn's make_classification for demonstration.
    In production, use actual fraud dataset (e.g., Kaggle Credit Card Fraud).
    """
    from sklearn.datasets import make_classification

    X, y = make_classification(
        n_samples=10000,
        n_features=30,
        n_informative=20,
        n_redundant=5,
        n_clusters_per_class=2,
        weights=[0.95, 0.05],  # 5% anomalies
        flip_y=0,
        random_state=42
    )

    return X, y

def plot_training_history(history: dict):
    """Plot training and validation loss."""
    plt.figure(figsize=(10, 6))
    plt.plot(history['train_loss'], label='Training Loss')
    plt.plot(history['val_loss'], label='Validation Loss')
    plt.xlabel('Epoch')
    plt.ylabel('Loss (MSE)')
    plt.title('Autoencoder Training History')
    plt.legend()
    plt.grid(True)
    plt.savefig('training_history.png', dpi=150, bbox_inches='tight')
    plt.close()
    print("Training history plot saved to training_history.png")

def plot_reconstruction_errors(result, y_true):
    """Plot reconstruction error distribution."""
    plt.figure(figsize=(12, 6))

    # Plot histogram
    plt.subplot(1, 2, 1)
    plt.hist(result.reconstruction_error[y_true == 0], bins=50,
             alpha=0.7, label='Normal', density=True)
    plt.hist(result.reconstruction_error[y_true == 1], bins=50,
             alpha=0.7, label='Anomaly', density=True)
    plt.axvline(result.threshold, color='red', linestyle='--',
                linewidth=2, label='Threshold')
    plt.xlabel('Reconstruction Error')
    plt.ylabel('Density')
    plt.title('Reconstruction Error Distribution')
    plt.legend()
    plt.grid(True)

    # Plot ROC curve
    plt.subplot(1, 2, 2)
    fpr, tpr, _ = roc_curve(y_true, result.anomaly_score)
    auc = roc_auc_score(y_true, result.anomaly_score)
    plt.plot(fpr, tpr, linewidth=2, label=f'AUC = {auc:.3f}')
    plt.plot([0, 1], [0, 1], 'k--', linewidth=1, label='Random')
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('ROC Curve')
    plt.legend()
    plt.grid(True)

    plt.tight_layout()
    plt.savefig('reconstruction_errors.png', dpi=150, bbox_inches='tight')
    plt.close()
    print("Reconstruction error plot saved to reconstruction_errors.png")

def evaluate_model(y_true, y_pred, y_scores):
    """Print comprehensive evaluation metrics."""
    print("\n=== Model Evaluation ===")
    print(f"Precision: {precision_score(y_true, y_pred):.4f}")
    print(f"Recall: {recall_score(y_true, y_pred):.4f}")
    print(f"F1-Score: {f1_score(y_true, y_pred):.4f}")
    print(f"ROC-AUC: {roc_auc_score(y_true, y_scores):.4f}")

    print("\nConfusion Matrix:")
    cm = confusion_matrix(y_true, y_pred)
    print(f"True Negatives:  {cm[0, 0]}")
    print(f"False Positives: {cm[0, 1]}")
    print(f"False Negatives: {cm[1, 0]}")
    print(f"True Positives:  {cm[1, 1]}")

    # Calculate false positive rate
    fpr = cm[0, 1] / (cm[0, 1] + cm[0, 0])
    print(f"\nFalse Positive Rate: {fpr:.4f}")

def main():
    """Main training pipeline."""
    print("Loading dataset...")
    X, y = load_fraud_dataset()

    # Split data: train on normal samples only
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42, stratify=y
    )

    # Use only normal samples for training
    X_train_normal = X_train[y_train == 0]
    print(f"\nTraining set: {len(X_train_normal)} normal samples")
    print(f"Test set: {len(X_test)} samples ({(y_test == 1).sum()} anomalies)")

    # Initialize and train detector
    print("\nInitializing autoencoder...")
    detector = AnomalyDetector(
        input_dim=X.shape[1],
        latent_dim=16,
        learning_rate=1e-3
    )

    print("\nTraining autoencoder...")
    history = detector.fit(
        X_train_normal,
        epochs=100,
        batch_size=256,
        validation_split=0.2,
        threshold_percentile=95
    )

    # Plot training history
    plot_training_history(history)

    # Evaluate on test set
    print("\nEvaluating on test set...")
    result = detector.predict(X_test)

    # Calculate metrics
    evaluate_model(y_test, result.is_anomaly, result.anomaly_score)

    # Plot results
    plot_reconstruction_errors(result, y_test)

    # Save model
    detector.save('models/fraud_detector.pth')

    print("\n=== Training Complete ===")
    print("Model saved to models/fraud_detector.pth")
    print("Visualizations saved to training_history.png and reconstruction_errors.png")

if __name__ == "__main__":
    main()

Variational Autoencoders (VAE) for Anomaly Detection

VAEs learn a probabilistic latent space, which makes it possible to attach an uncertainty estimate to each anomaly score. They’re particularly useful when you need a measure of confidence alongside the score itself.

VAE Architecture

File: models/vae.py

"""
Variational Autoencoder for probabilistic anomaly detection.
"""

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from typing import Tuple

from models.autoencoder import AnomalyResult

class VariationalAutoencoder(nn.Module):
    """
    Variational Autoencoder with reparameterization trick.

    Learns to map inputs to a probabilistic latent space (mean, variance)
    and sample from this distribution during reconstruction.
    """

    def __init__(
        self,
        input_dim: int,
        latent_dim: int = 32,
        hidden_dims: Tuple[int, ...] = (128, 64)
    ):
        """
        Initialize VAE.

        Args:
            input_dim: Number of input features
            latent_dim: Dimension of latent space
            hidden_dims: Hidden layer dimensions
        """
        super(VariationalAutoencoder, self).__init__()

        self.input_dim = input_dim
        self.latent_dim = latent_dim

        # Encoder
        encoder_layers = []
        prev_dim = input_dim
        for hidden_dim in hidden_dims:
            encoder_layers.extend([
                nn.Linear(prev_dim, hidden_dim),
                nn.BatchNorm1d(hidden_dim),
                nn.ReLU()
            ])
            prev_dim = hidden_dim

        self.encoder = nn.Sequential(*encoder_layers)

        # Latent space parameters (mean and log variance)
        self.fc_mu = nn.Linear(prev_dim, latent_dim)
        self.fc_logvar = nn.Linear(prev_dim, latent_dim)

        # Decoder
        decoder_layers = []
        prev_dim = latent_dim
        for hidden_dim in reversed(hidden_dims):
            decoder_layers.extend([
                nn.Linear(prev_dim, hidden_dim),
                nn.BatchNorm1d(hidden_dim),
                nn.ReLU()
            ])
            prev_dim = hidden_dim

        decoder_layers.append(nn.Linear(prev_dim, input_dim))
        self.decoder = nn.Sequential(*decoder_layers)

    def encode(self, x: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
        """Encode input to latent mean and log variance."""
        h = self.encoder(x)
        mu = self.fc_mu(h)
        logvar = self.fc_logvar(h)
        return mu, logvar

    def reparameterize(self, mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
        """
        Reparameterization trick for backpropagation through sampling.

        z = mu + sigma * epsilon
        where epsilon ~ N(0, 1)
        """
        std = torch.exp(0.5 * logvar)
        eps = torch.randn_like(std)
        return mu + eps * std

    def decode(self, z: torch.Tensor) -> torch.Tensor:
        """Decode latent representation to reconstruction."""
        return self.decoder(z)

    def forward(self, x: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
        """
        Forward pass through VAE.

        Returns:
            reconstruction, mu, logvar
        """
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        reconstruction = self.decode(z)
        return reconstruction, mu, logvar

def vae_loss(
    reconstruction: torch.Tensor,
    x: torch.Tensor,
    mu: torch.Tensor,
    logvar: torch.Tensor,
    beta: float = 1.0
) -> torch.Tensor:
    """
    VAE loss = Reconstruction Loss + KL Divergence.

    Args:
        reconstruction: Reconstructed output
        x: Original input
        mu: Latent mean
        logvar: Latent log variance
        beta: Weight for KL divergence (beta-VAE)

    Returns:
        Combined loss
    """
    # Reconstruction loss (MSE)
    recon_loss = F.mse_loss(reconstruction, x, reduction='sum')

    # KL divergence loss
    # KL(N(mu, sigma) || N(0, 1)) = -0.5 * sum(1 + log(sigma^2) - mu^2 - sigma^2)
    kl_loss = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())

    return recon_loss + beta * kl_loss


class VAEAnomalyDetector:
    """Anomaly detector using Variational Autoencoder."""

    def __init__(
        self,
        input_dim: int,
        latent_dim: int = 32,
        learning_rate: float = 1e-3,
        beta: float = 1.0,
        device: str = 'cuda' if torch.cuda.is_available() else 'cpu'
    ):
        """
        Initialize VAE anomaly detector.

        Args:
            input_dim: Number of input features
            latent_dim: Latent space dimension
            learning_rate: Optimizer learning rate
            beta: KL divergence weight (beta-VAE)
            device: 'cuda' or 'cpu'
        """
        self.device = device
        self.beta = beta
        self.model = VariationalAutoencoder(input_dim, latent_dim).to(device)
        self.optimizer = torch.optim.Adam(self.model.parameters(), lr=learning_rate)
        self.threshold = None
        self.mean = None
        self.std = None

    def fit(
        self,
        X: np.ndarray,
        epochs: int = 100,
        batch_size: int = 256,
        threshold_percentile: float = 95
    ) -> dict:
        """Train VAE on normal data."""
        # Normalize
        self.mean = X.mean(axis=0)
        self.std = X.std(axis=0) + 1e-8
        X_normalized = (X - self.mean) / self.std

        dataset = torch.tensor(X_normalized, dtype=torch.float32)
        loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

        history = {'loss': [], 'recon_loss': [], 'kl_loss': []}

        for epoch in range(epochs):
            self.model.train()
            epoch_loss = 0.0
            epoch_recon = 0.0
            epoch_kl = 0.0

            for batch in loader:
                batch = batch.to(self.device)

                self.optimizer.zero_grad()
                reconstruction, mu, logvar = self.model(batch)

                # Calculate losses
                recon_loss = F.mse_loss(reconstruction, batch, reduction='sum')
                kl_loss = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
                loss = recon_loss + self.beta * kl_loss

                loss.backward()
                self.optimizer.step()

                epoch_loss += loss.item()
                epoch_recon += recon_loss.item()
                epoch_kl += kl_loss.item()

            # Average losses
            epoch_loss /= len(dataset)
            epoch_recon /= len(dataset)
            epoch_kl /= len(dataset)

            history['loss'].append(epoch_loss)
            history['recon_loss'].append(epoch_recon)
            history['kl_loss'].append(epoch_kl)

            if (epoch + 1) % 10 == 0:
                print(f"Epoch {epoch+1}/{epochs} - "
                      f"Loss: {epoch_loss:.6f}, "
                      f"Recon: {epoch_recon:.6f}, "
                      f"KL: {epoch_kl:.6f}")

        # Calculate threshold
        self.model.eval()
        with torch.no_grad():
            X_tensor = torch.tensor(X_normalized, dtype=torch.float32).to(self.device)
            reconstruction, _, _ = self.model(X_tensor)
            errors = torch.mean((X_tensor - reconstruction) ** 2, dim=1).cpu().numpy()
            self.threshold = np.percentile(errors, threshold_percentile)

        return history

    def predict(self, X: np.ndarray, n_samples: int = 10) -> Tuple[AnomalyResult, np.ndarray]:
        """
        Predict anomalies with uncertainty estimation.

        Args:
            X: Data to check
            n_samples: Number of Monte Carlo samples for uncertainty

        Returns:
            Tuple of (AnomalyResult computed from the mean reconstruction errors,
            per-sample standard deviation of errors across the Monte Carlo samples)
        """
        X_normalized = (X - self.mean) / self.std
        X_tensor = torch.tensor(X_normalized, dtype=torch.float32).to(self.device)

        self.model.eval()
        errors_list = []

        # Monte Carlo sampling for uncertainty
        with torch.no_grad():
            for _ in range(n_samples):
                reconstruction, _, _ = self.model(X_tensor)
                errors = torch.mean((X_tensor - reconstruction) ** 2, dim=1).cpu().numpy()
                errors_list.append(errors)

        # Calculate mean and std across samples
        errors_array = np.array(errors_list)
        mean_errors = np.mean(errors_array, axis=0)
        std_errors = np.std(errors_array, axis=0)

        is_anomaly = mean_errors > self.threshold
        anomaly_scores = np.clip(mean_errors / (self.threshold * 2), 0, 1)

        return AnomalyResult(
            reconstruction_error=mean_errors,
            is_anomaly=is_anomaly,
            threshold=self.threshold,
            anomaly_score=anomaly_scores
        ), std_errors  # Also return uncertainty

LSTM for Time-Series Anomaly Detection

For sequential data (e.g., server metrics, sensor readings), LSTM networks capture temporal dependencies.

LSTM Implementation

File: models/lstm_anomaly.py

"""
LSTM-based anomaly detection for time-series data.
"""

import torch
import torch.nn as nn
from torch.utils.data import DataLoader
import numpy as np
from typing import Tuple

from models.autoencoder import AnomalyResult

class LSTMAutoencoder(nn.Module):
    """
    LSTM-based autoencoder for time-series anomaly detection.

    Architecture:
    - Encoder LSTM: processes sequence and outputs final hidden state
    - Decoder LSTM: reconstructs sequence from hidden state
    """

    def __init__(
        self,
        input_dim: int,
        hidden_dim: int = 64,
        num_layers: int = 2,
        dropout: float = 0.2
    ):
        """
        Initialize LSTM autoencoder.

        Args:
            input_dim: Number of features per timestep
            hidden_dim: LSTM hidden dimension
            num_layers: Number of LSTM layers
            dropout: Dropout rate
        """
        super(LSTMAutoencoder, self).__init__()

        self.input_dim = input_dim
        self.hidden_dim = hidden_dim
        self.num_layers = num_layers

        # Encoder LSTM
        self.encoder = nn.LSTM(
            input_size=input_dim,
            hidden_size=hidden_dim,
            num_layers=num_layers,
            batch_first=True,
            dropout=dropout if num_layers > 1 else 0
        )

        # Decoder LSTM
        self.decoder = nn.LSTM(
            input_size=input_dim,
            hidden_size=hidden_dim,
            num_layers=num_layers,
            batch_first=True,
            dropout=dropout if num_layers > 1 else 0
        )

        # Output layer
        self.output_layer = nn.Linear(hidden_dim, input_dim)

    def forward(
        self,
        x: torch.Tensor
    ) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
        """
        Forward pass through LSTM autoencoder.

        Args:
            x: Input sequence [batch_size, seq_len, input_dim]

        Returns:
            reconstruction, encoder_hidden, encoder_cell
        """
        batch_size, seq_len, _ = x.size()

        # Encode
        _, (hidden, cell) = self.encoder(x)

        # Prepare decoder input (zeros, we only use hidden state)
        decoder_input = torch.zeros_like(x)

        # Decode
        decoder_output, _ = self.decoder(decoder_input, (hidden, cell))

        # Project to output dimension
        reconstruction = self.output_layer(decoder_output)

        return reconstruction, hidden, cell


class TimeSeriesAnomalyDetector:
    """Anomaly detector for time-series data using LSTM autoencoder."""

    def __init__(
        self,
        input_dim: int,
        sequence_length: int,
        hidden_dim: int = 64,
        num_layers: int = 2,
        learning_rate: float = 1e-3,
        device: str = 'cuda' if torch.cuda.is_available() else 'cpu'
    ):
        """
        Initialize time-series anomaly detector.

        Args:
            input_dim: Number of features
            sequence_length: Length of input sequences
            hidden_dim: LSTM hidden dimension
            num_layers: Number of LSTM layers
            learning_rate: Optimizer learning rate
            device: 'cuda' or 'cpu'
        """
        self.device = device
        self.sequence_length = sequence_length
        self.model = LSTMAutoencoder(
            input_dim, hidden_dim, num_layers
        ).to(device)
        self.optimizer = torch.optim.Adam(self.model.parameters(), lr=learning_rate)
        self.criterion = nn.MSELoss()
        self.threshold = None

    def create_sequences(
        self,
        data: np.ndarray,
        sequence_length: int
    ) -> np.ndarray:
        """
        Create sliding window sequences from time-series data.

        Args:
            data: Time-series data [n_timesteps, n_features]
            sequence_length: Length of each sequence

        Returns:
            Sequences [n_sequences, sequence_length, n_features]
        """
        sequences = []
        for i in range(len(data) - sequence_length + 1):
            sequences.append(data[i:i + sequence_length])
        return np.array(sequences)

    def fit(
        self,
        X: np.ndarray,
        epochs: int = 50,
        batch_size: int = 64,
        threshold_percentile: float = 95
    ) -> dict:
        """
        Train LSTM autoencoder on normal time-series data.

        Args:
            X: Time-series data [n_timesteps, n_features]
            epochs: Number of training epochs
            batch_size: Batch size
            threshold_percentile: Percentile for threshold

        Returns:
            Training history
        """
        # Create sequences
        sequences = self.create_sequences(X, self.sequence_length)
        print(f"Created {len(sequences)} sequences of length {self.sequence_length}")

        dataset = torch.tensor(sequences, dtype=torch.float32)
        loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

        history = {'loss': []}

        for epoch in range(epochs):
            self.model.train()
            epoch_loss = 0.0

            for batch in loader:
                batch = batch.to(self.device)

                self.optimizer.zero_grad()
                reconstruction, _, _ = self.model(batch)
                loss = self.criterion(reconstruction, batch)
                loss.backward()
                self.optimizer.step()

                epoch_loss += loss.item() * len(batch)

            epoch_loss /= len(dataset)
            history['loss'].append(epoch_loss)

            if (epoch + 1) % 10 == 0:
                print(f"Epoch {epoch+1}/{epochs} - Loss: {epoch_loss:.6f}")

        # Calculate threshold
        self.model.eval()
        with torch.no_grad():
            dataset_tensor = dataset.to(self.device)
            reconstructions, _, _ = self.model(dataset_tensor)
            errors = torch.mean((dataset_tensor - reconstructions) ** 2, dim=(1, 2))
            errors = errors.cpu().numpy()
            self.threshold = np.percentile(errors, threshold_percentile)

        print(f"\nThreshold: {self.threshold:.6f}")
        return history

    def predict(self, X: np.ndarray) -> AnomalyResult:
        """
        Detect anomalies in time-series data.

        Args:
            X: Time-series data [n_timesteps, n_features]

        Returns:
            AnomalyResult with per-sequence predictions
        """
        sequences = self.create_sequences(X, self.sequence_length)
        dataset = torch.tensor(sequences, dtype=torch.float32).to(self.device)

        self.model.eval()
        with torch.no_grad():
            reconstructions, _, _ = self.model(dataset)
            errors = torch.mean((dataset - reconstructions) ** 2, dim=(1, 2))
            errors = errors.cpu().numpy()

        is_anomaly = errors > self.threshold
        anomaly_scores = np.clip(errors / (self.threshold * 2), 0, 1)

        return AnomalyResult(
            reconstruction_error=errors,
            is_anomaly=is_anomaly,
            threshold=self.threshold,
            anomaly_score=anomaly_scores
        )

Time-Series Example: Server CPU Anomalies

File: examples/server_monitoring.py

"""
Detect CPU usage anomalies in server monitoring data.
"""

from typing import Tuple

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from models.lstm_anomaly import TimeSeriesAnomalyDetector

def generate_server_metrics(
    n_timesteps: int = 10000,
    anomaly_regions: list = None
) -> Tuple[np.ndarray, np.ndarray]:
    """
    Generate synthetic server CPU usage data with anomalies.

    Args:
        n_timesteps: Number of timesteps
        anomaly_regions: List of (start, end) tuples for anomaly injection

    Returns:
        data, labels (1 = anomaly, 0 = normal)
    """
    # Normal pattern: daily cycle + noise
    t = np.arange(n_timesteps)
    cpu_usage = (
        50 +  # Baseline
        20 * np.sin(2 * np.pi * t / (24 * 60)) +  # Daily cycle (assuming 1-min intervals)
        10 * np.sin(2 * np.pi * t / (7 * 24 * 60)) +  # Weekly cycle
        5 * np.random.randn(n_timesteps)  # Noise
    )

    labels = np.zeros(n_timesteps)

    # Inject anomalies
    if anomaly_regions:
        for start, end in anomaly_regions:
            # Spike anomaly
            cpu_usage[start:end] += np.random.uniform(30, 50, end - start)
            labels[start:end] = 1

    # Clip to valid range
    cpu_usage = np.clip(cpu_usage, 0, 100)

    # Add memory usage (correlated)
    memory_usage = cpu_usage * 0.8 + 10 * np.random.randn(n_timesteps)
    memory_usage = np.clip(memory_usage, 0, 100)

    # Combine features
    data = np.column_stack([cpu_usage, memory_usage])

    return data, labels

def main():
    """Train LSTM anomaly detector on server metrics."""
    print("Generating server metrics...")

    # Generate training data (normal) and test data (with anomalies)
    train_data, _ = generate_server_metrics(n_timesteps=5000)

    test_data, test_labels = generate_server_metrics(
        n_timesteps=2000,
        anomaly_regions=[(500, 550), (1000, 1100), (1500, 1520)]
    )

    print(f"Training data: {train_data.shape}")
    print(f"Test data: {test_data.shape}")
    print(f"Anomalies in test: {test_labels.sum()}")

    # Initialize detector
    detector = TimeSeriesAnomalyDetector(
        input_dim=2,  # CPU + Memory
        sequence_length=60,  # 1-hour windows (60 minutes)
        hidden_dim=32,
        num_layers=2,
        learning_rate=1e-3
    )

    # Train
    print("\nTraining LSTM autoencoder...")
    history = detector.fit(
        train_data,
        epochs=50,
        batch_size=64,
        threshold_percentile=95
    )

    # Predict on test data
    print("\nDetecting anomalies in test data...")
    result = detector.predict(test_data)

    # Evaluate
    # Align labels with sequences
    sequence_labels = []
    for i in range(len(test_labels) - detector.sequence_length + 1):
        # Mark sequence as anomaly if any timestep is anomalous
        if test_labels[i:i + detector.sequence_length].any():
            sequence_labels.append(1)
        else:
            sequence_labels.append(0)

    sequence_labels = np.array(sequence_labels)

    from sklearn.metrics import precision_score, recall_score, f1_score
    print("\n=== Evaluation ===")
    print(f"Precision: {precision_score(sequence_labels, result.is_anomaly):.4f}")
    print(f"Recall: {recall_score(sequence_labels, result.is_anomaly):.4f}")
    print(f"F1-Score: {f1_score(sequence_labels, result.is_anomaly):.4f}")

    # Plot results
    plt.figure(figsize=(15, 8))

    # Plot CPU usage
    plt.subplot(3, 1, 1)
    plt.plot(test_data[:, 0], label='CPU Usage', alpha=0.7)
    anomaly_indices = np.where(test_labels == 1)[0]
    plt.scatter(anomaly_indices, test_data[anomaly_indices, 0],
                color='red', s=1, alpha=0.5, label='True Anomalies')
    plt.ylabel('CPU %')
    plt.legend()
    plt.grid(True)
    plt.title('Server CPU Usage Time-Series')

    # Plot memory usage
    plt.subplot(3, 1, 2)
    plt.plot(test_data[:, 1], label='Memory Usage', alpha=0.7)
    plt.scatter(anomaly_indices, test_data[anomaly_indices, 1],
                color='red', s=1, alpha=0.5, label='True Anomalies')
    plt.ylabel('Memory %')
    plt.legend()
    plt.grid(True)

    # Plot reconstruction errors
    plt.subplot(3, 1, 3)
    sequence_indices = np.arange(len(result.reconstruction_error))
    plt.plot(sequence_indices, result.reconstruction_error, label='Reconstruction Error')
    plt.axhline(result.threshold, color='red', linestyle='--', label='Threshold')
    detected_anomalies = sequence_indices[result.is_anomaly]
    plt.scatter(detected_anomalies, result.reconstruction_error[result.is_anomaly],
                color='red', s=10, label='Detected Anomalies')
    plt.xlabel('Sequence Index')
    plt.ylabel('Error')
    plt.legend()
    plt.grid(True)

    plt.tight_layout()
    plt.savefig('server_anomalies.png', dpi=150, bbox_inches='tight')
    print("\nPlot saved to server_anomalies.png")

if __name__ == "__main__":
    main()

Production Deployment

Docker Container for Inference

File: Dockerfile

FROM python:3.10-slim

WORKDIR /app

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy model code
COPY models/ ./models/
COPY serve.py .

# Copy trained model
COPY trained_models/ ./trained_models/

EXPOSE 8000

CMD ["python", "serve.py"]

FastAPI Inference Server

File: serve.py

"""
FastAPI server for anomaly detection inference.
"""

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from typing import List
import numpy as np
import torch
from models.autoencoder import AnomalyDetector

app = FastAPI(title="Anomaly Detection API")

# Load model at startup
detector = None

@app.on_event("startup")
async def load_model():
    global detector
    detector = AnomalyDetector(input_dim=30, latent_dim=16)
    detector.load('trained_models/fraud_detector.pth')
    print("Model loaded successfully")

class PredictionRequest(BaseModel):
    data: List[List[float]]

class PredictionResponse(BaseModel):
    is_anomaly: List[bool]
    anomaly_scores: List[float]
    reconstruction_errors: List[float]

@app.post("/predict", response_model=PredictionResponse)
async def predict(request: PredictionRequest):
    """Detect anomalies in input data."""
    try:
        X = np.array(request.data)

        if X.shape[1] != detector.model.input_dim:
            raise HTTPException(
                status_code=400,
                detail=f"Expected {detector.model.input_dim} features, got {X.shape[1]}"
            )

        result = detector.predict(X)

        return PredictionResponse(
            is_anomaly=result.is_anomaly.tolist(),
            anomaly_scores=result.anomaly_score.tolist(),
            reconstruction_errors=result.reconstruction_error.tolist()
        )

    except HTTPException:
        # Preserve deliberate 4xx errors instead of masking them as 500s
        raise
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

@app.get("/health")
async def health():
    """Health check endpoint."""
    return {"status": "healthy", "model_loaded": detector is not None}

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
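
Once the server is running (locally or in the container above), a short client script can exercise the endpoint; this is a sketch, and the 30-dimensional zero vector is only a placeholder, not meaningful input:

import requests

# Send one 30-feature sample to the running inference server
payload = {"data": [[0.0] * 30]}
response = requests.post("http://localhost:8000/predict", json=payload)
response.raise_for_status()
print(response.json())  # {"is_anomaly": [...], "anomaly_scores": [...], "reconstruction_errors": [...]}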

Best Practices and Known Limitations

Best Practices

1. Data Preprocessing

  • Always normalize/standardize inputs (zero mean, unit variance); see the sketch after this list
  • Handle missing values before training
  • Remove extreme outliers from the training data
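
A minimal preprocessing sketch using NumPy and scikit-learn (the toy array is illustrative; in practice, fit the scaler on your normal training data only and reuse it at inference time):

import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy raw data with a missing value
X_raw = np.array([[1.0, 200.0], [2.0, np.nan], [3.0, 180.0]])

# Impute missing values with the column median before scaling
col_medians = np.nanmedian(X_raw, axis=0)
X_imputed = np.where(np.isnan(X_raw), col_medians, X_raw)

scaler = StandardScaler()                   # zero mean, unit variance per feature
X_scaled = scaler.fit_transform(X_imputed)  # fit on (normal) training data only
# At inference time, reuse the same statistics: scaler.transform(X_new)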

2. Threshold Selection

  • Use a validation set to tune the threshold percentile (typically the 95th-99th percentile); see the sketch after this list
  • Consider business costs: false positives vs false negatives
  • Monitor and adjust the threshold in production
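
If a small labeled validation set is available, the percentile can be tuned against a business-relevant metric. This sketch assumes detector, X_val, and y_val already exist from a pipeline like the one in train_detector.py:

import numpy as np
from sklearn.metrics import f1_score

# detector, X_val, y_val are assumed from the training pipeline above
val_errors = detector.predict(X_val).reconstruction_error

best_percentile, best_f1 = None, -1.0
for p in range(90, 100):  # candidate threshold percentiles
    threshold = np.percentile(val_errors[y_val == 0], p)
    preds = val_errors > threshold
    score = f1_score(y_val, preds)
    if score > best_f1:
        best_percentile, best_f1 = p, score

print(f"Best percentile: {best_percentile} (F1 = {best_f1:.3f})")
detector.threshold = float(np.percentile(val_errors[y_val == 0], best_percentile))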

3. Model Architecture

  • Start simple: 2-3 layer autoencoders
  • Latent dimension: typically 10-25% of input dimension
  • Use batch normalization for stable training
  • Add dropout (0.1-0.3) to prevent overfitting

4. Training Strategy

  • Train only on normal data (no anomalies)
  • Use early stopping based on validation loss (pattern sketched after this list)
  • Learning rate: 1e-3 to 1e-4 typically works well
  • Batch size: 64-256 for most datasets
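
Early stopping is not built into the fit() methods above, but the pattern is simple. In this sketch the val_losses list stands in for real per-epoch validation losses:

# Illustrative per-epoch validation losses (replace with values from your training loop)
val_losses = [0.32, 0.25, 0.21, 0.20, 0.21, 0.22, 0.23, 0.24]

best_val_loss = float("inf")
patience, patience_counter = 3, 0

for epoch, val_loss in enumerate(val_losses):
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        patience_counter = 0   # improvement: reset patience (and checkpoint the model here)
    else:
        patience_counter += 1  # no improvement this epoch
        if patience_counter >= patience:
            print(f"Early stopping at epoch {epoch + 1}")
            break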

5. Production Monitoring

  • Log reconstruction error distributions (example after this list)
  • Track false positive rates
  • Retrain periodically to counter data drift
  • A/B test threshold changes
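
A sketch of the kind of summary statistics worth logging for each batch of predictions (the logging setup is illustrative; wire it into whatever metrics stack you already run):

import logging
import numpy as np

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("anomaly_monitor")

def log_error_distribution(reconstruction_errors: np.ndarray, threshold: float) -> None:
    """Log summary statistics for a batch of reconstruction errors."""
    flag_rate = float(np.mean(reconstruction_errors > threshold))
    logger.info(
        "recon_error p50=%.5f p95=%.5f p99=%.5f flag_rate=%.3f",
        np.percentile(reconstruction_errors, 50),
        np.percentile(reconstruction_errors, 95),
        np.percentile(reconstruction_errors, 99),
        flag_rate,
    )

# Example: log_error_distribution(result.reconstruction_error, result.threshold)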

Known Limitations

| Limitation | Impact | Mitigation |
|---|---|---|
| Requires labeled normal data | Can’t train if you don’t know what’s normal | Use classical ML (Isolation Forest) for fully unsupervised detection |
| Training time (GPU hours) | Expensive for large datasets | Use cloud GPUs, start with a subset |
| Inference latency (10-50ms) | Slower than classical ML | Use ONNX for optimization, batch predictions |
| Black-box interpretability | Hard to explain why something is anomalous | Use attention mechanisms, SHAP values |
| Memory requirements | Large models need >8GB RAM | Use model quantization, smaller architectures |
| Data drift | Model degrades over time | Monitor performance, retrain quarterly |
| Threshold tuning | Hard to set an optimal threshold | Use a validation set and business metrics |
| Mode collapse (VAEs) | May ignore rare but valid patterns | Use beta-VAE, increase latent dimension |

When to Use Deep Learning vs Classical ML

Choose Classical ML (see our guide) when:

  • Dataset has <10,000 samples
  • Features are tabular and low-dimensional (<50 features)
  • Interpretability is critical
  • Real-time inference required (<5ms)
  • Limited compute budget

Choose Deep Learning (this guide) when:

  • Dataset has >50,000 samples
  • High-dimensional data (images, text, audio)
  • Complex patterns (time-series dependencies)
  • GPU resources available
  • Willing to trade interpretability for accuracy

Conclusion

Deep learning offers powerful tools for anomaly detection in complex, high-dimensional data:

  • Autoencoders: General-purpose reconstruction-based detection
  • VAEs: Probabilistic detection with uncertainty estimates
  • LSTMs: Time-series and sequential data
  • Production deployment: Containerized inference servers

Key Takeaways:

  • Start with classical ML for simpler problems (our previous guide)
  • Use autoencoders for high-dimensional tabular data
  • Use VAEs when uncertainty quantification matters
  • Use LSTMs for time-series with temporal dependencies
  • Always validate on held-out normal data
  • Monitor and retrain in production

Next Steps:

  • Experiment with different architectures (deeper, wider networks)
  • Try convolutional autoencoders for image data
  • Explore Transformer-based models for very long sequences
  • Implement ensemble methods (combine multiple detectors)

Further Resources: