Research Disclaimer
This tutorial is based on:
- Semgrep v1.55+ (SAST scanning)
- Bandit v1.7+ (Python security linter)
- CodeQL v2.15+ (GitHub Advanced Security)
- SonarQube v10.3+ (code quality & security)
- Academic research on AI code generation security (NYU 2023 study, Stanford 2024 study)
- OWASP Top 10 2021 vulnerability classifications
All code examples demonstrate production-grade security scanning integrated into CI/CD pipelines. Tested with GitHub Actions, GitLab CI, and Jenkins. Security recommendations follow OWASP and NIST guidelines.
Introduction
AI coding assistants (GitHub Copilot, ChatGPT, Claude Code) accelerate development but can introduce security vulnerabilities if their output is not properly reviewed. The NYU research cited above found vulnerabilities in roughly 40% of Copilot-generated programs for security-relevant scenarios.
This guide demonstrates production security workflows:
- Static analysis tools: Semgrep, Bandit, CodeQL integration
- Secure prompting: Strategies for safe AI code generation
- CI/CD integration: Automated security scanning workflows
- Manual review: Checklist for AI-generated code
- Real vulnerabilities: Examples from actual AI outputs
- Complete automation: Production-ready security pipelines
AI Code Generation Risks
| Risk Category | Impact | Mitigation |
|---|---|---|
| SQL Injection | High - Data breach | Prepared statements, ORM, input validation |
| XSS | Medium - Session hijacking | Output encoding, CSP headers |
| Hardcoded credentials | Critical - Unauthorized access | Secret management, env variables |
| Insecure deserialization | Critical - Remote code execution | Type validation, safe parsers |
| Path traversal | High - File access | Path sanitization, allowlists |
| Weak cryptography | Medium - Data exposure | Modern algorithms (AES-256, SHA-256) |
| Missing auth checks | Critical - Privilege escalation | Centralized auth middleware |
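To make the mitigations concrete, here is a minimal sketch for the hardcoded-credentials row: the first assignment is the pattern AI assistants often emit, the second loads the secret from the environment (the PAYMENT_API_KEY variable name is illustrative):

import os

# Risky pattern often seen in AI suggestions: the secret ships with the source
API_KEY = "sk-live-abc123"  # would be flagged by the hardcoded-credentials rule defined later

# Safer pattern: read the secret from the environment or a secret manager
API_KEY = os.environ.get("PAYMENT_API_KEY")
if API_KEY is None:
    raise RuntimeError("PAYMENT_API_KEY is not set")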
Prerequisites
Required Knowledge:
- Understanding of common web vulnerabilities (OWASP Top 10)
- Experience with CI/CD pipelines
- Familiarity with Python, JavaScript, or Java
- Basic understanding of static analysis tools
Required Tools:
# Python static analysis
pip install bandit==1.7.5 semgrep==1.55.0
# JavaScript/TypeScript security
npm install -g eslint eslint-plugin-security
# Code quality platform
docker pull sonarqube:10.3-community
# GitHub CodeQL CLI (for local scanning)
# Download from https://github.com/github/codeql-cli-binaries/releases
Example 1: SQL Injection in AI-Generated Code
Vulnerable Code (AI-Generated)
Prompt to AI: “Write a Python function to get user by email from database”
AI Output (Insecure):
def get_user_by_email(email):
"""Get user from database by email."""
import sqlite3
conn = sqlite3.connect('users.db')
cursor = conn.cursor()
# VULNERABILITY: SQL Injection
query = f"SELECT * FROM users WHERE email = '{email}'"
cursor.execute(query)
user = cursor.fetchone()
conn.close()
return user
Exploitation:
# Attacker input
malicious_email = "' OR '1'='1"
get_user_by_email(malicious_email)
# Executes: SELECT * FROM users WHERE email = '' OR '1'='1'
# Returns ALL users
Secure Version
Secure Prompt: “Write a Python function to get user by email using parameterized query”
AI Output (Secure):
def get_user_by_email(email: str):
"""
Get user from database by email using parameterized query.
Args:
email: User email address (validated)
Returns:
User record or None
"""
import sqlite3
from email_validator import validate_email, EmailNotValidError
# Validate email format
try:
valid = validate_email(email)
email = valid.email
except EmailNotValidError:
return None
conn = sqlite3.connect('users.db')
cursor = conn.cursor()
# SECURE: Parameterized query (prevents SQL injection)
query = "SELECT * FROM users WHERE email = ?"
cursor.execute(query, (email,))
user = cursor.fetchone()
conn.close()
return user
Static Analysis with Semgrep
Semgrep provides pattern-based vulnerability detection.
Step 1: Install and Configure
File: .semgrep.yml - Semgrep configuration
# Semgrep configuration for AI-generated code security
rules:
# Detect SQL injection patterns
- id: sql-injection-risk
    patterns:
      - pattern-either:
          - pattern: cursor.execute(f"...")
          - pattern: cursor.execute($QUERY + $VAR)
      - pattern-not: cursor.execute("...", (...))
message: |
Potential SQL injection: Use parameterized queries instead of string formatting
languages: [python]
severity: ERROR
# Detect hardcoded credentials
- id: hardcoded-credentials
patterns:
      - pattern-regex: (password|api_key|secret|token)\s*=\s*["'][^"']+["']
- pattern-not-regex: '(password|api_key)\s*=\s*os\.(getenv|environ)'
message: |
Hardcoded credentials detected. Use environment variables instead.
languages: [python, javascript]
severity: ERROR
# Detect weak cryptography
- id: weak-crypto
    pattern-either:
      - pattern: hashlib.md5(...)
      - pattern: hashlib.sha1(...)
message: |
Weak cryptographic hash detected. Use SHA-256 or stronger.
languages: [python]
severity: WARNING
# Detect missing input validation
- id: missing-input-validation
patterns:
- pattern: |
def $FUNC($PARAM):
...
open($PARAM, ...)
- pattern-not: |
def $FUNC($PARAM):
...
if ... in $PARAM: ...
...
open($PARAM, ...)
message: |
Path traversal risk: Validate file paths before opening
languages: [python]
severity: WARNING
# Detect eval() usage
- id: dangerous-eval
patterns:
- pattern: eval(...)
message: |
Dangerous eval() detected. This can lead to remote code execution.
languages: [python]
severity: ERROR
Step 2: Run Semgrep Scan
File: run_semgrep.sh
#!/bin/bash
# Run Semgrep security scan
set -e
echo "Running Semgrep security scan..."
# Run with custom rules
semgrep \
--config .semgrep.yml \
--config "p/security-audit" \
--config "p/owasp-top-ten" \
--config "p/python" \
--json \
--output semgrep-results.json \
.
# Generate human-readable report
semgrep \
--config .semgrep.yml \
--config "p/security-audit" \
.
# Check if critical issues found
CRITICAL_COUNT=$(jq '[.results[] | select(.extra.severity == "ERROR")] | length' semgrep-results.json)
if [ "$CRITICAL_COUNT" -gt 0 ]; then
echo "❌ Found $CRITICAL_COUNT critical security issues!"
exit 1
else
echo "✓ No critical security issues found"
fi
Python Security with Bandit
Bandit specializes in Python security issues.
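Before wiring up the configuration, it helps to see what Bandit reacts to. The snippet below is an intentionally insecure, hypothetical example containing patterns Bandit typically reports (unsafe pickle deserialization, shell=True with interpolated input, and a hardcoded password):

import pickle
import subprocess

def insecure_examples(user_input: bytes, pattern: str):
    """Intentionally insecure code that a Bandit scan would flag. Do not use."""
    # Deserializing untrusted bytes with pickle can execute arbitrary code
    obj = pickle.loads(user_input)

    # shell=True combined with interpolated input enables command injection
    subprocess.run(f"grep {pattern} /var/log/app.log", shell=True)

    # Hardcoded password literal
    db_password = "hunter2"
    return obj, db_password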
Bandit Configuration
File: .bandit
[bandit]
# Exclude test and virtual environment directories
exclude = /tests,/venv,/.venv,/build
# Skip specific test IDs (if needed)
# skips = B404,B603
# Note: Most Bandit options are better configured via CLI flags:
# - Severity levels: --severity-level (low/medium/high)
# - Confidence: --confidence-level (low/medium/high)
# - Specific tests: -s B101,B601 (skip) or -t B201,B301 (run only)
File: run_bandit.py - Automated Bandit scanning
"""
Run Bandit security scan and generate report.
"""
import subprocess
import json
import sys
def run_bandit_scan(target_dir="."):
"""
Run Bandit security scan.
Args:
target_dir: Directory to scan
Returns:
Exit code (0 = pass, 1 = fail)
"""
# Run Bandit
cmd = [
"bandit",
"-r", target_dir,
"-f", "json",
"-o", "bandit-report.json",
"-c", ".bandit",
"--severity-level", "medium",
]
try:
subprocess.run(cmd, check=True)
except subprocess.CalledProcessError:
pass # Bandit returns non-zero when issues found
# Parse results
with open("bandit-report.json", "r") as f:
results = json.load(f)
# Categorize issues
critical_issues = [
issue for issue in results.get("results", [])
if issue["issue_severity"] in ["HIGH", "MEDIUM"]
]
# Print summary
print("\n" + "=" * 60)
print("Bandit Security Scan Results")
print("=" * 60)
print(f"Total issues: {len(results.get('results', []))}")
print(f"Critical/High issues: {len(critical_issues)}")
if critical_issues:
print("\nCritical Issues Found:")
for issue in critical_issues[:10]: # Show top 10
print(f"\n [{issue['issue_severity']}] {issue['issue_text']}")
print(f" File: {issue['filename']}:{issue['line_number']}")
print(f" CWE: {issue.get('issue_cwe', {}).get('id', 'N/A')}")
# Generate HTML report
subprocess.run([
"bandit",
"-r", target_dir,
"-f", "html",
"-o", "bandit-report.html"
])
print(f"\nHTML report: bandit-report.html")
# Fail if critical issues found
if len(critical_issues) > 0:
print("\n❌ Critical security issues detected!")
return 1
else:
print("\n✓ No critical security issues")
return 0
if __name__ == "__main__":
sys.exit(run_bandit_scan())
CI/CD Integration
GitHub Actions Workflow
File: .github/workflows/security-scan.yml
name: Security Scan
on:
pull_request:
branches: [main, develop]
push:
branches: [main]
jobs:
security-scan:
name: Run Security Scans
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v4
with:
fetch-depth: 0 # Full history for better analysis
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: '3.11'
- name: Install security tools
run: |
pip install bandit semgrep
npm install -g eslint eslint-plugin-security
- name: Run Semgrep
run: |
semgrep --config=.semgrep.yml \
--config="p/security-audit" \
--config="p/owasp-top-ten" \
--sarif \
--output=semgrep.sarif \
.
continue-on-error: true
- name: Run Bandit
run: |
bandit -r . -f json -o bandit-report.json || true
python run_bandit.py
continue-on-error: true
- name: Upload Semgrep results to GitHub Security
        uses: github/codeql-action/upload-sarif@v3
with:
sarif_file: semgrep.sarif
if: always()
- name: Comment PR with results
uses: actions/github-script@v7
if: github.event_name == 'pull_request'
with:
script: |
const fs = require('fs');
const report = JSON.parse(fs.readFileSync('bandit-report.json', 'utf8'));
const criticalIssues = report.results.filter(
r => r.issue_severity === 'HIGH' || r.issue_severity === 'MEDIUM'
);
let comment = '## 🔐 Security Scan Results\n\n';
comment += `**Total Issues**: ${report.results.length}\n`;
comment += `**Critical/High**: ${criticalIssues.length}\n\n`;
if (criticalIssues.length > 0) {
comment += '### ⚠️ Critical Issues\n\n';
criticalIssues.slice(0, 5).forEach(issue => {
comment += `- **${issue.issue_text}**\n`;
comment += ` - File: \`${issue.filename}:${issue.line_number}\`\n`;
comment += ` - Severity: ${issue.issue_severity}\n\n`;
});
} else {
comment += '✅ No critical security issues found!\n';
}
github.rest.issues.createComment({
owner: context.repo.owner,
repo: context.repo.repo,
issue_number: context.payload.pull_request.number,
body: comment
});
- name: Fail if critical issues
run: |
CRITICAL=$(jq '[.results[] | select(.issue_severity == "HIGH" or .issue_severity == "MEDIUM")] | length' bandit-report.json)
if [ "$CRITICAL" -gt 0 ]; then
echo "❌ $CRITICAL critical issues found"
exit 1
fi
GitLab CI Integration
File: .gitlab-ci.yml
stages:
- security
security_scan:
stage: security
image: python:3.11-slim
  before_script:
    - apt-get update -qq && apt-get install -y -qq jq
    - pip install bandit semgrep
script:
# Run Semgrep
    - semgrep --config=.semgrep.yml --config="p/security-audit" --gitlab-sast -o gl-sast-report.json .
# Run Bandit
- bandit -r . -f json -o bandit.json || true
# Check for critical issues
- |
CRITICAL=$(jq '[.results[] | select(.issue_severity == "HIGH")] | length' bandit.json)
if [ "$CRITICAL" -gt 0 ]; then
echo "Critical security issues found: $CRITICAL"
exit 1
fi
artifacts:
reports:
      sast: gl-sast-report.json
    paths:
      - bandit.json
      - gl-sast-report.json
expire_in: 30 days
only:
- merge_requests
- main
Secure AI Prompting Strategies
1. Specify Security Requirements
Bad Prompt:
Write a login endpoint
Good Prompt:
Write a secure login endpoint with:
- Bcrypt password hashing
- Rate limiting (5 attempts per minute)
- Input validation for email and password
- CSRF token validation
- Secure session cookies (httpOnly, sameSite)
- No hardcoded secrets
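One possible shape of the code such a prompt should produce is sketched below. It assumes Flask and the bcrypt package; find_user_by_email, the SESSION_SECRET environment variable, and the in-memory rate limiter are illustrative stand-ins, not a production implementation:

import os
import secrets
import time

import bcrypt
from flask import Flask, abort, jsonify, make_response, request, session

app = Flask(__name__)
app.secret_key = os.environ["SESSION_SECRET"]  # no hardcoded secrets

_attempts: dict[str, list[float]] = {}  # stand-in for a real rate limiter (e.g. Redis-backed)

def _rate_limited(ip: str, limit: int = 5, window: float = 60.0) -> bool:
    """Allow at most `limit` attempts per `window` seconds per client address."""
    now = time.time()
    recent = [t for t in _attempts.get(ip, []) if now - t < window]
    recent.append(now)
    _attempts[ip] = recent
    return len(recent) > limit

def find_user_by_email(email: str):
    """Hypothetical user lookup; replace with your data layer."""
    return None

@app.route("/login", methods=["POST"])
def login():
    # Rate limiting: 5 attempts per minute per client
    if _rate_limited(request.remote_addr or "unknown"):
        abort(429, "Too many login attempts")

    # CSRF token issued earlier in the session must match the request header
    csrf_token = session.get("csrf_token")
    if not csrf_token or request.headers.get("X-CSRF-Token") != csrf_token:
        abort(403, "Invalid CSRF token")

    # Input validation for email and password
    data = request.get_json(silent=True) or {}
    email = str(data.get("email", ""))[:254]
    password = str(data.get("password", ""))
    if "@" not in email or not (8 <= len(password) <= 128):
        abort(400, "Invalid credentials format")

    # Bcrypt password verification against the stored hash
    user = find_user_by_email(email)
    if user is None or not bcrypt.checkpw(password.encode(), user.password_hash):
        abort(401, "Invalid email or password")

    # Secure session cookie: httpOnly, sameSite, secure
    resp = make_response(jsonify({"status": "ok"}))
    resp.set_cookie(
        "session_id",
        secrets.token_urlsafe(32),  # persist server-side session state as appropriate
        httponly=True,
        samesite="Strict",
        secure=True,
    )
    return resp

Even with a prompt this specific, the generated endpoint still needs to pass the scanners and checklist described in the rest of this guide.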
2. Request Security Comments
Prompt:
Write a file upload handler with security comments explaining:
- Why each validation check is needed
- What attacks are being prevented
- Which OWASP Top 10 items are addressed
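A response to such a prompt might look like the sketch below (a Flask-style handler; ALLOWED_EXTENSIONS, MAX_SIZE, and save_upload are illustrative names, and the comments are the kind of explanation the prompt asks for):

import os
from werkzeug.utils import secure_filename

ALLOWED_EXTENSIONS = {".png", ".jpg", ".pdf"}   # allowlist of file types, not a denylist
MAX_SIZE = 5 * 1024 * 1024                      # size cap prevents resource exhaustion

def save_upload(file_storage, upload_dir: str) -> str:
    """Save an uploaded file after validating name, type, and size."""
    # secure_filename strips path separators, blocking ../ traversal (OWASP A01: Broken Access Control)
    name = secure_filename(file_storage.filename or "")
    ext = os.path.splitext(name)[1].lower()
    # Extension allowlist blocks executable or script uploads
    if not name or ext not in ALLOWED_EXTENSIONS:
        raise ValueError("File type not allowed")
    data = file_storage.read()
    if len(data) > MAX_SIZE:
        raise ValueError("File too large")
    dest = os.path.join(upload_dir, name)
    with open(dest, "wb") as fh:
        fh.write(data)
    return dest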
3. Ask for Both Insecure and Secure Versions
Prompt:
Show me:
1. An insecure version of a password reset function
2. A secure version with explanations of what was fixed
3. Unit tests that verify the security fixes
Manual Code Review Checklist
File: ai-code-review-checklist.md
# AI-Generated Code Security Checklist
## Input Validation
- [ ] All user inputs are validated (type, length, format)
- [ ] File paths are sanitized (no `../` path traversal)
- [ ] SQL queries use parameterized statements
- [ ] Regular expressions are not vulnerable to ReDoS
- [ ] File uploads check file type and size
## Authentication & Authorization
- [ ] Authentication checks are present on protected endpoints
- [ ] Role-based access control (RBAC) is implemented
- [ ] Session tokens are generated securely (crypto.randomBytes)
- [ ] Passwords are hashed with bcrypt/argon2 (not MD5/SHA1)
- [ ] Rate limiting is applied to auth endpoints
## Cryptography
- [ ] No hardcoded secrets (passwords, API keys, tokens)
- [ ] Secrets loaded from environment variables
- [ ] Strong algorithms used (AES-256, SHA-256, not MD5/DES)
- [ ] Random values generated with crypto library (not Math.random)
- [ ] TLS/HTTPS enforced for sensitive data
## Data Protection
- [ ] Sensitive data not logged (passwords, credit cards, SSNs)
- [ ] Database connections use prepared statements
- [ ] No eval() or exec() calls
- [ ] Deserialization uses safe parsers (not pickle)
- [ ] Output is escaped before rendering (XSS prevention)
## Error Handling
- [ ] Errors don't leak sensitive information
- [ ] Stack traces not exposed to users
- [ ] Generic error messages returned to client
- [ ] Detailed errors logged server-side only
## Dependencies
- [ ] No known vulnerable dependencies (run `npm audit` or `pip-audit`)
- [ ] Dependencies pinned to specific versions
- [ ] Minimal dependencies (reduce attack surface)
## Configuration
- [ ] Debug mode disabled in production
- [ ] CORS configured properly (not `Access-Control-Allow-Origin: *`)
- [ ] Security headers set (CSP, X-Frame-Options, etc.)
- [ ] Default credentials changed
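Two checklist items that AI output frequently gets wrong are secure token generation and safe deserialization; a minimal sketch (parse_profile and its expected fields are hypothetical):

import json
import secrets

# Secure random values: use the secrets module, not random or Math.random-style PRNGs
session_token = secrets.token_urlsafe(32)

def parse_profile(raw: str) -> dict:
    """Safely deserialize untrusted input: JSON plus structural validation, never pickle."""
    data = json.loads(raw)
    if not isinstance(data, dict) or not isinstance(data.get("name"), str):
        raise ValueError("Unexpected profile structure")
    return {"name": data["name"]}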
Real Vulnerability Examples
Example 1: Path Traversal
AI-Generated (Vulnerable):
@app.route('/download/<filename>')
def download_file(filename):
"""Download file from uploads directory."""
file_path = f"uploads/{filename}"
return send_file(file_path)
# EXPLOIT: /download/../../etc/passwd
# Accesses files outside uploads directory
Secure Version:
from pathlib import Path
from flask import abort, send_file
@app.route('/download/<filename>')
def download_file(filename):
"""Securely download file from uploads directory."""
# Define allowed directory
uploads_dir = Path("/var/www/uploads").resolve()
# Build file path
file_path = (uploads_dir / filename).resolve()
    # Verify the resolved path is inside the allowed directory
    # (a bare startswith() check would also accept lookalike paths such as /var/www/uploads_evil)
    if not file_path.is_relative_to(uploads_dir):  # Python 3.9+
        abort(403, "Access denied")
# Verify file exists
if not file_path.is_file():
abort(404, "File not found")
return send_file(file_path)
Example 2: XSS (Cross-Site Scripting)
AI-Generated (Vulnerable):
// Display user comment
function showComment(comment) {
document.getElementById('comments').innerHTML += `
<div class="comment">${comment}</div>
`;
}
// EXPLOIT: comment = "<script>alert(document.cookie)</script>"
// Executes malicious JavaScript
Secure Version:
function showComment(comment) {
const commentDiv = document.createElement('div');
commentDiv.className = 'comment';
commentDiv.textContent = comment; // Automatically escapes HTML
document.getElementById('comments').appendChild(commentDiv);
}
// Or use a template library with auto-escaping (React, Vue, etc.)
Automated Security Testing
File: test_security.py - Security unit tests
"""
Security tests for AI-generated code.
"""
import pytest
from app import get_user_by_email, download_file
class TestSQLInjection:
"""Test SQL injection prevention."""
def test_sql_injection_blocked(self):
"""Verify SQL injection is blocked."""
malicious_input = "' OR '1'='1"
result = get_user_by_email(malicious_input)
# Should return None, not all users
assert result is None
    def test_parameterized_query(self, monkeypatch):
        """Verify parameterized queries are used (via a mocked DB connection)."""
        import sqlite3
        from unittest.mock import MagicMock

        # sqlite3.Cursor is a C extension type, so its methods cannot be
        # monkeypatched directly; spy on the connection instead.
        mock_conn = MagicMock()
        mock_cursor = mock_conn.cursor.return_value
        mock_cursor.fetchone.return_value = None
        monkeypatch.setattr(sqlite3, "connect", MagicMock(return_value=mock_conn))

        get_user_by_email("[email protected]")

        # Verify a parameterized query was used
        assert mock_cursor.execute.called
        query, params = mock_cursor.execute.call_args[0]
        assert "?" in query      # placeholder
        assert len(params) > 0   # parameters provided separately
class TestPathTraversal:
"""Test path traversal prevention."""
def test_path_traversal_blocked(self, client):
"""Verify path traversal is blocked."""
response = client.get('/download/../../etc/passwd')
assert response.status_code == 403
def test_valid_file_allowed(self, client):
"""Verify valid files can be downloaded."""
response = client.get('/download/valid_file.txt')
assert response.status_code in [200, 404] # OK or not found
class TestXSS:
"""Test XSS prevention."""
def test_html_escaped(self, client):
"""Verify HTML is escaped in output."""
xss_payload = "<script>alert('XSS')</script>"
response = client.post('/comment', json={'text': xss_payload})
        # The payload must come back escaped (or stripped), never as raw markup
        assert b'<script>' not in response.data
        assert b'&lt;script&gt;' in response.data or b'alert' not in response.data
Known Limitations
| Limitation | Impact | Mitigation |
|---|---|---|
| False positives | Wasted review time | Tune rules, suppress known safe patterns |
| False negatives | Missed vulnerabilities | Combine multiple tools, manual review |
| Language coverage | Some languages lack tools | Use CodeQL for broader support |
| Performance | Slow scans on large codebases | Scan only changed files in CI |
| Configuration complexity | Hard to set up | Use pre-built rulesets (p/security-audit) |
| AI prompt variability | Inconsistent security | Use prompt templates with explicit security requirements |
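For the false-positive row, both Semgrep and Bandit support inline suppression comments once a finding has been manually reviewed. A small sketch (the suppressed Semgrep rule ID refers to the custom rule defined earlier; the surrounding code is illustrative):

import subprocess

# Reviewed: fixed argument list, no shell involved, so Bandit's subprocess warning is accepted
subprocess.run(["ls", "-l", "/var/log"])  # nosec

# Reviewed: constant query string, so the custom Semgrep rule is suppressed on this line
VERSION_QUERY = "SELECT version()"  # nosemgrep: sql-injection-risk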
Conclusion
Securing AI-generated code requires automated tooling, secure prompting, and manual review:
- Static Analysis: Semgrep, Bandit, CodeQL catch common vulnerabilities
- CI/CD Integration: Automated scanning on every PR
- Secure Prompting: Specify security requirements in prompts
- Manual Review: Use checklist for critical code
- Testing: Security-focused unit tests
Key Takeaways:
- Never trust AI-generated code without review
- Automate security scanning in CI/CD
- Use specific security requirements in prompts
- Combine multiple tools for better coverage
- Test security assumptions with unit tests
Next Steps:
- Integrate Semgrep/Bandit into your CI pipeline
- Create prompt templates for common secure patterns
- Build security test suite for AI-generated code
- Train team on common AI code vulnerabilities
Further Resources:
- Semgrep Rules: https://semgrep.dev/r
- Bandit Documentation: https://bandit.readthedocs.io/
- CodeQL: https://codeql.github.com/
- OWASP Top 10: https://owasp.org/Top10/
- AI Code Security Research: https://arxiv.org/abs/2108.09293