Production‑Ready n8n Deployment Checklist: Step-by-Step Guide

A step-by-step guide to working through the n8n production-readiness checklist.


Who this is for: Platform engineers and DevOps teams that need to run n8n reliably at scale. We cover this in detail in the n8n Production Readiness & Scalability Risks Guide.


Quick Diagnosis

Problem: n8n works in development but fails under production load, leaks credentials, or loses workflow state.

Fast‑track fix: Run the checklist below, apply every Critical item, and redeploy. This resolves the most common production‑grade failures.

*In production, this usually shows up when the DB connection drops or a webhook payload exceeds the default size limit.*


Core Infrastructure Requirements

If you run into any of the common n8n architecture mistakes, resolve them before continuing with the setup.

These items are the ones we watch when a fresh n8n install starts misbehaving under load.

  • Persistent storage – mount /home/node/.n8n to a durable volume (Docker: -v n8n-data:/home/node/.n8n). Why it matters: workflow definitions and execution data survive container restarts.
  • Dedicated DB – external PostgreSQL ≥ 13, not SQLite, once you exceed roughly 10 k executions/month. Why it matters: SQLite is a single file and prone to corruption under concurrent writes.
  • CPU / memory – 2 vCPU + 4 GiB RAM baseline; add roughly 0.5 vCPU per 1 k concurrent executions. Why it matters: prevents OOM kills and CPU throttling during peak workflow runs.
  • Network isolation – deploy n8n in its own Docker network or Kubernetes namespace. Why it matters: limits the blast radius if a compromised workflow attempts lateral movement.
  • TLS termination – reverse proxy (Traefik, NGINX) with Let’s Encrypt certificates. Why it matters: encrypts API traffic and protects webhook payloads.

How to verify – run a quick spot check for each item, e.g. docker exec <container> ls /home/node/.n8n to confirm the mounted data directory, or psql -U $POSTGRES_USER -d $POSTGRES_DB -c "\dt" to confirm the n8n tables exist in PostgreSQL.
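Beyond spot checks, a small pre-flight script can fail fast before a deploy. The sketch below assumes the DB settings live in an env file; the /tmp path, file contents, and variable list are examples only, not part of any official n8n tooling.

```shell
# Sketch: pre-flight check that core DB settings exist in an env file
# before deploying. The /tmp path and contents are illustrative only.
cat > /tmp/n8n.env <<'EOF'
DB_TYPE=postgresdb
DB_POSTGRESDB_HOST=postgres
DB_POSTGRESDB_DATABASE=n8n
EOF

# Abort the deploy if any required variable is missing
for var in DB_TYPE DB_POSTGRESDB_HOST DB_POSTGRESDB_DATABASE; do
  grep -q "^${var}=" /tmp/n8n.env || { echo "missing ${var}"; exit 1; }
done
echo "pre-flight OK"
```

Wire a check like this into your CI pipeline so a misconfigured environment never reaches the cluster.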

EEFA note: Never expose port 5678 directly to the internet; always place a TLS‑terminating proxy in front.


Security Hardening Checklist

If you are seeing any of the hidden costs of cheap n8n hosting, address them before continuing with the setup.

  • N8N_BASIC_AUTH_ACTIVE=true – disables anonymous UI access.
  • N8N_BASIC_AUTH_USER / N8N_BASIC_AUTH_PASSWORD – random strings of 16+ characters; prevents credential stuffing.
  • N8N_ENCRYPTION_KEY – 32‑byte Base64 secret (openssl rand -base64 32); encrypts stored credentials and secrets.
  • N8N_DISABLE_PRODUCTION_WARNINGS=false – keeps safety warnings visible.
  • WEBHOOK_TUNNEL_URL – never set in production; stops accidental exposure of local tunnels.
  • Secret management – store all env vars in a secret manager (AWS Secrets Manager, Vault); avoids plaintext secrets in Dockerfiles or git.
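Generating the values above is a one-liner per secret. A minimal sketch, assuming openssl and the AWS CLI are available; the secret name n8n/prod is a placeholder:

```shell
# Generate strong values locally; never commit them to git.
BASIC_PASS=$(openssl rand -base64 16)   # 16 random bytes -> 24-char string
ENC_KEY=$(openssl rand -base64 32)      # 32 random bytes -> 44-char string

# Push them to a secret manager instead of an env file
# (secret name "n8n/prod" is a placeholder):
#   aws secretsmanager create-secret --name n8n/prod \
#     --secret-string "{\"N8N_ENCRYPTION_KEY\":\"$ENC_KEY\",\"N8N_BASIC_AUTH_PASSWORD\":\"$BASIC_PASS\"}"

echo "${#ENC_KEY}"   # 44
```

Generate the encryption key exactly once per environment and treat it like a root credential; as the warning below notes, rotating it invalidates everything it has encrypted.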

EEFA warning: Changing N8N_ENCRYPTION_KEY after credentials are stored will render those credentials unusable. Migrate data before rotating the key.

Most teams do not think about the encryption key at all until they hit a credential‑related error.


High Availability & Scaling

If you are already facing the n8n execution‑history time bomb (unbounded execution data growth), defuse it before continuing with the setup.

1. Horizontal Scaling with Docker Swarm (Compose file)

A Compose file deployed to Swarm makes it easy to spin up a replicated stack, but remember that a named volume is local to each node – replicas scheduled on different nodes will not share /home/node/.n8n unless the volume is backed by shared storage.

Deploy a replicated stack – the snippet below defines a three‑replica service with resource limits.

version: "3.8"
services:
  n8n:
    image: n8nio/n8n:latest
    deploy:
      mode: replicated
      replicas: 3
      resources:
        limits:
          cpus: "1.0"
          memory: 2G
    environment:
      - DB_TYPE=postgresdb
      - DB_POSTGRESDB_HOST=postgres
      - DB_POSTGRESDB_PORT=5432
      - DB_POSTGRESDB_DATABASE=n8n
      - DB_POSTGRESDB_USER=${POSTGRES_USER}
      - DB_POSTGRESDB_PASSWORD=${POSTGRES_PASSWORD}
      - N8N_BASIC_AUTH_ACTIVE=true
      - N8N_BASIC_AUTH_USER=${BASIC_USER}
      - N8N_BASIC_AUTH_PASSWORD=${BASIC_PASS}
      - N8N_ENCRYPTION_KEY=${ENC_KEY}
    volumes:
      - n8n-data:/home/node/.n8n
    ports:
      - "5678:5678"
    depends_on:
      - postgres
  postgres:
    image: postgres:13-alpine
    environment:
      POSTGRES_DB: n8n
      POSTGRES_USER: ${POSTGRES_USER}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    volumes:
      - pg-data:/var/lib/postgresql/data
volumes:
  n8n-data:
  pg-data:

Deploy with:

docker stack deploy -c docker-compose.yml n8n

2. Kubernetes (StatefulSet + Service)

Kubernetes will schedule each pod on a different node if resources allow, which removes a single point of failure.

StatefulSet definition – each replica receives its own persistent volume claim.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: n8n
spec:
  serviceName: "n8n"
  replicas: 3
  selector:
    matchLabels:
      app: n8n
  template:
    metadata:
      labels:
        app: n8n
    spec:
      containers:
        - name: n8n
          image: n8nio/n8n:latest
          envFrom:
            - secretRef:
                name: n8n-secrets
          ports:
            - containerPort: 5678
          volumeMounts:
            - name: n8n-data
              mountPath: /home/node/.n8n
  volumeClaimTemplates:
    - metadata:
        name: n8n-data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 5Gi   # size is an example; set it to your retention needs
---
apiVersion: v1
kind: Service
metadata:
  name: n8n
spec:
  selector:
    app: n8n
  ports:
    - protocol: TCP
      port: 80
      targetPort: 5678
  type: ClusterIP

EEFA tip: Use a ReadWriteMany PVC (e.g., NFS, Ceph) only if you need shared storage across pods. Otherwise each replica gets its own copy, preventing divergent workflow states.

3. Load‑Balancing Webhooks

  • Ingress – route /webhook/* to the n8n service.
  • Sticky sessions – rarely required; enable sessionAffinity: ClientIP only if your database cannot keep up with session state. Otherwise let the DB handle it.
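The webhook routing could be expressed as a Kubernetes Ingress. A sketch, assuming an NGINX ingress controller and cert-manager; the hostname, issuer, and secret names are placeholders:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: n8n-webhooks
  annotations:
    # assumes cert-manager issues the TLS certificate
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: nginx
  tls:
    - hosts: ["n8n.example.com"]
      secretName: n8n-tls
  rules:
    - host: n8n.example.com
      http:
        paths:
          - path: /webhook
            pathType: Prefix
            backend:
              service:
                name: n8n
                port:
                  number: 80
```

The backend port 80 matches the Service defined above, which forwards to container port 5678.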

Monitoring, Logging, & Alerting

  • Metrics – Prometheus + node‑exporter. Set N8N_METRICS=true so n8n exposes /metrics.
  • Logs – Loki + Grafana. Ship container logs to Loki, for example via Promtail or a Docker logging driver.
  • Health checks – K8s liveness/readiness probes (httpGet: path: /healthz, port: 5678).
  • Alerting – Alertmanager rules for sustained problems, e.g. CPU > 80 % for 5 min or DB errors:

- alert: HighCpuUsage
  expr: sum(rate(container_cpu_usage_seconds_total{container="n8n"}[5m])) by (instance) > 0.8
  for: 5m
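The health-check row above expands into probe definitions on the n8n container. A sketch; the delays and periods are starting points to tune, not prescribed values:

```yaml
# Probe sketch for the n8n container; thresholds are illustrative defaults
livenessProbe:
  httpGet:
    path: /healthz
    port: 5678
  initialDelaySeconds: 30   # give n8n time to connect to the DB on startup
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /healthz
    port: 5678
  periodSeconds: 5
```

Keep the liveness delay generous: a pod restarted mid-migration can loop forever if the probe fires before the DB connection is up.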

EEFA note: Do not rely solely on the UI “Workflow Execution History” for production diagnostics; it only shows a limited, recent window of executions and old entries are pruned. Centralized logging is mandatory for forensic analysis.

In the field, we’ve seen alerts go silent if the metrics endpoint isn’t exposed.
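To avoid that silent failure, make sure Prometheus is actually scraping the instance. A minimal scrape job sketch; the job name and target are assumptions that should match your deployment:

```yaml
scrape_configs:
  - job_name: n8n               # matches the up{job="n8n"} check in the final audit
    static_configs:
      - targets: ["n8n:5678"]   # requires N8N_METRICS=true on the instance
```

After applying it, confirm the target shows as UP in the Prometheus targets page before trusting any alert.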


Backup, Restore, & Disaster Recovery

  1. Database dump (PostgreSQL)
    pg_dump -U $POSTGRES_USER -Fc $POSTGRES_DB > n8n_$(date +%F).dump
    
  2. Workflow export via the public API
    curl -X GET "https://n8n.example.com/api/v1/workflows" \
         -H "X-N8N-API-KEY: $API_KEY" \
         -o workflows_$(date +%F).json
    
  3. Automated snapshot – schedule a daily cron job (or K8s CronJob) that runs both steps and pushes the artifacts to an off‑site object store (AWS S3, GCS).
  4. Restore workflow
    pg_restore -U $POSTGRES_USER -d $POSTGRES_DB n8n_2024-12-01.dump
    
    curl -X POST "https://n8n.example.com/api/v1/workflows" \
         -H "X-N8N-API-KEY: $API_KEY" \
         -H "Content-Type: application/json" \
         --data @workflows_2024-12-01.json
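The automated snapshot from step 3 could be sketched as a Kubernetes CronJob. The schedule, image, bucket, and secret names below are placeholders; the official postgres image ships pg_dump but you would need to add the AWS CLI (or use a custom backup image) for the upload:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: n8n-backup
spec:
  schedule: "0 2 * * *"          # daily at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: backup
              image: postgres:13-alpine   # has pg_dump; add awscli for the upload
              envFrom:
                - secretRef:
                    name: n8n-backup-secrets   # placeholder secret with DB creds
              command: ["/bin/sh", "-c"]
              args:
                - |
                  pg_dump -h postgres -U "$POSTGRES_USER" -Fc "$POSTGRES_DB" \
                    > /tmp/n8n_$(date +%F).dump
                  # push to the off-site bucket (bucket name is a placeholder)
                  aws s3 cp /tmp/n8n_$(date +%F).dump s3://example-n8n-backups/
```

Alert on the CronJob's failure count as well; a backup that silently stops running is the failure mode you discover on restore day.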
    

EEFA warning: Restoring a dump that contains encrypted credentials will fail if the N8N_ENCRYPTION_KEY has changed. Keep the key version‑controlled alongside your backup policy.

Make sure the backup job runs with the same N8N_ENCRYPTION_KEY that the live instance uses, otherwise restores will fail.


CI/CD & Automated Deployments

  • Lint & test – ESLint (e.g. eslint-plugin-n8n-nodes-base for custom nodes) plus Jest: npm run lint && npm test
  • Container build – GitHub Actions with Docker Buildx:

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Build & push
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: ghcr.io/yourorg/n8n:${{ github.sha }}

  • Deploy – Argo CD (K8s) or a Swarm stack update: docker stack deploy -c docker-compose.yml n8n
  • Smoke test – post‑deployment call to the health endpoint: curl -f https://n8n.example.com/healthz
  • Rollback – keep the previous image tag and roll back with: docker service update --image n8nio/n8n:1.24.0 n8n_n8n

EEFA tip: A quick smoke test after each deploy catches most misconfigurations before they hit users.

Freeze the N8N_ENCRYPTION_KEY in a secret manager and reference it via ${{ secrets.N8N_ENCRYPTION_KEY }}. Changing the key mid‑pipeline breaks all stored credentials.
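Referencing the frozen key from the pipeline might look like this in a GitHub Actions workflow; the job and step names are illustrative:

```yaml
jobs:
  deploy:
    runs-on: ubuntu-latest
    env:
      # injected from the repository secret store, never hard-coded
      N8N_ENCRYPTION_KEY: ${{ secrets.N8N_ENCRYPTION_KEY }}
    steps:
      - name: Deploy stack
        run: docker stack deploy -c docker-compose.yml n8n
```

Because the value comes from the secret store on every run, the pipeline can never drift from the key the live instance encrypted its credentials with.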


Final Verification & Ongoing Audits

  • All critical env vars sourced from a secret manager.
  • TLS terminates at the edge and HTTP → HTTPS redirect is enforced – curl -I http://n8n.example.com returns a 301 to https.
  • A backup succeeded within the last 24 h – the S3 object list shows today’s dump.
  • Prometheus metrics scraped without errors – up{job="n8n"} == 1 in Grafana.
  • No default credentials exist – docker exec n8n grep -i admin .env returns nothing.
  • Load test sustains ≥ 500 req/s with < 200 ms latency – k6 script results attached.

Run this verification after every major version upgrade. Document any deviation and create a JIRA ticket for remediation.

Running the checklist after each upgrade is a habit that saves a lot of firefighting later.


 

By systematically ticking each row in the tables above, you turn a bare‑bones n8n instance into a production‑grade automation engine that meets reliability, security, and compliance expectations.
