n8n performance drops after scaling horizontally – why more workers isn’t enough

A step-by-step guide to diagnosing and fixing n8n performance drops after horizontal scaling


Who this is for: Site reliability engineers, platform engineers, or DevOps teams that have already deployed n8n in a Kubernetes (or similar) environment and are now adding pods to increase capacity. We cover this in detail in the n8n Performance Degradation & Stability Issues Guide.


Quick Diagnosis & Fix

| Symptom | Likely Root Cause | One‑line Remedy |
|---|---|---|
| CPU spikes on every node, overall throughput ↓ | Shared DB bottleneck (SQLite or single‑instance Postgres) | Move to a dedicated, HA PostgreSQL cluster and enable connection pooling (pgbouncer). |
| Random workflow failures, “execution timed out” | Stateless‑vs‑stateful mismatch (workflows rely on in‑memory state) | Use Redis‑backed queue mode: set EXECUTIONS_MODE=queue and point all instances at the same Redis. |
| Load balancer returns 502/504 under load | Webhook routing mis‑config (callbacks land on the wrong pod) | Run dedicated webhook processors (n8n webhook) or configure sticky sessions (session affinity) on the LB. |
| Memory usage climbs on each replica, OOM kills | Workflow memory leak (large payloads kept in process) | Off‑load binary data to external storage (N8N_DEFAULT_BINARY_DATA_MODE) and set EXECUTIONS_TIMEOUT. |
| Overall latency ↑ despite more pods | Insufficient pod resources / CPU throttling | Raise resources.requests.cpu and resources.limits.cpu in the pod spec; monitor with kubectl top pod. |

Quick test – After applying the appropriate remedy, re‑run a load test (e.g., hey -c 200 -n 10000 http://<lb>/webhook). Latency should drop and the error rate should stay below 1 %.
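That 1 % error budget can be checked mechanically against the totals in the hey summary. A minimal sketch (error_rate_ok is a hypothetical name, not part of hey or n8n):

```shell
# Hypothetical helper: pass when errors stay below 1 % of total requests.
error_rate_ok() {
  local total="$1" errors="$2"
  # integer math: errors/total < 1/100  <=>  errors*100 < total
  [ $(( errors * 100 )) -lt "$total" ]
}

error_rate_ok 10000 50 && echo "PASS" || echo "FAIL"   # → PASS
```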


1. Horizontal Scaling Basics

If your n8n throughput has plateaued and adding workers no longer helps, resolve the issues below before continuing with the setup.

| Component | Single‑Node Default | Change Required for Horizontal Scale |
|---|---|---|
| Workflow engine | In‑process Node.js loop | Must be stateless – any pod can pick up any execution. |
| Queue | In‑memory fallback | Switch to a distributed queue (Redis, the backend n8n’s queue mode supports). |
| Database | SQLite file or single Postgres | Use an HA PostgreSQL cluster with connection pooling. |
| Cache | Process memory | External cache (Redis) for shared state (credentials, webhook IDs). |
| Load balancer | Direct to sole pod | Distribute HTTP requests and preserve session affinity for webhook callbacks. |

EEFA Note: n8n was originally built for “single‑node dev” use‑cases. Scaling without converting these components creates hidden contention points that manifest as performance drops.


2. Common Bottlenecks After Scaling

2.1 Database Saturation

Symptom: query latency > 200 ms, frequent “deadlock detected” errors.
Why it happens: all workflow metadata (logs, credentials, definitions) is written to the same database, so adding pods multiplies concurrent writes.
Fix checklist:
• Switch to a dedicated PostgreSQL cluster (e.g., AWS RDS Aurora).
• Enable connection pooling (pgbouncer).
• Tune max_connections to roughly 2 × (CPU cores × pods).
• (Optional) Add read replicas for UI queries.
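The 2 × (CPU cores × pods) rule of thumb is easy to compute up front; the helper name below is purely illustrative:

```shell
# Illustrative helper for the max_connections rule of thumb:
# max_connections ≈ 2 x (CPU cores per pod x number of pods)
suggest_max_connections() {
  local cores_per_pod="$1" pods="$2"
  echo $(( 2 * cores_per_pod * pods ))
}

# e.g. 2-core pods, 8 replicas:
suggest_max_connections 2 8   # prints 32
```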

PostgreSQL Upgrade Example

# Set environment variables for an external HA Postgres
export DB_TYPE=postgresdb
export DB_POSTGRESDB_HOST=<aurora-endpoint>
export DB_POSTGRESDB_PORT=5432
export DB_POSTGRESDB_USER=<user>
export DB_POSTGRESDB_PASSWORD=<password>

Enable pgbouncer Pooler

helm install pgbouncer bitnami/pgbouncer \
  --set postgresql.host=<aurora-endpoint> \
  --set postgresql.port=5432
# Point n8n to the pooler
export DB_POSTGRESDB_HOST=pgbouncer.default.svc.cluster.local

2.2 Queue Mis‑configuration

Symptom: “Queue is full” errors, workflow stalls.
Root cause: the default in‑memory queue cannot be shared; each pod maintains its own queue, causing duplicate work and lost jobs.
Remedy: deploy a distributed Redis queue and run n8n in queue mode (EXECUTIONS_MODE=queue).

Deploy Redis (replicated)

helm repo add bitnami https://charts.bitnami.com/bitnami
helm install redis bitnami/redis --set architecture=replication

Configure n8n to use Redis (queue mode)

# values.yaml – queue mode
env:
  - name: EXECUTIONS_MODE
    value: "queue"
  - name: QUEUE_BULL_REDIS_HOST
    value: "redis-master.default.svc.cluster.local"
  - name: QUEUE_BULL_REDIS_PORT
    value: "6379"
  - name: QUEUE_BULL_REDIS_DB
    value: "0"

EEFA Warning: Enable Redis persistence (appendonly yes) and ACLs; otherwise a single node failure can lose queued executions.

2.3 Sticky Sessions & Webhook Routing

Symptom: webhook callbacks hit a different pod → 404 or duplicate execution.
Cause: the load balancer does not preserve client affinity.
Fix options:
  1. Enable sticky sessions on the LB.
  2. Make webhooks stateless by running queue mode with dedicated webhook processors (n8n webhook).
  3. Use n8n’s built‑in webhook tunnel only for dev.

Option A – Enable Sticky Sessions (K8s Service)

apiVersion: v1
kind: Service
metadata:
  name: n8n
spec:
  selector:
    app: n8n
  ports:
  - protocol: TCP
    port: 80
    targetPort: 5678
  sessionAffinity: ClientIP   # <-- enable sticky sessions

Option B – Stateless Webhooks (queue mode)

# In queue mode, dedicated webhook processors let any replica serve callbacks
n8n webhook
# Make sure every instance advertises the load-balancer address:
export WEBHOOK_URL=https://<lb>/

2.4 Memory Leaks in Long‑Running Workflows

Symptom: pods OOM‑kill after 30–60 min under load.
Cause: large payloads (files, JSON) stay in the Node.js process until the workflow finishes.
Mitigation:
• Off‑load binary data to the filesystem or S3 (N8N_DEFAULT_BINARY_DATA_MODE).
• Limit incoming payload size (N8N_PAYLOAD_SIZE_MAX, in MB).
• Run heavy workflows on dedicated workers (n8n worker).
• Set a hard execution timeout (EXECUTIONS_TIMEOUT).

Example Settings

export N8N_DEFAULT_BINARY_DATA_MODE=filesystem
export N8N_PAYLOAD_SIZE_MAX=16         # MB
export EXECUTIONS_TIMEOUT=300          # seconds
# Run executions on dedicated workers:
n8n worker --concurrency=5

2.5 CPU Throttling & Pod Resource Limits

| Desired Throughput | CPU Request | CPU Limit | Pods |
|---|---|---|---|
| 100 req/s | 500m | 1 | 3 |
| 250 req/s | 1 | 2 | 5 |
| 500+ req/s | 2 | 4 | 8+ |
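These numbers are starting points; once you have measured per‑pod throughput from your own load tests, a ceiling‑division sketch like the following gives a first replica estimate (pods_needed is a hypothetical helper; the 3‑pod floor is an assumption matching the table above):

```shell
# Hypothetical sizing helper: ceil(target / per_pod), floored at min_pods.
pods_needed() {
  local target="$1" per_pod="$2" min_pods="${3:-3}"
  local n=$(( (target + per_pod - 1) / per_pod ))   # ceiling division
  if [ "$n" -lt "$min_pods" ]; then n="$min_pods"; fi
  echo "$n"
}

pods_needed 500 60    # ceil(500/60) = 9 pods
pods_needed 100 60    # 2, raised to the 3-pod floor
```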

EEFA Tip: Use the Horizontal Pod Autoscaler (HPA) with a custom metric (n8n_executions_per_second) instead of only CPU.

Resource & Autoscaling Manifest

resources:
  requests:
    cpu: "1"
    memory: "1Gi"
  limits:
    cpu: "2"
    memory: "2Gi"
autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 12
  targetCPUUtilizationPercentage: 70
  metrics:
  - type: Pods
    pods:
      metric:
        name: n8n_executions_per_second
      target:
        type: AverageValue
        averageValue: "200"

3. Step‑by‑Step Stabilization Guide

3.1 Audit Your Current Stack

# Identify DB type
echo $DB_TYPE          # should be "postgresdb" in production

# Verify execution mode
echo $EXECUTIONS_MODE  # should be "queue", not the default "regular"

# Check LB session affinity
kubectl get svc n8n -o yaml | grep sessionAffinity

If any command returns a default (sqlite, regular, None), you are in a bottleneck zone.
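The environment checks can be bundled into a small audit function. This is a sketch using the DB_TYPE / EXECUTIONS_MODE variable names from recent n8n releases, with the kubectl check omitted so it runs anywhere:

```shell
# Sketch: flag default (single-node) settings that bottleneck a scaled deployment.
audit_n8n_env() {
  local problems=0
  if [ "${DB_TYPE:-sqlite}" != "postgresdb" ]; then
    echo "WARN: DB_TYPE=${DB_TYPE:-sqlite} - switch to postgresdb"
    problems=1
  fi
  if [ "${EXECUTIONS_MODE:-regular}" != "queue" ]; then
    echo "WARN: EXECUTIONS_MODE=${EXECUTIONS_MODE:-regular} - switch to queue"
    problems=1
  fi
  return "$problems"
}

DB_TYPE=postgresdb EXECUTIONS_MODE=queue audit_n8n_env && echo "audit OK"
```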

3.2 Deploy a Distributed Queue (Redis)

  1. Install Redis (see 2.2).
  2. Add env vars (see 2.2 snippets).
  3. Restart n8n pods:
    kubectl rollout restart deployment n8n
    

3.3 Upgrade to HA PostgreSQL

| Step | Action |
|---|---|
| Provision | Create an RDS/Aurora instance (or any HA Postgres). |
| Secret | kubectl create secret generic n8n-pg --from-literal=postgres_user=… |
| Env vars | Apply the variables shown in the “PostgreSQL Upgrade Example”. |
| Pooler | Deploy pgbouncer (see 2.1). |
| Point n8n | Set DB_POSTGRESDB_HOST to the pgbouncer service. |

3.4 Enforce Sticky Sessions or Stateless Webhooks

*Choose one approach that matches your ingress controller.*

  • Sticky Sessions – apply the Service manifest from 2.3 Option A.
  • Stateless Webhooks – run queue mode with dedicated webhook processors (see 2.3 Option B).

3.5 Tune Resources & Enable Autoscaling

  1. Update the deployment with the resources block (see 2.5).
  2. Enable the HPA using the autoscaling block (see 2.5).
  3. Apply changes:
    helm upgrade --install n8n . -f values.yaml
    

4. Troubleshooting Checklist

| Check | How to Verify | Expected |
|---|---|---|
| DB connection pool saturation | SELECT count(*) FROM pg_stat_activity WHERE state='active'; | < 80 % of max_connections |
| Redis queue depth | redis-cli LLEN bull:jobs:wait (key prefix may differ per setup) | < 500 (adjust per load) |
| Pod CPU throttling | kubectl top pod n8n-xxxx | CPU usage ≤ requests |
| Webhook delivery latency | kubectl logs n8n-xxxx \| grep webhook | < 200 ms |
| Memory usage per pod | kubectl exec n8n-xxxx -- free -m | RSS ≤ 1 GiB (or within limit) |
| LB health checks | curl -I http://<lb>/healthz | 200 OK consistently |
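The pool-saturation row, for instance, reduces to a simple percentage comparison. A hedged sketch (pool_ok is a hypothetical name; feed it the pg_stat_activity count and your max_connections):

```shell
# Sketch: true when active connections are under 80 % of max_connections.
pool_ok() {
  local active="$1" max_conns="$2"
  [ $(( active * 100 )) -lt $(( max_conns * 80 )) ]
}

pool_ok 50 100 && echo "pool healthy" || echo "pool saturated"   # → pool healthy
```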

If any metric exceeds the expected range, revisit the corresponding configuration block in Section 3.


5. Production‑Ready Best Practices (EEFA)

| Practice | Why it Matters | How to Implement |
|---|---|---|
| Separate execution processes (worker mode) | Isolates heavy workflows from the API server, preventing request‑latency spikes. | Run dedicated workers: n8n worker --concurrency=5 |
| Enable Prometheus metrics | Real‑time visibility into queue length, execution time, DB latency. | export N8N_METRICS=true and expose /metrics. |
| TLS termination at the LB | Removes per‑pod TLS overhead and secures webhook callbacks. | Configure the LB cert; set WEBHOOK_URL=https://<lb>/. |
| Rotate credentials regularly | Limits blast radius if a node is compromised. | Use N8N_ENCRYPTION_KEY and rotate via CI pipeline. |
| Graceful shutdown hooks | Guarantee in‑flight executions finish before pod termination, avoiding partial runs. | Rely on n8n’s SIGTERM handling: set N8N_GRACEFUL_SHUTDOWN_TIMEOUT and a matching terminationGracePeriodSeconds in the pod spec. |

Conclusion

Performance regressions after horizontal scaling almost always stem from five core causes: database saturation, an in‑memory queue, missing sticky sessions, memory‑leaky workflows, and CPU throttling. By auditing each component, migrating to distributed services (HA PostgreSQL, Redis), enforcing session affinity or stateless webhooks, tuning pod resources, and applying the EEFA best practices above, you can scale n8n horizontally without sacrificing latency or reliability. Validate each change with load testing and the troubleshooting checklist, then let the autoscaler handle traffic spikes while your underlying services stay robust. Happy automating!
