n8n Queue Mode High Concurrency Crash

A step-by-step guide to diagnosing and fixing high-concurrency crashes in n8n queue mode.


Who this is for: Ops engineers and platform architects who run n8n in queue mode and need to sustain hundreds‑to‑thousands of parallel workflow executions. We cover this in detail in the n8n Queue Mode Errors Guide.


Quick Diagnosis

When the queue receives hundreds of simultaneous jobs, the worker process is killed (SIGKILL from the container runtime, or a Node.js "JavaScript heap out of memory" error), and the logs then show Failed to start queue consumer.
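To confirm where the kill came from, check the container state and the kernel log. The container name `n8n` is an assumption; substitute your own:

```shell
# Did the container runtime OOM-kill the worker? Prints "true" if so.
docker inspect --format '{{.State.OOMKilled}}' n8n 2>/dev/null \
  || echo "container not found"

# Did the kernel OOM killer target the Node process on the host?
{ dmesg 2>/dev/null | grep -iE 'killed process.*(node|n8n)'; } \
  || echo "no kernel OOM entries found (or dmesg not permitted)"
```

If the first command prints `true`, the cgroup memory limit was hit; if only the kernel log shows kills, the host itself ran out of memory.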


One‑Page Fix Checklist

If you are also seeing n8n queue mode timeout errors, resolve them before continuing with this setup.

| Action | Setting / File | Recommended Value |
|---|---|---|
| Raise Node heap | `NODE_OPTIONS` (env) | `--max-old-space-size=4096` |
| Limit per-worker jobs | `WORKER_CONCURRENCY` | 50 (adjust up to 100) |
| Run executions in main process | `EXECUTIONS_PROCESS` | `main` |
| Expand Redis connection pool | `REDIS_MAX_CLIENTS` (or `maxclients` in `redis.conf`) | 10000 |
| Increase DB pool size | `DB_MAX_POOL_SIZE` (Postgres) | 30 |
| Enable graceful shutdown | `GRACEFUL_SHUTDOWN_TIMEOUT` | 30s |
| Add CPU/Memory limits (Docker/K8s) | `resources.limits` | `cpu: "4"`, `memory: "8Gi"` |

Apply the checklist, restart n8n, and watch the worker logs for any remaining OOM or connection warnings.


1. Why High-Concurrency Crashes Happen in n8n Queue Mode

| Root Cause | Symptom | Underlying Mechanism |
|---|---|---|
| Node.js heap exhaustion | `JavaScript heap out of memory` | The default ~1.5 GB heap is exceeded by large payload objects. |
| Worker-to-Redis overload | `ERR max number of clients reached` | Each worker opens a Redis connection per job; many jobs per second saturate `maxclients`. |
| PostgreSQL connection saturation | `too many connections` | Queue mode's DB pool exceeds its configured maximum. |
| Process-level race conditions | `Failed to start queue consumer` | The OS hits the `ulimit -u` (process count) limit when spawning many workers. |
| Container resource throttling | SIGKILL from Docker/K8s | Cgroup limits kill the process once memory usage exceeds the quota. |

2. Prerequisites & Environment Checks

2.1 Identify Your Queue Backend

In queue mode, n8n uses Redis (via the Bull queue library) as the message broker, while PostgreSQL stores execution data. Both need headroom under load.
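A quick way to confirm queue mode is active and see which Redis instance the Bull queue uses, via n8n's standard `EXECUTIONS_MODE` and `QUEUE_BULL_REDIS_HOST` variables (the fallback messages are just for readability):

```shell
# "queue" means queue mode; unset means the default regular (single-process) mode.
printenv EXECUTIONS_MODE || echo "EXECUTIONS_MODE not set (regular mode)"

# Host of the Redis instance backing the Bull queue.
printenv QUEUE_BULL_REDIS_HOST || echo "QUEUE_BULL_REDIS_HOST not set (defaults to localhost)"
```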

2.2 Verify Current Limits

Node heap statistics

# Node heap statistics
node -e "console.log(require('v8').getHeapStatistics())"

Redis maxclients

# Redis maxclients
redis-cli CONFIG GET maxclients

PostgreSQL max connections

# PostgreSQL max connections
psql -c "SHOW max_connections;"

2.3 Capture Baseline Metrics

Run a normal‑load test (~100 concurrent jobs) and record CPU/Memory with docker stats or kubectl top pod.
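A minimal way to capture that baseline without extra tooling is to sample `docker stats` into a CSV during the load test. The container name `n8n` and the 10-minute window are assumptions; adjust both:

```shell
# Append one CPU/memory sample every 5 seconds for ~10 minutes.
# Stops early if the container (or Docker itself) is unavailable.
for i in $(seq 1 120); do
  docker stats n8n --no-stream \
    --format '{{.CPUPerc}},{{.MemUsage}}' >> baseline.csv || break
  sleep 5
done
echo "collected $(wc -l < baseline.csv) samples"
```

Keep the CSV; it is the reference point for judging whether the tuning in section 3 actually reduced memory pressure.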


3. Configuring n8n for High Concurrency

3.1 Core Environment Variables

Add or update these in your .env (or Helm values, Docker‑compose overrides).

# Execution engine – avoid IPC overhead
EXECUTIONS_PROCESS=main
# Jobs per worker
WORKER_CONCURRENCY=50
# Node heap – 4 GB
NODE_OPTIONS=--max-old-space-size=4096
# Graceful shutdown timeout
GRACEFUL_SHUTDOWN_TIMEOUT=30s
# Redis client pool (ioredis)
REDIS_MAX_CLIENTS=10000

EEFA tip – If you need sandbox isolation, keep EXECUTIONS_PROCESS=worker but raise the OS limits (ulimit -u, nofile) accordingly. If you see Redis persistence lag in queue mode, resolve it before continuing with the setup.
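Rather than copying `--max-old-space-size=4096` everywhere, derive the heap from the container's memory limit. A small sizing sketch; the 70% ratio is an assumed rule of thumb (leaving headroom for native buffers and the OS), not an n8n requirement:

```shell
# Derive --max-old-space-size (MiB) from the container memory limit (MiB).
CONTAINER_MEM_MIB=8192                          # e.g. an 8 GiB limit from compose/K8s
HEAP_MIB=$(( CONTAINER_MEM_MIB * 70 / 100 ))    # keep ~30% headroom
echo "NODE_OPTIONS=--max-old-space-size=${HEAP_MIB}"
# → NODE_OPTIONS=--max-old-space-size=5734
```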

3.2 Docker‑Compose Override

Service definition

version: "3.8"
services:
  n8n:
    image: n8nio/n8n:latest
    env_file: .env
    restart: unless-stopped
    depends_on:
      - redis

Resource & limit settings

    deploy:
      resources:
        limits:
          cpus: "4"
          memory: "8G"
    ulimits:
      nproc: 65535      # increase process count limit
      nofile: 65535     # increase open file descriptors

3.3 Kubernetes Deployment (Helm values)

Environment block

n8n:
  env:
    - name: EXECUTIONS_PROCESS
      value: "main"
    - name: WORKER_CONCURRENCY
      value: "75"
    - name: NODE_OPTIONS
      value: "--max-old-space-size=6144"

Resource requests & limits

  resources:
    limits:
      cpu: "8"
      memory: "12Gi"
    requests:
      cpu: "4"
      memory: "8Gi"

4. Scaling Workers – When One Process Isn’t Enough

Even with WORKER_CONCURRENCY=50, a single instance may hit OS limits beyond 1,000 concurrent jobs. Deploy multiple worker replicas behind the same queue.

| Strategy | Implementation | Pros | Cons |
|---|---|---|---|
| Docker-Compose multiple services | Duplicate the `n8n` service with a distinct `container_name` | Simple for small clusters | Manual scaling, no auto-heal |
| Kubernetes Deployment replicas | Set `replicaCount: 4` in the Helm chart | Auto-heal, easy roll-out | Requires K8s |
| Horizontal Pod Autoscaler (HPA) | Define an HPA targeting CPU utilization | Scales with load | Needs metrics server |
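Whichever strategy you choose, sanity-check the connection math first: peak concurrent jobs is replicas × per-worker concurrency, and (per the failure table in section 1) each running job can hold a Redis connection. A sketch with illustrative numbers:

```shell
REPLICAS=4
WORKER_CONCURRENCY=50
TOTAL_JOBS=$(( REPLICAS * WORKER_CONCURRENCY ))   # peak concurrent jobs
echo "peak concurrent jobs: ${TOTAL_JOBS}"         # → 200
echo "keep Redis maxclients and DB max_connections comfortably above ${TOTAL_JOBS}"
```

If that total approaches your Redis `maxclients` or PostgreSQL `max_connections`, raise those limits (section 5) before adding replicas.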

Sample HPA definition

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: n8n-worker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: n8n
  minReplicas: 2
  maxReplicas: 12
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

EEFA warning – All replicas must share the same EXECUTIONS_PROCESS value; mixing main and worker causes duplicate executions because the queue does not deduplicate across process types.


5. Database & Queue Backend Tuning

5.1 PostgreSQL (execution database)

| Parameter | Recommended Setting |
|---|---|
| max_connections | 200 |
| shared_buffers | 25% of RAM |
| work_mem | 64MB |
| effective_cache_size | 75% of RAM |

Add to postgresql.conf (or Helm values):

postgresql:
  extendedConfiguration: |
    max_connections = 200
    shared_buffers = 4GB
    work_mem = 64MB
    effective_cache_size = 12GB

5.2 Redis Connection Pool

n8n creates a pool per worker. Increase the pool size via REDIS_MAX_CLIENTS (already set) **and** adjust ioredis options if you customize the client.

// custom-redis.js (optional)
const Redis = require('ioredis');
module.exports = new Redis({
  host: process.env.REDIS_HOST,
  port: process.env.REDIS_PORT,
  password: process.env.REDIS_PASSWORD,
  maxRetriesPerRequest: null,
});

EEFA note – Managed Redis services (AWS ElastiCache, Azure Cache) may cap maxclients. If you hit the ceiling, upgrade the tier or enable **cluster mode** to shard connections.


6. Monitoring, Alerting & Post‑Mortem

| Metric | Tool | Alert Threshold |
|---|---|---|
| Node RSS memory | Prometheus `process_resident_memory_bytes` | > 7 GiB (approaching the 8 GiB container limit) |
| Redis client connections | Redis `connected_clients` | > 80% of `maxclients` |
| PostgreSQL connections | `pg_stat_activity` | > 75% of `max_connections` |
| Worker crash count | Loki / Grafana logs (`Failed to start queue consumer`) | > 0 in 5 min |
| Queue backlog length | Redis `LLEN` on the n8n queue key | > 5000 jobs |

Prometheus rule example

- alert: N8NHighMemoryUsage
  expr: process_resident_memory_bytes{job="n8n"} > 7.5e9
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: "n8n worker memory > 7.5 GiB"
    description: "The n8n container is consuming excessive memory, likely due to unbounded concurrency."

Post‑Mortem Checklist

| Step | Command / Action | Expected Result |
|---|---|---|
| Verify env vars loaded | `printenv \| grep -E 'EXECUTIONS_PROCESS\|WORKER_CONCURRENCY\|NODE_OPTIONS'` | All values present |
| Check Node heap | `node -e "console.log(require('v8').getHeapStatistics().total_available_size)"` | > 3 GB (≈4 GB after setting) |
| Confirm Redis pool | `redis-cli CLIENT LIST \| wc -l` | ≤ `REDIS_MAX_CLIENTS` |
| Validate DB connections | `psql -c "SELECT count(*) FROM pg_stat_activity;"` | ≤ `DB_MAX_POOL_SIZE` |
| Look for OOM kills | `dmesg \| grep -i kill` | No "Out of memory" entries |
| Examine worker crash logs | `docker logs n8n \| grep -i 'Failed to start queue consumer'` | No recent errors |
| Review CPU throttling | `docker stats n8n --no-trunc` | CPU usage < 80% of limit |
| Test with reduced concurrency | Set `WORKER_CONCURRENCY=10` and rerun the load test | Crash disappears → concurrency limit was the root cause |
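The checks above can be bundled into a small helper that runs whatever is possible on the current host and skips the rest. Tool availability is probed with `command -v`; adjust container and connection details to your environment:

```shell
#!/usr/bin/env bash
# Post-mortem helper: each section degrades gracefully if a tool is missing.
echo "== env vars =="
printenv | grep -E 'EXECUTIONS_PROCESS|WORKER_CONCURRENCY|NODE_OPTIONS' \
  || echo "(none set)"

echo "== node heap limit (bytes) =="
command -v node >/dev/null \
  && node -e "console.log(require('v8').getHeapStatistics().heap_size_limit)" \
  || echo "(node not available)"

echo "== redis clients =="
command -v redis-cli >/dev/null \
  && redis-cli CLIENT LIST 2>/dev/null | wc -l \
  || echo "(redis-cli not available)"

echo "== recent OOM kills =="
{ dmesg 2>/dev/null | grep -i 'out of memory'; } || echo "(none found)"
```

Run it once right after a crash and again after applying the checklist; diffing the two outputs usually points straight at the saturated resource.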

7. Frequently Asked Questions

  • Q1: Should I keep EXECUTIONS_PROCESS=main in production?
    A: Yes, if the built‑in JavaScript VM meets your security requirements. For ultra‑high‑security environments, stay on worker mode but raise OS limits (ulimit -u, nofile) and connection pool sizes.
  • Q2: My Redis is on AWS ElastiCache and I cannot change maxclients. What else can I do?
    A: Enable **Redis Cluster** to spread connections across shards, or switch to **Redis Streams** (QUEUE_BULL_REDIS_STREAMS=true), which reduces per‑worker connections.
  • Q3: The crash still occurs after applying all settings.
    A: Look for memory leaks in custom JavaScript inside workflows. Attach the Node inspector (node --inspect) to a worker pod and capture heap snapshots during load.
  • Q4: Can I use a different queue backend (e.g., RabbitMQ)?
    A: No – n8n's queue mode is built on Redis (Bull); PostgreSQL stores execution data but does not act as the message broker. For RabbitMQ you would need to orchestrate jobs externally and trigger individual n8n executions yourself.

Conclusion

By raising Node’s heap, limiting per‑worker concurrency, running executions in the main process, and expanding Redis/DB connection pools, you eliminate the primary sources of OOM and connection‑saturation failures. Combine these settings with proper resource limits, worker replication, and vigilant monitoring, and your n8n deployment will reliably handle thousands of parallel workflows in production.
