n8n freezes under load but doesn’t crash – worker concurrency fix

Step by Step Guide to solve “n8n freezes under load but doesn’t crash”


Who this is for: Ops engineers and platform developers running n8n in Docker or Kubernetes who need to keep the service responsive under production traffic. We cover this in detail in the n8n Performance Degradation & Stability Issues Guide.


Quick diagnosis

When n8n stops responding while the Docker container stays up, the usual suspects are:

  • Event‑loop blockage – sync‑heavy code or endless loops.
  • Memory pressure – V8 can’t reclaim fast enough.
  • Resource limits – CPU throttling or DB connection caps.

If the UI hangs, first check CPU and memory usage (docker stats) and the event‑loop lag (sustained lag above ~200 ms is a warning sign). Reduce concurrent executions or raise the EXECUTIONS_PROCESS count, then restart the container.
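As a quick sketch of that triage order, here is a small helper that maps the two numbers from docker stats to the next step (the function and its name are illustrative, not part of n8n; thresholds are the ones used throughout this guide):

```shell
# quick_triage: map the CPU% and Mem% reported by `docker stats` to the next step.
# Thresholds follow this guide: CPU > 90 % means saturation, memory > 80 % means pressure.
quick_triage() {
  local cpu="${1%\%}" mem="${2%\%}"   # accept "95%" or "95"
  awk -v c="$cpu" -v m="$mem" 'BEGIN {
    if (c > 90)      print "CPU saturated: raise --cpus or add workers"
    else if (m > 80) print "memory pressure: raise the limit or reduce concurrency"
    else             print "resources ok: check event-loop lag next"
  }'
}

# e.g. quick_triage "95.3%" "42%"
```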


1. Why does n8n “freeze” instead of crashing?


| Symptom | Underlying mechanism |
|---|---|
| HTTP requests time out but the container stays alive | Node.js event loop blocked (sync‑heavy code, large JSON parsing, endless loops) |
| CPU spikes to 100 % and the UI stops updating | Single‑thread saturation – all executions share one Node.js process by default |
| Memory climbs to the Docker limit but the OOM‑killer does not fire | V8 garbage collector can’t keep up; Docker’s soft limit only throttles |
| Logs show “Waiting for execution” indefinitely | Job‑queue back‑pressure – internal queue full, workers waiting for DB connections |

EEFA note: Docker’s OOM‑killer only terminates the process when the hard memory limit is exceeded. Most “freezes” happen because the process is still alive but can’t make progress.
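You can confirm which case you are in from docker inspect output. A small helper, assuming the .State.OOMKilled and .State.ExitCode fields that docker inspect reports (the function name is ours):

```shell
# classify_exit: interpret .State.OOMKilled and .State.ExitCode from `docker inspect`.
# "false 0" on a running-but-unresponsive container is the freeze case described above.
classify_exit() {
  local oom="$1" code="$2"
  if [ "$oom" = "true" ]; then
    echo "hard memory limit hit: OOM-killed"
  elif [ "$code" = "0" ]; then
    echo "no kill recorded: likely alive but blocked (freeze)"
  else
    echo "non-OOM failure, exit code $code"
  fi
}

# e.g. docker inspect --format '{{.State.OOMKilled}} {{.State.ExitCode}}' n8n \
#        | { read oom code; classify_exit "$oom" "$code"; }
```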


2. Core resources that dictate n8n’s throughput

| Resource | Default | Production recommendation |
|---|---|---|
| CPU cores | 1 (single‑threaded) | --cpus=2 or more (Docker) |
| Node workers (EXECUTIONS_PROCESS) | 1 | 2‑4 (match CPU count) |
| DB connection pool (DB_MAX_POOL_SIZE) | 10 | 20‑30 for PostgreSQL/MySQL |
| Memory limit | 512 MiB (Docker default) | 2‑4 GiB (adjust to payload size) |
| Execution timeout (EXECUTIONS_TIMEOUT) | 3600 s | 300 s (or lower) |

EEFA tip: Set Docker --memory-swap to the same value as --memory to disable swap; swapping inflates latency and makes the UI appear frozen.


3. Step‑by‑step: Diagnose a frozen instance

3.1 Inspect container metrics

docker stats $(docker ps -q --filter "name=n8n")

*Look for CPU > 90 % and Memory > 80 %.*

3.2 Check the Node.js event‑loop lag

docker exec -it <container-id> node -e "let last = Date.now(); setInterval(() => { const now = Date.now(); console.log('drift:', now - last - 1000, 'ms'); last = now; }, 1000)"

*If the reported drift stays above 200 ms, the container is contended enough to starve the event loop. (This runs a fresh Node process in the same container, so it measures CPU contention rather than n8n’s own loop; the nodejs_eventloop_lag_seconds metric in §4.4 reports the real value.)*

3.3 Query n8n’s internal metrics (Prometheus exporter)

curl http://localhost:5678/metrics | grep n8n_execution_

*High n8n_execution_queue_length signals back‑pressure.*
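A pipe-able check for that signal (the threshold of 50 is borrowed from the checklist in §5; the helper name is ours):

```shell
# queue_pressure: scan Prometheus-format lines on stdin and flag queue back-pressure.
queue_pressure() {
  awk '/^n8n_execution_queue_length/ {
    r = ($2 > 50) ? "back-pressure: scale workers or investigate slow executions" : "queue healthy"
    print r
  }'
}

# e.g. curl -s http://localhost:5678/metrics | queue_pressure
```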

3.4 Review recent logs

docker logs --tail 100 <container-id> | grep -i "error\|warning"

*Watch for DB timeouts, ERR_WORKFLOW_EXECUTION_TIMEOUT, or ERR_MAX_QUEUE_SIZE.*
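Each of those log signatures points to a different knob. A small dispatcher over the same strings (the mapping reflects the fixes in §4 of this guide, not official n8n output):

```shell
# triage_log_line: map known n8n error strings to the relevant remediation.
triage_log_line() {
  case "$1" in
    *ERR_WORKFLOW_EXECUTION_TIMEOUT*) echo "lower EXECUTIONS_TIMEOUT or split the workflow" ;;
    *ERR_MAX_QUEUE_SIZE*)             echo "queue full: add workers, tune DB pool" ;;
    *[Tt]imeout*)                     echo "check DB_TIMEOUT and the DB connection pool" ;;
    *)                                echo "unclassified: read surrounding context" ;;
  esac
}

# e.g. docker logs --tail 100 <container-id> 2>&1 | grep -i error \
#        | while read -r line; do triage_log_line "$line"; done
```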

3.5 Take a heap snapshot (optional, for memory leaks)

docker exec -it <container-id> sh -c 'kill -USR1 "$(pgrep -o node)"'
# Node.js enables its inspector (default port 9229) on SIGUSR1; pgrep -o finds
# the oldest node process, since an init shim such as tini may hold PID 1.
# Publish port 9229, then open chrome://inspect and capture a heap snapshot.

4. Proven fixes – from “just works” to production‑grade

4.1 Scale the worker pool

Set the number of Node.js workers

services:
  n8n:
    environment:
      - EXECUTIONS_PROCESS=3          # three independent workers
      - EXECUTIONS_TIMEOUT=300       # abort long‑running jobs

Allocate CPU and memory resources

    deploy:
      resources:
        limits:
          cpus: '2.0'
          memory: 3G

*Why it works:* Each worker runs its own event loop, so a single heavy workflow no longer stalls the whole system.
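To pick the worker count, the table in §2 pairs workers with cores. A helper encoding that rule (the name and the cap of 4 come from this guide, not from n8n):

```shell
# suggest_workers: match EXECUTIONS_PROCESS to available cores, capped at 4 (per §2).
suggest_workers() {
  local cores="$1"
  if [ "$cores" -ge 4 ]; then echo 4
  elif [ "$cores" -ge 2 ]; then echo "$cores"
  else echo 1
  fi
}

# e.g. suggest_workers "$(nproc)"
```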

4.2 Optimize workflow design

| Anti‑pattern | Remedy |
|---|---|
| Large JSON.parse on a 10 MB payload | Use the **Binary Data** node to stream chunks; avoid full in‑memory parsing. |
| Nested loops with no break condition | Add a **max‑iterations** limit or break early with IF nodes. |
| Repeated DB writes inside a loop | Batch writes using multi‑row INSERT syntax in the **Execute Query** node. |
| Custom JavaScript node that blocks | Refactor to asynchronous (await) code or move heavy computation to an external microservice (e.g., AWS Lambda). |

4.3 Tune the database pool

# .env (or Docker env)
DB_TYPE=postgresdb
DB_MAX_POOL_SIZE=25
DB_TIMEOUT=20000   # 20 s before DB request fails

EEFA warning: Setting DB_MAX_POOL_SIZE too high can exhaust the DB’s max connections. Keep it ≤ 80 % of the DB server’s max_connections.
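The 80 % rule as a one-liner (a hypothetical helper; feed it the live max_connections value from your DB):

```shell
# safe_pool_size: largest DB_MAX_POOL_SIZE that stays at <= 80 % of max_connections.
safe_pool_size() {
  awk -v mc="$1" 'BEGIN { printf "%d\n", mc * 0.8 }'
}

# e.g. safe_pool_size "$(psql -tAc 'SHOW max_connections;')"
```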

4.4 Enable Prometheus monitoring & alerts

services:
  n8n:
    environment:
      - METRICS=true
      - METRICS_PORT=5679

Alert rule for event‑loop lag

- alert: N8NEventLoopLag
  expr: avg_over_time(nodejs_eventloop_lag_seconds[1m]) > 0.2
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: "n8n event‑loop lag > 200 ms"
    description: "Investigate heavy workflows or custom nodes."

4.5 Graceful restarts with healthchecks

healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:5678/healthz"]
  interval: 30s
  timeout: 5s
  retries: 3
  start_period: 10s

When the healthcheck fails, Docker restarts the container, clearing stuck workers before users notice a freeze.


5. Production‑ready checklist

| Item | Verification command |
|---|---|
| CPU ≥ 2 cores allocated | docker inspect <container> --format='{{.HostConfig.NanoCpus}}' |
| EXECUTIONS_PROCESS ≥ 2 | docker exec n8n env \| grep EXECUTIONS_PROCESS |
| Memory limit ≥ 2 GiB | docker stats → Memory column |
| DB pool size tuned | docker exec n8n env \| grep DB_MAX_POOL_SIZE |
| Event‑loop lag < 200 ms | Run the lag script from §3 and observe drift |
| Prometheus metrics scraped | curl http://localhost:5679/metrics \| grep n8n_ |
| Healthcheck passes | docker inspect --format='{{json .State.Health}}' <container> |
| No long‑running sync JavaScript | Code review; enforce await usage |
| Alert for queue length in place | n8n_execution_queue_length > 50 → Slack/PagerDuty |

6. Frequently asked “edge” questions

| Question | Short answer |
|---|---|
| Why does increasing EXECUTIONS_PROCESS sometimes make things worse? | With only one CPU core, extra workers compete for the same core, adding context‑switch overhead. Pair workers with matching CPU allocation. |
| Can I use Redis as a queue instead of the built‑in DB? | Yes – set QUEUE_BROKER=redis and configure REDIS_HOST. This offloads queuing and reduces DB contention. |
| Is there a way to auto‑scale n8n workers in Kubernetes? | Deploy n8n as a **Deployment** with a **HorizontalPodAutoscaler** that watches CPU utilization and the custom metric n8n_execution_queue_length. |
| My workflow imports a large CSV and still freezes. Any tricks? | Stream the CSV with **Read Binary File** + **Parse CSV** in *chunked* mode, or pre‑process the file in a separate service (e.g., AWS Lambda) and feed only needed rows to n8n. |

Bottom line: A frozen n8n instance under load is almost always a resource‑orchestration issue rather than a core engine bug. By profiling the event loop, scaling workers, tuning DB pools, and monitoring key metrics, you can turn a “seems‑stuck” system into a resilient, production‑grade automation hub.
