n8n freezes under load but doesn’t crash – worker concurrency fix

Step by Step Guide to solve “n8n freezes under load but doesn’t crash”


Who this is for: Ops engineers and platform developers running n8n in Docker or Kubernetes who need to keep the service responsive under production traffic. We cover this in detail in the n8n Performance Degradation & Stability Issues Guide.


Quick diagnosis

When n8n stops responding while the Docker container stays up, the usual suspects are:

  • Event‑loop blockage – sync‑heavy code or endless loops.
  • Memory pressure – V8 can’t reclaim fast enough.
  • Resource limits – CPU throttling or DB connection caps.

If the UI hangs, first check CPU and memory usage (docker stats) and the event‑loop lag (sustained lag above ~200 ms is a warning sign). Reduce concurrent executions or raise the EXECUTIONS_PROCESS count, then restart the container.
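As a quick sketch of that triage order, here is a small helper that maps the two numbers from docker stats to the next step (the function and its name are illustrative, not part of n8n; thresholds are the ones used throughout this guide):

```shell
# quick_triage: map the CPU% and Mem% reported by `docker stats` to the next step.
# Thresholds follow this guide: CPU > 90 % means saturation, memory > 80 % means pressure.
quick_triage() {
  local cpu="${1%\%}" mem="${2%\%}"   # accept "95%" or "95"
  awk -v c="$cpu" -v m="$mem" 'BEGIN {
    if (c > 90)      print "CPU saturated: raise --cpus or add workers"
    else if (m > 80) print "memory pressure: raise the limit or reduce concurrency"
    else             print "resources ok: check event-loop lag next"
  }'
}

# e.g. quick_triage "95.3%" "42%"
```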


1. Why does n8n “freeze” instead of crashing?


| Symptom | Underlying mechanism |
|---|---|
| HTTP requests time out but the container stays alive | Node.js event loop blocked (sync‑heavy code, large JSON parsing, endless loops) |
| CPU spikes to 100 % and the UI stops updating | Single‑thread saturation – all executions share one Node.js process by default |
| Memory climbs to the Docker limit but the OOM‑killer does not fire | V8 garbage collector can’t keep up; Docker’s soft limit only throttles |
| Logs show “Waiting for execution” indefinitely | Job‑queue back‑pressure – internal queue full, workers waiting for DB connections |

EEFA note: Docker’s OOM‑killer only terminates the process when the hard memory limit is exceeded. Most “freezes” happen because the process is still alive but can’t make progress.
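You can confirm which case you are in from docker inspect output. A small helper, assuming the .State.OOMKilled and .State.ExitCode fields that docker inspect reports (the function name is ours):

```shell
# classify_exit: interpret .State.OOMKilled and .State.ExitCode from `docker inspect`.
# "false 0" on a running-but-unresponsive container is the freeze case described above.
classify_exit() {
  local oom="$1" code="$2"
  if [ "$oom" = "true" ]; then
    echo "hard memory limit hit: OOM-killed"
  elif [ "$code" = "0" ]; then
    echo "no kill recorded: likely alive but blocked (freeze)"
  else
    echo "non-OOM failure, exit code $code"
  fi
}

# e.g. docker inspect --format '{{.State.OOMKilled}} {{.State.ExitCode}}' n8n \
#        | { read oom code; classify_exit "$oom" "$code"; }
```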


2. Core resources that dictate n8n’s throughput

| Resource | Default | Production recommendation |
|---|---|---|
| CPU cores | 1 (single‑threaded) | --cpus=2 or more (Docker) |
| Node workers (EXECUTIONS_PROCESS) | 1 | 2‑4 (match CPU count) |
| DB connection pool (DB_MAX_POOL_SIZE) | 10 | 20‑30 for PostgreSQL/MySQL |
| Memory limit | 512 MiB (Docker default) | 2‑4 GiB (adjust to payload size) |
| Execution timeout (EXECUTIONS_TIMEOUT) | 3600 s | 300 s (or lower) |

EEFA tip: Set Docker --memory-swap to the same value as --memory to disable swap; swapping inflates latency and makes the UI appear frozen.


3. Step‑by‑step: Diagnose a frozen instance

3.1 Inspect container metrics

docker stats $(docker ps -q --filter "name=n8n")

*Look for CPU > 90 % and Memory > 80 %.*

3.2 Check the Node.js event‑loop lag

docker exec -it <container-id> node -e "let last = Date.now(); setInterval(() => { const now = Date.now(); console.log('drift:', now - last - 1000, 'ms'); last = now; }, 1000)"

*If the reported drift stays above 200 ms, the container is contended enough to starve the event loop. (This runs a fresh Node process in the same container, so it measures CPU contention rather than n8n’s own loop; the nodejs_eventloop_lag_seconds metric in §4.4 reports the real value.)*

3.3 Query n8n’s internal metrics (Prometheus exporter)

curl http://localhost:5678/metrics | grep n8n_execution_

*High n8n_execution_queue_length signals back‑pressure.*
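A pipe-able check for that signal (the threshold of 50 is borrowed from the checklist in §5; the helper name is ours):

```shell
# queue_pressure: scan Prometheus-format lines on stdin and flag queue back-pressure.
queue_pressure() {
  awk '/^n8n_execution_queue_length/ {
    r = ($2 > 50) ? "back-pressure: scale workers or investigate slow executions" : "queue healthy"
    print r
  }'
}

# e.g. curl -s http://localhost:5678/metrics | queue_pressure
```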

3.4 Review recent logs

docker logs --tail 100 <container-id> | grep -i "error\|warning"

*Watch for DB timeouts, ERR_WORKFLOW_EXECUTION_TIMEOUT, or ERR_MAX_QUEUE_SIZE.*
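Each of those log signatures points to a different knob. A small dispatcher over the same strings (the mapping reflects the fixes in §4 of this guide, not official n8n output):

```shell
# triage_log_line: map known n8n error strings to the relevant remediation.
triage_log_line() {
  case "$1" in
    *ERR_WORKFLOW_EXECUTION_TIMEOUT*) echo "lower EXECUTIONS_TIMEOUT or split the workflow" ;;
    *ERR_MAX_QUEUE_SIZE*)             echo "queue full: add workers, tune DB pool" ;;
    *[Tt]imeout*)                     echo "check DB_TIMEOUT and the DB connection pool" ;;
    *)                                echo "unclassified: read surrounding context" ;;
  esac
}

# e.g. docker logs --tail 100 <container-id> 2>&1 | grep -i error \
#        | while read -r line; do triage_log_line "$line"; done
```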

3.5 Take a heap snapshot (optional, for memory leaks)

docker exec -it <container-id> sh -c 'kill -USR1 "$(pgrep -o node)"'
# Node.js enables its inspector (default port 9229) on SIGUSR1; pgrep -o finds
# the oldest node process, since an init shim such as tini may hold PID 1.
# Publish port 9229, then open chrome://inspect and capture a heap snapshot.

4. Proven fixes – from “just works” to production‑grade

4.1 Scale the worker pool

Set the number of Node.js workers

services:
  n8n:
    environment:
      - EXECUTIONS_PROCESS=3          # three independent workers
      - EXECUTIONS_TIMEOUT=300       # abort long‑running jobs

Allocate CPU and memory resources

    deploy:
      resources:
        limits:
          cpus: '2.0'
          memory: 3G

*Why it works:* Each worker runs its own event loop, so a single heavy workflow no longer stalls the whole system.
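To pick the worker count, the table in §2 pairs workers with cores. A helper encoding that rule (the name and the cap of 4 come from this guide, not from n8n):

```shell
# suggest_workers: match EXECUTIONS_PROCESS to available cores, capped at 4 (per §2).
suggest_workers() {
  local cores="$1"
  if [ "$cores" -ge 4 ]; then echo 4
  elif [ "$cores" -ge 2 ]; then echo "$cores"
  else echo 1
  fi
}

# e.g. suggest_workers "$(nproc)"
```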

4.2 Optimize workflow design

| Anti‑pattern | Remedy |
|---|---|
| Large JSON.parse on a 10 MB payload | Use the **Binary Data** node to stream chunks; avoid full in‑memory parsing. |
| Nested loops with no break condition | Add a **max‑iterations** limit or break early with IF nodes. |
| Repeated DB writes inside a loop | Batch writes using multi‑row INSERT syntax in the **Execute Query** node. |
| Custom JavaScript node that blocks | Refactor to asynchronous (await) code or move heavy computation to an external microservice (e.g., AWS Lambda). |

4.3 Tune the database pool

# .env (or Docker env)
DB_TYPE=postgresdb
DB_MAX_POOL_SIZE=25
DB_TIMEOUT=20000   # 20 s before DB request fails

EEFA warning: Setting DB_MAX_POOL_SIZE too high can exhaust the DB’s max connections. Keep it ≤ 80 % of the DB server’s max_connections.
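The 80 % rule as a one-liner (a hypothetical helper; feed it the live max_connections value from your DB):

```shell
# safe_pool_size: largest DB_MAX_POOL_SIZE that stays at <= 80 % of max_connections.
safe_pool_size() {
  awk -v mc="$1" 'BEGIN { printf "%d\n", mc * 0.8 }'
}

# e.g. safe_pool_size "$(psql -tAc 'SHOW max_connections;')"
```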

4.4 Enable Prometheus monitoring & alerts

services:
  n8n:
    environment:
      - METRICS=true
      - METRICS_PORT=5679

Alert rule for event‑loop lag

- alert: N8NEventLoopLag
  expr: avg_over_time(nodejs_eventloop_lag_seconds[1m]) > 0.2
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: "n8n event‑loop lag > 200 ms"
    description: "Investigate heavy workflows or custom nodes."

4.5 Graceful restarts with healthchecks

healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:5678/healthz"]
  interval: 30s
  timeout: 5s
  retries: 3
  start_period: 10s

When the healthcheck fails, Docker restarts the container, clearing stuck workers before users notice a freeze.


5. Production‑ready checklist

| Item | Verification command |
|---|---|
| CPU ≥ 2 cores allocated | docker inspect <container> --format='{{.HostConfig.NanoCpus}}' |
| EXECUTIONS_PROCESS ≥ 2 | docker exec n8n env \| grep EXECUTIONS_PROCESS |
| Memory limit ≥ 2 GiB | docker stats → Memory column |
| DB pool size tuned | docker exec n8n env \| grep DB_MAX_POOL_SIZE |
| Event‑loop lag < 200 ms | Run the lag script from §3 and observe drift |
| Prometheus metrics scraped | curl http://localhost:5679/metrics \| grep n8n_ |
| Healthcheck passes | docker inspect --format='{{json .State.Health}}' <container> |
| No long‑running sync JavaScript | Code review; enforce await usage |
| Alert for queue length in place | n8n_execution_queue_length > 50 → Slack/PagerDuty |

6. Frequently asked “edge” questions

| Question | Short answer |
|---|---|
| Why does increasing EXECUTIONS_PROCESS sometimes make things worse? | With only one CPU core, extra workers compete for the same core, adding context‑switch overhead. Pair workers with matching CPU allocation. |
| Can I use Redis as a queue instead of the built‑in DB? | Yes – set QUEUE_BROKER=redis and configure REDIS_HOST. This offloads queuing and reduces DB contention. |
| Is there a way to auto‑scale n8n workers in Kubernetes? | Deploy n8n as a **Deployment** with a **HorizontalPodAutoscaler** that watches CPU utilization and the custom metric n8n_execution_queue_length. |
| My workflow imports a large CSV and still freezes. Any tricks? | Stream the CSV with **Read Binary File** + **Parse CSV** in *chunked* mode, or pre‑process the file in a separate service (e.g., AWS Lambda) and feed only needed rows to n8n. |

Bottom line: A frozen n8n instance under load is almost always a resource‑orchestration issue rather than a core engine bug. By profiling the event loop, scaling workers, tuning DB pools, and monitoring key metrics, you can turn a “seems‑stuck” system into a resilient, production‑grade automation hub.
