
Who this is for: Ops engineers and platform developers running n8n in Docker or Kubernetes who need to keep the service responsive under production traffic. We cover this in detail in the n8n Performance Degradation & Stability Issues Guide.
Quick diagnosis
When n8n stops responding while the Docker container stays up, the usual suspects are:
- Event‑loop blockage – sync‑heavy code or endless loops.
- Memory pressure – V8 can’t reclaim fast enough.
- Resource limits – CPU throttling or DB connection caps.
If the UI hangs, first check CPU and memory usage (docker stats) and the event‑loop lag (sustained lag around 200 ms or more is a warning sign). Reduce concurrent executions or raise the EXECUTIONS_PROCESS count, then restart the container.
1. Why does n8n “freeze” instead of crashing?
| Symptom | Underlying mechanism |
|---|---|
| HTTP requests time‑out but container stays alive | Node.js event loop blocked (sync‑heavy code, large JSON parsing, endless loops) |
| CPU spikes at 100 % and UI stops updating | Single‑thread saturation – all executions share one Node.js process by default |
| Memory climbs to the Docker limit and OOM‑killer does not kill | V8 garbage collector can’t keep up; Docker’s soft limit only throttles |
| Logs show “Waiting for execution” indefinitely | Job queue back‑pressure – internal queue full, workers waiting for DB connections |
EEFA note: Docker’s OOM‑killer only terminates the process when the hard memory limit is exceeded. Most “freezes” happen because the process is still alive but can’t make progress.
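The blocked-event-loop mechanism in the first table row can be reproduced in a few lines of plain Node.js. This is an illustrative sketch, not n8n code: a timer due in 10 ms cannot fire until a synchronous busy loop releases the thread, which is exactly how a sync‑heavy custom node stalls every HTTP request in the process.

```javascript
// Demonstrates how synchronous work blocks the Node.js event loop:
// the 10 ms timer cannot fire until the busy loop yields the thread.
function busyWait(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {} // synchronous: timers, I/O, HTTP all wait
}

const scheduled = Date.now();
setTimeout(() => {
  // Fires roughly 200 ms late, not 10 ms after scheduling.
  console.log(`timer fired after ${Date.now() - scheduled} ms`);
}, 10);

busyWait(200); // stands in for heavy JSON.parse or an endless loop
```

While busyWait runs, the UI would appear frozen even though the process is alive — the same signature as the table’s first row.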
2. Core resources that dictate n8n’s throughput
| Resource | Default | Production recommendation |
|---|---|---|
| CPU cores | 1 (single‑threaded) | --cpus=2 or more (Docker) |
| Node workers (EXECUTIONS_PROCESS) | 1 | 2‑4 (match CPU count) |
| DB connection pool (DB_MAX_POOL_SIZE) | 10 | 20‑30 for PostgreSQL/MySQL |
| Memory limit | 512 MiB (Docker default) | 2‑4 GiB (adjust to payload size) |
| Execution timeout (EXECUTIONS_TIMEOUT) | 3600 s | 300 s (or lower) |
EEFA tip: Set Docker --memory-swap to the same value as --memory to disable swap; swapping inflates latency and makes the UI appear frozen.
3. Step‑by‑step: Diagnose a frozen instance
3.1 Inspect container metrics
docker stats $(docker ps -q --filter "name=n8n")
*Look for CPU > 90 % and Memory > 80 %.*
3.2 Check the Node.js event‑loop lag
docker exec -it <container-id> bash
node -e "let last=Date.now(); setInterval(()=>{const now=Date.now(); console.log('lag:', now-last-1000, 'ms'); last=now;}, 1000)"
*If the interval drifts > 200 ms, the loop is blocked.*
3.3 Query n8n’s internal metrics (Prometheus exporter)
curl http://localhost:5678/metrics | grep n8n_execution_
*High n8n_execution_queue_length signals back‑pressure.*
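The exposition format is plain text, so a script can read the queue length without a full Prometheus server. The payload below is a made-up sample (the metric name follows the n8n_execution_ prefix shown above; exact names depend on your n8n version):

```javascript
// Extract a gauge value from Prometheus text exposition format.
function readGauge(metricsText, name) {
  for (const line of metricsText.split('\n')) {
    if (line.startsWith(name)) {
      const value = Number(line.trim().split(/\s+/).pop());
      if (!Number.isNaN(value)) return value;
    }
  }
  return null; // metric not present in the scrape
}

// Illustrative sample, not real n8n output:
const sample = [
  '# HELP n8n_execution_queue_length Pending executions',
  '# TYPE n8n_execution_queue_length gauge',
  'n8n_execution_queue_length 42',
].join('\n');

console.log(readGauge(sample, 'n8n_execution_queue_length')); // → 42
```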
3.4 Review recent logs
docker logs --tail 100 <container-id> | grep -i "error\|warning"
*Watch for DB timeouts, ERR_WORKFLOW_EXECUTION_TIMEOUT, or ERR_MAX_QUEUE_SIZE.*
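If you collect logs from a script instead of eyeballing grep output, the same triage can be a small filter. The failure signatures match the ones listed above; the helper name is hypothetical:

```javascript
// Keep only the log lines matching known failure signatures
// (DB timeouts, execution timeouts, full queue, refused connections).
const CRITICAL = /ERR_WORKFLOW_EXECUTION_TIMEOUT|ERR_MAX_QUEUE_SIZE|timeout|ECONNREFUSED/i;

function triageLog(logText) {
  return logText.split('\n').filter(line => CRITICAL.test(line));
}

// Illustrative sample, not real n8n log output:
const logSample = [
  'Workflow 12 finished successfully',
  'ERROR: ERR_MAX_QUEUE_SIZE - queue is full',
  'warn: database timeout after 20000 ms',
].join('\n');

console.log(triageLog(logSample)); // → the two problem lines
```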
3.5 Take a heap snapshot (optional, for memory leaks)
docker exec <container-id> kill -USR1 1   # SIGUSR1 makes the running Node.js process open its inspector (port 9229)
Then open chrome://inspect (the inspector port must be published to the host) and capture a heap snapshot.
4. Proven fixes – from “just works” to production‑grade
4.1 Scale the worker pool
Set the number of Node.js workers
services:
  n8n:
    environment:
      - EXECUTIONS_PROCESS=3    # three independent workers
      - EXECUTIONS_TIMEOUT=300  # abort long‑running jobs
Allocate CPU and memory resources
deploy:
  resources:
    limits:
      cpus: '2.0'
      memory: 3G
*Why it works:* Each worker runs its own event loop, so a single heavy workflow no longer stalls the whole system.
4.2 Optimize workflow design
| Anti‑pattern | Remedy |
|---|---|
| Large JSON.parse on a 10 MB payload | Use the **Binary Data** node to stream chunks; avoid full in‑memory parsing. |
| Nested loops with no break condition | Add a **max‑iterations** limit or break early with IF nodes. |
| Repeated DB writes inside a loop | Batch writes using multi‑row INSERT syntax in the **Execute Query** node. |
| Custom JavaScript node that blocks | Refactor to asynchronous (await) code or move heavy computation to an external microservice (e.g., AWS Lambda). |
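The batched-writes remedy in the table comes down to turning N single-row statements into ceil(N / chunkSize) multi-row ones. A hypothetical sketch of the SQL shape (values are interpolated here only for illustration — real code must use parameterized queries):

```javascript
// Split rows into chunks and build one multi-row INSERT per chunk
// instead of issuing one statement per row inside a loop.
function chunk(rows, size) {
  const out = [];
  for (let i = 0; i < rows.length; i += size) out.push(rows.slice(i, i + size));
  return out;
}

function multiRowInsert(table, columns, rows) {
  // WARNING: naive interpolation, illustration only - use placeholders
  // in the Execute Query node to avoid SQL injection.
  const values = rows
    .map(r => `(${columns.map(c => JSON.stringify(r[c])).join(', ')})`)
    .join(', ');
  return `INSERT INTO ${table} (${columns.join(', ')}) VALUES ${values};`;
}

const rows = [{ id: 1, name: 'a' }, { id: 2, name: 'b' }, { id: 3, name: 'c' }];
const statements = chunk(rows, 2).map(c => multiRowInsert('items', ['id', 'name'], c));
console.log(statements.length); // → 2 statements instead of 3
```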
4.3 Tune the database pool
# .env (or Docker env)
DB_TYPE=postgresdb
DB_MAX_POOL_SIZE=25
DB_TIMEOUT=20000   # 20 s before a DB request fails
EEFA warning: Setting DB_MAX_POOL_SIZE too high can exhaust the DB’s max connections. Keep it ≤ 80 % of the DB server’s max_connections.
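The 80 % rule is worth encoding in whatever provisioning script sets your environment variables. safePoolSize is a hypothetical helper, not an n8n setting:

```javascript
// Clamp the requested pool size to 80% of the DB server's
// max_connections, per the warning above.
function safePoolSize(requested, maxConnections) {
  const ceiling = Math.floor(maxConnections * 0.8);
  return Math.min(requested, ceiling);
}

console.log(safePoolSize(25, 100)); // → 25 (fits under the 80-connection ceiling)
console.log(safePoolSize(25, 20));  // → 16 (clamped to 80% of 20)
```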
4.4 Enable Prometheus monitoring & alerts
services:
  n8n:
    environment:
      - METRICS=true
      - METRICS_PORT=5679
Alert rule for event‑loop lag
- alert: N8NEventLoopLag
  expr: avg_over_time(nodejs_eventloop_lag_seconds[1m]) > 0.2
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: "n8n event‑loop lag > 200 ms"
    description: "Investigate heavy workflows or custom nodes."
4.5 Graceful restarts with healthchecks
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:5678/healthz"]
  interval: 30s
  timeout: 5s
  retries: 3
  start_period: 10s
When the healthcheck fails, Docker restarts the container, clearing stuck workers before users notice a freeze.
5. Production‑ready checklist
| Item | Verification command |
|---|---|
| CPU ≥ 2 cores allocated | docker inspect <container> --format='{{.HostConfig.NanoCpus}}' |
| EXECUTIONS_PROCESS ≥ 2 | docker exec n8n env \| grep EXECUTIONS_PROCESS |
| Memory limit ≥ 2 GiB | docker stats → Memory column |
| DB pool size tuned | docker exec n8n env \| grep DB_MAX_POOL_SIZE |
| Event‑loop lag < 200 ms | Run the lag script from §3 and observe drift |
| Prometheus metrics scraped | curl http://localhost:5679/metrics \| grep n8n_ |
| Healthcheck passes | docker inspect --format='{{json .State.Health}}' <container> |
| No long‑running sync JavaScript | Code review; enforce await usage |
| Alert for queue length in place | n8n_execution_queue_length > 50 → Slack/PagerDuty |
6. Frequently asked “edge” questions
| Question | Short answer |
|---|---|
| Why does increasing EXECUTIONS_PROCESS sometimes make things worse? | With only one CPU core, extra workers compete for the same core, adding context‑switch overhead. Pair workers with matching CPU allocation. |
| Can I use Redis as a queue instead of the built‑in DB? | Yes – set EXECUTIONS_MODE=queue and configure QUEUE_BULL_REDIS_HOST. This offloads queuing to Redis and reduces DB contention. |
| Is there a way to auto‑scale n8n workers in Kubernetes? | Deploy n8n as a **Deployment** with a **HorizontalPodAutoscaler** that watches CPU utilization and the custom metric n8n_execution_queue_length. |
| My workflow imports a large CSV and still freezes. Any tricks? | Stream the CSV with **Read Binary File** + **Parse CSV** in *chunked* mode, or pre‑process the file in a separate service (e.g., AWS Lambda) and feed only needed rows to n8n. |
Bottom line: A frozen n8n instance under load is almost always a resource‑orchestration issue rather than a core engine bug. By profiling the event loop, scaling workers, tuning DB pools, and monitoring key metrics, you can turn a “seems‑stuck” system into a resilient, production‑grade automation hub.



