Who this is for: DevOps, platform engineers, and n8n administrators who run self‑hosted n8n instances and need to keep CPU, memory, and cloud costs under control. We cover this in detail in the n8n Cost, Scaling & Infrastructure Economics Guide.
Quick Diagnosis: Is Your n8n Instance Running Too Many Workers?
| Symptom | Typical Cause | Fix |
|---|---|---|
| CPU > 80 % during idle periods, memory spikes, queue length stays near 0 | Over‑provisioned worker pool – a higher `EXECUTIONS_WORKER_COUNT` than the workload needs | Reduce `EXECUTIONS_WORKER_COUNT` to the smallest value that keeps the queue < 5 tasks at peak, then restart n8n. |
One‑line fix (Docker‑Compose):
sed -i 's/EXECUTIONS_WORKER_COUNT=.*/EXECUTIONS_WORKER_COUNT=2/' .env && \
  docker compose up -d n8n
TL;DR: High CPU or memory use while the execution queue stays empty means you can safely lower `EXECUTIONS_WORKER_COUNT`; do so, then watch the queue for a few minutes.
In production this shows up when the instance is idle but still burns CPU.
1. Understanding n8n Worker Architecture
1.1 What a “Worker” Does
- Pulls execution jobs from Redis (or the DB) queue.
- Runs each workflow step in an isolated JavaScript VM, providing sandboxing.
- Handles one execution at a time unless you set `MAX_EXECUTIONS_PER_WORKER` > 1.
Each worker is a tiny, independent process – essentially a dedicated engine that runs only when there’s work.
1.2 How Workers Are Spawned
| Deployment type | Default worker count | Override method |
|---|---|---|
| Docker (official image) | 1 (unless overridden) | ENV EXECUTIONS_WORKER_COUNT=4 |
| Kubernetes Helm chart | 2 (via `worker.replicaCount`) | Set `worker.replicaCount=3` in values.yaml |
| Self‑hosted (npm) | 1 (single‑process mode) | n8n start --worker-count=3 |
Note: in production, give each worker a dedicated CPU limit (e.g., `cpu: "500m"` in Kubernetes) to avoid noisy‑neighbor problems.
2. When Over‑Provisioning Happens
| Trigger | Why It Leads to Over‑Provisioning |
|---|---|
| Static `EXECUTIONS_WORKER_COUNT` set high for a short traffic spike | Workers stay alive after the load drops. |
| Auto‑scaling rules based only on CPU > 70 % | CPU spikes from background tasks spin up extra workers unnecessarily. |
| `MAX_EXECUTIONS_PER_WORKER` > 1 combined with many workers | Each worker tries to run multiple executions, inflating memory use. |
| Legacy Docker‑Compose with `restart: always` | Crashed workers are instantly respawned, creating duplicates. |
3. Right‑Sizing Workers for Your Load
Gather real metrics, compute the minimum worker count, apply the new settings, then verify they match expectations.
3.1 Gather Baseline Metrics
Expose Prometheus metrics, then pull the worker‑related values:
docker compose exec n8n curl -s http://localhost:5678/metrics | \
  grep n8n_worker
| Metric | Meaning | Target |
|---|---|---|
| n8n_worker_active_total | Currently running workers | ≤ peak concurrent executions |
| n8n_execution_queue_length | Jobs waiting for a worker | < 5 (ideal) |
| process_cpu_seconds_total (per worker) | CPU usage per worker | < 0.5 CPU on average |
Tip: ship these metrics to Prometheus + Grafana and set alerts on `queue_length > 10` or `cpu_seconds_total > 0.8`.
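For a quick check without Grafana, the same queue threshold can be evaluated straight from a metrics dump with `awk`. This is a sketch: the metric names follow the table above, and the hard-coded sample stands in for piping in `curl -s http://localhost:5678/metrics`.

```shell
# Sample /metrics output; in practice, pipe the live endpoint in instead.
metrics='n8n_execution_queue_length 12
process_cpu_seconds_total 0.9'

# Extract the queue gauge and compare it to the alert threshold.
queue=$(printf '%s\n' "$metrics" | awk '/^n8n_execution_queue_length/ {print int($2)}')
if [ "$queue" -gt 10 ]; then
  echo "ALERT: queue backlog ($queue jobs waiting)"
fi
```

Wiring this into a cron job or a Prometheus alert rule gives you the same signal either way; the point is to alert on queue depth, not CPU alone.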
3.2 Calculate Minimum Workers
Apply this formula:
minimum_workers = ceil(peak_concurrent_executions / MAX_EXECUTIONS_PER_WORKER)
- Peak concurrent executions = the maximum number of simultaneous workflows (check `n8n_execution_active_total`).
- `MAX_EXECUTIONS_PER_WORKER` defaults to 1; raise it only if you have high‑memory nodes.
Example
| Observation | Value |
|---|---|
| Peak concurrent executions (last 24 h) | 7 |
| MAX_EXECUTIONS_PER_WORKER (custom) | 2 |
| Required workers | ceil(7 / 2) = 4 |
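The ceiling division needs no external tools; plain POSIX integer arithmetic handles it (values taken from the example above):

```shell
# minimum_workers = ceil(peak / per_worker).
# Adding (per_worker - 1) before integer division rounds up.
peak_concurrent_executions=7
max_executions_per_worker=2
minimum_workers=$(( (peak_concurrent_executions + max_executions_per_worker - 1) / max_executions_per_worker ))
echo "minimum workers: $minimum_workers"   # → minimum workers: 4
```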
3.3 Apply the New Worker Count
Docker‑Compose
Add the environment variables to .env, then force a fresh start:
EXECUTIONS_WORKER_COUNT=4
MAX_EXECUTIONS_PER_WORKER=2
docker compose up -d --force-recreate n8n
Kubernetes (Helm)
Update values.yaml with the desired replica count and env var:
worker:
  replicaCount: 4
  env:
    - name: MAX_EXECUTIONS_PER_WORKER
      value: "2"
Apply the change:
helm upgrade --install n8n n8n/n8n -f values.yaml
3.4 Validate the New Count
Query the metrics endpoint again:
curl -s http://localhost:5678/metrics | \
  grep n8n_worker_active_total
| Expected | Observed |
|---|---|
| n8n_worker_active_total 4 | ✅ |
At this point, checking the metric is usually faster than waiting for a slow queue to fill.
4. Advanced Auto‑Scaling Strategies (Avoid Over‑Provisioning)
| Strategy | How It Works | Pros | Cons |
|---|---|---|---|
| CPU‑only HPA (K8s) | Scales when cpuUtilization > 70 % | Simple to configure | Ignores queue backlog; may add workers during brief CPU spikes. |
| Queue‑Length HPA (custom metric) | Uses `n8n_execution_queue_length` as the scaling metric | Directly matches workload demand | Needs Prometheus Adapter or a custom metrics server. |
| Hybrid Policy (CPU + Queue) | Scale up if **either** CPU > 70 % **or** queue > 10 | Balances responsiveness and cost | More complex rule set. |
| Scheduled Scaling | Pre‑scale during known traffic windows (e.g., nightly batch jobs) | Predictable cost | Requires accurate schedule; may still over‑provision if jobs finish early. |
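The hybrid policy in the table boils down to an OR over two thresholds. A minimal sketch (the function name and threshold values are illustrative, not an n8n API):

```shell
# Decide whether to scale up under the hybrid policy.
# $1 = CPU utilization in percent, $2 = execution queue length.
should_scale_up() {
  if [ "$1" -gt 70 ] || [ "$2" -gt 10 ]; then
    echo yes
  else
    echo no
  fi
}

should_scale_up 40 12   # queue backlog, CPU fine → yes
should_scale_up 40 3    # both healthy → no
```

The same two-condition rule can be expressed as two separate metrics entries in a Kubernetes HPA, since the HPA scales on whichever metric demands the most replicas.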
If you’ve trimmed the worker count and still see spikes, auto‑scaling is probably required.
4.1 Example: Queue‑Length HPA (Kubernetes)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: n8n-worker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: n8n-worker
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: External
      external:
        metric:
          name: n8n_execution_queue_length
          selector:
            matchLabels:
              app: n8n
        target:
          type: AverageValue
          averageValue: "5"
Note: ensure the Prometheus Adapter exposes `n8n_execution_queue_length` with the label `app=n8n`; otherwise the HPA will never fire.
5. Troubleshooting Over‑Provisioned Workers
| Symptom | Likely Root Cause | Fix |
|---|---|---|
| Multiple identical containers after a crash | `restart: always` + `docker compose up` without `--scale` | Remove stale containers with `docker compose rm -f`, then redeploy. |
| Memory OOM kills despite low queue | `MAX_EXECUTIONS_PER_WORKER` > 1 causing memory accumulation | Set `MAX_EXECUTIONS_PER_WORKER=1` or increase container memory limits. |
| Persistent high CPU after scaling down | Workers not shutting down gracefully (bug in older n8n v0.220) | Upgrade to the latest n8n version (`npm i -g n8n@latest` or pull the latest Docker image). |
| Queue never empties after reducing workers | Back‑pressure from throttled external APIs | Add exponential back‑off in the affected node or raise `EXECUTIONS_TIMEOUT`. |
5.1 Checklist for a Clean Worker Reset
- Stop n8n (`docker compose down` or `kubectl scale deployment n8n-worker --replicas=0`).
- Purge stale Redis keys: `redis-cli KEYS "n8n:queue:*" | xargs redis-cli DEL`
- Verify no leftover Docker containers: `docker ps -a | grep n8n-worker`.
- Restart with the new worker count.
- Monitor for at least 10 minutes; ensure `queue_length` stays ≤ 5.
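The monitoring step can be scripted as a pass/fail check over several samples. In this sketch the samples are hard-coded; in production each value would come from a periodic scrape of the `/metrics` endpoint (one sample per minute over the 10-minute window):

```shell
# Pass only if every sampled queue length stays at or below the target of 5.
samples="0 2 4 1 0"
ok=yes
for q in $samples; do
  [ "$q" -le 5 ] || ok=no
done
echo "queue stayed healthy: $ok"   # → queue stayed healthy: yes
```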
Most teams run into stale containers after a crash, not on day one.
6. Cost‑Optimization Summary
| Action | Estimated Savings (per month) | Impact on Performance |
|---|---|---|
| Reduce `EXECUTIONS_WORKER_COUNT` from 8 → 4 (2 vCPU each) | ~ $120 (cloud instance) | No impact if peak concurrency ≤ 4 |
| Switch from **CPU‑only HPA** to **Queue‑Length HPA** | ~ $45 (fewer idle pods) | Faster response to spikes |
| Set `MAX_EXECUTIONS_PER_WORKER=2` on high‑memory nodes | Up to 30 % fewer pods | Slight latency increase (≈ 0.2 s per execution) |
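The first row's ~$120 figure works out from the freed vCPUs if you assume roughly $15 per vCPU‑month. Cloud rates vary widely, so treat this as a back‑of‑envelope sketch rather than a price quote:

```shell
# savings = freed workers × vCPU per worker × assumed $/vCPU-month
old_workers=8
new_workers=4
vcpu_per_worker=2
rate_per_vcpu_month=15   # assumption; check your provider's actual pricing
savings=$(( (old_workers - new_workers) * vcpu_per_worker * rate_per_vcpu_month ))
echo "estimated savings: \$${savings}/month"   # → estimated savings: $120/month
```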
A quick reduction in worker count often pays for itself within a week.
Conclusion
Over‑provisioning workers wastes CPU, memory, and money. It doesn’t improve throughput. By measuring real concurrency, applying a data‑driven worker count, and using queue‑aware auto‑scaling, you keep n8n lean, responsive, and cost‑effective. Implement the steps above, monitor the key metrics, and let the system scale only when the execution queue truly needs it.



