Who this is for: Developers and ops engineers running n8n in production who need reliable, low‑latency workflow execution. We cover this in detail in the n8n Production Readiness & Scalability Risks Guide.
Quick Diagnosis
If the Node.js event loop in an n8n instance regularly shows latency spikes above 100 ms, the system is likely experiencing event‑loop starvation. The most common culprits are:
| # | Real cause | Typical symptom | Quick fix |
|---|---|---|---|
| 1 | Synchronous, CPU‑heavy code in custom nodes | Workflow hangs for seconds; logs show “Execution timed out” | Move heavy work to a worker thread or async function |
| 2 | Unbounded parallel executions (`maxConcurrency`) | Sudden latency spikes when many workflows start together | Lower `maxConcurrency` or enable queue mode |
| 3 | Blocking I/O (slow HTTP calls, DB queries) | Event‑loop lag correlates with external API latency | Use timeouts & retries, or off‑load to a separate process |
| 4 | Mis‑configured worker pool (`workerThreads`) | “Cannot create more workers” errors, high CPU | Increase `workerPoolSize` or cap concurrent workers |
| 5 | Large JSON payloads & deep cloning | Memory spikes, GC pauses → event‑loop stalls | Stream data, avoid `JSON.parse`/`JSON.stringify` on huge blobs |
In typical deployments, symptoms appear after a few weeks of steady traffic, not on day one.
Ready‑to‑Run Diagnostic Snippet
Run the snippet below on each n8n instance. If the printed lag consistently exceeds 100 ms, starvation is present.
```javascript
// monitor-event-loop.js – import & configure the monitor
const { monitorEventLoopDelay } = require('perf_hooks');
const delay = monitorEventLoopDelay({ resolution: 20 });
delay.enable();

// Report mean lag every 5 s, then reset so each sample covers one interval
setInterval(() => {
  const lag = delay.mean / 1e6; // ns → ms
  console.log(`Event‑loop lag: ${lag.toFixed(2)} ms`);
  if (lag > 100) console.warn('⚠️ High lag detected – investigate!');
  delay.reset();
}, 5000);
```
Execute with `node monitor-event-loop.js`.
1. Why Event‑Loop Starvation Matters for n8n
n8n runs every node inside the same Node.js process (unless worker threads are enabled). When any part of that process blocks the single‑threaded event loop, all workflows suffer, leading to timeouts, missed triggers, and a cascade of failed jobs. The workflow engine amplifies the impact because each incoming webhook, cron, or queue entry competes for the same loop.
EEFA note: In production, even a few milliseconds of blocking compound across dozens of concurrent workflows, inflating end‑to‑end latency far beyond the original stall. Treat the event loop as a shared resource with strict SLAs.
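To see the effect concretely, the following self‑contained sketch blocks the loop with a busy‑wait and shows how a timer scheduled for 10 ms cannot fire until the blocking work finishes (the durations are arbitrary illustrations):

```javascript
// Demonstrates how synchronous work delays every pending timer.
function blockFor(ms) {
  // Synchronous busy-wait: nothing else can run while this spins.
  const until = Date.now() + ms;
  while (Date.now() < until) {}
}

const start = process.hrtime.bigint();
setTimeout(() => {
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  // The 10 ms timer only fires after the 150 ms busy loop releases the loop.
  console.log(`Timer scheduled for 10 ms fired after ${elapsedMs.toFixed(0)} ms`);
}, 10);

blockFor(150); // stands in for CPU-heavy sync code in a custom node
```

In an n8n process, every webhook and trigger callback sits behind that same delay.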
2. Synchronous CPU‑Intensive Code in Custom Nodes
Micro‑summary: Identify and replace blocking patterns (large loops, sync crypto, massive JSON ops) with async or worker‑thread alternatives.
2.1 Typical blocking patterns
| Pattern | Example | Why it blocks |
|---|---|---|
| `for` loops over large arrays | `for (let i = 0; i < data.length; i++) …` | Runs on the main thread with no I/O to yield on |
| `JSON.stringify` on > 10 MB objects | `const payload = JSON.stringify(bigObj);` | Full traversal before returning |
| Crypto hashing in sync mode | `crypto.createHash('sha256').update(buf).digest('hex');` | CPU‑bound; the sync API has no async fallback |
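A quick way to confirm a suspect pattern actually blocks is to time it directly. This sketch measures a large `JSON.stringify` on the main thread (the 200 000‑row payload is an arbitrary illustration):

```javascript
// Time a CPU-bound JSON.stringify call on the main thread.
function measureStringify(rows) {
  const bigObj = Array.from({ length: rows }, (_, i) => ({
    id: i,
    name: `item-${i}`,
    payload: 'x'.repeat(64),
  }));
  const t0 = process.hrtime.bigint();
  const json = JSON.stringify(bigObj); // full synchronous traversal
  const ms = Number(process.hrtime.bigint() - t0) / 1e6;
  return { ms, bytes: json.length };
}

const { ms, bytes } = measureStringify(200_000);
console.log(`Serialized ${bytes} bytes in ${ms.toFixed(1)} ms (event loop blocked the whole time)`);
```

Whatever number it prints is time during which no other workflow in the process could make progress.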
2.2 Refactor to async / worker thread
Original (blocking) implementation
```typescript
// custom-node.ts – blocking version: heavySyncCalc runs on the main thread
export async function execute(input: any) {
  const result = heavySyncCalc(input); // blocks the event loop for its full duration
  return { result };
}
```
Refactored (non‑blocking) version – using a worker
```typescript
// custom-node.ts – spawn a worker so the calculation runs off the main thread
import { Worker } from 'worker_threads';

export async function execute(input: any) {
  return new Promise((resolve, reject) => {
    const worker = new Worker('./heavy-calc-worker.js', {
      workerData: input,
    });
    worker.on('message', resolve);
    worker.on('error', reject);
    worker.on('exit', (code) => {
      if (code !== 0) reject(new Error(`Worker stopped with exit code ${code}`));
    });
  });
}
```
EEFA warning: Do not spawn an unlimited number of workers per execution; cap them with `workerPoolSize` (see Section 5).
At this point, moving the work to a worker is usually faster than micro‑optimising the loop.
3. Unbounded Parallel Execution
Micro‑summary: Throttle concurrent workflows and optionally queue excess jobs to protect the event loop.
3.1 Detecting overload with a health endpoint
```typescript
// health endpoint – basic Express setup exposing loop pressure
import express from 'express';

const app = express();

app.get('/health/loop', (req, res) => {
  // global.activeWorkflows is assumed to be maintained elsewhere in the process
  const pending = global.activeWorkflows?.size ?? 0;
  res.json({ pending, maxConcurrency: process.env.N8N_MAX_CONCURRENCY });
});

app.listen(5678);
```
If `pending` consistently approaches `maxConcurrency`, throttling is required.
3.2 Mitigation steps
- Lower `N8N_MAX_CONCURRENCY` in `~/.n8n/.env`:
  `N8N_MAX_CONCURRENCY=4`
- Enable queue mode – n8n buffers excess executions instead of rejecting them:
  `N8N_QUEUE_MODE=true` and `N8N_QUEUE_MAX=200` (max queued jobs)
- Horizontal scaling – run multiple n8n containers behind a load balancer.
EEFA tip: Queue mode adds latency but protects the event loop; monitor queue length with the health endpoint above.
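If you cannot switch to queue mode immediately, the same throttling idea can be sketched in‑process with a small hand‑rolled limiter (this `createLimiter` helper is illustrative, not an n8n API):

```javascript
// Minimal promise-based concurrency limiter (p-limit style).
function createLimiter(max) {
  let active = 0;
  const waiting = [];
  const next = () => {
    if (active >= max || waiting.length === 0) return;
    active++;
    const { task, resolve, reject } = waiting.shift();
    task().then(resolve, reject).finally(() => {
      active--;
      next(); // start the next queued task, if any
    });
  };
  // Returns a wrapper that queues the task until a slot is free.
  return (task) =>
    new Promise((resolve, reject) => {
      waiting.push({ task, resolve, reject });
      next();
    });
}

// Usage: never run more than 4 workflow-like tasks at once.
const limit = createLimiter(4);
const jobs = Array.from({ length: 10 }, (_, i) =>
  limit(() => new Promise((r) => setTimeout(() => r(i), 10)))
);
Promise.all(jobs).then((ids) => console.log(`Finished ${ids.length} jobs`));
```

Excess tasks wait in memory rather than piling onto the event loop at once, which is essentially what n8n's queue mode does with Redis at a larger scale.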
4. Blocking I/O – External APIs & Database Drivers
Micro‑summary: Ensure every network or file operation is asynchronous, has a timeout, and retries where appropriate.
4.1 Common blockers
| Blocker | Symptom | Fix |
|---|---|---|
| `axios` without a timeout | Requests hang; the workflow stalls for the full wait | `axios({ timeout: 5000 })` |
| Synchronous DB driver (e.g., `pg-sync`) | Whole process waits for the DB response | Switch to an async driver (`pg` with promises) |
| `fs.readFileSync` on large files | Event loop blocked until the read completes | Stream with `fs.createReadStream` |
4.2 Safe HTTP request example
```typescript
// safeRequest.ts – HTTP call with a timeout and bounded retries
import got from 'got';

export async function safeRequest(url: string) {
  try {
    const response = await got(url, {
      timeout: { request: 4000 }, // fail fast, well under executionTimeout
      retry: { limit: 2 },
    });
    return JSON.parse(response.body);
  } catch (err) {
    throw new Error(`Request failed: ${(err as Error).message}`);
  }
}
```
EEFA note: Set a request timeout lower than the workflow’s `executionTimeout` (default 60 s). A hanging request would otherwise stall the workflow for the full timeout period.
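The same discipline applies to any promise, not only HTTP calls. A small hypothetical `withTimeout` helper rejects whenever the wrapped operation outlives its budget:

```javascript
// Reject any promise that takes longer than `ms` (hypothetical helper).
function withTimeout(promise, ms, label = 'operation') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms} ms`)),
      ms
    );
  });
  // clearTimeout prevents the timer from holding the process open.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Usage: a stand-in slow operation that resolves after 500 ms,
// cut off after 200 ms.
const slowOp = () => new Promise((r) => setTimeout(r, 500));
withTimeout(slowOp(), 200, 'slow op').catch((err) => console.error(err.message));
```

Wrapping DB queries and third‑party SDK calls this way keeps a single slow dependency from pinning a workflow until `executionTimeout`.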
5. Mis‑Configured Worker Pool & Execution Timeouts
Micro‑summary: Size the worker pool to match CPU capacity and enforce sensible memory limits.
5.1 Diagnosing worker exhaustion
```bash
# n8n ≥ 0.210 provides a worker health endpoint
curl http://localhost:5678/health/worker
```
Typical JSON response:
```json
{
  "total": 4,
  "busy": 4,
  "queued": 12
}
```
5.2 Recommended environment settings
| Environment var | Recommended value | Reason |
|---|---|---|
| `N8N_WORKER_THREADS` | 8 (≈ CPU cores × 2) | Enough parallelism without oversubscribing |
| `N8N_WORKER_MAX_MEMORY` | 256 MB per worker | Prevents OOM kills that would stall the loop |
| `N8N_EXECUTION_TIMEOUT` | 30 s (or lower) | Forces runaway jobs to abort early |
EEFA caution: Raising `N8N_WORKER_THREADS` beyond the available cores can cause context‑switch thrashing, worsening starvation.
Remember to restart n8n after adjusting these variables.
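Pulled together, the settings from the table might land in `~/.n8n/.env` roughly like this (the values are the starting points above, not universal defaults; tune per host):

```bash
# ~/.n8n/.env – worker pool & timeout starting points
N8N_WORKER_THREADS=8          # ≈ CPU cores × 2
N8N_WORKER_MAX_MEMORY=256     # MB per worker
N8N_EXECUTION_TIMEOUT=30      # seconds; abort runaway jobs early
N8N_MAX_CONCURRENCY=4         # concurrency cap from Section 3
```
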
6. Large JSON Payloads & Deep Cloning
Micro‑summary: Reduce payload size before cloning and use streaming where possible.
6.1 Spotting the issue
- Enable debug logs with `N8N_LOG_LEVEL=debug` and look for `CloneOperation took X ms`.
- Observe GC pause spikes with `node --trace-gc`.
6.2 Mitigation checklist
| Action | Implementation |
|---|---|
| Stream data instead of loading whole payload | Use node-fetch streams or csv-parser for large CSVs |
| Trim unnecessary fields before the clone | Add a “Set” node early in the workflow that keeps only the required keys |
| Increase V8 heap if unavoidable | NODE_OPTIONS="--max-old-space-size=4096" |
EEFA reminder: Over‑allocating heap can mask the problem but raises memory cost; prefer data reduction.
7. Diagnostic Checklist & Monitoring Setup
| Check | How to verify | Remediation |
|---|---|---|
| Event‑loop lag < 100 ms | Run `monitor-event-loop.js` | Optimize blocking code |
| Worker pool not saturated | `curl /health/worker` | Increase `N8N_WORKER_THREADS` or lower concurrency |
| No sync‑blocking calls in custom nodes | Search the repo for `*Sync(` | Refactor to async/worker |
| HTTP timeouts set | Review node configs or code | Add timeout options |
| Payload size < 5 MB before clone | Log `item.length` in a “Set” node | Trim data, stream instead |
| Queue length < 50 | `/health/loop` endpoint | Adjust `maxConcurrency` or scale horizontally |
Add the health endpoints to Docker Compose or systemd services for continuous observability.
8. Immediate Remediation Steps (Step‑by‑Step)
Micro‑summary: A quick, repeatable process to bring lag back under control.
1. Deploy the event‑loop monitor on every n8n instance.
2. Identify the top three lag sources using the tables in Sections 2–6.
3. Patch blocking code – replace sync functions with async/worker equivalents.
4. Cap concurrency (`N8N_MAX_CONCURRENCY=4`) and enable queue mode.
5. Restart n8n with updated env vars and verify lag drops below 100 ms.
If lag persists after step 5, proceed to the long‑term strategies in Section 9.
9. Long‑Term Prevention Strategies
| Strategy | Implementation details |
|---|---|
| Horizontal scaling | Run multiple n8n containers behind a reverse proxy; share jobs via a Redis queue (`N8N_QUEUE_BULL_REDIS_URL`). |
| Static analysis CI | Add an ESLint `no-sync` rule and a custom script that flags `*Sync(` usage in `src/nodes/**/*.ts`. |
| Automated load testing | Use k6 or Artillery to simulate webhook bursts; monitor `/health/loop` for regressions before each release. |
| Observability stack | Export `process.hrtime` and worker‑pool metrics to Prometheus; alert on `event_loop_lag_seconds > 0.1`. |
| Version lock | Keep n8n on a stable LTS release; major upgrades often include event‑loop optimizations. |
Conclusion
Event‑loop starvation in n8n is almost always traceable to one of the five real causes listed above. By instrumenting lag monitoring, refactoring blocking code, throttling concurrency, configuring a healthy worker pool, and keeping payloads small, the shared event loop remains responsive and workflow execution meets SLA expectations. Apply the immediate remediation checklist now, then adopt the long‑term strategies to keep n8n performant as automation volume grows.