Who this is for: Engineers running n8n in production who need to keep their execution queues thin, CPU low, and external APIs happy. We cover this in detail in the n8n Performance & Scaling Guide.
Quick Diagnosis
| Step | Action | Config Detail |
|---|---|---|
| 1 | Disable global “Retry on Failure” for low‑risk nodes | node.retryOnFail = false |
| 2 | Add a Retry node with exponential back‑off (max 3 attempts, 1 s base) | delay = {{ Math.pow(2, $json.attempt) * 1000 }} |
| 3 | Insert a Circuit Breaker Function node to pause calls after 5 consecutive failures for 30 s | if (failCount >= 5) return [{ pause: true }]; |
| 4 | Route all errors to a dedicated Error Workflow that logs, alerts, and optionally re‑queues | Use “Execute Workflow” node with Error Trigger |
| 5 | Enable Rate Limiting on external API calls (e.g., 10 req/s) | Set maxConcurrent in the HTTP Request node |
Apply these five steps and you’ll eliminate retry storms, lower CPU load, and keep the execution queue moving.
1. Default Error Handling in n8n
| Component | Default Behaviour |
|---|---|
| Node‑level retry | Off by default; when enabled, retries up to 3 times with a 1 s wait (configurable per node) |
| Workflow‑level “Continue On Fail” | Skips failed nodes, continues downstream |
| Error Trigger | Starts a new workflow only when a node throws an error |
Why it matters – The out‑of‑the‑box retry policy favors reliability but can flood the queue when an upstream service is down. In high‑throughput environments you must tighten retries to avoid retry storms.
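The arithmetic behind the fix is simple. A minimal sketch (the helper is illustrative, not an n8n API) of the back‑off schedule that `Math.pow(2, attempt) * 1000` produces when the attempt counter starts at 0:

```javascript
// Illustrative helper: compute the exponential back-off schedule
// for a capped number of retry attempts (base 1 s, doubling each time).
function backoffDelays(maxAttempts, baseMs = 1000) {
  const delays = [];
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    delays.push(Math.pow(2, attempt) * baseMs); // 1 s, 2 s, 4 s, ...
  }
  return delays;
}

console.log(backoffDelays(3)); // [ 1000, 2000, 4000 ]
```

Three attempts add at most 7 s of waiting per failing execution, versus an unbounded hammering of the upstream service with instant retries.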
2. Efficient Retry Strategies
2.1 Use the Retry Node (v1.2+)
The Retry node lets you define back‑off logic in a single place.
Retry node definition:

```json
{
  "name": "Retry HTTP",
  "type": "n8n-nodes-base.retry",
  "typeVersion": 1,
  "parameters": {
    "maxAttempts": 3,
    "delay": "={{ Math.pow(2, $json.attempt) * 1000 }}"
  }
}
```
HTTP request node (turn off its own retry — note that retryOnFail is a node‑level setting, not a parameter):

```json
{
  "name": "HTTP Request",
  "type": "n8n-nodes-base.httpRequest",
  "typeVersion": 1,
  "retryOnFail": false,
  "parameters": {
    "url": "https://api.example.com/data",
    "method": "GET"
  }
}
```
Connection – Wire Retry HTTP → HTTP Request.
Result: Exponential back‑off (1 s → 2 s → 4 s) with a hard limit of three attempts, preventing runaway queues.
2.2 Global Retry Overrides (n8n.config.js)
```javascript
module.exports = {
  workflow: {
    defaultRetry: {
      maxAttempts: 2,
      delay: 2000 // fixed 2 s between attempts
    },
  },
};
```
Tip – Test this change in a staging environment; it affects every workflow lacking an explicit retry configuration.
3. Circuit‑Breaker Pattern
A circuit breaker stops calls to a flaky service after a failure threshold, then pauses before allowing new attempts.
3.1 Function Node – Setup (Redis client & constants)
```javascript
// Requires NODE_FUNCTION_ALLOW_EXTERNAL=redis on the n8n instance
const redis = require('redis').createClient();
await redis.connect(); // node-redis v4+ connects explicitly

const key = 'circuit:api.example.com';
const maxFails = 5;
const pauseMs = 30000; // 30 s
```
3.2 Retrieve Current State
```javascript
let state = await redis.get(key);
state = state ? JSON.parse(state) : { failCount: 0, lockedUntil: 0 };
```
3.3 Evaluate Circuit & Short‑Circuit if Open
```javascript
if (Date.now() < state.lockedUntil) {
  return [{ json: { error: 'Circuit open, request paused' } }];
}
```
3.4 Update State Based on Outcome
```javascript
if ($json.success) {
  state = { failCount: 0, lockedUntil: 0 };
} else {
  state.failCount += 1;
  if (state.failCount >= maxFails) {
    state.lockedUntil = Date.now() + pauseMs;
    // Notify Slack via an incoming webhook
    await this.helpers.httpRequest({
      method: 'POST',
      url: 'https://hooks.slack.com/...',
      body: { text: '🚨 Circuit breaker opened for api.example.com' },
      json: true,
    });
  }
}

await redis.set(key, JSON.stringify(state));
return [{ json: $json }];
```
Wiring – HTTP Request → Circuit Breaker Function → downstream nodes. Failures raised here are picked up by the workflow’s Error Trigger (section 4), which feeds the error‑handling workflow for metrics.
Note – Redis must be highly available (Sentinel or Cluster) to avoid a single point of failure that could block all traffic.
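The state machine above is easier to unit‑test outside n8n. A minimal in‑memory sketch (class and method names are illustrative; the Redis‑backed version shares the same transitions):

```javascript
// In-memory circuit breaker: opens after maxFails consecutive failures,
// stays open for pauseMs, and resets on the first success.
class CircuitBreaker {
  constructor(maxFails = 5, pauseMs = 30000, now = Date.now) {
    this.maxFails = maxFails;
    this.pauseMs = pauseMs;
    this.now = now;          // injectable clock, handy for testing
    this.failCount = 0;
    this.lockedUntil = 0;
  }

  // True while the pause window is active: short-circuit the call
  isOpen() {
    return this.now() < this.lockedUntil;
  }

  recordSuccess() {
    this.failCount = 0;
    this.lockedUntil = 0;
  }

  recordFailure() {
    this.failCount += 1;
    if (this.failCount >= this.maxFails) {
      this.lockedUntil = this.now() + this.pauseMs;
    }
  }
}
```

Injecting the clock lets you verify the open/close transitions deterministically instead of sleeping for 30 s in a test.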
4. Dedicated Error Workflows
Isolate heavy logging, alerting, and optional re‑queue logic from the main data path.
4.1 Error Trigger Node
```json
{
  "name": "Error Trigger",
  "type": "n8n-nodes-base.errorTrigger",
  "typeVersion": 1
}
```
4.2 Log to Elasticsearch
```json
{
  "name": "Log to Elasticsearch",
  "type": "n8n-nodes-base.elasticsearch",
  "typeVersion": 1,
  "parameters": {
    "operation": "index",
    "index": "n8n-errors",
    "document": "={{ $json }}"
  }
}
```
4.3 Slack Alert Node
```json
{
  "name": "Slack Alert",
  "type": "n8n-nodes-base.slack",
  "typeVersion": 1,
  "parameters": {
    "channel": "#n8n-alerts",
    "text": "=❗️ n8n error in workflow {{ $workflow.name }}: {{ $json.message }}"
  }
}
```
4.4 Connections
```json
{
  "connections": {
    "Error Trigger": {
      "main": [
        [
          { "node": "Log to Elasticsearch", "type": "main", "index": 0 },
          { "node": "Slack Alert", "type": "main", "index": 0 }
        ]
      ]
    }
  }
}
```
Hook in the main workflow – Add an Execute Workflow node, enable Run on Error, and point to the error workflow above. Keep the error workflow lightweight; defer heavy processing to a batch job or separate queue.
5. Rate Limiting & Concurrency Controls
5.1 Throttle Node (rate‑limit)
```json
{
  "name": "Throttle API Calls",
  "type": "n8n-nodes-base.throttle",
  "typeVersion": 1,
  "parameters": {
    "mode": "rate",
    "rateLimit": 10,
    "burst": 20
  }
}
```
Place this node before the HTTP Request node.
5.2 maxConcurrent on HTTP Request
Set in the node’s Options tab, e.g., maxConcurrent = 8.
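The rate/burst semantics above follow the classic token‑bucket pattern. A standalone sketch (illustrative only; inside n8n the Throttle node does this for you), assuming 10 req/s sustained with bursts up to 20:

```javascript
// Token bucket: refills at ratePerSec, holds at most `burst` tokens.
// Each request consumes one token; an empty bucket means "wait/queue".
class TokenBucket {
  constructor(ratePerSec = 10, burst = 20, now = Date.now) {
    this.ratePerSec = ratePerSec;
    this.capacity = burst;
    this.tokens = burst;   // start full: allow an initial burst
    this.now = now;        // injectable clock for testing
    this.last = now();
  }

  tryAcquire() {
    const t = this.now();
    const elapsedSec = (t - this.last) / 1000;
    this.last = t;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.ratePerSec);
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;   // under the limit: let the request through
    }
    return false;    // over the limit: caller should back off
  }
}
```

With these numbers, a quiet period lets up to 20 requests through at once, after which throughput settles at 10 per second.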
6. Performance Checklist & Tuning
| Checklist Item | Recommended Setting |
|---|---|
| Disable per‑node retryOnFail where not needed | false |
| Use Retry node with exponential back‑off | maxAttempts ≤ 3, delay = 2^attempt * 1000 ms |
| Implement circuit breaker | failThreshold = 5, pause = 30 s |
| Route errors to a dedicated error workflow | Execute Workflow → Run on Error |
| Apply Throttle or maxConcurrent | maxConcurrent = 8, rateLimit = 10 req/s |
| Enable Prometheus metrics (n8n_execution_queue_length) | n8n_metrics_enabled: true |
| Store circuit‑breaker state in a resilient cache (Redis HA) | Redis Sentinel / Cluster |
Warning – Over‑throttling can increase latency for time‑critical pipelines. After each change, benchmark latency vs. failure rate.
7. Real‑World Troubleshooting Scenarios
| Symptom | Likely Cause | Fix |
|---|---|---|
| Queue length climbs, CPU ≈ 90 % | Global node retries set to 5+ with immediate back‑off | Reduce maxAttempts, enable exponential back‑off |
| Same external API error repeats every minute | No circuit breaker, service down | Add circuit‑breaker Function node, set pause ≥ 30 s |
| Slack alerts flood with duplicate messages | Error workflow re‑tries itself | Set Continue On Fail for alert nodes, add deduplication key |
| Redis connection timeout blocks all requests | Single Redis instance, no failover | Deploy Redis Sentinel or switch to n8n’s built‑in “Workflow Data Store” for low‑volume use |
Conclusion
By tightening retry policies, adding exponential back‑off, and protecting flaky services with a circuit breaker, you stop runaway retry storms that choke the n8n queue. Routing failures to a lightweight, dedicated error workflow isolates heavy logging and alerting, while rate limiting and concurrency caps keep upstream APIs from being overwhelmed. Together these patterns deliver a resilient, production‑ready n8n deployment that maintains low CPU usage, predictable latency, and reliable throughput.



