
Step‑by‑Step Guide: Fixing n8n Workflow Failures Caused by Redis
Who this is for: n8n developers and DevOps engineers who run production‑grade workflows backed by Redis and need a reliable strategy to survive Redis outages. For a complete overview of Redis usage, errors, performance tuning, and scaling in n8n, check out our detailed guide on Redis for n8n Workflows.
Quick Diagnosis
Recover a stopped n8n workflow in three steps:
| Step | Action | n8n node / setting |
|---|---|---|
| Detect | Catch Redis‑related errors with an Error Trigger. | Error Trigger → If (filter error.message contains "Redis"). |
| Retry | Run a Retry Loop with exponential back‑off (max 5 attempts). | Function → await this.helpers.wait(2 ** $index * 1000); → Execute Workflow. |
| Fallback | Persist the payload to a durable store (PostgreSQL, S3) before re‑queueing. | Postgres / S3 node in the error branch. |
Add a Cron‑based watchdog (every 5 min) that re‑runs any “stuck” executions logged in the fallback table.
1. Why Redis‑Related Failures Stop a Workflow
When n8n runs in queue mode, Redis serves as its queue backend (and often doubles as a cache). When the Redis client throws an exception (e.g., connection loss, max‑memory limit, corrupted data), the execution engine aborts the current run and marks it failed. Because the execution state lives in Redis, the engine cannot resume without a healthy connection.
Typical Redis errors that trigger a failure
| Redis error | n8n symptom |
|---|---|
| ECONNREFUSED | Immediate abort, no retries (Error: connect ECONNREFUSED 127.0.0.1:6379). |
| ETIMEDOUT | Step hangs, then fails (Error: Redis connection timed out). |
| OOM command not allowed | Write attempts rejected (Error: OOM command not allowed when used memory > 'maxmemory'). |
| WRONGTYPE | Data‑type mismatch (Error: WRONGTYPE Operation against a key holding the wrong kind of value). |
2. Diagnosing the Root Cause
Before adding recovery logic, decide whether the failure is transient (network glitch) or systemic (mis‑configuration, memory pressure).
Diagnostic checklist
| Tool | Command / UI | What to look for |
|---|---|---|
| redis-cli ping | redis-cli -h <host> -p <port> ping | PONG → reachable; otherwise network error. |
| Redis logs | /var/log/redis/redis-server.log | Repeated OOM or MAXMEMORY warnings. |
| n8n Execution Log | UI → Executions → Failed → Details | Full stack trace, error code. |
| Prometheus / Grafana | redis_up{instance="redis:6379"} == 0 | Alert on downtime. |
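The transient‑vs‑systemic decision can also be made programmatically in a Code node before deciding whether to retry. A minimal sketch — the helper name `classifyRedisError` and the pattern lists are assumptions; extend them for your setup:

```javascript
// Hypothetical helper: classify a Redis error message as transient
// (worth retrying) or systemic (needs operator attention).
function classifyRedisError(message) {
  const systemic = [/OOM command not allowed/, /WRONGTYPE/, /NOAUTH/];
  const transient = [/ECONNREFUSED/, /ETIMEDOUT/, /Connection is closed/];
  if (systemic.some((re) => re.test(message))) return "systemic";
  if (transient.some((re) => re.test(message))) return "transient";
  return "unknown";
}

console.log(classifyRedisError("Error: connect ECONNREFUSED 127.0.0.1:6379")); // transient
```

Only "transient" results should feed the retry loop in § 3.2; "systemic" errors should go straight to fallback persistence and alerting.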
3. Building a Robust Error‑Handling Branch
3.1 Add an Error Trigger
Insert an *Error Trigger* node that fires on any workflow failure.
```json
{
  "name": "Error Trigger",
  "type": "n8n-nodes-base.errorTrigger",
  "typeVersion": 1,
  "position": [250, 300]
}
```
Filter for Redis errors
Place an *If* node after the trigger to keep only Redis‑related messages.
```json
{
  "name": "Redis Error Filter",
  "type": "n8n-nodes-base.if",
  "typeVersion": 1,
  "parameters": {
    "conditions": {
      "string": [
        {
          "value1": "{{$json[\"execution\"][\"error\"][\"message\"]}}",
          "operation": "contains",
          "value2": "Redis"
        }
      ]
    }
  },
  "position": [450, 300]
}
```

Note that the Error Trigger nests the failure details under `execution`, so the message lives at `$json["execution"]["error"]["message"]`.
3.2 Exponential Back‑off Retry Loop
| Attempt | Wait (ms) | Formula |
|---|---|---|
| 1 | 1 000 | 2^0 * 1000 |
| 2 | 2 000 | 2^1 * 1000 |
| 3 | 4 000 | 2^2 * 1000 |
| 4 | 8 000 | 2^3 * 1000 |
| 5 | 16 000 | 2^4 * 1000 |
**Function node – calculate wait time and pass retry metadata**
```javascript
// Wait 2^attempt seconds, then pass retry metadata downstream
await this.helpers.wait(Math.pow(2, $index) * 1000);
return {
  json: {
    retryCount: $index + 1,
    originalPayload: $json,
  },
};
```
Connect the **Function** → **Execute Workflow** (same workflow ID) → **If** (max attempts reached).
**Note** – never set `maxAttempts` to `Infinity`. An unbounded retry loop will saturate Redis the moment it becomes available again.
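Outside n8n, the same policy can be expressed in plain JavaScript — `retryWithBackoff` and `backoffMs` are illustrative names, not n8n helpers, shown here only to make the back‑off schedule concrete:

```javascript
// Delay schedule from the table above: 2^attempt * 1000 ms.
const backoffMs = (attempt) => Math.pow(2, attempt) * 1000;

// Run `task` (any async function) with a bounded retry budget.
// The budget is finite on purpose: never retry forever.
async function retryWithBackoff(task, maxAttempts = 5) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await task();
    } catch (err) {
      if (attempt === maxAttempts - 1) throw err; // budget exhausted: re-throw
      await new Promise((resolve) => setTimeout(resolve, backoffMs(attempt)));
    }
  }
}
```

With `maxAttempts = 5`, the worst case waits 1 + 2 + 4 + 8 s between attempts (~15 s total) before giving up and falling through to persistence.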
3.3 Fallback Persistence
When the retry limit is exceeded, store the payload in a durable medium before giving up.
```json
{
  "name": "Postgres Fallback",
  "type": "n8n-nodes-base.postgres",
  "typeVersion": 1,
  "parameters": {
    "operation": "insert",
    "table": "n8n_redis_fallback",
    "columns": [
      { "name": "executionId", "value": "{{$json[\"executionId\"]}}" },
      { "name": "payload", "value": "{{$json[\"originalPayload\"]}}" },
      { "name": "failedAt", "value": "{{$now}}" }
    ]
  },
  "position": [850, 300]
}
```
Why this works – The fallback table lives outside Redis, so even a prolonged outage leaves your data intact and ready for later replay.
4. Automated Recovery of Stuck Executions
4.1 Scheduler (Cron node)
```json
{
  "name": "Recovery Scheduler",
  "type": "n8n-nodes-base.cron",
  "typeVersion": 1,
  "parameters": { "cronExpression": "*/5 * * * *" },
  "position": [250, 200]
}
```
4.2 Fetch pending fallback rows
```json
{
  "name": "Fetch Stuck Executions",
  "type": "n8n-nodes-base.postgres",
  "typeVersion": 1,
  "parameters": {
    "operation": "select",
    "sql": "SELECT * FROM n8n_redis_fallback WHERE processed = false"
  },
  "position": [450, 200]
}
```
4.3 Replay the original workflow
```json
{
  "name": "Replay Workflow",
  "type": "n8n-nodes-base.executeWorkflow",
  "typeVersion": 1,
  "parameters": {
    "workflowId": "",
    "inputData": "={{$json}}"
  },
  "position": [650, 200]
}
```
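Before the row reaches Execute Workflow, the payload stored as a string in § 3.3 has to be deserialised back into an n8n item. A hypothetical mapping — `toReplayInput` and the `replayedFrom` field are illustrative names, not built‑ins:

```javascript
// Hypothetical: turn a fallback row (as selected from Postgres)
// back into the input item the replayed workflow expects.
function toReplayInput(row) {
  return {
    json: {
      executionId: row.executionId,
      ...JSON.parse(row.payload), // restore the original payload fields
      replayedFrom: row.id,       // assumed field for audit correlation
    },
  };
}
```

Carrying the fallback row's `id` through as `replayedFrom` makes it easy to find which stored row produced any given replayed execution.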
4.4 Mark the row as processed
```json
{
  "name": "Mark Processed",
  "type": "n8n-nodes-base.postgres",
  "typeVersion": 1,
  "parameters": {
    "operation": "update",
    "sql": "UPDATE n8n_redis_fallback SET processed = true WHERE id = {{$json[\"id\"]}}"
  },
  "position": [850, 200]
}
```
Warning – Ensure the replayed workflow is idempotent. Guard any side‑effects (e.g., email sending) with a check that a “sent” flag exists in your database before repeating the action.
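The "sent flag" guard can be sketched as follows. Here an in‑memory `Set` stands in for your database flag, and `sendOnce` is an illustrative name — in production the lookup and the flag write should hit the same durable store:

```javascript
// Minimal idempotency guard: a side-effect keyed by a unique id
// runs at most once, no matter how often the workflow is replayed.
const sentKeys = new Set(); // stand-in for a "sent" column in your DB

async function sendOnce(key, sendFn) {
  if (sentKeys.has(key)) return false; // already sent: skip the side-effect
  await sendFn();
  sentKeys.add(key); // mark only after the side-effect succeeded
  return true;
}
```

Marking the key only after `sendFn` succeeds means a crash mid‑send leads to a duplicate attempt rather than a silently dropped one, which is usually the safer failure mode.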
5. Monitoring & Alerting
| Metric | Recommended threshold | Alert action |
|---|---|---|
| redis_up (Prometheus) | 0 for > 30 s | Slack / PagerDuty “Redis down”. |
| n8n_execution_failed_total{error=~".*Redis.*"} | > 5/min | Trigger the Recovery Scheduler immediately. |
| n8n_fallback_queue_length | > 100 | Investigate memory pressure or scale Redis. |
Add a Grafana panel visualising n8n_fallback_queue_length to spot growing backlogs before they become critical.
6. Quick Diagnostic Checklist (Copy‑Paste)
- [ ] Ping Redis (`redis-cli ping`) → does it return PONG?
- [ ] Review Redis logs for OOM or maxmemory warnings.
- [ ] Verify n8n env vars: REDIS_HOST, REDIS_PORT, REDIS_PASSWORD.
- [ ] Test network connectivity (`telnet <host> <port>`).
- [ ] Inspect the n8n Execution Log → does the error message contain “Redis”?
- [ ] Run a minimal workflow that only writes a key to Redis.
- [ ] If transient, enable the retry loop (max 5 attempts).
- [ ] If persistent, enable fallback persistence (Postgres/S3).
- [ ] Deploy the Cron recovery workflow and monitor fallback table size.
7. Production‑Grade Best Practices
- Separate Redis instances – use one for cache, another for the n8n queue, so cache churn cannot starve the job queue.
- Set `maxmemory-policy` to `volatile-lru` on the queue DB so only expiring keys are evicted.
- Enable `client-output-buffer-limit` to stop a single stalled worker from exhausting server memory.
- Run n8n in Docker Swarm / Kubernetes with a readiness probe that runs `redis-cli ping`. A failed probe restarts the pod, avoiding half‑started executions.
- Log every fallback entry with a UUID to correlate it with the original execution in your audit trail.
Next Steps
- Implement the error‑handling branch in a staging environment.
- Simulate Redis downtime (`docker stop redis`) and verify that the retry loop and fallback persist correctly.
- Once stable, promote to production and enable the monitoring alerts described in § 5.
All JSON snippets are ready‑to‑paste into the n8n UI (JSON import) or a Code node. Adjust connection credentials, workflow IDs, and table names to match your environment.



