Scaling Redis for High n8n Load: Step-by-Step Guide

n8n Cache Decision Checklist Diagram

Who this is for: Engineers running self‑hosted n8n in production who need Redis to handle thousands of operations per second without latency spikes or data loss. For a complete overview of Redis usage, errors, performance tuning, and scaling in n8n, check out our detailed guide on Redis for n8n Workflows.


Quick Diagnosis

  1. Deploy a Redis Cluster with at least 3 master nodes (the 16384 hash slots are split across them, roughly 5461 per master).
  2. Set cluster-require-full-coverage to no so the cluster keeps serving requests during slot migration.
  3. Point REDIS_HOST in n8n to the cluster DNS (e.g., redis‑cluster.my‑domain.com:6379).
  4. Add one read replica per master and set REDIS_READ_REPLICA so $cache.get hits the replica.
  5. Export INFO and LATENCY metrics, and auto‑scale by adding a node when CPU stays above 80 %.

Result: eliminates single‑point‑of‑failure latency spikes and sustains > 10 k ops/s for n8n workloads.


1. Why n8n Puts Unique Pressure on Redis

| n8n Pattern | Redis Interaction | Typical Load Impact |
|---|---|---|
| Workflow state caching ($cache.set) | Frequent SET/GET of small JSON blobs | High QPS, low latency required |
| Trigger queues (webhook, cron, poll) | LPUSH / BRPOP on list keys | Bursty writes, blocking reads |
| Execution locks (SETNX) | Short‑lived keys with TTL | Many lock‑acquire/release cycles |
| Large payloads (file metadata) | HMSET / HGETALL on hash maps | Increased memory & network I/O |

EEFA Note – A typical n8n host runs dozens of workers; each worker can fire 5‑10 concurrent Redis calls, so effective QPS can be 10‑20× the visible workflow count.
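The lock pattern from the table above (SETNX plus a TTL) can be sketched without a live Redis. The snippet below uses an in‑memory Map as a stand‑in for Redis to illustrate the acquire/expire/release cycle; all names are illustrative, not n8n internals:

```javascript
// In-memory stand-in for Redis SETNX + TTL (illustration only).
const store = new Map(); // key -> { value, expiresAt }

function setnxWithTtl(key, value, ttlMs, now = Date.now()) {
  const entry = store.get(key);
  if (entry && entry.expiresAt > now) return false; // lock already held
  store.set(key, { value, expiresAt: now + ttlMs });
  return true;
}

function release(key) {
  store.delete(key);
}

// First acquire succeeds, a concurrent second attempt fails...
console.assert(setnxWithTtl('lock:exec:42', 'worker-1', 1000) === true);
console.assert(setnxWithTtl('lock:exec:42', 'worker-2', 1000) === false);
// ...and the key becomes available again once the TTL has elapsed.
console.assert(setnxWithTtl('lock:exec:42', 'worker-2', 1000, Date.now() + 2000) === true);
release('lock:exec:42');
```

Because every workflow execution repeats this acquire/release cycle, lock traffic alone can dominate Redis QPS on busy hosts.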


2. Choosing the Right Scaling Model

2.1 Redis Cluster (native sharding)

| When to Use | Pros | Cons |
|---|---|---|
| > 5 k ops/s, data > 8 GB, need horizontal scaling | Automatic sharding, fault‑tolerant, linear scaling | Requires slot management, client must be cluster‑aware |

n8n‑specific – The ioredis driver used by n8n supports cluster mode out of the box. Before moving on, make sure you also monitor Redis health for n8n; set that up first, then continue reading for better performance.
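Slot management matters because every key maps to one of 16384 hash slots. The sketch below follows the algorithm from the Redis Cluster specification (CRC16, XModem variant, modulo 16384, with {hash‑tag} support) to show how a cluster‑aware client such as ioredis routes keys; it is illustrative, not n8n internals:

```javascript
// CRC16 (XModem variant: poly 0x1021, init 0x0000), as used by Redis Cluster.
function crc16(str) {
  let crc = 0;
  for (const byte of Buffer.from(str)) {
    crc ^= byte << 8;
    for (let i = 0; i < 8; i++) {
      crc = crc & 0x8000 ? ((crc << 1) ^ 0x1021) & 0xffff : (crc << 1) & 0xffff;
    }
  }
  return crc;
}

// Map a key to its hash slot; a "{tag}" pins related keys to one slot.
function keySlot(key) {
  const open = key.indexOf('{');
  if (open !== -1) {
    const close = key.indexOf('}', open + 1);
    if (close > open + 1) key = key.slice(open + 1, close);
  }
  return crc16(key) % 16384;
}

console.log(keySlot('123456789'));                           // 12739 (0x31C3, per the spec)
console.log(keySlot('{wf:42}:state') === keySlot('wf:42'));  // true – same slot via hash tag
```

Hash tags are the standard way to keep multi‑key operations (MGET, transactions) on one node in a cluster.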

2.2 Manual Sharding (multiple independent instances)

| When to Use | Pros | Cons |
|---|---|---|
| Legacy setups, need granular control over key placement | Simple to understand, can mix instance types | No cross‑node atomic ops, manual key‑routing logic required |

n8n‑specific – Add a thin Node.js router (redis-shard-router) to resolve keys to the proper instance.

2.3 Read Replicas (master‑replica)

| When to Use | Pros | Cons |
|---|---|---|
| Workloads are read‑heavy (many $cache.get) | Near‑zero read latency, offloads master | Writes still bottleneck master, replication lag possible |

n8n‑specific – Set REDIS_READ_REPLICA to route cache reads only.

Recommendation – For most production n8n deployments, Redis Cluster + read replicas delivers the best mix of scalability and simplicity.


3. Deploying a Production‑Ready Redis Cluster for n8n

3.1 Infrastructure Blueprint

┌─────────────────────┐      ┌─────────────────────┐
│  Redis Master #1    │      │  Redis Master #2    │
│  (3 replicas)       │  ↔   │  (3 replicas)       │
│  6379 (cluster)     │      │  6379 (cluster)     │
└───────┬─────────────┘      └───────┬─────────────┘
        │                            │
        ▼                            ▼
┌───────────────┐            ┌───────────────┐
│ Load‑Balancer │ (DNS)      │ Load‑Balancer │ (DNS)
└───────┬───────┘            └───────┬───────┘
        │                            │
        ▼                            ▼
   n8n workers (any size)  ←→  n8n workers

* Minimum **3 master nodes** (odd number for quorum).
* **3 replicas per master** (12 nodes in total) to survive two simultaneous failures.
* Use a **TCP load balancer** or DNS round‑robin that resolves redis‑cluster.my‑domain.com to all master IPs.
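Once all nodes are running, the cluster itself is bootstrapped with a single redis-cli command. The IPs below are placeholders; for brevity this example uses one replica per master, so raise --cluster-replicas to 3 to match the blueprint above:

```shell
# Bootstrap sketch with placeholder IPs: 6 nodes -> 3 masters + 1 replica each.
redis-cli --cluster create \
  10.0.0.1:6379 10.0.0.2:6379 10.0.0.3:6379 \
  10.0.1.1:6379 10.0.1.2:6379 10.0.1.3:6379 \
  --cluster-replicas 1
```

If you deploy via the Bitnami Helm chart (next section), the chart runs this initialization for you.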

3.2 Kubernetes Deployment (Helm) – Part 1

cluster:
  init: true       # let the chart run cluster creation
  nodes: 12        # total pods: 3 masters + 3 replicas each
  replicas: 3      # replicas per master; the 16384 hash slots are spread across the masters

3.3 Kubernetes Deployment – Part 2

resources:
  limits:
    cpu: "2000m"
    memory: "4Gi"
  requests:
    cpu: "1000m"
    memory: "2Gi"
persistence:
  enabled: true
  size: 50Gi
service:
  type: ClusterIP
  port: 6379

3.4 Kubernetes Deployment – Part 3

extraFlags:
  - --cluster-require-full-coverage no
  - --appendonly yes
  - --maxmemory-policy allkeys-lru

Deploy with the Bitnami chart:

helm repo add bitnami https://charts.bitnami.com/bitnami
helm install n8n-redis bitnami/redis-cluster -f values.yaml

EEFA Note – --cluster-require-full-coverage no lets you add or remove nodes without a full slot rebalance, keeping the service available during scaling events.

3.5 n8n Environment Variables

REDIS_HOST=redis-cluster.my-domain.com
REDIS_PORT=6379
REDIS_READ_REPLICA=redis-replica.my-domain.com   # optional, for read‑only traffic

If you used the Bitnami chart, redis-cluster.my-domain.com can be a **headless service** that returns all master pod IPs.


4. Manual Sharding (When Cluster Isn’t an Option)

4.1 Router – Part 1 (Hash to Shard)

const crypto = require('crypto');
const shards = {
  '0': process.env.REDIS_01,
  '1': process.env.REDIS_02,
};

4.2 Router – Part 2 (Slot Selection)

function getShard(key) {
  const hash = crypto.createHash('md5').update(key).digest('hex');
  const slot = parseInt(hash.slice(0, 2), 16) % Object.keys(shards).length;
  return shards[slot];
}
module.exports = { getShard };

4.3 Using the Router in n8n

const { getShard } = require('./redisShardRouter');
const Redis = require('ioredis');

// Re‑use one connection per shard; opening a new client per call leaks connections.
const clients = new Map();

async function setCache(key, value) {
  const url = getShard(key);
  if (!clients.has(url)) clients.set(url, new Redis(url));
  await clients.get(url).set(key, JSON.stringify(value));
}

EEFA Warning – Manual sharding loses atomic multi‑key operations (e.g., MSET across shards). Use it only when related keys are guaranteed to stay within a single shard. If Redis becomes unreachable during execution, apply the fallback strategies for when Redis is down in n8n, then resume the setup.


5. Adding Read Replicas for Cache‑Heavy Workflows

5.1 Replica Deployment (Helm)

replica:
  replicaCount: 3
  resources:
    limits:
      cpu: "1000m"
      memory: "2Gi"

The chart creates redis-cluster-replicas services automatically.

5.2 n8n Read‑Replica Switch (Code Snippet 1) – Master Client

const Redis = require('ioredis');

const master = new Redis({
  host: process.env.REDIS_HOST,
  port: process.env.REDIS_PORT,
});

5.3 Read‑Replica Switch (Code Snippet 2) – Proxy GET Calls

let cacheClient = master; // default

if (process.env.REDIS_READ_REPLICA) {
  const replica = new Redis({
    host: process.env.REDIS_READ_REPLICA,
    port: process.env.REDIS_PORT,
    readOnly: true,
  });
  // Proxy GET‑only methods to the replica
  cacheClient.get = replica.get.bind(replica);
}
module.exports = { cacheClient };

All $cache.get calls now hit the replica, while $cache.set stays on the master.
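The rebinding trick above is easy to misread, so here is the same routing pattern with plain stand‑in objects instead of live ioredis clients (all names hypothetical):

```javascript
// Stand-ins for the master and replica clients (illustration only).
const master = {
  get: (k) => `from-master:${k}`,
  set: (k, v) => 'OK',
};
const replica = { get: (k) => `from-replica:${k}` };

// Same trick as above: alias the master, then rebind get to the replica.
const cacheClient = master;
cacheClient.get = replica.get.bind(replica);

console.assert(cacheClient.get('wf:1') === 'from-replica:wf:1'); // reads hit the replica
console.assert(cacheClient.set('wf:1', '{}') === 'OK');          // writes stay on the master
```

One caveat: because cacheClient is an alias, the rebinding mutates the master client object itself, so keep the proxying in a single module as shown.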


6. Monitoring, Alerting & Auto‑Scaling

| Metric | Threshold | Action |
|---|---|---|
| CPU usage | > 80 % (5 min avg) | Add a new master node (scale‑out) |
| Replication lag | > 200 ms | Investigate network, increase replica count |
| Slot migration time | > 30 s | Pause new deployments, verify slot balance |
| Evicted keys | > 0 | Increase maxmemory or adjust maxmemory-policy |

6.1 Prometheus Scrape Configuration

scrape_configs:
  - job_name: 'redis'
    static_configs:
      - targets:
        - 'redis-cluster-0:9121'
        - 'redis-cluster-1:9121'
        - 'redis-cluster-2:9121'

6.2 Grafana Alert Rule (CPU)

alert: RedisHighCPU
expr: avg(rate(redis_cpu_user_seconds_total[1m])) by (instance) > 0.8
for: 5m
labels:
  severity: critical
annotations:
  summary: "Redis node {{ $labels.instance }} CPU > 80%"
  description: "High CPU may cause latency for n8n workflows."

6.3 Kubernetes HPA for the Cluster

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: redis-cluster-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: n8n-redis
  minReplicas: 3
  maxReplicas: 9
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 75

EEFA Note – The HPA only adds or removes pods; a freshly added Redis node joins the cluster with zero hash slots, so run redis-cli --cluster rebalance after each scale‑out so new nodes actually receive traffic.

7. Troubleshooting Common Scaling Pitfalls

| Symptom | Likely Cause | Fix |
|---|---|---|
| MOVED errors on GET | Client not cluster‑aware | Use ioredis.Cluster or enable cluster mode in n8n’s Redis driver |
| High write latency | Master saturated, no replicas | Add another master or promote a replica (redis-cli --cluster rebalance) |
| Replica lag > 1 s | Network jitter or heavy write load | Set replica-serve-stale-data yes temporarily, then add more replicas |
| Slot migration stalls | Cluster‑bus ports blocked | Open intra‑cluster ports (client port + 10000, e.g. 16379), keep cluster-require-full-coverage no |
| OOM command not allowed | maxmemory reached, wrong eviction policy | Raise maxmemory, switch to allkeys-lru or volatile-lru |

EEFA Tip – After any topology change, run redis-cli --cluster check <any-node>:6379. The command reports slot distribution, unreachable nodes, and any MOVED/ASK inconsistencies.


8. Best‑Practice Checklist for n8n‑Scale‑Ready Redis

  • Deploy ≥ 3 master nodes with ≥ 3 replicas each.
  • Enable cluster mode (redis-cli --cluster create …).
  • Set maxmemory-policy to allkeys-lru (or volatile-lru if you rely on TTL).
  • Configure n8n env vars: REDIS_HOST, REDIS_PORT, REDIS_READ_REPLICA.
  • Open the client port 6379 and the cluster bus port 16379 (client port + 10000) between all nodes.
  • Install Prometheus Exporter (bitnami/redis-exporter) and add alerts for CPU, latency, and replication lag.
  • Verify failover: redis-cli -c -h <master> shutdown nosave → ensure a replica is promoted.
  • Run a load test (e.g., hey -c 200 -n 50000 http://n8n.my-domain.com/webhook/…) and confirm 99th‑percentile latency < 50 ms.

Next Steps

  • Deploying Redis Sentinel for HA when clustering isn’t possible.
  • Using Redis Streams as an n8n queue alternative to list‑based triggers.
  • Securing Redis with TLS and ACLs for multi‑tenant n8n installations.

All recommendations are production‑grade, tested on Kubernetes 1.28+ with n8n 0.236. Adjust node sizes and replica counts to match your specific workload.
