
Who this is for: Ops engineers and platform teams running n8n in production who need a reliable, low‑latency automation pipeline.
Quick Diagnosis: Is Your n8n Instance Slowing Down?
| Symptom | Immediate Check | One‑Line Fix |
|---|---|---|
| Workflow latency spikes after a few minutes | CPU > 80 % or memory > 75 % in `docker stats` / `top` | Scale the execution workers (`EXECUTIONS_PROCESS=main,queue`) or add a Redis queue to off‑load jobs. |
| “Too many connections” errors from PostgreSQL | `SELECT count(*) FROM pg_stat_activity;` exceeds `max_connections` | Raise `POSTGRESQL_MAX_CONNECTIONS` or enable connection pooling (pgbouncer). |
| “Queue is full” in logs (BullMQ warnings) | `redis-cli LLEN n8n:queue` > 10 000 | Increase `QUEUE_BULL_MAX_JOBS` or split workers across multiple containers. |
| Memory keeps growing even after workflows finish | `docker exec <container> node -e "console.log(process.memoryUsage())"` shows a steady rise | Switch from SQLite to PostgreSQL + Redis and enable `EXECUTIONS_MODE=queue`. |
Bottom‑line: If the first 5‑10 minutes are snappy but latency climbs thereafter, the culprit is usually resource saturation (CPU, memory, DB connections) or missing queue infrastructure. Apply the appropriate fix from the sections below and monitor the metrics for 10 min to confirm the trend reverses.
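The decision logic in the table above can be captured in a small shell helper. This is a sketch: the function name `triage` is ours, and the thresholds (80 % CPU, 75 % memory, 10 000 queued jobs) are this guide's suggested values, not n8n defaults.

```shell
#!/usr/bin/env bash
# triage: map observed metrics to the likely bottleneck from the table above.
# Usage: triage <cpu_pct> <mem_pct> <queue_depth>
triage() {
  local cpu=$1 mem=$2 queue=$3
  if [ "$cpu" -gt 80 ] || [ "$mem" -gt 75 ]; then
    echo "resource saturation: scale workers or enable queue mode"
  elif [ "$queue" -gt 10000 ]; then
    echo "queue backlog: raise QUEUE_BULL_MAX_JOBS or add worker containers"
  else
    echo "metrics within thresholds"
  fi
}

triage 85 40 100     # high CPU -> resource saturation
triage 30 40 15000   # deep queue -> queue backlog
```

Feed it the numbers you read from `docker stats` and `redis-cli LLEN n8n:queue`; anything it flags maps to a fix in the table.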
1. n8n’s Execution Model — Why It Matters Under Load
If n8n becomes unstable after high‑volume runs, diagnose and resolve that instability before continuing with the setup below.
| Component | Role | Default Production Setting |
|---|---|---|
| Workflow Engine | Parses and executes nodes | EXECUTIONS_PROCESS=main (single‑process) |
| Database | Stores workflow definitions, execution data | SQLite (DB_TYPE=sqlite) |
| Queue (BullMQ) | Optional job queue for async execution | Disabled by default |
| Redis | Back‑end for BullMQ & cache | Not required |
EEFA Note: Running n8n with SQLite and no queue is acceptable only for dev or low‑traffic demos. Production workloads should always use PostgreSQL (or MySQL) and Redis to guarantee isolation between workflow executions.
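One way to express that production baseline is a minimal environment fragment. This is a sketch using the variable names from this guide's tables; the Redis host is a placeholder matching the compose example later in the article.

```shell
# production-baseline.env — source this or pass it via --env-file
export DB_TYPE=postgresdb              # PostgreSQL instead of SQLite
export EXECUTIONS_MODE=queue           # push executions through BullMQ
export EXECUTIONS_PROCESS=main,queue   # API server plus queue workers
export QUEUE_BULL_REDIS_HOST=redis     # Redis instance backing the queue
export QUEUE_BULL_REDIS_PORT=6379
```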
2. Common Bottlenecks That Appear Only Under Continuous Load
If n8n freezes under load but doesn’t crash, resolve that before continuing with the setup below.
| Bottleneck | Symptom | Root Cause | Fix |
|---|---|---|---|
| DB connection exhaustion | “Error: too many clients” from PostgreSQL | `max_connections` too low, or n8n opening a new connection per workflow | Use a connection pool (pgbouncer) or increase `POSTGRESQL_MAX_CONNECTIONS`. |
| Event‑loop blocking | CPU spikes, rising latency, “blocked for X ms” in logs | Heavy JavaScript (e.g., large JSON transforms) running in the main process | Move to queue mode (`EXECUTIONS_MODE=queue`) and spin up multiple workers (`EXECUTIONS_PROCESS=main,queue`). |
| Redis queue saturation | BullMQ warnings: “Job stalled” | Queue length > `QUEUE_BULL_MAX_JOBS`, or the Redis memory limit is reached | Raise `QUEUE_BULL_MAX_JOBS`, enable the Redis eviction policy `volatile-lru`, or shard queues across multiple Redis instances. |
| Large payloads | Memory climbs, OOM kills | Nodes that fetch big files keep the data in RAM | Stream data (`node-fetch` with `stream: true`), and limit payload size via the `MAX_PAYLOAD_SIZE` env var. |
| Memory leaks in custom code | Memory never releases after a workflow finishes | Custom JavaScript nodes retaining references | Refactor code to avoid long‑lived closures; in dev, run Node with `--expose-gc` and call `global.gc()`. |
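The Redis‑saturation row reduces to comparing queue depth against the configured ceiling. Here is a sketch (the function name `queue_check` and the 80 % warning margin are ours; in production you would feed it the depth from `redis-cli LLEN n8n:queue`):

```shell
# queue_check: warn when the BullMQ queue approaches its configured ceiling.
# Usage: queue_check <depth> [max_jobs]
queue_check() {
  local depth=$1 max=${2:-20000}
  # Warn at 80% of the ceiling so back-pressure kicks in before jobs stall.
  if [ $((depth * 100)) -ge $((max * 80)) ]; then
    echo "WARN: queue at ${depth}/${max}; scale workers or raise QUEUE_BULL_MAX_JOBS"
    return 1
  fi
  echo "OK: queue at ${depth}/${max}"
}

queue_check 18000 || true   # near the 20000 ceiling -> warns
queue_check 500 20000       # healthy depth
```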
3. Step‑by‑Step Diagnostic Checklist
| Diagnostic Step | Command / Action | Expected Result |
|---|---|---|
| 1. Verify execution mode | `docker exec n8n printenv EXECUTIONS_MODE` | `queue` (recommended) or `main` |
| 2. Check worker count | `docker exec n8n printenv EXECUTIONS_PROCESS` | `main,queue` (at least two workers) |
| 3. Inspect DB health | `psql -U $POSTGRES_USER -c "SELECT count(*) FROM pg_stat_activity;"` | Below `max_connections` (default 100) |
| 4. Monitor Redis queue depth | `redis-cli LLEN n8n:queue` | < 5 000 (adjustable) |
| 5. Profile CPU/memory | `docker stats n8n` (or `htop` inside the container) | CPU < 70 %, memory < 70 % of the limit |
| 6. Look for “blocked” logs | `docker logs n8n \| grep "blocked"` | No recent entries |
| 7. Validate container limits | `docker inspect n8n --format '{{.HostConfig.Memory}}'` | > 2 GB for moderate load |
| 8. Test a high‑frequency workflow | Create a “ping” workflow that runs every 5 s for 10 min | Execution time stays below ~200 ms |
| 9. Review error rates | `docker logs n8n \| grep -i error` | < 1 % of total executions |
| 10. Enable Prometheus metrics | Set the `METRICS=true` env var and scrape `/metrics` | Metrics visible in Grafana |
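Step 7 returns a raw byte count, which is easy to misread. A tiny helper turns it into a pass/fail (the function name `limit_check` is ours, and the 2 GiB floor is this guide's suggestion for moderate load):

```shell
# limit_check: validate the container memory limit reported by
#   docker inspect n8n --format '{{.HostConfig.Memory}}'
# Usage: limit_check <bytes>
limit_check() {
  local bytes=$1 floor=$((2 * 1024 * 1024 * 1024))  # 2 GiB floor
  if [ "$bytes" -eq 0 ]; then
    echo "FAIL: no memory limit set (0 means unlimited)"
  elif [ "$bytes" -lt "$floor" ]; then
    echo "FAIL: limit $((bytes / 1024 / 1024)) MiB is below the 2 GiB floor"
  else
    echo "PASS: limit $((bytes / 1024 / 1024)) MiB"
  fi
}

limit_check 4294967296   # 4 GiB limit
limit_check 0            # unlimited: the OOM killer becomes your limit
```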
4. Optimizing n8n for Sustained Load
4.1 Docker‑Compose Example (PostgreSQL + Redis)
```yaml
version: "3.8"

services:
  # Database configuration
  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: n8n
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
      POSTGRES_DB: n8n
    volumes:
      - db_data:/var/lib/postgresql/data
    restart: unless-stopped

  # Redis with memory limits
  redis:
    image: redis:7-alpine
    command: ["redis-server", "--maxmemory", "2gb", "--maxmemory-policy", "volatile-lru"]
    restart: unless-stopped

  # n8n container with queue and scaling
  n8n:
    image: n8nio/n8n:latest
    ports:
      - "5678:5678"
    environment:
      DB_TYPE: postgresdb
      DB_POSTGRESDB_HOST: db
      DB_POSTGRESDB_PORT: 5432
      DB_POSTGRESDB_DATABASE: n8n
      DB_POSTGRESDB_USER: n8n
      DB_POSTGRESDB_PASSWORD: ${POSTGRES_PASSWORD}
      EXECUTIONS_MODE: queue
      EXECUTIONS_PROCESS: main,queue
      QUEUE_BULL_REDIS_HOST: redis
      QUEUE_BULL_REDIS_PORT: 6379
      WORKER_COUNT: 4          # parallel queue workers
      MAX_PAYLOAD_SIZE: 10mb   # guard against huge payloads
      METRICS: "true"
    depends_on:
      - db
      - redis
    restart: unless-stopped
    deploy:
      resources:
        limits:
          cpus: "2"
          memory: 4g

volumes:
  db_data:
```
EEFA Note: Never run n8n with DB_TYPE=sqlite in a container that restarts automatically; SQLite files can become corrupted during abrupt shutdowns under load.
4.2 Kubernetes Deployment (Helm‑style)
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: n8n
spec:
  replicas: 2   # horizontal scaling
  selector:
    matchLabels:
      app: n8n
  template:
    metadata:
      labels:
        app: n8n
    spec:
      containers:
        - name: n8n
          image: n8nio/n8n:latest
          env:
            - name: DB_TYPE
              value: "postgresdb"
            - name: DB_POSTGRESDB_HOST
              value: "postgres.default.svc.cluster.local"
            - name: EXECUTIONS_MODE
              value: "queue"
            - name: EXECUTIONS_PROCESS
              value: "main,queue"
            - name: QUEUE_BULL_REDIS_HOST
              value: "redis-master.default.svc.cluster.local"
            - name: WORKER_COUNT
              value: "6"
            - name: METRICS
              value: "true"
          resources:
            limits:
              cpu: "2000m"
              memory: "4Gi"
            requests:
              cpu: "500m"
              memory: "1Gi"
          ports:
            - containerPort: 5678
```
4.3 Tuning Individual Environment Variables
| Variable | Recommended Value (Continuous Load) | What It Controls |
|---|---|---|
| EXECUTIONS_MODE | queue | Switches from in‑process to BullMQ queue. |
| EXECUTIONS_PROCESS | main,queue | Starts both the API server and separate queue workers. |
| WORKER_COUNT | 2‑8 (depending on CPU cores) | Number of parallel queue workers. |
| MAX_PAYLOAD_SIZE | 5mb‑20mb | Caps inbound data to protect RAM. |
| QUEUE_BULL_MAX_JOBS | 20000 | Upper bound for queued jobs before back‑pressure. |
| POSTGRESQL_MAX_CONNECTIONS | 200 (or pgbouncer pool) | Prevents “too many connections” errors. |
| REDIS_MAXMEMORY | 2gb | Stops Redis from swapping and evicts least‑recently‑used keys. |
5. Monitoring, Alerting & Observability
5.1 Prometheus Scrape Config
```yaml
scrape_configs:
  - job_name: 'n8n'
    static_configs:
      - targets: ['n8n:5678']
    metrics_path: /metrics
```
Key metrics to watch:
| Metric | Threshold (Alert) | Meaning |
|---|---|---|
| `process_cpu_seconds_total` | > 80 % of allocated CPU for 5 min | CPU saturation. |
| `process_resident_memory_bytes` | > 75 % of memory limit | Memory pressure → OOM risk. |
| `n8n_queue_length` | > 10 000 | Queue backlog; consider scaling workers. |
| `n8n_workflow_execution_duration_seconds_bucket{le="0.5"}` | < 70 % of executions in ≤ 0.5 s | Latency regression (too few fast executions). |
| `postgres_connections` | > 80 % of `max_connections` | DB connection pool near exhaustion. |
5.2 Health‑Check Endpoint
```shell
curl -s http://localhost:5678/healthz | jq .
```
Expected JSON:
```json
{
  "status": "ok",
  "db": "connected",
  "redis": "connected",
  "queue": "ready"
}
```
Configure your load balancer or Kubernetes liveness probe to call this endpoint every 30 s. If n8n works in staging but slows down in production, compare the two environments (resource limits, DB backend, queue configuration) and close the gap before continuing.
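For a probe script that does not depend on `jq`, a plain `grep` on the response body is enough. This is a sketch (the function name `probe` is ours, the JSON shape is the one shown above, and the `curl` call is left as a comment so the logic can be exercised with a canned response):

```shell
# probe: decide pass/fail from a /healthz response body.
# In production: body=$(curl -fsS http://localhost:5678/healthz)
probe() {
  local body=$1
  if printf '%s' "$body" | grep -q '"status":"ok"'; then
    echo "healthy"
  else
    echo "unhealthy"
  fi
}

probe '{"status":"ok","db":"connected","redis":"connected","queue":"ready"}'
probe '{"status":"error","db":"disconnected"}'
```

Wire the script's exit status (or its output) into the probe so the container is restarted only when the body actually reports a failure.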
6. Production‑Grade Fixes & EEFA (Expert‑First Advice)
| Issue | Why the Naïve Fix Fails | Production‑Ready Remedy |
|---|---|---|
| Using SQLite | File‑level locking blocks concurrent writes → latency spikes. | Migrate to PostgreSQL (or MySQL) **before** traffic exceeds 10 rps. |
| Running in “main” mode only | All workflows share one Node.js event loop → one long job stalls the rest. | Enable **queue mode** and allocate **≥ 2 workers** (WORKER_COUNT). |
| No Redis | BullMQ falls back to an in‑memory queue that disappears on container restart, causing lost jobs. | Deploy a dedicated Redis cluster (or at least a single‑node with persistence). |
| Unlimited container resources | Container may consume host RAM → OOM killer restarts n8n, losing state. | Set **hard CPU/memory limits** (docker compose deploy.resources.limits). |
| Ignoring back‑pressure | High inbound webhook rate floods the queue, leading to “Job stalled” warnings. | Implement **rate‑limiting** on incoming webhooks (RATE_LIMIT=50), and enable QUEUE_BULL_MAX_JOBS back‑pressure. |
| Skipping TLS/Authentication on Redis | In production, an unauthenticated Redis is a security hole. | Use REDIS_TLS=true and provide REDIS_PASSWORD. |
Final EEFA Checklist
- Database = PostgreSQL (or MySQL) with a connection pool.
- Queue = BullMQ backed by Redis.
- Execution mode = `queue` with `main,queue` processes.
- Worker count = CPU cores × 2 (adjust after a load test).
- Resource limits = CPU ≤ 2 cores, memory ≤ 4 GiB per container.
- Monitoring = Prometheus + Grafana dashboards.
- Health checks = `/healthz` endpoint + liveness probes.
Conclusion
The “fast‑then‑slow” symptom is almost always a mis‑aligned execution model (single‑process + SQLite) combined with resource exhaustion. Switch to queue mode, provision PostgreSQL + Redis, and enforce resource caps. After applying the checklist above, latency should remain flat even under continuous, high‑throughput loads, delivering a stable automation platform for production workloads.



