n8n Workflows Slow After Weeks in Production: Execution Table Bloat and Other Fixes

A step‑by‑step guide to diagnosing and fixing the root causes of gradual n8n workflow slowdown.


Who this is for: Developers and DevOps engineers running n8n in a production environment who need to diagnose and prevent gradual workflow slowdown. We cover this in detail in the n8n Performance Degradation & Stability Issues Guide.


Quick Diagnosis

| Step | Action |
| --- | --- |
| 1 | Open Execution List → “Last 24 h” and note the average execution time. |
| 2 | Run the workflow in Debug mode with “Run from start” and enable “Show execution details”. |
| 3 | If any node shows durationMs > 2 s, open its Advanced settings and enable Cache results or adjust Batch size. |
| 4 | Restart the n8n worker (pm2 restart n8n or docker restart n8n). |
| 5 | If execution time stays > 500 ms, proceed to the full root‑cause analysis below. |

1. Core Causes of Degrading Performance


1.1 Memory‑Related Issues

| Trigger | Manifestation |
| --- | --- |
| Custom JavaScript that keeps references in global or long‑living closures | Node.js heap grows → GC pauses → 5‑30 s spikes. |

1.2 Data‑Store Issues

| Trigger | Manifestation |
| --- | --- |
| executions table never purged; large JSON blobs in workflow_execution | Full‑table scans on every run, slowing all workflows. |

1.3 Queue & Concurrency Problems

| Trigger | Manifestation |
| --- | --- |
| Execute Workflow node with maxConcurrency = 0 (unlimited) | Backlog builds; each new execution waits for the previous ones. |

1.4 External API Throttling

| Trigger | Manifestation |
| --- | --- |
| Rate‑limited services (Salesforce, HubSpot, etc.) | Retries with exponential back‑off add seconds per node. |

1.5 Inefficient Node Logic

| Trigger | Manifestation |
| --- | --- |
| Large CSV parsing, nested loops in a “Function” node | CPU spikes, especially on low‑end containers. |

1.6 Version Drift

| Trigger | Manifestation |
| --- | --- |
| Upgrading n8n core without revisiting custom nodes | New defaults (e.g., stricter validation) cause hidden re‑processing. |

EEFA Note: In production, the silent killer is almost always a memory leak from user‑supplied JavaScript. Even a single stray global.someArray = [] that keeps growing will stall the worker process.


2. Step‑by‑Step Root‑Cause Investigation

2.1 Capture Baseline Metrics

Export recent execution stats (last 7 days):

```bash
curl -X GET "http://localhost:5678/rest/executions?filter[finished]=true&filter[createdAt][gte]=$(date -d '-7 days' +%s)" \
  -H "Authorization: Bearer $API_TOKEN" > execs.json
```

Calculate the average duration (requires jq):

```bash
jq '[.data[].durationMs] | add / length / 1000' execs.json
```

*If the average > 1 s, a systemic issue exists.*

2.2 Profile Memory Usage

Attach the Node.js inspector (default port 9229) and take a heap snapshot after ~30 executions:

```bash
# --inspect (not --inspect-brk) so n8n starts immediately instead of pausing
node --inspect $(which n8n) &
```

Open Chrome → chrome://inspect → “Open dedicated DevTools for Node”. Compare the snapshot with one taken from a fresh start, looking for growing arrays or detached objects; memory that rises steadily day after day is the signature of a leak.

2.3 Audit the Database

Count rows and size of the execution table (PostgreSQL example):

```sql
SELECT COUNT(*) FROM execution;
SELECT pg_total_relation_size('execution')/1024/1024 AS mb;
```

EEFA Warning: Deleting rows directly (DELETE FROM execution;) bypasses n8n’s cleanup logic. Use the built‑in purge endpoint or CLI instead.

CLI purge of old executions:

```bash
n8n execution:clear --older-than 30d
```

2.4 Identify Hot Nodes

  1. Open Workflow → Settings → Execution → Show execution details.
  2. Sort the “Node execution time (ms)” column.
  3. Flag any node whose execution time exceeds 2000 ms.

Typical hot spots: Function, HTTP Request, Spreadsheet File nodes.
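The UI sort above is the quickest route, but you can also flag hot nodes from exported timing data. A minimal sketch in plain JavaScript — the `{ node, durationMs }` record shape is an assumption, so map your n8n export onto it first:

```javascript
// Sketch: aggregate per-node timings and flag nodes over a threshold.
// The input record shape ({ node, durationMs }) is assumed, not an n8n API.
function findHotNodes(timings, thresholdMs = 2000) {
  const totals = new Map();
  for (const { node, durationMs } of timings) {
    totals.set(node, (totals.get(node) || 0) + durationMs);
  }
  return [...totals.entries()]
    .filter(([, ms]) => ms > thresholdMs)
    .sort((a, b) => b[1] - a[1])
    .map(([node, ms]) => ({ node, ms }));
}

// Example with made-up timings:
const hot = findHotNodes([
  { node: 'HTTP Request', durationMs: 1500 },
  { node: 'HTTP Request', durationMs: 1200 },
  { node: 'Set', durationMs: 40 },
]);
// 'HTTP Request' totals 2700 ms and is flagged; 'Set' is not.
```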

2.5 Review Concurrency Settings

Set a realistic limit based on CPU cores:

```yaml
# n8n environment configuration
EXECUTIONS_PROCESS=main
MAX_EXECUTIONS=5         # limit concurrent runs
EXECUTIONS_TIMEOUT=60000 # 60 s timeout
```

3. Targeted Fixes for Each Root Cause

3.1 Memory Leak in Custom JavaScript

Problematic pattern (leaky):

```js
// BAD – uses global mutable state
global.cache = global.cache || [];
global.cache.push(item);
return global.cache;
```

Stateless alternative (safe):

```js
// GOOD – local variable only
const cache = [];
cache.push(item);
return cache;
```

*Scope variables locally and avoid mutating global.*
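If a Function node genuinely needs a cache, bound its size so it cannot grow without limit. A minimal sketch in plain JavaScript (not an n8n API; adapt it to wherever you keep state):

```javascript
// Sketch: a size-capped cache that evicts the oldest entry when full,
// so memory use stays constant no matter how many executions run.
function makeBoundedCache(maxEntries = 100) {
  const cache = new Map(); // Map preserves insertion order
  return {
    set(key, value) {
      if (cache.has(key)) cache.delete(key); // refresh position on update
      cache.set(key, value);
      if (cache.size > maxEntries) {
        cache.delete(cache.keys().next().value); // evict the oldest entry
      }
    },
    get: (key) => cache.get(key),
    size: () => cache.size,
  };
}

const cache = makeBoundedCache(2);
cache.set('a', 1);
cache.set('b', 2);
cache.set('c', 3); // 'a' is evicted; size stays at 2
```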

3.2 Database Bloat

Enable automatic retention and schedule regular purges:

```bash
export EXECUTIONS_DATA_RETENTION_DAYS=30   # keep 30 days of data
n8n start
```

Add a cron job (runs daily at 02:00 am):

```cron
0 2 * * * /usr/local/bin/n8n execution:clear --older-than 30d >> /var/log/n8n-purge.log 2>&1
```

3.3 Unbounded Queues

Set maxConcurrency on the **Execute Workflow** node (UI → Advanced) or globally via MAX_EXECUTIONS as shown earlier.

3.4 External API Throttling

Add a **Retry** node with custom back‑off parameters:

```json
{
  "retry": {
    "maxAttempts": 5,
    "delay": 2000
  }
}
```

Cache frequent responses where possible.
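To see why throttling dominates node time, it helps to compute the delays those retry settings imply. A sketch assuming a doubling back-off schedule (the actual schedule depends on the retry implementation in use):

```javascript
// Sketch: exponential back-off delays of baseDelayMs * 2^(attempt - 1).
function backoffDelays(maxAttempts, baseDelayMs) {
  return Array.from({ length: maxAttempts }, (_, i) => baseDelayMs * 2 ** i);
}

const delays = backoffDelays(5, 2000);
// delays = [2000, 4000, 8000, 16000, 32000] — roughly a minute of waiting
// in the worst case, which is why throttled APIs slow whole workflows.
```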

3.5 Inefficient Parsing

Stream CSV data instead of loading the whole file:

```js
const fs = require('fs');
const csv = require('csv-parser'); // npm install csv-parser

fs.createReadStream('orders.csv') // path is illustrative
  .pipe(csv())
  .on('data', (row) => {
    // process each row here — memory stays flat regardless of file size
  });
```

3.6 Version Drift

After any n8n upgrade, validate all workflows:

```bash
n8n workflow:validate --all
```

Fix any reported breaking changes before traffic resumes.

EEFA Insight: Always restart the worker after applying a fix to clear stale memory and reset DB connection pools. In Docker, docker compose restart n8n is the safest approach.


4. Preventive Maintenance Checklist

| Item | Verification Method | Frequency |
| --- | --- | --- |
| Enable execution retention (EXECUTIONS_DATA_RETENTION_DAYS) | echo $EXECUTIONS_DATA_RETENTION_DAYS returns a non‑zero number | Once (deployment) |
| Schedule DB purge (cron) | crontab -l shows the n8n execution:clear command | Daily |
| Monitor heap size (Prometheus nodejs_heap_size_used_bytes) | Grafana panel stays < 70 % of limit | Continuous |
| Cap concurrent runs (MAX_EXECUTIONS) | n8n config:get MAX_EXECUTIONS matches CPU_COUNT * 2 | After scaling |
| Audit custom JS (static analysis) | Run eslint with a no‑global rule on the repo | PR review |
| Validate all workflows after upgrades | CLI exits with status 0 | Post‑upgrade |
| Alert on slow node execution (> 2 s) | Alert on n8n_node_execution_time_seconds > 2 | Continuous |
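The slow-node alert in the last row could be expressed as a Prometheus alerting rule along these lines — a sketch that assumes your setup exports the n8n_node_execution_time_seconds metric named in the table:

```yaml
groups:
  - name: n8n
    rules:
      - alert: N8nSlowNodeExecution
        expr: n8n_node_execution_time_seconds > 2
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "n8n node execution exceeded 2 s"
```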

5. Real‑World Example: Fixing a 3‑Week Slowdown

Scenario: An e‑commerce integration workflow (order-sync) started at 200 ms per run and rose to 8 s after four weeks.

Root causes discovered

  1. A Function node built a global.orderCache array that never cleared.
  2. The execution table grew to 2.3 M rows (~12 GB).

Fixes applied

Removed the global cache:

```js
// Old leaky code
global.orderCache = global.orderCache || [];
global.orderCache.push(newOrder);
return global.orderCache;
```

```js
// New stateless code
return [newOrder]; // only return the current payload
```

Purged old executions and restarted the worker:

```bash
n8n execution:clear --older-than 14d
pm2 restart n8n
```

Outcome: Execution time dropped back to 180 ms and stayed stable for the next 90 days.

EEFA Note: The leak was invisible because the Function node didn’t log its global usage. Always audit custom code for side‑effects before deploying.


Fix Checklist

  1. Check recent avg execution time (> 1 s → investigate).
  2. Run workflow in Debug → note any node > 2 s.
  3. If a Function node, ensure no global or long‑living objects.
  4. Purge old executions: n8n execution:clear --older-than 30d.
  5. Restart n8n worker (pm2 restart n8n or Docker restart).
  6. Set MAX_EXECUTIONS to a safe limit (e.g., CPU_COUNT * 2).

If performance still degrades, follow the full root‑cause analysis steps above.

All commands assume a Unix‑like environment and that the n8n CLI is in $PATH. Adjust paths, tokens, and environment variables to match your deployment.
