Memory Leak Detection and Prevention in n8n Production

A step-by-step guide to detecting and preventing memory leaks


Who this is for: DevOps engineers and n8n administrators responsible for production‑grade, continuously running n8n deployments. For broader context, see the n8n Performance & Scaling Guide.


Quick Diagnosis

Problem: An n8n server that runs continuously shows a steady increase in RSS/heap size and eventually crashes with “Out‑of‑Memory” or is killed by the container orchestrator.

The fix in three steps:

  1. Enable live memory metrics (process.memoryUsage() or Prometheus exporter).
  2. Identify the leaking node by isolating the workflow that spikes memory, then inspect custom code or large payload handling.
  3. Apply a mitigation – trim payload size, add explicit global.gc() (if --expose-gc), set strict execution limits, and schedule a graceful restart (PM2 / Docker restart: always).

With these three steps in place, RSS typically stabilises within 1–2 hours of the fix.


1. How n8n Consumes Memory

1.1 Heap vs. RSS

| Metric | Where it lives | Healthy size range |
|---|---|---|
| Heap | Managed by V8 (process.memoryUsage().heapUsed) | 150–300 MiB |
| RSS | Total mapped memory (heap + native + stack) | 250–500 MiB |
| External | Buffers and binary data (e.g., file uploads) | 0–200 MiB |

Note: In Docker, the container's memory.limit_in_bytes caps RSS, not the V8 heap. If total RSS grows past this limit, the kernel OOM‑killer terminates the container before V8 can throw a "heap out of memory" error.
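To see how these three numbers relate on a live process, a minimal sketch (the helper name and rounding are my own, not part of n8n):

```javascript
// Sketch: convert process.memoryUsage() into MiB so the values map
// directly onto the table above (heap, RSS, external).
function memorySnapshotMiB() {
  const { rss, heapUsed, heapTotal, external } = process.memoryUsage();
  const toMiB = (bytes) => Math.round((bytes / 1024 / 1024) * 10) / 10;
  return {
    rssMiB: toMiB(rss),             // total mapped memory (what the OOM-killer sees)
    heapUsedMiB: toMiB(heapUsed),   // live JS objects managed by V8
    heapTotalMiB: toMiB(heapTotal), // heap currently reserved by V8
    externalMiB: toMiB(external),   // Buffers and other native allocations
  };
}

console.log(memorySnapshotMiB());
```

Run it periodically (or from an Execute Command node) and compare against the healthy ranges above.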

1.2 Execution Modes that Affect Memory

| Mode | Description | Memory impact |
|---|---|---|
| EXECUTIONS_PROCESS=main | All workflow steps run in the same Node.js process | Highest per‑instance memory |
| EXECUTIONS_MODE=queue | Executions are delegated to separate worker processes (via Redis/BullMQ) | Isolates memory per worker, reduces the main process's footprint |
| Webhook vs. polling | Webhook triggers stay idle between calls; polling adds periodic timers | Polling can retain hidden timers that keep references alive |

*Switch to queue mode when you expect many concurrent, long‑running workflows.*


2. Typical Memory‑Leak Patterns in n8n Workflows

| Leak source | Why it leaks | Example node / code |
|---|---|---|
| Large JSON payloads stored in node parameters | Parameters stay in memory for the whole execution | A Set node carrying a 10 MiB JSON object |
| Custom Function / FunctionItem nodes that retain global references | global or module‑level variables persist across executions | global.myCache = … inside a Function node |
| Infinite loops or unbounded recursion | Execution never reaches a GC point | while (true) { … } in a Function node |
| Binary data (files, PDFs) kept in memory instead of streamed | Buffers stay allocated until the workflow ends | An Execute Command node that reads a whole file into a variable |
| Uncleared event listeners | Listeners attached on each run accumulate | process.on('exit', …) inside a Function node |

Warning: Never raise the Node.js heap limit (--max-old-space-size) as a primary fix. It merely postpones the OOM and can cause the container to be evicted by the orchestrator.
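The global-cache pattern in the table above can be defused with a size-bounded cache instead of an ever-growing one. A minimal sketch (MAX_ENTRIES is an assumed tuning knob, not an n8n setting):

```javascript
// Leaky pattern (avoid): global.myCache = global.myCache || {};
// survives across executions and only ever grows.

// Safer sketch: a size-bounded cache that evicts the oldest entry,
// so repeated executions cannot grow memory without limit.
const MAX_ENTRIES = 100;
const cache = new Map(); // Map preserves insertion order: the first key is the oldest

function cacheSet(key, value) {
  if (cache.size >= MAX_ENTRIES) {
    const oldestKey = cache.keys().next().value;
    cache.delete(oldestKey); // evict before inserting
  }
  cache.set(key, value);
}

// Simulate 1,000 executions each writing one entry
for (let i = 0; i < 1000; i++) cacheSet(`item-${i}`, i);
console.log(cache.size); // 100 — capped at MAX_ENTRIES
```

The same idea applies inside a Function node: if you must cache across items, cap the cache rather than letting it accumulate.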


3. Real‑Time Detection: Monitoring & Metrics

3.1 Quick‑Start Prometheus Exporter (Docker)

Add the built‑in metrics endpoint to your docker‑compose.yml.

services:
  n8n:
    image: n8nio/n8n
    environment:
      - N8N_METRICS=true            # enable Prometheus metrics
      - N8N_METRICS_PORT=9464
    ports:
      - "5678:5678"
      - "9464:9464"                  # Prometheus scrapes here

Prometheus query to spot a leak (increase > 50 MiB in 15 min):

increase(process_resident_memory_bytes{job="n8n"}[15m]) > 50 * 1024 * 1024

Tip: Pair this with an alert that triggers a graceful restart (docker kill -s SIGTERM <container>). SIGTERM lets n8n finish in‑flight executions before stopping.

3.2 In‑Process Diagnostics (One‑Liners)

Print a full memory snapshot from an **Execute Command** node:

node -e "console.log(JSON.stringify(process.memoryUsage(), null, 2))"

Or view the snapshot directly inside the container:

docker exec -it n8n-node bash -c "node -p 'process.memoryUsage()'"

3.3 Checklist – Is This a Leak?

  • RSS increases monotonically over > 48 h without plateau.
  • Heap growth > 30 % per 1 k executions of the same workflow.
  • No corresponding increase in incoming data volume.
  • Process restarts reset memory usage to baseline.

If all are true → proceed to remediation.
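The first checklist item (monotonic growth without a plateau) is easy to automate against a series of RSS samples. A rough sketch; the sample values and function name are illustrative:

```javascript
// Does a series of RSS samples grow strictly with no plateau or dip?
// Returns true only when every sample exceeds the previous one.
function looksLikeLeak(rssSamples, minGrowthBytes = 0) {
  for (let i = 1; i < rssSamples.length; i++) {
    if (rssSamples[i] - rssSamples[i - 1] <= minGrowthBytes) return false;
  }
  return rssSamples.length >= 2; // need at least two samples to judge
}

// Illustrative sample series, in MiB converted to bytes
const leaking = [300, 342, 391, 455, 530].map((m) => m * 1024 * 1024);
const healthy = [300, 340, 310, 335, 305].map((m) => m * 1024 * 1024);

console.log(looksLikeLeak(leaking)); // true
console.log(looksLikeLeak(healthy)); // false
```

Feed it hourly RSS samples from your metrics store; a healthy process saw-tooths as GC reclaims memory, while a leaking one only climbs.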


4. Step‑by‑Step Remediation

4.1 Trim Payloads Early

Keep only the fields you need before passing data downstream.

# Set node – keep required fields only
{
  "json": {
    "id": "{{$json.id}}",
    "status": "{{$json.status}}"
  }
}
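The same trimming can be done in a Function node when the field list is dynamic. A minimal sketch; trimItems and the sample data are illustrative, and items follows n8n's standard { json: {...} } shape:

```javascript
// Function-node sketch: keep only the fields downstream nodes need,
// so the large original payload is no longer referenced after this node.
function trimItems(items, keepFields) {
  return items.map((item) => {
    const trimmed = {};
    for (const field of keepFields) {
      if (field in item.json) trimmed[field] = item.json[field];
    }
    return { json: trimmed };
  });
}

// In an n8n Function node you would end with:
// return trimItems(items, ['id', 'status']);
const sample = [{ json: { id: 1, status: 'ok', blob: 'x'.repeat(1000) } }];
console.log(JSON.stringify(trimItems(sample, ['id', 'status'])));
// {"json":{"id":1,"status":"ok"}} — the 1,000-character blob is dropped
```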

4.2 Stream Large Binaries

Pipe files directly to S3 (or another sink) without loading them into RAM.

# Execute Command node – stream via STDIN
aws s3 cp - "s3://my-bucket/{{ $json.fileName }}" --no-progress

4.3 Clean Up Custom Code

Nulling stale references – drop global caches at the end of each run.

if (global.myCache) {
  global.myCache = null; // release reference
}
return items;

Force a GC pass (requires --expose-gc).

if (typeof global.gc === 'function') {
  global.gc(); // global.gc is only defined when Node was started with --expose-gc
}

Enable --expose-gc in Docker:

environment:
  - NODE_OPTIONS=--expose-gc

4.4 Enforce Execution Limits

| Variable | Recommended value | Effect |
|---|---|---|
| EXECUTIONS_TIMEOUT | 120 (seconds) | Default per‑workflow timeout; auto‑cancels stalled workflows |
| EXECUTIONS_TIMEOUT_MAX | 300 (seconds) | Hard upper limit; stops runaway loops after 5 min |
| N8N_PAYLOAD_SIZE_MAX | 10 (MiB) | Blocks huge payloads from entering the engine |

Add the limits to your docker‑compose.yml environment block.

environment:
  - EXECUTIONS_TIMEOUT=120
  - EXECUTIONS_TIMEOUT_MAX=300
  - N8N_PAYLOAD_SIZE_MAX=10

4.5 Graceful Restart Strategy

PM2 can automatically restart n8n when RSS exceeds a threshold.

pm2 start n8n --name n8n \
  --max-restarts 5 \
  --restart-delay 5000 \
  --max-memory-restart 500M

--max-memory-restart forces a restart once RSS > 500 MiB, ensuring a fresh heap.
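What --max-memory-restart does can be approximated in-process: poll RSS and exit when it crosses a threshold, letting the supervisor (PM2, Docker with restart: always) start a fresh process. A rough sketch; the threshold and polling interval are assumed values:

```javascript
// Self-monitor sketch: exit for a clean restart when RSS crosses a limit.
const LIMIT_BYTES = 500 * 1024 * 1024; // 500 MiB, assumed threshold

function checkMemory() {
  if (process.memoryUsage().rss > LIMIT_BYTES) {
    console.error('RSS over limit, exiting for a clean restart');
    process.exit(1); // the supervisor restarts the process with a fresh heap
  }
}

const timer = setInterval(checkMemory, 30_000); // poll every 30 s
timer.unref(); // don't keep the process alive just for this check
```

Prefer the supervisor's built-in flag when available; this sketch is mainly useful where PM2 is not in the picture.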


5. Production‑Grade Configuration to Cap Memory

| Env var | Example value | Why it matters |
|---|---|---|
| EXECUTIONS_MODE | queue | Delegates executions to separate workers, limiting per‑process memory. |
| EXECUTIONS_TIMEOUT | 120 | Guarantees a hard stop for long‑running executions. |
| EXECUTIONS_TIMEOUT_MAX | 300 | Prevents infinite loops from hogging RAM. |
| N8N_PAYLOAD_SIZE_MAX | 10 (MiB) | Blocks huge JSON objects from entering the engine. |
| NODE_OPTIONS | --max-old-space-size=512 --expose-gc | Caps the V8 heap at 512 MiB and enables manual GC. |
| N8N_LOG_LEVEL | error | Reduces log volume that can fill buffers. |
| N8N_LOG_OUTPUT | console | Keeps logs on the container's standard output for centralized collection. |
| N8N_METRICS | true | Exposes Prometheus metrics for monitoring. |
| N8N_METRICS_PORT | 9464 | Port for Prometheus to scrape. |

Full docker‑compose.yml for a memory‑tight deployment (split for readability).

version: "3.8"
services:
  n8n:
    image: n8nio/n8n
    restart: always
    ports:
      - "5678:5678"
    environment:
      - EXECUTIONS_MODE=queue          # queue mode also needs a Redis connection (QUEUE_BULL_REDIS_HOST, …)
      - EXECUTIONS_TIMEOUT=120
      - EXECUTIONS_TIMEOUT_MAX=300
      - N8N_PAYLOAD_SIZE_MAX=10
      - NODE_OPTIONS=--max-old-space-size=512 --expose-gc
      - N8N_LOG_LEVEL=error
      - N8N_LOG_OUTPUT=console
      - N8N_METRICS=true
      - N8N_METRICS_PORT=9464
    mem_limit: 1g            # Docker‑level hard limit
    mem_reservation: 800m    # Soft reservation

Caution: If mem_limit is lower than (or too close to) --max-old-space-size, the OOM‑killer terminates the container before V8 can reclaim memory. Keep the Docker limit well above the heap cap (RSS includes native memory and Buffers on top of the heap) to avoid silent restarts.


6. Automated Health‑Check & Alert Pipeline (Optional)

Docker/Kubernetes health‑check that exits with status 1 when RSS exceeds 600 MiB.

healthcheck:
  test: ["CMD", "node", "-e", "process.exit(process.memoryUsage().rss > 600*1024*1024 ? 1 : 0)"]
  interval: 30s
  timeout: 5s
  retries: 3
  start_period: 10s

Kubernetes: Use the same command as a livenessProbe.
Alertmanager: Wire the Prometheus query from §3.1 to a Slack or email notification.


Conclusion

Memory leaks in long‑running n8n instances are almost always traceable to oversized payloads, lingering global references, or uncontrolled loops. By:

  1. Instrumenting live memory metrics,
  2. Isolating the offending workflow, and
  3. Applying payload trimming, streaming, GC, execution limits, and a graceful‑restart policy,

you can keep RSS stable, prevent OOM kills, and maintain a reliable production deployment. Align Docker memory limits with V8’s --max-old-space-size and let Prometheus‑driven alerts handle the rest—your n8n instance stays healthy without endless manual restarts.
