Memory Leak Detection and Prevention in n8n Production

A step-by-step guide to detecting and preventing memory leaks


Who this is for: DevOps engineers and n8n administrators responsible for production‑grade, continuously running n8n deployments. For broader context, see the n8n Performance & Scaling Guide.


Quick Diagnosis

Problem: An n8n server that runs continuously shows a steady increase in RSS/heap size and eventually crashes with “Out‑of‑Memory” or is killed by the container orchestrator.

The fix in three steps:

  1. Enable live memory metrics (process.memoryUsage() or Prometheus exporter).
  2. Identify the leaking node by isolating the workflow that spikes memory, then inspect custom code or large payload handling.
  3. Apply a mitigation – trim payload size, add explicit global.gc() (if --expose-gc), set strict execution limits, and schedule a graceful restart (PM2 / Docker restart: always).

With these three steps in place, RSS typically stabilises within 1–2 hours of the fix.


1. How n8n Consumes Memory

1.1 Heap vs. RSS

| Metric | Where it lives | Healthy size range |
|---|---|---|
| Heap | Managed by V8 (process.memoryUsage().heapUsed) | 150–300 MiB |
| RSS | Total mapped memory (heap + native + stack) | 250–500 MiB |
| External | Buffers and binary data (e.g., file uploads) | 0–200 MiB |

Note: In Docker, the container's memory.limit_in_bytes caps RSS, not the V8 heap. If total RSS grows past this limit, the kernel OOM‑killer terminates the container before V8 can throw a "heap out of memory" error.
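To see how these three numbers relate on a live process, a minimal sketch (the helper name and rounding are my own, not part of n8n):

```javascript
// Sketch: convert process.memoryUsage() into MiB so the values map
// directly onto the table above (heap, RSS, external).
function memorySnapshotMiB() {
  const { rss, heapUsed, heapTotal, external } = process.memoryUsage();
  const toMiB = (bytes) => Math.round((bytes / 1024 / 1024) * 10) / 10;
  return {
    rssMiB: toMiB(rss),             // total mapped memory (what the OOM-killer sees)
    heapUsedMiB: toMiB(heapUsed),   // live JS objects managed by V8
    heapTotalMiB: toMiB(heapTotal), // heap currently reserved by V8
    externalMiB: toMiB(external),   // Buffers and other native allocations
  };
}

console.log(memorySnapshotMiB());
```

Run it periodically (or from an Execute Command node) and compare against the healthy ranges above.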

1.2 Execution Modes that Affect Memory

| Mode | Description | Memory impact |
|---|---|---|
| EXECUTIONS_PROCESS=main | All workflow steps run in the same Node.js process | Highest per‑instance memory |
| EXECUTIONS_MODE=queue | Executions are delegated to separate worker processes (via Redis/BullMQ) | Isolates memory per worker, reduces the main process's footprint |
| Webhook vs. polling | Webhook triggers stay idle between calls; polling adds periodic timers | Polling can retain hidden timers that keep references alive |

*Switch to queue mode when you expect many concurrent, long‑running workflows.*


2. Typical Memory‑Leak Patterns in n8n Workflows

| Leak source | Why it leaks | Example node / code |
|---|---|---|
| Large JSON payloads stored in node parameters | Parameters stay in memory for the whole execution | A Set node carrying a 10 MiB JSON object |
| Custom Function / FunctionItem nodes that retain global references | global or module‑level variables persist across executions | global.myCache = … inside a Function node |
| Infinite loops or unbounded recursion | Execution never reaches a GC point | while (true) { … } in a Function node |
| Binary data (files, PDFs) kept in memory instead of streamed | Buffers stay allocated until the workflow ends | An Execute Command node that reads a whole file into a variable |
| Uncleared event listeners | Listeners attached on each run accumulate | process.on('exit', …) inside a Function node |

Warning: Never raise the Node.js heap limit (--max-old-space-size) as a primary fix. It merely postpones the OOM and can cause the container to be evicted by the orchestrator.
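The global-cache pattern in the table above can be defused with a size-bounded cache instead of an ever-growing one. A minimal sketch (MAX_ENTRIES is an assumed tuning knob, not an n8n setting):

```javascript
// Leaky pattern (avoid): global.myCache = global.myCache || {};
// survives across executions and only ever grows.

// Safer sketch: a size-bounded cache that evicts the oldest entry,
// so repeated executions cannot grow memory without limit.
const MAX_ENTRIES = 100;
const cache = new Map(); // Map preserves insertion order: the first key is the oldest

function cacheSet(key, value) {
  if (cache.size >= MAX_ENTRIES) {
    const oldestKey = cache.keys().next().value;
    cache.delete(oldestKey); // evict before inserting
  }
  cache.set(key, value);
}

// Simulate 1,000 executions each writing one entry
for (let i = 0; i < 1000; i++) cacheSet(`item-${i}`, i);
console.log(cache.size); // 100 — capped at MAX_ENTRIES
```

The same idea applies inside a Function node: if you must cache across items, cap the cache rather than letting it accumulate.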


3. Real‑Time Detection: Monitoring & Metrics

3.1 Quick‑Start Prometheus Exporter (Docker)

Add the built‑in metrics endpoint to your docker‑compose.yml.

services:
  n8n:
    image: n8nio/n8n
    environment:
      - N8N_METRICS=true            # enable Prometheus metrics
      - N8N_METRICS_PORT=9464
    ports:
      - "5678:5678"
      - "9464:9464"                  # Prometheus scrapes here

Prometheus query to spot a leak (increase > 50 MiB in 15 min):

increase(process_resident_memory_bytes{job="n8n"}[15m]) > 50 * 1024 * 1024

Tip: Pair this with an alert that triggers a graceful restart (docker kill -s SIGTERM <container>). SIGTERM lets n8n finish in‑flight executions before stopping.

3.2 In‑Process Diagnostics (One‑Liners)

Print a full memory snapshot from an **Execute Command** node:

node -e "console.log(JSON.stringify(process.memoryUsage(), null, 2))"

Or view the snapshot directly inside the container:

docker exec -it n8n-node bash -c "node -p 'process.memoryUsage()'"

3.3 Checklist – Is This a Leak?

  • RSS increases monotonically over > 48 h without plateau.
  • Heap growth > 30 % per 1 k executions of the same workflow.
  • No corresponding increase in incoming data volume.
  • Process restarts reset memory usage to baseline.

If all are true → proceed to remediation.
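The first checklist item (monotonic growth without a plateau) is easy to automate against a series of RSS samples. A rough sketch; the sample values and function name are illustrative:

```javascript
// Does a series of RSS samples grow strictly with no plateau or dip?
// Returns true only when every sample exceeds the previous one.
function looksLikeLeak(rssSamples, minGrowthBytes = 0) {
  for (let i = 1; i < rssSamples.length; i++) {
    if (rssSamples[i] - rssSamples[i - 1] <= minGrowthBytes) return false;
  }
  return rssSamples.length >= 2; // need at least two samples to judge
}

// Illustrative sample series, in MiB converted to bytes
const leaking = [300, 342, 391, 455, 530].map((m) => m * 1024 * 1024);
const healthy = [300, 340, 310, 335, 305].map((m) => m * 1024 * 1024);

console.log(looksLikeLeak(leaking)); // true
console.log(looksLikeLeak(healthy)); // false
```

Feed it hourly RSS samples from your metrics store; a healthy process saw-tooths as GC reclaims memory, while a leaking one only climbs.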


4. Step‑by‑Step Remediation

4.1 Trim Payloads Early

Keep only the fields you need before passing data downstream.

# Set node – keep required fields only
{
  "json": {
    "id": "{{$json.id}}",
    "status": "{{$json.status}}"
  }
}
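The same trimming can be done in a Function node when the field list is dynamic. A minimal sketch; trimItems and the sample data are illustrative, and items follows n8n's standard { json: {...} } shape:

```javascript
// Function-node sketch: keep only the fields downstream nodes need,
// so the large original payload is no longer referenced after this node.
function trimItems(items, keepFields) {
  return items.map((item) => {
    const trimmed = {};
    for (const field of keepFields) {
      if (field in item.json) trimmed[field] = item.json[field];
    }
    return { json: trimmed };
  });
}

// In an n8n Function node you would end with:
// return trimItems(items, ['id', 'status']);
const sample = [{ json: { id: 1, status: 'ok', blob: 'x'.repeat(1000) } }];
console.log(JSON.stringify(trimItems(sample, ['id', 'status'])));
// {"json":{"id":1,"status":"ok"}} — the 1,000-character blob is dropped
```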

4.2 Stream Large Binaries

Pipe files directly to S3 (or another sink) without loading them into RAM.

# Execute Command node – stream via STDIN
aws s3 cp - "s3://my-bucket/{{ $json.fileName }}" --no-progress

4.3 Clean Up Custom Code

Nulling stale references – drop global caches at the end of each run.

if (global.myCache) {
  global.myCache = null; // release reference
}
return items;

Force a GC pass (requires --expose-gc).

if (typeof global.gc === 'function') {
  global.gc(); // global.gc is only defined when Node was started with --expose-gc
}

Enable --expose-gc in Docker:

environment:
  - NODE_OPTIONS=--expose-gc

4.4 Enforce Execution Limits

| Variable | Recommended value | Effect |
|---|---|---|
| EXECUTIONS_TIMEOUT | 120 (seconds) | Default per‑workflow timeout; auto‑cancels stalled workflows |
| EXECUTIONS_TIMEOUT_MAX | 300 (seconds) | Hard upper limit; stops runaway loops after 5 min |
| N8N_PAYLOAD_SIZE_MAX | 10 (MiB) | Blocks huge payloads from entering the engine |

Add the limits to your docker‑compose.yml environment block.

environment:
  - EXECUTIONS_TIMEOUT=120
  - EXECUTIONS_TIMEOUT_MAX=300
  - N8N_PAYLOAD_SIZE_MAX=10

4.5 Graceful Restart Strategy

PM2 can automatically restart n8n when RSS exceeds a threshold.

pm2 start n8n --name n8n \
  --max-restarts 5 \
  --restart-delay 5000 \
  --max-memory-restart 500M

--max-memory-restart forces a restart once RSS > 500 MiB, ensuring a fresh heap.
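What --max-memory-restart does can be approximated in-process: poll RSS and exit when it crosses a threshold, letting the supervisor (PM2, Docker with restart: always) start a fresh process. A rough sketch; the threshold and polling interval are assumed values:

```javascript
// Self-monitor sketch: exit for a clean restart when RSS crosses a limit.
const LIMIT_BYTES = 500 * 1024 * 1024; // 500 MiB, assumed threshold

function checkMemory() {
  if (process.memoryUsage().rss > LIMIT_BYTES) {
    console.error('RSS over limit, exiting for a clean restart');
    process.exit(1); // the supervisor restarts the process with a fresh heap
  }
}

const timer = setInterval(checkMemory, 30_000); // poll every 30 s
timer.unref(); // don't keep the process alive just for this check
```

Prefer the supervisor's built-in flag when available; this sketch is mainly useful where PM2 is not in the picture.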


5. Production‑Grade Configuration to Cap Memory

| Env var | Example value | Why it matters |
|---|---|---|
| EXECUTIONS_MODE | queue | Delegates executions to separate workers, limiting per‑process memory. |
| EXECUTIONS_TIMEOUT | 120 | Guarantees a hard stop for long‑running executions. |
| EXECUTIONS_TIMEOUT_MAX | 300 | Prevents infinite loops from hogging RAM. |
| N8N_PAYLOAD_SIZE_MAX | 10 (MiB) | Blocks huge JSON objects from entering the engine. |
| NODE_OPTIONS | --max-old-space-size=512 --expose-gc | Caps the V8 heap at 512 MiB and enables manual GC. |
| N8N_LOG_LEVEL | error | Reduces log volume that can fill buffers. |
| N8N_LOG_OUTPUT | console | Keeps logs on the container's standard output for centralized collection. |
| N8N_METRICS | true | Exposes Prometheus metrics for monitoring. |
| N8N_METRICS_PORT | 9464 | Port for Prometheus to scrape. |

Full docker‑compose.yml for a memory‑tight deployment (split for readability).

version: "3.8"
services:
  n8n:
    image: n8nio/n8n
    restart: always
    ports:
      - "5678:5678"
    environment:
      - EXECUTIONS_MODE=queue          # queue mode also needs a Redis connection (QUEUE_BULL_REDIS_HOST, …)
      - EXECUTIONS_TIMEOUT=120
      - EXECUTIONS_TIMEOUT_MAX=300
      - N8N_PAYLOAD_SIZE_MAX=10
      - NODE_OPTIONS=--max-old-space-size=512 --expose-gc
      - N8N_LOG_LEVEL=error
      - N8N_LOG_OUTPUT=console
      - N8N_METRICS=true
      - N8N_METRICS_PORT=9464
    mem_limit: 1g            # Docker‑level hard limit
    mem_reservation: 800m    # Soft reservation

Caution: If mem_limit is lower than (or too close to) --max-old-space-size, the OOM‑killer terminates the container before V8 can reclaim memory. Keep the Docker limit well above the heap cap (RSS includes native memory and Buffers on top of the heap) to avoid silent restarts.


6. Automated Health‑Check & Alert Pipeline (Optional)

Docker/Kubernetes health‑check that exits with status 1 when RSS exceeds 600 MiB.

healthcheck:
  test: ["CMD", "node", "-e", "process.exit(process.memoryUsage().rss > 600*1024*1024 ? 1 : 0)"]
  interval: 30s
  timeout: 5s
  retries: 3
  start_period: 10s

Kubernetes: Use the same command as a livenessProbe.
Alertmanager: Wire the Prometheus query from §3.1 to a Slack or email notification.


Conclusion

Memory leaks in long‑running n8n instances are almost always traceable to oversized payloads, lingering global references, or uncontrolled loops. By:

  1. Instrumenting live memory metrics,
  2. Isolating the offending workflow, and
  3. Applying payload trimming, streaming, GC, execution limits, and a graceful‑restart policy,

you can keep RSS stable, prevent OOM kills, and maintain a reliable production deployment. Align Docker memory limits with V8’s --max-old-space-size and let Prometheus‑driven alerts handle the rest—your n8n instance stays healthy without endless manual restarts.
