Who this is for: DevOps engineers, SREs, and n8n administrators who need production‑grade observability for workflow latency, error rates, and resource usage. We cover this in detail in the n8n Performance & Scaling Guide.
Quick Diagnosis
- Problem: Without real‑time metrics you can’t troubleshoot n8n performance or plan capacity.
- Solution: Deploy the official n8n Prometheus exporter, scrape its `/metrics` endpoint with Prometheus, and connect a Grafana data source to visualize the key metrics on a ready‑made dashboard.
1. Prerequisites & Environment Checklist
| Item | Description | Recommended Version |
|---|---|---|
| n8n instance | Running (Docker, Kubernetes, or binary) | ≥ 0.230 |
| Prometheus server | Collector for metrics | 2.45+ |
| Grafana UI | Dashboard visualizer | 10.2+ |
| Network access | n8n `/metrics` reachable from Prometheus | – |
| Optional: Alertmanager | For alerts on SLA breaches | 0.27+ |
Note – In production, isolate the Prometheus endpoint behind a firewall or protect it with basic auth so internal metrics are never exposed publicly.
2. Enable the n8n Prometheus Exporter
2.1 Docker Compose (most common)
Add the exporter configuration to your `docker-compose.yml`:

```yaml
version: "3.8"
services:
  n8n:
    image: n8nio/n8n:latest
    ports:
      - "5678:5678"
    environment:
      - N8N_METRICS_ENABLED=true
      - N8N_METRICS_PORT=9464
    expose:
      - "9464"
```
Warning – Do not expose port 9464 to the public internet. Keep it on an internal Docker network or front it with a reverse proxy that enforces authentication.
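One way to keep the endpoint private, sketched here with an assumed external port and an htpasswd file you create yourself, is an nginx reverse proxy that enforces basic auth in front of the exporter:

```nginx
server {
    listen 9465;                                   # the only port you publish
    location /metrics {
        auth_basic           "n8n metrics";
        auth_basic_user_file /etc/nginx/.htpasswd;  # create with `htpasswd -c`
        proxy_pass           http://n8n:9464/metrics;
    }
}
```

The matching Prometheus scrape job then targets port 9465 and carries a `basic_auth` block with the same credentials.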
2.2 Kubernetes (Helm)
Enable the exporter via Helm values:
```shell
helm upgrade --install n8n n8n/n8n \
  --set metrics.enabled=true \
  --set metrics.port=9464 \
  --set service.annotations."prometheus\.io/scrape"="true" \
  --set service.annotations."prometheus\.io/port"="9464"
```

The chart automatically adds the required Prometheus annotations.
3. Configure Prometheus to Scrape n8n
3.1 Add a scrape job
Insert the following into `prometheus.yml`:
```yaml
scrape_configs:
  - job_name: 'n8n'
    static_configs:
      - targets: ['n8n:9464']
    metrics_path: /metrics
    scheme: http
```
3.2 Reload Prometheus
```shell
docker exec prometheus kill -HUP 1
```
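Alternatively, if Prometheus was started with the `--web.enable-lifecycle` flag, the configuration can be reloaded over HTTP instead of signalling the process:

```shell
# Only works when Prometheus runs with --web.enable-lifecycle
curl -X POST http://localhost:9090/-/reload
```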
Tip – Use `metric_relabel_configs` to drop internal metrics you never query; this reduces storage bloat.
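A minimal sketch of the scrape job with such a filter (the `nodejs_gc_*` pattern is illustrative; pick series you actually never query):

```yaml
scrape_configs:
  - job_name: 'n8n'
    static_configs:
      - targets: ['n8n:9464']
    metric_relabel_configs:
      # Applied after the scrape, before storage: matching series are discarded.
      - source_labels: [__name__]
        regex: 'nodejs_gc_.*'
        action: drop
```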
3.3 Verify the scrape
A quick curl against the Prometheus API confirms health:
```shell
curl http://localhost:9090/api/v1/targets \
  | jq '.data.activeTargets[] | select(.scrapeUrl|contains("n8n"))'
```

You should see `"health":"up"` and a recent scrape timestamp.
4. Build the Grafana Dashboard
4.1 Add Prometheus as a data source
- Configuration → Data Sources → Add data source
- Choose **Prometheus**
- URL: `http://prometheus:9090` (adjust to your network)
- Click **Save & Test** – you should see *Data source is working*.
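If you provision Grafana from files instead of the UI, the equivalent data source definition looks like this (a sketch; the file path is conventional, and the URL should match your network):

```yaml
# /etc/grafana/provisioning/datasources/prometheus.yml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy               # the Grafana backend proxies the queries
    url: http://prometheus:9090
    isDefault: true
```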
4.2 Import the JSON dashboard (split for readability)
4.2.1 Dashboard metadata & templating
```json
{
  "dashboard": {
    "title": "n8n Monitoring",
    "uid": "n8n-monitoring",
    "templating": {
      "list": [
        {
          "name": "job",
          "type": "query",
          "datasource": "Prometheus",
          "query": "label_values(n8n_workflow_executions_total, job)",
          "refresh": 1,
          "includeAll": false
        }
      ]
    },
```
4.2.2 Core panels
```json
    "panels": [
      {
        "type": "graph",
        "title": "Workflow Execution Rate (per min)",
        "datasource": "Prometheus",
        "targets": [{ "expr": "rate(n8n_workflow_executions_total[1m])", "legendFormat": "Exec/min" }],
        "gridPos": { "x": 0, "y": 0, "w": 12, "h": 8 }
      },
      {
        "type": "stat",
        "title": "Active Workers",
        "datasource": "Prometheus",
        "targets": [{ "expr": "n8n_worker_active", "legendFormat": "Workers" }],
        "gridPos": { "x": 12, "y": 0, "w": 6, "h": 4 }
      },
      {
        "type": "graph",
        "title": "CPU Usage (%)",
        "datasource": "Prometheus",
        "targets": [{ "expr": "rate(process_cpu_seconds_total{job=\"n8n\"}[30s]) * 100", "legendFormat": "CPU %" }],
        "gridPos": { "x": 12, "y": 4, "w": 12, "h": 8 }
      },
      {
        "type": "graph",
        "title": "Memory RSS (bytes)",
        "datasource": "Prometheus",
        "targets": [{ "expr": "process_resident_memory_bytes{job=\"n8n\"}", "legendFormat": "RSS" }],
        "gridPos": { "x": 0, "y": 8, "w": 12, "h": 8 }
      },
      {
        "type": "table",
        "title": "Top 5 Slowest Workflows (last 5 min)",
        "datasource": "Prometheus",
        "targets": [{
          "expr": "topk(5, avg_over_time(n8n_workflow_execution_duration_seconds[5m]))",
          "format": "table",
          "legendFormat": "{{workflow_id}}"
        }],
        "gridPos": { "x": 12, "y": 12, "w": 12, "h": 8 }
      }
    ]
  },
  "overwrite": true
}
```
Copy the full JSON (metadata + panels) into **Dashboard → Manage → Import → Upload JSON file**.
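If you manage Grafana as code instead of importing by hand, the dashboard can also be file-provisioned (a sketch; the paths and provider name are assumptions). Note that file-provisioned dashboards expect the inner `"dashboard"` object saved on its own, without the `"overwrite"` wrapper used by the import API:

```yaml
# /etc/grafana/provisioning/dashboards/n8n.yml
apiVersion: 1
providers:
  - name: n8n-dashboards          # arbitrary provider name
    folder: n8n                   # Grafana folder to file the dashboard under
    type: file
    options:
      path: /var/lib/grafana/dashboards   # drop the dashboard JSON here
```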
4.3 Customising panels
| Panel | Core metric | Recommended alert |
|---|---|---|
| Workflow Execution Rate | `rate(n8n_workflow_executions_total[1m])` | < 5 exec/min → Warning |
| Active Workers | `n8n_worker_active` | = 0 → Critical |
| CPU Usage | `rate(process_cpu_seconds_total[30s]) * 100` | > 80 % for 5 min → Warning |
| Memory RSS | `process_resident_memory_bytes` | > 80 % of container limit → Critical |
| Slowest Workflows | `avg_over_time(n8n_workflow_execution_duration_seconds[5m])` | > 30 s → Info |
Note – In Kubernetes, use `container_cpu_usage_seconds_total` and `container_memory_working_set_bytes` instead of the generic `process_*` metrics to avoid double‑counting across pods.
5. Alerting with Prometheus Alertmanager
5.1 Define alert rules
```yaml
groups:
  - name: n8n.alerts
    rules:
      - alert: n8nHighCPU
        expr: rate(process_cpu_seconds_total{job="n8n"}[2m]) * 100 > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on n8n"
          description: "CPU usage has been above 80 % for the last 5 minutes."
      - alert: n8nNoWorkers
        expr: n8n_worker_active == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "No active n8n workers"
          description: "All worker processes are down – workflows will not execute."
```
5.2 Wire the rule file into Prometheus
```yaml
rule_files:
  - "/etc/prometheus/alert.rules.yml"
```
5.3 Configure Alertmanager (example: Slack webhook)
```yaml
receivers:
  - name: slack
    slack_configs:
      - api_url: https://hooks.slack.com/services/XXXXX/XXXXX/XXXXX
        channel: "#alerts"
route:
  receiver: slack
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
```

Note that Alertmanager's `slack_configs` takes the webhook under `api_url`, not `webhook_url`.
Caution – Add a maintenance “silence” window for scheduled deployments; otherwise you’ll generate alert fatigue.
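Rather than silencing by hand each time, recurring deployment windows can be muted in the Alertmanager configuration itself (the interval name and times below are assumptions):

```yaml
time_intervals:
  - name: weekly-deploy-window
    time_intervals:
      - weekdays: ['tuesday']
        times:
          - start_time: '02:00'
            end_time: '04:00'
route:
  receiver: slack
  mute_time_intervals:
    - weekly-deploy-window
```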
6. Advanced Troubleshooting & Performance Tuning
| Symptom | Likely cause (metric) | Quick fix |
|---|---|---|
| Spike in `n8n_workflow_execution_duration_seconds` | DB latency (`pg_stat_activity` high) | Increase DB pool (`N8N_DB_MAX_CONNECTIONS`) |
| `n8n_worker_active` drops to 0 | OOM kill of worker container | Raise memory limit or enable swap (if allowed) |
| Prometheus scrape errors | 401 Unauthorized on `/metrics` | Verify exporter auth; add `basic_auth` to scrape job |
| Grafana panel shows “NaN” | Metric name typo | Check metric list via `http://n8n:9464/metrics` |
Tip – Keep a “Metrics health” dashboard that only shows `up{job="n8n"}` and `scrape_duration_seconds{job="n8n"}`. If those go red, the monitoring stack itself needs attention before investigating downstream symptoms.
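To catch a dead scrape target automatically, a small rule on the scrape-health metric itself (the `job` label matches the scrape job defined earlier) can fire before downstream panels go blank:

```yaml
groups:
  - name: n8n.meta
    rules:
      - alert: n8nScrapeDown
        # `up` is written by Prometheus itself for every configured target.
        expr: up{job="n8n"} == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Prometheus cannot scrape the n8n metrics endpoint"
```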
Conclusion
By exposing the built‑in n8n Prometheus exporter, configuring Prometheus to scrape it, and importing a purpose‑built Grafana dashboard, you gain real‑time visibility into workflow throughput, worker health, CPU, and memory consumption. Coupled with concise Alertmanager rules, this stack provides early warning of performance regressions and ensures your automation pipelines stay reliable in production. Implement the steps above, tailor alert thresholds to your SLAs, and you’ll have a production‑grade observability solution without unnecessary complexity.



