Fix 5 n8n Monitoring Dashboard Setup Issues Fast

A step‑by‑step guide to setting up a production monitoring dashboard for n8n


Who this is for: DevOps engineers, SREs, and n8n administrators who need production‑grade observability for workflow latency, error rates, and resource usage. We cover this in detail in the n8n Performance & Scaling Guide.


Quick Diagnosis

  • Problem: Without real‑time metrics you can’t troubleshoot n8n performance or plan capacity.
  • Solution: Enable n8n’s built‑in Prometheus metrics endpoint, scrape /metrics with Prometheus, and connect a Grafana data source to visualize the key metrics on a ready‑made dashboard.

1. Prerequisites & Environment Checklist

| Item | Description | Recommended Version |
|------|-------------|---------------------|
| n8n instance | Running (Docker, Kubernetes, or binary) | ≥ 0.230 |
| Prometheus server | Collector for metrics | 2.45+ |
| Grafana UI | Dashboard visualizer | 10.2+ |
| Network access | n8n /metrics reachable from Prometheus | — |
| Alertmanager (optional) | Alerts on SLA breaches | 0.27+ |

Note – In production, isolate the metrics endpoint behind a firewall or protect it with basic auth so internal metrics are never exposed publicly.


2. Enable the n8n Prometheus Exporter

2.1 Docker Compose (most common)

Add the exporter configuration to your docker‑compose.yml:

version: "3.8"
services:
  n8n:
    image: n8nio/n8n:latest
    ports:
      - "5678:5678"
    environment:
      - N8N_METRICS=true

Once N8N_METRICS=true is set, n8n serves Prometheus data at /metrics on its main port (5678); there is no separate metrics port to expose.

Warning – Do not expose the /metrics endpoint to the public internet. Keep it on an internal Docker network or front it with a reverse proxy that enforces authentication.
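One way to enforce that authentication is a reverse proxy in front of the metrics endpoint. A minimal nginx sketch, assuming an htpasswd file already exists and that n8n serves /metrics on port 5678 (the listen port, upstream host, and file path are all assumptions to adjust):

```nginx
# Sketch only: listen port, upstream host/port, and htpasswd path are assumptions.
server {
    listen 9464;
    location /metrics {
        auth_basic           "n8n metrics";
        auth_basic_user_file /etc/nginx/.htpasswd;   # created with: htpasswd -c /etc/nginx/.htpasswd prometheus
        proxy_pass           http://n8n:5678/metrics;
    }
}
```

Prometheus then scrapes the proxy with basic_auth credentials in its scrape job instead of reaching n8n directly.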

2.2 Kubernetes (Helm)

Enable the exporter via Helm values:

helm upgrade --install n8n n8n/n8n \
  --set metrics.enabled=true \
  --set metrics.port=9464 \
  --set service.annotations."prometheus\.io/scrape"="true" \
  --set service.annotations."prometheus\.io/port"="9464"

The chart adds the Prometheus scrape annotations shown above; exact value names vary between community charts, so confirm them against your chart’s values.yaml.
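If your cluster runs the Prometheus Operator rather than annotation‑based discovery, a ServiceMonitor is the usual alternative. A sketch, assuming your n8n Service carries the label app.kubernetes.io/name: n8n and names its metrics port metrics (both assumptions to match against your deployment):

```yaml
# Hypothetical ServiceMonitor sketch for the Prometheus Operator.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: n8n
  labels:
    release: prometheus            # must match your Prometheus Operator's selector
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: n8n  # must match your n8n Service labels
  endpoints:
    - port: metrics                # the named port on the Service
      path: /metrics
      interval: 30s
```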


3. Configure Prometheus to Scrape n8n

3.1 Add a scrape job

Insert the following into prometheus.yml (scrape targets are configured in this file; Prometheus has no UI for adding them):

scrape_configs:
  - job_name: 'n8n'
    static_configs:
      - targets: ['n8n:5678']   # n8n's main port; adjust if your metrics are exposed elsewhere
    metrics_path: /metrics
    scheme: http

3.2 Reload Prometheus

docker exec prometheus kill -HUP 1

Tip – Use metric_relabel_configs to drop internal series you never query; this reduces storage bloat.
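As an illustration of that tip, a hedged sketch that drops Node.js runtime internals at scrape time; the regex and metric names are examples, so verify them against your actual /metrics output first:

```yaml
scrape_configs:
  - job_name: 'n8n'
    static_configs:
      - targets: ['n8n:5678']   # adjust to where your instance serves /metrics
    metric_relabel_configs:
      # Drop series you never chart. metric_relabel_configs runs after the
      # scrape, so matching series are discarded before being written to storage.
      - source_labels: [__name__]
        regex: 'nodejs_(gc|eventloop)_.*'   # example pattern only
        action: drop
```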

3.3 Verify the scrape

A quick curl against the Prometheus API confirms health:

curl http://localhost:9090/api/v1/targets \
  | jq '.data.activeTargets[] | select(.scrapeUrl|contains("n8n"))'

You should see "health":"up" and a recent scrape timestamp.


4. Build the Grafana Dashboard

4.1 Add Prometheus as a data source

  1. Configuration → Data Sources → Add data source
  2. Choose Prometheus
  3. URL: http://prometheus:9090 (adjust to your network)
  4. Click Save & Test – you should see *Data source is working*.
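Instead of clicking through the UI, the data source can also be provisioned as code, which is more reproducible across environments. A sketch using Grafana’s standard provisioning format (the file path and URL are assumptions for a typical Docker setup):

```yaml
# Place under /etc/grafana/provisioning/datasources/ and restart Grafana.
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090   # adjust to your network
    isDefault: true
```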

4.2 Import the JSON dashboard (split for readability)

4.2.1 Dashboard metadata & templating

{
  "dashboard": {
    "title": "n8n Monitoring",
    "uid": "n8n-monitoring",
    "templating": {
      "list": [
        {
          "name": "job",
          "type": "query",
          "datasource": "Prometheus",
          "query": "label_values(n8n_workflow_executions_total, job)",
          "refresh": 1,
          "includeAll": false
        }
      ]
    },

4.2.2 Core panels

    "panels": [
      {
        "type": "timeseries",
        "title": "Workflow Execution Rate (per min)",
        "datasource": "Prometheus",
        "targets": [{ "expr": "rate(n8n_workflow_executions_total[1m])", "legendFormat": "Exec/min" }],
        "gridPos": {"x":0,"y":0,"w":12,"h":8}
      },
      {
        "type": "stat",
        "title": "Active Workers",
        "datasource": "Prometheus",
        "targets": [{ "expr": "n8n_worker_active", "legendFormat": "Workers" }],
        "gridPos": {"x":12,"y":0,"w":6,"h":4}
      },
      {
        "type": "timeseries",
        "title": "CPU Usage (%)",
        "datasource": "Prometheus",
        "targets": [{ "expr": "rate(process_cpu_seconds_total{job=\"n8n\"}[30s]) * 100", "legendFormat": "CPU %" }],
        "gridPos": {"x":12,"y":4,"w":12,"h":8}
      },
      {
        "type": "timeseries",
        "title": "Memory RSS (bytes)",
        "datasource": "Prometheus",
        "targets": [{ "expr": "process_resident_memory_bytes{job=\"n8n\"}", "legendFormat": "RSS" }],
        "gridPos": {"x":0,"y":8,"w":12,"h":8}
      },
      {
        "type": "table",
        "title": "Top 5 Slowest Workflows (last 5 min)",
        "datasource": "Prometheus",
        "targets": [{
          "expr": "topk(5, avg_over_time(n8n_workflow_execution_duration_seconds[5m]))",
          "format": "table",
          "legendFormat": "{{workflow_id}}"
        }],
        "gridPos": {"x":12,"y":12,"w":12,"h":8}
      }
    ]
  },
  "overwrite": true
}

Copy the full JSON (metadata + panels) into **Dashboards → Import → Upload JSON file**.
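The same import can be scripted against Grafana’s HTTP API, which is handy for CI pipelines. A sketch assuming the JSON above is saved as n8n-dashboard.json and that $GRAFANA_TOKEN holds a service‑account token with dashboard write access (both assumptions):

```shell
# POST the dashboard JSON (it already carries the required
# {"dashboard": ..., "overwrite": true} wrapper) to the import endpoint.
curl -s -X POST http://grafana:3000/api/dashboards/db \
  -H "Authorization: Bearer $GRAFANA_TOKEN" \
  -H "Content-Type: application/json" \
  -d @n8n-dashboard.json
```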

4.3 Customising panels

| Panel | Core metric | Recommended alert |
|-------|-------------|-------------------|
| Workflow Execution Rate | rate(n8n_workflow_executions_total[1m]) | < 5 exec/min → Warning |
| Active Workers | n8n_worker_active | = 0 → Critical |
| CPU Usage | rate(process_cpu_seconds_total[30s]) * 100 | > 80 % for 5 min → Warning |
| Memory RSS | process_resident_memory_bytes | > 80 % of container limit → Critical |
| Slowest Workflows | avg_over_time(n8n_workflow_execution_duration_seconds[5m]) | > 30 s → Info |

Note – In Kubernetes, use container_cpu_usage_seconds_total and container_memory_working_set_bytes instead of the generic process_* metrics to avoid double counting across pods.
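For reference, the CPU and memory panels translate roughly as follows in a cluster using those cAdvisor metrics; the namespace and pod selectors are placeholders:

```promql
# CPU % across n8n pods
sum(rate(container_cpu_usage_seconds_total{namespace="n8n", pod=~"n8n-.*"}[2m])) * 100

# Working-set memory across n8n pods
sum(container_memory_working_set_bytes{namespace="n8n", pod=~"n8n-.*"})
```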


5. Alerting with Prometheus Alertmanager

5.1 Define alert rules

groups:
  - name: n8n.alerts
    rules:
      - alert: n8nHighCPU
        expr: rate(process_cpu_seconds_total{job="n8n"}[2m]) * 100 > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on n8n"
          description: "CPU usage has been above 80 % for the last 5 minutes."
      - alert: n8nNoWorkers
        expr: n8n_worker_active == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "No active n8n workers"
          description: "All worker processes are down – workflows will not execute."

5.2 Wire the rule file into Prometheus

rule_files:
  - "/etc/prometheus/alert.rules.yml"
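Before reloading, it is worth validating both files with promtool, which ships with Prometheus (the paths assume the snippets above):

```shell
# Exits non-zero and prints the offending line if a rule or the config is invalid.
promtool check rules /etc/prometheus/alert.rules.yml
promtool check config /etc/prometheus/prometheus.yml
```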

5.3 Configure Alertmanager (example: Slack webhook)

receivers:
  - name: slack
    slack_configs:
      - api_url: https://hooks.slack.com/services/XXXXX/XXXXX/XXXXX
        channel: "#alerts"
route:
  receiver: slack
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h

Caution – Create a maintenance silence for scheduled deployments; otherwise you’ll generate alert fatigue.
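Silences can be created from the CLI with amtool, which ships with Alertmanager. A sketch for a one‑hour deployment window; the Alertmanager URL and the alertname pattern are assumptions:

```shell
# Regex matcher silences every alert whose name starts with "n8n".
amtool silence add 'alertname=~"n8n.*"' \
  --alertmanager.url=http://alertmanager:9093 \
  --comment="scheduled deploy" \
  --duration=1h
```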


6. Advanced Troubleshooting & Performance Tuning

| Symptom | Likely cause (metric) | Quick fix |
|---------|-----------------------|-----------|
| Spike in n8n_workflow_execution_duration_seconds | DB latency (pg_stat_activity high) | Increase DB pool (N8N_DB_MAX_CONNECTIONS) |
| n8n_worker_active drops to 0 | OOM kill of worker container | Raise memory limit or enable swap (if allowed) |
| Prometheus scrape errors | 401 Unauthorized on /metrics | Verify exporter auth; add basic_auth to the scrape job |
| Grafana panel shows “NaN” | Metric name typo | Check the metric list at the /metrics endpoint |

Tip – Keep a “Metrics health” dashboard that only shows up{job="n8n"} and scrape_duration_seconds{job="n8n"}. If those go red, the monitoring stack itself needs attention before investigating downstream symptoms.


Conclusion

By exposing n8n’s built‑in Prometheus metrics endpoint, configuring Prometheus to scrape it, and importing a purpose‑built Grafana dashboard, you gain real‑time visibility into workflow throughput, worker health, CPU, and memory consumption. Coupled with concise Alertmanager rules, this stack provides early warning of performance regressions and keeps your automation pipelines reliable in production. Implement the steps above, tailor the alert thresholds to your SLAs, and you’ll have production‑grade observability without unnecessary complexity.
