<figure class="wp-block-image aligncenter"><img src="https://flowgenius.in/wp-content/uploads/2026/01/production-grade-n8n-architecture.png" alt="Step-by-step guide to a production-grade n8n architecture" /> <figcaption style="text-align: center;">Step-by-step guide to a production-grade n8n architecture</figcaption></figure>
<hr />
<p style="margin-bottom: 2em; line-height: 1.9;"><strong>Who this is for:</strong> SREs, DevOps engineers, and backend developers who need a reliable, horizontally‑scalable n8n deployment in production. We cover this in detail in the Production‑Grade n8n Architecture guide.</p>
<hr style="border: none; margin: 55px 0;" />
<h2 style="margin-bottom: 45px; line-height: 1.3;">Quick Diagnosis</h2>
<p style="margin-bottom: 2em; line-height: 1.9;">Your n8n workflows are missing executions, exposing credentials, or failing under load. The root cause is typically a <strong>stateful, single‑node setup</strong>. Re‑architect with a stateless execution layer, external PostgreSQL, a durable queue, and proper monitoring.</p>
<blockquote style="margin: 0 0 2em 0; padding-left: 1em; border-left: 4px solid #e0e0e0;">
<p style="margin: 0; line-height: 1.9;"><strong>TL;DR</strong> – Deploy n8n on Kubernetes (or Docker‑Compose for small scale) with PostgreSQL, Redis, Prometheus‑Grafana, TLS, RBAC, and daily backups.<br />
<em>In production, the limits of a single‑node setup usually show up once you have more than a handful of concurrent runs.</em></p>
</blockquote>
<hr style="border: none; margin: 55px 0;" />
<h2 style="margin-bottom: 45px; line-height: 1.3;">1. Core Production Requirements</h2>
<p>If you recognise any <a href="/n8n-architecture-anti-patterns">n8n architecture anti‑patterns</a> in your current setup, resolve them before continuing.</p>
<table style="border-collapse: collapse; width: 100%; margin-bottom: 2em;">
<thead>
<tr>
<th style="border: 1px solid #e0e0e0; padding: 13px; text-align: left;">Requirement</th>
<th style="border: 1px solid #e0e0e0; padding: 13px; text-align: left;">Why It Matters</th>
<th style="border: 1px solid #e0e0e0; padding: 13px; text-align: left;">n8n Default</th>
<th style="border: 1px solid #e0e0e0; padding: 13px; text-align: left;">Production‑Ready Alternative</th>
</tr>
</thead>
<tbody>
<tr>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Stateless execution</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Enables horizontal scaling and zero‑downtime deploys</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">In‑process execution (single‑process)</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Separate <strong>worker</strong> pods/containers that pull jobs from a queue</td>
</tr>
<tr>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Durable data store</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Guarantees workflow state, credentials, logs</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">SQLite (file‑based)</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;"><strong>PostgreSQL 13+</strong> (managed or self‑hosted)</td>
</tr>
<tr>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Reliable job queue</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Prevents lost executions when the API crashes</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">In‑memory (no queue)</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;"><strong>Redis</strong> as the broker for <code>EXECUTIONS_MODE=queue</code> (n8n's queue mode is built on Bull, which requires Redis)</td>
</tr>
<tr>
<td style="border: 1px solid #e0e0e0; padding: 13px;">TLS & Auth</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Protects credentials and sessions in transit and blocks unauthenticated access</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Optional, self‑signed</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Ingress TLS + OAuth2 / API‑Key enforcement</td>
</tr>
<tr>
<td style="border: 1px solid #e0e0e0; padding: 13px;">High‑availability (HA)</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">No single point of failure</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Single pod</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Replicated DB, multiple workers, load‑balanced API</td>
</tr>
<tr>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Observability</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Early detection of bottlenecks & failures</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Minimal logs</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;"><strong>Prometheus</strong> metrics + <strong>Grafana</strong> dashboards + structured logging</td>
</tr>
<tr>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Backup & DR</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Prevent data loss</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Manual file copy</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Automated <strong><code>pg_dump</code></strong> + WAL archiving, point‑in‑time restore</td>
</tr>
</tbody>
</table>
<blockquote style="margin: 0 0 2em 0; padding-left: 1em; border-left: 4px solid #e0e0e0;">
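<p style="margin: 0; line-height: 1.9;"><strong>EEFA Note:</strong> Switching from SQLite to PostgreSQL requires a data migration: export with <code>n8n export:workflow --all</code> and <code>n8n export:credentials --all</code>, then load the files into the PostgreSQL‑backed instance with <code>n8n import:workflow</code> and <code>n8n import:credentials</code>. Keep the same encryption key on both sides, note that execution history is not carried over by these commands, and perform the move in a maintenance window so no executions are written mid‑migration.</p>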
</blockquote>
<p style="margin-bottom: 2em; line-height: 1.9;"><em>Most teams hit these gaps after a few weeks of traffic, not on day one.</em></p>
<hr style="border: none; margin: 55px 0;" />
<h2 style="margin-bottom: 45px; line-height: 1.3;">2. Architecture Blueprint</h2>
<p style="margin-bottom: 2em; line-height: 1.9;">The diagram shows a stateless API that enqueues work, worker pods that process jobs, and a resilient data layer. If you have open <a href="/n8n-control-plane-data-plane">n8n control plane / data plane</a> separation issues, resolve them before continuing with the setup.</p>
<pre style="background: #fafafa; padding: 20px; border: 1px solid #e0e0e0; overflow: auto; line-height: 1.9; margin-bottom: 2em;">+-------------------+       +-------------------+       +-------------------+
|   Ingress (TLS)   | &lt;---&gt; |  n8n API Service  | &lt;---&gt; |    Redis Queue    |
+-------------------+       +-------------------+       +-------------------+
                                                              ^       |
                                                              |       v
                                                        +-------------------+
                                                        |  n8n Worker Pods  |
                                                        +-------------------+
                                                                  |
                                                                  v
                                                        +-------------------+
                                                        |  PostgreSQL (HA)  |
                                                        +-------------------+
                                                                  |
                                                                  v
                                                        +-------------------+
                                                        | Prometheus/Grafana|
                                                        +-------------------+
</pre>
<p style="margin-bottom: 2em; line-height: 1.9;">A quick walk‑through: the <strong>Ingress</strong> terminates TLS and hands traffic to the API service. The API writes execution requests to <strong>Redis</strong>. <strong>Worker Pods</strong> pull from the queue, run the workflow, and persist results to <strong>PostgreSQL</strong>. Prometheus scrapes metrics from both API and workers for Grafana to visualise.</p>
<hr style="border: none; margin: 55px 0;" />
<h2 style="margin-bottom: 45px; line-height: 1.3;">3. Deployment Options</h2>
<p>If you have <a href="/n8n-multi-tenant-architecture">n8n multi‑tenant architecture</a> requirements, settle them before choosing a platform.</p>
<table style="border-collapse: collapse; width: 100%; margin-bottom: 2em;">
<thead>
<tr>
<th style="border: 1px solid #e0e0e0; padding: 13px;">Platform</th>
<th style="border: 1px solid #e0e0e0; padding: 13px;">Pros</th>
<th style="border: 1px solid #e0e0e0; padding: 13px;">Cons</th>
<th style="border: 1px solid #e0e0e0; padding: 13px;">When to Choose</th>
</tr>
</thead>
<tbody>
<tr>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Docker‑Compose (single‑node)</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Quick start, easy local dev, low cost</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">No native HA, manual scaling, limited monitoring</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">PoC, < 10 concurrent executions</td>
</tr>
<tr>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Docker Swarm</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Built‑in service replication, simple networking</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Declining community support, limited autoscaling</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Small‑to‑mid teams already on Swarm</td>
</tr>
<tr>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Kubernetes (Helm)</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Declarative, auto‑scaling, native secrets, robust ecosystem</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Higher operational overhead, learning curve</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Production, ≥ 20 concurrent executions, need for HA</td>
</tr>
<tr>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Managed n8n Cloud</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Zero‑ops, automatic backups, SLA</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Vendor lock‑in, less control over network topology</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Teams without ops resources, compliance OK with provider</td>
</tr>
</tbody>
</table>
<h3 style="margin-bottom: 45px; line-height: 1.3;">3.1 Helm Values – Core Settings</h3>
<p style="margin-bottom: 2em; line-height: 1.9;">Below are the essential Helm values split into focused snippets.</p>
<h4 style="margin-bottom: 45px; line-height: 1.3;">Environment variables (DB, queue, encryption key)</h4>
<pre style="background: #fafafa; padding: 20px; border: 1px solid #e0e0e0; overflow: auto; line-height: 1.9; margin-bottom: 2em;">n8n:
  # Key layout varies by chart; map these onto your chart's values schema.
  env:
    - name: DB_TYPE
      value: postgresdb          # n8n expects "postgresdb", not "postgres"
    - name: DB_POSTGRESDB_HOST
      value: pg-n8n-primary
    - name: DB_POSTGRESDB_PORT
      value: "5432"
    - name: DB_POSTGRESDB_DATABASE
      value: n8n
    - name: DB_POSTGRESDB_USER
      valueFrom:
        secretKeyRef:
          name: n8n-pg-secret
          key: username
    - name: DB_POSTGRESDB_PASSWORD
      valueFrom:
        secretKeyRef:
          name: n8n-pg-secret
          key: password
    - name: EXECUTIONS_MODE
      value: queue
    - name: QUEUE_BULL_REDIS_HOST
      value: redis-n8n
    - name: QUEUE_BULL_REDIS_PORT
      value: "6379"
    # Basic auth (N8N_BASIC_AUTH_*) was removed in n8n 1.0; use built-in user management.
    # API and workers must share one encryption key to decrypt stored credentials.
    - name: N8N_ENCRYPTION_KEY
      valueFrom:
        secretKeyRef:
          name: n8n-encryption
          key: key               # key name inside the secret is an assumption
</pre>
<h4 style="margin-bottom: 45px; line-height: 1.3;">Resource limits and service definition</h4>
<pre style="background: #fafafa; padding: 20px; border: 1px solid #e0e0e0; overflow: auto; line-height: 1.9; margin-bottom: 2em;">  resources:
    limits:
      cpu: "2"
      memory: "2Gi"
    requests:
      cpu: "500m"
      memory: "512Mi"
  service:
    type: ClusterIP
    port: 5678
</pre>
<h4 style="margin-bottom: 45px; line-height: 1.3;">Worker replica count (horizontal scaling)</h4>
<pre style="background: #fafafa; padding: 20px; border: 1px solid #e0e0e0; overflow: auto; line-height: 1.9; margin-bottom: 2em;">worker:
  replicaCount: 3   # increase to meet concurrency needs
  resources:
    limits:
      cpu: "1"
      memory: "1Gi"
</pre>
<blockquote style="margin: 0 0 2em 0; padding-left: 1em; border-left: 4px solid #e0e0e0;">
<p style="margin: 0; line-height: 1.9;"><strong>EEFA Note:</strong> Start with modest <code>resources.limits</code>. Over‑provisioning workers can starve the API pod and cause request timeouts. Bumping the worker replica count is usually faster than hunting a hidden bottleneck.</p>
</blockquote>
<hr style="border: none; margin: 55px 0;" />
<h2 style="margin-bottom: 45px; line-height: 1.3;">4. Security & Compliance Checklist</h2>
<table style="border-collapse: collapse; width: 100%; margin-bottom: 2em;">
<thead>
<tr>
<th style="border: 1px solid #e0e0e0; padding: 13px;">Item</th>
<th style="border: 1px solid #e0e0e0; padding: 13px;">Implementation Detail</th>
<th style="border: 1px solid #e0e0e0; padding: 13px;">Verification</th>
</tr>
</thead>
<tbody>
<tr>
<td style="border: 1px solid #e0e0e0; padding: 13px;">TLS everywhere</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Use cert‑manager to provision certs for Ingress and internal services (Redis, PostgreSQL)</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;"><code>kubectl get secret &lt;name&gt;-tls</code></td>
</tr>
<tr>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Least‑privilege DB user</td>
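<td style="border: 1px solid #e0e0e0; padding: 13px;">Dedicated, non‑superuser PostgreSQL role limited to the <code>n8n</code> database/schema. n8n runs schema migrations on startup, so the role needs DDL rights there in addition to <code>SELECT, INSERT, UPDATE, DELETE</code></td>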
<td style="border: 1px solid #e0e0e0; padding: 13px;"><code>\du</code> in psql</td>
</tr>
<tr>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Credential encryption</td>
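<td style="border: 1px solid #e0e0e0; padding: 13px;">n8n encrypts stored credentials with <code>N8N_ENCRYPTION_KEY</code>; store the key in a K8s secret shared by API and workers. Existing credentials are tied to the current key, so plan any rotation as a re‑encryption exercise</td>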
<td style="border: 1px solid #e0e0e0; padding: 13px;"><code>kubectl describe secret n8n-encryption</code></td>
</tr>
<tr>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Network policies</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Default‑deny, then allow only Ingress → API, API and Worker → Redis, API and Worker → PostgreSQL</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;"><code>kubectl get netpol</code></td>
</tr>
<tr>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Audit logging</td>
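<td style="border: 1px solid #e0e0e0; padding: 13px;">Set PostgreSQL <code>log_statement = 'mod'</code> (or use pgAudit) with an informative <code>log_line_prefix</code>, and forward logs to Loki/ELK</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Check that writes to the credentials table (for example <code>INSERT INTO credentials_entity</code>) appear in the logs</td>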
</tr>
<tr>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Authentication &amp; access control</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">n8n 1.x removed basic auth (<code>N8N_BASIC_AUTH_*</code>); rely on built‑in user management for the UI and API keys for the public REST API</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Call <code>curl https://&lt;host&gt;/api/v1/workflows</code> without an <code>X-N8N-API-KEY</code> header and expect 401</td>
</tr>
<tr>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Secret management</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Use external secret store (AWS Secrets Manager, HashiCorp Vault) via CSI driver</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Verify secret rotation works without pod restart</td>
</tr>
</tbody>
</table>
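<p style="margin-bottom: 2em; line-height: 1.9;">As a rough sketch of the network‑policy row, the manifests below deny all inbound pod traffic in the namespace and then re‑open Redis to n8n pods only. The namespace <code>n8n</code> and the pod labels <code>app: redis-n8n</code> and <code>app.kubernetes.io/name: n8n</code> are assumptions; substitute the labels your charts actually apply.</p>
<pre style="background: #fafafa; padding: 20px; border: 1px solid #e0e0e0; overflow: auto; line-height: 1.9; margin-bottom: 2em;"># Deny all inbound traffic to every pod in the namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: n8n                 # assumed namespace
spec:
  podSelector: {}
  policyTypes:
    - Ingress
---
# Re-open Redis, but only for n8n API and worker pods
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: redis-allow-n8n
  namespace: n8n
spec:
  podSelector:
    matchLabels:
      app: redis-n8n             # assumed Redis pod label
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app.kubernetes.io/name: n8n   # assumed n8n pod label
      ports:
        - protocol: TCP
          port: 6379
</pre>
<p style="margin-bottom: 2em; line-height: 1.9;">A matching policy for PostgreSQL (port 5432) follows the same pattern. Restricting egress as well is possible, but workflows usually call external APIs and need DNS, so those rules have to be allowed explicitly.</p>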
<blockquote style="margin: 0 0 2em 0; padding-left: 1em; border-left: 4px solid #e0e0e0;">
<p style="margin: 0; line-height: 1.9;"><strong>EEFA Note:</strong> If the API is public, add rate‑limiting annotations to the Ingress to mitigate credential‑brute‑force attacks. In practice a modest limit cut down noisy scans dramatically.</p>
</blockquote>
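<p style="margin-bottom: 2em; line-height: 1.9;">One way to express the rate limit, assuming the ingress‑nginx controller and a cert‑manager <code>ClusterIssuer</code>, is shown below. The host <code>n8n.example.com</code>, the issuer name <code>letsencrypt-prod</code>, the service name <code>n8n</code>, and the limit values are placeholders, not recommendations.</p>
<pre style="background: #fafafa; padding: 20px; border: 1px solid #e0e0e0; overflow: auto; line-height: 1.9; margin-bottom: 2em;">apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: n8n
  namespace: n8n
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod         # assumed issuer
    nginx.ingress.kubernetes.io/limit-rps: "10"              # requests/s per client IP
    nginx.ingress.kubernetes.io/limit-burst-multiplier: "3"  # burst = 3 x limit-rps
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - n8n.example.com        # placeholder host
      secretName: n8n-tls
  rules:
    - host: n8n.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: n8n        # assumed API service name
                port:
                  number: 5678
</pre>
<p style="margin-bottom: 2em; line-height: 1.9;">Webhook‑heavy workloads may need a higher limit on webhook paths than on the login and API paths, so consider splitting them into separate Ingress objects.</p>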
<hr style="border: none; margin: 55px 0;" />
<h2 style="margin-bottom: 45px; line-height: 1.3;">5. High‑Availability & Disaster Recovery</h2>
<h3 style="margin-bottom: 45px; line-height: 1.3;">5.1 PostgreSQL HA Blueprint</h3>
<ol style="margin-bottom: 1.8em; line-height: 1.9;">
<li>Primary‑Replica (Patroni) – automatic failover within ~30 s.</li>
<li>WAL Archiving to an S3 bucket → enables point‑in‑time recovery.</li>
<li>Scheduled <code>pg_dump</code> (nightly) → stored in immutable object storage.</li>
</ol>
<p style="margin-bottom: 2em; line-height: 1.9;">Trigger a manual failover for testing:</p>
<pre style="background: #fafafa; padding: 20px; border: 1px solid #e0e0e0; overflow: auto; line-height: 1.9; margin-bottom: 2em;">kubectl exec -n n8n -it patroni-0 -- patronictl -c /etc/patroni.yml failover
</pre>
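<p style="margin-bottom: 2em; line-height: 1.9;">The nightly <code>pg_dump</code> from the list above could be scheduled with a Kubernetes <code>CronJob</code> along these lines. This is a minimal sketch: it reuses the <code>n8n-pg-secret</code> and <code>pg-n8n-primary</code> names from the Helm values, while the schedule, the PVC name <code>n8n-pg-backup</code>, and the dump path are assumptions. Shipping the dump on to immutable object storage is left out.</p>
<pre style="background: #fafafa; padding: 20px; border: 1px solid #e0e0e0; overflow: auto; line-height: 1.9; margin-bottom: 2em;">apiVersion: batch/v1
kind: CronJob
metadata:
  name: n8n-pg-dump
  namespace: n8n
spec:
  schedule: "0 2 * * *"          # 02:00 every night (assumed)
  concurrencyPolicy: Forbid      # never run two dumps at once
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: pg-dump
              image: postgres:15-alpine
              command: ["/bin/sh", "-c"]
              args:
                # custom-format dump, one file per day
                - pg_dump -h pg-n8n-primary -d n8n -Fc -f "/backup/n8n-$(date +%F).dump"
              env:
                - name: PGUSER
                  valueFrom:
                    secretKeyRef:
                      name: n8n-pg-secret
                      key: username
                - name: PGPASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: n8n-pg-secret
                      key: password
              volumeMounts:
                - name: backup
                  mountPath: /backup
          volumes:
            - name: backup
              persistentVolumeClaim:
                claimName: n8n-pg-backup   # assumed PVC
</pre>
<p style="margin-bottom: 2em; line-height: 1.9;">Restore a dump into a scratch database periodically with <code>pg_restore</code>; a backup that has never been restored is only a hope.</p>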
<h3 style="margin-bottom: 45px; line-height: 1.3;">5.2 Worker Redundancy</h3>
<p style="margin-bottom: 2em; line-height: 1.9;">Deploy <strong>3+ worker replicas</strong>. Workers pull jobs from Redis rather than receiving traffic, so they need no Service or load balancer of their own. With <code>EXECUTIONS_MODE=queue</code>, any worker can pick up pending jobs, so losing one worker does not stop processing.</p>
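<p style="margin-bottom: 2em; line-height: 1.9;">If you manage the workers without a chart, a plain <code>Deployment</code> might look like the sketch below. The secret <code>n8n-env</code> (holding the DB, Redis, and encryption‑key variables), the labels, and the grace period are assumptions. The <code>args: ["worker"]</code> form assumes the 1.x image entrypoint already invokes <code>n8n</code>; verify this against the image tag you run.</p>
<pre style="background: #fafafa; padding: 20px; border: 1px solid #e0e0e0; overflow: auto; line-height: 1.9; margin-bottom: 2em;">apiVersion: apps/v1
kind: Deployment
metadata:
  name: n8n-worker
  namespace: n8n
spec:
  replicas: 3
  selector:
    matchLabels:
      app: n8n-worker
  template:
    metadata:
      labels:
        app: n8n-worker
    spec:
      # give in-flight executions time to finish on shutdown (assumed value)
      terminationGracePeriodSeconds: 120
      containers:
        - name: worker
          image: n8nio/n8n:1.30.0
          args: ["worker"]       # start in worker mode
          envFrom:
            - secretRef:
                name: n8n-env    # assumed secret with the shared env vars
          resources:
            limits:
              cpu: "1"
              memory: "1Gi"
</pre>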
<h3 style="margin-bottom: 45px; line-height: 1.3;">5.3 Redis Persistence</h3>
<table style="border-collapse: collapse; width: 100%; margin-bottom: 2em;">
<thead>
<tr>
<th style="border: 1px solid #e0e0e0; padding: 13px;">Persistence Mode</th>
<th style="border: 1px solid #e0e0e0; padding: 13px;">Description</th>
<th style="border: 1px solid #e0e0e0; padding: 13px;">Recommended</th>
</tr>
</thead>
<tbody>
<tr>
<td style="border: 1px solid #e0e0e0; padding: 13px;">AOF (Append‑Only File)</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Logs every write operation; minimal job loss after a crash, at the cost of more disk I/O</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">✔︎</td>
</tr>
<tr>
<td style="border: 1px solid #e0e0e0; padding: 13px;">RDB Snapshots</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Periodic full dumps; lower I/O, but writes since the last snapshot are lost on a crash</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">✖︎ (use only as secondary)</td>
</tr>
</tbody>
</table>
<p style="margin-bottom: 2em; line-height: 1.9;">Redis Helm values for AOF:</p>
<pre style="background: #fafafa; padding: 20px; border: 1px solid #e0e0e0; overflow: auto; line-height: 1.9; margin-bottom: 2em;">appendonly: "yes"
save: "900 1" # snapshot every 15 min if at least 1 key changed
</pre>
<h3 style="margin-bottom: 45px; line-height: 1.3;">5.4 Disaster‑Recovery Runbook</h3>
<table style="border-collapse: collapse; width: 100%; margin-bottom: 2em;">
<thead>
<tr>
<th style="border: 1px solid #e0e0e0; padding: 13px;">Step</th>
<th style="border: 1px solid #e0e0e0; padding: 13px;">Action</th>
<th style="border: 1px solid #e0e0e0; padding: 13px;">Owner</th>
</tr>
</thead>
<tbody>
<tr>
<td style="border: 1px solid #e0e0e0; padding: 13px;">1</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Verify DB replica health (<code>patronictl list</code>)</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">DBA</td>
</tr>
<tr>
<td style="border: 1px solid #e0e0e0; padding: 13px;">2</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Spin up a fresh PostgreSQL from latest WAL archive</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Ops</td>
</tr>
<tr>
<td style="border: 1px solid #e0e0e0; padding: 13px;">3</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Re‑point n8n <code>DB_POSTGRESDB_HOST</code> to new primary (ConfigMap rollout)</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">DevOps</td>
</tr>
<tr>
<td style="border: 1px solid #e0e0e0; padding: 13px;">4</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Run health checks (<code>curl /healthz</code>) on API & workers</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">QA</td>
</tr>
<tr>
<td style="border: 1px solid #e0e0e0; padding: 13px;">5</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Validate workflow history in UI</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Product</td>
</tr>
</tbody>
</table>
<hr style="border: none; margin: 55px 0;" />
<h2 style="margin-bottom: 45px; line-height: 1.3;">6. Observability & Alerting</h2>
<h3 style="margin-bottom: 45px; line-height: 1.3;">6.1 Prometheus Exporter Annotations</h3>
<p style="margin-bottom: 2em; line-height: 1.9;">n8n only exposes <code>/metrics</code> when <code>N8N_METRICS=true</code> is set. With that in place, add these annotations to the pod template of the n8n deployment so Prometheus can scrape it (this assumes your Prometheus uses annotation‑based discovery):</p>
<pre style="background: #fafafa; padding: 20px; border: 1px solid #e0e0e0; overflow: auto; line-height: 1.9; margin-bottom: 2em;">metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/path: "/metrics"
    prometheus.io/port: "5678"
</pre>
<table style="border-collapse: collapse; width: 100%; margin-bottom: 2em;">
<thead>
<tr>
<th style="border: 1px solid #e0e0e0; padding: 13px;">Metric</th>
<th style="border: 1px solid #e0e0e0; padding: 13px;">Meaning</th>
<th style="border: 1px solid #e0e0e0; padding: 13px;">Alert Threshold</th>
</tr>
</thead>
<tbody>
<tr>
<td style="border: 1px solid #e0e0e0; padding: 13px;">n8n_executions_total</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Total executions processed</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">–</td>
</tr>
<tr>
<td style="border: 1px solid #e0e0e0; padding: 13px;">n8n_executions_failed_total</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Failed executions</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">> 5/min</td>
</tr>
<tr>
<td style="border: 1px solid #e0e0e0; padding: 13px;">n8n_queue_length</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Jobs waiting in Redis</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">> 100</td>
</tr>
<tr>
<td style="border: 1px solid #e0e0e0; padding: 13px;">process_resident_memory_bytes</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Memory per pod</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">> 80 % of limit</td>
</tr>
<tr>
<td style="border: 1px solid #e0e0e0; padding: 13px;">http_request_duration_seconds</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">API latency</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">p95 > 500 ms</td>
</tr>
</tbody>
</table>
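<p style="margin-bottom: 2em; line-height: 1.9;">The thresholds in the table can be turned into Prometheus alerting rules roughly as follows. The metric names are copied from the table and may differ in your n8n version, so confirm them against the <code>/metrics</code> output first. The rule names, the <code>for</code> durations, and the assumption that the latency metric is a histogram with <code>_bucket</code> series are all illustrative.</p>
<pre style="background: #fafafa; padding: 20px; border: 1px solid #e0e0e0; overflow: auto; line-height: 1.9; margin-bottom: 2em;">groups:
  - name: n8n
    rules:
      - alert: N8nQueueBacklog
        expr: n8n_queue_length &gt; 100
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "More than 100 jobs waiting in the n8n queue for 5 minutes"
      - alert: N8nExecutionFailures
        # rate() is per second; multiply by 60 to compare against 5/min
        expr: rate(n8n_executions_failed_total[5m]) * 60 &gt; 5
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "n8n executions are failing at more than 5 per minute"
      - alert: N8nApiLatencyHigh
        # p95 over 5 minutes, assuming a histogram metric
        expr: histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m]))) &gt; 0.5
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "n8n API p95 latency above 500 ms"
</pre>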
<h3 style="margin-bottom: 45px; line-height: 1.3;">6.2 Sample Grafana Dashboard (JSON snippet)</h3>
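<p style="margin-bottom: 2em; line-height: 1.9;">The following JSON defines two panels: queue depth and execution failures. Metric names vary between n8n versions, so confirm them against your instance's <code>/metrics</code> output before relying on the queries.</p>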
<pre style="background: #fafafa; padding: 20px; border: 1px solid #e0e0e0; overflow: auto; line-height: 1.9; margin-bottom: 2em;">{
  "panels": [
    {
      "type": "graph",
      "title": "Queue Depth",
      "targets": [{ "expr": "n8n_queue_length" }]
    },
    {
      "type": "graph",
      "title": "Execution Failures",
      "targets": [{ "expr": "rate(n8n_executions_failed_total[5m])" }]
    }
  ]
}
</pre>
<p style="margin-bottom: 2em; line-height: 1.9;">Add these panels to a Grafana dashboard to get immediate visibility into bottlenecks; the snippet is a fragment of a dashboard definition, not a complete export.</p>
<hr style="border: none; margin: 55px 0;" />
<h2 style="margin-bottom: 45px; line-height: 1.3;">7. Cost‑Optimization Checklist</h2>
<table style="border-collapse: collapse; width: 100%; margin-bottom: 2em;">
<thead>
<tr>
<th style="border: 1px solid #e0e0e0; padding: 13px;">✔️ Item</th>
<th style="border: 1px solid #e0e0e0; padding: 13px;">How to Optimize</th>
</tr>
</thead>
<tbody>
<tr>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Right‑sized workers</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Start with <code>CPU 0.5</code> / <code>Mem 512Mi</code>; auto‑scale based on <code>n8n_queue_length</code></td>
</tr>
<tr>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Spot instances (k8s)</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Use node pools with spot/preemptible VMs for workers (stateless)</td>
</tr>
<tr>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Redis persistence tier</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Enable <strong>AOF</strong> only; disable RDB snapshots if storage cost is a concern</td>
</tr>
<tr>
<td style="border: 1px solid #e0e0e0; padding: 13px;">PostgreSQL storage</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Enable <strong>storage autoscaling</strong>; set <code>maxsize</code> to 2× expected data growth</td>
</tr>
<tr>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Turn off dev‑mode logs</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Set <code>N8N_LOG_LEVEL=error</code> in prod to reduce log volume</td>
</tr>
</tbody>
</table>
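<p style="margin-bottom: 2em; line-height: 1.9;">Queue‑based autoscaling of workers is not something a stock HorizontalPodAutoscaler does on its own. One option, assuming KEDA is installed in the cluster, is a <code>ScaledObject</code> that watches the Redis list of waiting jobs. The Deployment name <code>n8n-worker</code>, the list key <code>bull:jobs:wait</code> (Bull's default prefix with n8n's <code>jobs</code> queue), and the replica bounds are assumptions to adapt.</p>
<pre style="background: #fafafa; padding: 20px; border: 1px solid #e0e0e0; overflow: auto; line-height: 1.9; margin-bottom: 2em;">apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: n8n-worker
  namespace: n8n
spec:
  scaleTargetRef:
    name: n8n-worker             # assumed worker Deployment
  minReplicaCount: 3
  maxReplicaCount: 10
  triggers:
    - type: redis
      metadata:
        address: redis-n8n:6379
        listName: bull:jobs:wait # assumed key; check with redis-cli
        listLength: "20"         # target waiting jobs per replica
</pre>
<p style="margin-bottom: 2em; line-height: 1.9;">Because workers are stateless, this pairs well with the spot‑instance row above: replicas added under load can land on cheaper preemptible nodes.</p>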
<hr style="border: none; margin: 55px 0;" />
<h2 style="margin-bottom: 45px; line-height: 1.3;">8. Step‑by‑Step Production Rollout (Docker‑Compose Example)</h2>
<p style="margin-bottom: 2em; line-height: 1.9;">Below, the compose file is broken into logical service blocks for readability.</p>
<h3 style="margin-bottom: 45px; line-height: 1.3;">8.1 Database Service</h3>
<pre style="background: #fafafa; padding: 20px; border: 1px solid #e0e0e0; overflow: auto; line-height: 1.9; margin-bottom: 2em;">db:
  image: postgres:15-alpine
  environment:
    POSTGRES_DB: n8n
    POSTGRES_USER: n8n
    POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
  volumes:
    - pg-data:/var/lib/postgresql/data
  restart: unless-stopped
</pre>
<h3 style="margin-bottom: 45px; line-height: 1.3;">8.2 Redis Service</h3>
<pre style="background: #fafafa; padding: 20px; border: 1px solid #e0e0e0; overflow: auto; line-height: 1.9; margin-bottom: 2em;">redis:
  image: redis:7-alpine
  command: ["redis-server", "--appendonly", "yes"]
  volumes:
    - redis-data:/data
  restart: unless-stopped
</pre>
<h3 style="margin-bottom: 45px; line-height: 1.3;">8.3 API Service</h3>
<pre style="background: #fafafa; padding: 20px; border: 1px solid #e0e0e0; overflow: auto; line-height: 1.9; margin-bottom: 2em;">api:
  image: n8nio/n8n:1.30.0
  environment: &amp;n8n-env   # YAML anchor so the worker can reuse this list
    - DB_TYPE=postgresdb
    - DB_POSTGRESDB_HOST=db
    - DB_POSTGRESDB_PORT=5432
    - DB_POSTGRESDB_DATABASE=n8n
    - DB_POSTGRESDB_USER=n8n
    - DB_POSTGRESDB_PASSWORD=${POSTGRES_PASSWORD}
    - EXECUTIONS_MODE=queue
    - QUEUE_BULL_REDIS_HOST=redis
    - QUEUE_BULL_REDIS_PORT=6379
    # API and workers must share one key to decrypt stored credentials
    - N8N_ENCRYPTION_KEY=${N8N_ENCRYPTION_KEY}
    # N8N_BASIC_AUTH_* was removed in n8n 1.0; create the owner account in the UI
  ports:
    - "5678:5678"
  depends_on:
    - db
    - redis
  restart: unless-stopped
</pre>
<h3 style="margin-bottom: 45px; line-height: 1.3;">8.4 Worker Service</h3>
<pre style="background: #fafafa; padding: 20px; border: 1px solid #e0e0e0; overflow: auto; line-height: 1.9; margin-bottom: 2em;">worker:
  image: n8nio/n8n:1.30.0
  command: worker          # the 1.x image entrypoint already invokes n8n
  environment: *n8n-env    # reuse the API's env vars via the YAML alias
  depends_on:
    - api
    - redis
  restart: unless-stopped
</pre>
<h3 style="margin-bottom: 45px; line-height: 1.3;">8.5 Compose Wrapper</h3>
<pre style="background: #fafafa; padding: 20px; border: 1px solid #e0e0e0; overflow: auto; line-height: 1.9; margin-bottom: 2em;">version: "3.8"
services:
  db:       # defined above
  redis:    # defined above
  api:      # defined above
  worker:   # defined above
volumes:
  pg-data:
  redis-data:
</pre>
<p style="margin-bottom: 2em; line-height: 1.9;"><strong>Rollout steps</strong></p>
<ol style="margin-bottom: 1.8em; line-height: 1.9;">
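<li>Define every <code>${…}</code> variable the compose file references (for example <code>POSTGRES_PASSWORD</code>) in an <code>.env</code> file next to it, and keep that file out of version control.</li>
<li>Run <code>docker compose up -d</code>. The API becomes reachable at <code>http://&lt;host&gt;:5678</code>; the compose file does not terminate TLS, so put a TLS‑terminating reverse proxy in front before exposing it.</li>
<li>Verify the queue is working by checking the length of the waiting list (the key assumes Bull's default <code>bull</code> prefix and n8n's <code>jobs</code> queue name):
<pre style="background: #fafafa; padding: 20px; border: 1px solid #e0e0e0; overflow: auto; line-height: 1.9; margin: 0;">docker compose exec redis redis-cli LLEN bull:jobs:wait
</pre>
</li>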
<li>Scale workers as needed, e.g. <code>docker compose up -d --scale worker=4</code>.</li>
</ol>
<blockquote style="margin: 0 0 2em 0; padding-left: 1em; border-left: 4px solid #e0e0e0;">
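<p style="margin: 0; line-height: 1.9;"><strong>EEFA Note:</strong> When upgrading, stop services with <code>docker compose stop</code> (graceful SIGTERM) rather than a hard kill. In queue mode the executions run on the workers, so give them a generous timeout, e.g. <code>docker compose stop -t 60 worker</code>, to let in‑flight executions finish; a hard kill can lose partial runs.</p>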
</blockquote>
<hr style="border: none; margin: 55px 0;" />
<h2 style="margin-bottom: 45px; line-height: 1.3;">9. Frequently Asked Production Questions</h2>
<table style="border-collapse: collapse; width: 100%; margin-bottom: 2em;">
<thead>
<tr>
<th style="border: 1px solid #e0e0e0; padding: 13px;">Question</th>
<th style="border: 1px solid #e0e0e0; padding: 13px;">Short Answer</th>
</tr>
</thead>
<tbody>
<tr>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Can I keep SQLite for prod?</td>
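<td style="border: 1px solid #e0e0e0; padding: 13px;"><strong>No</strong> – a file‑based database cannot be shared safely by multiple API and worker pods, and it is lost on pod restart unless it sits on a persistent volume.</td>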
</tr>
<tr>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Do I need Redis to use queue mode?</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Yes. Redis is the broker behind <code>EXECUTIONS_MODE=queue</code>; without it, executions run inside the main n8n process and cannot be spread across workers.</td>
</tr>
<tr>
<td style="border: 1px solid #e0e0e0; padding: 13px;">How many workers do I need?</td>
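<td style="border: 1px solid #e0e0e0; padding: 13px;">Start with <code>workers = ceil(peak_executions_per_second × avg_execution_time_seconds / concurrency_per_worker)</code>, where each worker's concurrency defaults to 10 (<code>--concurrency</code>). For example, 5 executions/s averaging 8 s each means about 40 in flight, so 4 workers. Adjust via autoscaler.</td>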
</tr>
<tr>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Is n8n thread‑safe?</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Each worker is a single Node.js process that runs several executions concurrently on its event loop (<code>--concurrency</code>, default 10). Scale CPU‑bound load by adding workers, not threads.</td>
</tr>
<tr>
<td style="border: 1px solid #e0e0e0; padding: 13px;">Can I use MySQL?</td>
<td style="border: 1px solid #e0e0e0; padding: 13px;">MySQL/MariaDB support has been deprecated since n8n 1.0; use PostgreSQL for new deployments.</td>
</tr>
</tbody>
</table>
<hr style="border: none; margin: 55px 0;" />
<h2 style="margin-bottom: 45px; line-height: 1.3;">Conclusion</h2>
<p style="margin-bottom: 2em; line-height: 1.9;">Deploying n8n in production demands <strong>stateless execution</strong>, a <strong>robust PostgreSQL backend</strong>, a <strong>persistent queue</strong> (Redis), and <strong>full observability</strong>. By separating the API from worker pods, enforcing TLS/RBAC, and automating backups, you eliminate single points of failure and gain the ability to scale horizontally. Follow the checklist, use the provided Helm/Docker‑Compose snippets, and monitor key metrics to keep the system healthy. This architecture has been battle‑tested in real‑world pipelines and delivers reliable, secure workflow automation at scale.</p>