Who this is for: Kubernetes operators who need a reliable, production‑grade n8n deployment that can automatically handle traffic spikes without overspending. We cover this in detail in the n8n Performance & Scaling Guide.
Quick Diagnosis
Problem: n8n runs out of CPU/memory or cannot keep up with traffic spikes.
Solution: Deploy n8n with explicit resources.requests / resources.limits, enable a Horizontal Pod Autoscaler (HPA), and run multiple replicas. This keeps the workflow engine responsive while respecting budget constraints.
One‑Minute Deployment Checklist
| Steps | Action |
|---|---|
| 1 | Create a dedicated n8n namespace. |
| 2 | Apply the n8n Deployment YAML with resource requests/limits. |
| 3 | Expose n8n via a ClusterIP Service (or Ingress). |
| 4 | Deploy a Horizontal Pod Autoscaler targeting CPU % (e.g., 60%). |
| 5 | Verify scaling with kubectl get hpa -n n8n. |
| 6 | (Optional) Add a PodDisruptionBudget for high‑availability. |
Run the checklist after each change to confirm the scaling pipeline is healthy.
1. Prerequisites & Environment Setup
| Requirement | Detail |
|---|---|
| Kubernetes version | ≥ 1.22 (supports autoscaling/v2 metrics) |
| Cluster autoscaler | Enabled on the node pool (e.g., GKE, EKS, AKS) |
| Metrics Server | Installed (kubectl get apiservice v1beta1.metrics.k8s.io) |
| n8n version | 0.240.0 (or latest stable) |
| Namespace | n8n (isolates resources) |
# 1️⃣ Create the namespace
kubectl create namespace n8n

# 2️⃣ Verify Metrics Server is running
kubectl get pods -n kube-system | grep metrics-server
Note – In production, pin the namespace's workloads to a dedicated node pool with a node selector or taint/toleration pair to avoid noisy‑neighbor interference.
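The node isolation mentioned above can be sketched in the Deployment's pod template; the pool label and taint key below are hypothetical examples you would replace with your own:

```yaml
spec:
  template:
    spec:
      nodeSelector:
        workload: n8n            # hypothetical label on the dedicated node pool
      tolerations:
        - key: "dedicated"       # hypothetical taint applied to that pool
          operator: "Equal"
          value: "n8n"
          effect: "NoSchedule"
```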
2. Deploy n8n with Resource Requests & Limits
2.1 Deployment – container definition
- name: n8n
  image: n8nio/n8n:latest   # pin a specific tag (e.g., 0.240.0) in production
  ports:
    - containerPort: 5678
2.2 Deployment – environment & resource limits
env:
  - name: DB_TYPE
    value: "postgresdb"   # SQLite is file-based per pod and cannot be shared across replicas
resources:
  requests:
    cpu: "250m"
    memory: "256Mi"
  limits:
    cpu: "1000m"
    memory: "1Gi"
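Because this guide runs more than one replica, n8n should also run in queue mode so executions are distributed across pods via Redis. A sketch of the extra environment variables (the Redis service name is a placeholder for your own):

```yaml
# Queue mode distributes workflow executions across replicas via Redis
- name: EXECUTIONS_MODE
  value: "queue"
- name: QUEUE_BULL_REDIS_HOST
  value: "redis-svc"   # placeholder: your Redis service hostname
```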
2.3 Deployment – probes and replica settings
readinessProbe:
  httpGet:
    path: /healthz
    port: 5678
  initialDelaySeconds: 5
  periodSeconds: 10
livenessProbe:
  httpGet:
    path: /healthz
    port: 5678
  initialDelaySeconds: 30
  periodSeconds: 30
replicas: 2   # baseline replication (set at spec level, alongside selector/template)
Apply the full manifest (saved as n8n-deployment.yaml).
kubectl apply -f n8n-deployment.yaml
2.4 Service – expose the deployment
apiVersion: v1
kind: Service
metadata:
  name: n8n-svc
  namespace: n8n
spec:
  selector:
    app: n8n
  ports:
    - protocol: TCP
      port: 80
      targetPort: 5678
  type: ClusterIP   # change to LoadBalancer or add an Ingress as needed
kubectl apply -f n8n-service.yaml
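If external access is needed, a minimal Ingress pointing at the Service above might look like this (the host and ingress class are placeholders, and an ingress controller is assumed to be installed):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: n8n-ingress
  namespace: n8n
spec:
  ingressClassName: nginx          # placeholder: your ingress class
  rules:
    - host: n8n.example.com        # placeholder hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: n8n-svc
                port:
                  number: 80
```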
Note – Tune resources.limits after observing peak load. Over‑committing CPU leads to throttling, while memory limits set too low cause OOM kills and pod restarts.
3. Configure Horizontal Pod Autoscaler (HPA)
3.1 HPA: core scaling parameters
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: n8n-hpa
  namespace: n8n
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: n8n
3.2 HPA: replica range and CPU target
minReplicas: 2
maxReplicas: 12
metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60   # target CPU % per pod
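The averageUtilization target drives the standard HPA control loop, which computes desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization). A quick sketch of that arithmetic (the observed utilization value is illustrative):

```shell
# HPA replica math: desired = ceil(current_replicas * current_util / target_util)
current_replicas=2
current_util=90   # observed average CPU utilization (%), illustrative
target_util=60    # HPA target from the manifest above
desired=$(( (current_replicas * current_util + target_util - 1) / target_util ))
echo "$desired"   # ceil(180 / 60) = 3
```

So at 90% observed CPU against the 60% target, the HPA scales 2 pods up to 3.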
3.3 HPA: scaling behavior policies
behavior:
  scaleUp:
    stabilizationWindowSeconds: 60
    policies:
      - type: Pods
        value: 2
        periodSeconds: 30
  scaleDown:
    stabilizationWindowSeconds: 300
    policies:
      - type: Pods
        value: 1
        periodSeconds: 60
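With these policies, scale-up adds at most 2 pods every 30 seconds, so growing from minReplicas to maxReplicas takes five policy periods in the worst case. A quick check of that ramp time:

```shell
# Worst-case time for the scaleUp policy (2 pods / 30 s) to go from 2 to 12 replicas
min=2; max=12; pods_per_period=2; period=30
steps=$(( (max - min + pods_per_period - 1) / pods_per_period ))
ramp=$(( steps * period ))
echo "${ramp}s"   # 5 steps * 30 s = 150s
```

If sustained bursts demand faster ramping, raise the scaleUp policy's value rather than shrinking the stabilization window, which also guards against flapping.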
Deploy the HPA:
kubectl apply -f n8n-hpa.yaml
Verify it is active:
kubectl get hpa n8n-hpa -n n8n
Typical output:
NAME      REFERENCE        TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
n8n-hpa   Deployment/n8n   45%/60%   2         12        2          3m
Note – If the HPA never scales, confirm the Metrics Server is delivering CPU metrics (kubectl top pods -n n8n). Also ensure node-pool autoscaling is enabled; otherwise newly scheduled pods will remain Pending.
4. High‑Availability Enhancements
| Feature | Why It Matters | Implementation |
|---|---|---|
| PodDisruptionBudget (PDB) | Guarantees at least *N* pods stay running during node drains/upgrades. | See snippet below. |
| Cluster Autoscaler | Adds nodes when pod requests exceed current capacity. | Enable on your cloud provider (GKE, EKS, AKS); if you rely on spot instances, make sure the node group supports scaling up from zero. |
| Readiness/Liveness Probes | Prevents traffic to unhealthy pods; speeds up rollouts. | Already defined in Deployment manifest. |
4.1 PodDisruptionBudget – keep one pod alive
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: n8n-pdb
  namespace: n8n
spec:
  minAvailable: 1   # keep at least one pod up
  selector:
    matchLabels:
      app: n8n
kubectl apply -f n8n-pdb.yaml
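At higher replica counts, a percentage-based budget scales better than a fixed count. A sketch of the same PDB using maxUnavailable instead:

```yaml
spec:
  maxUnavailable: 25%   # allow at most a quarter of the pods to be disrupted at once
  selector:
    matchLabels:
      app: n8n
```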
5. Monitoring, Alerting, and Observability
| Tool | Recommended Config |
|---|---|
| Prometheus | Scrape /metrics endpoint (expose via ServiceMonitor). |
| Grafana Dashboard | Import community “n8n Workflow Metrics” (ID 21584). |
| Alertmanager | Alert when CPUUtilization > 80% for >5 min or when ReplicaCount < desired. |
5.1 PrometheusRule – high‑CPU alert
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: n8n-scaling-alerts
  namespace: n8n
spec:
  groups:
    - name: n8n.autoscaling
      rules:
        - alert: N8nHighCPUUtilization
          expr: |
            sum(rate(container_cpu_usage_seconds_total{namespace="n8n",pod=~"n8n-.*"}[2m])) by (pod)
              /
            sum(kube_pod_container_resource_requests{namespace="n8n",pod=~"n8n-.*",resource="cpu"}) by (pod) > 0.85
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "n8n pod CPU usage > 85%"
            description: "Pod {{ $labels.pod }} is consistently above the CPU target. Consider increasing limits or checking workflow load."
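The ReplicaCount alert from the table above can be approximated with kube-state-metrics; a sketch of an additional rule, assuming kube-state-metrics is scraped by the same Prometheus:

```yaml
# Fires when fewer replicas are available than the Deployment requests
- alert: N8nReplicasBelowDesired
  expr: |
    kube_deployment_status_replicas_available{namespace="n8n",deployment="n8n"}
      <
    kube_deployment_spec_replicas{namespace="n8n",deployment="n8n"}
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "n8n has fewer available replicas than desired"
```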
Note – In a multi‑tenant cluster, isolate n8n metrics with a dedicated ServiceMonitor to avoid label collisions.
6. Troubleshooting Common Scaling Issues
| Symptom | Likely Cause | Fix |
|---|---|---|
| HPA never scales up | Metrics Server missing or CPU request too low | Install/upgrade Metrics Server; raise resources.requests.cpu (e.g., 250m). |
| Pods stay Pending | Cluster autoscaler disabled or node pool full | Enable autoscaler, increase max node count, or adjust node selector/taints. |
| Frequent pod restarts | Memory limit too low → OOMKill | Increase resources.limits.memory by 25‑50 % after reviewing logs. |
| Scale‑down stalls | Stabilization window too long | Lower behavior.scaleDown.stabilizationWindowSeconds (default 300 s), but keep enough headroom to avoid flapping. |
| HPA oscillates (flapping) | Aggressive scaling policies | Adjust scaleUp/scaleDown policies to limit pods per interval. |
Tip – Test scaling in a staging namespace before production. Pause rollouts (kubectl rollout pause deployment/n8n -n n8n) while tweaking the HPA to avoid mid‑rollout disruption.
7. Advanced: Custom Metrics for Workflow‑Based Scaling
When CPU alone doesn’t reflect n8n load (e.g., many lightweight workflows), expose a custom metric that reports the number of active workflow executions.
- Expose the metric – add a /metrics endpoint that emits n8n_workflows_active.
- Register the metric with the Kubernetes Custom Metrics API via a metrics adapter.
- Update the HPA to target the custom metric.
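Step 2 is commonly done with prometheus-adapter; a hedged sketch of an adapter rule that maps the Prometheus series onto the Custom Metrics API (this assumes the metric carries namespace and pod labels):

```yaml
# prometheus-adapter rule exposing n8n_workflows_active as a per-pod custom metric
rules:
  - seriesQuery: 'n8n_workflows_active{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace: {resource: "namespace"}
        pod: {resource: "pod"}
    name:
      matches: "n8n_workflows_active"
    metricsQuery: 'avg(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)'
```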
7.1 HPA snippet using a custom metric
metrics:
  - type: Pods
    pods:
      metric:
        name: n8n_workflows_active
      target:
        type: AverageValue
        averageValue: "50"
Note – Custom metrics require additional RBAC (e.g., the system:auth-delegator ClusterRole for the adapter). Ensure the adapter pod has the necessary permissions.
Conclusion
Deploying n8n with explicit resource requests, a well‑tuned HPA, and high‑availability safeguards ensures the workflow engine scales predictably under load while staying within budget. By:
- Defining realistic requests/limits
- Enabling a CPU‑targeted HPA with sensible scaling policies
- Adding a PodDisruptionBudget and leveraging the cluster autoscaler
you create a production‑ready n8n service that automatically adapts to traffic spikes, avoids OOM kills, and remains observable through Prometheus alerts. Apply the checklist, monitor the metrics, and iterate on limits as real‑world usage evolves.