
Who this is for: Kubernetes operators and DevOps engineers who run n8n in production and need a reliable, zero‑downtime queue‑mode deployment. We cover this in detail in the n8n Queue Mode Errors Guide.
Quick Diagnosis
| Symptom | Most‑likely cause | One‑line fix |
|---|---|---|
| Pods stay Pending or CrashLoopBackOff | N8N_QUEUE_MODE=true but EXECUTIONS_PROCESS=main | Set EXECUTIONS_PROCESS=queue and add a dedicated worker deployment. |
| Workers never pick up jobs | Redis service name/port mismatch | Align N8N_REDIS_HOST and N8N_REDIS_PORT with the actual Redis Service. |
| Workers are OOM‑killed | CPU/memory limits too low for queue processing | Raise resources.limits to ≥ 512 MiB memory & ≥ 250m CPU (adjust per load). |
| Liveness probe fails repeatedly | Probe timeout < worker start‑up time | Increase initialDelaySeconds to 30–45 s and periodSeconds to 15 s. |
| RBAC errors in logs (Forbidden…) | ServiceAccount missing get, list on ConfigMaps/Secrets | Add a Role/RoleBinding that grants configmaps & secrets access. |
Apply the step‑by‑step remediation workflow below to resolve any of the above errors in a production‑grade Kubernetes cluster.
1. Why a Separate Worker Deployment Matters
When N8N_QUEUE_MODE=true, the web server only enqueues execution payloads. A worker pod (or a set of workers) pulls jobs from Redis and runs them. Combining both roles in a single pod works for tiny workloads but fails under load.
| Issue when combined | Impact |
|---|---|
| Resource contention | Web‑server memory spikes kill workers |
| Pod restarts affect all traffic | A worker crash restarts the web container too |
| Horizontal scaling is coupled | Scaling `replicas` affects both roles simultaneously |
Best‑practice: Deploy n8n-web and n8n-worker as independent Deployments. Run at least **two worker replicas** behind a PodDisruptionBudget for zero‑downtime processing.
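As a sketch of that best practice, the PodDisruptionBudget below keeps at least one worker available during voluntary disruptions (node drains, upgrades). The name n8n-worker-pdb is illustrative; the label selector assumes the worker Deployment shown later in this guide.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: n8n-worker-pdb
spec:
  # With 2 worker replicas, at most one can be evicted at a time.
  minAvailable: 1
  selector:
    matchLabels:
      app: n8n-worker
```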
2. Misconfiguration #1 – Wrong EXECUTIONS_PROCESS Value
What happens?
If EXECUTIONS_PROCESS stays main while N8N_QUEUE_MODE=true, the web pod still tries to execute jobs locally, causing duplicate execution errors and rapid OOM kills.
Required environment variables
| Variable | Value |
|---|---|
| N8N_QUEUE_MODE | true |
| EXECUTIONS_PROCESS | queue |
| EXECUTIONS_WORKER_COUNT | 1 (or higher) |
Web deployment – part 1 (metadata & selector)
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: n8n-web
spec:
  replicas: 1
  selector:
    matchLabels:
      app: n8n-web
```
Web deployment – part 2 (pod template)
```yaml
  template:
    metadata:
      labels:
        app: n8n-web
    spec:
      serviceAccountName: n8n-sa
```
Web deployment – part 3 (container & env)
```yaml
      containers:
        - name: n8n
          image: n8nio/n8n:latest
          env:
            - name: N8N_QUEUE_MODE
              value: "true"
            - name: EXECUTIONS_PROCESS
              value: "queue"
            - name: N8N_REDIS_HOST
              value: "n8n-redis"
            - name: N8N_REDIS_PORT
              value: "6379"
```
Web deployment – part 4 (ports & resources)
```yaml
          ports:
            - containerPort: 5678
          resources:
            limits:
              memory: "512Mi"
              cpu: "250m"
```
EEFA warning: Never set EXECUTIONS_PROCESS=main in a pod where N8N_QUEUE_MODE=true. The conflict generates “Execution already in progress” errors that are hard to debug.
3. Misconfiguration #2 – Redis Service Not Reachable
Typical symptom
```text
[2023-10-01 12:34:56] Error: connect ECONNREFUSED 10.96.0.12:6379
```
Common root causes
| Cause | Typical mistake |
|---|---|
| Service name typo | N8N_REDIS_HOST=n8n-redis-svc while Service is n8n-redis |
| Port mismatch | Redis runs on 6380 (TLS) but pod uses default 6379 |
| Namespace mismatch | Redis Service lives in infra namespace, web pod in default |
Create a ClusterIP Redis Service (same namespace)
```yaml
apiVersion: v1
kind: Service
metadata:
  name: n8n-redis
spec:
  selector:
    app: n8n-redis
  ports:
    - port: 6379
      targetPort: 6379
      protocol: TCP
```
Cross‑namespace reference (if Redis is elsewhere)
```yaml
- name: N8N_REDIS_HOST
  value: "n8n-redis.infra.svc.cluster.local"
```
EEFA tip: Enable Redis TLS in production (N8N_REDIS_TLS=true) and mount the CA cert as a secret. Reference it with N8N_REDIS_TLS_CA_CERT.
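A sketch of wiring that up, using the variable names from this guide: the CA certificate is mounted from a Secret (the Secret name n8n-redis-ca and mount path are assumptions; adjust to your cluster).

```yaml
# Container-level settings (go under the n8n container spec).
env:
  - name: N8N_REDIS_TLS
    value: "true"
  - name: N8N_REDIS_TLS_CA_CERT
    value: "/etc/redis-ca/ca.crt"
volumeMounts:
  - name: redis-ca
    mountPath: /etc/redis-ca
    readOnly: true
# Pod-level volumes entry (goes under template.spec).
volumes:
  - name: redis-ca
    secret:
      secretName: n8n-redis-ca
```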
4. Misconfiguration #3 – Resource Limits Trigger OOM Kills
Why it matters
Queue workers often need more memory than the web container because they load external APIs, run heavy transformations, and keep large payloads in memory.
Recommended CPU resources
| Container | CPU request | CPU limit |
|---|---|---|
| n8n-web | 100m | 250m |
| n8n-worker | 250m | 500m |
Recommended Memory resources
| Container | Memory request | Memory limit |
|---|---|---|
| n8n-web | 256Mi | 512Mi |
| n8n-worker | 512Mi | 1Gi |
Worker deployment – part 1 (metadata & replicas)
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: n8n-worker
spec:
  replicas: 2
  selector:
    matchLabels:
      app: n8n-worker
```
Worker deployment – part 2 (pod template)
```yaml
  template:
    metadata:
      labels:
        app: n8n-worker
    spec:
      serviceAccountName: n8n-sa
```
Worker deployment – part 3 (container, env & resources)
```yaml
      containers:
        - name: n8n
          image: n8nio/n8n:latest
          env:
            - name: N8N_QUEUE_MODE
              value: "true"
            - name: EXECUTIONS_PROCESS
              value: "worker"
            - name: N8N_REDIS_HOST
              value: "n8n-redis"
          resources:
            requests:
              cpu: "250m"
              memory: "512Mi"
            limits:
              cpu: "500m"
              memory: "1Gi"
```
HorizontalPodAutoscaler for workers
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: n8n-worker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: n8n-worker
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```
EEFA caution: A memory limit lower than the worker’s peak usage triggers OOMKilled events that appear as “CrashLoopBackOff”. Monitor container_memory_working_set_bytes in Prometheus and raise the limit before hitting the threshold.
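One way to watch that headroom is a PromQL expression comparing working-set memory to the configured limit. This sketch assumes cAdvisor and kube-state-metrics are scraped with their default metric names; tune the 0.85 threshold to your tolerance.

```promql
# Fraction of the memory limit in use per n8n-worker pod;
# alert when a pod sustains > 85% of its limit.
max by (pod) (
    container_memory_working_set_bytes{pod=~"n8n-worker.*", container="n8n"}
  /
    kube_pod_container_resource_limits{pod=~"n8n-worker.*", resource="memory"}
) > 0.85
```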
5. Misconfiguration #4 – Probes Too Aggressive
Typical failure
```text
Readiness probe failed: Get http://10.244.1.5:5678/healthz: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
```
Reason
Queue workers need a few seconds to bootstrap the Redis client and load the execution queue. A probe that starts at 5s with a 2s timeout kills the pod before it is ready.
Proven probe configuration
```yaml
livenessProbe:
  httpGet:
    path: /healthz
    port: 5678
  initialDelaySeconds: 30
  periodSeconds: 15
  timeoutSeconds: 5
readinessProbe:
  httpGet:
    path: /healthz
    port: 5678
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 5
```
EEFA note: If your worker image does not expose /healthz, replace the HTTP probe with a TCP check on the Redis port or an exec probe that runs pgrep -f "worker".
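A sketch of the exec-probe alternative (this assumes pgrep exists in the image, which is not guaranteed in slim base images — verify before relying on it):

```yaml
livenessProbe:
  exec:
    # Succeeds while an n8n worker process is running in the container.
    command: ["pgrep", "-f", "worker"]
  initialDelaySeconds: 30
  periodSeconds: 15
```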
6. Misconfiguration #5 – Insufficient RBAC for ConfigMaps & Secrets
Error snippet
```text
Error: EACCES: permission denied, getaddrinfo ENOTFOUND redis
```
The underlying cause is often the worker’s ServiceAccount lacking permission to read the Redis credentials stored in a Secret.
Minimal RBAC objects (split for readability)
ServiceAccount
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: n8n-sa
```
Role (granting ConfigMap & Secret read)
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: n8n-role
rules:
  - apiGroups: [""]
    resources: ["configmaps", "secrets"]
    verbs: ["get", "list"]
```
RoleBinding (attach role to ServiceAccount)
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: n8n-rb
subjects:
  - kind: ServiceAccount
    name: n8n-sa
roleRef:
  kind: Role
  name: n8n-role
  apiGroup: rbac.authorization.k8s.io
```
Attach serviceAccountName: n8n-sa to both the web and worker pods.
EEFA reminder: For clusters using PodSecurityPolicies or OPA Gatekeeper, ensure the ServiceAccount is allowed to run as UID 1000 (the default n8n UID).
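Where a policy engine demands an explicit UID, a pod-level securityContext makes that requirement visible. A sketch (the values assume the stock n8nio/n8n image, which runs as UID 1000):

```yaml
# Goes under template.spec in both Deployments.
securityContext:
  runAsUser: 1000
  runAsGroup: 1000
  fsGroup: 1000
```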
7. Diagnostic Checklist – Quick Copy‑Paste for On‑Call Engineers
| Check | Command / Manifest | Expected result |
|---|---|---|
| Queue mode env vars set correctly | `kubectl exec -ti <web-pod> -- printenv \| grep N8N_` | N8N_QUEUE_MODE=true and EXECUTIONS_PROCESS=queue |
| Redis reachable from pod | `kubectl exec -ti <worker-pod> -- nc -zv n8n-redis 6379` | Connection to n8n-redis 6379 port [tcp/*] succeeded! |
| Worker pod has enough memory | `kubectl top pod <worker-pod>` | Memory usage below the configured limit |
| Probes are not failing | `kubectl describe pod <worker-pod>` | No liveness/readiness failures in events |
| ServiceAccount can read Secret | `kubectl auth can-i get secrets/n8n-redis-secret --as=system:serviceaccount:default:n8n-sa` | yes |
8. Step‑by‑Step Remediation Workflow
1. Validate environment variables – Run the env‑check command above. Edit the Deployment if any variable is missing or wrong, then `kubectl apply -f <file>.yaml`.
2. Test Redis connectivity – Use `nc` or `redis-cli`. If unreachable, verify Service name, namespace, and DNS (`nslookup n8n-redis`).
3. Adjust resources – Increase `resources.limits` in the worker Deployment, then `kubectl rollout restart deployment/n8n-worker`.
4. Tune probes – Apply the probe snippet, then `kubectl apply -f <probe-file>.yaml`. Wait for the pod to become `Ready`.
5. Apply RBAC – Deploy the ServiceAccount, Role, and RoleBinding. Re‑attach the ServiceAccount to the pods if not already done.
6. Scale workers – Once the pods are stable, use the HPA or manually increase `replicas`. Verify jobs are processed via the n8n UI (Executions → Queue).
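The env-validation step is easy to script. A minimal POSIX-shell sketch: the function name check_queue_env is illustrative; it reads printenv-style KEY=VALUE lines on stdin and fails unless queue mode is configured consistently.

```shell
#!/bin/sh
# check_queue_env: verify the two critical queue-mode variables.
# Usage: kubectl exec -ti <web-pod> -- printenv | check_queue_env
check_queue_env() {
  input=$(cat)
  echo "$input" | grep -q '^N8N_QUEUE_MODE=true$' \
    || { echo "N8N_QUEUE_MODE must be true"; return 1; }
  echo "$input" | grep -q '^EXECUTIONS_PROCESS=queue$' \
    || { echo "EXECUTIONS_PROCESS must be queue"; return 1; }
  echo "queue mode env OK"
}

# Example with inline data instead of a live pod:
printf 'N8N_QUEUE_MODE=true\nEXECUTIONS_PROCESS=queue\n' | check_queue_env
# → queue mode env OK
```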
Conclusion
The most common Kubernetes deployment errors in n8n queue mode stem from misaligned environment variables, unreachable Redis, insufficient resources, over‑eager health probes, and missing RBAC. By separating the web and worker roles, configuring the correct EXECUTIONS_PROCESS, ensuring Redis connectivity, provisioning adequate CPU/memory, tuning probes, and granting the proper permissions, you create a resilient, horizontally‑scalable queue‑mode deployment that handles production workloads without unexpected restarts. Apply the checklist and remediation workflow above, monitor the pods, and the n8n queue will run smoothly in any Kubernetes environment.



