Stop 5 Kubernetes Deployment Errors in n8n Queue Mode

A step-by-step guide to solving n8n queue-mode Kubernetes deployment errors

 


Who this is for: Kubernetes operators and DevOps engineers who run n8n in production and need a reliable, zero-downtime queue-mode deployment.


Quick Diagnosis

| Symptom | Most-likely cause | One-line fix |
| --- | --- | --- |
| Pods stay `Pending` or `CrashLoopBackOff` | `N8N_QUEUE_MODE=true` but `EXECUTIONS_PROCESS=main` | Set `EXECUTIONS_PROCESS=queue` and add a dedicated worker deployment. |
| Workers never pick up jobs | Redis service name/port mismatch | Align `N8N_REDIS_HOST` and `N8N_REDIS_PORT` with the actual Redis Service. |
| Workers are OOM-killed | CPU/memory limits too low for queue processing | Raise `resources.limits` to ≥ 500 MiB memory and ≥ 250 m CPU (adjust per load). |
| Liveness probe fails repeatedly | Probe timeout shorter than job start-up time | Increase `initialDelaySeconds` to 30–45 s and `periodSeconds` to 15 s. |
| RBAC errors in logs (`Forbidden…`) | ServiceAccount missing `get`, `list` on ConfigMaps/Secrets | Add a Role/RoleBinding that grants `configmaps` and `secrets` access. |

Apply the step‑by‑step remediation workflow below to resolve any of the above errors in a production‑grade Kubernetes cluster.


1. Why a Separate Worker Deployment Matters


When N8N_QUEUE_MODE=true, the web server only enqueues execution payloads. A worker pod (or a set of workers) pulls jobs from Redis and runs them. Combining both roles in a single pod works for tiny workloads but fails under load.

| Issue when combined | Impact |
| --- | --- |
| Resource contention | Web-server memory spikes kill workers |
| Pod restarts affect all traffic | A worker crash restarts the web container too |
| Horizontal scaling | `replicas` affects both roles simultaneously |

Best practice: deploy n8n-web and n8n-worker as independent Deployments. Run at least **two worker replicas** behind a PodDisruptionBudget for zero-downtime processing.
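A minimal PodDisruptionBudget for the workers might look like this (a sketch; it assumes the worker pods carry the label `app: n8n-worker`, as in the worker Deployment later in this guide):

```yaml
# Keep at least one n8n worker alive during voluntary
# disruptions (node drains, cluster upgrades).
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: n8n-worker-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: n8n-worker
```

With two replicas and `minAvailable: 1`, a node drain can evict only one worker at a time, so the queue is always being drained.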


2. Misconfiguration #1 – Wrong EXECUTIONS_PROCESS Value

What happens?

If EXECUTIONS_PROCESS stays main while N8N_QUEUE_MODE=true, the web pod still tries to execute jobs locally, causing duplicate execution errors and rapid OOM kills.

Required environment variables

| Variable | Value |
| --- | --- |
| `N8N_QUEUE_MODE` | `true` |
| `EXECUTIONS_PROCESS` | `queue` |
| `EXECUTIONS_WORKER_COUNT` | `1` (or higher) |
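The shared settings can be kept in one place with a ConfigMap and `envFrom` (a sketch; the name `n8n-queue-env` is illustrative, and `EXECUTIONS_PROCESS` stays a per-Deployment `env` entry because web and worker need different values):

```yaml
# Shared queue-mode settings for both Deployments.
apiVersion: v1
kind: ConfigMap
metadata:
  name: n8n-queue-env        # illustrative name
data:
  N8N_QUEUE_MODE: "true"
  N8N_REDIS_HOST: "n8n-redis"
  N8N_REDIS_PORT: "6379"
# In each container spec, load the map and set the role explicitly;
# explicit env entries take precedence over envFrom:
#   envFrom:
#     - configMapRef:
#         name: n8n-queue-env
#   env:
#     - name: EXECUTIONS_PROCESS
#       value: "queue"        # "worker" in the worker Deployment
```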

Web deployment – part 1 (metadata & selector)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: n8n-web
spec:
  replicas: 1
  selector:
    matchLabels:
      app: n8n-web

Web deployment – part 2 (pod template)

  template:
    metadata:
      labels:
        app: n8n-web
    spec:
      serviceAccountName: n8n-sa

Web deployment – part 3 (container & env)

      containers:
        - name: n8n
          image: n8nio/n8n:latest
          env:
            - name: N8N_QUEUE_MODE
              value: "true"
            - name: EXECUTIONS_PROCESS
              value: "queue"
            - name: N8N_REDIS_HOST
              value: "n8n-redis"
            - name: N8N_REDIS_PORT
              value: "6379"

Web deployment – part 4 (ports & resources)

          ports:
            - containerPort: 5678
          resources:
            limits:
              memory: "512Mi"
              cpu: "250m"

EEFA warning: Never set EXECUTIONS_PROCESS=main in a pod where N8N_QUEUE_MODE=true. The conflict generates “Execution already in progress” errors that are hard to debug.


3. Misconfiguration #2 – Redis Service Not Reachable

Typical symptom

[2023-10-01 12:34:56] Error: connect ECONNREFUSED 10.96.0.12:6379

Common root causes

| Cause | Typical mistake |
| --- | --- |
| Service name typo | `N8N_REDIS_HOST=n8n-redis-svc` while the Service is `n8n-redis` |
| Port mismatch | Redis runs on 6380 (TLS) but the pod uses the default 6379 |
| Namespace mismatch | Redis Service lives in the `infra` namespace, the web pod in `default` |

Create a ClusterIP Redis Service (same namespace)

apiVersion: v1
kind: Service
metadata:
  name: n8n-redis
spec:
  selector:
    app: n8n-redis
  ports:
    - port: 6379
      targetPort: 6379
      protocol: TCP

Cross‑namespace reference (if Redis is elsewhere)

- name: N8N_REDIS_HOST
  value: "n8n-redis.infra.svc.cluster.local"
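Alternatively, an ExternalName Service in the pods' namespace can alias the remote Redis Service, so `N8N_REDIS_HOST` can keep the short name (a sketch, assuming Redis lives in the `infra` namespace and the n8n pods in `default`):

```yaml
# DNS alias: resolving n8n-redis in this namespace returns a
# CNAME to the Redis Service in the infra namespace.
apiVersion: v1
kind: Service
metadata:
  name: n8n-redis            # short name the n8n pods resolve
  namespace: default         # namespace of the n8n pods
spec:
  type: ExternalName
  externalName: n8n-redis.infra.svc.cluster.local
```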

EEFA tip: Enable Redis TLS in production (N8N_REDIS_TLS=true) and mount the CA cert as a secret. Reference it with N8N_REDIS_TLS_CA_CERT.


4. Misconfiguration #3 – Resource Limits Trigger OOM Kills

Why it matters

Queue workers often need more memory than the web container because they load external APIs, run heavy transformations, and keep large payloads in memory.

Recommended CPU resources

| Container | CPU request | CPU limit |
| --- | --- | --- |
| n8n-web | 100m | 250m |
| n8n-worker | 250m | 500m |

Recommended Memory resources

| Container | Memory request | Memory limit |
| --- | --- | --- |
| n8n-web | 256Mi | 512Mi |
| n8n-worker | 512Mi | 1Gi |

Worker deployment – part 1 (metadata & replicas)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: n8n-worker
spec:
  replicas: 2
  selector:
    matchLabels:
      app: n8n-worker

Worker deployment – part 2 (pod template)

  template:
    metadata:
      labels:
        app: n8n-worker
    spec:
      serviceAccountName: n8n-sa

Worker deployment – part 3 (container, env & resources)

      containers:
        - name: n8n
          image: n8nio/n8n:latest
          env:
            - name: N8N_QUEUE_MODE
              value: "true"
            - name: EXECUTIONS_PROCESS
              value: "worker"
            - name: N8N_REDIS_HOST
              value: "n8n-redis"
          resources:
            requests:
              cpu: "250m"
              memory: "512Mi"
            limits:
              cpu: "500m"
              memory: "1Gi"

HorizontalPodAutoscaler for workers

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: n8n-worker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: n8n-worker
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70

EEFA caution: A memory limit lower than the worker’s peak usage triggers OOMKilled events that appear as “CrashLoopBackOff”. Monitor container_memory_working_set_bytes in Prometheus and raise the limit before hitting the threshold.
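With the Prometheus Operator installed, an alert along these lines can fire before the limit is reached (a sketch; the 90 % threshold and the label selectors are illustrative and should match your deployment):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: n8n-worker-memory
spec:
  groups:
    - name: n8n-worker
      rules:
        - alert: N8nWorkerMemoryHigh
          # Working-set memory above 90% of the 1Gi limit for 5 minutes.
          expr: |
            container_memory_working_set_bytes{pod=~"n8n-worker-.*", container="n8n"}
              > 0.9 * 1073741824
          for: 5m
          labels:
            severity: warning
```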


5. Misconfiguration #4 – Probes Too Aggressive

Typical failure

Readiness probe failed: Get http://10.244.1.5:5678/healthz: net/http: request canceled (Client.Timeout exceeded while awaiting headers)

Reason

Queue workers need a few seconds to bootstrap the Redis client and load the execution queue. A probe that starts at 5s with a 2s timeout kills the pod before it is ready.

Proven probe configuration

livenessProbe:
  httpGet:
    path: /healthz
    port: 5678
  initialDelaySeconds: 30
  periodSeconds: 15
  timeoutSeconds: 5
readinessProbe:
  httpGet:
    path: /healthz
    port: 5678
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 5

EEFA note: If your worker image does not expose /healthz, replace the HTTP probe with a TCP check on the Redis port or an exec probe that runs pgrep -f "worker".
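Those two alternatives could look like this (sketches; adjust the host, port, and process pattern to your image):

```yaml
# Option 1: TCP probe. host defaults to the pod's own IP, so it
# is pointed at the Redis Service to verify queue connectivity.
livenessProbe:
  tcpSocket:
    host: n8n-redis
    port: 6379
  initialDelaySeconds: 30
  periodSeconds: 15

# Option 2: exec probe. Succeeds while a worker process is running.
livenessProbe:
  exec:
    command: ["pgrep", "-f", "worker"]
  initialDelaySeconds: 30
  periodSeconds: 15
```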


6. Misconfiguration #5 – Insufficient RBAC for ConfigMaps & Secrets

Error snippet

secrets "n8n-redis-secret" is forbidden: User "system:serviceaccount:default:n8n-sa" cannot get resource "secrets" in API group "" in the namespace "default"

The underlying cause is often the worker’s ServiceAccount lacking permission to read the Redis credentials stored in a Secret.

Minimal RBAC objects (split for readability)

ServiceAccount

apiVersion: v1
kind: ServiceAccount
metadata:
  name: n8n-sa

Role (granting ConfigMap & Secret read)

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: n8n-role
rules:
  - apiGroups: [""]
    resources: ["configmaps", "secrets"]
    verbs: ["get", "list"]

RoleBinding (attach role to ServiceAccount)

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: n8n-rb
subjects:
  - kind: ServiceAccount
    name: n8n-sa
roleRef:
  kind: Role
  name: n8n-role
  apiGroup: rbac.authorization.k8s.io

Attach serviceAccountName: n8n-sa to both the web and worker pods.

EEFA reminder: For clusters using PodSecurityPolicies or OPA Gatekeeper, ensure the ServiceAccount is allowed to run as UID 1000 (the default n8n UID).
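Pinning the UID explicitly in both pod specs makes this visible to such policy engines (a sketch; 1000 is the default n8n user):

```yaml
spec:
  securityContext:
    runAsUser: 1000        # default n8n UID
    runAsNonRoot: true
  containers:
    - name: n8n
      image: n8nio/n8n:latest
```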


7. Diagnostic Checklist – Quick Copy‑Paste for On‑Call Engineers

| Check | Command / Manifest | Expected result |
| --- | --- | --- |
| Queue-mode env vars set correctly | `kubectl exec -ti <web-pod> -- printenv \| grep -E 'N8N_\|EXECUTIONS_'` | `N8N_QUEUE_MODE=true` and `EXECUTIONS_PROCESS=queue` |
| Redis reachable from pod | `kubectl exec -ti <worker-pod> -- nc -zv n8n-redis 6379` | `Connection to n8n-redis 6379 port [tcp/*] succeeded!` |
| Worker pod has enough memory | `kubectl top pod <worker-pod>` | Reported memory stays below `resources.limits.memory` |
| Probes are not failing | `kubectl describe pod <worker-pod>` | No liveness/readiness failures in events |
| ServiceAccount can read the secret | `kubectl auth can-i get secret/n8n-redis-secret --as=system:serviceaccount:default:n8n-sa` | `yes` |

8. Step‑by‑Step Remediation Workflow

  1. Validate environment variables – Run the env‑check command above. Edit the Deployment if any variable is missing or wrong, then kubectl apply -f <file>.yaml.
  2. Test Redis connectivity – Use nc or redis-cli. If unreachable, verify Service name, namespace, and DNS (nslookup n8n-redis).
  3. Adjust resources – Increase resources.limits in the worker Deployment, then kubectl rollout restart deployment/n8n-worker.
  4. Tune probes – Apply the probe snippet, then kubectl apply -f <probe-file>.yaml. Wait for the pod to become Ready.
  5. Apply RBAC – Deploy the ServiceAccount, Role, and RoleBinding. Re‑attach the ServiceAccount to the pods if not already.
  6. Scale workers – Once the pod is stable, use the HPA or manually increase replicas. Verify jobs are processed via the n8n UI (Executions → Queue).

Conclusion

The most common Kubernetes deployment errors in n8n queue mode stem from misaligned environment variables, unreachable Redis, insufficient resources, over‑eager health probes, and missing RBAC. By separating the web and worker roles, configuring the correct EXECUTIONS_PROCESS, ensuring Redis connectivity, provisioning adequate CPU/memory, tuning probes, and granting the proper permissions, you create a resilient, horizontally‑scalable queue‑mode deployment that handles production workloads without unexpected restarts. Apply the checklist and remediation workflow above, monitor the pods, and the n8n queue will run smoothly in any Kubernetes environment.
