# Deploying Apache Superset on Kubernetes (Helm): From Chaos to Production

## Introduction

Deploying Apache Superset on Kubernetes using the official Helm chart appears straightforward when following the documentation. In real-world environments, however, production deployments often expose issues across multiple layers — Helm dependency resolution, container image integrity, Python runtime behavior, database connectivity, and secret management.

This article walks through a real-world failure analysis, explains the root causes, and documents the production-ready deployment that supports:

* In-cluster PostgreSQL & Redis
    
* External PostgreSQL (e.g., AWS RDS) & External Redis
    
* Optional Kubernetes Secret–based credential injection
    

The final architecture is flexible, secure, and restart-safe.

---

## 1\. Problem Statement

We attempted to deploy Apache Superset on Kubernetes using the official Helm chart.

### Target Setup

* Apache Superset (Web + Celery Worker)
    
* PostgreSQL (metadata database)
    
* Redis (Celery broker and caching)
    
* Kubernetes
    
* Helm-based deployment
    
* Custom Superset image
    
* Optional external PostgreSQL (AWS RDS)
    
* Optional external Redis
    

### Expected Outcome

* Superset UI accessible
    
* Database migrations completed successfully
    
* Celery workers start without errors
    
* Stable across restarts
    
* Secure credential handling
    

### What Actually Happened

The deployment failed at multiple stages:

* Dependency image pull failures
    
* Python module errors inside the container
    
* Runtime package installation failures
    
* SECRET\_KEY validation error
    
* Database connectivity issues
    

This was a multi-layer failure — not a single misconfiguration.

---

## 2\. Issue #1: PostgreSQL and Redis Images Not Found

### Observed Error

```bash
ImagePullBackOff
Failed to pull image
not found
```

Both PostgreSQL and Redis pods failed to start.

### Root Cause

The Helm chart referenced specific image tags that were no longer available in the container registry.

Helm does not validate tag existence.  
Kubernetes only detects the failure during image pull.

Until dependencies are healthy:

* Superset init job cannot complete
    
* Application errors remain hidden
    
* Debugging becomes misleading
    

Infrastructure must be stable before diagnosing application issues.
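
Because Helm does not check that image tags exist, a pre-flight check before installing can surface this class of failure early. The following is a minimal sketch and was not part of the original deployment. It assumes the chart has already been rendered to text (for example with `helm template`), and the helper names `extract_images` and `image_exists` are hypothetical. `image_exists` shells out to `docker manifest inspect`, so it also assumes a Docker CLI with access to the registry.

```python
import re
import subprocess

# Matches lines such as `image: docker.io/library/redis:7.2` or `- image: "x:y"`.
IMAGE_LINE = re.compile(
    r"""^[ \t]*(?:-[ \t]*)?image:[ \t]*["']?([^"'\s]+)["']?[ \t]*$""",
    re.MULTILINE,
)


def extract_images(rendered_manifests: str) -> list[str]:
    """Return the unique image references found in rendered Kubernetes YAML."""
    return sorted(set(IMAGE_LINE.findall(rendered_manifests)))


def image_exists(image: str) -> bool:
    """Ask the registry whether the image manifest exists (needs the docker CLI)."""
    result = subprocess.run(
        ["docker", "manifest", "inspect", image],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def find_missing_images(rendered_manifests: str) -> list[str]:
    """Return every referenced image that the registry does not serve."""
    return [img for img in extract_images(rendered_manifests) if not image_exists(img)]
```

Feeding the rendered chart into `find_missing_images` before running `helm upgrade --install` would report missing tags up front, rather than leaving them to appear later as `ImagePullBackOff`.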

---

## 3\. Fix #1: Diagnostic Use of `latest`

To confirm whether the issue was caused by deprecated image tags or application logic, dependency images were temporarily switched to `latest`.

```yaml
postgresql:
  image:
    tag: latest

redis:
  image:
    tag: latest
```

This confirmed:

* The Helm chart’s default tags were deprecated.
    
* The infrastructure was blocking deployment.
    
* Superset itself was not the initial issue.
    

⚠ The `latest` tag was used only for diagnostics.  
In production environments, pinned image versions are recommended for deterministic deployments.
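
One way to keep a diagnostic `latest` from leaking into production is a small check in CI. This is a sketch under assumptions: the function name `is_pinned` is hypothetical, and the image references in the usage example are placeholders rather than the versions used in this deployment.

```python
def is_pinned(image: str) -> bool:
    """Return True if the reference names a digest or a tag other than `latest`."""
    if "@" in image:  # e.g. redis@sha256:... is pinned by digest
        return True
    last_segment = image.rsplit("/", 1)[-1]  # ignore any registry host:port prefix
    if ":" not in last_segment:
        return False  # no tag means the runtime defaults to `latest`
    tag = last_segment.rsplit(":", 1)[1]
    return tag != "latest"


# Placeholder references for illustration only.
for ref in ("postgres:16.3", "postgres:latest", "localhost:5000/redis"):
    print(ref, "pinned" if is_pinned(ref) else "NOT pinned")
```

Only the last path segment is checked for a tag, so a registry port such as `localhost:5000` is not mistaken for one.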

Once dependencies were running, the real application error surfaced.

---

## 4\. Issue #2: psycopg2 Module Missing

Superset failed with:

```bash
ModuleNotFoundError: No module named 'psycopg2'
```

This affected:

* Superset Web pod
    
* Superset Worker pod
    
* Superset Init DB job
    

---

## 5\. Why This Breaks Superset

Superset requires a metadata database.

Dependency chain:

```bash
Superset → SQLAlchemy → psycopg2 → PostgreSQL
```

If psycopg2 is missing:

* Superset cannot start
    
* Database migrations fail
    
* Celery workers fail
    
* No fallback mode exists
    

---

## 6\. Why Runtime Installation Failed

Attempts included:

* `extraPipPackages`
    
* `bootstrapScript`
    
* Installing packages inside running pods
    
* Init container installation
    

All failed.

### Root Cause

The official Superset image runs inside a prebuilt Python virtual environment:

```bash
/app/.venv/
```

Key details:

* Superset executes strictly inside this environment.
    
* Runtime installations either failed outright or placed packages outside the active environment.
    
* Patching a running container also breaks immutability: anything installed this way is lost when the pod restarts.
    

Even when psycopg2 appeared installed, it was outside Superset’s active virtual environment — making it effectively unusable.
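
The mismatch between "installed" and "importable" can be made visible with a short diagnostic. This is a sketch rather than the procedure used in the original debugging session. The helper name `describe_module` is hypothetical and it uses only the standard library.

```python
import importlib.util
import sys


def describe_module(name: str) -> dict:
    """Report whether `name` is importable by the interpreter running this code."""
    spec = importlib.util.find_spec(name)
    return {
        "interpreter": sys.executable,
        "prefix": sys.prefix,
        # In a virtual environment sys.prefix differs from sys.base_prefix.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        "found": spec is not None,
        "location": getattr(spec, "origin", None),
    }


print(describe_module("psycopg2"))
```

Running the same script once with `/app/.venv/bin/python` and once with whatever `python` resolves to on the pod's `PATH` shows whether the two interpreters see the same packages.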

---

## 7\. Correct Fix: Build a Custom Immutable Superset Image

Database drivers must be installed at image build time.

### Dockerfile Used

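```dockerfile
# Assumption: the chosen tag ships a virtual environment at /app/.venv; verify this for your tag.
FROM apachesuperset.docker.scarf.sh/apache/superset:3.0.0

# Root is needed to install system packages.
USER root

# libpq-dev and gcc are required to compile psycopg2 from source.
# Install the driver with the venv's own interpreter so Superset can import it.
RUN apt-get update \
 && apt-get install -y --no-install-recommends libpq-dev gcc \
 && rm -rf /var/lib/apt/lists/* \
 && /app/.venv/bin/python -m ensurepip --upgrade \
 && /app/.venv/bin/python -m pip install --no-cache-dir psycopg2==2.9.9

# Drop privileges again for runtime.
USER superset
```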

### Why This Works

* Installs psycopg2 inside Superset’s active virtual environment
    
* Immutable and reproducible
    
* Restart-safe
    
* Production aligned
    

---

## 8\. Flexible Credential Management

The Helm chart accepts database and Redis credentials in more than one way.

### Option A – Directly in Helm Values (Testing Only)

```yaml
supersetNode:
  connections:
    db_type: postgresql
    db_host: my-db-endpoint
    db_port: "5432"
    db_user: superset
    db_pass: superset123
    db_name: superset
```

Suitable for:

* Local testing
    
* Temporary debugging
    
* Learning environments
    

⚠ Credentials are stored in plaintext in `values.yaml`.

---

### Option B – Kubernetes Secret Injection (Recommended)

Instead of being stored in Helm values, credentials can be injected from a Kubernetes Secret at runtime.

### Create Secret

```bash
kubectl create secret generic superset-backend-secret \
  --from-literal=DB_HOST=<db-endpoint> \
  --from-literal=DB_PORT=5432 \
  --from-literal=DB_USER=<db-user> \
  --from-literal=DB_PASSWORD=<db-password> \
  --from-literal=DB_NAME=<db-name> \
  --from-literal=REDIS_HOST=<redis-endpoint> \
  --from-literal=REDIS_PORT=6379
```

### Reference Secret in Helm Values

```yaml
envFromSecrets:
  - superset-backend-secret
```

Superset connections then use environment variables:

```yaml
supersetNode:
  connections:
    db_type: postgresql
    db_host: "$(DB_HOST)"
    db_port: "$(DB_PORT)"
    db_user: "$(DB_USER)"
    db_pass: "$(DB_PASSWORD)"
    db_name: "$(DB_NAME)"
    redis_host: "$(REDIS_HOST)"
    redis_port: "$(REDIS_PORT)"
```

Benefits:

* No plaintext credentials in Git
    
* Secure runtime injection
    
* Easier rotation
    
* Environment portability
    

Using Kubernetes Secrets is optional but strongly recommended for production.
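
It helps to see what `kubectl create secret` actually stores. The sketch below builds an equivalent Secret manifest by hand. The helper name `build_secret` and the host value are placeholders. Values under `data` are base64-encoded, which is an encoding and not encryption, so access control on the Secret and encryption at rest still matter.

```python
import base64
import json


def build_secret(name: str, values: dict[str, str]) -> dict:
    """Build a Kubernetes Secret manifest; `data` values must be base64-encoded."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "type": "Opaque",
        "data": {
            key: base64.b64encode(value.encode("utf-8")).decode("ascii")
            for key, value in values.items()
        },
    }


manifest = build_secret(
    "superset-backend-secret",
    {"DB_HOST": "db.example.internal", "DB_PORT": "5432"},  # placeholder values
)
print(json.dumps(manifest, indent=2))
```

The printed JSON can be applied with `kubectl apply -f`. Anyone who can read the Secret can decode it, which is why the manifest itself should not be committed to Git either.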

---

## 9\. Database & Redis Architecture Options

The deployment supports two architectural modes for PostgreSQL and Redis.

---

### Option 1 – In-Cluster PostgreSQL & Redis

Enable Helm-managed dependencies:

```yaml
postgresql:
  enabled: true

redis:
  enabled: true
```

Best for:

* Development
    
* Testing
    
* Small internal tools
    

Pros:

* Simple
    
* Self-contained
    

Cons:

* You manage backups
    
* You manage scaling
    
* Higher operational overhead
    

---

### Option 2 – External PostgreSQL & Redis (Optional)

Disable internal services:

```yaml
postgresql:
  enabled: false

redis:
  enabled: false
```

Best for:

* Production
    
* High availability needs
    
* Managed backups
    
* Reduced operational risk
    

Pros:

* Managed durability
    
* Better reliability
    
* Clear stateless/stateful separation
    

External services are optional — the deployment remains flexible.

> The final production architecture is designed to support both Helm-managed in-cluster stateful services and externally managed database/cache services (such as AWS RDS and ElastiCache), ensuring operational flexibility and scalability across environments.

---

## 10\. Enforcing SSL for Database Connections

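The following goes into `superset_config.py`, for example through the chart's `configOverrides` value:

```python
import os
from urllib.parse import quote_plus

# Percent-encode credentials so characters such as "@" or "/" cannot break the URI.
db_user = quote_plus(os.environ["DB_USER"])
db_password = quote_plus(os.environ["DB_PASSWORD"])

SQLALCHEMY_DATABASE_URI = (
    f"postgresql+psycopg2://{db_user}:{db_password}"
    f"@{os.environ['DB_HOST']}:{os.environ['DB_PORT']}/{os.environ['DB_NAME']}"
    "?sslmode=require"  # refuse unencrypted connections
)
```

`sslmode=require` forces an encrypted connection to PostgreSQL. It does not verify the server certificate; use `sslmode=verify-full` with a CA bundle if that is required.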
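
The configuration above reads five environment variables, and a missing one surfaces only as a bare `KeyError` at startup. A fail-fast check with a clearer message could look like the sketch below. The names `missing_vars` and `require_db_env` are hypothetical and were not part of the original configuration.

```python
import os

REQUIRED_VARS = ("DB_USER", "DB_PASSWORD", "DB_HOST", "DB_PORT", "DB_NAME")


def missing_vars(environ, required=REQUIRED_VARS) -> list[str]:
    """Return the names in `required` that are absent or empty in `environ`."""
    return [name for name in required if not environ.get(name)]


def require_db_env(environ=os.environ) -> None:
    """Raise a descriptive error instead of a bare KeyError."""
    missing = missing_vars(environ)
    if missing:
        raise RuntimeError("Missing database settings: " + ", ".join(missing))
```

Calling `require_db_env()` at the top of `superset_config.py` would make a misnamed Secret key show up directly in the pod logs.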

---

## 11\. Startup Readiness Handling

Init containers wait for the database and Redis before Superset starts:

```yaml
command:
  - dockerize
  - -wait
  - tcp://$(DB_HOST):$(DB_PORT)
  - -wait
  - tcp://$(REDIS_HOST):$(REDIS_PORT)
  - -timeout
  - 120s
```

Prevents:

* CrashLoopBackOff
    
* Early DB connection failures
    
* Celery startup issues
    
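The `dockerize -wait` check amounts to retrying a TCP connection until it succeeds or a deadline passes. The sketch below shows that logic in plain Python for images that do not ship `dockerize`. The function name `wait_for_tcp` is hypothetical, and this is an illustration of the technique, not the code `dockerize` runs.

```python
import socket
import time


def wait_for_tcp(host: str, port: int, timeout: float = 120.0, interval: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            # Sleep between attempts, but never past the deadline.
            time.sleep(min(interval, max(0.0, deadline - time.monotonic())))
```

A successful connect only proves the port is open. It does not prove that PostgreSQL accepts the credentials or that migrations have run, so it complements Kubernetes readiness probes and does not replace them.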

---

## 12\. Secure SUPERSET\_SECRET\_KEY

```yaml
extraSecretEnv:
  SUPERSET_SECRET_KEY: <strong-random-secret>
```

Superset refuses to start in production mode if `SECRET_KEY` is left at its insecure default, so a strong random value must be supplied.
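
A suitable value can be generated with any cryptographically secure random source. The sketch below uses Python's standard `secrets` module. The function name `generate_secret_key` and the 42-byte length are illustrative choices.

```python
import secrets


def generate_secret_key(num_bytes: int = 42) -> str:
    """Return a URL-safe random string built from `num_bytes` of OS randomness."""
    return secrets.token_urlsafe(num_bytes)


print(generate_secret_key())
```

Generate the key once and store it in a Secret. Changing it later without migrating means existing sessions and encrypted connection passwords in the metadata database can no longer be read.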

---

## 13\. Final Deployment

```bash
helm upgrade --install superset apache/superset \
  -f values.yaml \
  --namespace superset \
  --create-namespace
```

---

## 14\. Final Root Cause Summary

Deployment failed due to:

* Deprecated dependency image tags
    
* Missing psycopg2 driver in container
    
* Runtime package installation incompatible with Superset’s virtual environment
    
* Missing secure SECRET\_KEY
    

Resolution involved:

* Diagnosing infrastructure image failures
    
* Building a custom immutable Superset image
    
* Securely injecting credentials
    
* Supporting flexible DB/Redis architecture
    
* Enforcing SSL
    
* Implementing readiness checks
    

---

## 15\. 30-Second Summary

Apache Superset initially failed due to deprecated dependency image tags and a missing PostgreSQL driver inside the container. Runtime installation failed because the official Superset image runs inside a prebuilt Python virtual environment, making post-start package installation ineffective. The issue was resolved by building a custom immutable image with psycopg2 installed at build time, securely managing credentials, and supporting both in-cluster and external database/Redis architectures. The final deployment is stable, secure, and production-ready.

**Keywords:**  
Apache Superset Kubernetes,  
Superset Helm Chart,  
Superset Production Deployment,  
psycopg2 error in Superset,  
Kubernetes ImagePullBackOff,  
Superset with AWS RDS,  
Superset External PostgreSQL,  
Superset Redis configuration
