Deploying Apache Superset on Kubernetes (Helm): From Chaos to Production

Real-world failure analysis, custom image build, and secure production deployment with flexible DB architecture.


Introduction

Deploying Apache Superset on Kubernetes using the official Helm chart appears straightforward when following the documentation. In real-world environments, however, production deployments often expose issues across multiple layers — Helm dependency resolution, container image integrity, Python runtime behavior, database connectivity, and secret management.

This article walks through a real-world failure analysis, explains the root causes, and documents the production-ready deployment that supports:

  • In-cluster PostgreSQL & Redis

  • External PostgreSQL (e.g., AWS RDS) & External Redis

  • Optional Kubernetes Secret–based credential injection

The final architecture is flexible, secure, and restart-safe.


1. Problem Statement

We attempted to deploy Apache Superset on Kubernetes using the official Helm chart.

Target Setup

  • Apache Superset (Web + Celery Worker)

  • PostgreSQL (metadata database)

  • Redis (Celery broker and caching)

  • Kubernetes

  • Helm-based deployment

  • Custom Superset image

  • Optional external PostgreSQL (AWS RDS)

  • Optional external Redis

Expected Outcome

  • Superset UI accessible

  • Database migrations completed successfully

  • Celery workers start without errors

  • Stable across restarts

  • Secure credential handling

What Actually Happened

The deployment failed at multiple stages:

  • Dependency image pull failures

  • Python module errors inside the container

  • Runtime package installation failures

  • SECRET_KEY validation error

  • Database connectivity issues

This was a multi-layer failure — not a single misconfiguration.


2. Issue #1: PostgreSQL and Redis Images Not Found

Observed Error

ImagePullBackOff
Failed to pull image
not found

Both PostgreSQL and Redis pods failed to start.

Root Cause

The Helm chart referenced specific image tags that were no longer available in the container registry.

Helm does not validate tag existence.
Kubernetes only detects the failure during image pull.

Until dependencies are healthy:

  • Superset init job cannot complete

  • Application errors remain hidden

  • Debugging becomes misleading

Infrastructure must be stable before diagnosing application issues.
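Because neither Helm nor the chart checks that a tag exists, it can help to list every image the rendered manifests reference before installing. The following is a minimal sketch under stated assumptions: it scans plain manifest text (for example the output of helm template) with a regular expression rather than a YAML parser, and extract_images and is_pinned are hypothetical helper names. It does not contact a registry; it only produces the list of references to verify and flags those without a fixed tag.

```python
import re

# Matches lines such as `image: repo/name:tag` or `- image: "repo/name:tag"`.
IMAGE_LINE = re.compile(
    r"^[ \t]*(?:-[ \t]*)?image:[ \t]*[\"']?([^\s\"']+)[\"']?[ \t]*$",
    re.MULTILINE,
)


def extract_images(manifest_text: str) -> list[str]:
    """Return the unique image references in rendered manifests, in order."""
    seen: list[str] = []
    for ref in IMAGE_LINE.findall(manifest_text):
        if ref not in seen:
            seen.append(ref)
    return seen


def is_pinned(image_ref: str) -> bool:
    """True when the reference has a digest or an explicit tag other than latest."""
    if "@sha256:" in image_ref:
        return True
    # Only the last path segment can carry a tag; earlier colons are registry ports.
    last_part = image_ref.rsplit("/", 1)[-1]
    if ":" not in last_part:
        return False
    return last_part.rsplit(":", 1)[1] != "latest"
```

Feeding the rendered manifests into extract_images yields the references to pull or look up in the registry by hand, and any reference for which is_pinned returns False is a candidate for an explicit version.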


3. Fix #1: Diagnostic Use of latest

To confirm whether the issue was caused by deprecated image tags or application logic, dependency images were temporarily switched to latest.

postgresql:
  image:
    tag: latest

redis:
  image:
    tag: latest

This confirmed:

  • The Helm chart’s default tags were deprecated.

  • The infrastructure was blocking deployment.

  • Superset itself was not the initial issue.

⚠ The latest tag was used only for diagnostics.
In production environments, pinned image versions are recommended for deterministic deployments.

Once dependencies were running, the real application error surfaced.


4. Issue #2: psycopg2 Module Missing

Superset failed with:

ModuleNotFoundError: No module named 'psycopg2'

This affected:

  • Superset Web pod

  • Superset Worker pod

  • Superset Init DB job


5. Why This Breaks Superset

Superset requires a metadata database.

Dependency chain:

Superset → SQLAlchemy → psycopg2 → PostgreSQL

If psycopg2 is missing:

  • Superset cannot start

  • Database migrations fail

  • Celery workers fail

  • No fallback mode exists
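The chain above can be checked before Superset starts. The following is a minimal sketch, not part of Superset or the chart: driver_module and driver_available are hypothetical helper names, and the fallback to psycopg2 for a bare postgresql:// URI reflects SQLAlchemy's default PostgreSQL driver.

```python
from importlib.util import find_spec

# SQLAlchemy falls back to these drivers when the URI names only the dialect.
DEFAULT_DRIVERS = {"postgresql": "psycopg2"}


def driver_module(uri: str) -> str:
    """Return the Python module a SQLAlchemy-style URI depends on."""
    scheme = uri.split("://", 1)[0]  # e.g. "postgresql+psycopg2"
    dialect, _, driver = scheme.partition("+")
    return driver or DEFAULT_DRIVERS.get(dialect, dialect)


def driver_available(uri: str) -> bool:
    """True when the interpreter running this code can locate the driver."""
    return find_spec(driver_module(uri)) is not None


if __name__ == "__main__":
    uri = "postgresql+psycopg2://superset@db:5432/superset"
    print(driver_module(uri), "available:", driver_available(uri))
```

The result only describes the interpreter that executes the check, so it is meaningful only when run with the same Python that launches Superset.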


6. Why Runtime Installation Failed

Attempts included:

  • extraPipPackages

  • bootstrapScript

  • Installing packages inside running pods

  • Init container installation

All failed.

Root Cause

The official Superset image runs inside a prebuilt Python virtual environment:

/app/.venv/

Key details:

  • Superset executes strictly inside this environment.

  • Runtime installations either failed outright or placed packages outside the active environment.

  • Installing packages into a running container also violates container immutability: the change is lost on every restart.

Even when psycopg2 appeared installed, it was outside Superset’s active virtual environment — making it effectively unusable.
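One way to see this mismatch is to ask a given interpreter where it is running from and where it would load a module from. The following is a minimal diagnostic sketch using only the standard library; describe_environment is a hypothetical helper name, and the comparison of sys.prefix with sys.base_prefix is the usual way to detect a virtual environment.

```python
import sys
from importlib.util import find_spec


def describe_environment(module_name: str) -> dict:
    """Report which interpreter is running and where it would load a module from."""
    spec = find_spec(module_name)
    return {
        "executable": sys.executable,  # interpreter actually running
        "prefix": sys.prefix,  # root of the environment in use
        "in_virtualenv": sys.prefix != sys.base_prefix,
        "module_found": spec is not None,
        "module_origin": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    for key, value in describe_environment("psycopg2").items():
        print(f"{key}: {value}")
```

Running the script once with /app/.venv/bin/python and once with whichever python a runtime install used would show whether the two interpreters resolve psycopg2 from the same place, or whether one of them cannot find it at all.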


7. Correct Fix: Build a Custom Immutable Superset Image

Database drivers must be installed at image build time.

Dockerfile Used

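FROM apachesuperset.docker.scarf.sh/apache/superset:3.0.0

# Root is needed only for the system packages and the driver build.
USER root

# Install the driver with the virtual environment's own interpreter so it
# lands in /app/.venv (this assumes the pinned base tag ships /app/.venv).
RUN apt-get update && apt-get install -y libpq-dev gcc \
 && /app/.venv/bin/python -m ensurepip --upgrade \
 && /app/.venv/bin/python -m pip install --no-cache-dir psycopg2==2.9.9 \
 && rm -rf /var/lib/apt/lists/*

# Drop back to the unprivileged runtime user.
USER superset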

Why This Works

  • Installs psycopg2 inside Superset’s active virtual environment

  • Immutable and reproducible

  • Restart-safe

  • Production aligned


8. Flexible Credential Management

The deployment accepts database and Redis credentials in two ways: directly in Helm values, or injected from a Kubernetes Secret.

Option A – Directly in Helm Values (Testing Only)

supersetNode:
  connections:
    db_type: postgresql
    db_host: my-db-endpoint
    db_port: "5432"
    db_user: superset
    db_pass: superset123
    db_name: superset

Suitable for:

  • Local testing

  • Temporary debugging

  • Learning environments

⚠ Credentials stored in plaintext.


Option B – Kubernetes Secret (Recommended for Production)

Instead of storing credentials in Helm values, inject them at runtime from a Kubernetes Secret.

Create Secret

kubectl create secret generic superset-backend-secret \
  --from-literal=DB_HOST=<db-endpoint> \
  --from-literal=DB_PORT=5432 \
  --from-literal=DB_USER=<db-user> \
  --from-literal=DB_PASSWORD=<db-password> \
  --from-literal=DB_NAME=<db-name> \
  --from-literal=REDIS_HOST=<redis-endpoint> \
  --from-literal=REDIS_PORT=6379

Reference Secret in Helm Values

envFromSecrets:
  - superset-backend-secret

Superset connections then use environment variables:

supersetNode:
  connections:
    db_type: postgresql
    db_host: "$(DB_HOST)"
    db_port: "$(DB_PORT)"
    db_user: "$(DB_USER)"
    db_pass: "$(DB_PASSWORD)"
    db_name: "$(DB_NAME)"
    redis_host: "$(REDIS_HOST)"
    redis_port: "$(REDIS_PORT)"

Benefits:

  • No plaintext credentials in Git

  • Secure runtime injection

  • Easier rotation

  • Environment portability

Using Kubernetes Secrets is optional but strongly recommended for production.
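With secret-based injection, a missing key or a misspelled Secret name tends to surface late, as a connection error. A fail-fast check at startup makes the cause explicit. The following is a minimal sketch: missing_settings and check_or_raise are hypothetical helper names, the variable list mirrors the keys of the Secret created above, and treating a value that still starts with "$(" as missing is an assumption based on Kubernetes leaving unresolved $(VAR) references unchanged.

```python
import os

REQUIRED_VARS = (
    "DB_HOST", "DB_PORT", "DB_USER", "DB_PASSWORD", "DB_NAME",
    "REDIS_HOST", "REDIS_PORT",
)


def missing_settings(environ=os.environ, required=REQUIRED_VARS) -> list[str]:
    """Return required variables that are unset, empty, or left unexpanded."""
    missing = []
    for name in required:
        value = environ.get(name, "")
        # An unexpanded "$(NAME)" placeholder means the reference never resolved.
        if not value or value.startswith("$("):
            missing.append(name)
    return missing


def check_or_raise(environ=os.environ) -> None:
    """Raise a descriptive error when any required setting is unusable."""
    missing = missing_settings(environ)
    if missing:
        raise RuntimeError("Missing connection settings: " + ", ".join(missing))
```

Calling check_or_raise() before the connection settings are read turns a vague startup failure into a message that names the exact variables to fix.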


9. Database & Redis Architecture Options

The deployment supports two architectural modes for PostgreSQL and Redis.


Option 1 – In-Cluster PostgreSQL & Redis

Enable Helm-managed dependencies:

postgresql:
  enabled: true

redis:
  enabled: true

Best for:

  • Development

  • Testing

  • Small internal tools

Pros:

  • Simple

  • Self-contained

Cons:

  • You manage backups

  • You manage scaling

  • Higher operational overhead


Option 2 – External PostgreSQL & Redis (Optional)

Disable internal services:

postgresql:
  enabled: false

redis:
  enabled: false

Best for:

  • Production

  • High availability needs

  • Managed backups

  • Reduced operational risk

Pros:

  • Managed durability

  • Better reliability

  • Clear stateless/stateful separation

External services are optional — the deployment remains flexible.

The final production architecture is designed to support both Helm-managed in-cluster stateful services and externally managed database/cache services (such as AWS RDS and ElastiCache), ensuring operational flexibility and scalability across environments.


10. Enforcing SSL for Database Connections

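In superset_config.py (for example through the chart's configOverrides), build the metadata database URI from the injected environment variables and require SSL:

import os
from urllib.parse import quote_plus

# URL-encode credentials so characters such as "@" or "/" cannot break the URI.
db_user = quote_plus(os.environ["DB_USER"])
db_password = quote_plus(os.environ["DB_PASSWORD"])

SQLALCHEMY_DATABASE_URI = (
    f"postgresql+psycopg2://{db_user}:{db_password}"
    f"@{os.environ['DB_HOST']}:{os.environ['DB_PORT']}/{os.environ['DB_NAME']}"
    "?sslmode=require"  # refuse unencrypted connections
)

sslmode=require ensures the connection to PostgreSQL is encrypted. It does not verify the server certificate; sslmode=verify-full with a trusted CA certificate is needed for that.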


11. Startup Readiness Handling

Init containers block pod startup until PostgreSQL and Redis accept TCP connections:

command:
  - dockerize
  - -wait
  - tcp://$(DB_HOST):$(DB_PORT)
  - -wait
  - tcp://$(REDIS_HOST):$(REDIS_PORT)
  - -timeout
  - 120s

Prevents:

  • CrashLoopBackOff

  • Early DB connection failures

  • Celery startup issues
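The wait performed by the init container amounts to retrying a TCP connection until it succeeds or a deadline passes. The following is a minimal sketch of that logic using only the standard library, not the dockerize implementation; wait_for_tcp and its interval parameter are hypothetical, and the 120-second default mirrors the timeout used above.

```python
import socket
import time


def wait_for_tcp(host: str, port: int, timeout: float = 120.0,
                 interval: float = 1.0) -> bool:
    """Retry a TCP connection until it succeeds or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        try:
            # Never wait on a single attempt longer than the time left.
            with socket.create_connection((host, port),
                                          timeout=min(interval, remaining)):
                return True
        except OSError:
            # Refused or timed out: pause briefly, then try again.
            time.sleep(min(interval, max(0.0, deadline - time.monotonic())))
```

A successful connection only shows that the port is open. It does not confirm that PostgreSQL accepts the credentials or that migrations have run, so the application should still handle connection errors.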


12. Secure SUPERSET_SECRET_KEY

extraSecretEnv:
  SUPERSET_SECRET_KEY: <strong-random-secret>

Superset refuses to start without a secure secret key.
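The placeholder above needs a long, unpredictable value. One way to produce it is Python's secrets module, which draws from the operating system's random source. This is a minimal sketch: generate_secret_key is a hypothetical helper name, and 42 random bytes is an assumed length, not a documented requirement.

```python
import secrets


def generate_secret_key(num_bytes: int = 42) -> str:
    """Return a URL-safe random string built from num_bytes of OS randomness."""
    return secrets.token_urlsafe(num_bytes)


if __name__ == "__main__":
    print(generate_secret_key())
```

Generate the key once and store it in a Secret or a secrets manager. Superset uses this key to sign sessions and encrypt stored credentials, so regenerating it on every deployment would invalidate them.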


13. Final Deployment

helm upgrade --install superset apache/superset \
  -f values.yaml \
  --namespace superset \
  --create-namespace

14. Final Root Cause Summary

Deployment failed due to:

  • Deprecated dependency image tags

  • Missing psycopg2 driver in container

  • Runtime package installation incompatible with Superset’s virtual environment

  • Missing secure SECRET_KEY

Resolution involved:

  • Diagnosing infrastructure image failures

  • Building a custom immutable Superset image

  • Securely injecting credentials

  • Supporting flexible DB/Redis architecture

  • Enforcing SSL

  • Implementing readiness checks


15. 30-Second Summary

Apache Superset initially failed due to deprecated dependency image tags and a missing PostgreSQL driver inside the container. Runtime installation failed because the official Superset image runs inside a prebuilt Python virtual environment, making post-start package installation ineffective. The issue was resolved by building a custom immutable image with psycopg2 installed at build time, securely managing credentials, and supporting both in-cluster and external database/Redis architectures. The final deployment is stable, secure, and production-ready.

Keywords:
Apache Superset Kubernetes,
Superset Helm Chart,
Superset Production Deployment,
psycopg2 error in Superset,
Kubernetes ImagePullBackOff,
Superset with AWS RDS,
Superset External PostgreSQL,
Superset Redis configuration

Ops Migration Playbooks

Part 6 of 6

Real production migration and incident playbooks focused on safe execution, root cause analysis, and rollback-first DevOps practices. Each post documents how real production issues were handled and fixed without downtime.
