From Provisioning to Control Plane: Designing a Hybrid Terraform + Crossplane Architecture at Scale

1. Overview
What I Designed
I designed a hybrid infrastructure architecture:
Terraform → Foundation Layer
Crossplane → Dynamic Lifecycle Layer
ArgoCD → GitOps Enforcement
This created a continuously reconciling cloud control plane inside Kubernetes.
Why It Was Required
Our platform had scaled past:
50+ microservices
Multiple engineering teams
Multi-region expansion
PR-driven infrastructure workflows
Feature branch–based short-lived environments
Terraform workflows became operationally slow due to:
State contention
Long plan times
PR bottlenecks
Manual drift detection
Provisioning was working.
Lifecycle control was missing.
Constraint
No full rewrite
No infrastructure instability
Zero data loss
Minimal migration risk
Engineering Principle Followed
Separate provisioning from lifecycle reconciliation.
Provision once.
Reconcile continuously.
2. Problem Statement
Existing Architecture
Terraform model:
Plan → Apply → Exit
After apply:
No continuous reconciliation
Drift detection only on next plan
Manual console changes remain undetected
Failure Risk
If an engineer modified:
RDS storage encryption
Deletion protection
Security groups
IAM policies
Terraform would not react until the next plan/apply cycle.
Drift became a silent operational risk.
What Would Break
Compliance posture
Backup guarantees
Encryption enforcement
Network boundaries
Incident recovery confidence
Why It Was Unacceptable
At scale:
Manual governance does not work.
Infrastructure must enforce its declared state.
3. Architecture After Implementation
Control plane flow:
Developer commits YAML
↓
ArgoCD syncs to cluster
↓
Kubernetes API stores desired state
↓
Crossplane controller watches resource
↓
Crossplane calls AWS API
↓
Cloud resource created/updated
↑
Continuous reconciliation loop
Terraform foundation layer:
Terraform
↓
VPC
Subnets
EKS Control Plane
Core Networking
Clear separation of responsibilities.
4. Design Decisions
4.1 Core Component Choice
Terraform for foundation
Why I chose it:
Mature state handling
Strong bootstrap ecosystem
Clear isolation of foundational infrastructure
Multi-account governance was enforced via separate state isolation and account-factory patterns; Terraform itself does not natively provide org-level governance.
Trade-off:
- No continuous reconciliation
Risk accepted:
- Foundation changes are rare and tightly controlled
Crossplane for dynamic infrastructure
Why I chose it:
Kubernetes-native control loop
GitOps-friendly
CRD-based lifecycle management
Trade-off:
Adds API server load
Adds controller complexity
Risk accepted:
- Infrastructure lifecycle now depends on cluster health
4.2 Failure Detection
Reconciliation loop ensures:
Actual State == Desired State
Manual console change → Crossplane reconciles.
Trade-off:
AWS API throttling possible
Eventual consistency delays
Mitigation:
- Tuned provider/controller concurrency and AWS API backoff settings to reduce throttling
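The concurrency and backoff tuning mentioned above can be expressed as a ControllerConfig referenced by the Provider. A minimal sketch, assuming the installed provider version supports `--max-reconcile-rate` and `--poll` flags (flag names vary across provider releases, so verify against your provider's documentation):

```yaml
apiVersion: pkg.crossplane.io/v1alpha1
kind: ControllerConfig
metadata:
  name: aws-throttle-tuning
spec:
  args:
    - --max-reconcile-rate=5   # cap concurrent reconciles to stay under AWS API limits
    - --poll=10m               # lengthen the resync period to reduce steady-state call volume
---
apiVersion: pkg.crossplane.io/v1
kind: Provider
metadata:
  name: provider-aws
spec:
  package: xpkg.upbound.io/crossplane-contrib/provider-aws:v0.54.2
  controllerConfigRef:
    name: aws-throttle-tuning  # attach the tuning to the provider pod
```

Lower reconcile rates trade detection latency for API headroom; drift is still corrected, just on a longer loop.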
4.3 Event Routing
Git Commit
→ ArgoCD
→ Kubernetes API
→ Crossplane Controller
→ AWS API
Rollback = git revert.
Trade-off:
- Git becomes critical dependency
Mitigation:
- Strong repository governance and PR controls
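The routing above is wired together by an ArgoCD Application that watches the infrastructure repository and syncs claims into the cluster. A minimal sketch — the repository URL, path, and namespace are hypothetical placeholders:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: platform-infrastructure
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.com/platform/infrastructure.git  # hypothetical repo
    targetRevision: main
    path: claims/
  destination:
    server: https://kubernetes.default.svc
    namespace: platform
  syncPolicy:
    automated:
      prune: true     # removing a claim from Git deletes the corresponding CR
      selfHeal: true  # manual kubectl edits are reverted to the Git-declared state
```

With `selfHeal` and `prune` enabled, `git revert` is a complete rollback mechanism: ArgoCD restores the previous manifests and Crossplane reconciles the cloud back to them.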
4.4 Automation Logic
Platform team defined Compositions.
Example:
apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
  name: xpostgres
spec:
  compositeTypeRef:
    apiVersion: platform.io/v1alpha1
    kind: XPostgres
  resources:
    - name: database
      base:
        apiVersion: database.aws.crossplane.io/v1beta1
        kind: RDSInstance
        spec:
          deletionPolicy: Orphan
          forProvider:
            storageEncrypted: true
            deletionProtection: true
Application team used Claim:
apiVersion: platform.io/v1alpha1
kind: PostgresClaim
metadata:
  name: app-db
spec:
  parameters:
    storage: 20
Why I chose this:
Central policy enforcement
Developer abstraction
Clear ownership boundary
Trade-off:
Composition update blast radius
Requires versioning discipline
Mitigation:
- Versioned compositions per environment
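The Composition and Claim above both depend on a CompositeResourceDefinition (XRD) that registers the XPostgres composite type and its claim kind. A minimal sketch consistent with the names used above — the schema is illustrative and would need the full parameter surface in practice:

```yaml
apiVersion: apiextensions.crossplane.io/v1
kind: CompositeResourceDefinition
metadata:
  name: xpostgres.platform.io  # must be <plural>.<group>
spec:
  group: platform.io
  names:
    kind: XPostgres
    plural: xpostgres
  claimNames:
    kind: PostgresClaim        # namespaced type application teams use
    plural: postgresclaims
  versions:
    - name: v1alpha1
      served: true
      referenceable: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                parameters:
                  type: object
                  properties:
                    storage:
                      type: integer  # maps to allocatedStorage in the Composition
```

The XRD is the contract boundary: the platform team owns the XRD and Composition, application teams only ever see the claim schema.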
4.5 DNS / Networking
DNS and core networking remained Terraform-managed.
Reason:
High blast radius
Low change frequency
Complex dependency graph
Control plane expansion was phased deliberately.
5. Implementation Snippet
Install Crossplane:
helm repo add crossplane-stable https://charts.crossplane.io/stable
helm install crossplane crossplane-stable/crossplane \
  --namespace crossplane-system \
  --create-namespace
Install AWS Provider:
apiVersion: pkg.crossplane.io/v1
kind: Provider
metadata:
  name: provider-aws
spec:
  package: xpkg.upbound.io/crossplane-contrib/provider-aws:v0.54.2
Configure ProviderConfig (IRSA recommended):
apiVersion: aws.crossplane.io/v1beta1
kind: ProviderConfig
metadata:
  name: aws
spec:
  credentials:
    source: IRSA
ProviderConfig was configured using IRSA to avoid static credentials.
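IRSA works by annotating the provider pod's service account with an IAM role ARN. One common pattern is to carry that annotation on a ControllerConfig referenced from the Provider via `controllerConfigRef`. A minimal sketch — the account ID and role name are hypothetical:

```yaml
apiVersion: pkg.crossplane.io/v1alpha1
kind: ControllerConfig
metadata:
  name: aws-irsa
  annotations:
    # hypothetical role; propagated to the provider's service account
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/crossplane-provider-aws
spec: {}
```

This keeps all AWS credentials out of the cluster: the provider assumes the IAM role through the EKS OIDC provider at runtime.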
RDS Example:
apiVersion: database.aws.crossplane.io/v1beta1
kind: RDSInstance
metadata:
  name: platform-db
spec:
  deletionPolicy: Orphan
  forProvider:
    region: us-east-1
    dbInstanceClass: db.t3.micro
    allocatedStorage: 20
    engine: postgres
    storageEncrypted: true
    deletionProtection: true
  providerConfigRef:
    name: aws
Assumes a default VPC and subnet group already exist; in hardened environments, explicit subnetGroupName and securityGroupIds must be specified.
6. Traffic / DNS Consideration
Database endpoints remained AWS-managed.
No DNS switching was automated through Crossplane.
Reason:
Database endpoints are stable
DNS manipulation has high blast radius
Networking remained Terraform-owned
7. Validation Process
Step 1 – Manual Drift Simulation
Modified RDS parameter in AWS Console.
Observed:
Crossplane detected change
Reconciliation restored desired state
Step 2 – Deletion Policy Test
Deleted CR with:
deletionPolicy: Orphan
Observed:
Cloud resource retained
CR removed
Step 3 – Delete Mode Test
Changed to:
deletionPolicy: Delete
Deleted CR.
Observed:
- Cloud resource removed
Lifecycle behavior verified.
Step 4 – API Throttling Simulation
Created multiple resources in parallel.
Observed:
AWS API throttling errors
Provider retry with exponential backoff
Validated concurrency and backoff tuning necessity.
8. Cost & Trade-offs
Infrastructure Cost
Additional Crossplane controller pods
Increased etcd object count
Higher AWS API call volume
Cost impact: Moderate.
Operational Complexity
Increased:
Controller debugging
Composition versioning
CRD lifecycle management
Reduced:
Manual drift remediation
Terraform PR bottlenecks
Apply-time surprises
RTO
Improved.
Drift auto-corrected without manual intervention.
RPO
No direct change.
Depends on AWS-native backup policies.
Scaling Impact
Pros:
Safe self-service for app teams
Git-auditable infrastructure
Continuous compliance enforcement
Cons:
API server load increases
AWS rate limit sensitivity
Composition update blast radius
9. When This Design Makes Sense
✔️ 50+ services
✔️ Dedicated platform team
✔️ GitOps maturity
✔️ Kubernetes-native organization
✔️ High infrastructure churn
When NOT to Use It
Small teams
Low churn infrastructure
No Kubernetes maturity
Multi-account bootstrap phase
Extremely complex networking requirements
10. Final Takeaway
Terraform builds infrastructure.
Crossplane manages the infrastructure lifecycle.
GitOps enforces declared intent.
This was not a tool replacement exercise.
It was an architectural shift from:
Provisioning mindset → Control plane mindset
At scale, lifecycle enforcement matters more than provisioning speed.
Hybrid architecture made lifecycle enforcement operationally viable at scale.
