Migrating from Amazon Linux 2 to Amazon Linux 2023


A Practical Production Playbook (200-Instance Scenario)

Amazon Linux 2 (AL2) will reach end-of-support on June 30, 2026. After that date, AWS will no longer provide security updates, patches, or new packages.

Although AL2 continues to receive maintenance updates today, it is no longer the forward-looking platform. Amazon Linux 2023 (AL2023) is the long-term replacement, offering a predictable 5-year lifecycle per release (2 years standard + 3 years maintenance) along with modernized system components.

If you are running production workloads on AL2, migration should be planned early — not rushed near 2026.

This article explains how to safely migrate in a large production environment with:

  • Standalone EC2 instances

  • Auto Scaling Groups (ASG)

  • Amazon EKS worker nodes

Assume a worst-case environment of 200 AL2 instances.


Why In-Place Upgrade Is Not an Option

There is no supported in-place upgrade path from AL2 to AL2023.

AL2023 introduces:

  • Updated kernel

  • Newer system libraries

  • Updated OpenSSL and crypto policies

  • cgroup v2 by default

  • Updated container runtime stack

Attempting an in-place OS upgrade:

  • Is unsupported

  • Is difficult to test

  • Has no clean rollback

  • Is unsafe for Kubernetes worker nodes

The correct production pattern is:

Build new → Validate → Controlled cutover → Preserve rollback → Decommission old


Example Production Layout (200 Instances)

Workload Type                        Count
Standalone EC2 (stateful)               40
Auto Scaling Groups (stateless)        120
EKS worker nodes                        40
Total                                  200

Each workload type requires a different strategy.


1. Standalone EC2 (Stateful Systems)

Typical Examples

  • Databases

  • Legacy applications

  • EC2 instances using Elastic IP

  • Applications dependent on local storage


Step 1 – Create a Safety Checkpoint

Before any migration:

  • Create EBS snapshots

  • Confirm snapshot completion

This is your rollback baseline.
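A sketch of the checkpoint step using the AWS CLI; the instance ID and description are placeholders. `create-snapshots` captures all attached volumes in one call, and the describe loop confirms completion before migration proceeds.

```shell
# Hypothetical instance ID; snapshots every attached EBS volume at once
aws ec2 create-snapshots \
  --instance-specification InstanceId=i-0123456789abcdef0 \
  --description "Pre-AL2023-migration baseline" \
  --copy-tags-from-source volume

# Do not proceed until every snapshot reports State=completed
aws ec2 describe-snapshots \
  --filters Name=description,Values="Pre-AL2023-migration baseline" \
  --query 'Snapshots[].{Id:SnapshotId,State:State,Progress:Progress}'
```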


Step 2 – Launch Parallel AL2023 Instance

Create a new EC2 instance with:

  • Same VPC and subnet

  • Same security groups

  • Same IAM role

  • Same instance type

Do not modify the AL2 instance.
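A possible launch sketch; all IDs, names, and the instance type are placeholders to be mirrored from the existing AL2 instance. The SSM public parameter resolves the latest AL2023 AMI for the region.

```shell
# Resolve the current AL2023 AMI via the public SSM parameter
AMI_ID=$(aws ssm get-parameter \
  --name /aws/service/ami-amazon-linux-latest/al2023-ami-kernel-default-x86_64 \
  --query 'Parameter.Value' --output text)

# Mirror the AL2 instance's subnet, security groups, IAM role, and type
aws ec2 run-instances \
  --image-id "$AMI_ID" \
  --instance-type m5.large \
  --subnet-id subnet-0123abcd \
  --security-group-ids sg-0123abcd \
  --iam-instance-profile Name=app-instance-profile \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=app01-al2023}]'
```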


Step 3 – Validate Dependency Compatibility

Test explicitly:

  • Runtime versions (Java, Python, Node)

  • OpenSSL behavior

  • Crypto policy differences

  • Systemd services

  • Hardcoded paths in custom scripts

Small OS-level changes can break production services.
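The checks above can be scripted into a small report, run on both the AL2 and the AL2023 host, and diffed. This is a minimal sketch; extend it with your application's own runtime checks.

```shell
#!/usr/bin/env bash
# Run on both hosts, then: diff env-report-<al2>.txt env-report-<al2023>.txt
REPORT="/tmp/env-report-$(hostname).txt"
{
  echo "kernel:  $(uname -r)"
  command -v openssl >/dev/null && echo "openssl: $(openssl version)"
  # update-crypto-policies exists on AL2023 but not on AL2
  echo "crypto:  $(update-crypto-policies --show 2>/dev/null || echo 'n/a (AL2)')"
  command -v java    >/dev/null && echo "java:    $(java -version 2>&1 | head -1)"
  command -v python3 >/dev/null && echo "python3: $(python3 --version)"
  command -v node    >/dev/null && echo "node:    $(node --version)"
  # Failed systemd units often reveal broken bootstrap scripts or hardcoded paths
  command -v systemctl >/dev/null && systemctl --no-pager --failed || true
  echo "report complete"
} > "$REPORT"
echo "wrote $REPORT"
```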


Step 4 – Controlled Data Migration

  1. Initial sync while application is running

  2. Stop or freeze writes

  3. Final sync with checksum validation

Example:

rsync -avh --checksum --numeric-ids /data/ new-server:/data/

For database systems, prefer logical dump/restore over raw filesystem copy to avoid corruption risk.
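A hedged sketch of the logical dump/restore path, assuming PostgreSQL; host names and database names are placeholders, and other engines have equivalents (mysqldump, mongodump).

```shell
# Dump from the AL2 host in custom format, then restore on the AL2023 host
pg_dump -h old-db-host -U app -Fc appdb > /backup/appdb.dump
pg_restore -h new-db-host -U app -d appdb --clean --if-exists /backup/appdb.dump
```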


Step 5 – Validate Before Cutover

Start the application on AL2023 and verify:

  • Application logs

  • Health endpoints

  • Downstream connectivity

  • Resource utilization

Only after validation:

  • Stop service on AL2

  • Switch Elastic IP or DNS

Keep AL2 intact until full confidence.
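For Elastic IP cutover, the switch can be sketched as below; the allocation and instance IDs are placeholders. Reassociating the EIP is the actual cutover, and moving it back to the AL2 instance is the rollback.

```shell
# Move the Elastic IP from the AL2 instance to the validated AL2023 instance
aws ec2 associate-address \
  --allocation-id eipalloc-0123abcd \
  --instance-id i-0newal2023inst \
  --allow-reassociation
```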


2. Auto Scaling Groups (Stateless Systems)

Typical Examples

  • Web servers

  • APIs

  • Microservices

The main risk is pushing a broken AMI to the entire fleet.


Step 1 – Build an AL2023 Golden AMI

Include:

  • Monitoring agents

  • Security agents

  • Logging agents

  • Application bootstrap scripts

Test full userdata execution.
Simulate instance termination to confirm auto-recovery.
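One way to bake the AMI from a fully configured builder instance (instance ID and naming are placeholders; EC2 Image Builder or Packer are equally valid):

```shell
# Capture the configured AL2023 builder instance as the golden AMI
aws ec2 create-image \
  --instance-id i-0builder123 \
  --name "app-al2023-golden-$(date +%Y%m%d)" \
  --description "AL2023 golden AMI with monitoring, security, and logging agents"
```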


Step 2 – Create New Launch Template Version

Update only:

  • AMI ID

Keep AL2 template available for rollback.
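A sketch of the version bump; the template name and AMI ID are placeholders. Sourcing from `$Latest` carries everything else forward unchanged, and earlier versions remain selectable for rollback.

```shell
# New version differs only in ImageId; prior versions stay available
aws ec2 create-launch-template-version \
  --launch-template-name app-fleet \
  --source-version '$Latest' \
  --launch-template-data '{"ImageId":"ami-0newal2023"}'
```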


Step 3 – Canary Deployment

Increase desired capacity by 1.

Validate:

  • Load balancer health checks

  • Application startup

  • Logs and error rate

  • Metrics stability

Do not skip canary testing.
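The canary bump can be done as below (ASG name is a placeholder); the new instance launches from the latest template version while the rest of the fleet stays on AL2.

```shell
# Read the current desired capacity, then add exactly one canary instance
CURRENT=$(aws autoscaling describe-auto-scaling-groups \
  --auto-scaling-group-names app-asg \
  --query 'AutoScalingGroups[0].DesiredCapacity' --output text)

aws autoscaling set-desired-capacity \
  --auto-scaling-group-name app-asg \
  --desired-capacity $((CURRENT + 1))
```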


Step 4 – Controlled Instance Refresh

Use safe rollout configuration:

  • Minimum healthy percentage appropriate for fleet size (e.g., 90–100% for large fleets)

  • Warm-up time configured

  • ELB health checks enabled

Monitor closely during rollout.

If instability occurs:

  • Cancel refresh

  • Revert launch template
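The guarded refresh and its emergency stop can be sketched as follows; the ASG name and numbers are illustrative:

```shell
# Keep at least 90% of the fleet healthy and give instances 5 minutes to warm up
aws autoscaling start-instance-refresh \
  --auto-scaling-group-name app-asg \
  --preferences '{"MinHealthyPercentage":90,"InstanceWarmup":300}'

# If error rates climb, stop the rollout and revert the launch template version
aws autoscaling cancel-instance-refresh --auto-scaling-group-name app-asg
```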


3. Amazon EKS Worker Nodes (Highest Blast Radius)

AL2023 introduces changes that can affect Kubernetes workloads:

  • cgroup v2

  • Updated kernel

  • Updated container runtime

This can impact:

  • DaemonSets

  • Monitoring agents

  • Security tooling

  • CNI plugins


Safe EKS Migration Flow

Step 1 – Add AL2023 Node Group

Create a new managed node group (or Karpenter pool).
Do not modify AL2 nodes yet.
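With eksctl, adding the parallel node group might look like this (cluster name, node group name, and sizes are placeholders):

```shell
# New AL2023 node group alongside the existing AL2 nodes
eksctl create nodegroup \
  --cluster prod-cluster \
  --name al2023-workers \
  --node-ami-family AmazonLinux2023 \
  --nodes 5 --nodes-min 5 --nodes-max 10
```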


Step 2 – Taint AL2 Nodes

kubectl taint node <al2-node> os=al2:NoSchedule

Effect:

  • No new pods schedule on AL2

  • Existing pods continue running


Step 3 – Validate Scheduling on AL2023

Scale workloads or deploy new services.

Confirm:

  • Pods schedule on AL2023

  • Networking functions correctly

  • Metrics and logs flow normally
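A quick scheduling check, assuming the node group name from earlier steps (adjust labels to your cluster):

```shell
# Show which node group each node belongs to
kubectl get nodes -L eks.amazonaws.com/nodegroup

# Confirm new pods are landing on the AL2023 nodes
kubectl get pods -A -o wide | grep al2023-workers
```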


Step 4 – Validate Cluster Add-ons and DaemonSets

Check:

  • VPC CNI

  • CoreDNS

  • kube-proxy

  • Logging agents

  • Monitoring agents

  • Security tools

Ensure cluster add-ons (VPC CNI, CoreDNS, kube-proxy) versions are compatible with AL2023 node AMIs before rollout.
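For EKS managed add-ons, the installed versions can be listed and compared against the versions supported for your cluster release (cluster name is a placeholder):

```shell
# Report installed version and status for each core managed add-on
for addon in vpc-cni coredns kube-proxy; do
  aws eks describe-addon --cluster-name prod-cluster --addon-name "$addon" \
    --query 'addon.{Name:addonName,Version:addonVersion,Status:status}'
done
```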

Also verify PodDisruptionBudgets:

kubectl get pdb -A

Ignoring PDBs can cause draining failures or partial outages.


Step 5 – Drain AL2 Nodes

kubectl drain <node> --ignore-daemonsets --delete-emptydir-data

Note that without --delete-emptydir-data the drain aborts on pods that use emptyDir volumes; add the flag only if losing that local data is acceptable. Observe workload behavior during drain.


Step 6 – Delete AL2 Node Group

Delete only after full stability confirmation.

Rollback is possible only until deletion.
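The final teardown, assuming the eksctl names used above:

```shell
# Irreversible: run only after sustained stability on AL2023
eksctl delete nodegroup --cluster prod-cluster --name al2-workers
```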


Common Production Mistakes

  • Attempting in-place OS upgrades

  • Skipping canary validation

  • Ignoring bootstrap script testing

  • Draining EKS nodes prematurely

  • Not snapshotting stateful systems

  • Removing rollback resources too early


Executive Summary

Amazon Linux 2 reaches end-of-support on June 30, 2026. Migration to Amazon Linux 2023 should follow immutable infrastructure principles. For stateful EC2, use parallel instances with validated data sync and controlled DNS cutover. For Auto Scaling Groups, roll out a new AMI using canary and guarded instance refresh. For EKS, introduce AL2023 nodes, prevent new scheduling on AL2, validate workloads and cluster add-ons, then drain and remove AL2 nodes after stability is confirmed. Maintain rollback until the final step.


Final Takeaway

This migration is not about replacing servers.
It is about maintaining production stability while upgrading the platform.

Plan early.
Validate carefully.
Preserve rollback.
Decommission only after confidence.

Ops Migration Playbooks

Part 5 of 6

Real production migration and incident playbooks focused on safe execution, root cause analysis, and rollback-first DevOps practices. Each post documents how real production issues were handled and fixed without downtime.
