Spot Instances + Kubernetes: Save 60-90% on Compute Without the Drama
The Math That Changes Everything
Spot instances cost 60-90% less than on-demand. For a team running 20 nodes on m5.xlarge ($0.192/hr on-demand), the savings are massive:
| Pricing Model | $/hr per node | 20 Nodes/Month | Annual Cost |
|---|---|---|---|
| On-Demand | $0.192 | $2,765 | $33,178 |
| Spot (avg 70% off) | $0.058 | $835 | $10,022 |
| Savings | | $1,930/mo | $23,156/yr |
Twenty-three thousand dollars a year. From the same workload, on the same hardware. The catch? Spot instances can be interrupted with a 2-minute warning. But Kubernetes was literally designed for this kind of chaos. Let's make it work.
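The table's arithmetic is easy to re-derive: hourly rate times fleet size times hours. A quick sketch, assuming a 720-hour month:

```python
ON_DEMAND_HR = 0.192   # m5.xlarge on-demand $/hr
SPOT_HR = 0.058        # average spot price at ~70% discount
NODES = 20
HOURS_PER_MONTH = 720

on_demand_monthly = ON_DEMAND_HR * NODES * HOURS_PER_MONTH  # ~$2,765
spot_monthly = SPOT_HR * NODES * HOURS_PER_MONTH            # ~$835
annual_savings = (on_demand_monthly - spot_monthly) * 12    # ~$23.2k/yr

print(f"${on_demand_monthly:,.0f}/mo on-demand, ${spot_monthly:,.0f}/mo spot")
print(f"${annual_savings:,.0f}/yr saved")
```

The table rounds the monthly figures before multiplying, so its annual numbers drift from this by a dollar or two.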
The Architecture: Spot-Friendly K8s Design
The golden rule: on-demand for the control plane and stateful workloads, spot for everything else.
┌─────────────────────────────────────────────┐
│                 EKS Cluster                 │
├──────────────┬──────────────────────────────┤
│ On-Demand    │ Spot Pools                   │
│ Node Group   │ (multiple instance types)    │
│              │                              │
│ - System     │ - Stateless apps             │
│ - Databases  │ - Web servers                │
│ - Redis      │ - Workers / queue consumers  │
│ - Kafka      │ - Batch jobs                 │
│              │ - CI/CD runners              │
└──────────────┴──────────────────────────────┘
Step 1: Create Diversified Spot Node Groups
The number one mistake with spot is using a single instance type. AWS runs out of capacity for that type, and your whole fleet gets reclaimed. Diversify across instance types and availability zones.
Terraform EKS Managed Node Group
resource "aws_eks_node_group" "spot_workers" {
  cluster_name    = aws_eks_cluster.main.name
  node_group_name = "spot-workers"
  node_role_arn   = aws_iam_role.node.arn
  subnet_ids      = var.private_subnet_ids # Multiple AZs

  capacity_type = "SPOT"
  instance_types = [
    "m5.xlarge",
    "m5a.xlarge",
    "m5d.xlarge",
    "m6i.xlarge",
    "m6a.xlarge",
    "m5.2xlarge", # Overprovisioning is fine — K8s handles scheduling
    "m5a.2xlarge",
  ]

  scaling_config {
    desired_size = 5
    max_size     = 15
    min_size     = 2
  }

  labels = {
    "node-type"     = "spot"
    "workload-type" = "stateless"
  }

  taint {
    key    = "spot"
    value  = "true"
    effect = "NO_SCHEDULE"
  }

  tags = {
    "k8s.io/cluster-autoscaler/enabled" = "true"
  }
}

# On-demand baseline for critical workloads
resource "aws_eks_node_group" "on_demand_baseline" {
  cluster_name    = aws_eks_cluster.main.name
  node_group_name = "on-demand-baseline"
  node_role_arn   = aws_iam_role.node.arn
  subnet_ids      = var.private_subnet_ids

  capacity_type  = "ON_DEMAND"
  instance_types = ["m6i.xlarge"]

  scaling_config {
    desired_size = 3
    max_size     = 6
    min_size     = 2
  }

  labels = {
    "node-type" = "on-demand"
  }
}
Why seven instance types? Because spot availability varies by type. If m5.xlarge gets reclaimed in us-east-1a, your autoscaler can launch m6a.xlarge in us-east-1b. More options = more stability.
Step 2: Pod Configuration for Spot Resilience
Tolerations and Affinity
Pods that can handle interruption should tolerate the spot taint:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
spec:
  replicas: 4 # Always run multiple replicas on spot
  selector:
    matchLabels:
      app: api-server
  template:
    metadata:
      labels:
        app: api-server
    spec:
      tolerations:
        - key: "spot"
          operator: "Equal"
          value: "true"
          effect: "NoSchedule"
      affinity:
        # Spread across nodes so one interruption doesn't kill all replicas
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: app
                      operator: In
                      values: ["api-server"]
                topologyKey: "kubernetes.io/hostname"
        # Prefer spot nodes to save money
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 80
              preference:
                matchExpressions:
                  - key: node-type
                    operator: In
                    values: ["spot"]
      terminationGracePeriodSeconds: 60
      containers:
        - name: api-server
          image: myapp/api:v2.1.0
          resources:
            requests:
              cpu: "500m"
              memory: "512Mi"
            limits:
              cpu: "1000m"
              memory: "1Gi"
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "sleep 5 && /app/graceful-shutdown"]
Key points:
- 4 replicas minimum for spot workloads. If one node gets interrupted, you still serve traffic.
- Pod anti-affinity spreads replicas across nodes. One interruption event shouldn't take more than 25% of your capacity.
- preStop hook gives your app time to drain connections.
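The preStop hook buys time, but the app itself also has to handle the SIGTERM that follows it. A minimal sketch in Python (the work-loop and drain function names are placeholders, not from any real framework):

```python
import signal
import threading

shutting_down = threading.Event()

def handle_sigterm(signum, frame):
    # Kubernetes sends SIGTERM once the preStop hook returns. Flip a flag so
    # the serving loop stops taking new work and drains in-flight requests
    # within terminationGracePeriodSeconds (60s in the Deployment above).
    shutting_down.set()

signal.signal(signal.SIGTERM, handle_sigterm)

def serve():
    while not shutting_down.is_set():
        handle_one_request()        # placeholder: accept and process one request
    close_listener_and_drain()      # placeholder: stop accepting, finish in-flight
```

If the process ignores SIGTERM, Kubernetes waits out the grace period and sends SIGKILL, which is exactly the mid-request failure you're trying to avoid.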
Pod Disruption Budgets
Non-negotiable for spot workloads:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-server-pdb
spec:
  minAvailable: 2 # or use maxUnavailable: 1
  selector:
    matchLabels:
      app: api-server
This tells Kubernetes: "Never voluntarily evict pods if it would drop us below 2 healthy replicas." It won't prevent spot interruptions, but it protects against overlapping disruptions from node scaling or upgrades.
Step 3: Handle Spot Interruptions Gracefully
AWS Node Termination Handler
This watches for spot interruption notices and cordons/drains nodes before AWS reclaims them. It's mandatory on self-managed node groups; EKS managed node groups drain spot interruptions natively, but running the handler is still a common belt-and-suspenders choice.
helm repo add eks https://aws.github.io/eks-charts
helm install aws-node-termination-handler eks/aws-node-termination-handler \
--namespace kube-system \
--set enableSpotInterruptionDraining=true \
--set enableRebalanceRecommendation=true \
--set enableScheduledEventDraining=true
When AWS signals an interruption (2 minutes before reclaim), the handler:
- Cordons the node (no new pods scheduled)
- Drains existing pods (respects PDBs and grace periods)
- Lets Kubernetes reschedule the evicted pods onto healthy nodes
Rebalance Recommendations
AWS sometimes sends rebalance recommendations before an actual interruption — giving you even more time to migrate pods. The handler above already listens for these. Combined with the Cluster Autoscaler, it can proactively launch a replacement node before the original is reclaimed.
Step 4: Cluster Autoscaler Configuration
The autoscaler needs to understand your spot strategy:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  template:
    spec:
      containers:
        - name: cluster-autoscaler
          command:
            - ./cluster-autoscaler
            - --v=4
            - --cloud-provider=aws
            - --skip-nodes-with-local-storage=false
            - --expander=least-waste              # Pack nodes efficiently
            - --balance-similar-node-groups=true  # Spread across AZs
            - --skip-nodes-with-system-pods=false
            - --scale-down-utilization-threshold=0.5
            - --scale-down-delay-after-add=5m
            - --max-graceful-termination-sec=120
The balance-similar-node-groups flag is critical — it ensures your spot nodes are spread across AZs, so a capacity crunch in one zone doesn't nuke your entire fleet.
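To build intuition for least-waste: the expander estimates, for each candidate node group, how much of a new node would sit idle after the pending pods land, then picks the group with the lowest waste. A toy sketch (the scoring formula here is simplified from the real implementation):

```python
def waste(node_cpu, node_mem_gib, pending_cpu, pending_mem_gib):
    # Average fraction of CPU and memory left unused after the pending
    # pods are packed onto one new node of this type.
    cpu_idle = (node_cpu - pending_cpu) / node_cpu
    mem_idle = (node_mem_gib - pending_mem_gib) / node_mem_gib
    return (cpu_idle + mem_idle) / 2

# Pending pods need 3 vCPU and 6 GiB; candidate groups: (vCPU, GiB)
groups = {"m5.xlarge": (4, 16), "m5.2xlarge": (8, 32)}
best = min(groups, key=lambda g: waste(*groups[g], 3, 6))
print(best)  # the xlarge leaves less idle capacity, so least-waste picks it
```

This is why listing both xlarge and 2xlarge types in one node group is safe: the expander won't reach for the bigger node unless the smaller one can't fit the pending pods.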
Step 5: Monitor Spot Interruption Rates
Track interruptions so you can tune your instance type mix:
# Query CloudTrail for spot interruption events in the last 30 days
aws cloudtrail lookup-events \
--lookup-attributes AttributeKey=EventName,AttributeValue=BidEvictedEvent \
--start-time "2026-02-20T00:00:00Z" \
--end-time "2026-03-20T00:00:00Z" \
--query 'Events[].{Time:EventTime,Instance:Resources[0].ResourceName}' \
--output table
Healthy spot interruption rates by instance type:
| Interruption Rate | Assessment | Action |
|---|---|---|
| < 5% monthly | Excellent | Keep using this type |
| 5-10% monthly | Acceptable | Diversify more |
| 10-20% monthly | Concerning | Reduce reliance on this type |
| > 20% monthly | Too volatile | Drop from your instance mix |
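The thresholds above map cleanly onto a helper you can drop into a tuning script. A quick sketch (the function name and return values are my own, not a standard API):

```python
def assess_interruption_rate(monthly_rate):
    """Map a monthly spot interruption rate (0.0-1.0) to an action,
    following the thresholds in the table above."""
    if monthly_rate < 0.05:
        return "keep"       # Excellent: keep using this type
    if monthly_rate < 0.10:
        return "diversify"  # Acceptable: add more instance types
    if monthly_rate < 0.20:
        return "reduce"     # Concerning: reduce reliance on this type
    return "drop"           # Too volatile: drop from the mix

# Example: 12 interruptions across 90 spot instances launched this month
print(assess_interruption_rate(12 / 90))  # reduce
```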
What NOT to Run on Spot
Be honest about what can handle interruptions:
| Workload | Spot-Safe? | Why |
|---|---|---|
| Stateless web APIs | Yes | Multiple replicas, fast startup |
| Queue consumers | Yes | Messages re-queue on failure |
| Batch processing | Yes | Checkpointing handles restarts |
| CI/CD runners | Yes | Jobs retry automatically |
| Databases (RDS/self-managed) | No | Data loss risk, long recovery |
| Redis primary | No | In-memory state is lost |
| Kafka brokers | No | Partition rebalancing is slow |
| Singleton controllers | No | Single point of failure |
Real-World Cost Breakdown
Here's an actual cluster I optimized last quarter (anonymized, but real numbers):
| Component | Before (All On-Demand) | After (Spot + On-Demand) | Savings |
|---|---|---|---|
| System nodes (3x m6i.large) | $210/mo | $210/mo (on-demand) | $0 |
| Database nodes (2x r6i.xlarge) | $486/mo | $486/mo (on-demand) | $0 |
| API servers (8x m5.xlarge) | $1,106/mo | $332/mo (spot) | $774 |
| Workers (6x m5.xlarge) | $830/mo | $249/mo (spot) | $581 |
| CI runners (4x m5.xlarge) | $553/mo | $166/mo (spot) | $387 |
| Totals | $3,185/mo | $1,443/mo | $1,742/mo |
That's $20,904/year in savings, with zero downtime events caused by spot interruptions over the past 6 months. The node termination handler and proper pod anti-affinity did their job.
The Adoption Playbook
Don't go all-in on day one. Here's the rollout I recommend:
- Week 1: Deploy node termination handler. Add a small spot node group (2 nodes). Move CI/CD runners to spot.
- Week 2: Add PDBs to all stateless services. Move batch workers to spot.
- Week 3: Move stateless API replicas to spot (keep minimum on-demand). Monitor interruption rates.
- Week 4: Tune instance type diversity based on interruption data. Expand spot node group scaling limits.
- Ongoing: Review monthly. Add new instance types as AWS releases them. Graviton spot instances offer the deepest discounts — m7g.xlarge spot averages $0.041/hr vs $0.163/hr on-demand, a 75% discount.
Troubleshooting Spot Issues
Spot setups look clean on paper but break in specific, predictable ways. Here's what to watch for and how to fix it.
Pods Stuck in Pending After Interruption
When a spot node is reclaimed, pods get rescheduled. But if the Cluster Autoscaler can't provision a replacement node fast enough, or all spot capacity is exhausted, pods sit in Pending. Diagnose it:
# Check pending pods and their events
kubectl get pods --field-selector=status.phase=Pending -A
# Look at scheduler events for a specific pod
kubectl describe pod <pod-name> -n <namespace> | grep -A 10 "Events:"
Common messages and fixes:
| Event Message | Cause | Fix |
|---|---|---|
| no nodes available to schedule pods | All spot capacity exhausted | Add more instance types to your node group |
| Insufficient cpu/memory | Existing nodes are full | Raise max_size so the autoscaler can add nodes |
| pod didn't tolerate taint | Missing spot toleration on the pod | Add the spot toleration to the pod spec |
Monitoring Spot Savings in Real Time
You need visibility into what you're actually saving. Kubecost runs alongside Prometheus and breaks spend down per node and workload, including spot vs. on-demand pricing:
# Install Kubecost for cost allocation metrics
helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm install kubecost kubecost/cost-analyzer \
--namespace kubecost \
--create-namespace \
--set kubecostToken="<your-token>"
If you want a lightweight alternative to a full cost platform, track your spot/on-demand node mix with a simple recording rule:
groups:
  - name: spot-tracking
    interval: 5m
    rules:
      # Count of spot vs on-demand nodes
      - record: cluster:spot_nodes:count
        expr: count(kube_node_labels{label_node_type="spot"})
      - record: cluster:ondemand_nodes:count
        expr: count(kube_node_labels{label_node_type="on-demand"})
      # Spot node ratio — target > 60% for cost-optimized clusters
      - record: cluster:spot_node_ratio
        expr: |
          cluster:spot_nodes:count
          /
          (cluster:spot_nodes:count + cluster:ondemand_nodes:count)
One caveat: kube-state-metrics v2+ only exposes node labels you explicitly allowlist (for example, --metric-labels-allowlist=nodes=[node-type]), so confirm kube_node_labels actually carries the label. With that in place, alert if your spot ratio drops below your target, which signals capacity issues:
groups:
  - name: spot-alerts
    rules:
      - alert: SpotNodeRatioLow
        expr: cluster:spot_node_ratio < 0.5
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Spot node ratio dropped to {{ $value | humanizePercentage }}. Check for capacity issues or add instance types."
Graviton Spot: The Deepest Discounts
If you haven't tested ARM-based Graviton instances, you're leaving money on the table. Graviton spot prices are consistently 70-80% cheaper than on-demand x86 equivalents, and spot interruption rates tend to be lower because fewer teams compete for them.
Add Graviton instances to your spot node group with a multi-arch build strategy:
resource "aws_eks_node_group" "spot_graviton" {
  cluster_name    = aws_eks_cluster.main.name
  node_group_name = "spot-graviton"
  node_role_arn   = aws_iam_role.node.arn
  subnet_ids      = var.private_subnet_ids

  ami_type      = "AL2_ARM_64"
  capacity_type = "SPOT"
  instance_types = [
    "m7g.xlarge",
    "m7g.2xlarge",
    "m6g.xlarge",
    "m6g.2xlarge",
    "c7g.xlarge",
    "c7g.2xlarge",
  ]

  scaling_config {
    desired_size = 3
    max_size     = 10
    min_size     = 1
  }

  labels = {
    "node-type"     = "spot"
    "arch"          = "arm64"
    "workload-type" = "stateless"
  }

  taint {
    key    = "spot"
    value  = "true"
    effect = "NO_SCHEDULE"
  }
}
Your container images need to be multi-arch. If you're using Docker Buildx, this is straightforward:
docker buildx build \
--platform linux/amd64,linux/arm64 \
--tag myapp/api:v2.1.0 \
--push .
Each node advertises its CPU architecture via the kubernetes.io/arch label, and a multi-arch image manifest lets every node pull the matching variant. As long as each image in the pod is multi-arch, no changes to your Deployment manifests are needed.
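If a particular image is amd64-only (a vendor sidecar, say), pin that workload away from Graviton nodes using the kubernetes.io/arch node label. A minimal fragment (surrounding Deployment fields elided):

```yaml
spec:
  template:
    spec:
      nodeSelector:
        kubernetes.io/arch: amd64  # keep this pod off arm64 spot nodes
```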
Spot instances aren't risky. Running all your compute on on-demand when you don't have to — that's the real risk to your budget.