Part 5 of 5 in Cloud Cost Cutting

Spot Instances + Kubernetes: Save 60-90% on Compute Without the Drama

Dev Patel · 10 min read

The Math That Changes Everything

Spot instances cost 60-90% less than on-demand. For a team running 20 nodes on m5.xlarge ($0.192/hr on-demand), the savings are massive:

| Pricing Model | $/hr per node | 20 Nodes/Month | Annual Cost |
| --- | --- | --- | --- |
| On-Demand | $0.192 | $2,765 | $33,178 |
| Spot (avg 70% off) | $0.058 | $835 | $10,022 |
| Savings | | $1,930/mo | $23,156/yr |

Twenty-three thousand dollars a year. From the same workload, on the same hardware. The catch? Spot instances can be interrupted with a 2-minute warning. But Kubernetes was literally designed for this kind of chaos. Let's make it work.
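The table's numbers are easy to reproduce, assuming roughly 720 node-hours per month:

```python
# Back-of-envelope check of the savings table (assumes 720 hours/month).
ON_DEMAND = 0.192   # m5.xlarge on-demand, $/hr
SPOT = 0.058        # average spot price at ~70% off, $/hr
NODES, HOURS = 20, 720

def monthly_cost(rate):
    return rate * NODES * HOURS

on_demand_yr = monthly_cost(ON_DEMAND) * 12
spot_yr = monthly_cost(SPOT) * 12
annual_savings = round(on_demand_yr) - round(spot_yr)
print(f"annual savings: ${annual_savings:,}")  # $23,156
```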

The Architecture: Spot-Friendly K8s Design

The golden rule: on-demand for the control plane and stateful workloads, spot for everything else.

┌─────────────────────────────────────────────┐
│                 EKS Cluster                  │
├──────────────┬──────────────────────────────┤
│  On-Demand   │         Spot Pools           │
│  Node Group  │  (multiple instance types)   │
│              │                              │
│  - System    │  - Stateless apps            │
│  - Databases │  - Web servers               │
│  - Redis     │  - Workers / queue consumers │
│  - Kafka     │  - Batch jobs                │
│              │  - CI/CD runners             │
└──────────────┴──────────────────────────────┘

Step 1: Create Diversified Spot Node Groups

The number one mistake with spot is using a single instance type. AWS runs out of capacity for that type, and your whole fleet gets reclaimed. Diversify across instance types and availability zones.

Terraform EKS Managed Node Group

resource "aws_eks_node_group" "spot_workers" {
  cluster_name    = aws_eks_cluster.main.name
  node_group_name = "spot-workers"
  node_role_arn   = aws_iam_role.node.arn
  subnet_ids      = var.private_subnet_ids  # Multiple AZs

  capacity_type = "SPOT"

  instance_types = [
    "m5.xlarge",
    "m5a.xlarge",
    "m5d.xlarge",
    "m6i.xlarge",
    "m6a.xlarge",
    "m5.2xlarge",   # Overprovisioning is fine — K8s handles scheduling
    "m5a.2xlarge",
  ]

  scaling_config {
    desired_size = 5
    max_size     = 15
    min_size     = 2
  }

  labels = {
    "node-type"    = "spot"
    "workload-type" = "stateless"
  }

  taint {
    key    = "spot"
    value  = "true"
    effect = "NO_SCHEDULE"
  }

  tags = {
    "k8s.io/cluster-autoscaler/enabled" = "true"
  }
}

# On-demand baseline for critical workloads
resource "aws_eks_node_group" "on_demand_baseline" {
  cluster_name    = aws_eks_cluster.main.name
  node_group_name = "on-demand-baseline"
  node_role_arn   = aws_iam_role.node.arn
  subnet_ids      = var.private_subnet_ids

  capacity_type  = "ON_DEMAND"
  instance_types = ["m6i.xlarge"]

  scaling_config {
    desired_size = 3
    max_size     = 6
    min_size     = 2
  }

  labels = {
    "node-type" = "on-demand"
  }
}

Why seven instance types? Because spot availability varies by type. If m5.xlarge gets reclaimed in us-east-1a, your autoscaler can launch m6a.xlarge in us-east-1b. More options = more stability.
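The intuition can be made concrete with a toy model: if each capacity pool (instance type × AZ) is independently exhausted with some probability, the chance that every pool is exhausted at once shrinks geometrically with the number of pools. The 10% per-pool figure below is an illustrative assumption, not an AWS statistic:

```python
# Toy model: probability that ALL capacity pools are unavailable at once,
# assuming each pool is independently exhausted with probability p.
def all_pools_exhausted(p, pools):
    return p ** pools

p = 0.10  # illustrative per-pool exhaustion probability
for types, azs in [(1, 1), (3, 2), (7, 3)]:
    pools = types * azs
    print(f"{types} types x {azs} AZs = {pools} pools -> "
          f"P(all down) = {all_pools_exhausted(p, pools):.1e}")
```

Real pools aren't independent (demand spikes correlate across similar types), so treat this as directional: each added type or AZ buys a multiplicative reduction in fleet-wide risk.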

Step 2: Pod Configuration for Spot Resilience

Tolerations and Affinity

Pods that can handle interruption should tolerate the spot taint:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
spec:
  replicas: 4  # Always run multiple replicas on spot
  selector:
    matchLabels:
      app: api-server
  template:
    metadata:
      labels:
        app: api-server
    spec:
      tolerations:
        - key: "spot"
          operator: "Equal"
          value: "true"
          effect: "NoSchedule"
      affinity:
        # Spread across nodes so one interruption doesn't kill all replicas
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchExpressions:
                    - key: app
                      operator: In
                      values: ["api-server"]
                topologyKey: "kubernetes.io/hostname"
        # Prefer spot nodes to save money
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 80
              preference:
                matchExpressions:
                  - key: node-type
                    operator: In
                    values: ["spot"]
      terminationGracePeriodSeconds: 60
      containers:
        - name: api-server
          image: myapp/api:v2.1.0
          resources:
            requests:
              cpu: "500m"
              memory: "512Mi"
            limits:
              cpu: "1000m"
              memory: "1Gi"
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "sleep 5 && /app/graceful-shutdown"]

Key points:

  • 4 replicas minimum for spot workloads. If one node gets interrupted, you still serve traffic.
  • Pod anti-affinity spreads replicas across nodes. One interruption event shouldn't take more than 25% of your capacity.
  • preStop hook gives your app time to drain connections.

Pod Disruption Budgets

Non-negotiable for spot workloads:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-server-pdb
spec:
  minAvailable: 2  # or use maxUnavailable: 1
  selector:
    matchLabels:
      app: api-server

This tells Kubernetes: "Never voluntarily evict pods if it would drop us below 2 healthy replicas." It won't prevent spot interruptions, but it protects against overlapping disruptions from node scaling or upgrades.

Step 3: Handle Spot Interruptions Gracefully

AWS Node Termination Handler

This is mandatory. It watches for spot interruption notices and cordons/drains nodes before AWS reclaims them.

helm repo add eks https://aws.github.io/eks-charts
helm install aws-node-termination-handler eks/aws-node-termination-handler \
  --namespace kube-system \
  --set enableSpotInterruptionDraining=true \
  --set enableRebalanceRecommendation=true \
  --set enableScheduledEventDraining=true

When AWS signals an interruption (2 minutes before reclaim), the handler:

  1. Cordons the node (no new pods scheduled)
  2. Drains existing pods (respects PDBs and grace periods)
  3. Pods get rescheduled on healthy nodes
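Under the hood, the handler polls the instance metadata service for a spot/instance-action document and computes how long it has to drain. A sketch of that parsing step (the JSON shape matches the documented IMDS response; the timestamps are illustrative):

```python
import json
from datetime import datetime, timezone

def seconds_until_reclaim(instance_action_json, now):
    """Parse an IMDS spot/instance-action document and return the
    remaining drain window in seconds."""
    doc = json.loads(instance_action_json)
    reclaim_at = datetime.strptime(
        doc["time"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    return (reclaim_at - now).total_seconds()

# Illustrative document; on a real node this comes from
# http://169.254.169.254/latest/meta-data/spot/instance-action
doc = '{"action": "terminate", "time": "2026-03-20T12:02:00Z"}'
now = datetime(2026, 3, 20, 12, 0, 0, tzinfo=timezone.utc)
print(seconds_until_reclaim(doc, now))  # 120.0, the 2-minute warning
```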

Rebalance Recommendations

AWS sometimes sends rebalance recommendations before an actual interruption — giving you even more time to migrate pods. The handler above already listens for these. Combined with the Cluster Autoscaler, it can proactively launch a replacement node before the original is reclaimed.

Step 4: Cluster Autoscaler Configuration

The autoscaler needs to understand your spot strategy:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
spec:
  template:
    spec:
      containers:
        - name: cluster-autoscaler
          command:
            - ./cluster-autoscaler
            - --v=4
            - --cloud-provider=aws
            - --skip-nodes-with-local-storage=false
            - --expander=least-waste     # Pack nodes efficiently
            - --balance-similar-node-groups=true  # Spread across AZs
            - --skip-nodes-with-system-pods=false
            - --scale-down-utilization-threshold=0.5
            - --scale-down-delay-after-add=5m
            - --max-graceful-termination-sec=120

The balance-similar-node-groups flag is critical — it ensures your spot nodes are spread across AZs, so a capacity crunch in one zone doesn't nuke your entire fleet.

Step 5: Monitor Spot Interruption Rates

Track interruptions so you can tune your instance type mix:

# Query CloudTrail for spot interruption events in the last 30 days
aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=EventName,AttributeValue=BidEvictedEvent \
  --start-time "2026-02-20T00:00:00Z" \
  --end-time "2026-03-20T00:00:00Z" \
  --query 'Events[].{Time:EventTime,Instance:Resources[0].ResourceName}' \
  --output table

Healthy spot interruption rates by instance type:

| Interruption Rate | Assessment | Action |
| --- | --- | --- |
| < 5% monthly | Excellent | Keep using this type |
| 5-10% monthly | Acceptable | Diversify more |
| 10-20% monthly | Concerning | Reduce reliance on this type |
| > 20% monthly | Too volatile | Drop from your instance mix |
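When post-processing the CloudTrail output, the thresholds above are easy to encode as a helper:

```python
def assess_interruption_rate(monthly_rate):
    """Map a monthly interruption rate (0.0-1.0) to the action table above."""
    if monthly_rate < 0.05:
        return "keep using this type"
    if monthly_rate < 0.10:
        return "diversify more"
    if monthly_rate < 0.20:
        return "reduce reliance on this type"
    return "drop from your instance mix"

print(assess_interruption_rate(0.03))  # keep using this type
print(assess_interruption_rate(0.25))  # drop from your instance mix
```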

What NOT to Run on Spot

Be honest about what can handle interruptions:

| Workload | Spot-Safe? | Why |
| --- | --- | --- |
| Stateless web APIs | Yes | Multiple replicas, fast startup |
| Queue consumers | Yes | Messages re-queue on failure |
| Batch processing | Yes | Checkpointing handles restarts |
| CI/CD runners | Yes | Jobs retry automatically |
| Databases (RDS/self-managed) | No | Data loss risk, long recovery |
| Redis primary | No | In-memory state is lost |
| Kafka brokers | No | Partition rebalancing is slow |
| Singleton controllers | No | Single point of failure |

Real-World Cost Breakdown

Here's an actual cluster I optimized last quarter (anonymized, but real numbers):

| Component | Before (All On-Demand) | After (Spot + On-Demand) | Savings |
| --- | --- | --- | --- |
| System nodes (3x m6i.large) | $210/mo | $210/mo (on-demand) | $0 |
| Database nodes (2x r6i.xlarge) | $486/mo | $486/mo (on-demand) | $0 |
| API servers (8x m5.xlarge) | $1,106/mo | $332/mo (spot) | $774 |
| Workers (6x m5.xlarge) | $830/mo | $249/mo (spot) | $581 |
| CI runners (4x m5.xlarge) | $553/mo | $166/mo (spot) | $387 |
| Totals | $3,185/mo | $1,443/mo | $1,742/mo |

That's $20,904/year in savings, with zero downtime events caused by spot interruptions over the past 6 months. The node termination handler and proper pod anti-affinity did their job.

The Adoption Playbook

Don't go all-in on day one. Here's the rollout I recommend:

  1. Week 1: Deploy node termination handler. Add a small spot node group (2 nodes). Move CI/CD runners to spot.
  2. Week 2: Add PDBs to all stateless services. Move batch workers to spot.
  3. Week 3: Move stateless API replicas to spot (keep minimum on-demand). Monitor interruption rates.
  4. Week 4: Tune instance type diversity based on interruption data. Expand spot node group scaling limits.
  5. Ongoing: Review monthly. Add new instance types as AWS releases them. Graviton spot instances offer the deepest discounts — m7g.xlarge spot averages $0.041/hr vs $0.163/hr on-demand, a 75% discount.
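The Graviton figure in step 5 can be sanity-checked in one line (prices taken from the text above):

```python
# Check the quoted Graviton spot discount: m7g.xlarge spot vs on-demand.
spot_hr, on_demand_hr = 0.041, 0.163
discount = 1 - spot_hr / on_demand_hr
print(f"m7g.xlarge spot discount: {discount:.0%}")  # 75%
```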

Troubleshooting Spot Issues

Spot setups look clean on paper but break in specific, predictable ways. Here's what to watch for and how to fix it.

Pods Stuck in Pending After Interruption

When a spot node is reclaimed, pods get rescheduled. But if the Cluster Autoscaler can't provision a replacement node fast enough, or all spot capacity is exhausted, pods sit in Pending. Diagnose it:

# Check pending pods and their events
kubectl get pods --field-selector=status.phase=Pending -A

# Look at scheduler events for a specific pod
kubectl describe pod <pod-name> -n <namespace> | grep -A 10 "Events:"

Common messages and fixes:

| Event Message | Cause | Fix |
| --- | --- | --- |
| no nodes available to schedule pods | All spot capacity exhausted | Add more instance types to your node group |
| Insufficient cpu / Insufficient memory | Existing nodes are full | Increase max_size, or lower scale-down-utilization-threshold to keep more headroom |
| pod didn't tolerate taint | Missing spot toleration on the pod | Add the spot taint toleration to the pod spec |

Monitoring Spot Savings in Real Time

You need visibility into what you're actually saving. Deploy a Prometheus exporter that tracks spot vs. on-demand pricing:

# Use Kubecost (or a lightweight exporter) for cost metrics
helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm install kubecost kubecost/cost-analyzer \
  --namespace kubecost \
  --create-namespace \
  --set kubecostToken="<your-token>"

If you want a lightweight alternative without a full cost platform, export node labels and compute savings with a simple recording rule:

groups:
  - name: spot-tracking
    interval: 5m
    rules:
      # Count of spot vs on-demand nodes
      - record: cluster:spot_nodes:count
        expr: count(kube_node_labels{label_node_type="spot"})

      - record: cluster:ondemand_nodes:count
        expr: count(kube_node_labels{label_node_type="on-demand"})

      # Spot node ratio — target > 60% for cost-optimized clusters
      - record: cluster:spot_node_ratio
        expr: |
          cluster:spot_nodes:count
          /
          (cluster:spot_nodes:count + cluster:ondemand_nodes:count)

Alert if your spot ratio drops below your target, which signals capacity issues:

groups:
  - name: spot-alerts
    rules:
      - alert: SpotNodeRatioLow
        expr: cluster:spot_node_ratio < 0.5
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Spot node ratio dropped to {{ $value | humanizePercentage }}. Check for capacity issues or add instance types."

Graviton Spot: The Deepest Discounts

If you haven't tested ARM-based Graviton instances, you're leaving money on the table. Graviton spot prices are consistently 70-80% cheaper than on-demand x86 equivalents, and spot interruption rates tend to be lower because fewer teams compete for them.

Add Graviton instances to your spot node group with a multi-arch build strategy:

resource "aws_eks_node_group" "spot_graviton" {
  cluster_name    = aws_eks_cluster.main.name
  node_group_name = "spot-graviton"
  node_role_arn   = aws_iam_role.node.arn
  subnet_ids      = var.private_subnet_ids
  ami_type        = "AL2_ARM_64"
  capacity_type   = "SPOT"

  instance_types = [
    "m7g.xlarge",
    "m7g.2xlarge",
    "m6g.xlarge",
    "m6g.2xlarge",
    "c7g.xlarge",
    "c7g.2xlarge",
  ]

  scaling_config {
    desired_size = 3
    max_size     = 10
    min_size     = 1
  }

  labels = {
    "node-type"    = "spot"
    "arch"         = "arm64"
    "workload-type" = "stateless"
  }

  taint {
    key    = "spot"
    value  = "true"
    effect = "NO_SCHEDULE"
  }
}

Your container images need to be multi-arch. If you're using Docker Buildx, this is straightforward:

docker buildx build \
  --platform linux/amd64,linux/arm64 \
  --tag myapp/api:v2.1.0 \
  --push .

Kubernetes schedules pods to matching architectures automatically. No changes to your Deployment manifests needed — just build multi-arch images and let the scheduler handle it.
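One caveat: a pod whose image is amd64-only will crash-loop if it lands on an arm64 node. Until every image is multi-arch, pin the stragglers with the standard kubernetes.io/arch node label (a fragment to merge into the affected Deployment):

```yaml
# Pin amd64-only images to x86 nodes until a multi-arch image exists.
spec:
  template:
    spec:
      nodeSelector:
        kubernetes.io/arch: amd64
```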

Spot instances aren't risky. Running all your compute on on-demand when you don't have to — that's the real risk to your budget.
