Kubernetes Resource Requests vs Limits: The Guide I Wish I Had Before My First OOM Kill
Your Pods Are Lying About What They Need
Here's the thing — most Kubernetes deployments I audit have resource requests and limits that were copy-pasted from a blog post three years ago and never touched again. Developers set cpu: 500m and memory: 512Mi because it "seemed reasonable," and nobody ever went back to check whether that was remotely accurate.
The result? Clusters that are 70% allocated on paper but 15% utilized in reality. Or worse, pods getting OOM-killed in production because the memory limit was set based on vibes instead of data.
Let me tell you why understanding the relationship between requests, limits, and QoS classes is foundational to running Kubernetes well — and how to actually right-size your workloads.
Requests vs Limits: They Do Very Different Things
This is the most misunderstood concept in Kubernetes resource management. Requests and limits are not "min and max." They serve fundamentally different purposes in the scheduler and the kubelet.
Requests tell the scheduler how much capacity to reserve. When you set cpu: 250m as a request, you're saying "this pod needs at least 250 millicores guaranteed." The scheduler uses this to decide which node has room for the pod.
Limits tell the kubelet when to intervene. When a pod exceeds its memory limit, it gets OOM-killed. When it exceeds its CPU limit, it gets throttled.
apiVersion: v1
kind: Pod
metadata:
  name: api-server
spec:
  containers:
  - name: app
    image: myapp:v2.1.0
    resources:
      requests:
        cpu: "250m"
        memory: "256Mi"
      limits:
        cpu: "1000m"
        memory: "512Mi"
Here's where it gets nuanced. CPU is compressible — if a container hits its CPU limit, it gets throttled but keeps running. Memory is incompressible — if a container exceeds its memory limit, the kernel's OOM killer terminates it. No warning, no graceful shutdown. Dead.
The Three QoS Classes and Why They Matter
Kubernetes assigns every pod a Quality of Service class based on how you configure requests and limits. This classification determines which pods get killed first when the node runs out of resources.
Guaranteed
Every container has requests equal to limits for both CPU and memory:
resources:
  requests:
    cpu: "500m"
    memory: "512Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"
Guaranteed pods are the last to be evicted. The kubelet will kill BestEffort and Burstable pods before touching Guaranteed ones. Use this for your most critical workloads — databases, payment processors, anything where an unexpected restart causes real pain.
Burstable
At least one container has a request set, but requests don't equal limits:
resources:
  requests:
    cpu: "250m"
    memory: "256Mi"
  limits:
    cpu: "1000m"
    memory: "1Gi"
This is the most common class in production. The pod gets its requested resources guaranteed, but can burst higher when capacity is available. The risk is that during contention, the kubelet will start killing Burstable pods after all BestEffort pods are gone.
BestEffort
No requests or limits set at all:
resources: {}
These are the first pods to die when the node is under memory pressure. I've seen teams run entire production workloads as BestEffort because they "didn't get around to setting resources." Then they're baffled when pods restart randomly under load.
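The classification rules above can be condensed into a short sketch. Here's a simplified approximation in Python, assuming each container is a dict with optional "requests" and "limits" maps; the real kubelet logic also covers init containers and a few edge cases, but the core decision looks like this:

```python
def qos_class(containers):
    """Approximate Kubernetes QoS classification (simplified sketch)."""
    # BestEffort: no container sets any request or limit
    if all(not c.get("requests") and not c.get("limits") for c in containers):
        return "BestEffort"
    # Guaranteed: every container has CPU and memory limits, and any
    # explicit requests equal those limits (if requests are omitted,
    # Kubernetes defaults them to the limits, which still qualifies)
    for c in containers:
        limits = c.get("limits", {})
        requests = c.get("requests", {})
        for res in ("cpu", "memory"):
            if res not in limits:
                return "Burstable"
            if res in requests and requests[res] != limits[res]:
                return "Burstable"
    return "Guaranteed"
```

Note how a pod that sets only limits still ends up Guaranteed, because Kubernetes copies missing requests from the limits before classifying.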
OOM Kills: Understanding the Kill Chain
When a node runs low on memory, here's what actually happens — and this is the part most docs gloss over:
- The kubelet monitors node memory via the memory.available eviction signal
- When available memory drops below the eviction threshold (default: 100Mi), the kubelet starts evicting pods
- Eviction order: BestEffort first, then Burstable (sorted by how much they exceed their requests), then Guaranteed
- If eviction doesn't free enough memory fast enough, the Linux kernel's OOM killer steps in and kills processes directly
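The eviction ordering in those steps can be modeled roughly. This is a deliberately simplified Python sketch that ranks pods only by memory usage above their request; the real kubelet also factors in pod priority, but the usage-over-request ranking is why BestEffort pods (whose effective request is zero) go first:

```python
def eviction_order(pods):
    """Rank pods by memory-eviction priority (simplified sketch).

    Each pod is a dict with "usage" and "request" in the same unit (e.g. MiB).
    BestEffort pods have request=0, so all of their usage counts as overage.
    """
    # Larger usage-above-request => evicted earlier
    return sorted(pods, key=lambda p: p["usage"] - p["request"], reverse=True)
```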
# Check if a pod was OOM-killed
kubectl describe pod my-pod -n production | grep -A5 "Last State"
# You'll see something like:
# Last State: Terminated
# Reason: OOMKilled
# Exit Code: 137
# Check node-level memory pressure
kubectl describe node worker-03 | grep -A5 "Conditions"
Here's the thing about OOM kills — exit code 137 means the process received SIGKILL (128 + 9). There's no cleanup, no connection draining, no graceful shutdown handler. Whatever that container was doing mid-transaction is gone.
The most insidious scenario I've dealt with is when a container's actual memory usage slowly climbs over hours due to a leak, eventually hits the limit, gets killed, restarts, and the cycle repeats. The pod shows "Running" with a restart count of 47. Everything looks fine until you actually check.
CPU Throttling: The Silent Performance Killer
OOM kills are loud and obvious. CPU throttling is sneaky. Your pod stays running, but requests take three times longer and nobody knows why.
When a container exceeds its CPU limit, the kernel's CFS (Completely Fair Scheduler) throttles it. The container is forced to wait, even if the node has spare CPU capacity. Let me tell you why this matters more than you think.
# Check CPU throttling for a container (cgroup v2 path shown;
# on cgroup v1 nodes it's /sys/fs/cgroup/cpu/cpu.stat)
kubectl exec -it my-pod -- cat /sys/fs/cgroup/cpu.stat
# Look for:
# nr_throttled — number of times the container was throttled
# throttled_usec — total time spent throttled (microseconds on cgroup v2;
#                  cgroup v1 reports throttled_time in nanoseconds)
A common antipattern: setting CPU limits equal to CPU requests. Combined with matching memory values, this yields the Guaranteed QoS class, which sounds good, but it also means the container can never burst above its CPU request, even when the node is idle. For bursty workloads like web servers that spike during request handling, this causes constant throttling.
# This causes unnecessary throttling for bursty workloads
resources:
  requests:
    cpu: "250m"
  limits:
    cpu: "250m"   # Container can never burst above 250m

# Better for most web workloads — allow CPU bursting
resources:
  requests:
    cpu: "250m"
  limits:
    cpu: "1000m"  # Can burst to 1 core when needed
Some teams have started dropping CPU limits entirely and relying only on requests. There's a valid argument for this approach — it avoids throttling while still giving the scheduler the information it needs. But it requires that you trust your workloads not to be CPU hogs, and that you have good monitoring in place.
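If you adopt the requests-only approach for CPU, the spec simply omits the CPU limit while keeping a memory limit in place. A sketch of the pattern, reusing the illustrative values from the examples above:

```yaml
resources:
  requests:
    cpu: "250m"       # scheduler still reserves this capacity
    memory: "256Mi"
  limits:
    memory: "512Mi"   # keep a memory limit: memory is incompressible
    # no cpu limit: the container can use idle CPU without CFS throttling
```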
Right-Sizing: Data Over Guesswork
Stop guessing. Use actual utilization data to set requests and limits. Here's the process I follow:
Step 1: Deploy with generous limits and observe
resources:
  requests:
    cpu: "100m"
    memory: "128Mi"
  limits:
    cpu: "2000m"
    memory: "2Gi"
Step 2: Collect data over a meaningful period
You need at least a week of data that includes peak traffic patterns. Use Prometheus to capture actual usage:
# P99 CPU usage over 7 days
quantile_over_time(0.99,
  rate(container_cpu_usage_seconds_total{
    namespace="production",
    container="api-server"
  }[5m])[7d:]
)

# P99 memory usage over 7 days
quantile_over_time(0.99,
  container_memory_working_set_bytes{
    namespace="production",
    container="api-server"
  }[7d:]
)
Step 3: Set requests and limits based on percentiles
resources:
  requests:
    cpu: "200m"      # P50 usage + 20% buffer
    memory: "384Mi"  # P99 usage + 10% buffer
  limits:
    cpu: "800m"      # P99 usage + headroom for spikes
    memory: "512Mi"  # P99 usage + 30% buffer for safety
Here's the thing about memory limits specifically: set them too tight and you get OOM kills. Set them too loose and a memory leak can consume the entire node before anyone notices. I aim for 20-30% above the P99 observed usage as a starting point, then adjust based on the workload's behavior.
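As an illustration, the buffer arithmetic from step 3 can be wrapped in a small helper. This is a hypothetical sketch, not a real tool: the function name and the 1.5x CPU-limit headroom factor are my own assumptions, while the other buffers come from the percentages above.

```python
def recommend_resources(cpu_p50_m, cpu_p99_m, mem_p99_mi):
    """Turn observed usage percentiles into request/limit suggestions.

    CPU values in millicores, memory in MiB. Buffers follow the text:
    CPU request = P50 + 20%, memory request = P99 + 10%, memory limit =
    P99 + 30%; the 1.5x CPU-limit headroom is an assumed starting point.
    """
    return {
        "cpu_request_m": round(cpu_p50_m * 1.2),
        "cpu_limit_m": round(cpu_p99_m * 1.5),    # headroom for spikes
        "mem_request_mi": round(mem_p99_mi * 1.1),
        "mem_limit_mi": round(mem_p99_mi * 1.3),  # safety buffer
    }
```

Feeding in, say, a 170m CPU P50, 530m CPU P99, and 350Mi memory P99 yields values close to the example manifest above; treat the output as a starting point to adjust, not a final answer.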
LimitRange and ResourceQuota: Guardrails for Teams
In multi-tenant clusters, you cannot trust every team to set resources correctly. Use LimitRange to enforce defaults and boundaries per namespace:
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: team-alpha
spec:
  limits:
  - default:
      cpu: "500m"
      memory: "512Mi"
    defaultRequest:
      cpu: "100m"
      memory: "128Mi"
    max:
      cpu: "4"
      memory: "8Gi"
    min:
      cpu: "50m"
      memory: "64Mi"
    type: Container
And use ResourceQuota to cap total consumption per namespace:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-alpha-quota
  namespace: team-alpha
spec:
  hard:
    requests.cpu: "20"
    requests.memory: "40Gi"
    limits.cpu: "40"
    limits.memory: "80Gi"
    pods: "100"
Without these guardrails, one team's runaway deployment can starve everyone else. I've seen a single misconfigured CronJob request 64Gi of memory per pod and drain an entire node pool before anyone noticed.
The Recommendations That Actually Work
After years of tuning resource configurations across production clusters, here's where I've landed:
- Always set memory requests and limits. Memory is incompressible. An unbounded container is a ticking time bomb.
- Set CPU requests, but think carefully about CPU limits. For latency-sensitive workloads, CPU limits cause throttling that directly impacts user experience. Consider using only requests.
- Use Guaranteed QoS for stateful workloads. Databases, message queues, and anything with persistent state should be the last to get evicted.
- Use Burstable QoS for stateless web services. These can tolerate occasional eviction because they're designed to be replaced.
- Never run production workloads as BestEffort. If it matters enough to be in production, it matters enough to have resource definitions.
- Automate right-sizing with VPA in recommendation mode. The Vertical Pod Autoscaler can continuously analyze utilization and suggest better values without automatically applying them.
- Review resource settings quarterly. Application behavior changes as features are added. The values you set six months ago might be wildly inaccurate now.
Final Thoughts
Resource management in Kubernetes isn't glamorous work. Nobody's writing conference talks about how they tuned their memory requests. But it's the foundation that everything else builds on — your scheduling efficiency, your workload stability, your cloud bill.
Get requests and limits right, and your cluster runs smoothly. Get them wrong, and you'll spend your weekends debugging OOM kills and wondering why your nodes are "full" at 20% utilization.
Start with data, set conservative limits, monitor continuously, and adjust. That's the entire strategy. It's not exciting, but it works.