Kubernetes Resource Requests vs Limits: The Guide I Wish I Had Before My First OOM Kill
Your Pods Are Lying About What They Need
Here's the thing — most Kubernetes deployments I audit have resource requests and limits that were copy-pasted from a blog post three years ago and never touched again. Developers set cpu: 500m and memory: 512Mi because it "seemed reasonable," and nobody ever went back to check whether that was remotely accurate.
The result? Clusters that are 70% allocated on paper but 15% utilized in reality. Or worse, pods getting OOM-killed in production because the memory limit was set based on vibes instead of data.
Let me tell you why understanding the relationship between requests, limits, and QoS classes is foundational to running Kubernetes well — and how to actually right-size your workloads.
Requests vs Limits: They Do Very Different Things
This is the most misunderstood concept in Kubernetes resource management. Requests and limits are not "min and max." They serve fundamentally different purposes in the scheduler and the kubelet.
Requests tell the scheduler how much capacity to reserve. When you set cpu: 250m as a request, you're saying "this pod needs at least 250 millicores guaranteed." The scheduler uses this to decide which node has room for the pod.
Limits tell the kubelet when to intervene. When a pod exceeds its memory limit, it gets OOM-killed. When it exceeds its CPU limit, it gets throttled.
apiVersion: v1
kind: Pod
metadata:
  name: api-server
spec:
  containers:
  - name: app
    image: myapp:v2.1.0
    resources:
      requests:
        cpu: "250m"
        memory: "256Mi"
      limits:
        cpu: "1000m"
        memory: "512Mi"
Here's where it gets nuanced. CPU is compressible — if a container hits its CPU limit, it gets throttled but keeps running. Memory is incompressible — if a container exceeds its memory limit, the kernel's OOM killer terminates it. No warning, no graceful shutdown. Dead.
The Three QoS Classes and Why They Matter
Kubernetes assigns every pod a Quality of Service class based on how you configure requests and limits. This classification determines which pods get killed first when the node runs out of resources.
Guaranteed
Every container has requests equal to limits for both CPU and memory:
resources:
  requests:
    cpu: "500m"
    memory: "512Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"
Guaranteed pods are the last to be evicted. The kubelet will kill BestEffort and Burstable pods before touching Guaranteed ones. Use this for your most critical workloads — databases, payment processors, anything where an unexpected restart causes real pain.
Burstable
At least one container has a request set, but requests don't equal limits:
resources:
  requests:
    cpu: "250m"
    memory: "256Mi"
  limits:
    cpu: "1000m"
    memory: "1Gi"
This is the most common class in production. The pod gets its requested resources guaranteed, but can burst higher when capacity is available. The risk is that during contention, the kubelet will start killing Burstable pods after all BestEffort pods are gone.
BestEffort
No requests or limits set at all:
resources: {}
These are the first pods to die when the node is under memory pressure. I've seen teams run entire production workloads as BestEffort because they "didn't get around to setting resources." Then they're baffled when pods restart randomly under load.
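The classification rules above can be condensed into a short sketch. Here's a simplified approximation in Python, assuming each container is a dict with optional "requests" and "limits" maps; the real kubelet logic also covers init containers and a few edge cases, but the core decision looks like this:

```python
def qos_class(containers):
    """Approximate Kubernetes QoS classification (simplified sketch)."""
    # BestEffort: no container sets any request or limit
    if all(not c.get("requests") and not c.get("limits") for c in containers):
        return "BestEffort"
    # Guaranteed: every container has CPU and memory limits, and any
    # explicit requests equal those limits (if requests are omitted,
    # Kubernetes defaults them to the limits, which still qualifies)
    for c in containers:
        limits = c.get("limits", {})
        requests = c.get("requests", {})
        for res in ("cpu", "memory"):
            if res not in limits:
                return "Burstable"
            if res in requests and requests[res] != limits[res]:
                return "Burstable"
    return "Guaranteed"
```

Note how a pod that sets only limits still ends up Guaranteed, because Kubernetes copies missing requests from the limits before classifying.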
OOM Kills: Understanding the Kill Chain
When a node runs low on memory, here's what actually happens — and this is the part most docs gloss over:
- The kubelet monitors node memory via the memory.available eviction signal
- When available memory drops below the eviction threshold (default: 100Mi), the kubelet starts evicting pods
- Eviction order: BestEffort first, then Burstable (sorted by how much they exceed their requests), then Guaranteed
- If eviction doesn't free enough memory fast enough, the Linux kernel's OOM killer steps in and kills processes directly
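The eviction ordering in those steps can be modeled roughly. This is a deliberately simplified Python sketch that ranks pods only by memory usage above their request; the real kubelet also factors in pod priority, but the usage-over-request ranking is why BestEffort pods (whose effective request is zero) go first:

```python
def eviction_order(pods):
    """Rank pods by memory-eviction priority (simplified sketch).

    Each pod is a dict with "usage" and "request" in the same unit (e.g. MiB).
    BestEffort pods have request=0, so all of their usage counts as overage.
    """
    # Larger usage-above-request => evicted earlier
    return sorted(pods, key=lambda p: p["usage"] - p["request"], reverse=True)
```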
# Check if a pod was OOM-killed
kubectl describe pod my-pod -n production | grep -A5 "Last State"
# You'll see something like:
# Last State: Terminated
# Reason: OOMKilled
# Exit Code: 137
# Check node-level memory pressure
kubectl describe node worker-03 | grep -A5 "Conditions"
Here's the thing about OOM kills — exit code 137 means the process received SIGKILL (128 + 9). There's no cleanup, no connection draining, no graceful shutdown handler. Whatever that container was doing mid-transaction is gone.
The most insidious scenario I've dealt with is when a container's actual memory usage slowly climbs over hours due to a leak, eventually hits the limit, gets killed, restarts, and the cycle repeats. The pod shows "Running" with a restart count of 47. Everything looks fine until you actually check.
CPU Throttling: The Silent Performance Killer
OOM kills are loud and obvious. CPU throttling is sneaky. Your pod stays running, but requests take three times longer and nobody knows why.
When a container exceeds its CPU limit, the kernel's CFS (Completely Fair Scheduler) throttles it. The container is forced to wait, even if the node has spare CPU capacity. Let me tell you why this matters more than you think.
# Check CPU throttling for a container (cgroup v2 path shown;
# on cgroup v1 nodes it's /sys/fs/cgroup/cpu/cpu.stat)
kubectl exec -it my-pod -- cat /sys/fs/cgroup/cpu.stat
# Look for:
# nr_throttled — number of times the container was throttled
# throttled_usec — total time spent throttled (microseconds on cgroup v2;
#                  cgroup v1 reports throttled_time in nanoseconds)
A common antipattern: setting CPU limits equal to CPU requests. Combined with matching memory values, this yields the Guaranteed QoS class, which sounds good, but it also means the container can never burst above its CPU request, even when the node is idle. For bursty workloads like web servers that spike during request handling, this causes constant throttling.
# This causes unnecessary throttling for bursty workloads
resources:
  requests:
    cpu: "250m"
  limits:
    cpu: "250m"   # Container can never burst above 250m

# Better for most web workloads — allow CPU bursting
resources:
  requests:
    cpu: "250m"
  limits:
    cpu: "1000m"  # Can burst to 1 core when needed
Some teams have started dropping CPU limits entirely and relying only on requests. There's a valid argument for this approach — it avoids throttling while still giving the scheduler the information it needs. But it requires that you trust your workloads not to be CPU hogs, and that you have good monitoring in place.
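If you adopt the requests-only approach for CPU, the spec simply omits the CPU limit while keeping a memory limit in place. A sketch of the pattern, reusing the illustrative values from the examples above:

```yaml
resources:
  requests:
    cpu: "250m"       # scheduler still reserves this capacity
    memory: "256Mi"
  limits:
    memory: "512Mi"   # keep a memory limit: memory is incompressible
    # no cpu limit: the container can use idle CPU without CFS throttling
```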
Right-Sizing: Data Over Guesswork
Stop guessing. Use actual utilization data to set requests and limits. Here's the process I follow:
Step 1: Deploy with generous limits and observe
resources:
  requests:
    cpu: "100m"
    memory: "128Mi"
  limits:
    cpu: "2000m"
    memory: "2Gi"
Step 2: Collect data over a meaningful period
You need at least a week of data that includes peak traffic patterns. Use Prometheus to capture actual usage:
# P99 CPU usage over 7 days
quantile_over_time(0.99,
  rate(container_cpu_usage_seconds_total{
    namespace="production",
    container="api-server"
  }[5m])[7d:]
)

# P99 memory usage over 7 days
quantile_over_time(0.99,
  container_memory_working_set_bytes{
    namespace="production",
    container="api-server"
  }[7d:]
)
Step 3: Set requests and limits based on percentiles
resources:
  requests:
    cpu: "200m"      # P50 usage + 20% buffer
    memory: "384Mi"  # P99 usage + 10% buffer
  limits:
    cpu: "800m"      # P99 usage + headroom for spikes
    memory: "512Mi"  # P99 usage + 30% buffer for safety
Here's the thing about memory limits specifically: set them too tight and you get OOM kills. Set them too loose and a memory leak can consume the entire node before anyone notices. I aim for 20-30% above the P99 observed usage as a starting point, then adjust based on the workload's behavior.
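As an illustration, the buffer arithmetic from step 3 can be wrapped in a small helper. This is a hypothetical sketch, not a real tool: the function name and the 1.5x CPU-limit headroom factor are my own assumptions, while the other buffers come from the percentages above.

```python
def recommend_resources(cpu_p50_m, cpu_p99_m, mem_p99_mi):
    """Turn observed usage percentiles into request/limit suggestions.

    CPU values in millicores, memory in MiB. Buffers follow the text:
    CPU request = P50 + 20%, memory request = P99 + 10%, memory limit =
    P99 + 30%; the 1.5x CPU-limit headroom is an assumed starting point.
    """
    return {
        "cpu_request_m": round(cpu_p50_m * 1.2),
        "cpu_limit_m": round(cpu_p99_m * 1.5),    # headroom for spikes
        "mem_request_mi": round(mem_p99_mi * 1.1),
        "mem_limit_mi": round(mem_p99_mi * 1.3),  # safety buffer
    }
```

Feeding in, say, a 170m CPU P50, 530m CPU P99, and 350Mi memory P99 yields values close to the example manifest above; treat the output as a starting point to adjust, not a final answer.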
LimitRange and ResourceQuota: Guardrails for Teams
In multi-tenant clusters, you cannot trust every team to set resources correctly. Use LimitRange to enforce defaults and boundaries per namespace:
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: team-alpha
spec:
  limits:
  - default:
      cpu: "500m"
      memory: "512Mi"
    defaultRequest:
      cpu: "100m"
      memory: "128Mi"
    max:
      cpu: "4"
      memory: "8Gi"
    min:
      cpu: "50m"
      memory: "64Mi"
    type: Container
And use ResourceQuota to cap total consumption per namespace:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-alpha-quota
  namespace: team-alpha
spec:
  hard:
    requests.cpu: "20"
    requests.memory: "40Gi"
    limits.cpu: "40"
    limits.memory: "80Gi"
    pods: "100"
Without these guardrails, one team's runaway deployment can starve everyone else. I've seen a single misconfigured CronJob request 64Gi of memory per pod and drain an entire node pool before anyone noticed.
The Recommendations That Actually Work
After years of tuning resource configurations across production clusters, here's where I've landed:
- Always set memory requests and limits. Memory is incompressible. An unbounded container is a ticking time bomb.
- Set CPU requests, but think carefully about CPU limits. For latency-sensitive workloads, CPU limits cause throttling that directly impacts user experience. Consider using only requests.
- Use Guaranteed QoS for stateful workloads. Databases, message queues, and anything with persistent state should be the last to get evicted.
- Use Burstable QoS for stateless web services. These can tolerate occasional eviction because they're designed to be replaced.
- Never run production workloads as BestEffort. If it matters enough to be in production, it matters enough to have resource definitions.
- Automate right-sizing with VPA in recommendation mode. The Vertical Pod Autoscaler can continuously analyze utilization and suggest better values without automatically applying them.
- Review resource settings quarterly. Application behavior changes as features are added. The values you set six months ago might be wildly inaccurate now.
Final Thoughts
Resource management in Kubernetes isn't glamorous work. Nobody's writing conference talks about how they tuned their memory requests. But it's the foundation that everything else builds on — your scheduling efficiency, your workload stability, your cloud bill.
Get requests and limits right, and your cluster runs smoothly. Get them wrong, and you'll spend your weekends debugging OOM kills and wondering why your nodes are "full" at 20% utilization.
Start with data, set conservative limits, monitor continuously, and adjust. That's the entire strategy. It's not exciting, but it works.