Right-sizing Kubernetes workloads to increase resource utilization efficiency

When we deploy workloads to Kubernetes, there's always the question of how to size them. When we define a Pod, Kubernetes allows us to specify the resources it needs using resource requests and limits.

resources:
    limits:
        cpu: "250m"
        memory: "1024Mi"
    requests:
        cpu: "100m"
        memory: "100Mi"

This is specified at the per-container level within the Pod definition, in the .spec.containers[].resources path (Kubernetes v1.34 also allows us to define resources at the Pod level, in the .spec.resources path).
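
As a reference point, here is a minimal sketch of what a pod-level specification might look like (the exact field shape under .spec.resources is assumed here, the name and image are placeholders, and the feature may need to be enabled on your cluster):

apiVersion: v1
kind: Pod
metadata:
  name: sample-app                  # placeholder name
spec:
  resources:                        # pod-level budget shared by all containers
    limits:
      cpu: "500m"
      memory: "2Gi"
    requests:
      cpu: "250m"
      memory: "1Gi"
  containers:
    - name: app
      image: nginx:1.27             # placeholder image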

Requests vs. limits

The kubelet and the container runtime enforce the `requests` and `limits`, typically through Linux cgroups:

  • The CPU limit defines the maximum CPU time the container can use; a container that tries to use more is throttled rather than killed.
  • The CPU request is used both to schedule the pod onto a node with enough unreserved CPU and, when CPU is contended, to divide CPU time proportionally: workloads with higher CPU requests are allocated a larger share of CPU time.
  • The memory request is mainly used during Pod scheduling. The node must have enough unreserved memory available in order for the pod to be scheduled on it.
  • The memory limit defines the maximum amount of memory the container is allowed to use. If the container tries to allocate more memory than this limit, it is killed and restarted by Kubernetes with an OOMKilled status.
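
To see the memory limit in action, a small test pod like the one below (a hedged example; the stress image and arguments are simply a common way to allocate memory) will be terminated with an OOMKilled status, because it tries to allocate roughly 250M against a 100Mi limit:

apiVersion: v1
kind: Pod
metadata:
  name: oom-demo                    # placeholder name
spec:
  containers:
    - name: stress
      image: polinux/stress         # widely used stress-test image
      command: ["stress"]
      args: ["--vm", "1", "--vm-bytes", "250M", "--vm-hang", "1"]
      resources:
        requests:
          memory: "50Mi"
        limits:
          memory: "100Mi"           # allocating beyond this gets the container OOMKilled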

One thing to note here is that Kubernetes schedules a pod based on its memory request, not its limit. If the limit is set higher than what the node can actually provide, the pod can later be evicted from the node when it tries to consume memory the node does not have. For example, a pod could have the following resource requests and limits:

resources:
    limits:
        cpu: "2000m"
        memory: "4Gi"
    requests:
        cpu: "1000m"
        memory: "2Gi"

If there's a node with 8GB of installed RAM of which only 3GB is available, Kubernetes will still schedule the pod on that node because the 2Gi request fits. But as the pod starts to consume more than 3GB of memory, it gets evicted from the node because the node runs out of memory, rather than being OOMKilled, since the 4Gi limit was never reached. This can lead to confusing scheduling situations.

It is, therefore, important to right-size our Kubernetes workloads as well as our Kubernetes nodes.

Few large nodes vs. many small nodes

For an estimated capacity of 16 CPU cores and 64GB of RAM, it might seem convenient to run two nodes of 8 cores and 32GB RAM rather than four nodes of 4 cores and 16GB RAM, so that more RAM can be offered to each workload. However, this also means we are running only two nodes instead of four, which reduces resiliency: losing one of two nodes takes out half the cluster's capacity. Spreading our pods over more nodes increases availability by reducing the impact of a single failing node.

On the other hand, too many small nodes means we sacrifice a larger share of the total capacity to per-node overhead such as Kubernetes system pods and reserved resources on every node. It also limits how many pods can be scheduled on each node due to sizing constraints, and increases the risk of pods being evicted when a node runs short of memory.

What if we don't specify resource limits in pods?

Not specifying resource limits means a pod can consume any amount of memory, as long as the node can provide it. A single pod leaking memory could then exhaust the node's available memory, causing other pods on that node to be evicted. This is not ideal.
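
One common guardrail is a LimitRange in the namespace, which applies default requests and limits to containers that don't declare their own. A hedged sketch (the name, namespace and values are placeholders):

apiVersion: v1
kind: LimitRange
metadata:
  name: default-container-limits    # placeholder name
  namespace: my-namespace           # placeholder namespace
spec:
  limits:
    - type: Container
      defaultRequest:               # applied when a container omits requests
        cpu: "100m"
        memory: "128Mi"
      default:                      # applied when a container omits limits
        cpu: "500m"
        memory: "512Mi"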

How do we right-size nodes and pods in Kubernetes?

There is no one-size-fits-all answer here. However, we can employ a few methods to arrive at a good size estimate.

  • Using monitoring tools like Prometheus, we can monitor the resource usage of pods in real time and use that data to calculate appropriate requests and limits.
  • We can use the Kubernetes Metrics API (exposed by metrics-server, for example via kubectl top) to monitor node resource usage and plan capacity for further workloads.

Both of the above approaches are reactive and require monitoring and manual intervention. It is generally a good practice to start this right from development, in order to achieve good sizing in production.
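
As an illustration of the Prometheus-based approach, here is a hedged sketch of recording rules that summarize actual container usage over a week, which can then be compared against the configured requests (the metric names are the standard cAdvisor metrics exposed via the kubelet, but label names and retention will vary by setup):

groups:
  - name: workload-right-sizing
    rules:
      # 95th percentile of memory actually used per container over the last 7 days
      - record: container:memory_working_set_bytes:p95_7d
        expr: quantile_over_time(0.95, container_memory_working_set_bytes{container!=""}[7d])
      # Average CPU actually used per container (in cores) over the last 7 days
      - record: container:cpu_usage_cores:avg_7d
        expr: avg_over_time(rate(container_cpu_usage_seconds_total{container!=""}[5m])[7d:5m])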

