Part 8 of 8 in Kubernetes from Zero to Hero

Production-Ready Helm Charts: Templates, Values, Hooks, and Testing

Aareez Asif · 9 min read

Most Helm Charts Are Not Production-Ready

Here's the thing about Helm charts in the wild — the vast majority of them work on a developer's laptop and crumble in production. I've inherited charts that hardcoded replica counts, had no resource limits, used latest as the default image tag, and exposed secrets in plaintext through values files.

A production-ready Helm chart is one that another engineer can deploy to a live cluster with confidence, customize for their environment without forking the chart, and upgrade without downtime. That bar is higher than most people realize.

Let me tell you why these patterns matter, and walk through the practices I enforce on every chart that touches production.

Chart Structure That Scales

Start with a clean layout. Every chart I build follows this structure:

my-app/
├── Chart.yaml
├── Chart.lock
├── values.yaml
├── values-production.yaml
├── values-staging.yaml
├── templates/
│   ├── _helpers.tpl
│   ├── deployment.yaml
│   ├── service.yaml
│   ├── ingress.yaml
│   ├── hpa.yaml
│   ├── pdb.yaml
│   ├── serviceaccount.yaml
│   ├── configmap.yaml
│   ├── secret.yaml
│   ├── networkpolicy.yaml
│   └── tests/
│       └── test-connection.yaml
├── charts/            # subcharts
└── ci/
    └── ci-values.yaml # values used in CI testing

The ci/ directory is something most people skip. It holds a values file specifically for automated testing in your pipeline. More on that later.
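The structure starts with Chart.yaml. A minimal sketch for the layout above — the version numbers and the Redis dependency are illustrative, not part of the original chart:

```yaml
# Chart.yaml — illustrative values
apiVersion: v2
name: my-app
description: A production web service
type: application
version: 1.4.0        # chart version, bumped on every chart change
appVersion: "2.3.1"   # application version, used as the default image tag
dependencies:
  - name: redis                                    # hypothetical subchart
    version: 18.x.x
    repository: https://charts.bitnami.com/bitnami
    condition: redis.enabled                       # toggled from values.yaml
```

Pinning dependency versions here is what produces the Chart.lock file in the tree above.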

Values Design: The API of Your Chart

Your values.yaml is an API contract. Treat it like one. Here's how I structure values for a typical web service:

# values.yaml

# -- Number of replicas. Override per environment.
replicaCount: 2

image:
  # -- Container image repository
  repository: ghcr.io/myorg/my-app
  # -- Image pull policy
  pullPolicy: IfNotPresent
  # -- Image tag. Defaults to chart appVersion.
  tag: ""

# -- Image pull secrets for private registries
imagePullSecrets: []

serviceAccount:
  # -- Create a service account
  create: true
  # -- Annotations for the service account (e.g., IRSA)
  annotations: {}
  # -- Service account name. Auto-generated if not set.
  name: ""

service:
  type: ClusterIP
  port: 80
  targetPort: 8080

ingress:
  enabled: false
  className: nginx
  annotations: {}
  hosts:
    - host: my-app.example.com
      paths:
        - path: /
          pathType: Prefix
  tls: []

resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 512Mi

autoscaling:
  enabled: false
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70

podDisruptionBudget:
  enabled: true
  minAvailable: 1

# -- Extra environment variables as key-value pairs
env: {}

# -- Extra environment variables from secrets/configmaps
envFrom: []

# -- Readiness probe configuration
readinessProbe:
  httpGet:
    path: /health
    port: http
  initialDelaySeconds: 5
  periodSeconds: 10

# -- Liveness probe configuration
livenessProbe:
  httpGet:
    path: /health
    port: http
  initialDelaySeconds: 15
  periodSeconds: 20

# -- Node selector constraints
nodeSelector: {}

# -- Tolerations for pod scheduling
tolerations: []

# -- Affinity rules for pod scheduling
affinity: {}

Notice that every field carries a comment prefixed with --. That double-dash prefix is a convention that helm-docs picks up to auto-generate documentation. If you're not generating docs from your values file, you're asking every consumer of your chart to read your templates to understand what's configurable.

Template Helpers That Prevent Disasters

Your _helpers.tpl should define reusable named templates. Here's the foundation I use:

{{/* templates/_helpers.tpl */}}

{{/*
Expand the name of the chart.
*/}}
{{- define "my-app.name" -}}
{{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" }}
{{- end }}

{{/*
Create a fully qualified app name.
We truncate at 63 characters because Kubernetes name fields are limited.
*/}}
{{- define "my-app.fullname" -}}
{{- if .Values.fullnameOverride }}
{{- .Values.fullnameOverride | trunc 63 | trimSuffix "-" }}
{{- else }}
{{- $name := default .Chart.Name .Values.nameOverride }}
{{- if contains $name .Release.Name }}
{{- .Release.Name | trunc 63 | trimSuffix "-" }}
{{- else }}
{{- printf "%s-%s" .Release.Name $name | trunc 63 | trimSuffix "-" }}
{{- end }}
{{- end }}
{{- end }}

{{/*
Common labels applied to every resource.
*/}}
{{- define "my-app.labels" -}}
helm.sh/chart: {{ include "my-app.chart" . }}
{{ include "my-app.selectorLabels" . }}
app.kubernetes.io/version: {{ .Chart.AppVersion | quote }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
{{- end }}

{{/*
Selector labels — used in deployments and services.
*/}}
{{- define "my-app.selectorLabels" -}}
app.kubernetes.io/name: {{ include "my-app.name" . }}
app.kubernetes.io/instance: {{ .Release.Name }}
{{- end }}

{{/*
Chart name and version for the chart label.
*/}}
{{- define "my-app.chart" -}}
{{- printf "%s-%s" .Chart.Name .Chart.Version | replace "+" "_" | trunc 63 | trimSuffix "-" }}
{{- end }}

{{/*
Service account name.
*/}}
{{- define "my-app.serviceAccountName" -}}
{{- if .Values.serviceAccount.create }}
{{- default (include "my-app.fullname" .) .Values.serviceAccount.name }}
{{- else }}
{{- default "default" .Values.serviceAccount.name }}
{{- end }}
{{- end }}

The 63-character truncation is not optional. Kubernetes rejects names longer than 63 characters, and when your release name is staging-my-long-application-name, that limit comes fast. I've watched deployments fail in CI because nobody tested with long release names.
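The truncation behavior is easy to sanity-check from a plain shell. This sketch approximates what the fullname helper does — join release and chart name, cut at 63 characters, trim a trailing dash (the names here are examples, not from a real cluster):

```shell
# Simulate the fullname helper: join release and chart name, truncate to
# the 63-character Kubernetes name limit, and trim a trailing dash.
fullname() {
  printf '%s-%s' "$1" "$2" | cut -c1-63 | sed 's/-$//'
}

# Short names pass through untouched.
fullname "staging" "my-app"; echo          # → staging-my-app

# Long release names get cut at the limit, possibly mid-word.
fullname "staging-my-long-application-name-for-the-eu-west-dev-cluster" "my-app"; echo
```

The second call is the case that bites in CI: the combined name is over 63 characters, so the helper truncates it, which is why testing with realistically long release names matters.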

The Deployment Template Done Right

Here's a deployment template with the patterns I consider mandatory:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "my-app.fullname" . }}
  labels:
    {{- include "my-app.labels" . | nindent 4 }}
spec:
  {{- if not .Values.autoscaling.enabled }}
  replicas: {{ .Values.replicaCount }}
  {{- end }}
  selector:
    matchLabels:
      {{- include "my-app.selectorLabels" . | nindent 6 }}
  template:
    metadata:
      annotations:
        checksum/config: {{ include (print $.Template.BasePath "/configmap.yaml") . | sha256sum }}
      labels:
        {{- include "my-app.labels" . | nindent 8 }}
    spec:
      {{- with .Values.imagePullSecrets }}
      imagePullSecrets:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      serviceAccountName: {{ include "my-app.serviceAccountName" . }}
      securityContext:
        runAsNonRoot: true
        fsGroup: 65534
      containers:
        - name: {{ .Chart.Name }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop:
                - ALL
          ports:
            - name: http
              containerPort: {{ .Values.service.targetPort }}
              protocol: TCP
          {{- with .Values.readinessProbe }}
          readinessProbe:
            {{- toYaml . | nindent 12 }}
          {{- end }}
          {{- with .Values.livenessProbe }}
          livenessProbe:
            {{- toYaml . | nindent 12 }}
          {{- end }}
          resources:
            {{- toYaml .Values.resources | nindent 12 }}
          {{- if .Values.env }}
          env:
            {{- range $key, $value := .Values.env }}
            - name: {{ $key }}
              value: {{ $value | quote }}
            {{- end }}
          {{- end }}
          {{- with .Values.envFrom }}
          envFrom:
            {{- toYaml . | nindent 12 }}
          {{- end }}
      {{- with .Values.nodeSelector }}
      nodeSelector:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with .Values.tolerations }}
      tolerations:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with .Values.affinity }}
      affinity:
        {{- toYaml . | nindent 8 }}
      {{- end }}

Here's the thing about that checksum/config annotation — it forces a rolling restart when your ConfigMap changes. Without it, you update a config value, Helm reports success, and your pods keep running with the old config because the Deployment spec itself didn't change. I've seen this cause hours of confusion.
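The mechanism is worth seeing in isolation. Hashing two renderings of the same ConfigMap key shows why the annotation forces a rollout — any change to the config produces a different digest, so the pod template changes and Kubernetes rolls the pods (the LOG_LEVEL key here is just an example):

```shell
# A changed config value yields a different sha256 digest, so the
# checksum/config annotation on the pod template changes too.
old=$(printf 'LOG_LEVEL: info'  | sha256sum | cut -d' ' -f1)
new=$(printf 'LOG_LEVEL: debug' | sha256sum | cut -d' ' -f1)
echo "old=$old"
echo "new=$new"
```

Identical config, by the same logic, produces an identical digest and no restart — which is exactly what you want from an idempotent upgrade.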

Also note the security context: runAsNonRoot, readOnlyRootFilesystem, and dropping all capabilities. These should be defaults, not opt-in.

Helm Hooks for Lifecycle Management

Hooks let you run actions at specific points in the release lifecycle. Here's a database migration hook that runs before upgrades:

apiVersion: batch/v1
kind: Job
metadata:
  name: {{ include "my-app.fullname" . }}-migrate
  labels:
    {{- include "my-app.labels" . | nindent 4 }}
  annotations:
    "helm.sh/hook": pre-upgrade,pre-install
    "helm.sh/hook-weight": "-1"
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  backoffLimit: 3
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
          command: ["./migrate", "--direction", "up"]
          envFrom:
            - secretRef:
                name: {{ include "my-app.fullname" . }}-db-credentials

Let me tell you why hook-delete-policy is critical. Without before-hook-creation, if a previous migration Job still exists (maybe it failed), the new hook can't create a Job with the same name and the entire upgrade hangs. I've been paged for exactly this scenario.

The hook-weight controls ordering when you have multiple hooks. Lower numbers run first. Use negative weights for migrations that must complete before other setup hooks.
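As an illustration, a hypothetical seed-data Job (not part of the chart above) could carry weight 0 so it only runs after the weight -1 migration hook succeeds — annotations fragment only:

```yaml
# Hypothetical second hook: runs after the migration (weight -1),
# because hooks with lower weights run first.
annotations:
  "helm.sh/hook": pre-upgrade,pre-install
  "helm.sh/hook-weight": "0"
  "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
```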

Testing Your Charts

Helm has a built-in test framework that almost nobody uses. Add test pods in templates/tests/:

apiVersion: v1
kind: Pod
metadata:
  name: "{{ include "my-app.fullname" . }}-test-connection"
  labels:
    {{- include "my-app.labels" . | nindent 4 }}
  annotations:
    "helm.sh/hook": test
spec:
  restartPolicy: Never
  containers:
    - name: wget
      image: busybox:1.36
      command: ['wget']
      args: ['{{ include "my-app.fullname" . }}:{{ .Values.service.port }}/health']

Run tests after install:

helm test my-release -n production

But in-cluster tests are only one layer. For CI, I also run:

# Lint the chart
helm lint ./my-app --values ./my-app/ci/ci-values.yaml

# Template rendering — catches syntax errors without a cluster
helm template test-release ./my-app --values ./my-app/ci/ci-values.yaml > rendered.yaml

# Validate rendered manifests against Kubernetes schemas
kubeconform -strict -kubernetes-version 1.29.0 rendered.yaml

# Policy checks with conftest
conftest test rendered.yaml --policy ./policies/

This pipeline catches the majority of issues before anything touches a cluster. The ci-values.yaml file should enable every feature toggle so your templates get fully rendered and tested.
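A ci-values.yaml consistent with the values file earlier might look like this — a sketch; enable whatever toggles your own chart defines:

```yaml
# ci/ci-values.yaml — force every conditional template to render in CI
ingress:
  enabled: true
autoscaling:
  enabled: true
podDisruptionBudget:
  enabled: true
env:
  LOG_LEVEL: debug   # exercises the env range loop in the deployment
```

With every toggle on, helm template renders the ingress, HPA, and PDB manifests too, so kubeconform and conftest actually see them.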

Patterns to Avoid

After maintaining charts across many teams, these are the anti-patterns I push back on:

  1. Defaulting image tag to latest. Use .Chart.AppVersion as the default. Pinned versions are non-negotiable for reproducible deployments.

  2. Putting secrets in values.yaml. Secrets belong in external secret managers (Vault, AWS Secrets Manager) referenced via envFrom or external-secrets-operator. Never check credentials into a chart.

  3. Massive monolithic templates. If a template file exceeds 150 lines, split it. Use named templates in _helpers.tpl for repeated blocks.

  4. No resource requests or limits. A chart without resource definitions will get scheduled on nodes that can't handle it, or worse, it'll consume unbounded resources and starve other workloads.

  5. Skipping PodDisruptionBudgets. If you care about availability during node drains and cluster upgrades, a PDB is mandatory. Default minAvailable: 1 for any multi-replica workload.
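That PDB template is small. A sketch of templates/pdb.yaml consistent with the values and helpers shown earlier:

```yaml
{{- if .Values.podDisruptionBudget.enabled }}
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: {{ include "my-app.fullname" . }}
  labels:
    {{- include "my-app.labels" . | nindent 4 }}
spec:
  minAvailable: {{ .Values.podDisruptionBudget.minAvailable }}
  selector:
    matchLabels:
      {{- include "my-app.selectorLabels" . | nindent 6 }}
{{- end }}
```

Reusing the selectorLabels helper guarantees the PDB targets exactly the pods the Deployment manages.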

Final Thoughts

A Helm chart is the interface between your application and the cluster. It encodes your operational knowledge: how the app should be deployed, what resources it needs, how it scales, and what happens during upgrades.

Treat your charts with the same rigor as application code. Review them in PRs, test them in CI, version them properly. The chart that works on your laptop and the chart that survives a production node failure at 3 AM are very different things. Build for the 3 AM scenario, and the laptop scenario takes care of itself.

Aareez Asif

Senior Kubernetes Architect

10+ years orchestrating containers in production. Battle-tested opinions on everything from pod scheduling to service mesh. I've seen clusters burn and helped rebuild them better.
