GitLab CI Pipeline Optimization: Caching, DAG, and Parallel Jobs
The Fast Pipeline First
Here's the optimized .gitlab-ci.yml. Then I'll show you what each piece saves.
```yaml
stages:
  - build
  - test
  - deploy

variables:
  DOCKER_BUILDKIT: "1"
  PIP_CACHE_DIR: "$CI_PROJECT_DIR/.cache/pip"
  NPM_CONFIG_CACHE: "$CI_PROJECT_DIR/.cache/npm"

.default_cache: &default_cache
  key:
    files:
      - package-lock.json
      - requirements.txt
  paths:
    - .cache/
    - node_modules/
    - .venv/
  policy: pull

build-frontend:
  stage: build
  cache:
    <<: *default_cache
    policy: pull-push
  script:
    - npm ci --prefer-offline
    - npm run build
  artifacts:
    paths:
      - dist/
    expire_in: 1 hour

build-backend:
  stage: build
  cache:
    <<: *default_cache
    policy: pull-push
  script:
    - python -m venv .venv
    - source .venv/bin/activate
    - pip install -r requirements.txt
    - python setup.py build
  artifacts:
    paths:
      - build/
    expire_in: 1 hour

lint:
  stage: test
  needs: ["build-frontend"]
  cache:
    <<: *default_cache
  script:
    - npm run lint
    - npm run type-check

unit-tests:
  stage: test
  needs: ["build-backend"]
  cache:
    <<: *default_cache
  parallel: 4
  script:
    - source .venv/bin/activate
    - pip install -r requirements.txt
    - python -m pytest tests/unit/ --splits 4 --group $CI_NODE_INDEX
  artifacts:
    reports:
      junit: report.xml

integration-tests:
  stage: test
  needs: ["build-backend", "build-frontend"]
  services:
    - postgres:15
    - redis:7
  variables:
    POSTGRES_DB: testdb
    POSTGRES_PASSWORD: testpass
  script:
    - source .venv/bin/activate
    - pip install -r requirements.txt
    - python -m pytest tests/integration/ -x

e2e-tests:
  stage: test
  needs: ["build-frontend"]
  parallel: 3
  script:
    - npm ci --prefer-offline
    - npx playwright install --with-deps chromium
    - npx playwright test --shard=$CI_NODE_INDEX/$CI_NODE_TOTAL

deploy-staging:
  stage: deploy
  needs: ["unit-tests", "integration-tests", "e2e-tests"]
  script:
    - ./deploy.sh staging
  environment:
    name: staging
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
```
That pipeline runs in under 6 minutes. The unoptimized version took 25. Here's why.
Caching: Stop Re-downloading the Internet
Every pipeline run without caching downloads every dependency from scratch. That's insane.
The `cache:key:files` directive hashes your lockfiles. Same lockfile, same cache. Change a dependency and the cache busts automatically.
```yaml
cache:
  key:
    files:
      - package-lock.json
  paths:
    - node_modules/
  policy: pull-push
```
Three cache policies matter:

- `pull-push`: read and write the cache. Use on build jobs that populate it.
- `pull`: read-only. Use on test jobs that consume the cache. Prevents cache corruption from parallel writes.
- `push`: write-only. Rare. Used for cache-warming jobs.

One rule: only one job should `pull-push` per cache key. Multiple writers cause race conditions. Every other job gets `pull`.
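If the shared cache key makes frontend and backend jobs thrash each other's entries, you can split them with `cache:key:prefix`, which GitLab combines with the lockfile hash. A sketch (the prefix names are illustrative):

```yaml
build-frontend:
  cache:
    key:
      files:
        - package-lock.json
      prefix: node        # separate cache namespace for frontend jobs
    paths:
      - .cache/npm/
      - node_modules/
    policy: pull-push

build-backend:
  cache:
    key:
      files:
        - requirements.txt
      prefix: python      # separate cache namespace for backend jobs
    paths:
      - .cache/pip/
      - .venv/
    policy: pull-push
```

Each side now busts only when its own lockfile changes, and the one-writer-per-key rule is easier to enforce.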
DAG: The `needs` Keyword Kills Idle Time
By default, GitLab CI waits for the entire previous stage to finish before starting the next one. The `needs` keyword breaks that wall.
```yaml
lint:
  stage: test
  needs: ["build-frontend"]   # starts as soon as build-frontend finishes

integration-tests:
  stage: test
  needs: ["build-backend", "build-frontend"]   # waits for both
```
Without needs, lint waits for build-backend too. That's wasted time. DAG dependencies let jobs start the instant their actual dependencies complete.
Visualize it. Without DAG:
```
build-frontend ──┐
                 ├── (wait for both) ── lint, unit-tests, integration-tests, e2e
build-backend ───┘
```
With DAG:
```
build-frontend ──┬── lint              (starts immediately)
                 └── e2e-tests         (starts immediately)
build-backend ────── unit-tests        (starts immediately)
both done ────────── integration-tests
```
You just saved 3-5 minutes depending on build times.
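A job with an empty `needs` list opts out of stage ordering entirely and starts the moment the pipeline does. A sketch (the job name and script are illustrative):

```yaml
secret-scan:
  stage: test
  needs: []   # no dependencies: starts immediately, ignoring stage order
  script:
    - ./scan.sh
```

This is the right tool for jobs that depend on nothing from earlier stages, like secret scanning or license checks.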
Parallel Test Splitting
Tests are the bottleneck. Split them.
```yaml
unit-tests:
  parallel: 4
  script:
    - python -m pytest tests/unit/ --splits 4 --group $CI_NODE_INDEX
```
GitLab spawns 4 runners. $CI_NODE_INDEX tells each runner which chunk to run. $CI_NODE_TOTAL gives you the total count.
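Under the hood, naive index-based splitting is just a modulo partition of the test list. A minimal sketch (not pytest-split's actual algorithm, which also weighs timing data):

```python
def split_tests(tests, total, index):
    """Assign each test to one of `total` groups.
    `index` is 1-based, matching GitLab's $CI_NODE_INDEX."""
    return [t for i, t in enumerate(tests) if i % total == index - 1]

tests = [f"test_{n}" for n in range(10)]
# Runner 1 of 4 gets test_0, test_4, test_8
print(split_tests(tests, total=4, index=1))
```

Every runner computes the same partition independently, so no coordination is needed between them.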
For Playwright or Cypress, sharding is built in:
```yaml
e2e-tests:
  parallel: 3
  script:
    - npx playwright test --shard=$CI_NODE_INDEX/$CI_NODE_TOTAL
```
200 E2E tests across 3 runners. Each runs ~67. Wall time drops by 3x.
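Sharded runs produce three separate reports. Playwright's blob reporter (available since v1.37) lets a follow-up job merge them into one HTML report. A sketch, assuming the shards upload their blob output as artifacts:

```yaml
e2e-tests:
  parallel: 3
  script:
    - npx playwright test --shard=$CI_NODE_INDEX/$CI_NODE_TOTAL --reporter=blob
  artifacts:
    paths:
      - blob-report/   # each shard writes a uniquely named report zip

merge-e2e-report:
  stage: test
  needs: ["e2e-tests"]   # pulls all three shards' blob-report/ artifacts
  script:
    - npx playwright merge-reports --reporter html ./blob-report
  artifacts:
    paths:
      - playwright-report/
```

Each shard's blob file has a unique name, so the three artifacts extract into the same directory without colliding.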
Want smarter splitting? Use pytest-split with timing data:
```yaml
unit-tests:
  parallel: 4
  script:
    - python -m pytest tests/unit/ --splits 4 --group $CI_NODE_INDEX --splitting-algorithm least_duration
  artifacts:
    paths:
      - .test_durations
```
It distributes tests by recorded duration, so the groups finish at roughly the same time. No more one runner finishing in 30 seconds while another grinds for 4 minutes.
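pytest-split reads its timing data from a `.test_durations` file, which has to be generated with `--store-durations` and refreshed periodically, for example from a scheduled pipeline. A sketch (job name and commit step are illustrative; you'd commit or cache the updated file):

```yaml
update-test-durations:
  stage: test
  rules:
    - if: $CI_PIPELINE_SOURCE == "schedule"
  script:
    # run the full suite once, recording per-test durations
    - python -m pytest tests/unit/ --store-durations
  artifacts:
    paths:
      - .test_durations
```

Without a reasonably fresh durations file, `least_duration` falls back to having no timing signal for new or changed tests.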
Artifacts: Pass Data, Don't Rebuild
Build once. Share everywhere.
```yaml
build-frontend:
  artifacts:
    paths:
      - dist/
    expire_in: 1 hour
```
Set expire_in. Always. Default artifact retention fills your storage fast. One hour is enough for pipeline artifacts. Bump to 30 days for release builds.
Use needs to pull artifacts from specific jobs instead of downloading everything:
```yaml
deploy-staging:
  needs:
    - job: build-frontend
      artifacts: true
    - job: unit-tests
      artifacts: false   # only need the dependency, not the test reports
```
Rules Over only/except
Stop using `only` and `except`. GitLab's docs list them as no longer actively developed; `rules` is the recommended replacement.
```yaml
deploy-staging:
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
      when: on_success
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
      when: manual
    - when: never
```
Rules are evaluated top to bottom. First match wins. A job with no matching rule is excluded from the pipeline anyway, so the final `when: never` is technically redundant — end with it regardless to make the default explicit.
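`rules:changes` is worth knowing too: it skips expensive jobs when only unrelated files changed. A sketch (the path globs are illustrative):

```yaml
unit-tests:
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
      changes:
        - "src/**/*"
        - "tests/**/*"
        - requirements.txt
```

A docs-only merge request now creates no `unit-tests` job at all.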
Docker-in-Docker vs. Kaniko
Building Docker images inside GitLab CI is a common bottleneck. Docker-in-Docker (DinD) is the default, but Kaniko is faster and more secure.
Docker-in-Docker (Slow, Requires Privileged)
```yaml
build-image:
  stage: build
  image: docker:24
  services:
    - docker:24-dind
  variables:
    DOCKER_TLS_CERTDIR: "/certs"
  script:
    - docker build -t $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA .
    - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
```
DinD requires privileged: true on the runner. That's a security risk. It also starts a Docker daemon for every job — 10-15 seconds of overhead.
Kaniko (Faster, No Privileges)
```yaml
build-image:
  stage: build
  image:
    name: gcr.io/kaniko-project/executor:v1.22.0-debug
    entrypoint: [""]
  script:
    - /kaniko/executor
      --context $CI_PROJECT_DIR
      --dockerfile $CI_PROJECT_DIR/Dockerfile
      --destination $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
      --cache=true
      --cache-repo=$CI_REGISTRY_IMAGE/cache
```
Kaniko runs in userspace. No Docker daemon. No privileged mode. The --cache=true flag caches layers in your registry, so subsequent builds only rebuild changed layers. This alone saves 2-5 minutes per build.
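Kaniko also accepts `--destination` more than once, so one build can push both an immutable SHA tag and a moving branch tag. A sketch:

```yaml
build-image:
  stage: build
  image:
    name: gcr.io/kaniko-project/executor:v1.22.0-debug
    entrypoint: [""]
  script:
    - /kaniko/executor
      --context $CI_PROJECT_DIR
      --destination $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA       # immutable tag
      --destination $CI_REGISTRY_IMAGE:$CI_COMMIT_REF_SLUG  # branch tag
      --cache=true
```

Deploy jobs pin the SHA tag; developers pull the branch tag.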
Runner Configuration for Speed
Your runner setup matters as much as your .gitlab-ci.yml. Here are the optimizations that make the biggest difference.
Use Local Caches on Dedicated Runners
If you're running your own GitLab runners, local caching outperforms S3/GCS caches:
```toml
# /etc/gitlab-runner/config.toml
[[runners]]
  [runners.cache]
    Type = "local"
    Path = "/opt/gitlab-runner/cache"
    Shared = true
  [runners.docker]
    pull_policy = ["if-not-present"]
    volumes = ["/opt/gitlab-runner/cache:/cache"]
```
`pull_policy = ["if-not-present"]` skips pulling images that already exist on the runner. For a custom base image used by every job, this saves 15-30 seconds per job.
Autoscaling Runners for Peak Hours
```toml
[[runners]]
  [runners.machine]
    IdleCount = 2
    IdleTime = 600
    MaxBuilds = 100
    MachineDriver = "amazonec2"
    MachineName = "gitlab-runner-%s"
    MachineOptions = [
      "amazonec2-instance-type=c5.2xlarge",
      "amazonec2-region=us-east-1",
      "amazonec2-spot-instance=true",
      "amazonec2-spot-price=0.15"
    ]
```
Spot instances for CI runners save 60-70% over on-demand pricing. CI workloads are interruptible by design — a spot termination just retries the job.
Reducing Image Pull Times
Every job starts by pulling a Docker image. For large images, this dominates the pipeline.
```yaml
variables:
  # Use a lightweight base image
  DEFAULT_IMAGE: alpine:3.19   # 7 MB vs. ubuntu:22.04 at 77 MB

# Pre-build custom images with your dependencies baked in
.python-base:
  image: $CI_REGISTRY_IMAGE/ci-python:3.12
  # This image contains: python 3.12, pip, pytest, ruff
  # Built weekly by a scheduled pipeline
```
Bake your CI dependencies into a custom image. Instead of running pip install in every job, pull an image that already has everything installed. The one-time cost of maintaining the image pays for itself across hundreds of pipeline runs.
```yaml
# Scheduled weekly: rebuild CI base images
rebuild-ci-images:
  stage: build
  rules:
    - if: $CI_PIPELINE_SOURCE == "schedule"
  script:
    - docker build -t $CI_REGISTRY_IMAGE/ci-python:3.12 -f ci/Dockerfile.python .
    - docker push $CI_REGISTRY_IMAGE/ci-python:3.12
```
Interruptible Jobs
When a new commit is pushed to the same branch, previous pipeline runs are wasted work. Mark jobs as interruptible:
```yaml
default:
  interruptible: true

deploy-staging:
  stage: deploy
  interruptible: false   # never cancel a deployment mid-way
```
Then enable "Auto-cancel redundant pipelines" in your project settings under Settings > CI/CD > General pipelines. This prevents queue buildup during active development.
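On GitLab 16.8 and later, the same behavior can be declared in the pipeline itself via `workflow:auto_cancel` (check your instance version before relying on it):

```yaml
workflow:
  auto_cancel:
    on_new_commit: interruptible   # cancel only jobs marked interruptible
```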
The Scorecard
| Optimization | Time Saved |
|---|---|
| Caching dependencies | ~4 min |
| DAG with `needs` | ~3 min |
| Parallel test splitting (4x) | ~8 min |
| Artifact reuse | ~2 min |
| Smaller Docker images | ~2 min |
| Kaniko vs. DinD | ~2 min |
| Custom CI base images | ~1 min |
| **Total** | **~22 min** |
From 25 minutes to 3-6. That's not a nice-to-have. That's the difference between developers waiting for CI and developers shipping code.
Common Pitfalls
**Caching `node_modules/` under a static key.** If the cache key doesn't hash your lockfile, you get stale or incompatible dependencies. Either key the cache on `package-lock.json` (as the config at the top of this article does) or cache only the package manager's own cache directory and let `npm ci --prefer-offline` rebuild `node_modules/` from `.cache/npm/`.
**Running all tests on every push.** Feature branch pushes don't need the full E2E suite. Use `rules` to run expensive tests only on merge requests and main:
```yaml
e2e-tests:
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
    - if: $CI_COMMIT_BRANCH == "main"
    - when: never
```
**Not setting artifact expiry.** Default retention fills your storage. Artifacts from feature branch pipelines should expire in 1-3 days. Release artifacts in 30 days.

**Ignoring pipeline metrics.** Without data, you can't prove your optimizations work. Track pipeline duration over time.
One More Thing
Add this to your pipeline to track optimization over time:
```yaml
pipeline-metrics:
  stage: .post
  script:
    # GitLab exposes no predefined duration variable inside jobs, so compute
    # elapsed time from the pipeline's creation timestamp (requires GNU date)
    - DURATION=$(( $(date +%s) - $(date -d "$CI_PIPELINE_CREATED_AT" +%s) ))
    - echo "Pipeline duration ${DURATION}s"
    - 'curl -s -X POST "$METRICS_URL/api/v1/import/prometheus" --data-binary "gitlab_pipeline_duration_seconds{project=\"$CI_PROJECT_NAME\"} $DURATION"'
  when: always
```
If it's not measured, it drifts. Ship it, track it, keep it fast.