
GCP Core Services: The DevOps Engineer's Essential Guide

Aareez Asif · 25 min read

Google Cloud Platform has always been the engineer's cloud. Where AWS leads in breadth and Azure in enterprise integration, GCP wins on developer experience, data analytics, and Kubernetes -- which makes sense given that Kubernetes grew out of Borg, Google's internal cluster manager. If your organization is Kubernetes-first or data-heavy, GCP deserves serious consideration. This guide covers every core service a DevOps engineer needs, with real CLI examples, pricing details, architecture patterns, and the operational nuances that documentation often glosses over.

GCP Project Structure

GCP organizes resources in a hierarchy that looks simple but has important implications for billing, IAM, and policy inheritance. Understanding this hierarchy is foundational to everything else.

The Resource Hierarchy

Organization (company.com)
|-- Folders (optional, for departments or environments)
|   |-- Folder: Production
|   |   |-- Project: prod-web-app
|   |   +-- Project: prod-data-pipeline
|   |-- Folder: Staging
|   |   +-- Project: staging-web-app
|   +-- Folder: Shared
|       +-- Project: shared-networking
+-- Projects (can also exist directly under org)

Projects Are Everything

A project is the fundamental unit in GCP. Every resource belongs to a project. Projects provide:

  • Billing boundary -- each project links to a billing account.
  • IAM boundary -- permissions are granted at the project level (or above/below). Permissions granted at the folder or org level cascade down.
  • API enablement -- you explicitly enable APIs per project. This is a security feature: an unused API cannot be exploited.
  • Resource namespace -- resource names are unique within a project.
  • Quota management -- API quotas and resource limits are per-project.

# Create a new project
gcloud projects create prod-web-app-2026 \
  --name="Production Web App" \
  --folder=123456789 \
  --labels=env=production,team=platform

# Set your active project
gcloud config set project prod-web-app-2026

# Enable required APIs (only enable what you need)
gcloud services enable \
  compute.googleapis.com \
  container.googleapis.com \
  cloudbuild.googleapis.com \
  artifactregistry.googleapis.com \
  run.googleapis.com \
  monitoring.googleapis.com \
  logging.googleapis.com \
  secretmanager.googleapis.com \
  sqladmin.googleapis.com

# List enabled APIs
gcloud services list --enabled --format="table(config.name,config.title)"

Always enable only the APIs you need. Each enabled API is an attack surface and a potential cost vector. GCP's explicit API enablement model is more secure by default than AWS's approach where all services are available immediately.

Organization Policies

Organization policies are the GCP equivalent of AWS SCPs. They define constraints that cascade down the resource hierarchy.

# Restrict VM creation to specific regions
gcloud resource-manager org-policies set-policy \
  --project=prod-web-app-2026 \
  policy.yaml

# Where policy.yaml contains:
# constraint: constraints/compute.restrictLocations
# listPolicy:
#   allowedValues:
#     - us-central1
#     - europe-west1

Cross-Cloud Project Model Comparison

| Concept | GCP | AWS | Azure | Alibaba Cloud |
|---|---|---|---|---|
| Top-level container | Organization | Organization | Entra ID Tenant | Resource Directory |
| Grouping mechanism | Folders | Organizational Units | Management Groups | Folders |
| Billing/isolation unit | Project | Account | Subscription | Account |
| Policy enforcement | Organization Policies | SCPs | Azure Policy | Config Rules |
| Resource grouping | Labels | Tags | Resource Groups | Resource Groups |
| API enablement | Explicit per-project | All available | Resource providers | Explicit activation |

IAM: Identity and Access Management

GCP IAM uses a resource hierarchy model where permissions flow downward. A role granted at the organization level applies to every folder, project, and resource underneath. This inheritance model is powerful but requires careful planning to avoid over-permissioning.

Role Types

| Type | Example | When to Use | Security Level |
|---|---|---|---|
| Basic | roles/viewer, roles/editor, roles/owner | Almost never in production -- too broad | Low |
| Predefined | roles/compute.instanceAdmin, roles/storage.objectViewer | Most common -- scoped to a service | Medium |
| Custom | roles/myCustomRole | When predefined roles grant too much | High |

Basic roles are legacy and should be avoided in production. The roles/editor role, for example, grants write access to almost every GCP service. Use predefined roles instead, and create custom roles when even predefined roles are too permissive.

IAM Policy Bindings

GCP IAM uses a binding model: you bind a member (who) to a role (what) at a scope (where). Multiple bindings form a policy.

# Grant a user viewer access to a project
gcloud projects add-iam-policy-binding prod-web-app-2026 \
  --member="user:engineer@company.com" \
  --role="roles/viewer"

# Grant a group contributor access to a specific bucket
gcloud storage buckets add-iam-policy-binding gs://prod-app-data \
  --member="group:platform-team@company.com" \
  --role="roles/storage.objectAdmin"

# Grant conditional access (only from corporate network)
gcloud projects add-iam-policy-binding prod-web-app-2026 \
  --member="user:engineer@company.com" \
  --role="roles/compute.instanceAdmin.v1" \
  --condition='expression=request.time.getHours("America/New_York") >= 9 && request.time.getHours("America/New_York") <= 17,title=business-hours-only,description=Only allow access during business hours'

# List all IAM bindings for a project
gcloud projects get-iam-policy prod-web-app-2026 --format=yaml
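The conditional binding above uses CEL, evaluated by IAM at request time. Its business-hours expression reduces to a simple hour check; the same logic in Python (illustrative only -- the real evaluation happens inside IAM, not in your code):

```python
from datetime import datetime
from zoneinfo import ZoneInfo

def business_hours_allowed(request_time: datetime) -> bool:
    """Mirrors: request.time.getHours("America/New_York") >= 9 && ... <= 17."""
    local_hour = request_time.astimezone(ZoneInfo("America/New_York")).hour
    return 9 <= local_hour <= 17

# 18:00 UTC on a winter day is 13:00 in New York, so access is allowed
print(business_hours_allowed(datetime(2026, 1, 15, 18, 0, tzinfo=ZoneInfo("UTC"))))
```

Note that the condition is timezone-aware: a request at 03:00 UTC is still "after hours" in New York and would be denied.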

Service Accounts

Service accounts are the identities for workloads -- VMs, Cloud Functions, CI/CD pipelines. They are the equivalent of AWS IAM roles. Each service account is identified by an email address.

# Create a service account
gcloud iam service-accounts create sa-web-backend \
  --display-name="Web Backend Service Account" \
  --project=prod-web-app-2026

# Grant the minimum required roles
gcloud projects add-iam-policy-binding prod-web-app-2026 \
  --member="serviceAccount:sa-web-backend@prod-web-app-2026.iam.gserviceaccount.com" \
  --role="roles/storage.objectViewer"

gcloud projects add-iam-policy-binding prod-web-app-2026 \
  --member="serviceAccount:sa-web-backend@prod-web-app-2026.iam.gserviceaccount.com" \
  --role="roles/cloudsql.client"

gcloud projects add-iam-policy-binding prod-web-app-2026 \
  --member="serviceAccount:sa-web-backend@prod-web-app-2026.iam.gserviceaccount.com" \
  --role="roles/secretmanager.secretAccessor"

Critical rule: Never download service account keys (JSON key files) unless absolutely necessary. Use Workload Identity for GKE, attached service accounts for Compute Engine, and Workload Identity Federation for external CI/CD systems. Every downloaded key is a credential that can be leaked.

Workload Identity Federation

For CI/CD pipelines running outside GCP (GitHub Actions, GitLab CI, Jenkins), use Workload Identity Federation instead of service account keys. This eliminates the need to manage and rotate JSON key files.

# Create a Workload Identity Pool
gcloud iam workload-identity-pools create github-pool \
  --location="global" \
  --display-name="GitHub Actions Pool" \
  --description="Identity pool for GitHub Actions CI/CD"

# Create a provider for GitHub
gcloud iam workload-identity-pools providers create-oidc github-provider \
  --location="global" \
  --workload-identity-pool=github-pool \
  --display-name="GitHub Provider" \
  --attribute-mapping="google.subject=assertion.sub,attribute.repository=assertion.repository,attribute.actor=assertion.actor" \
  --attribute-condition="assertion.repository_owner == 'myorg'" \
  --issuer-uri="https://token.actions.githubusercontent.com"

# Allow the service account to be impersonated
gcloud iam service-accounts add-iam-policy-binding \
  sa-web-backend@prod-web-app-2026.iam.gserviceaccount.com \
  --role="roles/iam.workloadIdentityUser" \
  --member="principalSet://iam.googleapis.com/projects/PROJECT_NUMBER/locations/global/workloadIdentityPools/github-pool/attribute.repository/myorg/myrepo"

This is a significantly better security model than downloading JSON keys and storing them as CI/CD secrets. The token exchange happens automatically, credentials are short-lived, and there are no static secrets to rotate.
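On the GitHub Actions side, the exchange takes only a few workflow lines. A sketch using the google-github-actions/auth action (the pool, provider, and service account names are the hypothetical ones from above):

```yaml
# .github/workflows/deploy.yml (sketch)
permissions:
  id-token: write   # required so the job can mint an OIDC token
  contents: read

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: google-github-actions/auth@v2
        with:
          workload_identity_provider: projects/PROJECT_NUMBER/locations/global/workloadIdentityPools/github-pool/providers/github-provider
          service_account: sa-web-backend@prod-web-app-2026.iam.gserviceaccount.com
      - run: gcloud storage ls   # later steps use the federated credentials
```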

IAM Recommender

GCP provides an IAM Recommender that analyzes actual usage patterns and suggests tighter roles. This is one of the best tools for enforcing least privilege over time.

# List IAM recommendations for a project
gcloud recommender recommendations list \
  --project=prod-web-app-2026 \
  --location=global \
  --recommender=google.iam.policy.Recommender \
  --format="table(content.operationGroups[0].operations[0].pathFilters)"

Compute Engine

Compute Engine is GCP's VM service. It is fast, flexible, and has one feature that sets it apart from AWS and Azure: live migration. Google can move your running VM to another physical host for maintenance without any downtime. This means fewer maintenance windows and higher effective availability.

Machine Types

| Family | Use Case | Example | On-Demand (us-central1) |
|---|---|---|---|
| e2 | Cost-optimized, general purpose | e2-medium (2 vCPU, 4 GB) | ~$0.034/hr |
| n2/n2d | Balanced, production workloads | n2-standard-4 (4 vCPU, 16 GB) | ~$0.194/hr |
| n4 | Latest gen, best price-performance | n4-standard-4 (4 vCPU, 16 GB) | ~$0.170/hr |
| c2/c2d | Compute optimized | c2-standard-8 (8 vCPU, 32 GB) | ~$0.334/hr |
| c3 | Latest compute optimized | c3-standard-8 (8 vCPU, 32 GB) | ~$0.320/hr |
| m2 | Memory optimized | m2-ultramem-208 (208 vCPU, 5.75 TB) | ~$42.18/hr |
| t2a | Arm-based (Ampere), cost savings | t2a-standard-4 (4 vCPU, 16 GB) | ~$0.153/hr |
| t2d | AMD-based, balanced | t2d-standard-4 (4 vCPU, 16 GB) | ~$0.156/hr |
| a2/g2 | GPU workloads | g2-standard-4 (4 vCPU, 16 GB, 1 GPU) | ~$0.73/hr |

Custom Machine Types

One of GCP's unique features is custom machine types. You specify exact vCPU and memory combinations, so you never overpay for resources you do not use.

# Create a VM with a custom machine type (4 vCPUs, 8 GB RAM)
gcloud compute instances create web-server-01 \
  --zone=us-central1-a \
  --machine-type=e2-custom-4-8192 \
  --image-family=ubuntu-2204-lts \
  --image-project=ubuntu-os-cloud \
  --boot-disk-size=50GB \
  --boot-disk-type=pd-ssd \
  --network=vpc-production \
  --subnet=subnet-app \
  --service-account=sa-web-backend@prod-web-app-2026.iam.gserviceaccount.com \
  --scopes=cloud-platform \
  --tags=http-server,https-server \
  --labels=env=production,team=platform \
  --metadata-from-file=startup-script=bootstrap.sh \
  --shielded-secure-boot \
  --shielded-vtpm
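The machine-type string follows the pattern family-custom-vCPUs-memoryMB, so e2-custom-4-8192 means 4 vCPUs and 8192 MB of RAM. A throwaway helper (not an official API) that builds the string and checks the 256 MB memory granularity GCP requires for custom machine types:

```python
def custom_machine_type(family: str, vcpus: int, memory_mb: int) -> str:
    """Build a custom machine type name, e.g. e2-custom-4-8192.

    Memory must be a multiple of 256 MB (a GCP requirement);
    per-family vCPU and memory-per-vCPU bounds also apply.
    """
    if memory_mb % 256 != 0:
        raise ValueError("memory must be a multiple of 256 MB")
    return f"{family}-custom-{vcpus}-{memory_mb}"

print(custom_machine_type("e2", 4, 8192))  # e2-custom-4-8192
```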

Pricing Model: Sustained Use Discounts

GCP automatically applies sustained use discounts (SUDs) to VMs that run for more than 25% of a month. No commitment required -- it happens automatically. The discount deepens the longer the instance runs, reaching roughly 30% off for instances running the full month. This is unique to GCP; AWS and Azure require upfront commitments for equivalent savings. One caveat: SUDs do not apply to e2 machine types, which are priced lower up front instead.

| Usage Level | Incremental Discount |
|---|---|
| 0-25% of month | 0% (full price) |
| 25-50% | ~20% off |
| 50-75% | ~40% off |
| 75-100% | ~60% off |
| Full month effective | ~30% average savings |

Committed Use Discounts (CUDs)

For predictable workloads, CUDs offer 1-year (37% savings) or 3-year (55% savings) commitments. Unlike AWS Reserved Instances, GCP CUDs are applied at the project level and can cover any machine type within the committed resource class.

# Purchase a committed use discount
gcloud compute commitments create my-commitment \
  --region=us-central1 \
  --plan=twelve-month \
  --resources=vcpu=100,memory=400GB \
  --type=GENERAL_PURPOSE
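To compare the options in dollars for an n2-standard-4 running 24/7 (rates taken from the tables above; rough arithmetic that ignores disks, networking, and licensing):

```python
hours_per_month = 730      # average hours in a month
base_rate = 0.194          # n2-standard-4 on-demand, $/hr

scenarios = {
    "on-demand + SUD (~30% off for a full month)": 0.70,
    "1-year CUD (37% off)": 0.63,
    "3-year CUD (55% off)": 0.45,
}

print(f"list price: ${hours_per_month * base_rate:.2f}/mo")
for name, multiplier in scenarios.items():
    print(f"{name}: ${hours_per_month * base_rate * multiplier:.2f}/mo")
```

The gap between the automatic SUD and a 1-year CUD is modest; commit only when usage is genuinely predictable.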

Preemptible and Spot VMs

Spot VMs (formerly preemptible VMs) cost 60-91% less than regular VMs but can be terminated with 30 seconds notice. Perfect for CI/CD build agents, batch processing, non-critical data processing, and GKE node pools for fault-tolerant workloads.

gcloud compute instances create build-agent-01 \
  --machine-type=c2-standard-8 \
  --provisioning-model=SPOT \
  --instance-termination-action=DELETE \
  --zone=us-central1-a \
  --metadata=shutdown-script='#!/bin/bash
    # Gracefully drain work before termination
    curl -X POST http://localhost:8080/drain'

Managed Instance Groups (MIGs)

MIGs are the auto-scaling mechanism for Compute Engine, equivalent to AWS ASGs:

# Create an instance template
gcloud compute instance-templates create web-template-v2 \
  --machine-type=n2-standard-2 \
  --image-family=ubuntu-2204-lts \
  --image-project=ubuntu-os-cloud \
  --boot-disk-size=50 \
  --boot-disk-type=pd-ssd \
  --service-account=sa-web-backend@prod-web-app-2026.iam.gserviceaccount.com \
  --tags=http-server \
  --metadata-from-file=startup-script=bootstrap.sh \
  --network=vpc-production \
  --subnet=subnet-app \
  --region=us-central1

# Create a regional MIG with autoscaling
gcloud compute instance-groups managed create web-mig \
  --template=web-template-v2 \
  --size=2 \
  --region=us-central1 \
  --target-distribution-shape=EVEN \
  --health-check=http-health-check \
  --initial-delay=120

gcloud compute instance-groups managed set-autoscaling web-mig \
  --region=us-central1 \
  --min-num-replicas=2 \
  --max-num-replicas=10 \
  --target-cpu-utilization=0.70 \
  --cool-down-period=120 \
  --scale-in-control max-scaled-in-replicas=2,time-window=300

# Rolling update to a new template
gcloud compute instance-groups managed rolling-action start-update web-mig \
  --version=template=web-template-v3 \
  --region=us-central1 \
  --max-surge=3 \
  --max-unavailable=0

Persistent Disk Types

| Disk Type | Max IOPS (read) | Max Throughput | Use Case | Cost (per GB/mo) |
|---|---|---|---|---|
| pd-standard | 7,500 | 400 MB/s | Bulk storage, backups | ~$0.040 |
| pd-balanced | 80,000 | 1,200 MB/s | Most workloads | ~$0.100 |
| pd-ssd | 100,000 | 1,200 MB/s | Databases, latency-sensitive | ~$0.170 |
| pd-extreme | 120,000 | 2,400 MB/s | Top-tier databases | ~$0.125 + IOPS |
| Hyperdisk Balanced | 160,000 | 2,400 MB/s | Next-gen balanced | ~$0.060 + IOPS + throughput |
| Local SSD | 900,000 | 9,360 MB/s | Ephemeral high-perf | ~$0.080 |

VPC Networking

GCP VPCs are global by default -- a single VPC spans all regions. Subnets are regional. This is a fundamental architectural difference from AWS and Azure where VPCs/VNets are regional. A single GCP VPC can contain subnets in us-central1, europe-west1, and asia-east1, and they can all communicate privately without peering.

Network Architecture

VPC: vpc-production (global)
|-- subnet-web-us    (10.0.1.0/24)  -- us-central1
|-- subnet-app-us    (10.0.2.0/24)  -- us-central1
|-- subnet-data-us   (10.0.3.0/24)  -- us-central1
|-- subnet-web-eu    (10.10.1.0/24) -- europe-west1
|-- subnet-app-eu    (10.10.2.0/24) -- europe-west1
|-- subnet-gke-us    (10.0.16.0/20) -- us-central1
|   |-- Pod CIDR:     10.100.0.0/14  (secondary range)
|   +-- Service CIDR: 10.200.0.0/20  (secondary range)
+-- subnet-gke-eu    (10.10.16.0/20) -- europe-west1

Firewall Rules

GCP uses VPC-level firewall rules (not per-subnet like AWS/Azure). Rules are applied using network tags or service accounts as targets. This model is more flexible -- you can apply the same firewall rule to VMs across different subnets and regions within the same VPC.

# Allow HTTP/HTTPS to instances tagged 'http-server'
gcloud compute firewall-rules create allow-http-https \
  --network=vpc-production \
  --direction=INGRESS \
  --action=ALLOW \
  --rules=tcp:80,tcp:443 \
  --source-ranges=0.0.0.0/0 \
  --target-tags=http-server \
  --priority=1000 \
  --description="Allow HTTP and HTTPS from internet to web servers"

# Allow internal communication between app and data tiers
gcloud compute firewall-rules create allow-app-to-data \
  --network=vpc-production \
  --direction=INGRESS \
  --action=ALLOW \
  --rules=tcp:5432,tcp:6379,tcp:3306 \
  --source-tags=app-server \
  --target-tags=data-server \
  --priority=1000

# Service-account-based rules (more secure than tags)
gcloud compute firewall-rules create allow-backend-to-db \
  --network=vpc-production \
  --direction=INGRESS \
  --action=ALLOW \
  --rules=tcp:5432 \
  --source-service-accounts=sa-web-backend@prod-web-app-2026.iam.gserviceaccount.com \
  --target-service-accounts=sa-database@prod-web-app-2026.iam.gserviceaccount.com \
  --priority=900

# Deny all other ingress (implicit, but making it explicit)
gcloud compute firewall-rules create deny-all-ingress \
  --network=vpc-production \
  --direction=INGRESS \
  --action=DENY \
  --rules=all \
  --source-ranges=0.0.0.0/0 \
  --priority=65534

# Enable firewall logging for troubleshooting
gcloud compute firewall-rules update allow-http-https \
  --enable-logging \
  --logging-metadata=INCLUDE_ALL_METADATA

Service-account-based firewall rules are more secure than tag-based rules because tags can be modified by anyone with compute.instances.setTags permission, while service accounts require iam.serviceAccounts.actAs permission.
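One more detail worth internalizing from the rules above: evaluation order is by priority (lower number wins), and the first matching rule's action is final. A toy model of that logic (the fields here are illustrative, not a GCP API):

```python
# Each rule: (priority, action, port match). Lower priority number = checked first.
rules = [
    (65534, "DENY", "all"),   # deny-all-ingress
    (1000, "ALLOW", 443),     # allow-http-https (simplified to one port)
    (900, "ALLOW", 5432),     # allow-backend-to-db
]

def evaluate(port: int) -> str:
    """Return the action of the first matching rule in priority order."""
    for priority, action, match in sorted(rules):
        if match == "all" or match == port:
            return action
    return "DENY"  # VPC implicit default for ingress

print(evaluate(443))   # ALLOW
print(evaluate(22))    # DENY -- falls through to deny-all at priority 65534
```

This is why the explicit deny-all rule at priority 65534 is safe: every more specific allow rule has a lower number and is evaluated first.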

Cloud NAT and Private Google Access

For private instances that need outbound internet access:

# Create a Cloud Router (required for Cloud NAT)
gcloud compute routers create router-production \
  --region=us-central1 \
  --network=vpc-production

# Create Cloud NAT
gcloud compute routers nats create nat-production \
  --router=router-production \
  --region=us-central1 \
  --nat-all-subnet-ip-ranges \
  --auto-allocate-nat-external-ips \
  --min-ports-per-vm=256 \
  --enable-logging

Enable Private Google Access on subnets to let VMs without external IPs reach Google APIs and services:

gcloud compute networks subnets update subnet-app-us \
  --region=us-central1 \
  --enable-private-ip-google-access

Shared VPC

For multi-project environments, Shared VPC lets you define the network in a host project and share subnets with service projects. This centralizes network management while allowing individual teams to manage their own resources.

# Enable shared VPC on the host project
gcloud compute shared-vpc enable shared-networking

# Associate a service project
gcloud compute shared-vpc associated-projects add prod-web-app-2026 \
  --host-project=shared-networking

Load Balancing

GCP offers a comprehensive load balancing portfolio:

| Type | Scope | Layer | Use Case |
|---|---|---|---|
| External HTTP(S) LB | Global | L7 | Web apps, APIs, CDN integration |
| External TCP/UDP Network LB | Regional | L4 | Non-HTTP traffic, gaming |
| Internal HTTP(S) LB | Regional | L7 | Internal microservices |
| Internal TCP/UDP LB | Regional | L4 | Internal databases, gRPC |
| Cross-region Internal LB | Global | L7 | Multi-region internal services |

The Global HTTP(S) Load Balancer is one of GCP's strongest offerings. It provides a single anycast IP that routes traffic to the nearest healthy backend worldwide, with automatic SSL termination, Cloud CDN integration, and Cloud Armor (WAF/DDoS) built in.

Cloud Storage

Cloud Storage is GCP's object storage service, equivalent to S3. It uses a flat namespace with buckets and objects. Buckets can be regional, dual-region, or multi-region.

Storage Classes

| Class | Min Storage Duration | Use Case | Monthly Cost (per GB) |
|---|---|---|---|
| Standard | None | Frequently accessed | ~$0.020 (regional) |
| Nearline | 30 days | Monthly access | ~$0.010 |
| Coldline | 90 days | Quarterly access | ~$0.004 |
| Archive | 365 days | Annual access or less | ~$0.0012 |

GCP charges for early deletion: if you store data in Nearline and delete it before 30 days, you pay for the full 30 days. Plan your storage class based on actual access patterns.
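For example, deleting a Nearline object after 10 days still bills the remaining 20 days. A quick sketch of the math (Nearline rate from the table above; simplified to 30-day months):

```python
def nearline_cost(gb: float, days_stored: int, rate_per_gb_month: float = 0.010) -> float:
    """Storage cost for Nearline data, applying the 30-day minimum duration."""
    billed_days = max(days_stored, 30)  # early deletion still bills 30 days
    return gb * rate_per_gb_month * billed_days / 30

print(f"${nearline_cost(1000, 10):.2f}")  # 1 TB deleted after 10 days: billed as 30 days
```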

Bucket Location Types

| Location Type | Redundancy | Latency | Use Case |
|---|---|---|---|
| Regional | Single region, multiple zones | Lowest within region | Application data, compute co-location |
| Dual-region | Two specific regions | Low in both regions | DR between known regions |
| Multi-region | Three+ regions in a continent | Higher | Globally accessed content |

# Create a bucket with lifecycle rules
gcloud storage buckets create gs://prod-app-logs-2026 \
  --location=us-central1 \
  --default-storage-class=STANDARD \
  --uniform-bucket-level-access \
  --public-access-prevention=enforced \
  --soft-delete-duration=7d

# Set lifecycle policy
cat > lifecycle.json << 'EOF'
{
  "rule": [
    {
      "action": { "type": "SetStorageClass", "storageClass": "NEARLINE" },
      "condition": { "age": 30, "matchesStorageClass": ["STANDARD"] }
    },
    {
      "action": { "type": "SetStorageClass", "storageClass": "COLDLINE" },
      "condition": { "age": 90, "matchesStorageClass": ["NEARLINE"] }
    },
    {
      "action": { "type": "SetStorageClass", "storageClass": "ARCHIVE" },
      "condition": { "age": 365, "matchesStorageClass": ["COLDLINE"] }
    },
    {
      "action": { "type": "Delete" },
      "condition": { "age": 2555 }
    }
  ]
}
EOF

gcloud storage buckets update gs://prod-app-logs-2026 \
  --lifecycle-file=lifecycle.json
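The policy above walks an object down the classes as it ages. Its resulting schedule, sketched as a function (a simplification -- real lifecycle rules are evaluated roughly daily against each object's creation time):

```python
def storage_class_at(age_days: int) -> str:
    """Which class the lifecycle policy above leaves an object in at a given age."""
    if age_days >= 2555:   # ~7 years: delete
        return "DELETED"
    if age_days >= 365:
        return "ARCHIVE"
    if age_days >= 90:
        return "COLDLINE"
    if age_days >= 30:
        return "NEARLINE"
    return "STANDARD"

for age in (7, 45, 180, 400, 3000):
    print(age, storage_class_at(age))
```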

# Enable versioning for state files
gcloud storage buckets update gs://prod-terraform-state \
  --versioning

# Set and lock a retention policy for compliance (Bucket Lock)
gcloud storage buckets update gs://prod-audit-logs \
  --retention-period=365d \
  --locked-retention-period

Always enable uniform bucket-level access to simplify permissions. It prevents the confusing mix of IAM and ACLs that plagues older buckets. Enable public access prevention unless you explicitly need public access.

gsutil vs gcloud storage

Google is transitioning from gsutil to gcloud storage for Cloud Storage operations. Use gcloud storage for new work:

# Copy files (gcloud storage)
gcloud storage cp ./dist/* gs://prod-app-data/assets/ --recursive

# Sync a directory
gcloud storage rsync ./build/ gs://prod-app-data/static/ --recursive --delete-unmatched-destination-objects

# Copy a large file (gcloud storage parallelizes large uploads automatically;
# --no-clobber skips objects that already exist at the destination)
gcloud storage cp large-backup.tar.gz gs://prod-backups/ --no-clobber

Cloud SQL

Cloud SQL is GCP's managed relational database service, supporting MySQL, PostgreSQL, and SQL Server. It handles patching, backups, replication, and failover.

# Create a PostgreSQL instance
gcloud sql instances create prod-postgres \
  --database-version=POSTGRES_15 \
  --tier=db-custom-4-16384 \
  --region=us-central1 \
  --availability-type=REGIONAL \
  --storage-type=SSD \
  --storage-size=100 \
  --storage-auto-increase \
  --backup-start-time=03:00 \
  --enable-point-in-time-recovery \
  --retained-backups-count=14 \
  --maintenance-window-day=MON \
  --maintenance-window-hour=4 \
  --insights-config-query-insights-enabled \
  --root-password="$(gcloud secrets versions access latest --secret=db-root-password)" \
  --network=vpc-production \
  --no-assign-ip \
  --labels=env=production,team=platform

# Create a read replica
gcloud sql instances create prod-postgres-replica \
  --master-instance-name=prod-postgres \
  --database-version=POSTGRES_15 \
  --tier=db-custom-4-16384 \
  --region=us-central1

# Connect via Cloud SQL Auth Proxy (recommended for applications)
cloud-sql-proxy \
  --auto-iam-authn \
  prod-web-app-2026:us-central1:prod-postgres

Cloud SQL vs AlloyDB vs Cloud Spanner

| Feature | Cloud SQL | AlloyDB | Cloud Spanner |
|---|---|---|---|
| Engine | MySQL, PostgreSQL, SQL Server | PostgreSQL-compatible | Proprietary |
| Scaling | Vertical (up to 96 vCPU) | Vertical + read pools | Horizontal, global |
| Max storage | 64 TB | 128 TB | Unlimited |
| Global distribution | Read replicas | Regional | Multi-region, strong consistency |
| Price (entry) | ~$50/mo (db-f1-micro) | ~$500/mo | ~$657/mo (1 node) |
| Best for | Standard RDBMS workloads | High-perf PostgreSQL | Global, financial-grade |

GKE: Google Kubernetes Engine

GKE is where GCP truly shines. As the birthplace of Kubernetes, Google offers the most mature managed Kubernetes service with features that other clouds are still catching up on.

Autopilot vs Standard

| Feature | Autopilot | Standard |
|---|---|---|
| Node management | Google manages | You manage |
| Pod-level billing | Yes | No (pay for nodes) |
| Node configuration | Limited | Full control |
| GPU support | Yes (with reservations) | Full |
| Security hardening | Automatic (workload isolation, CIS benchmarks) | Manual |
| Control plane cost | $0.10/cluster/hr (~$73/mo); free tier covers one cluster | Same fee as Autopilot |
| Resource efficiency | Optimized by Google | You optimize |
| Best for | Most workloads | Specialized needs, custom kernels |

Autopilot is the recommended choice for most teams. You define pods, GKE handles node provisioning, scaling, security hardening, and OS patching.

# Create an Autopilot cluster (recommended for most teams)
gcloud container clusters create-auto gke-production \
  --region=us-central1 \
  --release-channel=regular \
  --network=vpc-production \
  --subnetwork=subnet-gke-us \
  --cluster-secondary-range-name=pod-range \
  --services-secondary-range-name=service-range \
  --enable-master-authorized-networks \
  --master-authorized-networks=10.0.0.0/8 \
  --workload-pool=prod-web-app-2026.svc.id.goog \
  --enable-fleet

# Create a Standard cluster (when you need more control)
gcloud container clusters create gke-standard \
  --region=us-central1 \
  --num-nodes=1 \
  --machine-type=n2-standard-4 \
  --enable-autoscaling --min-nodes=1 --max-nodes=5 \
  --network=vpc-production \
  --subnetwork=subnet-gke-us \
  --cluster-secondary-range-name=pod-range \
  --services-secondary-range-name=service-range \
  --enable-ip-alias \
  --release-channel=regular \
  --workload-pool=prod-web-app-2026.svc.id.goog \
  --enable-network-policy \
  --enable-dataplane-v2 \
  --enable-shielded-nodes \
  --enable-autorepair \
  --enable-autoupgrade \
  --maintenance-window-start "2026-01-01T04:00:00Z" \
  --maintenance-window-end "2026-01-01T08:00:00Z" \
  --maintenance-window-recurrence "FREQ=WEEKLY;BYDAY=SA,SU"

# Get credentials
gcloud container clusters get-credentials gke-production --region=us-central1

# Add a Spot node pool for batch workloads
gcloud container node-pools create spot-pool \
  --cluster=gke-standard \
  --region=us-central1 \
  --machine-type=n2-standard-4 \
  --spot \
  --enable-autoscaling --min-nodes=0 --max-nodes=10 \
  --node-taints=cloud.google.com/gke-spot=true:NoSchedule \
  --node-labels=workload-type=batch

GKE Workload Identity

Workload Identity is the recommended way for pods to authenticate to GCP services. It maps Kubernetes service accounts to GCP service accounts, eliminating the need for service account keys.

# Create a Kubernetes service account
kubectl create serviceaccount app-backend --namespace=production

# Bind KSA to GSA
gcloud iam service-accounts add-iam-policy-binding \
  sa-web-backend@prod-web-app-2026.iam.gserviceaccount.com \
  --role=roles/iam.workloadIdentityUser \
  --member="serviceAccount:prod-web-app-2026.svc.id.goog[production/app-backend]"

# Annotate the KSA
kubectl annotate serviceaccount app-backend \
  --namespace=production \
  iam.gke.io/gcp-service-account=sa-web-backend@prod-web-app-2026.iam.gserviceaccount.com

GKE Cost Optimization

  • Use Autopilot to avoid paying for idle node capacity.
  • Use Spot VMs for fault-tolerant workloads (60-91% cheaper).
  • Enable Vertical Pod Autoscaler to right-size resource requests.
  • Use node auto-provisioning to let GKE choose optimal machine types.
  • Set resource requests and limits on every pod. Pods without requests waste capacity.
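That last point in practice -- every container should declare what it needs and what it may burst to (the values below are placeholders; size them for your workload):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-backend
spec:
  containers:
    - name: app
      image: us-central1-docker.pkg.dev/prod-web-app-2026/app-images/webapp:v1.2.3
      resources:
        requests:           # what the scheduler reserves (and what Autopilot bills)
          cpu: "250m"
          memory: "256Mi"
        limits:
          cpu: "500m"
          memory: "512Mi"
```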

Cloud Build

Cloud Build is GCP's serverless CI/CD platform. It runs build steps as containers, making it highly flexible. You can use any Docker image as a build step, which means you can use any tool in your pipeline.

Pricing

Cloud Build includes 120 free build-minutes per day on the default machine type. Beyond that, you pay per build-minute, with rates scaling by machine type:

| Machine Type | vCPUs | RAM | Cost per minute |
|---|---|---|---|
| e2-medium | 1 | 4 GB | $0.003 |
| e2-highcpu-8 | 8 | 8 GB | $0.016 |
| e2-highcpu-32 | 32 | 32 GB | $0.064 |

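A quick estimate of what a busy day of builds costs (rates from the table above; this sketch assumes the free tier applies only to the default machine type):

```python
def daily_build_cost(minutes: int, rate_per_min: float, free_minutes: int = 0) -> float:
    """Cost of one day's build minutes at a given machine-type rate."""
    return max(minutes - free_minutes, 0) * rate_per_min

# 50 builds x 8 minutes each on e2-highcpu-8
print(f"${daily_build_cost(400, 0.016):.2f}/day")

# 100 minutes on the default machine stays inside the 120-minute free tier
print(f"${daily_build_cost(100, 0.003, free_minutes=120):.2f}/day")
```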
# cloudbuild.yaml with multi-stage pipeline
steps:
  # Run tests
  - name: 'node:20'
    entrypoint: 'bash'
    args: ['-c', 'npm ci && npm run lint && npm test']

  # Build container image
  - name: 'gcr.io/cloud-builders/docker'
    args:
      - 'build'
      - '-t'
      - 'us-central1-docker.pkg.dev/$PROJECT_ID/app-images/webapp:$SHORT_SHA'
      - '-t'
      - 'us-central1-docker.pkg.dev/$PROJECT_ID/app-images/webapp:latest'
      - '--cache-from'
      - 'us-central1-docker.pkg.dev/$PROJECT_ID/app-images/webapp:latest'
      - '.'

  # Push to Artifact Registry
  - name: 'gcr.io/cloud-builders/docker'
    args:
      - 'push'
      - '--all-tags'
      - 'us-central1-docker.pkg.dev/$PROJECT_ID/app-images/webapp'

  # Deploy to Cloud Run
  - name: 'gcr.io/cloud-builders/gcloud'
    args:
      - 'run'
      - 'deploy'
      - 'webapp'
      - '--image=us-central1-docker.pkg.dev/$PROJECT_ID/app-images/webapp:$SHORT_SHA'
      - '--region=us-central1'
      - '--platform=managed'
      - '--no-traffic'
    id: deploy-canary

  # Run smoke tests
  - name: 'curlimages/curl'
    entrypoint: 'sh'
    args:
      - '-c'
      - 'curl -sf "$(gcloud run services describe webapp --region=us-central1 --format="value(status.url)")/health" || exit 1'
    waitFor: ['deploy-canary']

  # Shift traffic to new revision
  - name: 'gcr.io/cloud-builders/gcloud'
    args:
      - 'run'
      - 'services'
      - 'update-traffic'
      - 'webapp'
      - '--region=us-central1'
      - '--to-latest'

options:
  machineType: 'E2_HIGHCPU_8'
  logging: CLOUD_LOGGING_ONLY

images:
  - 'us-central1-docker.pkg.dev/$PROJECT_ID/app-images/webapp:$SHORT_SHA'
# Submit a build manually
gcloud builds submit --config=cloudbuild.yaml .

# Set up a trigger for GitHub pushes
gcloud builds triggers create github \
  --name=deploy-on-push \
  --repo-owner=myorg \
  --repo-name=webapp \
  --branch-pattern="^main$" \
  --build-config=cloudbuild.yaml \
  --include-logs-with-status

Cloud Run

Cloud Run is GCP's serverless container platform. It takes a container image and runs it with automatic scaling, including scale-to-zero. It is simpler than Kubernetes and cheaper for many workloads. If your service does not need the complexity of GKE, Cloud Run should be your first choice.

Pricing

Cloud Run uses a pay-per-use model:

| Resource | Cost | Free Tier |
|---|---|---|
| CPU | $0.00002400 per vCPU-second | 180,000 vCPU-seconds/month |
| Memory | $0.00000250 per GiB-second | 360,000 GiB-seconds/month |
| Requests | $0.40 per million | 2 million requests/month |

A service handling 1 million requests/month with 100ms average response time at 1 vCPU and 512MB RAM costs approximately $1-3/month. Compare that to running an always-on VM or EKS cluster.
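Checking that claim against the rate card above (computed before the free tier, which would in fact cover this workload entirely; assumes CPU is billed only during request processing):

```python
requests = 1_000_000            # per month
seconds_per_request = 0.1       # 100 ms average response time
vcpu, gib = 1, 0.5              # 1 vCPU, 512 MB

cpu_cost = requests * seconds_per_request * vcpu * 0.0000240
mem_cost = requests * seconds_per_request * gib * 0.0000025
req_cost = requests / 1_000_000 * 0.40

total = cpu_cost + mem_cost + req_cost
print(f"~${total:.2f}/month before free tier")
```

Note the deploy example below sets --min-instances=1, which keeps one instance warm around the clock and adds idle cost; the scale-to-zero figure here assumes purely request-driven billing.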

# Deploy a container to Cloud Run
gcloud run deploy webapp \
  --image=us-central1-docker.pkg.dev/prod-web-app-2026/app-images/webapp:v1.2.3 \
  --region=us-central1 \
  --platform=managed \
  --port=8080 \
  --memory=512Mi \
  --cpu=1 \
  --min-instances=1 \
  --max-instances=100 \
  --concurrency=80 \
  --timeout=60s \
  --cpu-throttling \
  --service-account=sa-web-backend@prod-web-app-2026.iam.gserviceaccount.com \
  --set-env-vars="NODE_ENV=production,LOG_LEVEL=info" \
  --set-secrets="DB_PASSWORD=db-password:latest" \
  --vpc-connector=connector-production \
  --vpc-egress=private-ranges-only \
  --ingress=all \
  --allow-unauthenticated

# Traffic splitting for canary deployments
gcloud run services update-traffic webapp \
  --region=us-central1 \
  --to-revisions=webapp-00005-abc=90,webapp-00006-def=10

# Map a custom domain
gcloud run domain-mappings create \
  --service=webapp \
  --domain=app.example.com \
  --region=us-central1

Cloud Run vs Cloud Functions vs GKE

| Feature | Cloud Run | Cloud Functions | GKE |
|---|---|---|---|
| Unit of deployment | Container | Function code | Pods (containers) |
| Scaling | 0 to 1000 instances | 0 to 3000 instances | Node-level |
| Max request timeout | 60 min | 9 min (1st gen), 60 min (2nd gen) | Unlimited |
| Minimum instances | 0 (scale to zero) | 0 | 1 node |
| Concurrency | Up to 1000 requests/instance | 1 (1st gen), configurable (2nd gen) | Unlimited |
| Pricing | Per-second CPU/memory | Per-invocation + duration | Per-node (always-on) |
| Complexity | Low | Lowest | High |
| Best for | APIs, web apps | Event handlers, webhooks | Complex microservices, stateful |

Artifact Registry

Artifact Registry replaces Container Registry (gcr.io). It supports Docker images, npm, Maven, Python, Go, Apt, and Yum packages.

# Create a Docker repository
gcloud artifacts repositories create app-images \
  --repository-format=docker \
  --location=us-central1 \
  --description="Production application images" \
  --immutable-tags

# Create an npm repository
gcloud artifacts repositories create npm-packages \
  --repository-format=npm \
  --location=us-central1 \
  --description="Internal npm packages"

# Configure Docker to authenticate
gcloud auth configure-docker us-central1-docker.pkg.dev

# Push an image
docker tag webapp:latest us-central1-docker.pkg.dev/prod-web-app-2026/app-images/webapp:v1.2.3
docker push us-central1-docker.pkg.dev/prod-web-app-2026/app-images/webapp:v1.2.3

# View vulnerability scan results (scanning is enabled via the Container Scanning API)
gcloud artifacts docker images list \
  us-central1-docker.pkg.dev/prod-web-app-2026/app-images \
  --show-occurrences \
  --format="table(package,version,createTime)"

# Clean up old images
gcloud artifacts docker images delete \
  us-central1-docker.pkg.dev/prod-web-app-2026/app-images/webapp:old-tag \
  --delete-tags --quiet
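Artifact Registry image references follow a fixed `LOCATION-docker.pkg.dev/PROJECT/REPOSITORY/IMAGE:TAG` shape. A tiny helper (hypothetical, for CI scripts) keeps that path from being mistyped across tag, push, and deploy steps:

```python
def image_uri(location: str, project: str, repo: str,
              image: str, tag: str = "latest") -> str:
    """Compose an Artifact Registry Docker image URI:
    LOCATION-docker.pkg.dev/PROJECT/REPOSITORY/IMAGE:TAG"""
    return f"{location}-docker.pkg.dev/{project}/{repo}/{image}:{tag}"

# Matches the URI used in the docker tag/push commands above
uri = image_uri("us-central1", "prod-web-app-2026",
                "app-images", "webapp", "v1.2.3")
print(uri)
```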

Secret Manager

Secret Manager stores API keys, passwords, certificates, and other sensitive data with automatic versioning and IAM-based access control.

# Create a secret
echo -n "my-db-password" | gcloud secrets create db-password --data-file=-

# Access the latest version
gcloud secrets versions access latest --secret=db-password

# Add a new version
echo -n "new-password" | gcloud secrets versions add db-password --data-file=-

# Configure rotation reminders (Secret Manager publishes to the Pub/Sub
# topic on schedule; actually rotating the secret is up to your subscriber)
gcloud secrets update db-password \
  --topics=projects/prod-web-app-2026/topics/secret-rotation \
  --next-rotation-time="2026-06-24T00:00:00Z" \
  --rotation-period=7776000s  # 90 days
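`--rotation-period` takes a duration in seconds, which makes values like `7776000s` hard to read at a glance. The conversion is simple arithmetic; this sketch shows that 7776000 s is exactly 90 days:

```python
from datetime import timedelta

def rotation_period(days: int) -> str:
    """Convert a rotation period in days to the seconds string
    expected by `gcloud secrets update --rotation-period`."""
    return f"{int(timedelta(days=days).total_seconds())}s"

# The 7776000s in the command above is a 90-day rotation period
print(rotation_period(90))  # 7776000s
```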

gcloud CLI Essentials

The gcloud CLI is well-structured and consistent. Master these patterns:

# Authentication
gcloud auth login                          # Interactive login
gcloud auth application-default login      # Sets up Application Default Credentials for local dev
gcloud auth print-access-token             # Get current access token

# Configuration and profiles
gcloud config set project prod-web-app-2026
gcloud config set compute/region us-central1
gcloud config set compute/zone us-central1-a

# Named configurations (like AWS profiles)
gcloud config configurations create production
gcloud config configurations activate production
gcloud config configurations list

# Common operations with filtering
gcloud compute instances list \
  --filter="status=RUNNING AND labels.env=production" \
  --format="table(name,zone,machineType.basename(),networkInterfaces[0].networkIP)"

gcloud container clusters list \
  --format="table(name,location,currentMasterVersion,status,currentNodeCount)"

gcloud run services list --platform=managed \
  --format="table(SERVICE,REGION,URL,LAST_DEPLOYED_BY)"

# Output formats
gcloud compute instances list --format="json" | jq '.[].name'
gcloud compute instances list --format="csv(name,zone,status)"
gcloud compute instances list --format="value(name)" # Just values, one per line

# Impersonate a service account (for testing permissions)
gcloud auth print-access-token \
  --impersonate-service-account=sa-web-backend@prod-web-app-2026.iam.gserviceaccount.com
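When `--filter` expressions get unwieldy, `--format=json` pairs naturally with programmatic post-processing. A sketch of filtering the JSON in Python; the field names follow the Compute Engine instance resource, and the sample data here is made up:

```python
import json

# Stand-in for the output of: gcloud compute instances list --format=json
sample = json.loads("""[
  {"name": "web-1", "status": "RUNNING",    "labels": {"env": "production"}},
  {"name": "web-2", "status": "TERMINATED", "labels": {"env": "production"}},
  {"name": "dev-1", "status": "RUNNING",    "labels": {"env": "dev"}}
]""")

def running_prod(instances):
    """Equivalent of --filter="status=RUNNING AND labels.env=production"."""
    return [i["name"] for i in instances
            if i["status"] == "RUNNING"
            and i.get("labels", {}).get("env") == "production"]

print(running_prod(sample))  # ['web-1']
```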

Cost Management

GCP provides several tools for cost visibility and optimization:

# Create the dataset that will receive the billing export
# (the export itself is enabled in the Cloud Billing console)
bq mk --dataset prod-web-app-2026:billing_export

# Use BigQuery to analyze costs
bq query --use_legacy_sql=false '
  SELECT
    service.description,
    SUM(cost) as total_cost,
    SUM(usage.amount) as total_usage,
    usage.unit
  FROM `prod-web-app-2026.billing_export.gcp_billing_export`
  WHERE DATE(usage_start_time) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
  GROUP BY service.description, usage.unit
  ORDER BY total_cost DESC
  LIMIT 20
'

Cost Optimization Strategies

  1. Sustained Use Discounts -- automatic, no action needed. Applies to N1 and N2 instances (E2 is excluded).
  2. Committed Use Discounts -- up to 37% (1-year) or 55% (3-year) for predictable workloads.
  3. Spot VMs -- 60-91% savings for interruptible workloads.
  4. Autopilot for GKE -- pay only for pod resources, no idle node waste.
  5. Cloud Run scale-to-zero -- no cost when no traffic.
  6. Right-sizing recommendations -- Compute Engine Recommender suggests optimal machine types.
  7. Storage lifecycle policies -- automatically transition data to cheaper classes.
  8. Budget alerts -- set budgets with email and Pub/Sub notifications.
  9. Billing export to BigQuery -- query your billing data with SQL for deep analysis.
  10. Active Assist -- GCP's umbrella for all optimization recommendations.
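The discount percentages in items 2 and 3 translate directly into effective monthly rates. A quick sketch of the arithmetic (the hourly price below is illustrative, not a real SKU price):

```python
def effective_monthly_cost(on_demand_hourly: float, discount: float,
                           hours: int = 730) -> float:
    """Monthly cost at a given discount off the on-demand rate.
    730 is the conventional average number of hours per month."""
    return on_demand_hourly * (1 - discount) * hours

on_demand = 0.10  # illustrative $/hour, not a real SKU price
print(effective_monthly_cost(on_demand, 0.0))   # on-demand baseline
print(effective_monthly_cost(on_demand, 0.37))  # 1-year committed use
print(effective_monthly_cost(on_demand, 0.55))  # 3-year committed use
```

Running the same arithmetic against real SKU prices from the billing export gives a concrete break-even point for whether a commitment is worth it.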

Migration Considerations

When migrating to GCP from other clouds or on-premises:

  • Migrate to Virtual Machines -- replicates VMs from on-premises (VMware, AWS, Azure) to Compute Engine.
  • Database Migration Service -- supports MySQL, PostgreSQL, SQL Server, and Oracle to Cloud SQL or AlloyDB.
  • Transfer Appliance -- physical device for large-scale data transfers (like AWS Snowball).
  • Storage Transfer Service -- transfers data from S3, Azure Blob, or on-premises to Cloud Storage.
  • BigQuery Data Transfer Service -- automates data movement from SaaS platforms into BigQuery.
  • Anthos -- run GKE clusters on-premises, on AWS, or on Azure with a consistent management plane.

GCP rewards engineers who invest in understanding its project model and IAM system. The developer experience is excellent -- services are well-integrated, the CLI is consistent, and the documentation is some of the best in the industry. If you are building on Kubernetes, need strong data analytics capabilities, or value automatic cost optimizations like sustained use discounts, GCP is a compelling choice that continues to close the gap with AWS in breadth while maintaining its lead in developer satisfaction.
