GCP Core Services: The DevOps Engineer's Essential Guide
Google Cloud Platform has always been the engineer's cloud. Where AWS leads in breadth and Azure in enterprise integration, GCP wins on developer experience, data analytics, and Kubernetes -- fitting, since Kubernetes grew out of Google's internal Borg cluster manager. If your organization is Kubernetes-first or data-heavy, GCP deserves serious consideration. This guide covers every core service a DevOps engineer needs, with real CLI examples, pricing details, architecture patterns, and the operational nuances that documentation often glosses over.
GCP Project Structure
GCP organizes resources in a hierarchy that looks simple but has important implications for billing, IAM, and policy inheritance. Understanding this hierarchy is foundational to everything else.
The Resource Hierarchy
Organization (company.com)
|-- Folders (optional, for departments or environments)
| |-- Folder: Production
| | |-- Project: prod-web-app
| | +-- Project: prod-data-pipeline
| |-- Folder: Staging
| | +-- Project: staging-web-app
| +-- Folder: Shared
| +-- Project: shared-networking
+-- Projects (can also exist directly under org)
Projects Are Everything
A project is the fundamental unit in GCP. Every resource belongs to a project. Projects provide:
- Billing boundary -- each project links to a billing account.
- IAM boundary -- permissions are granted at the project level (or above/below). Permissions granted at the folder or org level cascade down.
- API enablement -- you explicitly enable APIs per project. This is a security feature: an unused API cannot be exploited.
- Resource namespace -- resource names are unique within a project.
- Quota management -- API quotas and resource limits are per-project.
# Create a new project
gcloud projects create prod-web-app-2026 \
--name="Production Web App" \
--folder=123456789 \
--labels=env=production,team=platform
# Set your active project
gcloud config set project prod-web-app-2026
# Enable required APIs (only enable what you need)
gcloud services enable \
compute.googleapis.com \
container.googleapis.com \
cloudbuild.googleapis.com \
artifactregistry.googleapis.com \
run.googleapis.com \
monitoring.googleapis.com \
logging.googleapis.com \
secretmanager.googleapis.com \
sqladmin.googleapis.com
# List enabled APIs
gcloud services list --enabled --format="table(config.name,config.title)"
Always enable only the APIs you need. Each enabled API is an attack surface and a potential cost vector. GCP's explicit API enablement model is more secure by default than AWS's approach where all services are available immediately.
Organization Policies
Organization policies are the GCP equivalent of AWS SCPs. They define constraints that cascade down the resource hierarchy.
# Restrict VM creation to specific regions
gcloud resource-manager org-policies set-policy \
--project=prod-web-app-2026 \
policy.yaml
# Where policy.yaml contains:
#   constraint: constraints/gcp.resourceLocations
#   listPolicy:
#     allowedValues:
#       - us-central1
#       - europe-west1
Cross-Cloud Project Model Comparison
| Concept | GCP | AWS | Azure | Alibaba Cloud |
|---|---|---|---|---|
| Top-level container | Organization | Organization | Entra ID Tenant | Resource Directory |
| Grouping mechanism | Folders | Organizational Units | Management Groups | Folders |
| Billing/isolation unit | Project | Account | Subscription | Account |
| Policy enforcement | Organization Policies | SCPs | Azure Policy | Control Policies |
| Resource grouping | Labels | Tags | Resource Groups | Resource Groups |
| API enablement | Explicit per-project | All available | Resource providers | Explicit activation |
IAM: Identity and Access Management
GCP IAM uses a resource hierarchy model where permissions flow downward. A role granted at the organization level applies to every folder, project, and resource underneath. This inheritance model is powerful but requires careful planning to avoid over-permissioning.
Role Types
| Type | Example | When to Use | Security Level |
|---|---|---|---|
| Basic | roles/viewer, roles/editor, roles/owner | Almost never in production -- too broad | Low |
| Predefined | roles/compute.instanceAdmin, roles/storage.objectViewer | Most common -- scoped to a service | Medium |
| Custom | roles/myCustomRole | When predefined roles grant too much | High |
Basic roles are legacy and should be avoided in production. The roles/editor role, for example, grants write access to almost every GCP service. Use predefined roles instead, and create custom roles when even predefined roles are too permissive.
IAM Policy Bindings
GCP IAM uses a binding model: you bind a member (who) to a role (what) at a scope (where). Multiple bindings form a policy.
# Grant a user viewer access to a project
gcloud projects add-iam-policy-binding prod-web-app-2026 \
--member="user:engineer@company.com" \
--role="roles/viewer"
# Grant a group contributor access to a specific bucket
gcloud storage buckets add-iam-policy-binding gs://prod-app-data \
--member="group:platform-team@company.com" \
--role="roles/storage.objectAdmin"
# Grant conditional access (only from corporate network)
gcloud projects add-iam-policy-binding prod-web-app-2026 \
--member="user:engineer@company.com" \
--role="roles/compute.instanceAdmin.v1" \
--condition='expression=request.time.getHours("America/New_York") >= 9 && request.time.getHours("America/New_York") <= 17,title=business-hours-only,description=Only allow access during business hours'
# List all IAM bindings for a project
gcloud projects get-iam-policy prod-web-app-2026 --format=yaml
Service Accounts
Service accounts are the identities for workloads -- VMs, Cloud Functions, CI/CD pipelines. They are the equivalent of AWS IAM roles. Each service account is identified by an email address.
# Create a service account
gcloud iam service-accounts create sa-web-backend \
--display-name="Web Backend Service Account" \
--project=prod-web-app-2026
# Grant the minimum required roles
gcloud projects add-iam-policy-binding prod-web-app-2026 \
--member="serviceAccount:sa-web-backend@prod-web-app-2026.iam.gserviceaccount.com" \
--role="roles/storage.objectViewer"
gcloud projects add-iam-policy-binding prod-web-app-2026 \
--member="serviceAccount:sa-web-backend@prod-web-app-2026.iam.gserviceaccount.com" \
--role="roles/cloudsql.client"
gcloud projects add-iam-policy-binding prod-web-app-2026 \
--member="serviceAccount:sa-web-backend@prod-web-app-2026.iam.gserviceaccount.com" \
--role="roles/secretmanager.secretAccessor"
Critical rule: Never download service account keys (JSON key files) unless absolutely necessary. Use Workload Identity for GKE, attached service accounts for Compute Engine, and Workload Identity Federation for external CI/CD systems. Every downloaded key is a credential that can be leaked.
Workload Identity Federation
For CI/CD pipelines running outside GCP (GitHub Actions, GitLab CI, Jenkins), use Workload Identity Federation instead of service account keys. This eliminates the need to manage and rotate JSON key files.
# Create a Workload Identity Pool
gcloud iam workload-identity-pools create github-pool \
--location="global" \
--display-name="GitHub Actions Pool" \
--description="Identity pool for GitHub Actions CI/CD"
# Create a provider for GitHub
gcloud iam workload-identity-pools providers create-oidc github-provider \
--location="global" \
--workload-identity-pool=github-pool \
--display-name="GitHub Provider" \
--attribute-mapping="google.subject=assertion.sub,attribute.repository=assertion.repository,attribute.actor=assertion.actor" \
--attribute-condition="assertion.repository_owner == 'myorg'" \
--issuer-uri="https://token.actions.githubusercontent.com"
# Allow the service account to be impersonated
gcloud iam service-accounts add-iam-policy-binding \
sa-web-backend@prod-web-app-2026.iam.gserviceaccount.com \
--role="roles/iam.workloadIdentityUser" \
--member="principalSet://iam.googleapis.com/projects/PROJECT_NUMBER/locations/global/workloadIdentityPools/github-pool/attribute.repository/myorg/myrepo"
This is a significantly better security model than downloading JSON keys and storing them as CI/CD secrets. The token exchange happens automatically, credentials are short-lived, and there are no static secrets to rotate.
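On the GitHub side, the workflow references the pool and provider created above. A sketch using the `google-github-actions/auth` action (PROJECT_NUMBER and repository names are placeholders; adapt to your setup):

```yaml
# Illustrative GitHub Actions job authenticating via Workload Identity Federation
jobs:
  deploy:
    runs-on: ubuntu-latest
    permissions:
      id-token: write   # required for the OIDC token exchange
      contents: read
    steps:
      - uses: actions/checkout@v4
      - uses: google-github-actions/auth@v2
        with:
          workload_identity_provider: projects/PROJECT_NUMBER/locations/global/workloadIdentityPools/github-pool/providers/github-provider
          service_account: sa-web-backend@prod-web-app-2026.iam.gserviceaccount.com
      - run: gcloud storage ls   # authenticated with short-lived credentials, no JSON key
```

The `id-token: write` permission is what lets GitHub mint the OIDC token that GCP exchanges for short-lived service account credentials.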
IAM Recommender
GCP provides an IAM Recommender that analyzes actual usage patterns and suggests tighter roles. This is one of the best tools for enforcing least privilege over time.
# List IAM recommendations for a project
gcloud recommender recommendations list \
--project=prod-web-app-2026 \
--location=global \
--recommender=google.iam.policy.Recommender \
--format="table(content.operationGroups[0].operations[0].pathFilters)"
Compute Engine
Compute Engine is GCP's VM service. It is fast, flexible, and has one feature that sets it apart from AWS and Azure: live migration. Google can move a running VM to another physical host for maintenance with only a brief, sub-second pause that most workloads never notice. This means fewer maintenance windows and higher effective availability.
Machine Types
| Family | Use Case | Example | On-Demand (us-central1) |
|---|---|---|---|
| e2 | Cost-optimized, general purpose | e2-medium (2 vCPU, 4 GB) | ~$0.034/hr |
| n2/n2d | Balanced, production workloads | n2-standard-4 (4 vCPU, 16 GB) | ~$0.194/hr |
| n4 | Latest gen, best price-performance | n4-standard-4 (4 vCPU, 16 GB) | ~$0.170/hr |
| c2/c2d | Compute optimized | c2-standard-8 (8 vCPU, 32 GB) | ~$0.334/hr |
| c3 | Latest compute optimized | c3-standard-8 (8 vCPU, 32 GB) | ~$0.320/hr |
| m2 | Memory optimized | m2-ultramem-208 (208 vCPU, 5.75 TB) | ~$42.18/hr |
| t2a | Arm-based (Ampere), cost savings | t2a-standard-4 (4 vCPU, 16 GB) | ~$0.153/hr |
| t2d | AMD-based, balanced | t2d-standard-4 (4 vCPU, 16 GB) | ~$0.156/hr |
| a2/g2 | GPU workloads | g2-standard-4 (4 vCPU, 16 GB, 1 GPU) | ~$0.73/hr |
Custom Machine Types
One of GCP's unique features is custom machine types. You specify exact vCPU and memory combinations, so you never overpay for resources you do not use.
# Create a VM with a custom machine type (4 vCPUs, 8 GB RAM)
gcloud compute instances create web-server-01 \
--zone=us-central1-a \
--machine-type=e2-custom-4-8192 \
--image-family=ubuntu-2204-lts \
--image-project=ubuntu-os-cloud \
--boot-disk-size=50GB \
--boot-disk-type=pd-ssd \
--network=vpc-production \
--subnet=subnet-app \
--service-account=sa-web-backend@prod-web-app-2026.iam.gserviceaccount.com \
--scopes=cloud-platform \
--tags=http-server,https-server \
--labels=env=production,team=platform \
--metadata-from-file=startup-script=bootstrap.sh \
--shielded-secure-boot \
--shielded-vtpm
Pricing Model: Sustained Use Discounts
GCP automatically applies sustained use discounts (SUDs) to eligible VMs that run for more than 25% of a month. No commitment is required -- the discount accrues automatically and grows the longer the instance runs, up to 30% off for an N1 instance running the full month. Note that E2 machines are excluded (their lower base price already reflects the savings) and some newer families receive smaller maximum discounts. This is unique to GCP; AWS and Azure require upfront commitments for equivalent savings.
| Usage Level | Incremental Discount (N1) |
|---|---|
| 0-25% of month | 0% (full price) |
| 25-50% | ~20% off that usage |
| 50-75% | ~40% off that usage |
| 75-100% | ~60% off that usage |
| Full month effective | ~30% net savings |
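The tiered schedule compounds into the ~30% net figure. A minimal sketch of the arithmetic, assuming N1-style incremental rates (other machine families use different schedules):

```python
# How sustained use discounts compound over a month: each quarter of the
# month's usage is billed at a progressively deeper discount.
def effective_monthly_rate(fraction_of_month_used: float) -> float:
    """Average fraction of list price paid for a VM running this share
    of the month (0 < fraction_of_month_used <= 1)."""
    tiers = [
        (0.25, 1.00),  # first quarter of the month: full price
        (0.50, 0.80),  # second quarter: 20% off
        (0.75, 0.60),  # third quarter: 40% off
        (1.00, 0.40),  # final quarter: 60% off
    ]
    paid, start = 0.0, 0.0
    for tier_end, price_multiplier in tiers:
        if fraction_of_month_used <= start:
            break
        usage_in_tier = min(fraction_of_month_used, tier_end) - start
        paid += usage_in_tier * price_multiplier
        start = tier_end
    return paid / fraction_of_month_used

print(round(effective_monthly_rate(1.0), 2))  # full month
print(round(effective_monthly_rate(0.5), 2))  # half month
```

A full-month instance pays 70% of list price (30% off), while a half-month instance pays 90%, which is why SUDs reward always-on workloads without penalizing bursty ones.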
Committed Use Discounts (CUDs)
For predictable workloads, CUDs offer 1-year (37% savings) or 3-year (55% savings) commitments. Unlike AWS Reserved Instances, GCP CUDs are applied at the project level and can cover any machine type within the committed resource class.
# Purchase a committed use discount
gcloud compute commitments create my-commitment \
--region=us-central1 \
--plan=twelve-month \
--resources=vcpu=100,memory=400GB \
--type=GENERAL_PURPOSE
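To see when a commitment pays off, compare committed spend with on-demand. A back-of-envelope sketch using the n2-standard-4 rate from the table above and the quoted 37% one-year discount (illustrative arithmetic, not a quote):

```python
# Break-even: hours/month an n2-standard-4 must run before a 1-year CUD
# beats paying on-demand (rates are the approximate figures quoted above).
ON_DEMAND_HOURLY = 0.194   # $/hr, n2-standard-4 in us-central1
CUD_DISCOUNT = 0.37        # 1-year commitment
HOURS_PER_MONTH = 730

committed_monthly = ON_DEMAND_HOURLY * (1 - CUD_DISCOUNT) * HOURS_PER_MONTH
break_even_hours = committed_monthly / ON_DEMAND_HOURLY  # paid whether used or not
print(round(break_even_hours))
```

A VM that runs more than roughly 63% of the month is cheaper under the commitment, which is why CUDs suit steady baseline capacity while Spot VMs cover bursts.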
Preemptible and Spot VMs
Spot VMs (formerly preemptible VMs) cost 60-91% less than regular VMs but can be terminated with 30 seconds' notice. Perfect for CI/CD build agents, batch processing, non-critical data processing, and GKE node pools for fault-tolerant workloads.
gcloud compute instances create build-agent-01 \
--machine-type=c2-standard-8 \
--provisioning-model=SPOT \
--instance-termination-action=DELETE \
--zone=us-central1-a \
--metadata=shutdown-script='#!/bin/bash
# Gracefully drain work before termination
curl -X POST http://localhost:8080/drain'
Managed Instance Groups (MIGs)
MIGs are the auto-scaling mechanism for Compute Engine, equivalent to AWS ASGs:
# Create an instance template
gcloud compute instance-templates create web-template-v2 \
--machine-type=n2-standard-2 \
--image-family=ubuntu-2204-lts \
--image-project=ubuntu-os-cloud \
--boot-disk-size=50GB \
--boot-disk-type=pd-ssd \
--service-account=sa-web-backend@prod-web-app-2026.iam.gserviceaccount.com \
--tags=http-server \
--metadata-from-file=startup-script=bootstrap.sh \
--network=vpc-production \
--subnet=subnet-app \
--region=us-central1
# Create a regional MIG with autoscaling
gcloud compute instance-groups managed create web-mig \
--template=web-template-v2 \
--size=2 \
--region=us-central1 \
--target-distribution-shape=EVEN \
--health-check=http-health-check \
--initial-delay=120
gcloud compute instance-groups managed set-autoscaling web-mig \
--region=us-central1 \
--min-num-replicas=2 \
--max-num-replicas=10 \
--target-cpu-utilization=0.70 \
--cool-down-period=120 \
--scale-in-control max-scaled-in-replicas=2,time-window=300
# Rolling update to a new template
gcloud compute instance-groups managed rolling-action start-update web-mig \
--version=template=web-template-v3 \
--region=us-central1 \
--max-surge=3 \
--max-unavailable=0
Persistent Disk Types
| Disk Type | Max IOPS (read) | Max Throughput | Use Case | Cost (per GB/mo) |
|---|---|---|---|---|
| pd-standard | 7,500 | 400 MB/s | Bulk storage, backups | ~$0.040 |
| pd-balanced | 80,000 | 1,200 MB/s | Most workloads | ~$0.100 |
| pd-ssd | 100,000 | 1,200 MB/s | Databases, latency-sensitive | ~$0.170 |
| pd-extreme | 120,000 | 2,400 MB/s | Top-tier databases | ~$0.125 + IOPS |
| Hyperdisk Balanced | 160,000 | 2,400 MB/s | Next-gen balanced | ~$0.060 + IOPS + throughput |
| Local SSD | 900,000 | 9,360 MB/s | Ephemeral high-perf | ~$0.080 |
VPC Networking
GCP VPCs are global by default -- a single VPC spans all regions. Subnets are regional. This is a fundamental architectural difference from AWS and Azure where VPCs/VNets are regional. A single GCP VPC can contain subnets in us-central1, europe-west1, and asia-east1, and they can all communicate privately without peering.
Network Architecture
VPC: vpc-production (global)
|-- subnet-web-us (10.0.1.0/24) -- us-central1
|-- subnet-app-us (10.0.2.0/24) -- us-central1
|-- subnet-data-us (10.0.3.0/24) -- us-central1
|-- subnet-web-eu (10.10.1.0/24) -- europe-west1
|-- subnet-app-eu (10.10.2.0/24) -- europe-west1
|-- subnet-gke-us (10.0.16.0/20) -- us-central1
| |-- Pod CIDR: 10.100.0.0/14 (secondary range)
| +-- Service CIDR: 10.200.0.0/20 (secondary range)
+-- subnet-gke-eu (10.10.16.0/20) -- europe-west1
Firewall Rules
GCP firewall rules are defined at the VPC level and targeted via network tags or service accounts, rather than attached to subnets or instances as with AWS security groups/NACLs and Azure NSGs. This model is more flexible -- the same rule can apply to VMs across different subnets and regions within the same VPC.
# Allow HTTP/HTTPS to instances tagged 'http-server'
gcloud compute firewall-rules create allow-http-https \
--network=vpc-production \
--direction=INGRESS \
--action=ALLOW \
--rules=tcp:80,tcp:443 \
--source-ranges=0.0.0.0/0 \
--target-tags=http-server \
--priority=1000 \
--description="Allow HTTP and HTTPS from internet to web servers"
# Allow internal communication between app and data tiers
gcloud compute firewall-rules create allow-app-to-data \
--network=vpc-production \
--direction=INGRESS \
--action=ALLOW \
--rules=tcp:5432,tcp:6379,tcp:3306 \
--source-tags=app-server \
--target-tags=data-server \
--priority=1000
# Service-account-based rules (more secure than tags)
gcloud compute firewall-rules create allow-backend-to-db \
--network=vpc-production \
--direction=INGRESS \
--action=ALLOW \
--rules=tcp:5432 \
--source-service-accounts=sa-web-backend@prod-web-app-2026.iam.gserviceaccount.com \
--target-service-accounts=sa-database@prod-web-app-2026.iam.gserviceaccount.com \
--priority=900
# Deny all other ingress (implicit, but making it explicit)
gcloud compute firewall-rules create deny-all-ingress \
--network=vpc-production \
--direction=INGRESS \
--action=DENY \
--rules=all \
--source-ranges=0.0.0.0/0 \
--priority=65534
# Enable firewall logging for troubleshooting
gcloud compute firewall-rules update allow-http-https \
--enable-logging \
--logging-metadata=INCLUDE_ALL_METADATA
Service-account-based firewall rules are more secure than tag-based rules because tags can be modified by anyone with compute.instances.setTags permission, while service accounts require iam.serviceAccounts.actAs permission.
Cloud NAT and Private Google Access
For private instances that need outbound internet access:
# Create a Cloud Router (required for Cloud NAT)
gcloud compute routers create router-production \
--region=us-central1 \
--network=vpc-production
# Create Cloud NAT
gcloud compute routers nats create nat-production \
--router=router-production \
--region=us-central1 \
--nat-all-subnet-ip-ranges \
--auto-allocate-nat-external-ips \
--min-ports-per-vm=256 \
--enable-logging
Enable Private Google Access on subnets to let VMs without external IPs reach Google APIs and services:
gcloud compute networks subnets update subnet-app-us \
--region=us-central1 \
--enable-private-ip-google-access
Shared VPC
For multi-project environments, Shared VPC lets you define the network in a host project and share subnets with service projects. This centralizes network management while allowing individual teams to manage their own resources.
# Enable shared VPC on the host project
gcloud compute shared-vpc enable shared-networking
# Associate a service project
gcloud compute shared-vpc associated-projects add prod-web-app-2026 \
--host-project=shared-networking
Load Balancing
GCP offers a comprehensive load balancing portfolio:
| Type | Scope | Layer | Use Case |
|---|---|---|---|
| External HTTP(S) LB | Global | L7 | Web apps, APIs, CDN integration |
| External TCP/UDP Network LB | Regional | L4 | Non-HTTP traffic, gaming |
| Internal HTTP(S) LB | Regional | L7 | Internal microservices |
| Internal TCP/UDP LB | Regional | L4 | Internal databases, gRPC |
| Cross-region Internal LB | Global | L7 | Multi-region internal services |
The Global HTTP(S) Load Balancer is one of GCP's strongest offerings. It provides a single anycast IP that routes traffic to the nearest healthy backend worldwide, with automatic SSL termination, Cloud CDN integration, and Cloud Armor (WAF/DDoS) built in.
Cloud Storage
Cloud Storage is GCP's object storage service, equivalent to S3. It uses a flat namespace with buckets and objects. Buckets can be regional, dual-region, or multi-region.
Storage Classes
| Class | Min Storage Duration | Use Case | Monthly Cost (per GB) |
|---|---|---|---|
| Standard | None | Frequently accessed | ~$0.020 (regional) |
| Nearline | 30 days | Monthly access | ~$0.010 |
| Coldline | 90 days | Quarterly access | ~$0.004 |
| Archive | 365 days | Annual access or less | ~$0.0012 |
GCP charges for early deletion: if you store data in Nearline and delete it before 30 days, you pay for the full 30 days. Plan your storage class based on actual access patterns.
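The early-deletion charge is easy to quantify with the per-GB rates from the table above (illustrative Python; real bills vary by region and add retrieval fees):

```python
# Nearline early-deletion sketch: deleting before the 30-day minimum
# still bills the full 30 days of storage.
NEARLINE_PER_GB_MONTH = 0.010   # approximate rate from the table
MIN_STORAGE_DAYS = 30

def nearline_cost(gb: float, days_stored: float) -> float:
    """Storage cost in dollars; early deletion is billed as 30 days."""
    billed_days = max(days_stored, MIN_STORAGE_DAYS)
    return gb * NEARLINE_PER_GB_MONTH * billed_days / 30

# 1 TiB deleted after 10 days costs the same as keeping it 30 days:
print(round(nearline_cost(1024, 10), 2))
print(round(nearline_cost(1024, 30), 2))
```

The same logic applies to Coldline (90 days) and Archive (365 days), with proportionally larger minimum charges.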
Bucket Location Types
| Location Type | Redundancy | Latency | Use Case |
|---|---|---|---|
| Regional | Single region, multiple zones | Lowest within region | Application data, compute co-location |
| Dual-region | Two specific regions | Low in both regions | DR between known regions |
| Multi-region | Three+ regions in a continent | Higher | Globally accessed content |
# Create a bucket with lifecycle rules
gcloud storage buckets create gs://prod-app-logs-2026 \
--location=us-central1 \
--default-storage-class=STANDARD \
--uniform-bucket-level-access \
--public-access-prevention=enforced \
--soft-delete-duration=7d
# Set lifecycle policy
cat > lifecycle.json << 'EOF'
{
  "rule": [
    {
      "action": { "type": "SetStorageClass", "storageClass": "NEARLINE" },
      "condition": { "age": 30, "matchesStorageClass": ["STANDARD"] }
    },
    {
      "action": { "type": "SetStorageClass", "storageClass": "COLDLINE" },
      "condition": { "age": 90, "matchesStorageClass": ["NEARLINE"] }
    },
    {
      "action": { "type": "SetStorageClass", "storageClass": "ARCHIVE" },
      "condition": { "age": 365, "matchesStorageClass": ["COLDLINE"] }
    },
    {
      "action": { "type": "Delete" },
      "condition": { "age": 2555 }
    }
  ]
}
EOF
gcloud storage buckets update gs://prod-app-logs-2026 \
--lifecycle-file=lifecycle.json
# Enable versioning for state files
gcloud storage buckets update gs://prod-terraform-state \
--versioning
# Set a retention policy for compliance (GCP's equivalent of Object Lock is Bucket Lock)
gcloud storage buckets update gs://prod-audit-logs \
--retention-period=365d
# Locking the retention policy (making it immutable) is a separate, irreversible step
Always enable uniform bucket-level access to simplify permissions. It prevents the confusing mix of IAM and ACLs that plagues older buckets. Enable public access prevention unless you explicitly need public access.
gsutil vs gcloud storage
Google is transitioning from gsutil to gcloud storage for Cloud Storage operations. Use gcloud storage for new work:
# Copy files (gcloud storage)
gcloud storage cp ./dist/* gs://prod-app-data/assets/ --recursive
# Sync a directory
gcloud storage rsync ./build/ gs://prod-app-data/static/ --recursive --delete-unmatched-destination-objects
# Copy a large file (gcloud storage parallelizes large uploads automatically)
gcloud storage cp large-backup.tar.gz gs://prod-backups/ --no-clobber
Cloud SQL
Cloud SQL is GCP's managed relational database service, supporting MySQL, PostgreSQL, and SQL Server. It handles patching, backups, replication, and failover.
# Create a PostgreSQL instance
gcloud sql instances create prod-postgres \
--database-version=POSTGRES_15 \
--tier=db-custom-4-16384 \
--region=us-central1 \
--availability-type=REGIONAL \
--storage-type=SSD \
--storage-size=100 \
--storage-auto-increase \
--backup-start-time=03:00 \
--enable-point-in-time-recovery \
--retained-backups-count=14 \
--maintenance-window-day=MON \
--maintenance-window-hour=4 \
--insights-config-query-insights-enabled \
--root-password="$(gcloud secrets versions access latest --secret=db-root-password)" \
--network=vpc-production \
--no-assign-ip \
--labels=env=production,team=platform
# Create a read replica
gcloud sql instances create prod-postgres-replica \
--master-instance-name=prod-postgres \
--database-version=POSTGRES_15 \
--tier=db-custom-4-16384 \
--region=us-central1
# Connect via Cloud SQL Auth Proxy (recommended for applications)
cloud-sql-proxy \
--auto-iam-authn \
prod-web-app-2026:us-central1:prod-postgres
Cloud SQL vs AlloyDB vs Cloud Spanner
| Feature | Cloud SQL | AlloyDB | Cloud Spanner |
|---|---|---|---|
| Engine | MySQL, PostgreSQL, SQL Server | PostgreSQL-compatible | Proprietary |
| Scaling | Vertical (up to 96 vCPU) | Vertical + read pools | Horizontal, global |
| Max storage | 64 TB | 128 TB | Unlimited |
| Global distribution | Read replicas | Regional | Multi-region, strong consistency |
| Price (entry) | ~$10/mo (db-f1-micro, shared-core) | ~$500/mo | ~$657/mo (1 node) |
| Best for | Standard RDBMS workloads | High-perf PostgreSQL | Global, financial-grade |
GKE: Google Kubernetes Engine
GKE is where GCP truly shines. As the birthplace of Kubernetes, Google offers the most mature managed Kubernetes service with features that other clouds are still catching up on.
Autopilot vs Standard
| Feature | Autopilot | Standard |
|---|---|---|
| Node management | Google manages | You manage |
| Pod-level billing | Yes | No (pay for nodes) |
| Node configuration | Limited | Full control |
| GPU support | Yes (with reservations) | Full |
| Security hardening | Automatic (workload isolation, CIS benchmarks) | Manual |
| Control plane cost | $0.10/cluster/hr (~$73/mo) | $0.10/cluster/hr (~$73/mo) |
| Resource efficiency | Optimized by Google | You optimize |
| Best for | Most workloads | Specialized needs, custom kernels |
Autopilot is the recommended choice for most teams. You define pods, GKE handles node provisioning, scaling, security hardening, and OS patching.
# Create an Autopilot cluster (recommended for most teams)
gcloud container clusters create-auto gke-production \
--region=us-central1 \
--release-channel=regular \
--network=vpc-production \
--subnetwork=subnet-gke-us \
--cluster-secondary-range-name=pod-range \
--services-secondary-range-name=service-range \
--enable-master-authorized-networks \
--master-authorized-networks=10.0.0.0/8 \
--workload-pool=prod-web-app-2026.svc.id.goog \
--enable-fleet
# Create a Standard cluster (when you need more control)
gcloud container clusters create gke-standard \
--region=us-central1 \
--num-nodes=1 \
--machine-type=n2-standard-4 \
--enable-autoscaling --min-nodes=1 --max-nodes=5 \
--network=vpc-production \
--subnetwork=subnet-gke-us \
--cluster-secondary-range-name=pod-range \
--services-secondary-range-name=service-range \
--enable-ip-alias \
--release-channel=regular \
--workload-pool=prod-web-app-2026.svc.id.goog \
--enable-dataplane-v2 \
--enable-shielded-nodes \
--enable-autorepair \
--enable-autoupgrade \
--maintenance-window-start "2026-01-01T04:00:00Z" \
--maintenance-window-end "2026-01-01T08:00:00Z" \
--maintenance-window-recurrence "FREQ=WEEKLY;BYDAY=SA,SU"
# Get credentials
gcloud container clusters get-credentials gke-production --region=us-central1
# Add a Spot node pool for batch workloads
gcloud container node-pools create spot-pool \
--cluster=gke-standard \
--region=us-central1 \
--machine-type=n2-standard-4 \
--spot \
--enable-autoscaling --min-nodes=0 --max-nodes=10 \
--node-taints=cloud.google.com/gke-spot=true:NoSchedule \
--node-labels=workload-type=batch
GKE Workload Identity
Workload Identity is the recommended way for pods to authenticate to GCP services. It maps Kubernetes service accounts to GCP service accounts, eliminating the need for service account keys.
# Create a Kubernetes service account
kubectl create serviceaccount app-backend --namespace=production
# Bind KSA to GSA
gcloud iam service-accounts add-iam-policy-binding \
sa-web-backend@prod-web-app-2026.iam.gserviceaccount.com \
--role=roles/iam.workloadIdentityUser \
--member="serviceAccount:prod-web-app-2026.svc.id.goog[production/app-backend]"
# Annotate the KSA
kubectl annotate serviceaccount app-backend \
--namespace=production \
iam.gke.io/gcp-service-account=sa-web-backend@prod-web-app-2026.iam.gserviceaccount.com
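With the binding and annotation in place, pods that run as the KSA receive GSA credentials from the metadata server automatically. An illustrative manifest (the image path and resource values are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-backend
  namespace: production
spec:
  serviceAccountName: app-backend   # the annotated KSA
  containers:
    - name: app
      image: us-central1-docker.pkg.dev/prod-web-app-2026/app-images/webapp:v1.2.3
      resources:                    # always set requests on every pod
        requests:
          cpu: 250m
          memory: 256Mi
        limits:
          memory: 512Mi
```

Any Google client library inside this pod picks up the GSA identity via Application Default Credentials, with no key file mounted anywhere.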
GKE Cost Optimization
- Use Autopilot to avoid paying for idle node capacity.
- Use Spot VMs for fault-tolerant workloads (60-91% cheaper).
- Enable Vertical Pod Autoscaler to right-size resource requests.
- Use node auto-provisioning to let GKE choose optimal machine types.
- Set resource requests and limits on every pod. Pods without requests waste capacity.
Cloud Build
Cloud Build is GCP's serverless CI/CD platform. It runs build steps as containers, making it highly flexible. You can use any Docker image as a build step, which means you can use any tool in your pipeline.
Pricing
Cloud Build includes 120 free build-minutes per day on the default machine type; beyond the free tier, you pay per build-minute based on the machine type:
| Machine Type | vCPUs | RAM | Cost per minute |
|---|---|---|---|
| e2-medium | 1 | 4 GB | $0.003 |
| e2-highcpu-8 | 8 | 8 GB | $0.016 |
| e2-highcpu-32 | 32 | 32 GB | $0.064 |
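As a rough sizing exercise, here is the monthly cost of a busy pipeline on the E2_HIGHCPU_8 tier, assuming an illustrative workload of 40 builds per day at 8 minutes each (the free tier applies only to the default machine type, so none of this is offset):

```python
# Monthly Cloud Build estimate for a hypothetical team's pipeline
RATE_PER_MINUTE = 0.016   # e2-highcpu-8, from the table above
BUILDS_PER_DAY = 40
MINUTES_PER_BUILD = 8
DAYS_PER_MONTH = 30

monthly_cost = BUILDS_PER_DAY * MINUTES_PER_BUILD * RATE_PER_MINUTE * DAYS_PER_MONTH
print(round(monthly_cost, 2))
```

Around $150/month for a team shipping constantly is cheap compared to maintaining dedicated build servers, which is the core appeal of serverless CI.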
# cloudbuild.yaml with multi-stage pipeline
steps:
  # Run tests
  - name: 'node:20'
    entrypoint: 'bash'
    args: ['-c', 'npm ci && npm run lint && npm test']

  # Build container image
  - name: 'gcr.io/cloud-builders/docker'
    args:
      - 'build'
      - '-t'
      - 'us-central1-docker.pkg.dev/$PROJECT_ID/app-images/webapp:$SHORT_SHA'
      - '-t'
      - 'us-central1-docker.pkg.dev/$PROJECT_ID/app-images/webapp:latest'
      - '--cache-from'
      - 'us-central1-docker.pkg.dev/$PROJECT_ID/app-images/webapp:latest'
      - '.'

  # Push to Artifact Registry
  - name: 'gcr.io/cloud-builders/docker'
    args:
      - 'push'
      - '--all-tags'
      - 'us-central1-docker.pkg.dev/$PROJECT_ID/app-images/webapp'

  # Deploy to Cloud Run without shifting traffic yet
  - name: 'gcr.io/cloud-builders/gcloud'
    args:
      - 'run'
      - 'deploy'
      - 'webapp'
      - '--image=us-central1-docker.pkg.dev/$PROJECT_ID/app-images/webapp:$SHORT_SHA'
      - '--region=us-central1'
      - '--platform=managed'
      - '--no-traffic'
    id: deploy-canary

  # Run smoke tests (the gcloud builder image includes curl)
  - name: 'gcr.io/cloud-builders/gcloud'
    entrypoint: 'bash'
    args:
      - '-c'
      - 'curl -sf "$(gcloud run services describe webapp --region=us-central1 --format="value(status.url)")/health"'
    waitFor: ['deploy-canary']

  # Shift traffic to the new revision
  - name: 'gcr.io/cloud-builders/gcloud'
    args:
      - 'run'
      - 'services'
      - 'update-traffic'
      - 'webapp'
      - '--region=us-central1'
      - '--to-latest'

options:
  machineType: 'E2_HIGHCPU_8'
  logging: CLOUD_LOGGING_ONLY

images:
  - 'us-central1-docker.pkg.dev/$PROJECT_ID/app-images/webapp:$SHORT_SHA'
# Submit a build manually
gcloud builds submit --config=cloudbuild.yaml .
# Set up a trigger for GitHub pushes
gcloud builds triggers create github \
--name=deploy-on-push \
--repo-owner=myorg \
--repo-name=webapp \
--branch-pattern="^main$" \
--build-config=cloudbuild.yaml \
--include-logs-with-status
Cloud Run
Cloud Run is GCP's serverless container platform. It takes a container image and runs it with automatic scaling, including scale-to-zero. It is simpler than Kubernetes and cheaper for many workloads. If your service does not need the complexity of GKE, Cloud Run should be your first choice.
Pricing
Cloud Run uses a pay-per-use model:
| Resource | Cost | Free Tier |
|---|---|---|
| CPU | $0.00002400 per vCPU-second | 180,000 vCPU-seconds/month |
| Memory | $0.00000250 per GiB-second | 360,000 GiB-seconds/month |
| Requests | $0.40 per million | 2 million requests/month |
A service handling 1 million requests/month with 100ms average response time at 1 vCPU and 512MB RAM costs approximately $1-3/month. Compare that to running an always-on VM or a dedicated cluster.
# Deploy a container to Cloud Run
gcloud run deploy webapp \
--image=us-central1-docker.pkg.dev/prod-web-app-2026/app-images/webapp:v1.2.3 \
--region=us-central1 \
--platform=managed \
--port=8080 \
--memory=512Mi \
--cpu=1 \
--min-instances=1 \
--max-instances=100 \
--concurrency=80 \
--timeout=60s \
--cpu-throttling \
--service-account=sa-web-backend@prod-web-app-2026.iam.gserviceaccount.com \
--set-env-vars="NODE_ENV=production,LOG_LEVEL=info" \
--set-secrets="DB_PASSWORD=db-password:latest" \
--vpc-connector=connector-production \
--vpc-egress=private-ranges-only \
--ingress=all \
--allow-unauthenticated
# Traffic splitting for canary deployments
gcloud run services update-traffic webapp \
--region=us-central1 \
--to-revisions=webapp-00005-abc=90,webapp-00006-def=10
# Map a custom domain
gcloud run domain-mappings create \
--service=webapp \
--domain=app.example.com \
--region=us-central1
Cloud Run vs Cloud Functions vs GKE
| Feature | Cloud Run | Cloud Functions | GKE |
|---|---|---|---|
| Unit of deployment | Container | Function code | Pods (containers) |
| Scaling | 0 to 1000 instances | 0 to 3000 instances | Node-level |
| Max request timeout | 60 min | 9 min (1st gen), 60 min (2nd gen HTTP) | Unlimited |
| Minimum instances | 0 (scale to zero) | 0 | 1 node |
| Concurrency | Up to 1000 requests/instance | 1 (1st gen), configurable (2nd gen) | Unlimited |
| Pricing | Per-second CPU/memory | Per-invocation + duration | Per-node (always-on) |
| Complexity | Low | Lowest | High |
| Best for | APIs, web apps | Event handlers, webhooks | Complex microservices, stateful |
Artifact Registry
Artifact Registry replaces Container Registry (gcr.io). It supports Docker images, npm, Maven, Python, Go, Apt, and Yum packages.
# Create a Docker repository
gcloud artifacts repositories create app-images \
--repository-format=docker \
--location=us-central1 \
--description="Production application images" \
--immutable-tags
# Create an npm repository
gcloud artifacts repositories create npm-packages \
--repository-format=npm \
--location=us-central1 \
--description="Internal npm packages"
# Configure Docker to authenticate
gcloud auth configure-docker us-central1-docker.pkg.dev
# Push an image
docker tag webapp:latest us-central1-docker.pkg.dev/prod-web-app-2026/app-images/webapp:v1.2.3
docker push us-central1-docker.pkg.dev/prod-web-app-2026/app-images/webapp:v1.2.3
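The long image path above is not arbitrary: Artifact Registry Docker URIs always follow the scheme `LOCATION-docker.pkg.dev/PROJECT_ID/REPOSITORY/IMAGE:TAG`. A small helper like the sketch below (the function name is my own, not part of any SDK) keeps CI pipelines from mistyping it:

```shell
#!/usr/bin/env bash
# Build an Artifact Registry image URI from its parts:
#   LOCATION-docker.pkg.dev/PROJECT_ID/REPOSITORY/IMAGE:TAG
set -euo pipefail

image_uri() {
  local location="$1" project="$2" repo="$3" image="$4" tag="$5"
  echo "${location}-docker.pkg.dev/${project}/${repo}/${image}:${tag}"
}

image_uri us-central1 prod-web-app-2026 app-images webapp v1.2.3
# -> us-central1-docker.pkg.dev/prod-web-app-2026/app-images/webapp:v1.2.3
```

In CI, the tag argument is typically a git SHA, e.g. `image_uri us-central1 "$PROJECT" app-images webapp "$(git rev-parse --short HEAD)"`.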
# View vulnerability scan results (scanning runs automatically once the Container Scanning API is enabled)
gcloud artifacts docker images list \
us-central1-docker.pkg.dev/prod-web-app-2026/app-images \
--show-occurrences \
--format="table(package,version,createTime)"
# Clean up old images
gcloud artifacts docker images delete \
us-central1-docker.pkg.dev/prod-web-app-2026/app-images/webapp:old-tag \
--delete-tags --quiet
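Deleting tags one at a time does not scale; most teams script a simple retention policy. The sketch below keeps the N newest tags and emits delete commands for the rest. It expects tags on stdin sorted newest-first (something like `gcloud artifacts docker images list ... --format="value(tags)"` with a sort on create time -- an assumption here, check your gcloud version); the repository path and KEEP count are illustrative.

```shell
#!/usr/bin/env bash
# Tag-based retention sketch: keep the KEEP newest tags, emit delete
# commands for everything older. Input: one tag per line, newest first.
set -euo pipefail

REPO="us-central1-docker.pkg.dev/prod-web-app-2026/app-images/webapp"
KEEP=5

prune_commands() {
  # Skip the first $KEEP lines; print a delete command for each older tag.
  tail -n +"$((KEEP + 1))" | while read -r tag; do
    echo gcloud artifacts docker images delete "${REPO}:${tag}" --delete-tags --quiet
  done
}

# Demo with synthetic tags: emits delete commands for v2 and v1 (the two oldest).
printf 'v7\nv6\nv5\nv4\nv3\nv2\nv1\n' | prune_commands
```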
Secret Manager
Secret Manager stores API keys, passwords, certificates, and other sensitive data with automatic versioning and IAM-based access control.
# Create a secret
echo -n "my-db-password" | gcloud secrets create db-password --data-file=-
# Access the latest version
gcloud secrets versions access latest --secret=db-password
# Add a new version
echo -n "new-password" | gcloud secrets versions add db-password --data-file=-
# Set up automatic rotation notification (requires an existing Pub/Sub topic)
gcloud secrets update db-password \
--add-topics=projects/prod-web-app-2026/topics/secret-rotation \
--next-rotation-time="2026-06-24T00:00:00Z" \
--rotation-period=7776000s
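The rotation period is specified in seconds; the 7776000s above is 90 days. A quick sketch of the arithmetic, plus one way to compute a next-rotation timestamp (GNU `date` assumed, which is an assumption of this sketch):

```shell
#!/usr/bin/env bash
# --rotation-period takes seconds: 90 days * 24 h * 60 min * 60 s = 7776000.
set -euo pipefail

days_to_seconds() {
  echo $(( $1 * 24 * 60 * 60 ))
}

days_to_seconds 90   # -> 7776000

# A --next-rotation-time value 90 days from now (GNU date):
date -u -d "+90 days" +%Y-%m-%dT00:00:00Z
```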
gcloud CLI Essentials
The gcloud CLI is well-structured and consistent. Master these patterns:
# Authentication
gcloud auth login # Interactive login
gcloud auth application-default login # For local development (SDK auth)
gcloud auth print-access-token # Get current access token
# Configuration and profiles
gcloud config set project prod-web-app-2026
gcloud config set compute/region us-central1
gcloud config set compute/zone us-central1-a
# Named configurations (like AWS profiles)
gcloud config configurations create production
gcloud config configurations activate production
gcloud config configurations list
# Common operations with filtering
gcloud compute instances list \
--filter="status=RUNNING AND labels.env=production" \
--format="table(name,zone,machineType.basename(),networkInterfaces[0].networkIP)"
gcloud container clusters list \
--format="table(name,location,currentMasterVersion,status,currentNodeCount)"
gcloud run services list --platform=managed \
--format="table(SERVICE,REGION,URL,LAST_DEPLOYED_BY)"
# Output formats
gcloud compute instances list --format="json" | jq '.[].name'
gcloud compute instances list --format="csv(name,zone,status)"
gcloud compute instances list --format="value(name)" # Just values, one per line
# Impersonate a service account (for testing permissions)
gcloud auth print-access-token \
--impersonate-service-account=sa-web-backend@prod-web-app-2026.iam.gserviceaccount.com
Cost Management
GCP provides several tools for cost visibility and optimization:
# Create a dataset to receive billing export data
# (the export itself is configured in the Cloud Billing console)
bq mk --dataset prod-web-app-2026:billing_export
# Use BigQuery to analyze costs
bq query --use_legacy_sql=false '
SELECT
service.description,
SUM(cost) as total_cost,
SUM(usage.amount) as total_usage,
usage.unit
FROM `prod-web-app-2026.billing_export.gcp_billing_export`
WHERE DATE(usage_start_time) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY service.description, usage.unit
ORDER BY total_cost DESC
LIMIT 20
'
Cost Optimization Strategies
- Sustained Use Discounts -- automatic, no action needed. Applies to N1, N2, and N2D instances (E2 is excluded; it has lower base prices instead).
- Committed Use Discounts -- up to 37% (1-year) or 55% (3-year) for predictable workloads.
- Spot VMs -- 60-91% savings for interruptible workloads.
- Autopilot for GKE -- pay only for pod resources, no idle node waste.
- Cloud Run scale-to-zero -- no cost when no traffic.
- Right-sizing recommendations -- Compute Engine Recommender suggests optimal machine types.
- Storage lifecycle policies -- automatically transition data to cheaper classes.
- Budget alerts -- set budgets with email and Pub/Sub notifications.
- Billing export to BigQuery -- query your billing data with SQL for deep analysis.
- Active Assist -- GCP's umbrella for all optimization recommendations.
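The committed-use rates in the list above lend themselves to quick back-of-envelope math. A sketch (the helper name and the $1,000/month figure are illustrative; shell arithmetic is integer-only, so results round down):

```shell
#!/usr/bin/env bash
# Back-of-envelope committed-use savings using the ~37% (1-yr) and ~55% (3-yr)
# discount rates listed above. Amounts are whole dollars per month.
set -euo pipefail

cud_monthly_cost() {
  local on_demand="$1" discount_pct="$2"
  echo $(( on_demand * (100 - discount_pct) / 100 ))
}

cud_monthly_cost 1000 37   # 1-year commitment -> 630
cud_monthly_cost 1000 55   # 3-year commitment -> 450
```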
Migration Considerations
When migrating to GCP from other clouds or on-premises:
- Migrate to Virtual Machines -- replicates VMs from on-premises (VMware, AWS, Azure) to Compute Engine.
- Database Migration Service -- supports MySQL, PostgreSQL, SQL Server, and Oracle to Cloud SQL or AlloyDB.
- Transfer Appliance -- physical device for large-scale data transfers (like AWS Snowball).
- Storage Transfer Service -- transfers data from S3, Azure Blob, or on-premises to Cloud Storage.
- BigQuery Data Transfer Service -- automates data movement from SaaS platforms into BigQuery.
- Anthos (now part of GKE Enterprise) -- run GKE clusters on-premises, on AWS, or on Azure with a consistent management plane.
GCP rewards engineers who invest in understanding its project model and IAM system. The developer experience is excellent -- services are well-integrated, the CLI is consistent, and the documentation is some of the best in the industry. If you are building on Kubernetes, need strong data analytics capabilities, or value automatic cost optimizations like sustained use discounts, GCP is a compelling choice that continues to close the gap with AWS in breadth while maintaining its lead in developer satisfaction.
Senior Kubernetes Architect
10+ years orchestrating containers in production. Battle-tested opinions on everything from pod scheduling to service mesh. I've seen clusters burn and helped rebuild them better.