GCP Core Services: The DevOps Engineer's Essential Guide
Google Cloud Platform has always been the engineer's cloud. Where AWS leads in breadth and Azure in enterprise integration, GCP wins on developer experience, data analytics, and Kubernetes -- fitting, since Kubernetes grew out of Google's internal Borg cluster manager. If your organization is Kubernetes-first or data-heavy, GCP deserves serious consideration. This guide covers every core service a DevOps engineer needs, with real CLI examples, pricing details, architecture patterns, and the operational nuances that documentation often glosses over.
GCP Project Structure
GCP organizes resources in a hierarchy that looks simple but has important implications for billing, IAM, and policy inheritance. Understanding this hierarchy is foundational to everything else.
The Resource Hierarchy
Organization (company.com)
|-- Folders (optional, for departments or environments)
| |-- Folder: Production
| | |-- Project: prod-web-app
| | +-- Project: prod-data-pipeline
| |-- Folder: Staging
| | +-- Project: staging-web-app
| +-- Folder: Shared
| +-- Project: shared-networking
+-- Projects (can also exist directly under org)
Projects Are Everything
A project is the fundamental unit in GCP. Every resource belongs to a project. Projects provide:
- Billing boundary -- each project links to a billing account.
- IAM boundary -- permissions are granted at the project level (or above/below). Permissions granted at the folder or org level cascade down.
- API enablement -- you explicitly enable APIs per project. This is a security feature: an unused API cannot be exploited.
- Resource namespace -- resource names are unique within a project.
- Quota management -- API quotas and resource limits are per-project.
# Create a new project
gcloud projects create prod-web-app-2026 \
--name="Production Web App" \
--folder=123456789 \
--labels=env=production,team=platform
# Set your active project
gcloud config set project prod-web-app-2026
# Enable required APIs (only enable what you need)
gcloud services enable \
compute.googleapis.com \
container.googleapis.com \
cloudbuild.googleapis.com \
artifactregistry.googleapis.com \
run.googleapis.com \
monitoring.googleapis.com \
logging.googleapis.com \
secretmanager.googleapis.com \
sqladmin.googleapis.com
# List enabled APIs
gcloud services list --enabled --format="table(config.name,config.title)"
Always enable only the APIs you need. Each enabled API is an attack surface and a potential cost vector. GCP's explicit API enablement model is more secure by default than AWS's approach where all services are available immediately.
Organization Policies
Organization policies are the GCP equivalent of AWS SCPs. They define constraints that cascade down the resource hierarchy.
# Restrict VM creation to specific regions
gcloud resource-manager org-policies set-policy \
--project=prod-web-app-2026 \
policy.yaml
# Where policy.yaml contains:
#   constraint: constraints/gcp.resourceLocations
#   listPolicy:
#     allowedValues:
#       - us-central1
#       - europe-west1
Cross-Cloud Project Model Comparison
| Concept | GCP | AWS | Azure | Alibaba Cloud |
|---|---|---|---|---|
| Top-level container | Organization | Organization | Entra ID Tenant | Resource Directory |
| Grouping mechanism | Folders | Organizational Units | Management Groups | Folders |
| Billing/isolation unit | Project | Account | Subscription | Account |
| Policy enforcement | Organization Policies | SCPs | Azure Policy | Control Policies |
| Resource grouping | Labels | Tags | Resource Groups | Resource Groups |
| API enablement | Explicit per-project | All available | Resource providers | Explicit activation |
IAM: Identity and Access Management
GCP IAM uses a resource hierarchy model where permissions flow downward. A role granted at the organization level applies to every folder, project, and resource underneath. This inheritance model is powerful but requires careful planning to avoid over-permissioning.
Role Types
| Type | Example | When to Use | Security Level |
|---|---|---|---|
| Basic | roles/viewer, roles/editor, roles/owner | Almost never in production -- too broad | Low |
| Predefined | roles/compute.instanceAdmin, roles/storage.objectViewer | Most common -- scoped to a service | Medium |
| Custom | roles/myCustomRole | When predefined roles grant too much | High |
Basic roles are legacy and should be avoided in production. The roles/editor role, for example, grants write access to almost every GCP service. Use predefined roles instead, and create custom roles when even predefined roles are too permissive.
IAM Policy Bindings
GCP IAM uses a binding model: you bind a member (who) to a role (what) at a scope (where). Multiple bindings form a policy.
# Grant a user viewer access to a project
gcloud projects add-iam-policy-binding prod-web-app-2026 \
--member="user:engineer@company.com" \
--role="roles/viewer"
# Grant a group contributor access to a specific bucket
gcloud storage buckets add-iam-policy-binding gs://prod-app-data \
--member="group:platform-team@company.com" \
--role="roles/storage.objectAdmin"
# Grant conditional access (only from corporate network)
gcloud projects add-iam-policy-binding prod-web-app-2026 \
--member="user:engineer@company.com" \
--role="roles/compute.instanceAdmin.v1" \
--condition='expression=request.time.getHours("America/New_York") >= 9 && request.time.getHours("America/New_York") <= 17,title=business-hours-only,description=Only allow access during business hours'
# List all IAM bindings for a project
gcloud projects get-iam-policy prod-web-app-2026 --format=yaml
Service Accounts
Service accounts are the identities for workloads -- VMs, Cloud Functions, CI/CD pipelines. They are the equivalent of AWS IAM roles. Each service account is identified by an email address.
# Create a service account
gcloud iam service-accounts create sa-web-backend \
--display-name="Web Backend Service Account" \
--project=prod-web-app-2026
# Grant the minimum required roles
gcloud projects add-iam-policy-binding prod-web-app-2026 \
--member="serviceAccount:sa-web-backend@prod-web-app-2026.iam.gserviceaccount.com" \
--role="roles/storage.objectViewer"
gcloud projects add-iam-policy-binding prod-web-app-2026 \
--member="serviceAccount:sa-web-backend@prod-web-app-2026.iam.gserviceaccount.com" \
--role="roles/cloudsql.client"
gcloud projects add-iam-policy-binding prod-web-app-2026 \
--member="serviceAccount:sa-web-backend@prod-web-app-2026.iam.gserviceaccount.com" \
--role="roles/secretmanager.secretAccessor"
Critical rule: Never download service account keys (JSON key files) unless absolutely necessary. Use Workload Identity for GKE, attached service accounts for Compute Engine, and Workload Identity Federation for external CI/CD systems. Every downloaded key is a credential that can be leaked.
Workload Identity Federation
For CI/CD pipelines running outside GCP (GitHub Actions, GitLab CI, Jenkins), use Workload Identity Federation instead of service account keys. This eliminates the need to manage and rotate JSON key files.
# Create a Workload Identity Pool
gcloud iam workload-identity-pools create github-pool \
--location="global" \
--display-name="GitHub Actions Pool" \
--description="Identity pool for GitHub Actions CI/CD"
# Create a provider for GitHub
gcloud iam workload-identity-pools providers create-oidc github-provider \
--location="global" \
--workload-identity-pool=github-pool \
--display-name="GitHub Provider" \
--attribute-mapping="google.subject=assertion.sub,attribute.repository=assertion.repository,attribute.actor=assertion.actor" \
--attribute-condition="assertion.repository_owner == 'myorg'" \
--issuer-uri="https://token.actions.githubusercontent.com"
# Allow the service account to be impersonated
gcloud iam service-accounts add-iam-policy-binding \
sa-web-backend@prod-web-app-2026.iam.gserviceaccount.com \
--role="roles/iam.workloadIdentityUser" \
--member="principalSet://iam.googleapis.com/projects/PROJECT_NUMBER/locations/global/workloadIdentityPools/github-pool/attribute.repository/myorg/myrepo"
This is a significantly better security model than downloading JSON keys and storing them as CI/CD secrets. The token exchange happens automatically, credentials are short-lived, and there are no static secrets to rotate.
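On the GitHub side, the workflow references the pool and provider created above. A sketch using the `google-github-actions/auth` action (PROJECT_NUMBER and repository names are placeholders; adapt to your setup):

```yaml
# Illustrative GitHub Actions job authenticating via Workload Identity Federation
jobs:
  deploy:
    runs-on: ubuntu-latest
    permissions:
      id-token: write   # required for the OIDC token exchange
      contents: read
    steps:
      - uses: actions/checkout@v4
      - uses: google-github-actions/auth@v2
        with:
          workload_identity_provider: projects/PROJECT_NUMBER/locations/global/workloadIdentityPools/github-pool/providers/github-provider
          service_account: sa-web-backend@prod-web-app-2026.iam.gserviceaccount.com
      - run: gcloud storage ls   # authenticated with short-lived credentials, no JSON key
```

The `id-token: write` permission is what lets GitHub mint the OIDC token that GCP exchanges for short-lived service account credentials.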
IAM Recommender
GCP provides an IAM Recommender that analyzes actual usage patterns and suggests tighter roles. This is one of the best tools for enforcing least privilege over time.
# List IAM recommendations for a project
gcloud recommender recommendations list \
--project=prod-web-app-2026 \
--location=global \
--recommender=google.iam.policy.Recommender \
--format="table(content.operationGroups[0].operations[0].pathFilters)"
Compute Engine
Compute Engine is GCP's VM service. It is fast, flexible, and has one feature that sets it apart from AWS and Azure: live migration. Google can move a running VM to another physical host for maintenance with only a brief, sub-second pause that most workloads never notice. This means fewer maintenance windows and higher effective availability.
Machine Types
| Family | Use Case | Example | On-Demand (us-central1) |
|---|---|---|---|
| e2 | Cost-optimized, general purpose | e2-medium (2 vCPU, 4 GB) | ~$0.034/hr |
| n2/n2d | Balanced, production workloads | n2-standard-4 (4 vCPU, 16 GB) | ~$0.194/hr |
| n4 | Latest gen, best price-performance | n4-standard-4 (4 vCPU, 16 GB) | ~$0.170/hr |
| c2/c2d | Compute optimized | c2-standard-8 (8 vCPU, 32 GB) | ~$0.334/hr |
| c3 | Latest compute optimized | c3-standard-8 (8 vCPU, 32 GB) | ~$0.320/hr |
| m2 | Memory optimized | m2-ultramem-208 (208 vCPU, 5.75 TB) | ~$42.18/hr |
| t2a | Arm-based (Ampere), cost savings | t2a-standard-4 (4 vCPU, 16 GB) | ~$0.153/hr |
| t2d | AMD-based, balanced | t2d-standard-4 (4 vCPU, 16 GB) | ~$0.156/hr |
| a2/g2 | GPU workloads | g2-standard-4 (4 vCPU, 16 GB, 1 GPU) | ~$0.73/hr |
Custom Machine Types
One of GCP's unique features is custom machine types. You specify exact vCPU and memory combinations, so you never overpay for resources you do not use.
# Create a VM with a custom machine type (4 vCPUs, 8 GB RAM)
gcloud compute instances create web-server-01 \
--zone=us-central1-a \
--machine-type=e2-custom-4-8192 \
--image-family=ubuntu-2204-lts \
--image-project=ubuntu-os-cloud \
--boot-disk-size=50GB \
--boot-disk-type=pd-ssd \
--network=vpc-production \
--subnet=subnet-app \
--service-account=sa-web-backend@prod-web-app-2026.iam.gserviceaccount.com \
--scopes=cloud-platform \
--tags=http-server,https-server \
--labels=env=production,team=platform \
--metadata-from-file=startup-script=bootstrap.sh \
--shielded-secure-boot \
--shielded-vtpm
Pricing Model: Sustained Use Discounts
GCP automatically applies sustained use discounts (SUDs) to eligible VMs that run for more than 25% of a month. No commitment is required -- the discount accrues automatically and grows the longer the instance runs, up to 30% off for an N1 instance running the full month. Note that E2 machines are excluded (their lower base price already reflects the savings) and some newer families receive smaller maximum discounts. This is unique to GCP; AWS and Azure require upfront commitments for equivalent savings.
| Usage Level | Incremental Discount (N1) |
|---|---|
| 0-25% of month | 0% (full price) |
| 25-50% | ~20% off that usage |
| 50-75% | ~40% off that usage |
| 75-100% | ~60% off that usage |
| Full month effective | ~30% net savings |
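The tiered schedule compounds into the ~30% net figure. A minimal sketch of the arithmetic, assuming N1-style incremental rates (other machine families use different schedules):

```python
# How sustained use discounts compound over a month: each quarter of the
# month's usage is billed at a progressively deeper discount.
def effective_monthly_rate(fraction_of_month_used: float) -> float:
    """Average fraction of list price paid for a VM running this share
    of the month (0 < fraction_of_month_used <= 1)."""
    tiers = [
        (0.25, 1.00),  # first quarter of the month: full price
        (0.50, 0.80),  # second quarter: 20% off
        (0.75, 0.60),  # third quarter: 40% off
        (1.00, 0.40),  # final quarter: 60% off
    ]
    paid, start = 0.0, 0.0
    for tier_end, price_multiplier in tiers:
        if fraction_of_month_used <= start:
            break
        usage_in_tier = min(fraction_of_month_used, tier_end) - start
        paid += usage_in_tier * price_multiplier
        start = tier_end
    return paid / fraction_of_month_used

print(round(effective_monthly_rate(1.0), 2))  # full month
print(round(effective_monthly_rate(0.5), 2))  # half month
```

A full-month instance pays 70% of list price (30% off), while a half-month instance pays 90%, which is why SUDs reward always-on workloads without penalizing bursty ones.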
Committed Use Discounts (CUDs)
For predictable workloads, CUDs offer 1-year (37% savings) or 3-year (55% savings) commitments. Unlike AWS Reserved Instances, GCP CUDs are applied at the project level and can cover any machine type within the committed resource class.
# Purchase a committed use discount
gcloud compute commitments create my-commitment \
--region=us-central1 \
--plan=twelve-month \
--resources=vcpu=100,memory=400GB \
--type=GENERAL_PURPOSE
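To see when a commitment pays off, compare committed spend with on-demand. A back-of-envelope sketch using the n2-standard-4 rate from the table above and the quoted 37% one-year discount (illustrative arithmetic, not a quote):

```python
# Break-even: hours/month an n2-standard-4 must run before a 1-year CUD
# beats paying on-demand (rates are the approximate figures quoted above).
ON_DEMAND_HOURLY = 0.194   # $/hr, n2-standard-4 in us-central1
CUD_DISCOUNT = 0.37        # 1-year commitment
HOURS_PER_MONTH = 730

committed_monthly = ON_DEMAND_HOURLY * (1 - CUD_DISCOUNT) * HOURS_PER_MONTH
break_even_hours = committed_monthly / ON_DEMAND_HOURLY  # paid whether used or not
print(round(break_even_hours))
```

A VM that runs more than roughly 63% of the month is cheaper under the commitment, which is why CUDs suit steady baseline capacity while Spot VMs cover bursts.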
Preemptible and Spot VMs
Spot VMs (formerly preemptible VMs) cost 60-91% less than regular VMs but can be terminated with 30 seconds' notice. Perfect for CI/CD build agents, batch processing, non-critical data processing, and GKE node pools for fault-tolerant workloads.
gcloud compute instances create build-agent-01 \
--machine-type=c2-standard-8 \
--provisioning-model=SPOT \
--instance-termination-action=DELETE \
--zone=us-central1-a \
--metadata=shutdown-script='#!/bin/bash
# Gracefully drain work before termination
curl -X POST http://localhost:8080/drain'
Managed Instance Groups (MIGs)
MIGs are the auto-scaling mechanism for Compute Engine, equivalent to AWS ASGs:
# Create an instance template
gcloud compute instance-templates create web-template-v2 \
--machine-type=n2-standard-2 \
--image-family=ubuntu-2204-lts \
--image-project=ubuntu-os-cloud \
--boot-disk-size=50GB \
--boot-disk-type=pd-ssd \
--service-account=sa-web-backend@prod-web-app-2026.iam.gserviceaccount.com \
--tags=http-server \
--metadata-from-file=startup-script=bootstrap.sh \
--network=vpc-production \
--subnet=subnet-app \
--region=us-central1
# Create a regional MIG with autoscaling
gcloud compute instance-groups managed create web-mig \
--template=web-template-v2 \
--size=2 \
--region=us-central1 \
--target-distribution-shape=EVEN \
--health-check=http-health-check \
--initial-delay=120
gcloud compute instance-groups managed set-autoscaling web-mig \
--region=us-central1 \
--min-num-replicas=2 \
--max-num-replicas=10 \
--target-cpu-utilization=0.70 \
--cool-down-period=120 \
--scale-in-control max-scaled-in-replicas=2,time-window=300
# Rolling update to a new template
gcloud compute instance-groups managed rolling-action start-update web-mig \
--version=template=web-template-v3 \
--region=us-central1 \
--max-surge=3 \
--max-unavailable=0
Persistent Disk Types
| Disk Type | Max IOPS (read) | Max Throughput | Use Case | Cost (per GB/mo) |
|---|---|---|---|---|
| pd-standard | 7,500 | 400 MB/s | Bulk storage, backups | ~$0.040 |
| pd-balanced | 80,000 | 1,200 MB/s | Most workloads | ~$0.100 |
| pd-ssd | 100,000 | 1,200 MB/s | Databases, latency-sensitive | ~$0.170 |
| pd-extreme | 120,000 | 2,400 MB/s | Top-tier databases | ~$0.125 + IOPS |
| Hyperdisk Balanced | 160,000 | 2,400 MB/s | Next-gen balanced | ~$0.060 + IOPS + throughput |
| Local SSD | 900,000 | 9,360 MB/s | Ephemeral high-perf | ~$0.080 |
VPC Networking
GCP VPCs are global by default -- a single VPC spans all regions. Subnets are regional. This is a fundamental architectural difference from AWS and Azure where VPCs/VNets are regional. A single GCP VPC can contain subnets in us-central1, europe-west1, and asia-east1, and they can all communicate privately without peering.
Network Architecture
VPC: vpc-production (global)
|-- subnet-web-us (10.0.1.0/24) -- us-central1
|-- subnet-app-us (10.0.2.0/24) -- us-central1
|-- subnet-data-us (10.0.3.0/24) -- us-central1
|-- subnet-web-eu (10.10.1.0/24) -- europe-west1
|-- subnet-app-eu (10.10.2.0/24) -- europe-west1
|-- subnet-gke-us (10.0.16.0/20) -- us-central1
| |-- Pod CIDR: 10.100.0.0/14 (secondary range)
| +-- Service CIDR: 10.200.0.0/20 (secondary range)
+-- subnet-gke-eu (10.10.16.0/20) -- europe-west1
Firewall Rules
GCP firewall rules are defined at the VPC level and targeted via network tags or service accounts, rather than attached to subnets or instances as with AWS security groups/NACLs and Azure NSGs. This model is more flexible -- the same rule can apply to VMs across different subnets and regions within the same VPC.
# Allow HTTP/HTTPS to instances tagged 'http-server'
gcloud compute firewall-rules create allow-http-https \
--network=vpc-production \
--direction=INGRESS \
--action=ALLOW \
--rules=tcp:80,tcp:443 \
--source-ranges=0.0.0.0/0 \
--target-tags=http-server \
--priority=1000 \
--description="Allow HTTP and HTTPS from internet to web servers"
# Allow internal communication between app and data tiers
gcloud compute firewall-rules create allow-app-to-data \
--network=vpc-production \
--direction=INGRESS \
--action=ALLOW \
--rules=tcp:5432,tcp:6379,tcp:3306 \
--source-tags=app-server \
--target-tags=data-server \
--priority=1000
# Service-account-based rules (more secure than tags)
gcloud compute firewall-rules create allow-backend-to-db \
--network=vpc-production \
--direction=INGRESS \
--action=ALLOW \
--rules=tcp:5432 \
--source-service-accounts=sa-web-backend@prod-web-app-2026.iam.gserviceaccount.com \
--target-service-accounts=sa-database@prod-web-app-2026.iam.gserviceaccount.com \
--priority=900
# Deny all other ingress (implicit, but making it explicit)
gcloud compute firewall-rules create deny-all-ingress \
--network=vpc-production \
--direction=INGRESS \
--action=DENY \
--rules=all \
--source-ranges=0.0.0.0/0 \
--priority=65534
# Enable firewall logging for troubleshooting
gcloud compute firewall-rules update allow-http-https \
--enable-logging \
--logging-metadata=INCLUDE_ALL_METADATA
Service-account-based firewall rules are more secure than tag-based rules because tags can be modified by anyone with compute.instances.setTags permission, while service accounts require iam.serviceAccounts.actAs permission.
Cloud NAT and Private Google Access
For private instances that need outbound internet access:
# Create a Cloud Router (required for Cloud NAT)
gcloud compute routers create router-production \
--region=us-central1 \
--network=vpc-production
# Create Cloud NAT
gcloud compute routers nats create nat-production \
--router=router-production \
--region=us-central1 \
--nat-all-subnet-ip-ranges \
--auto-allocate-nat-external-ips \
--min-ports-per-vm=256 \
--enable-logging
Enable Private Google Access on subnets to let VMs without external IPs reach Google APIs and services:
gcloud compute networks subnets update subnet-app-us \
--region=us-central1 \
--enable-private-ip-google-access
Shared VPC
For multi-project environments, Shared VPC lets you define the network in a host project and share subnets with service projects. This centralizes network management while allowing individual teams to manage their own resources.
# Enable shared VPC on the host project
gcloud compute shared-vpc enable shared-networking
# Associate a service project
gcloud compute shared-vpc associated-projects add prod-web-app-2026 \
--host-project=shared-networking
Load Balancing
GCP offers a comprehensive load balancing portfolio:
| Type | Scope | Layer | Use Case |
|---|---|---|---|
| External HTTP(S) LB | Global | L7 | Web apps, APIs, CDN integration |
| External TCP/UDP Network LB | Regional | L4 | Non-HTTP traffic, gaming |
| Internal HTTP(S) LB | Regional | L7 | Internal microservices |
| Internal TCP/UDP LB | Regional | L4 | Internal databases, gRPC |
| Cross-region Internal LB | Global | L7 | Multi-region internal services |
The Global HTTP(S) Load Balancer is one of GCP's strongest offerings. It provides a single anycast IP that routes traffic to the nearest healthy backend worldwide, with automatic SSL termination, Cloud CDN integration, and Cloud Armor (WAF/DDoS) built in.
Cloud Storage
Cloud Storage is GCP's object storage service, equivalent to S3. It uses a flat namespace with buckets and objects. Buckets can be regional, dual-region, or multi-region.
Storage Classes
| Class | Min Storage Duration | Use Case | Monthly Cost (per GB) |
|---|---|---|---|
| Standard | None | Frequently accessed | ~$0.020 (regional) |
| Nearline | 30 days | Monthly access | ~$0.010 |
| Coldline | 90 days | Quarterly access | ~$0.004 |
| Archive | 365 days | Annual access or less | ~$0.0012 |
GCP charges for early deletion: if you store data in Nearline and delete it before 30 days, you pay for the full 30 days. Plan your storage class based on actual access patterns.
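The early-deletion charge is easy to quantify with the per-GB rates from the table above (illustrative Python; real bills vary by region and add retrieval fees):

```python
# Nearline early-deletion sketch: deleting before the 30-day minimum
# still bills the full 30 days of storage.
NEARLINE_PER_GB_MONTH = 0.010   # approximate rate from the table
MIN_STORAGE_DAYS = 30

def nearline_cost(gb: float, days_stored: float) -> float:
    """Storage cost in dollars; early deletion is billed as 30 days."""
    billed_days = max(days_stored, MIN_STORAGE_DAYS)
    return gb * NEARLINE_PER_GB_MONTH * billed_days / 30

# 1 TiB deleted after 10 days costs the same as keeping it 30 days:
print(round(nearline_cost(1024, 10), 2))
print(round(nearline_cost(1024, 30), 2))
```

The same logic applies to Coldline (90 days) and Archive (365 days), with proportionally larger minimum charges.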
Bucket Location Types
| Location Type | Redundancy | Latency | Use Case |
|---|---|---|---|
| Regional | Single region, multiple zones | Lowest within region | Application data, compute co-location |
| Dual-region | Two specific regions | Low in both regions | DR between known regions |
| Multi-region | Three+ regions in a continent | Higher | Globally accessed content |
# Create a bucket with lifecycle rules
gcloud storage buckets create gs://prod-app-logs-2026 \
--location=us-central1 \
--default-storage-class=STANDARD \
--uniform-bucket-level-access \
--public-access-prevention=enforced \
--soft-delete-duration=7d
# Set lifecycle policy
cat > lifecycle.json << 'EOF'
{
  "rule": [
    {
      "action": { "type": "SetStorageClass", "storageClass": "NEARLINE" },
      "condition": { "age": 30, "matchesStorageClass": ["STANDARD"] }
    },
    {
      "action": { "type": "SetStorageClass", "storageClass": "COLDLINE" },
      "condition": { "age": 90, "matchesStorageClass": ["NEARLINE"] }
    },
    {
      "action": { "type": "SetStorageClass", "storageClass": "ARCHIVE" },
      "condition": { "age": 365, "matchesStorageClass": ["COLDLINE"] }
    },
    {
      "action": { "type": "Delete" },
      "condition": { "age": 2555 }
    }
  ]
}
EOF
gcloud storage buckets update gs://prod-app-logs-2026 \
--lifecycle-file=lifecycle.json
# Enable versioning for state files
gcloud storage buckets update gs://prod-terraform-state \
--versioning
# Set a retention policy for compliance (GCP's equivalent of Object Lock is Bucket Lock)
gcloud storage buckets update gs://prod-audit-logs \
--retention-period=365d
# Locking the retention policy (making it immutable) is a separate, irreversible step
Always enable uniform bucket-level access to simplify permissions. It prevents the confusing mix of IAM and ACLs that plagues older buckets. Enable public access prevention unless you explicitly need public access.
gsutil vs gcloud storage
Google is transitioning from gsutil to gcloud storage for Cloud Storage operations. Use gcloud storage for new work:
# Copy files (gcloud storage)
gcloud storage cp ./dist/* gs://prod-app-data/assets/ --recursive
# Sync a directory
gcloud storage rsync ./build/ gs://prod-app-data/static/ --recursive --delete-unmatched-destination-objects
# Copy a large file (gcloud storage parallelizes large uploads automatically)
gcloud storage cp large-backup.tar.gz gs://prod-backups/ --no-clobber
Cloud SQL
Cloud SQL is GCP's managed relational database service, supporting MySQL, PostgreSQL, and SQL Server. It handles patching, backups, replication, and failover.
# Create a PostgreSQL instance
gcloud sql instances create prod-postgres \
--database-version=POSTGRES_15 \
--tier=db-custom-4-16384 \
--region=us-central1 \
--availability-type=REGIONAL \
--storage-type=SSD \
--storage-size=100 \
--storage-auto-increase \
--backup-start-time=03:00 \
--enable-point-in-time-recovery \
--retained-backups-count=14 \
--maintenance-window-day=MON \
--maintenance-window-hour=4 \
--insights-config-query-insights-enabled \
--root-password="$(gcloud secrets versions access latest --secret=db-root-password)" \
--network=vpc-production \
--no-assign-ip \
--labels=env=production,team=platform
# Create a read replica
gcloud sql instances create prod-postgres-replica \
--master-instance-name=prod-postgres \
--database-version=POSTGRES_15 \
--tier=db-custom-4-16384 \
--region=us-central1
# Connect via Cloud SQL Auth Proxy (recommended for applications)
cloud-sql-proxy \
--auto-iam-authn \
prod-web-app-2026:us-central1:prod-postgres
Cloud SQL vs AlloyDB vs Cloud Spanner
| Feature | Cloud SQL | AlloyDB | Cloud Spanner |
|---|---|---|---|
| Engine | MySQL, PostgreSQL, SQL Server | PostgreSQL-compatible | Proprietary |
| Scaling | Vertical (up to 96 vCPU) | Vertical + read pools | Horizontal, global |
| Max storage | 64 TB | 128 TB | Unlimited |
| Global distribution | Read replicas | Regional | Multi-region, strong consistency |
| Price (entry) | ~$10/mo (db-f1-micro, shared-core) | ~$500/mo | ~$657/mo (1 node) |
| Best for | Standard RDBMS workloads | High-perf PostgreSQL | Global, financial-grade |
GKE: Google Kubernetes Engine
GKE is where GCP truly shines. As the birthplace of Kubernetes, Google offers the most mature managed Kubernetes service with features that other clouds are still catching up on.
Autopilot vs Standard
| Feature | Autopilot | Standard |
|---|---|---|
| Node management | Google manages | You manage |
| Pod-level billing | Yes | No (pay for nodes) |
| Node configuration | Limited | Full control |
| GPU support | Yes (with reservations) | Full |
| Security hardening | Automatic (workload isolation, CIS benchmarks) | Manual |
| Control plane cost | $0.10/cluster/hr (~$73/mo) | $0.10/cluster/hr (~$73/mo) |
| Resource efficiency | Optimized by Google | You optimize |
| Best for | Most workloads | Specialized needs, custom kernels |
Autopilot is the recommended choice for most teams. You define pods, GKE handles node provisioning, scaling, security hardening, and OS patching.
# Create an Autopilot cluster (recommended for most teams)
gcloud container clusters create-auto gke-production \
--region=us-central1 \
--release-channel=regular \
--network=vpc-production \
--subnetwork=subnet-gke-us \
--cluster-secondary-range-name=pod-range \
--services-secondary-range-name=service-range \
--enable-master-authorized-networks \
--master-authorized-networks=10.0.0.0/8 \
--workload-pool=prod-web-app-2026.svc.id.goog \
--enable-fleet
# Create a Standard cluster (when you need more control)
gcloud container clusters create gke-standard \
--region=us-central1 \
--num-nodes=1 \
--machine-type=n2-standard-4 \
--enable-autoscaling --min-nodes=1 --max-nodes=5 \
--network=vpc-production \
--subnetwork=subnet-gke-us \
--cluster-secondary-range-name=pod-range \
--services-secondary-range-name=service-range \
--enable-ip-alias \
--release-channel=regular \
--workload-pool=prod-web-app-2026.svc.id.goog \
--enable-dataplane-v2 \
--enable-shielded-nodes \
--enable-autorepair \
--enable-autoupgrade \
--maintenance-window-start "2026-01-01T04:00:00Z" \
--maintenance-window-end "2026-01-01T08:00:00Z" \
--maintenance-window-recurrence "FREQ=WEEKLY;BYDAY=SA,SU"
# Get credentials
gcloud container clusters get-credentials gke-production --region=us-central1
# Add a Spot node pool for batch workloads
gcloud container node-pools create spot-pool \
--cluster=gke-standard \
--region=us-central1 \
--machine-type=n2-standard-4 \
--spot \
--enable-autoscaling --min-nodes=0 --max-nodes=10 \
--node-taints=cloud.google.com/gke-spot=true:NoSchedule \
--node-labels=workload-type=batch
GKE Workload Identity
Workload Identity is the recommended way for pods to authenticate to GCP services. It maps Kubernetes service accounts to GCP service accounts, eliminating the need for service account keys.
# Create a Kubernetes service account
kubectl create serviceaccount app-backend --namespace=production
# Bind KSA to GSA
gcloud iam service-accounts add-iam-policy-binding \
sa-web-backend@prod-web-app-2026.iam.gserviceaccount.com \
--role=roles/iam.workloadIdentityUser \
--member="serviceAccount:prod-web-app-2026.svc.id.goog[production/app-backend]"
# Annotate the KSA
kubectl annotate serviceaccount app-backend \
--namespace=production \
iam.gke.io/gcp-service-account=sa-web-backend@prod-web-app-2026.iam.gserviceaccount.com
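With the binding and annotation in place, pods that run as the KSA receive GSA credentials from the metadata server automatically. An illustrative manifest (the image path and resource values are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-backend
  namespace: production
spec:
  serviceAccountName: app-backend   # the annotated KSA
  containers:
    - name: app
      image: us-central1-docker.pkg.dev/prod-web-app-2026/app-images/webapp:v1.2.3
      resources:                    # always set requests on every pod
        requests:
          cpu: 250m
          memory: 256Mi
        limits:
          memory: 512Mi
```

Any Google client library inside this pod picks up the GSA identity via Application Default Credentials, with no key file mounted anywhere.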
GKE Cost Optimization
- Use Autopilot to avoid paying for idle node capacity.
- Use Spot VMs for fault-tolerant workloads (60-91% cheaper).
- Enable Vertical Pod Autoscaler to right-size resource requests.
- Use node auto-provisioning to let GKE choose optimal machine types.
- Set resource requests and limits on every pod. Pods without requests waste capacity.
Cloud Build
Cloud Build is GCP's serverless CI/CD platform. It runs build steps as containers, making it highly flexible. You can use any Docker image as a build step, which means you can use any tool in your pipeline.
Pricing
Cloud Build includes 120 free build-minutes per day on the default machine type; beyond the free tier, you pay per build-minute based on the machine type:
| Machine Type | vCPUs | RAM | Cost per minute |
|---|---|---|---|
| e2-medium | 1 | 4 GB | $0.003 |
| e2-highcpu-8 | 8 | 8 GB | $0.016 |
| e2-highcpu-32 | 32 | 32 GB | $0.064 |
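As a rough sizing exercise, here is the monthly cost of a busy pipeline on the E2_HIGHCPU_8 tier, assuming an illustrative workload of 40 builds per day at 8 minutes each (the free tier applies only to the default machine type, so none of this is offset):

```python
# Monthly Cloud Build estimate for a hypothetical team's pipeline
RATE_PER_MINUTE = 0.016   # e2-highcpu-8, from the table above
BUILDS_PER_DAY = 40
MINUTES_PER_BUILD = 8
DAYS_PER_MONTH = 30

monthly_cost = BUILDS_PER_DAY * MINUTES_PER_BUILD * RATE_PER_MINUTE * DAYS_PER_MONTH
print(round(monthly_cost, 2))
```

Around $150/month for a team shipping constantly is cheap compared to maintaining dedicated build servers, which is the core appeal of serverless CI.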
# cloudbuild.yaml with multi-stage pipeline
steps:
  # Run tests
  - name: 'node:20'
    entrypoint: 'bash'
    args: ['-c', 'npm ci && npm run lint && npm test']

  # Build container image
  - name: 'gcr.io/cloud-builders/docker'
    args:
      - 'build'
      - '-t'
      - 'us-central1-docker.pkg.dev/$PROJECT_ID/app-images/webapp:$SHORT_SHA'
      - '-t'
      - 'us-central1-docker.pkg.dev/$PROJECT_ID/app-images/webapp:latest'
      - '--cache-from'
      - 'us-central1-docker.pkg.dev/$PROJECT_ID/app-images/webapp:latest'
      - '.'

  # Push to Artifact Registry
  - name: 'gcr.io/cloud-builders/docker'
    args:
      - 'push'
      - '--all-tags'
      - 'us-central1-docker.pkg.dev/$PROJECT_ID/app-images/webapp'

  # Deploy to Cloud Run without shifting traffic yet
  - name: 'gcr.io/cloud-builders/gcloud'
    args:
      - 'run'
      - 'deploy'
      - 'webapp'
      - '--image=us-central1-docker.pkg.dev/$PROJECT_ID/app-images/webapp:$SHORT_SHA'
      - '--region=us-central1'
      - '--platform=managed'
      - '--no-traffic'
    id: deploy-canary

  # Run smoke tests (the gcloud builder image includes curl)
  - name: 'gcr.io/cloud-builders/gcloud'
    entrypoint: 'bash'
    args:
      - '-c'
      - 'curl -sf "$(gcloud run services describe webapp --region=us-central1 --format="value(status.url)")/health"'
    waitFor: ['deploy-canary']

  # Shift traffic to the new revision
  - name: 'gcr.io/cloud-builders/gcloud'
    args:
      - 'run'
      - 'services'
      - 'update-traffic'
      - 'webapp'
      - '--region=us-central1'
      - '--to-latest'

options:
  machineType: 'E2_HIGHCPU_8'
  logging: CLOUD_LOGGING_ONLY

images:
  - 'us-central1-docker.pkg.dev/$PROJECT_ID/app-images/webapp:$SHORT_SHA'
# Submit a build manually
gcloud builds submit --config=cloudbuild.yaml .
# Set up a trigger for GitHub pushes
gcloud builds triggers create github \
--name=deploy-on-push \
--repo-owner=myorg \
--repo-name=webapp \
--branch-pattern="^main$" \
--build-config=cloudbuild.yaml \
--include-logs-with-status
Cloud Run
Cloud Run is GCP's serverless container platform. It takes a container image and runs it with automatic scaling, including scale-to-zero. It is simpler than Kubernetes and cheaper for many workloads. If your service does not need the complexity of GKE, Cloud Run should be your first choice.
Pricing
Cloud Run uses a pay-per-use model:
| Resource | Cost | Free Tier |
|---|---|---|
| CPU | $0.00002400 per vCPU-second | 180,000 vCPU-seconds/month |
| Memory | $0.00000250 per GiB-second | 360,000 GiB-seconds/month |
| Requests | $0.40 per million | 2 million requests/month |
A service handling 1 million requests/month with 100ms average response time at 1 vCPU and 512MB RAM costs approximately $1-3/month. Compare that to running an always-on VM or a dedicated cluster.
# Deploy a container to Cloud Run
gcloud run deploy webapp \
--image=us-central1-docker.pkg.dev/prod-web-app-2026/app-images/webapp:v1.2.3 \
--region=us-central1 \
--platform=managed \
--port=8080 \
--memory=512Mi \
--cpu=1 \
--min-instances=1 \
--max-instances=100 \
--concurrency=80 \
--timeout=60s \
--cpu-throttling \
--service-account=sa-web-backend@prod-web-app-2026.iam.gserviceaccount.com \
--set-env-vars="NODE_ENV=production,LOG_LEVEL=info" \
--set-secrets="DB_PASSWORD=db-password:latest" \
--vpc-connector=connector-production \
--vpc-egress=private-ranges-only \
--ingress=all \
--allow-unauthenticated
# Traffic splitting for canary deployments
gcloud run services update-traffic webapp \
--region=us-central1 \
--to-revisions=webapp-00005-abc=90,webapp-00006-def=10
# Map a custom domain
gcloud run domain-mappings create \
--service=webapp \
--domain=app.example.com \
--region=us-central1
Cloud Run vs Cloud Functions vs GKE
| Feature | Cloud Run | Cloud Functions | GKE |
|---|---|---|---|
| Unit of deployment | Container | Function code | Pods (containers) |
| Scaling | 0 to 1000 instances | 0 to 3000 instances | Node-level |
| Max request timeout | 60 min | 9 min (1st gen), 60 min (2nd gen HTTP) | Unlimited |
| Minimum instances | 0 (scale to zero) | 0 | 1 node |
| Concurrency | Up to 1000 requests/instance | 1 (1st gen), configurable (2nd gen) | Unlimited |
| Pricing | Per-second CPU/memory | Per-invocation + duration | Per-node (always-on) |
| Complexity | Low | Lowest | High |
| Best for | APIs, web apps | Event handlers, webhooks | Complex microservices, stateful |
Artifact Registry
Artifact Registry replaces Container Registry (gcr.io). It supports Docker images, npm, Maven, Python, Go, Apt, and Yum packages.
# Create a Docker repository
gcloud artifacts repositories create app-images \
--repository-format=docker \
--location=us-central1 \
--description="Production application images" \
--immutable-tags
# Create an npm repository
gcloud artifacts repositories create npm-packages \
--repository-format=npm \
--location=us-central1 \
--description="Internal npm packages"
# Configure Docker to authenticate
gcloud auth configure-docker us-central1-docker.pkg.dev
# Push an image
docker tag webapp:latest us-central1-docker.pkg.dev/prod-web-app-2026/app-images/webapp:v1.2.3
docker push us-central1-docker.pkg.dev/prod-web-app-2026/app-images/webapp:v1.2.3
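The long image path above is not arbitrary: Artifact Registry Docker URIs always follow the scheme `LOCATION-docker.pkg.dev/PROJECT_ID/REPOSITORY/IMAGE:TAG`. A small helper like the sketch below (the function name is my own, not part of any SDK) keeps CI pipelines from mistyping it:

```shell
#!/usr/bin/env bash
# Build an Artifact Registry image URI from its parts:
#   LOCATION-docker.pkg.dev/PROJECT_ID/REPOSITORY/IMAGE:TAG
set -euo pipefail

image_uri() {
  local location="$1" project="$2" repo="$3" image="$4" tag="$5"
  echo "${location}-docker.pkg.dev/${project}/${repo}/${image}:${tag}"
}

image_uri us-central1 prod-web-app-2026 app-images webapp v1.2.3
# -> us-central1-docker.pkg.dev/prod-web-app-2026/app-images/webapp:v1.2.3
```

In CI, the tag argument is typically a git SHA, e.g. `image_uri us-central1 "$PROJECT" app-images webapp "$(git rev-parse --short HEAD)"`.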
# View vulnerability scan results (scanning runs automatically once the Container Scanning API is enabled)
gcloud artifacts docker images list \
us-central1-docker.pkg.dev/prod-web-app-2026/app-images \
--show-occurrences \
--format="table(package,version,createTime)"
# Clean up old images
gcloud artifacts docker images delete \
us-central1-docker.pkg.dev/prod-web-app-2026/app-images/webapp:old-tag \
--delete-tags --quiet
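Deleting tags one at a time does not scale; most teams script a simple retention policy. The sketch below keeps the N newest tags and emits delete commands for the rest. It expects tags on stdin sorted newest-first (something like `gcloud artifacts docker images list ... --format="value(tags)"` with a sort on create time -- an assumption here, check your gcloud version); the repository path and KEEP count are illustrative.

```shell
#!/usr/bin/env bash
# Tag-based retention sketch: keep the KEEP newest tags, emit delete
# commands for everything older. Input: one tag per line, newest first.
set -euo pipefail

REPO="us-central1-docker.pkg.dev/prod-web-app-2026/app-images/webapp"
KEEP=5

prune_commands() {
  # Skip the first $KEEP lines; print a delete command for each older tag.
  tail -n +"$((KEEP + 1))" | while read -r tag; do
    echo gcloud artifacts docker images delete "${REPO}:${tag}" --delete-tags --quiet
  done
}

# Demo with synthetic tags: emits delete commands for v2 and v1 (the two oldest).
printf 'v7\nv6\nv5\nv4\nv3\nv2\nv1\n' | prune_commands
```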
Secret Manager
Secret Manager stores API keys, passwords, certificates, and other sensitive data with automatic versioning and IAM-based access control.
# Create a secret
echo -n "my-db-password" | gcloud secrets create db-password --data-file=-
# Access the latest version
gcloud secrets versions access latest --secret=db-password
# Add a new version
echo -n "new-password" | gcloud secrets versions add db-password --data-file=-
# Set up automatic rotation notification (requires an existing Pub/Sub topic)
gcloud secrets update db-password \
--add-topics=projects/prod-web-app-2026/topics/secret-rotation \
--next-rotation-time="2026-06-24T00:00:00Z" \
--rotation-period=7776000s
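The rotation period is specified in seconds; the 7776000s above is 90 days. A quick sketch of the arithmetic, plus one way to compute a next-rotation timestamp (GNU `date` assumed, which is an assumption of this sketch):

```shell
#!/usr/bin/env bash
# --rotation-period takes seconds: 90 days * 24 h * 60 min * 60 s = 7776000.
set -euo pipefail

days_to_seconds() {
  echo $(( $1 * 24 * 60 * 60 ))
}

days_to_seconds 90   # -> 7776000

# A --next-rotation-time value 90 days from now (GNU date):
date -u -d "+90 days" +%Y-%m-%dT00:00:00Z
```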
gcloud CLI Essentials
The gcloud CLI is well-structured and consistent. Master these patterns:
# Authentication
gcloud auth login # Interactive login
gcloud auth application-default login # For local development (SDK auth)
gcloud auth print-access-token # Get current access token
# Configuration and profiles
gcloud config set project prod-web-app-2026
gcloud config set compute/region us-central1
gcloud config set compute/zone us-central1-a
# Named configurations (like AWS profiles)
gcloud config configurations create production
gcloud config configurations activate production
gcloud config configurations list
# Common operations with filtering
gcloud compute instances list \
--filter="status=RUNNING AND labels.env=production" \
--format="table(name,zone,machineType.basename(),networkInterfaces[0].networkIP)"
gcloud container clusters list \
--format="table(name,location,currentMasterVersion,status,currentNodeCount)"
gcloud run services list --platform=managed \
--format="table(SERVICE,REGION,URL,LAST_DEPLOYED_BY)"
# Output formats
gcloud compute instances list --format="json" | jq '.[].name'
gcloud compute instances list --format="csv(name,zone,status)"
gcloud compute instances list --format="value(name)" # Just values, one per line
# Impersonate a service account (for testing permissions)
gcloud auth print-access-token \
--impersonate-service-account=sa-web-backend@prod-web-app-2026.iam.gserviceaccount.com
Cost Management
GCP provides several tools for cost visibility and optimization:
# Create a dataset to receive billing export data
# (the export itself is configured in the Cloud Billing console)
bq mk --dataset prod-web-app-2026:billing_export
# Use BigQuery to analyze costs
bq query --use_legacy_sql=false '
SELECT
service.description,
SUM(cost) as total_cost,
SUM(usage.amount) as total_usage,
usage.unit
FROM `prod-web-app-2026.billing_export.gcp_billing_export`
WHERE DATE(usage_start_time) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY service.description, usage.unit
ORDER BY total_cost DESC
LIMIT 20
'
Cost Optimization Strategies
- Sustained Use Discounts -- automatic, no action needed. Applies to N1, N2, and N2D instances (E2 is excluded; it has lower base prices instead).
- Committed Use Discounts -- up to 37% (1-year) or 55% (3-year) for predictable workloads.
- Spot VMs -- 60-91% savings for interruptible workloads.
- Autopilot for GKE -- pay only for pod resources, no idle node waste.
- Cloud Run scale-to-zero -- no cost when no traffic.
- Right-sizing recommendations -- Compute Engine Recommender suggests optimal machine types.
- Storage lifecycle policies -- automatically transition data to cheaper classes.
- Budget alerts -- set budgets with email and Pub/Sub notifications.
- Billing export to BigQuery -- query your billing data with SQL for deep analysis.
- Active Assist -- GCP's umbrella for all optimization recommendations.
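The committed-use rates in the list above lend themselves to quick back-of-envelope math. A sketch (the helper name and the $1,000/month figure are illustrative; shell arithmetic is integer-only, so results round down):

```shell
#!/usr/bin/env bash
# Back-of-envelope committed-use savings using the ~37% (1-yr) and ~55% (3-yr)
# discount rates listed above. Amounts are whole dollars per month.
set -euo pipefail

cud_monthly_cost() {
  local on_demand="$1" discount_pct="$2"
  echo $(( on_demand * (100 - discount_pct) / 100 ))
}

cud_monthly_cost 1000 37   # 1-year commitment -> 630
cud_monthly_cost 1000 55   # 3-year commitment -> 450
```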
Migration Considerations
When migrating to GCP from other clouds or on-premises:
- Migrate to Virtual Machines -- replicates VMs from on-premises (VMware, AWS, Azure) to Compute Engine.
- Database Migration Service -- supports MySQL, PostgreSQL, SQL Server, and Oracle to Cloud SQL or AlloyDB.
- Transfer Appliance -- physical device for large-scale data transfers (like AWS Snowball).
- Storage Transfer Service -- transfers data from S3, Azure Blob, or on-premises to Cloud Storage.
- BigQuery Data Transfer Service -- automates data movement from SaaS platforms into BigQuery.
- Anthos (now part of GKE Enterprise) -- run GKE clusters on-premises, on AWS, or on Azure with a consistent management plane.
GCP rewards engineers who invest in understanding its project model and IAM system. The developer experience is excellent -- services are well-integrated, the CLI is consistent, and the documentation is some of the best in the industry. If you are building on Kubernetes, need strong data analytics capabilities, or value automatic cost optimizations like sustained use discounts, GCP is a compelling choice that continues to close the gap with AWS in breadth while maintaining its lead in developer satisfaction.
Senior Kubernetes Architect
10+ years orchestrating containers in production. Battle-tested opinions on everything from pod scheduling to service mesh. I've seen clusters burn and helped rebuild them better.