Alibaba Cloud for DevOps: ECS, ACK, and the China Cloud Ecosystem
If your infrastructure serves users in mainland China, Alibaba Cloud is not optional -- it is the default choice. China's internet regulations, data residency laws, and the Great Firewall create an environment where AWS, Azure, and GCP either cannot operate fully or deliver subpar performance. Alibaba Cloud is the largest cloud provider in Asia-Pacific and one of the largest globally. Understanding it makes you a more complete DevOps engineer, and for companies expanding into Asian markets, it is a requirement, not a nice-to-have.
Why Alibaba Cloud Matters
The China Factor
Running workloads in China is fundamentally different from running them anywhere else. The regulatory and technical landscape creates challenges that only a China-native cloud provider can fully address:
- ICP Filing Requirement -- to host a website on servers in mainland China, you need an Internet Content Provider (ICP) filing registered with the Chinese government; commercial services additionally need an ICP commercial license. This is a legal requirement with no exceptions. Alibaba Cloud helps facilitate the process through its console, and it typically takes 2-4 weeks. Without a valid ICP number, mainland hosting providers and CDNs will not serve your domain.
- Data Residency -- China's Cybersecurity Law (2017), Data Security Law (2021), and Personal Information Protection Law (PIPL, 2021) require certain data to remain within Chinese borders. Cross-border data transfers require security assessments. Alibaba Cloud has multiple regions within mainland China (Beijing, Shanghai, Shenzhen, Hangzhou, Zhangjiakou, Hohhot, and more).
- The Great Firewall -- connections to services outside China (including AWS, GCP, Azure global regions) are unreliable and slow. DNS resolution, API calls, package downloads (npm, pip, Docker Hub), and even Git operations all suffer. Response times of 500ms-2000ms to services outside China are common. Within Alibaba Cloud's China regions, latency is typically 1-10ms.
- Content Delivery -- CDNs must have Points of Presence (PoPs) within China to serve Chinese users effectively. International CDNs like CloudFront or Cloudflare perform poorly in China without a separate China configuration.
- Payment Processing -- Alipay and WeChat Pay are the dominant payment methods. Alibaba Cloud integrates natively with these systems.
Beyond China
Alibaba Cloud also competes well in Southeast Asia, the Middle East, and other markets where it has invested heavily in regional infrastructure:
| Region | Locations | Strengths |
|---|---|---|
| China | Beijing, Shanghai, Shenzhen, Hangzhou, Zhangjiakou, Hohhot, Chengdu, Heyuan, Wulanchabu, Nanjing, Fuzhou | Most China regions of any provider |
| Asia-Pacific | Singapore, Jakarta, Mumbai, Hong Kong, Tokyo, Sydney, Kuala Lumpur, Manila | Strong presence, competitive pricing |
| Middle East | Dubai, Riyadh | Growing market, government partnerships |
| Europe | Frankfurt, London | EU data residency compliance |
| Americas | Silicon Valley, Virginia | For China-outbound traffic |
Alibaba Cloud Account Structure
Alibaba Cloud uses a Resource Directory for multi-account governance, similar to AWS Organizations:
Resource Directory
|-- Root Folder
|   |-- Folder: Production
|   |   |-- Account: prod-web (China regions)
|   |   +-- Account: prod-intl (International regions)
|   |-- Folder: Staging
|   |   +-- Account: staging-all
|   +-- Folder: Shared
|       +-- Account: shared-services
An important distinction: the China site (aliyun.com) and the International site (alibabacloud.com) are separate platforms with separate accounts, billing, and support. An account created on one site cannot log in to the other. Resources in mainland China regions also require real-name verification, and public websites hosted there need an ICP filing. Teams that serve users both in China and globally often end up operating an account on each site.
ECS: Elastic Compute Service
ECS is Alibaba Cloud's VM service, directly comparable to AWS EC2 or Azure VMs. The service is mature, well-documented, and follows familiar patterns if you have experience with other clouds.
Instance Families
| Family | Use Case | Example | Specs | Approx Cost (cn-shanghai) |
|---|---|---|---|---|
| ecs.t6 | Burstable, dev/test | ecs.t6-c1m2.large | 2 vCPU, 4 GB | ~CNY 0.15/hr |
| ecs.g7 | General purpose | ecs.g7.xlarge | 4 vCPU, 16 GB | ~CNY 0.90/hr |
| ecs.g8i | Latest gen Intel | ecs.g8i.xlarge | 4 vCPU, 16 GB | ~CNY 0.85/hr |
| ecs.c7 | Compute optimized | ecs.c7.2xlarge | 8 vCPU, 16 GB | ~CNY 1.20/hr |
| ecs.r7 | Memory optimized | ecs.r7.xlarge | 4 vCPU, 32 GB | ~CNY 1.10/hr |
| ecs.gn7i | GPU (inference) | ecs.gn7i-c8g1.2xlarge | 8 vCPU, 30 GB, 1 GPU | ~CNY 8.50/hr |
| ecs.ebm | Bare metal | ecs.ebmg7.32xlarge | 128 vCPU, 512 GB | ~CNY 22.00/hr |
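The naming convention follows the pattern ecs.[family][generation].[size]: in ecs.g7.xlarge, g is the family, 7 is the generation, and xlarge is the size. The g prefix means general purpose, c is compute optimized, and r is memory optimized, similar to AWS conventions. Some names carry extra parts, such as the i in ecs.g8i (Intel) or the -c1m2 block in ecs.t6-c1m2.large (a 1:2 vCPU-to-memory ratio). The generation number matters -- newer generations usually offer better price-performance, so prefer the latest one available in your zone.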
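As a rough illustration of the naming pattern, the sketch below splits an instance type string into its parts. The regular expression and the field names (`family`, `generation`, `variant`, `spec`, `size`) are assumptions made for this example, not an official Alibaba Cloud grammar; the pattern handles the names in the table above but may not cover every instance type.

```python
import re
from typing import NamedTuple, Optional


class InstanceType(NamedTuple):
    family: str          # e.g. "g", "c", "r", "gn", "ebmg"
    generation: int      # e.g. 7
    variant: str         # optional suffix such as "i" in g8i ("" if absent)
    spec: Optional[str]  # optional block after "-" such as "c1m2"
    size: str            # e.g. "xlarge", "2xlarge"


# Hypothetical pattern derived only from the example names in the table.
_PATTERN = re.compile(
    r"^ecs\.(?P<family>[a-z]+)(?P<generation>\d+)(?P<variant>[a-z]*)"
    r"(?:-(?P<spec>[a-z0-9]+))?\.(?P<size>[a-z0-9]+)$"
)


def parse_instance_type(name: str) -> InstanceType:
    """Split an ECS instance type name into family, generation, and size."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized instance type: {name!r}")
    return InstanceType(
        family=match["family"],
        generation=int(match["generation"]),
        variant=match["variant"],
        spec=match["spec"],
        size=match["size"],
    )


if __name__ == "__main__":
    for name in ("ecs.g7.xlarge", "ecs.g8i.xlarge", "ecs.t6-c1m2.large"):
        print(name, "->", parse_instance_type(name))
```

A parser like this is mainly useful for guardrails, for example rejecting older generations in a CI check before Terraform runs.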
Pricing Options
| Option | Savings | Commitment | Best For |
|---|---|---|---|
| Pay-As-You-Go | 0% (baseline) | None | Unpredictable workloads |
| Subscription | 15-60% | 1 month to 3 years | Predictable production workloads |
| Preemptible Instances | Up to 90% | None (can be reclaimed) | Batch processing, CI/CD |
| Reserved Instances | Up to 55% | 1 or 3 years | Flexible commitment |
| Savings Plans | Up to 57% | 1 or 3 years | Cross-instance flexibility |
Subscription pricing has no direct equivalent on AWS and is common in the China market. You prepay for a fixed period (monthly or yearly) and receive significant discounts. Most production workloads in China use subscription pricing.
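To make the trade-off concrete, here is a minimal sketch that compares pay-as-you-go with a prepaid subscription. The 730-hour month and the function names are assumptions for illustration; real discounts depend on instance type, region, and term, so treat the numbers as rough guides.

```python
HOURS_PER_MONTH = 730  # average hours in a month (8760 / 12), an approximation


def pay_as_you_go_monthly(hourly_rate: float, hours_used: float = HOURS_PER_MONTH) -> float:
    """Monthly cost when billed only for the hours the instance exists."""
    return hourly_rate * hours_used


def subscription_monthly(hourly_rate: float, discount: float) -> float:
    """Effective monthly cost when prepaying, given a fractional discount (0.30 = 30%)."""
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    return hourly_rate * HOURS_PER_MONTH * (1 - discount)


def break_even_hours(discount: float) -> float:
    """Hours per month above which the subscription is the cheaper option."""
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    return HOURS_PER_MONTH * (1 - discount)


if __name__ == "__main__":
    rate = 0.90      # ~CNY/hr for ecs.g7.xlarge, taken from the table above
    discount = 0.30  # assumed discount within the 15-60% range
    print(f"pay-as-you-go: CNY {pay_as_you_go_monthly(rate):.2f}/month")
    print(f"subscription:  CNY {subscription_monthly(rate, discount):.2f}/month")
    print(f"break-even:    {break_even_hours(discount):.0f} hours/month")
```

With the ~CNY 0.90/hr rate and an assumed 30% discount, a full month costs about CNY 657 on pay-as-you-go versus about CNY 460 prepaid, and the break-even sits near 511 hours (roughly 70% utilization). Below that usage, pay-as-you-go stays cheaper.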
Creating and Managing ECS Instances
Alibaba Cloud provides the aliyun CLI (also called Alibaba Cloud CLI):
# Configure the CLI
aliyun configure set \
--profile production \
--mode AK \
--region cn-shanghai \
--access-key-id LTAI5tXXXXXXXXXXXX \
--access-key-secret XXXXXXXXXXXXXXXXXXXXXXXX
# Create an ECS instance
aliyun ecs CreateInstance \
--RegionId cn-shanghai \
--ZoneId cn-shanghai-b \
--InstanceType ecs.g7.xlarge \
--ImageId ubuntu_22_04_x64_20G_alibase_20230907.vhd \
--SecurityGroupId sg-bp1abc123def456 \
--VSwitchId vsw-bp1abc123 \
--InstanceName web-server-01 \
--HostName web-server-01 \
--InternetMaxBandwidthOut 10 \
--SystemDiskCategory cloud_essd \
--SystemDiskSize 50 \
--KeyPairName my-key-pair \
--Tag.1.Key Environment \
--Tag.1.Value production \
--Tag.2.Key Team \
--Tag.2.Value platform
# Start the instance
aliyun ecs StartInstance --InstanceId i-bp1abc123def456
# List running instances
aliyun ecs DescribeInstances \
--RegionId cn-shanghai \
--Status Running \
--output cols=InstanceId,InstanceName,Status,PublicIpAddress rows=Instances.Instance[]
# Stop an instance
aliyun ecs StopInstance --InstanceId i-bp1abc123def456
# Describe instance details
aliyun ecs DescribeInstanceAttribute --InstanceId i-bp1abc123def456
Disk Types
| Disk Type | Max IOPS | Max Throughput | Use Case | Cost (per GB/mo, China) |
|---|---|---|---|---|
| ESSD PL0 | 10,000 | 180 MB/s | Dev/test | ~CNY 0.50 |
| ESSD PL1 | 50,000 | 350 MB/s | Most production workloads | ~CNY 1.00 |
| ESSD PL2 | 100,000 | 750 MB/s | Database workloads | ~CNY 2.00 |
| ESSD PL3 | 1,000,000 | 4,000 MB/s | High-performance databases | ~CNY 4.00 |
| Cloud SSD | 25,000 | 300 MB/s | Standard SSD | ~CNY 1.00 |
| Cloud Efficiency | 5,000 | 140 MB/s | Bulk storage | ~CNY 0.35 |
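ESSD (Enhanced SSD) uses NVMe technology backed by RDMA networking. For production workloads, ESSD PL1 is the standard choice. The performance levels are a notable advantage -- you can move a disk to a higher level to raise its IOPS and throughput ceiling without growing its capacity. Keep in mind that the figures above are per-disk maximums: the IOPS a disk actually delivers also depends on its size, and the higher levels have minimum disk sizes.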
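The table can be turned into a simple selection rule. The sketch below picks the cheapest disk type whose ceilings cover a required IOPS and throughput. It uses only the approximate figures from the table; the `DiskType` structure and `cheapest_disk` helper are illustrative names, and the check ignores the capacity-dependent limits mentioned above.

```python
from typing import NamedTuple, Optional


class DiskType(NamedTuple):
    name: str
    max_iops: int
    max_throughput_mbps: int
    cny_per_gb_month: float


# Approximate per-disk ceilings and prices copied from the table above.
DISK_TYPES = [
    DiskType("ESSD PL0", 10_000, 180, 0.50),
    DiskType("ESSD PL1", 50_000, 350, 1.00),
    DiskType("ESSD PL2", 100_000, 750, 2.00),
    DiskType("ESSD PL3", 1_000_000, 4_000, 4.00),
    DiskType("Cloud SSD", 25_000, 300, 1.00),
    DiskType("Cloud Efficiency", 5_000, 140, 0.35),
]


def cheapest_disk(required_iops: int, required_throughput_mbps: int) -> Optional[DiskType]:
    """Return the lowest-priced disk type that meets both requirements, or None."""
    candidates = [
        disk
        for disk in DISK_TYPES
        if disk.max_iops >= required_iops
        and disk.max_throughput_mbps >= required_throughput_mbps
    ]
    # On a price tie, min() keeps the first match in DISK_TYPES order.
    return min(candidates, key=lambda disk: disk.cny_per_gb_month, default=None)


if __name__ == "__main__":
    choice = cheapest_disk(required_iops=40_000, required_throughput_mbps=300)
    print(choice.name if choice else "no single disk type is sufficient")
```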
Terraform Support
Most DevOps teams will manage Alibaba Cloud resources through Terraform, which has a mature Alibaba Cloud provider with excellent coverage:
terraform {
required_providers {
alicloud = {
source = "aliyun/alicloud"
version = "~> 1.220"
}
}
backend "oss" {
bucket = "terraform-state-prod"
prefix = "web-app"
region = "cn-shanghai"
encrypt = true
}
}
provider "alicloud" {
region = "cn-shanghai"
}
resource "alicloud_instance" "web_server" {
instance_name = "web-server-01"
instance_type = "ecs.g7.xlarge"
image_id = "ubuntu_22_04_x64_20G_alibase_20230907.vhd"
security_groups = [alicloud_security_group.web.id]
vswitch_id = alicloud_vswitch.app_a.id
system_disk_category = "cloud_essd"
system_disk_size = 50
internet_max_bandwidth_out = 10
key_name = alicloud_key_pair.deployer.key_name
user_data = base64encode(file("${path.module}/scripts/bootstrap.sh"))
tags = {
Environment = "production"
Team = "platform"
ManagedBy = "terraform"
}
}
# Auto Scaling Group
resource "alicloud_ess_scaling_group" "web" {
scaling_group_name = "web-scaling-group"
min_size = 2
max_size = 10
desired_capacity = 3
vswitch_ids = [alicloud_vswitch.app_a.id, alicloud_vswitch.app_b.id]
removal_policies = ["OldestScalingConfiguration", "OldestInstance"]
multi_az_policy = "BALANCE"
lifecycle {
ignore_changes = [desired_capacity]
}
}
resource "alicloud_ess_scaling_rule" "scale_out" {
scaling_group_id = alicloud_ess_scaling_group.web.id
scaling_rule_name = "scale-out-cpu"
scaling_rule_type = "TargetTrackingScalingRule"
target_value = 70.0
metric_name = "CpuUtilization"
}
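A target tracking rule keeps a metric near its target by resizing the group. The sketch below uses the common proportional estimate (the same shape as the Kubernetes HPA formula) to show how a 70% CPU target translates into a desired capacity. This is an assumption for illustration: Auto Scaling's actual algorithm, cooldowns, and warm-up handling are not described in this article and may differ.

```python
import math


def desired_capacity(
    current_instances: int,
    current_metric: float,
    target_metric: float,
    min_size: int,
    max_size: int,
) -> int:
    """Proportional estimate of the instance count needed to reach the target metric."""
    if current_instances <= 0 or target_metric <= 0:
        raise ValueError("current_instances and target_metric must be positive")
    raw = math.ceil(current_instances * current_metric / target_metric)
    # Clamp to the scaling group's configured bounds.
    return max(min_size, min(max_size, raw))


if __name__ == "__main__":
    # Three instances averaging 90% CPU against the 70% target from the rule above,
    # with the group's min_size of 2 and max_size of 10.
    print(desired_capacity(3, 90.0, 70.0, min_size=2, max_size=10))
```

The clamp explains why `ignore_changes = [desired_capacity]` appears in the scaling group: the scaling rule, not Terraform, owns that value at runtime.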
ACK: Container Service for Kubernetes
ACK (Alibaba Cloud Container Service for Kubernetes) is the managed Kubernetes offering. It is fully CNCF-certified and comes in three flavors, each suited to different operational models.
ACK Variants
| Variant | Control Plane | Worker Nodes | Best For | Cost |
|---|---|---|---|---|
| ACK Managed | Alibaba manages | You manage (ECS) | Standard production use | Free control plane + ECS nodes |
| ACK Pro | Alibaba manages (enhanced SLA, etcd backup) | You manage (ECS) | Large-scale, mission-critical | ~CNY 3,600/yr + ECS nodes |
| ACK Serverless | Alibaba manages | Elastic Container Instances | Variable workloads, no node ops | Per-pod pricing |
ACK Pro includes features that matter for production: managed etcd with automatic backups, enhanced monitoring, Sandboxed-Container support for stronger isolation, and 99.95% SLA on the control plane.
ACK Networking
ACK supports two CNI plugins:
- Flannel -- simple overlay network. Pods get IPs from a separate CIDR. Lower performance, simpler setup. Good for small clusters.
- Terway -- Alibaba Cloud's advanced CNI. Pods get real VPC IP addresses (like AWS VPC CNI). Supports network policies natively. Better performance and security. Recommended for production.
# Create a managed Kubernetes cluster via CLI
# The ACK API is REST-style, so the request is sent as a JSON body.
# With Terway, pods take their IPs from pod VSwitches rather than a container CIDR;
# vsw-bp1pod456 is a placeholder for a dedicated pod VSwitch.
aliyun cs POST /clusters \
--header "Content-Type=application/json" \
--body '{
  "cluster_type": "ManagedKubernetes",
  "name": "ack-production",
  "region_id": "cn-shanghai",
  "vpcid": "vpc-bp1abc123",
  "vswitch_ids": ["vsw-bp1abc123"],
  "pod_vswitch_ids": ["vsw-bp1pod456"],
  "service_cidr": "172.21.0.0/20",
  "num_of_nodes": 3,
  "worker_instance_types": ["ecs.g7.xlarge"],
  "worker_system_disk_category": "cloud_essd",
  "worker_system_disk_size": 120,
  "key_pair": "my-key-pair",
  "snat_entry": true,
  "addons": [
    {"name": "terway-eniip"},
    {"name": "csi-plugin"},
    {"name": "csi-provisioner"},
    {"name": "nginx-ingress-controller", "config": "{\"IngressSlbNetworkType\":\"intranet\"}"}
  ]
}'
With Terraform:
resource "alicloud_cs_managed_kubernetes" "production" {
name = "ack-production"
cluster_spec = "ack.pro.small"
version = "1.28.9-aliyun.1"
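pod_vswitch_ids = [alicloud_vswitch.k8s.id] # Terway takes pod IPs from VSwitches (pod_cidr applies to Flannel only); assumes vsw-k8s is defined as alicloud_vswitch.k8s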
service_cidr = "172.21.0.0/20"
slb_internet_enabled = false
worker_vswitch_ids = [
alicloud_vswitch.app_a.id,
alicloud_vswitch.app_b.id,
]
dynamic "addons" {
for_each = [
{ name = "terway-eniip", config = "" },
{ name = "csi-plugin", config = "" },
{ name = "csi-provisioner", config = "" },
{ name = "nginx-ingress-controller", config = jsonencode({ IngressSlbNetworkType = "intranet" }) },
{ name = "arms-prometheus", config = "" }
]
content {
name = addons.value.name
config = addons.value.config
}
}
maintenance_window {
enable = true
maintenance_time = "04:00:00Z"
duration = "4h"
weekly_period = "Saturday"
}
}
resource "alicloud_cs_kubernetes_node_pool" "workers" {
cluster_id = alicloud_cs_managed_kubernetes.production.id
name = "worker-pool"
vswitch_ids = [alicloud_vswitch.app_a.id, alicloud_vswitch.app_b.id]
instance_types = ["ecs.g7.xlarge"]
system_disk_category = "cloud_essd"
system_disk_size = 120
desired_size = 3
key_name = alicloud_key_pair.deployer.key_name
scaling_config {
min_size = 2
max_size = 10
}
labels {
key = "workload-type"
value = "production"
}
taints {
key = "dedicated"
value = "production"
effect = "NoSchedule"
}
management {
auto_repair = true
auto_upgrade = true
max_unavailable = 1
}
}
# Spot node pool for batch workloads
resource "alicloud_cs_kubernetes_node_pool" "spot_workers" {
cluster_id = alicloud_cs_managed_kubernetes.production.id
name = "spot-pool"
vswitch_ids = [alicloud_vswitch.app_a.id]
instance_types = ["ecs.g7.xlarge", "ecs.g7.2xlarge"]
desired_size = 0
spot_strategy = "SpotWithPriceLimit"
spot_price_limit {
instance_type = "ecs.g7.xlarge"
price_limit = "0.5"
}
scaling_config {
min_size = 0
max_size = 20
}
labels {
key = "workload-type"
value = "batch"
}
taints {
key = "spot"
value = "true"
effect = "NoSchedule"
}
}
Kubernetes Cross-Cloud Comparison
| Feature | ACK (Alibaba) | EKS (AWS) | AKS (Azure) | GKE (GCP) |
|---|---|---|---|---|
| Control plane cost | Free (Managed) / ~CNY 3,600/yr (Pro) | ~$73/mo | Free tier / ~$73/mo (Standard tier) | ~$73/mo (free-tier credit covers one zonal or Autopilot cluster) |
| Pod networking | Terway (VPC IPs) or Flannel | VPC CNI | Azure CNI or Kubenet | GKE VPC-native |
| Serverless pods | ECI | Fargate | ACI | Autopilot |
| Max nodes | 5,000 | 5,000 | 5,000 | 15,000 |
| China regions | 10+ | 2 (Beijing via Sinnet, Ningxia via NWCD) | Multiple (operated by 21Vianet) | 0 |
| Container runtime | containerd | containerd | containerd | containerd |
| Sandboxed containers | Yes (runV) | No (native) | No (native) | GKE Sandbox (gVisor) |
OSS: Object Storage Service
OSS is Alibaba Cloud's object storage, equivalent to AWS S3. It supports the same concepts: buckets, objects, storage classes, lifecycle policies, and cross-region replication. OSS also provides an S3-compatible API, making migrations from AWS easier.
Storage Classes
| Class | Use Case | Minimum Storage Duration | Monthly Cost (per GB, cn-shanghai) | Retrieval Fee |
|---|---|---|---|---|
| Standard | Frequent access | None | ~CNY 0.12 | None |
| Infrequent Access | Monthly access | 30 days | ~CNY 0.08 | CNY 0.0325/GB |
| Archive | Quarterly access | 60 days | ~CNY 0.033 | CNY 0.06/GB (1 min to restore) |
| Cold Archive | Rare access | 180 days | ~CNY 0.015 | CNY 0.10/GB (1-5 hours to restore) |
| Deep Cold Archive | Extremely rare | 180 days | ~CNY 0.0075 | CNY 0.14/GB (12 hours to restore) |
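The retrieval fee is what decides whether a colder class actually saves money. The sketch below computes, from the approximate prices in the table, how many times per month each gigabyte can be read before Infrequent Access costs more than Standard. It deliberately ignores request fees and minimum storage duration charges, and the function names are illustrative.

```python
STANDARD_PER_GB = 0.12        # ~CNY per GB-month, from the table above
IA_PER_GB = 0.08              # ~CNY per GB-month, from the table above
IA_RETRIEVAL_PER_GB = 0.0325  # CNY per GB retrieved, from the table above


def monthly_cost_per_gb(storage_price: float, retrieval_fee: float, reads_per_month: float) -> float:
    """Storage plus retrieval cost for one GB that is fully read N times a month."""
    return storage_price + retrieval_fee * reads_per_month


def break_even_reads(standard_price: float, colder_price: float, retrieval_fee: float) -> float:
    """Full reads per month at which the colder class stops being cheaper."""
    if retrieval_fee <= 0:
        raise ValueError("retrieval_fee must be positive")
    return (standard_price - colder_price) / retrieval_fee


if __name__ == "__main__":
    reads = break_even_reads(STANDARD_PER_GB, IA_PER_GB, IA_RETRIEVAL_PER_GB)
    print(f"IA is cheaper below {reads:.2f} full reads per GB per month")
```

With these prices the break-even is about 1.23 reads per month, which matches the "monthly access" guidance for Infrequent Access in the table.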
OSS Operations
# Create a bucket
aliyun oss mb oss://prod-app-data --region cn-shanghai --storage-class Standard
# Upload files
aliyun oss cp ./dist/ oss://prod-app-data/assets/ --recursive
# Sync a directory (like aws s3 sync)
aliyun oss sync ./build/ oss://prod-app-data/static/ --delete --include '*.js' --include '*.css'
# Download files
aliyun oss cp oss://prod-app-data/config/app.yaml ./config/
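# Set lifecycle rules (the lifecycle command reads the rules from a local XML file)
cat > lifecycle.xml <<'EOF'
<LifecycleConfiguration>
  <Rule>
    <ID>ArchiveOldLogs</ID>
    <Prefix>logs/</Prefix>
    <Status>Enabled</Status>
    <Transition><Days>30</Days><StorageClass>IA</StorageClass></Transition>
    <Transition><Days>90</Days><StorageClass>Archive</StorageClass></Transition>
    <Transition><Days>365</Days><StorageClass>ColdArchive</StorageClass></Transition>
    <Expiration><Days>1095</Days></Expiration>
  </Rule>
</LifecycleConfiguration>
EOF
aliyun oss lifecycle --method put oss://prod-app-logs lifecycle.xml
# Enable versioning (on the bucket that holds the Terraform state)
aliyun oss bucket-versioning --method put oss://terraform-state-prod enabled
# Configure cross-region replication for DR (also read from a local XML file)
cat > replication.xml <<'EOF'
<ReplicationConfiguration>
  <Rule>
    <Action>ALL</Action>
    <Destination>
      <Bucket>prod-app-data-dr</Bucket>
      <Location>oss-cn-beijing</Location>
    </Destination>
  </Rule>
</ReplicationConfiguration>
EOF
aliyun oss replication --method put oss://prod-app-data replication.xml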
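To see what the lifecycle rule above means for a single log object, the sketch below maps an object's age to the storage class the rule would place it in. The thresholds come from the rule (30, 90, and 365 days, expiring at 1095); the helper name and the treatment of boundary days are assumptions, since OSS applies lifecycle rules on its own schedule rather than at an exact instant.

```python
from typing import Optional

# (minimum age in days, storage class), taken from the lifecycle rule above.
TRANSITIONS = [(30, "IA"), (90, "Archive"), (365, "ColdArchive")]
EXPIRATION_DAYS = 1095  # three years


def storage_class_at(age_days: int) -> Optional[str]:
    """Return the expected storage class at a given age, or None once expired."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    if age_days >= EXPIRATION_DAYS:
        return None  # the object has been deleted by the Expiration action
    current = "Standard"
    for threshold, storage_class in TRANSITIONS:
        if age_days >= threshold:
            current = storage_class
    return current


if __name__ == "__main__":
    for age in (0, 45, 120, 400, 1200):
        print(f"day {age}: {storage_class_at(age) or 'expired'}")
```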
With Terraform:
resource "alicloud_oss_bucket" "app_data" {
bucket = "prod-app-data"
acl = "private"
server_side_encryption_rule {
sse_algorithm = "AES256"
}
lifecycle_rule {
id = "archive-old-data"
enabled = true
prefix = "logs/"
transitions {
days = 30
storage_class = "IA"
}
transitions {
days = 90
storage_class = "Archive"
}
expiration {
days = 1095
}
}
versioning {
status = "Enabled"
}
cors_rule {
allowed_origins = ["https://app.example.com"]
allowed_methods = ["GET", "HEAD"]
allowed_headers = ["*"]
max_age_seconds = 3600
}
tags = {
Environment = "production"
Team = "platform"
}
}
S3 Compatibility
OSS provides an S3-compatible endpoint, which means tools like aws s3, boto3, and other S3 SDKs can work with OSS by changing the endpoint:
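# Use AWS CLI with OSS (S3-compatible endpoint)
# OSS accepts only virtual-hosted-style requests, and the credentials must be an Alibaba Cloud AccessKey pair
aws configure set default.s3.addressing_style virtual
aws s3 ls s3://prod-app-data \
--endpoint-url https://oss-cn-shanghai.aliyuncs.com
aws s3 cp ./file.txt s3://prod-app-data/uploads/ \
--endpoint-url https://oss-cn-shanghai.aliyuncs.com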
This compatibility simplifies migrations and allows teams to use familiar tools.
VPC and Networking
Alibaba Cloud VPCs follow the same regional model as AWS. A VPC contains VSwitches (their term for subnets), and security groups control traffic. Each VSwitch maps to a single availability zone.
Network Architecture
VPC: 10.0.0.0/8 (vpc-production, cn-shanghai)
|-- VSwitch: vsw-web-a (10.0.1.0/24) -- cn-shanghai-a
|-- VSwitch: vsw-web-b (10.0.2.0/24) -- cn-shanghai-b
|-- VSwitch: vsw-app-a (10.0.11.0/24) -- cn-shanghai-a
|-- VSwitch: vsw-app-b (10.0.12.0/24) -- cn-shanghai-b
|-- VSwitch: vsw-data-a (10.0.21.0/24) -- cn-shanghai-a
|-- VSwitch: vsw-data-b (10.0.22.0/24) -- cn-shanghai-b
+-- VSwitch: vsw-k8s (10.0.32.0/20) -- cn-shanghai-b
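An address plan like this is easy to get wrong as it grows. The following sketch uses Python's standard `ipaddress` module to check that every VSwitch CIDR sits inside the VPC and that no two VSwitches overlap. The `validate_plan` helper is a hypothetical name for illustration; the CIDRs are the ones from the diagram above.

```python
import ipaddress
from itertools import combinations
from typing import Dict, List

VPC_CIDR = "10.0.0.0/8"
VSWITCHES = {
    "vsw-web-a": "10.0.1.0/24",
    "vsw-web-b": "10.0.2.0/24",
    "vsw-app-a": "10.0.11.0/24",
    "vsw-app-b": "10.0.12.0/24",
    "vsw-data-a": "10.0.21.0/24",
    "vsw-data-b": "10.0.22.0/24",
    "vsw-k8s": "10.0.32.0/20",
}


def validate_plan(vpc_cidr: str, vswitches: Dict[str, str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the plan is consistent."""
    vpc = ipaddress.ip_network(vpc_cidr)
    networks = {name: ipaddress.ip_network(cidr) for name, cidr in vswitches.items()}
    problems = []
    for name, network in networks.items():
        if not network.subnet_of(vpc):
            problems.append(f"{name} ({network}) is outside the VPC range {vpc}")
    for (name_a, net_a), (name_b, net_b) in combinations(networks.items(), 2):
        if net_a.overlaps(net_b):
            problems.append(f"{name_a} ({net_a}) overlaps {name_b} ({net_b})")
    return problems


if __name__ == "__main__":
    issues = validate_plan(VPC_CIDR, VSWITCHES)
    print("plan is consistent" if not issues else "\n".join(issues))
```

Running a check like this in CI before `terraform plan` catches overlapping ranges earlier than the provider's API error would.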
VPC and Security Groups with Terraform
resource "alicloud_vpc" "production" {
vpc_name = "vpc-production"
cidr_block = "10.0.0.0/8"
tags = {
Environment = "production"
}
}
resource "alicloud_vswitch" "app_a" {
vswitch_name = "vsw-app-a"
vpc_id = alicloud_vpc.production.id
cidr_block = "10.0.11.0/24"
zone_id = "cn-shanghai-a"
}
resource "alicloud_vswitch" "app_b" {
vswitch_name = "vsw-app-b"
vpc_id = alicloud_vpc.production.id
cidr_block = "10.0.12.0/24"
zone_id = "cn-shanghai-b"
}
resource "alicloud_security_group" "web" {
name = "sg-web-servers"
vpc_id = alicloud_vpc.production.id
description = "Security group for web-facing servers"
}
resource "alicloud_security_group_rule" "allow_https" {
type = "ingress"
ip_protocol = "tcp"
port_range = "443/443"
cidr_ip = "0.0.0.0/0"
security_group_id = alicloud_security_group.web.id
description = "Allow HTTPS from internet"
}
resource "alicloud_security_group_rule" "allow_http" {
type = "ingress"
ip_protocol = "tcp"
port_range = "80/80"
cidr_ip = "0.0.0.0/0"
security_group_id = alicloud_security_group.web.id
description = "Allow HTTP from internet (redirect to HTTPS)"
}
resource "alicloud_security_group_rule" "allow_internal" {
type = "ingress"
ip_protocol = "tcp"
port_range = "1/65535"
cidr_ip = "10.0.0.0/8"
security_group_id = alicloud_security_group.web.id
description = "Allow internal VPC TCP traffic"
}
# NAT Gateway for private subnet internet access
resource "alicloud_nat_gateway" "production" {
vpc_id = alicloud_vpc.production.id
nat_gateway_name = "nat-production"
payment_type = "PayAsYouGo"
vswitch_id = alicloud_vswitch.app_a.id
nat_type = "Enhanced"
}
resource "alicloud_eip_address" "nat" {
address_name = "eip-nat-production"
bandwidth = 200
payment_type = "PayAsYouGo"
}
resource "alicloud_eip_association" "nat" {
allocation_id = alicloud_eip_address.nat.id
instance_id = alicloud_nat_gateway.production.id
instance_type = "Nat"
}
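resource "alicloud_snat_entry" "app" {
snat_table_id = alicloud_nat_gateway.production.snat_table_ids
source_vswitch_id = alicloud_vswitch.app_a.id
snat_ip = alicloud_eip_address.nat.ip_address
# The EIP must be bound to the NAT gateway before it can be used in an SNAT entry
depends_on = [alicloud_eip_association.nat]
}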
VPC Peering and CEN
For multi-VPC and multi-region connectivity, Alibaba Cloud offers Cloud Enterprise Network (CEN), equivalent to AWS Transit Gateway. CEN provides a global network mesh that connects VPCs across regions with automatic route distribution.
resource "alicloud_cen_instance" "global_network" {
cen_instance_name = "cen-global"
description = "Global network for all production VPCs"
}
resource "alicloud_cen_instance_attachment" "shanghai" {
instance_id = alicloud_cen_instance.global_network.id
child_instance_id = alicloud_vpc.production.id
child_instance_type = "VPC"
child_instance_region_id = "cn-shanghai"
}
resource "alicloud_cen_instance_attachment" "beijing" {
instance_id = alicloud_cen_instance.global_network.id
child_instance_id = alicloud_vpc.production_beijing.id
child_instance_type = "VPC"
child_instance_region_id = "cn-beijing"
}
Networking Comparison
| Feature | Alibaba Cloud | AWS | Azure | GCP |
|---|---|---|---|---|
| Virtual network | VPC (regional) | VPC (regional) | VNet (regional) | VPC (global) |
| Subnet | VSwitch | Subnet | Subnet | Subnet (regional) |
| Firewall | Security Groups | Security Groups + NACLs | NSGs | Firewall Rules |
| NAT | NAT Gateway | NAT Gateway | NAT Gateway | Cloud NAT |
| Global transit | CEN | Transit Gateway | Virtual WAN | VPC (native global) |
| Private link | PrivateLink | PrivateLink | Private Endpoint | Private Service Connect |
| DDoS | Anti-DDoS Basic/Pro | AWS Shield | Azure DDoS Protection | Cloud Armor |
| DNS | Alibaba Cloud DNS | Route 53 | Azure DNS | Cloud DNS |
SLB: Server Load Balancer
SLB is Alibaba Cloud's load balancing service. The product line has evolved to include multiple options:
| Product | Layer | Scope | Use Case |
|---|---|---|---|
| CLB (Classic LB) | L4/L7 | Regional | Legacy, basic load balancing |
| ALB (Application LB) | L7 | Regional | Advanced HTTP routing, WAF integration |
| NLB (Network LB) | L4 | Regional | High-performance TCP/UDP |
| GA (Global Accelerator) | L4/L7 | Global | Cross-region acceleration, anycast |
For new deployments, use ALB for HTTP/HTTPS traffic and NLB for TCP/UDP. CLB is being superseded.
resource "alicloud_alb_load_balancer" "web" {
vpc_id = alicloud_vpc.production.id
address_type = "Internet"
address_allocated_mode = "Dynamic"
load_balancer_name = "alb-web-production"
load_balancer_edition = "Standard"
load_balancer_billing_config {
pay_type = "PayAsYouGo"
}
zone_mappings {
vswitch_id = alicloud_vswitch.app_a.id
zone_id = "cn-shanghai-a"
}
zone_mappings {
vswitch_id = alicloud_vswitch.app_b.id
zone_id = "cn-shanghai-b"
}
tags = {
Environment = "production"
}
}
resource "alicloud_alb_server_group" "web" {
server_group_name = "sg-web-production"
vpc_id = alicloud_vpc.production.id
protocol = "HTTPS"
health_check_config {
health_check_enabled = true
health_check_path = "/health"
health_check_codes = ["http_2xx", "http_3xx"]
health_check_interval = 5
healthy_threshold = 3
unhealthy_threshold = 3
}
sticky_session_config {
sticky_session_enabled = true
sticky_session_type = "Insert"
cookie_timeout = 3600
}
}
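The health check settings above determine how quickly a failing backend is taken out of rotation. As a back-of-the-envelope sketch, detection time is roughly the check interval multiplied by the threshold; this estimate is an assumption that ignores per-check timeouts and propagation delay, so real values will be somewhat higher.

```python
def seconds_to_mark_unhealthy(interval_seconds: int, unhealthy_threshold: int) -> int:
    """Rough time for a failing backend to accumulate enough consecutive failed checks."""
    if interval_seconds <= 0 or unhealthy_threshold <= 0:
        raise ValueError("interval and threshold must be positive")
    return interval_seconds * unhealthy_threshold


def seconds_to_mark_healthy(interval_seconds: int, healthy_threshold: int) -> int:
    """Rough time for a recovered backend to accumulate enough consecutive passed checks."""
    if interval_seconds <= 0 or healthy_threshold <= 0:
        raise ValueError("interval and threshold must be positive")
    return interval_seconds * healthy_threshold


if __name__ == "__main__":
    # Values from the server group above: 5-second interval, thresholds of 3.
    print("unhealthy after ~", seconds_to_mark_unhealthy(5, 3), "seconds")
    print("healthy after ~", seconds_to_mark_healthy(5, 3), "seconds")
```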
RAM: Resource Access Management
RAM is Alibaba Cloud's identity and access management service, equivalent to AWS IAM. The policy language is similar but simpler.
Key Concepts
| Concept | Alibaba Cloud RAM | AWS IAM Equivalent |
|---|---|---|
| User | RAM User | IAM User |
| Group | RAM Group | IAM Group |
| Role | RAM Role | IAM Role |
| Policy | RAM Policy | IAM Policy |
| Instance Role | Instance RAM Role | Instance Profile |
| STS | STS (Security Token Service) | STS |
| SSO | IDaaS / SAML 2.0 | IAM Identity Center |
# Create a RAM user for CI/CD
aliyun ram CreateUser --UserName cicd-deployer --DisplayName "CI/CD Deployer"
# Create an access key for the user
aliyun ram CreateAccessKey --UserName cicd-deployer
# Create a custom policy (narrow "Resource" to specific ARNs for real least privilege)
aliyun ram CreatePolicy \
--PolicyName CustomDeployPolicy \
--PolicyDocument '{
"Version": "1",
"Statement": [
{
"Effect": "Allow",
"Action": [
"ecs:DescribeInstances",
"ecs:StartInstance",
"ecs:StopInstance",
"oss:GetObject",
"oss:PutObject",
"oss:ListObjects",
"cs:GetClusterById",
"cs:DescribeClusterUserKubeconfig",
"cr:GetRepository",
"cr:PushRepository"
],
"Resource": "*",
"Condition": {
"IpAddress": {
"acs:SourceIp": ["203.0.113.0/24"]
}
}
}
]
}'
# Attach the policy to the user
aliyun ram AttachPolicyToUser \
--PolicyType Custom \
--PolicyName CustomDeployPolicy \
--UserName cicd-deployer
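To show how the Action list and the source IP condition combine, here is a deliberately simplified evaluator. It is an illustration only and not RAM's real evaluation logic: it handles Allow statements, action wildcards, and the `acs:SourceIp` condition, and ignores Deny statements, the Resource element, and every other condition type. The trimmed `POLICY` and the `is_allowed` helper are assumptions made for this example.

```python
import ipaddress
from fnmatch import fnmatchcase

# A trimmed version of the policy document above.
POLICY = {
    "Version": "1",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["ecs:DescribeInstances", "ecs:StartInstance", "oss:GetObject"],
            "Resource": "*",
            "Condition": {"IpAddress": {"acs:SourceIp": ["203.0.113.0/24"]}},
        }
    ],
}


def is_allowed(policy: dict, action: str, source_ip: str) -> bool:
    """Return True if some Allow statement matches both the action and the caller's IP."""
    address = ipaddress.ip_address(source_ip)
    for statement in policy["Statement"]:
        if statement.get("Effect") != "Allow":
            continue  # simplified: Deny statements are not modelled
        actions = statement["Action"]
        if isinstance(actions, str):
            actions = [actions]
        if not any(fnmatchcase(action, pattern) for pattern in actions):
            continue
        ranges = statement.get("Condition", {}).get("IpAddress", {}).get("acs:SourceIp", [])
        if ranges and not any(address in ipaddress.ip_network(cidr) for cidr in ranges):
            continue  # the caller is outside every permitted range
        return True
    return False


if __name__ == "__main__":
    print(is_allowed(POLICY, "ecs:StartInstance", "203.0.113.10"))
    print(is_allowed(POLICY, "ecs:StartInstance", "198.51.100.7"))
```

The second call fails the IP condition, which is the practical effect of pinning a CI/CD user to a known egress range: leaked keys are useless from anywhere else.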
Instance RAM Roles
For production workloads, use RAM Roles (instance roles) instead of access keys, just like AWS IAM roles for EC2:
resource "alicloud_ram_role" "ecs_role" {
name = "ECSAppServerRole"
description = "Role for application servers"
document = jsonencode({
Version = "1"
Statement = [
{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = ["ecs.aliyuncs.com"]
}
}
]
})
}
resource "alicloud_ram_policy" "app_policy" {
policy_name = "AppServerPolicy"
policy_document = jsonencode({
Version = "1"
Statement = [
{
Effect = "Allow"
Action = [
"oss:GetObject",
"oss:ListObjects",
"kms:Decrypt",
"log:PostLogStoreLogs"
]
Resource = "*"
}
]
})
}
resource "alicloud_ram_role_policy_attachment" "app_attach" {
role_name = alicloud_ram_role.ecs_role.name
policy_name = alicloud_ram_policy.app_policy.policy_name
policy_type = "Custom"
}
STS for Temporary Credentials
# Assume a role for temporary credentials
aliyun sts AssumeRole \
--RoleArn acs:ram::123456789012:role/CrossAccountRole \
--RoleSessionName deploy-session \
--DurationSeconds 3600
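The role ARN passed to AssumeRole has the shape `acs:ram::<account-id>:role/<role-name>`. The sketch below validates that shape and extracts the two variable parts, which can help catch a malformed ARN in a pipeline before the API call fails. The `parse_role_arn` helper is a hypothetical name, and the check covers only role ARNs.

```python
from typing import NamedTuple


class RoleArn(NamedTuple):
    account_id: str
    role_name: str


def parse_role_arn(arn: str) -> RoleArn:
    """Split a RAM role ARN into the owning account ID and the role name."""
    parts = arn.split(":")
    # Expected layout: acs : ram : (empty region) : account-id : role/name
    if len(parts) != 5 or parts[0] != "acs" or parts[1] != "ram" or parts[2] != "":
        raise ValueError(f"not a RAM ARN: {arn!r}")
    account_id, resource = parts[3], parts[4]
    kind, _, role_name = resource.partition("/")
    if kind != "role" or not role_name or not account_id.isdigit():
        raise ValueError(f"not a RAM role ARN: {arn!r}")
    return RoleArn(account_id=account_id, role_name=role_name)


if __name__ == "__main__":
    print(parse_role_arn("acs:ram::123456789012:role/CrossAccountRole"))
```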
ApsaraDB RDS
Alibaba Cloud's managed database service supports MySQL, PostgreSQL, SQL Server, and MariaDB. It is feature-rich and competitively priced, especially in China regions.
resource "alicloud_db_instance" "production" {
engine = "PostgreSQL"
engine_version = "15.0"
instance_type = "pg.x2.medium.2c"
instance_storage = 100
instance_charge_type = "Postpaid"
vswitch_id = alicloud_vswitch.data_a.id
security_ips = ["10.0.0.0/8"]
db_instance_storage_type = "cloud_essd"
parameters {
name = "max_connections"
value = "500"
}
tags = {
Environment = "production"
Team = "platform"
}
}
resource "alicloud_db_readonly_instance" "replica" {
master_db_instance_id = alicloud_db_instance.production.id
engine_version = "15.0"
instance_type = "pg.x2.medium.2c"
instance_storage = 100
vswitch_id = alicloud_vswitch.data_b.id
db_instance_storage_type = "cloud_essd"
}
Database Comparison
| Feature | ApsaraDB RDS | PolarDB | AnalyticDB |
|---|---|---|---|
| Type | Standard RDBMS | Cloud-native RDBMS | OLAP/HTAP |
| Engines | MySQL, PostgreSQL, SQL Server, MariaDB | MySQL, PostgreSQL, Oracle-compatible | MySQL, PostgreSQL |
| Max storage | 32 TB | 128 TB | Unlimited |
| Read replicas | Up to 5 | Up to 15 (shared storage) | N/A |
| Failover time | 30 seconds | Seconds (shared storage) | N/A |
| Best for | Standard workloads | High-performance, large-scale | Analytics, reporting |
Container Registry (ACR)
Alibaba Cloud Container Registry provides Docker image hosting, vulnerability scanning, and image signing. The Enterprise Edition includes geo-replication across regions -- critical for multi-region deployments in China.
ACR Editions
| Feature | Personal Edition | Enterprise Basic | Enterprise Standard | Enterprise Advanced |
|---|---|---|---|---|
| Private repos | 300 | 1,000 | 5,000 | Unlimited |
| Image scanning | No | Yes | Yes | Yes |
| Geo-replication | No | No | Yes (3 regions) | Yes (unlimited) |
| Image signing | No | No | No | Yes |
| Cost | Free | ~CNY 60/mo | ~CNY 200/mo | ~CNY 400/mo |
# Login to ACR
docker login --username=your-username registry.cn-shanghai.aliyuncs.com
# Tag and push
docker tag webapp:latest registry.cn-shanghai.aliyuncs.com/myorg/webapp:v1.2.3
docker push registry.cn-shanghai.aliyuncs.com/myorg/webapp:v1.2.3
# List images
aliyun cr GetRepoTags \
--RepoNamespace myorg \
--RepoName webapp \
--output cols=tag,imageCreate,imageSize rows=data.tags[]
With Terraform (Enterprise Edition):
resource "alicloud_cr_ee_instance" "registry" {
instance_type = "Standard"
instance_name = "acr-production"
payment_type = "Subscription"
renewal_status = "AutoRenewal"
period = 12
}
resource "alicloud_cr_ee_namespace" "production" {
instance_id = alicloud_cr_ee_instance.registry.id
name = "production"
auto_create = true
default_visibility = "PRIVATE"
}
resource "alicloud_cr_ee_repo" "webapp" {
instance_id = alicloud_cr_ee_instance.registry.id
namespace = alicloud_cr_ee_namespace.production.name
name = "webapp"
repo_type = "PRIVATE"
summary = "Production web application"
}
DevOps Pipeline on Alibaba Cloud
Alibaba Cloud offers Flow (its native CI/CD platform), but most international teams use familiar tools with Alibaba Cloud integrations. The key challenge is network access -- builds running outside China are slow when pushing images to, or pulling images and packages from, registries inside China.
GitHub Actions with Alibaba Cloud
name: Deploy to Alibaba Cloud
on:
  push:
    branches: [main]
env:
  REGION: cn-shanghai
  ACR_REGISTRY: registry.cn-shanghai.aliyuncs.com
  ACR_NAMESPACE: production
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'
      - run: npm ci && npm run lint && npm test
  build-and-push:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Login to ACR
        uses: docker/login-action@v3
        with:
          registry: ${{ env.ACR_REGISTRY }}
          username: ${{ secrets.ACR_USERNAME }}
          password: ${{ secrets.ACR_PASSWORD }}
      - name: Build and Push
        uses: docker/build-push-action@v5
        with:
          push: true
          tags: |
            ${{ env.ACR_REGISTRY }}/${{ env.ACR_NAMESPACE }}/webapp:${{ github.sha }}
            ${{ env.ACR_REGISTRY }}/${{ env.ACR_NAMESPACE }}/webapp:latest
          cache-from: type=registry,ref=${{ env.ACR_REGISTRY }}/${{ env.ACR_NAMESPACE }}/webapp:latest
          cache-to: type=inline
  deploy:
    needs: build-and-push
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up aliyun CLI
        uses: aliyun/setup-cli@v1
        with:
          aliyun-access-key-id: ${{ secrets.ALICLOUD_ACCESS_KEY }}
          aliyun-access-key-secret: ${{ secrets.ALICLOUD_SECRET_KEY }}
          aliyun-region: ${{ env.REGION }}
      - name: Deploy to ACK
        run: |
          # Get kubeconfig (the ACK API is REST-style, so the CLI call names the path)
          aliyun cs GET /k8s/${{ secrets.ACK_CLUSTER_ID }}/user_config \
            | jq -r '.config' > kubeconfig
          chmod 600 kubeconfig
          export KUBECONFIG=./kubeconfig
          # Update deployment image
          kubectl set image deployment/webapp \
            webapp=${{ env.ACR_REGISTRY }}/${{ env.ACR_NAMESPACE }}/webapp:${{ github.sha }} \
            -n production
          # Wait for rollout
          kubectl rollout status deployment/webapp -n production --timeout=300s
Dealing with Network Challenges
When your CI/CD runs outside China (e.g., GitHub Actions), consider these strategies:
- Mirror npm/pip registries -- Use Alibaba Cloud's npm mirror (npmmirror.com) and pip mirror in your Dockerfile.
- Use ACR image acceleration -- Enterprise Edition ACR supports acceleration from international networks.
- Two-step image delivery -- Build outside China, push to an ACR instance in an international region, then use geo-replication to sync to China regions.
- Self-hosted runners in China -- Run GitHub Actions runners on ECS instances in China for the fastest builds.
# Dockerfile optimized for China builds
FROM node:20-alpine
# Use Alibaba Cloud npm mirror for faster package installation
RUN npm config set registry https://registry.npmmirror.com
# Use Alibaba Cloud Alpine mirror
RUN sed -i 's/dl-cdn.alpinelinux.org/mirrors.aliyun.com/g' /etc/apk/repositories
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .
EXPOSE 8080
CMD ["node", "server.js"]
AWS to Alibaba Cloud Service Mapping
If you are coming from AWS, this mapping helps you navigate Alibaba Cloud:
| AWS Service | Alibaba Cloud Equivalent | Notes |
|---|---|---|
| EC2 | ECS | Very similar API model |
| S3 | OSS | S3-compatible API available |
| VPC | VPC | Nearly identical concepts (VSwitches instead of Subnets) |
| IAM | RAM | Similar but simpler policy language |
| EKS | ACK | ACK Pro adds enhanced SLA |
| RDS | ApsaraDB RDS | Supports MySQL, PostgreSQL, SQL Server |
| Aurora | PolarDB | Cloud-native distributed database |
| Lambda | Function Compute | Supports Node.js, Python, Java, Go, PHP, C# |
| CloudWatch | CloudMonitor + SLS (Log Service) | SLS is particularly powerful for log analytics |
| Route 53 | Alibaba Cloud DNS | ICP filing integrated |
| CloudFront | Alibaba Cloud CDN / DCDN | Essential for China delivery, DCDN adds edge compute |
| ELB/ALB | CLB/ALB/NLB | CLB (legacy), ALB for L7, NLB for L4 |
| ECR | ACR | Enterprise edition has geo-replication |
| Systems Manager | Cloud Assistant | Remote command execution on ECS |
| CloudFormation | ROS (Resource Orchestration) | Or just use Terraform (recommended) |
| CodePipeline | Flow | Most international teams use GitHub Actions |
| KMS | KMS | Key management and encryption |
| ElastiCache | Tair (ApsaraDB for Redis) | Redis-compatible, enhanced features |
| SQS | Message Service (MNS) / Message Queue (MQ) | MNS is the closest SQS analog; the MQ family covers RocketMQ, Kafka, RabbitMQ, and MQTT |
| API Gateway | API Gateway | Includes China-specific auth integrations |
| WAF | Web Application Firewall | Includes China-specific threat intelligence |
| Transit Gateway | CEN (Cloud Enterprise Network) | Global network mesh |
Cost Optimization on Alibaba Cloud
Pricing Differences from Western Clouds
Alibaba Cloud pricing in China regions is typically 20-40% lower than equivalent AWS services in the US. International regions, however, are priced on par with or slightly above AWS.
Key Strategies
- Use Subscription pricing for predictable workloads. A 1-year subscription saves 15-30% and a 3-year subscription saves 40-60% compared with pay-as-you-go.
- Preemptible instances for batch and CI/CD. Up to 90% savings.
- Reserved Instances for flexible commitment across instance types.
- Storage tiering with OSS lifecycle policies. Cold data in Archive or Cold Archive is extremely cheap.
- Right-size ECS instances. CloudMonitor provides utilization reports.
- Use CEN instead of multiple EIPs for inter-region traffic. CEN pricing is more predictable.
- Alibaba Cloud CDN for content delivery instead of serving every request from the origin. Bandwidth in China is expensive; CDN offloads a significant share of it.
- Preemptible (spot) instances for ACK node pools. Scale batch workloads on preemptible nodes.
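To see what the discount ranges quoted in these strategies mean for a bill, here is a small worked calculation. The 1000.00 baseline is a hypothetical pay-as-you-go figure, and the function name `monthly_cost_range` is illustrative; only the percentage ranges come from the list above.

```python
from typing import Tuple

# Discount ranges quoted in the strategies above, as (low, high) fractions.
DISCOUNTS = {
    "subscription_1y": (0.15, 0.30),
    "subscription_3y": (0.40, 0.60),
    "preemptible": (0.00, 0.90),  # "up to 90%", so the low end is no saving at all
}


def monthly_cost_range(pay_as_you_go: float, plan: str) -> Tuple[float, float]:
    """Return (best_case, worst_case) monthly cost for a plan, rounded to cents."""
    low, high = DISCOUNTS[plan]
    best = round(pay_as_you_go * (1 - high), 2)
    worst = round(pay_as_you_go * (1 - low), 2)
    return best, worst


if __name__ == "__main__":
    baseline = 1000.0  # hypothetical pay-as-you-go bill per month
    for plan in DISCOUNTS:
        best, worst = monthly_cost_range(baseline, plan)
        print(f"{plan}: {best:.2f} to {worst:.2f} per month")
```

For the hypothetical 1000.00 baseline, a 1-year subscription lands between 700.00 and 850.00 per month and a 3-year subscription between 400.00 and 600.00. Actual discounts depend on region and instance family, so treat the output as a planning range rather than a quote.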
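# Check current costs, one row per product
# rows= points the table output at the bill items inside the response
aliyun bssopenapi QueryAccountBill \
--BillingCycle 2026-03 \
--Granularity MONTHLY \
--IsGroupByProduct true \
--output cols=ProductCode,AdjustAmount,Currency rows="Data.Items.Item[]"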
When to Choose Alibaba Cloud
Choose Alibaba Cloud when:
- Your users are in mainland China. No other cloud provider can match the performance, compliance tooling, and regulatory support. This is the number one reason.
- You need ICP filing support. Alibaba Cloud streamlines the process and offers guidance through the bureaucratic requirements.
- Data must stay in China. Alibaba Cloud has more China regions and availability zones than any competitor, with full compliance tooling for PIPL and Cybersecurity Law.
- You serve the Asia-Pacific market. Strong presence in Singapore, Indonesia, Malaysia, and Hong Kong with competitive pricing.
- Your organization has an existing Alibaba ecosystem relationship (e.g., using DingTalk, Tmall, or other Alibaba services).
- You need global acceleration into China. Alibaba Cloud's Global Accelerator (GA) provides optimized routing from international users to China origins.
Be cautious when:
- Your team has zero Chinese language capability -- some documentation and console sections are Chinese-first, especially for newer services. The English documentation has gaps.
- You need deep integration with Western SaaS tools (Datadog, PagerDuty, etc.) -- integrations exist but are less mature. Consider using Alibaba Cloud's native monitoring (CloudMonitor, ARMS, SLS) instead.
- You are running exclusively in North America or Europe with no APAC presence -- AWS or Azure will be more cost-effective and better supported.
- You need cutting-edge AI/ML services -- while Alibaba Cloud has ML services (PAI), the ecosystem is smaller than AWS SageMaker or GCP Vertex AI for English-language users.
Migration Path to Alibaba Cloud
For teams migrating to Alibaba Cloud (typically for China market entry):
- Start with Terraform. The alicloud provider has excellent coverage. Write your infrastructure as code from day one.
- Mirror your container images to ACR before deploying. Docker Hub is unreliable from China.
- Set up a private npm/pip mirror using Alibaba Cloud's mirrors or self-hosted solutions.
- Plan your network architecture including CEN if you need multi-region. China bandwidth is expensive -- minimize cross-region data transfer.
- Apply for ICP filing early. It takes 2-4 weeks and blocks your launch date if delayed.
- Test the Great Firewall impact on your application. If your app calls external APIs, those calls may fail or be very slow from China.
- Use China-specific CDN configuration. A global CDN configuration will not perform well in China.
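The image-mirroring step in the list above can be sketched as a small planning script. It assumes a Personal Edition ACR endpoint in `cn-hangzhou` and a hypothetical namespace called `myteam`; Enterprise Edition instances use their own hostname, so adjust `ACR_REGISTRY` accordingly. The script only prints the `docker` commands and does not run them.

```python
import shlex
from typing import List

# Assumptions: Personal Edition endpoint in cn-hangzhou, hypothetical namespace "myteam".
ACR_REGISTRY = "registry.cn-hangzhou.aliyuncs.com"
ACR_NAMESPACE = "myteam"


def acr_reference(image: str, registry: str = ACR_REGISTRY,
                  namespace: str = ACR_NAMESPACE) -> str:
    """Map a source image such as 'nginx:1.25' to its target location in ACR."""
    name, _, tag = image.rpartition(":")
    if not name or "/" in tag:
        # No tag was given, or the colon belonged to a registry port.
        name, tag = image, "latest"
    repo = name.split("/")[-1]  # keep only the final path component
    return f"{registry}/{namespace}/{repo}:{tag}"


def mirror_commands(images: List[str]) -> List[str]:
    """Return the docker commands that copy each image into ACR."""
    commands: List[str] = []
    for image in images:
        target = acr_reference(image)
        commands += [
            f"docker pull {shlex.quote(image)}",
            f"docker tag {shlex.quote(image)} {shlex.quote(target)}",
            f"docker push {shlex.quote(target)}",
        ]
    return commands


if __name__ == "__main__":
    for line in mirror_commands(["nginx:1.25", "redis"]):
        print(line)
```

Run the printed commands from a machine outside China, where Docker Hub is reachable, after logging in to the ACR registry with `docker login`. The sketch does not handle digest references (`name@sha256:...`), and it flattens source paths to their last component, so two images with the same final name would collide.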
Alibaba Cloud is not a niche provider. It serves millions of businesses and runs the infrastructure behind Singles' Day (11.11), the world's largest online shopping event, which peaked at 583,000 orders per second in 2020. For DevOps engineers working with global infrastructure, understanding Alibaba Cloud is a genuine competitive advantage, especially as more companies expand into Asian markets. The skills transfer well from AWS and Azure, and the Terraform provider makes the transition manageable.
Senior Kubernetes Architect
10+ years orchestrating containers in production. Battle-tested opinions on everything from pod scheduling to service mesh. I've seen clusters burn and helped rebuild them better.