Linux Shell Scripting: Bash Automation Essentials
Bash scripting is the glue that holds DevOps together. Infrastructure as code tools handle provisioning, CI/CD handles pipelines, but there is always a gap that only a well-written shell script can fill: health checks, log rotation, deployment orchestration, data migration helpers, and a hundred other operational tasks. Whether you are managing a single VPS or orchestrating hundreds of containers, Bash is the universal language every server speaks. This guide covers everything from the fundamentals to production-grade patterns, with real examples you can adapt immediately.
Script Basics
The Shebang Line
Every script starts with a shebang line that tells the kernel which interpreter to use:
#!/bin/bash
Or, for portability across systems where Bash might not live at /bin/bash (FreeBSD, NixOS, some containers):
#!/usr/bin/env bash
The env form searches $PATH for the bash binary, which makes it resilient to different installation paths. Use this as your default.
Making Scripts Executable and Running Them
There are several ways to execute a script:
# Make it executable, then run directly
chmod +x myscript.sh
./myscript.sh
# Run it explicitly with bash (no chmod needed)
bash myscript.sh
# Source it into the current shell (variables persist in your session)
source myscript.sh
. myscript.sh
When you source a script with . or source, it runs in your current shell process. That means any variables it sets, any cd commands it runs, and any functions it defines all affect your current session. Running a script with ./ or bash spawns a child process, which is almost always what you want for automation scripts.
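You can see the difference with a throwaway script (written to a temp file here purely for the demo; MY_DEMO_VAR is an illustrative name):

```shell
# Create a throwaway script that sets a variable
demo_script=$(mktemp)
echo 'MY_DEMO_VAR="set by script"' > "$demo_script"

# Run in a child process: the variable does NOT survive into our shell
bash "$demo_script"
echo "${MY_DEMO_VAR:-<unset>}"    # <unset>

# Source into the current shell: the variable persists
. "$demo_script"
echo "$MY_DEMO_VAR"               # set by script

rm -f "$demo_script"
```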
Exit Codes
Every command in Bash returns an integer exit code between 0 and 255. Zero means success, anything else means failure. This is fundamental to how conditionals, loops, and error handling work:
# Check if a command succeeded
grep "ERROR" /var/log/app.log
echo $? # 0 if pattern found, 1 if not found, 2 if file error
# Set your own exit codes
exit 0 # Success
exit 1 # General error
exit 2 # Misuse of command
exit 127 # Command not found (reserved by the shell; avoid reusing it yourself)
# Use exit codes in conditionals
if grep -q "ERROR" /var/log/app.log; then
echo "Errors found in log"
fi
The $? variable always holds the exit code of the most recently executed command. You will see it everywhere in shell scripting.
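One subtlety worth internalizing: because every command overwrites $?, capture it immediately if you need it later. A minimal sketch (the `|| status=$?` form also keeps the failing command from tripping set -e):

```shell
status=0
false || status=$?        # capture the failure code immediately
echo "command failed with $status"    # command failed with 1
echo "meanwhile \$? is now $?"        # 0 -- the echo above already overwrote it
```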
Variables
Defining and Using Variables
Variable assignment in Bash has a strict syntax: no spaces around the = sign.
# Correct
APP_NAME="myapp"
VERSION="2.3.1"
PORT=3000
# WRONG: these are errors
APP_NAME = "myapp" # Bash interprets APP_NAME as a command
PORT =3000 # Same problem
# Using variables (always quote to protect against word splitting)
echo "Deploying $APP_NAME version $VERSION"
echo "Listening on port ${PORT}"
The curly braces in ${PORT} are optional when the variable name is unambiguous, but they become essential when you need to concatenate: "${APP_NAME}_config" works, while "$APP_NAME_config" tries to reference a variable called APP_NAME_config.
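A quick demonstration of the difference (APP_NAME_config is intentionally unset here):

```shell
APP_NAME="myapp"
good="${APP_NAME}_config"     # braces end the variable name explicitly
bad="$APP_NAME_config"        # expands the (unset) variable APP_NAME_config
echo "$good"                  # myapp_config
echo "${bad:-<empty>}"        # <empty>
```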
Environment vs Local Variables
# Environment variable: visible to child processes
export DATABASE_URL="postgresql://localhost:5432/mydb"
# Regular variable: only exists in the current shell
local_var="only here"
# Inside functions, use the local keyword for proper scoping
process_item() {
local item="$1"
local result=""
result=$(transform "$item")
echo "$result"
}
Readonly Variables
For constants that should never change during script execution:
readonly CONFIG_DIR="/etc/myapp"
readonly MAX_RETRIES=5
readonly LOG_FILE="/var/log/myapp/deploy.log"
# Attempting to change a readonly variable produces an error
CONFIG_DIR="/tmp" # bash: CONFIG_DIR: readonly variable
Special Variables
| Variable | Meaning |
|---|---|
| $0 | Script name |
| $1, $2, ... | Positional arguments |
| $# | Number of arguments |
| $@ | All arguments (preserves quoting when double-quoted) |
| $* | All arguments as a single string |
| $? | Exit code of last command |
| $$ | PID of current script |
| $! | PID of last background process |
| $_ | Last argument of previous command |
| $LINENO | Current line number in the script |
| $FUNCNAME | Current function name |
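A few of these in action (using set -- to simulate positional arguments for the demo):

```shell
set -- alpha "beta gamma" delta   # simulate three positional arguments
echo "count: $#"                  # count: 3
for arg in "$@"; do
  echo "  [$arg]"                 # "beta gamma" stays a single argument
done
sleep 0.1 &                       # start a background job
echo "background PID: $!"
wait "$!"                         # wait for it using $!
echo "shell PID: $$"
```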
Arrays
# Indexed array
SERVERS=("web01" "web02" "web03" "web04")
# Access elements
echo "${SERVERS[0]}" # web01
echo "${SERVERS[@]}" # All elements
echo "${#SERVERS[@]}" # Array length: 4
echo "${SERVERS[@]:1:2}" # Slice: web02 web03
# Append elements
SERVERS+=("web05")
SERVERS+=("web06" "web07")
# Remove an element (leaves a gap in indices)
unset 'SERVERS[2]'
# Iterate safely (always double-quote)
for server in "${SERVERS[@]}"; do
echo "Deploying to $server"
done
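The index gap left by unset is worth seeing directly; re-assigning the array to itself is a common repacking trick (FRUITS is just a demo array):

```shell
FRUITS=("apple" "banana" "cherry" "date")
unset 'FRUITS[1]'
echo "${!FRUITS[@]}"      # 0 2 3 -- index 1 is gone, later indices keep their positions
echo "${#FRUITS[@]}"      # 3
# Repack into consecutive indices if you need 0..N-1 again
FRUITS=("${FRUITS[@]}")
echo "${!FRUITS[@]}"      # 0 1 2
```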
Associative Arrays
Associative arrays (Bash 4+) let you use strings as keys:
declare -A SERVICE_PORTS
SERVICE_PORTS[web]=80
SERVICE_PORTS[api]=3000
SERVICE_PORTS[db]=5432
SERVICE_PORTS[cache]=6379
# Access a single value
echo "API runs on port ${SERVICE_PORTS[api]}"
# Iterate over keys
for service in "${!SERVICE_PORTS[@]}"; do
echo "$service runs on port ${SERVICE_PORTS[$service]}"
done
# Check if a key exists
if [[ -v SERVICE_PORTS[web] ]]; then
echo "Web service port is configured"
fi
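Since associative arrays require Bash 4, scripts that might run on older shells (macOS still ships Bash 3.2 by default) can guard with a version check via BASH_VERSINFO:

```shell
if (( BASH_VERSINFO[0] < 4 )); then
  echo "This script requires Bash 4+ for associative arrays" >&2
  exit 1
fi
declare -A COLORS     # demo array
COLORS[error]="red"
COLORS[ok]="green"
echo "errors print in ${COLORS[error]}"   # errors print in red
```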
String Operations
Concatenation and Length
FIRST="Hello"
SECOND="World"
# Concatenation is just placing strings next to each other
GREETING="${FIRST} ${SECOND}"
echo "$GREETING" # Hello World
# String length
echo "${#GREETING}" # 11
Substring Extraction
VERSION="v2.3.1-beta"
echo "${VERSION:1}" # 2.3.1-beta (skip first char)
echo "${VERSION:1:5}" # 2.3.1 (5 chars starting at position 1)
echo "${VERSION: -4}" # beta (last 4 chars, note the space before -)
Pattern-Based Removal and Replacement
FILE="backup-2026-03-23.tar.gz"
# Remove shortest match from start
echo "${FILE#*.}" # tar.gz
# Remove longest match from start
echo "${FILE##*.}" # gz (just the extension)
# Remove shortest match from end
echo "${FILE%.*}" # backup-2026-03-23.tar
# Remove longest match from end
echo "${FILE%%.*}" # backup-2026-03-23
# Replace first occurrence
echo "${FILE/2026/2027}" # backup-2027-03-23.tar.gz
# Replace all occurrences
echo "${FILE//[0-9]/X}" # backup-XXXX-XX-XX.tar.gz
# Remove a pattern (replace with empty)
echo "${FILE//-/}" # backup20260323.tar.gz
Default Values and Parameter Expansion
# Use default if variable is unset or empty
DB_HOST="${DB_HOST:-localhost}"
DB_PORT="${DB_PORT:-5432}"
# Assign default if unset or empty (actually sets the variable)
: "${LOG_LEVEL:=info}"
# Error if unset or empty
: "${API_KEY:?API_KEY must be set}"
# Use alternative value if variable IS set (and non-empty, because of the colon)
echo "${DEBUG:+Debug mode is on}" # Prints only if DEBUG is set and non-empty
# Case modification (Bash 4+)
NAME="hello world"
echo "${NAME^}" # Hello world (capitalize first)
echo "${NAME^^}" # HELLO WORLD (all uppercase)
UPPER="HELLO"
echo "${UPPER,}" # hELLO (lowercase first)
echo "${UPPER,,}" # hello (all lowercase)
Arithmetic Operations
Integer Arithmetic with $(( ))
A=10
B=3
echo $((A + B)) # 13
echo $((A - B)) # 7
echo $((A * B)) # 30
echo $((A / B)) # 3 (integer division, no decimals)
echo $((A % B)) # 1 (modulo)
echo $((A ** B)) # 1000 (exponentiation)
# Increment and decrement
COUNT=0
((COUNT++))
((COUNT += 5))
echo "$COUNT" # 6
# Compound expressions
MEMORY_KB=2097152
MEMORY_MB=$((MEMORY_KB / 1024))
MEMORY_GB=$((MEMORY_MB / 1024))
echo "${MEMORY_GB}GB" # 2GB
The let Command
let "x = 5 + 3"
let "y = x * 2"
let "x++"
echo "$x $y" # 9 16
Floating-Point Arithmetic with bc
Bash only handles integers natively. For floating-point math, use bc:
# Basic floating point
echo "scale=2; 10 / 3" | bc # 3.33
# More complex calculations
CPU_TOTAL=800
CPU_USED=623
CPU_PERCENT=$(echo "scale=1; $CPU_USED * 100 / $CPU_TOTAL" | bc)
echo "CPU usage: ${CPU_PERCENT}%" # CPU usage: 77.8%
# Comparison with floating point
THRESHOLD="80.0"
if (( $(echo "$CPU_PERCENT > $THRESHOLD" | bc -l) )); then
echo "CPU usage is above threshold"
fi
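If bc is not installed (common in minimal containers), awk can do the same floating-point math. One difference to note: awk's printf rounds to the nearest digit, while bc's scale truncates, so the same division can differ in the last decimal:

```shell
CPU_TOTAL=800
CPU_USED=623
# 623 * 100 / 800 = 77.875 -> printf "%.1f" rounds to 77.9 (bc with scale=1 truncates to 77.8)
CPU_PERCENT=$(awk -v used="$CPU_USED" -v total="$CPU_TOTAL" \
  'BEGIN { printf "%.1f", used * 100 / total }')
echo "CPU usage: ${CPU_PERCENT}%"   # CPU usage: 77.9%
```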
Conditionals
if/elif/else
if [[ "$ENV" == "production" ]]; then
echo "Running in production mode"
set_production_config
elif [[ "$ENV" == "staging" ]]; then
echo "Running in staging mode"
set_staging_config
else
echo "Running in development mode"
set_dev_config
fi
test, [ ], and [[ ]]
There are three ways to write conditionals. The test command and [ ] are POSIX-compatible. The [[ ]] form is a Bash keyword with several advantages: it handles unquoted variables safely, supports pattern matching and regex, and allows && / || inside the brackets. Always prefer [[ ]] in Bash scripts.
# These are equivalent
test -f "/etc/hosts"
[ -f "/etc/hosts" ]
[[ -f "/etc/hosts" ]]
# But [[ ]] is safer with unquoted variables
# [ $var == "hello" ] # breaks if var is empty
# [[ $var == "hello" ]] # works fine even if var is empty
String Comparisons
[[ "$a" == "$b" ]] # Equal
[[ "$a" != "$b" ]] # Not equal
[[ -z "$a" ]] # Is empty (zero length)
[[ -n "$a" ]] # Is not empty (non-zero length)
[[ "$a" == deploy* ]] # Glob pattern match
[[ "$a" =~ ^[0-9]+$ ]] # Regex match
[[ "$a" > "$b" ]] # Lexicographic greater than
Numeric Comparisons
[[ "$a" -eq "$b" ]] # Equal
[[ "$a" -ne "$b" ]] # Not equal
[[ "$a" -gt "$b" ]] # Greater than
[[ "$a" -lt "$b" ]] # Less than
[[ "$a" -ge "$b" ]] # Greater or equal
[[ "$a" -le "$b" ]] # Less or equal
# Arithmetic context (cleaner for numeric comparisons)
if (( count > 10 )); then
echo "Count exceeds limit"
fi
if (( total_errors == 0 )); then
echo "No errors detected"
fi
File Test Operators
[[ -f "$file" ]] # Is a regular file
[[ -d "$dir" ]] # Is a directory
[[ -e "$path" ]] # Exists (any type)
[[ -r "$file" ]] # Is readable
[[ -w "$file" ]] # Is writable
[[ -x "$file" ]] # Is executable
[[ -s "$file" ]] # Exists and is non-empty
[[ -L "$path" ]] # Is a symbolic link
[[ "$a" -nt "$b" ]] # File a is newer than file b
[[ "$a" -ot "$b" ]] # File a is older than file b
A practical example combining several tests:
check_config() {
local config="$1"
if [[ ! -e "$config" ]]; then
echo "ERROR: Config file does not exist: $config"
return 1
fi
if [[ ! -f "$config" ]]; then
echo "ERROR: Not a regular file: $config"
return 1
fi
if [[ ! -r "$config" ]]; then
echo "ERROR: Config file is not readable: $config"
return 1
fi
if [[ ! -s "$config" ]]; then
echo "WARNING: Config file is empty: $config"
fi
return 0
}
Logical Operators
# Inside [[ ]]
[[ "$a" == "yes" && "$b" == "yes" ]] # AND
[[ "$a" == "yes" || "$b" == "yes" ]] # OR
[[ ! -f "$file" ]] # NOT
# Between commands
command1 && command2 # Run command2 only if command1 succeeds
command1 || command2 # Run command2 only if command1 fails
# Common pattern: do something or die
cd /opt/myapp || { echo "Cannot cd to /opt/myapp"; exit 1; }
Loops
for Loops
# Iterate over a list
for server in web01 web02 web03; do
echo "Deploying to $server"
ssh deploy@"$server" "/opt/deploy/run.sh"
done
# C-style for loop
for ((i = 1; i <= 10; i++)); do
echo "Iteration $i"
done
# Iterate over files (safe, handles spaces in filenames)
for config_file in /etc/nginx/conf.d/*.conf; do
[[ -f "$config_file" ]] || continue # skip if glob matched nothing
echo "Checking $config_file"
nginx -t -c "$config_file" 2>/dev/null || echo "INVALID: $config_file"
done
# Iterate over array elements
PACKAGES=("nginx" "curl" "jq" "htop")
for pkg in "${PACKAGES[@]}"; do
if ! dpkg -l "$pkg" &>/dev/null; then
echo "Installing $pkg"
sudo apt-get install -y "$pkg"
fi
done
# Iterate over a range
for i in {1..10}; do
echo "Processing batch $i"
done
while Loops
# Wait for a service to become healthy
RETRIES=30
COUNT=0
while ! curl -sf http://localhost:3000/health > /dev/null; do
COUNT=$((COUNT + 1))
if [[ "$COUNT" -ge "$RETRIES" ]]; then
echo "Service failed to start after $RETRIES attempts"
exit 1
fi
echo "Waiting for service... ($COUNT/$RETRIES)"
sleep 2
done
echo "Service is healthy"
# Read a file line by line (the robust way)
while IFS= read -r line; do
[[ -z "$line" ]] && continue # skip empty lines
[[ "$line" == \#* ]] && continue # skip comments
echo "Processing: $line"
done < config.txt
# Process command output line by line
find /var/log -name "*.log" -mtime +30 | while IFS= read -r logfile; do
echo "Archiving $logfile"
gzip "$logfile"
done
until Loops
The until loop runs as long as the condition is false (the opposite of while):
until [[ -f /tmp/deploy-complete ]]; do
echo "Waiting for deploy signal..."
sleep 5
done
echo "Deploy signal received"
select for Interactive Menus
echo "Choose an environment:"
select ENV in production staging development quit; do
case "$ENV" in
production) echo "Selected production"; break ;;
staging) echo "Selected staging"; break ;;
development) echo "Selected development"; break ;;
quit) echo "Exiting"; exit 0 ;;
*) echo "Invalid choice, try again" ;;
esac
done
echo "Deploying to $ENV"
break and continue
# break exits the loop entirely
for server in "${SERVERS[@]}"; do
if ! ping -c 1 -W 2 "$server" &>/dev/null; then
echo "CRITICAL: $server is unreachable, aborting deployment"
break
fi
deploy_to "$server"
done
# continue skips to the next iteration
for file in /data/*.csv; do
[[ -f "$file" ]] || continue
if [[ ! -s "$file" ]]; then
echo "Skipping empty file: $file"
continue
fi
process_csv "$file"
done
Functions
Definition and Arguments
# Standard function definition
log() {
local level="$1"
shift
echo "[$(date '+%Y-%m-%d %H:%M:%S')] [$level] $*" >&2
}
log INFO "Starting deployment"
log ERROR "Connection refused to database"
log WARN "Disk usage above 80%"
Function arguments work the same as script arguments: $1, $2, etc. for positional parameters, $@ for all arguments, and $# for the count.
Return Values
Functions communicate results two ways: exit codes (via return) and stdout (via echo). Use return for success/failure signaling and echo for actual data:
# Return value via echo (capture with command substitution)
get_container_id() {
local name="$1"
docker ps -q --filter "name=$name" 2>/dev/null
}
CONTAINER_ID=$(get_container_id "myapp")
# Return value via exit code
is_service_running() {
systemctl is-active --quiet "$1"
}
if is_service_running nginx; then
echo "nginx is running"
fi
# Return specific exit codes for different error conditions
validate_input() {
local input="$1"
[[ -z "$input" ]] && return 1 # empty input
[[ ${#input} -gt 255 ]] && return 2 # too long
[[ "$input" =~ [^a-zA-Z0-9_-] ]] && return 3 # invalid chars
return 0
}
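Calling code can then branch on the specific code via $? (the function is repeated here so the snippet stands alone; the `|| rc=$?` capture also keeps a non-zero return from tripping set -e):

```shell
validate_input() {
  local input="$1"
  [[ -z "$input" ]] && return 1                  # empty input
  [[ ${#input} -gt 255 ]] && return 2            # too long
  [[ "$input" =~ [^a-zA-Z0-9_-] ]] && return 3   # invalid chars
  return 0
}

rc=0
validate_input "bad name!" || rc=$?   # space and ! are invalid characters
case "$rc" in
  0) msg="valid" ;;
  1) msg="empty input" ;;
  2) msg="input too long" ;;
  3) msg="invalid characters" ;;
esac
echo "$msg"   # invalid characters
```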
Local Variables
Always use local inside functions. Without it, variables leak into the global scope and create subtle bugs:
# BAD: counter leaks into global scope
count_files() {
result=$(find "$1" -type f | wc -l)
echo "$result"
}
# GOOD: properly scoped
count_files() {
local dir="$1"
local result
result=$(find "$dir" -type f | wc -l)
echo "$result"
}
Input Handling
Reading User Input
# Simple prompt
read -rp "Enter your name: " username
echo "Hello, $username"
# Read with a timeout
if read -rt 10 -p "Continue? (y/n) " answer; then
[[ "$answer" == "y" ]] && echo "Continuing..."
else
echo "Timed out, assuming no"
fi
# Read a password (no echo)
read -rsp "Enter password: " password
echo ""
# Read into an array
read -ra words -p "Enter words: "
echo "You entered ${#words[@]} words"
Command-Line Arguments
#!/usr/bin/env bash
set -euo pipefail
if [[ $# -lt 2 ]]; then
echo "Usage: $0 ENVIRONMENT VERSION"
echo "Example: $0 production v2.3.1"
exit 1
fi
ENVIRONMENT="$1"
VERSION="$2"
EXTRA_ARGS=("${@:3}") # remaining arguments as array
Option Parsing with getopts
For scripts that accept flags and options, getopts is the standard approach:
#!/usr/bin/env bash
set -euo pipefail
usage() {
cat <<USAGE
Usage: $0 -e ENVIRONMENT -v VERSION [-f] [-t TIMEOUT] [-h]
Options:
-e ENV Target environment (production, staging, development)
-v VERSION Version tag to deploy
-f Force deploy (skip confirmation prompt)
-t TIMEOUT Health check timeout in seconds (default: 30)
-h Show this help message
Example:
$0 -e production -v v2.3.1 -f -t 60
USAGE
exit 1
}
ENVIRONMENT=""
VERSION=""
FORCE=false
TIMEOUT=30
while getopts "e:v:ft:h" opt; do
case "$opt" in
e) ENVIRONMENT="$OPTARG" ;;
v) VERSION="$OPTARG" ;;
f) FORCE=true ;;
t) TIMEOUT="$OPTARG" ;;
h) usage ;;
*) usage ;;
esac
done
shift $((OPTIND - 1)) # Remove parsed options, leaving remaining args
[[ -z "$ENVIRONMENT" ]] && { echo "Error: -e is required"; usage; }
[[ -z "$VERSION" ]] && { echo "Error: -v is required"; usage; }
echo "Deploying $VERSION to $ENVIRONMENT (force=$FORCE, timeout=$TIMEOUT)"
A colon after a letter in the getopts string means that option requires an argument. Options without colons are boolean flags.
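A leading colon in the option string enables "silent" error handling, where getopts reports problems through two special cases instead of printing its own messages. A small sketch (the argument list is simulated with set -- for the demo):

```shell
set -- -n world -q            # simulated command line for this demo
name=""
quiet=false
while getopts ":n:q" opt; do
  case "$opt" in
    n) name="$OPTARG" ;;
    q) quiet=true ;;
    :)  echo "Option -$OPTARG requires an argument" >&2 ;;   # missing argument
    \?) echo "Unknown option: -$OPTARG" >&2 ;;               # unrecognized flag
  esac
done
echo "name=$name quiet=$quiet"   # name=world quiet=true
```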
Error Handling
The Big Three: set -e, set -u, set -o pipefail
After the shebang, this is the single most important line in any production script:
set -euo pipefail
| Flag | Effect |
|---|---|
| -e | Exit immediately if any command returns non-zero |
| -u | Treat unset variables as errors instead of empty strings |
| -o pipefail | A pipeline fails if any command in it fails, not just the last one |
Without these flags, scripts silently swallow errors and keep running with corrupted state. Here is what happens without pipefail:
# Without pipefail: this "succeeds" even though grep found nothing
set -e
echo "test" | grep "missing" | wc -l # exits 0 because wc succeeds
# With pipefail: pipeline correctly fails
set -eo pipefail
echo "test" | grep "missing" | wc -l # exits 1 because grep failed
trap for Cleanup
The trap command registers cleanup functions that run when the script exits, regardless of whether it succeeded or failed:
#!/usr/bin/env bash
set -euo pipefail
TMPDIR=$(mktemp -d)
LOCKFILE="/var/lock/myapp-deploy.lock"
cleanup() {
local exit_code=$?
echo "Cleaning up..."
rm -rf "$TMPDIR"
rm -f "$LOCKFILE"
if [[ $exit_code -ne 0 ]]; then
echo "Script failed with exit code $exit_code"
# Send alert, write to error log, etc.
fi
exit "$exit_code"
}
trap cleanup EXIT
# Trap specific signals for graceful shutdown
on_interrupt() {
echo "Interrupted by user, cleaning up..."
exit 130
}
trap on_interrupt INT TERM
# Script logic here: temp files and locks are always cleaned up
echo "lock-$$" > "$LOCKFILE"
cp important-files "$TMPDIR/"
process_data "$TMPDIR/"
Useful conditions for trap: EXIT (always runs), ERR (fires whenever a command fails, whether or not set -e is active), INT (Ctrl+C), TERM (kill signal), HUP (terminal closed). EXIT and ERR are Bash pseudo-signals, not real OS signals.
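A sketch of an ERR trap counting failures. It fires once per failing command even without set -e; note it does not fire for commands tested in if/while conditions or before && / ||. The demo runs in a child bash so the deliberate failures cannot interact with any set -e in the surrounding shell:

```shell
output=$(bash <<'DEMO'
err_count=0
trap 'err_count=$((err_count + 1))' ERR
false                          # fails -> trap fires
true                           # succeeds -> no trap
ls /no/such/dir 2>/dev/null    # fails -> trap fires again
echo "$err_count"
DEMO
)
echo "failing commands seen: $output"   # failing commands seen: 2
```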
Retry Pattern
retry() {
local max_attempts="$1"
local delay="$2"
shift 2
local attempt=1
while [[ "$attempt" -le "$max_attempts" ]]; do
if "$@"; then
return 0
fi
echo "Attempt $attempt/$max_attempts failed. Retrying in ${delay}s..." >&2
sleep "$delay"
attempt=$((attempt + 1))
done
echo "All $max_attempts attempts failed for: $*" >&2
return 1
}
# Usage
retry 5 3 curl -sf http://localhost:3000/health
retry 3 10 ssh deploy@web01 "systemctl restart myapp"
Process Substitution and Command Substitution
Command Substitution
Command substitution captures the stdout of a command into a variable or inline expression:
# Modern syntax (preferred, nestable)
CURRENT_DATE=$(date +%Y-%m-%d)
GIT_BRANCH=$(git rev-parse --abbrev-ref HEAD)
FREE_MEM=$(free -m | awk '/^Mem:/{print $4}')
# Nested command substitution
DEPLOY_MSG="Deployed $(git log -1 --format='%h') by $(whoami) at $(date)"
# Old backtick syntax (avoid: hard to nest, hard to read)
CURRENT_DATE=`date +%Y-%m-%d`
Process Substitution
Process substitution, written <(command) for reading or >(command) for writing, presents a command's output (or input) as if it were a file. Behind the scenes, Bash replaces the expression with a path like /dev/fd/63 that the consuming command opens like any other file:
# Compare two command outputs as if they were files
diff <(ls /opt/myapp/current/) <(ls /opt/myapp/releases/v2.3.1/)
# Feed multiple streams into a command
paste <(cut -d, -f1 users.csv) <(cut -d, -f3 users.csv)
# Avoid subshell issues with while loops
# BAD: variables set inside the loop are lost (runs in subshell)
count=0
cat file.txt | while IFS= read -r line; do
((count++))
done
echo "$count" # always 0!
# GOOD: process substitution avoids the subshell
count=0
while IFS= read -r line; do
((count++))
done < <(cat file.txt)
echo "$count" # correct count
Here Documents and Here Strings
Here Documents
A here document passes a multi-line string as stdin to a command:
# Write a config file
cat > /etc/myapp/config.yaml <<EOF
database:
host: ${DB_HOST}
port: ${DB_PORT}
name: ${DB_NAME}
logging:
level: info
file: /var/log/myapp/app.log
EOF
# Prevent variable expansion with quoted delimiter
cat > /tmp/script.sh <<'EOF'
#!/bin/bash
echo "PID is $$"
echo "User is $USER"
EOF
# Indent-friendly version (strips leading tabs)
if true; then
cat <<-EOF
This line has a leading tab that will be stripped.
Useful inside indented blocks.
EOF
fi
Here Strings
A here string passes a single string as stdin:
# Instead of: echo "hello world" | grep "hello"
grep "hello" <<< "hello world"
# Parse a variable as input
IFS=',' read -ra fields <<< "$CSV_LINE"
echo "First field: ${fields[0]}"
# Feed a variable to a command
bc <<< "scale=2; $NUMERATOR / $DENOMINATOR"
Regular Expressions in Bash
The =~ operator inside [[ ]] performs regex matching. Matches are stored in the BASH_REMATCH array:
# Basic regex matching
VERSION="v2.13.1"
if [[ "$VERSION" =~ ^v([0-9]+)\.([0-9]+)\.([0-9]+)$ ]]; then
echo "Full match: ${BASH_REMATCH[0]}" # v2.13.1
echo "Major: ${BASH_REMATCH[1]}" # 2
echo "Minor: ${BASH_REMATCH[2]}" # 13
echo "Patch: ${BASH_REMATCH[3]}" # 1
else
echo "Invalid version format"
fi
# Validate an email (simplified)
EMAIL="user@example.com"
if [[ "$EMAIL" =~ ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$ ]]; then
echo "Valid email"
fi
# Validate an IP address
IP="192.168.1.100"
if [[ "$IP" =~ ^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$ ]]; then
echo "Looks like an IPv4 address"
fi
# Extract data from log lines
LOG_LINE="2026-03-23 14:22:01 ERROR [api] Connection timeout after 30s"
if [[ "$LOG_LINE" =~ ^([0-9-]+\ [0-9:]+)\ (ERROR|WARN|INFO)\ \[([a-z]+)\]\ (.+)$ ]]; then
TIMESTAMP="${BASH_REMATCH[1]}"
LEVEL="${BASH_REMATCH[2]}"
COMPONENT="${BASH_REMATCH[3]}"
MESSAGE="${BASH_REMATCH[4]}"
echo "[$LEVEL] $COMPONENT at $TIMESTAMP: $MESSAGE"
fi
Important: do not quote the regex pattern on the right side of =~. Quoting it turns it into a literal string comparison.
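A small demonstration of the quoting pitfall, plus the common defensive style of storing the pattern in a variable and expanding it unquoted:

```shell
n="2026"
if [[ "$n" =~ ^[0-9]+$ ]]; then m_unquoted=yes; else m_unquoted=no; fi
# Quoted pattern: compared as a literal string, so it never matches "2026"
if [[ "$n" =~ "^[0-9]+$" ]]; then m_quoted=yes; else m_quoted=no; fi
echo "unquoted=$m_unquoted quoted=$m_quoted"   # unquoted=yes quoted=no

# Safest style: keep the pattern in a variable, expand it unquoted
re='^[0-9]+$'
if [[ "$n" =~ $re ]]; then echo "matches via variable"; fi
```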
Quoting Rules and Word Splitting
This is the number-one source of Bash bugs. Understand these rules and you will avoid most scripting pitfalls.
NAME="world"
# Double quotes: variable expansion and command substitution happen
echo "Hello $NAME" # Hello world
echo "Today is $(date +%A)" # Today is Monday
# Single quotes: everything is literal, no expansion
echo 'Hello $NAME' # Hello $NAME
echo 'Today is $(date +%A)' # Today is $(date +%A)
# $'...' syntax: interprets escape sequences
echo $'Line one\nLine two' # two lines
echo $'Tab\there' # tab separated
Always double-quote your variables. Without quotes, Bash performs word splitting and globbing on the expanded value:
# BAD: filename with spaces gets split into multiple arguments
FILE="my important file.txt"
rm $FILE # tries to remove "my", "important", "file.txt"
# GOOD: quotes preserve the value as a single argument
rm "$FILE" # removes "my important file.txt"
# BAD: glob expansion in variable
PATTERN="*.txt"
echo $PATTERN # expands to matching filenames
# GOOD: glob stays literal
echo "$PATTERN" # prints *.txt
# BAD: breaks if filename has spaces
for f in $(ls *.txt); do echo "$f"; done
# GOOD: handles spaces correctly, no useless use of ls
for f in *.txt; do echo "$f"; done
Common Production Patterns
Log Rotation Script
#!/usr/bin/env bash
set -euo pipefail
LOG_DIR="${1:-/var/log/myapp}"
COMPRESS_AFTER_DAYS=1
DELETE_AFTER_DAYS=90
log() { echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*"; }
if [[ ! -d "$LOG_DIR" ]]; then
log "ERROR: Directory does not exist: $LOG_DIR"
exit 1
fi
# Compress plain log files older than threshold
COMPRESSED=0
while IFS= read -r -d '' logfile; do
gzip "$logfile"
COMPRESSED=$((COMPRESSED + 1)) # not ((COMPRESSED++)): that returns status 1 when the counter is 0 and trips set -e
done < <(find "$LOG_DIR" -name "*.log" -mtime +"$COMPRESS_AFTER_DAYS" -print0)
log "Compressed $COMPRESSED log files"
# Delete old compressed logs
DELETED=0
while IFS= read -r -d '' logfile; do
rm -f "$logfile"
DELETED=$((DELETED + 1)) # same set -e pitfall as above: avoid ((DELETED++))
done < <(find "$LOG_DIR" -name "*.log.gz" -mtime +"$DELETE_AFTER_DAYS" -print0)
log "Deleted $DELETED old compressed log files"
# Report
TOTAL_SIZE=$(du -sh "$LOG_DIR" | cut -f1)
FILE_COUNT=$(find "$LOG_DIR" -type f | wc -l)
log "Log directory: $LOG_DIR | Size: $TOTAL_SIZE | Files: $FILE_COUNT"
Backup Script
#!/usr/bin/env bash
set -euo pipefail
readonly BACKUP_BASE="/opt/backups"
readonly DB_NAME="myapp_production"
readonly DB_USER="backup_user"
readonly S3_BUCKET="s3://mycompany-backups/database"
readonly RETENTION_DAYS=30
TIMESTAMP=$(date +%Y%m%d-%H%M%S)
BACKUP_DIR="$BACKUP_BASE/$TIMESTAMP"
BACKUP_FILE="$BACKUP_DIR/${DB_NAME}-${TIMESTAMP}.sql.gz"
log() { echo "[$(date '+%H:%M:%S')] $*"; }
cleanup() {
local exit_code=$?
if [[ -d "$BACKUP_DIR" ]]; then
rm -rf "$BACKUP_DIR"
fi
if [[ $exit_code -ne 0 ]]; then
log "ERROR: Backup failed with exit code $exit_code"
# Could send a Slack alert or email here
fi
}
trap cleanup EXIT
# Create backup directory
mkdir -p "$BACKUP_DIR"
# Dump and compress database
log "Starting database backup for $DB_NAME"
pg_dump -U "$DB_USER" "$DB_NAME" | gzip > "$BACKUP_FILE" # plain-text dump piped to gzip (custom format -Fc is already compressed)
BACKUP_SIZE=$(du -h "$BACKUP_FILE" | cut -f1)
log "Backup complete: $BACKUP_FILE ($BACKUP_SIZE)"
# Upload to S3
log "Uploading to $S3_BUCKET"
aws s3 cp "$BACKUP_FILE" "$S3_BUCKET/$(basename "$BACKUP_FILE")" --quiet
# Verify upload
if aws s3 ls "$S3_BUCKET/$(basename "$BACKUP_FILE")" &>/dev/null; then
log "Upload verified"
else
log "ERROR: Upload verification failed"
exit 1
fi
# Clean up old remote backups
log "Removing remote backups older than $RETENTION_DAYS days"
CUTOFF_DATE=$(date -d "$RETENTION_DAYS days ago" +%Y%m%d) # GNU date; on BSD/macOS use: date -v-"${RETENTION_DAYS}"d +%Y%m%d
aws s3 ls "$S3_BUCKET/" | while IFS= read -r line; do
FILENAME=$(echo "$line" | awk '{print $4}')
FILE_DATE=$(echo "$FILENAME" | grep -oP '\d{8}' | head -1)
if [[ -n "$FILE_DATE" && "$FILE_DATE" < "$CUTOFF_DATE" ]]; then
aws s3 rm "$S3_BUCKET/$FILENAME" --quiet
log "Removed old backup: $FILENAME"
fi
done
log "Backup pipeline complete"
Health Check Script
#!/usr/bin/env bash
set -euo pipefail
declare -A ENDPOINTS
ENDPOINTS[api]="https://api.example.com/health"
ENDPOINTS[web]="https://www.example.com"
ENDPOINTS[admin]="https://admin.example.com/ping"
ENDPOINTS[cdn]="https://cdn.example.com/status"
TIMEOUT=5
FAILURES=0
RESULTS=()
check_endpoint() {
local name="$1"
local url="$2"
local http_code
# One request only: without -f, curl still exits 0 on HTTP 4xx/5xx, so -w
# always yields a status code; "000" is the fallback for connection failures
http_code=$(curl -s -o /dev/null -w "%{http_code}" \
--max-time "$TIMEOUT" "$url" 2>/dev/null) || http_code="000"
if [[ "$http_code" -ge 200 && "$http_code" -lt 300 ]]; then
echo "[OK] $name ($url) - HTTP $http_code"
else
echo "[FAIL] $name ($url) - HTTP $http_code"
return 1
fi
}
for name in "${!ENDPOINTS[@]}"; do
if ! check_endpoint "$name" "${ENDPOINTS[$name]}"; then
FAILURES=$((FAILURES + 1)) # ((FAILURES++)) would return status 1 and exit under set -e when FAILURES is 0
fi
done
echo ""
if [[ "$FAILURES" -gt 0 ]]; then
echo "ALERT: $FAILURES service(s) unhealthy!"
exit 1
fi
echo "All services healthy."
Deployment Script
#!/usr/bin/env bash
set -euo pipefail
readonly APP_DIR="/opt/myapp"
readonly RELEASES_DIR="$APP_DIR/releases"
readonly SHARED_DIR="$APP_DIR/shared"
readonly CURRENT_LINK="$APP_DIR/current"
readonly REPO_URL="git@github.com:org/myapp.git"
readonly KEEP_RELEASES=5
VERSION="${1:?Usage: $0 VERSION_TAG}"
RELEASE_DIR="$RELEASES_DIR/$VERSION"
log() { echo "[$(date '+%H:%M:%S')] $*"; }
rollback() {
log "Rolling back..."
local previous
# Parsing ls is acceptable here: release dirs are version tags with no spaces or newlines
previous=$(ls -t "$RELEASES_DIR" | grep -v "$VERSION" | head -1)
if [[ -n "$previous" ]]; then
ln -sfn "$RELEASES_DIR/$previous" "$CURRENT_LINK"
sudo systemctl restart myapp
log "Rolled back to $previous"
else
log "No previous release to roll back to"
fi
}
# Prevent concurrent deploys (check-then-create has a small race window; use flock(1) if strict mutual exclusion matters)
LOCKFILE="/var/lock/myapp-deploy.lock"
if [[ -f "$LOCKFILE" ]]; then
log "ERROR: Another deployment is in progress (lockfile exists)"
exit 1
fi
trap 'rm -f "$LOCKFILE"' EXIT
echo "$$" > "$LOCKFILE"
log "Deploying $VERSION..."
# Clone release
if [[ -d "$RELEASE_DIR" ]]; then
log "Release $VERSION already exists, removing..."
rm -rf "$RELEASE_DIR"
fi
git clone --depth 1 --branch "$VERSION" "$REPO_URL" "$RELEASE_DIR"
# Link shared resources (env files, uploads, etc.)
ln -sfn "$SHARED_DIR/.env" "$RELEASE_DIR/.env"
ln -sfn "$SHARED_DIR/uploads" "$RELEASE_DIR/public/uploads"
# Install dependencies
log "Installing dependencies..."
cd "$RELEASE_DIR"
npm ci --production
# Build
log "Building application..."
npm run build
# Switch symlink atomically
ln -sfn "$RELEASE_DIR" "$CURRENT_LINK"
# Restart application
log "Restarting application..."
sudo systemctl restart myapp
# Health check with retry
log "Running health checks..."
HEALTHY=false
for i in {1..15}; do
if curl -sf http://localhost:3000/health > /dev/null 2>&1; then
HEALTHY=true
break
fi
echo " Health check attempt $i/15..."
sleep 2
done
if [[ "$HEALTHY" == "false" ]]; then
log "ERROR: Health check failed after 30 seconds"
rollback
exit 1
fi
log "Deployment successful"
# Cleanup old releases
cd "$RELEASES_DIR"
ls -t | tail -n +"$((KEEP_RELEASES + 1))" | while IFS= read -r old_release; do
log "Removing old release: $old_release"
rm -rf "$old_release"
done
log "Done. $VERSION is live."
Debugging
set -x and PS4
The most powerful debugging tool is set -x, which prints every command before it executes:
#!/usr/bin/env bash
set -euo pipefail
set -x # Enable debug tracing
# Customize the debug prefix for more context
export PS4='+${BASH_SOURCE}:${LINENO}: ${FUNCNAME[0]:+${FUNCNAME[0]}(): }'
# Now every command prints with file, line number, and function name
# +myscript.sh:15: main(): RESULT=hello
You can enable debugging for just a section of your script:
# Debug only the tricky part
set -x
problematic_function "$arg1" "$arg2"
set +x # Disable debug tracing
Syntax Checking and Linting
# Check syntax without running the script
bash -n myscript.sh
# ShellCheck: the essential Bash linter
# Install: apt install shellcheck (or brew install shellcheck)
shellcheck myscript.sh
# ShellCheck catches common mistakes:
# - Unquoted variables
# - Useless use of cat
# - Missing shebang
# - Incorrect test operators
# - And hundreds more
Debugging Techniques
# Print variable state at a specific point
debug() {
if [[ "${DEBUG:-}" == "true" ]]; then
echo "DEBUG: $*" >&2
fi
}
debug "ENVIRONMENT=$ENVIRONMENT VERSION=$VERSION"
# Run your script with debugging: DEBUG=true ./deploy.sh
# Trace a specific function
enable_trace() { set -x; }
disable_trace() { set +x; }
Best Practices and Common Pitfalls
Always quote variables. Unquoted variables are the single most common source of Bash bugs. Word splitting and globbing will eventually bite you in production.
Use [[ ]] instead of [ ]. The double-bracket form is a Bash keyword, not a command, so it handles empty variables, pattern matching, and regex without special gymnastics.
Use set -euo pipefail in every script. Without it, your script will silently continue past failures. This is how "rm -rf" incidents happen: a variable expands to empty because the command that was supposed to set it failed silently.
Use trap for cleanup. Temp files, lock files, and mounted volumes should always be cleaned up, even if the script fails halfway through.
Avoid parsing ls output. Filenames can contain spaces, newlines, and special characters. Use globs or find with -print0 instead.
Use mktemp for temporary files. Never hardcode temp file paths like /tmp/myfile. Concurrent executions will collide.
TMPFILE=$(mktemp)
TMPDIR=$(mktemp -d)
Use readonly for constants. It prevents accidental reassignment and documents intent.
Prefer printf over echo for portability. The echo command behaves differently across systems (especially with -e and -n flags). For anything beyond simple strings, printf is more reliable.
# printf is consistent everywhere
printf "Name: %s\nAge: %d\n" "$name" "$age"
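One printf behavior worth knowing: when there are more arguments than format specifiers, the format string is reused for each group of arguments, which makes quick aligned tables easy (the service/port values are demo data):

```shell
printf '%-10s %6s\n' "service" "port"
table=$(printf '%-10s %6d\n' "web" 80 "api" 3000 "db" 5432)
echo "$table"
echo "rows: $(echo "$table" | wc -l)"   # rows: 3
```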
Use ShellCheck. Run it in CI. It catches real bugs that even experienced developers miss. Every production script should pass ShellCheck without warnings before it gets deployed.
Handle the "glob matches nothing" case. When a glob pattern matches no files, it expands to itself by default. Guard against it:
# This iterates once with the literal string "*.log" if no matches
for f in /var/log/*.log; do
[[ -f "$f" ]] || continue # skip if not a real file
process "$f"
done
# Or set nullglob (Bash option)
shopt -s nullglob
for f in /var/log/*.log; do
process "$f"
done
A poorly written shell script running on a cron job at 2 AM is a time bomb. Write your scripts as carefully as you would any other production code. Use version control, add comments, implement proper error handling, and test thoroughly before deployment. The techniques in this guide give you the foundation to write Bash scripts that are reliable, maintainable, and safe to run in production.