Keepalived VRRP: Automatic Failover for Load Balancers
You set up HAProxy or Nginx as a load balancer, and now your application servers have redundancy. But the load balancer itself is a single point of failure: if it goes down, everything behind it becomes unreachable. Keepalived solves this by providing automatic failover between two or more load balancer nodes using the VRRP protocol. Failover typically completes within a few seconds, with zero client-side configuration changes.
This guide covers the VRRP protocol in depth, complete Keepalived configurations for both master and backup nodes, health check scripting, virtual IP management, split-brain prevention, unicast mode for cloud environments, notification and alerting, integration with HAProxy and Nginx, multi-VIP setups, and a thorough production checklist.
Why You Need Failover
A single load balancer means one server failure takes your entire application offline. Even with the most reliable hardware, you will eventually face kernel panics, NIC failures, misconfigured firewall rules, botched upgrades, or simple power supply failures. The question is not whether your load balancer will fail -- it is when.
The cost of downtime depends on your business, but even a five-minute outage can mean lost revenue, damaged trust, and SLA violations. For any production system that matters, load balancer failover is not optional.
The Standard Architecture
The pattern is simple and proven:
            DNS: example.com
                   |
        Virtual IP (192.168.1.100)
                   |
         +---------+---------+
         |                   |
   LB1 (MASTER)        LB2 (BACKUP)
   192.168.1.10        192.168.1.20
   Keepalived          Keepalived
   HAProxy             HAProxy
         |                   |
         +---------+---------+
                   |
         +---------+---------+
         |         |         |
       App1      App2      App3    (same backends)
Clients and DNS records point to the virtual IP (VIP), not to any physical server. Keepalived ensures exactly one node holds that VIP at any time. When the master becomes unavailable, the backup takes ownership of the VIP within seconds. Clients reconnect transparently -- they never know a failover happened.
Failover Timeline
Here is what happens during a typical failover event:
| Time | Event |
|---|---|
| T+0s | HAProxy crashes on LB1 (MASTER) |
| T+2s | Keepalived health check detects HAProxy is down (interval 2s) |
| T+4s | Second check confirms failure (fall threshold met) |
| T+6s | Third check confirms -- effective priority drops below BACKUP |
| T+7s | LB2 stops receiving VRRP advertisements from LB1 |
| T+8s | LB2 sends gratuitous ARP, claims VIP 192.168.1.100 |
| T+8s | LB2 begins serving traffic as new MASTER |
| T+8s | Network switches update MAC tables |
| T+9s | Existing TCP connections are reset; clients reconnect |
Total failover time: approximately 6-10 seconds depending on health check intervals and advertisement timing. For most web applications, this is imperceptible to users because browsers automatically retry failed requests.
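The T+6s detection row follows directly from the health-check parameters used later in this guide (`interval 2`, `fall 3`); a quick sanity check of that arithmetic:

```shell
#!/bin/sh
# Detection window for a script-based failure: "fall" consecutive misses,
# one every "interval" seconds (values from chk_haproxy in this guide).
interval=2
fall=3
detection=$((fall * interval))
echo "health-check detection window: ${detection}s"
```

Shorter intervals or a lower `fall` threshold shrink this window at the cost of more false positives.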
VRRP Protocol Deep Dive
VRRP (Virtual Router Redundancy Protocol) is defined in RFC 3768 (version 2) and RFC 5798 (version 3). It was designed specifically for router redundancy but works equally well for load balancer failover. Understanding the protocol helps you debug issues and tune parameters.
Core Concepts
- Virtual IP (VIP): A floating IP address that is not permanently assigned to any physical interface. The active MASTER node binds it to its network interface. When failover occurs, the new MASTER binds the VIP and sends a gratuitous ARP to update network switches.
- VRID (Virtual Router ID): An identifier from 1 to 255 that groups nodes into a failover set. All nodes competing for the same VIP must share the same VRID. You can run multiple VRIDs on the same network for different VIPs.
- Priority: A value from 1 to 254 that determines which node becomes MASTER. The highest priority wins. Default is 100. The value 255 is reserved for the IP address owner (the node whose physical IP matches the VIP).
- Advertisement Interval: How often the MASTER announces it is alive. Default is 1 second. Shorter intervals mean faster failover detection but more network traffic.
- Preemption: Whether a higher-priority node reclaims the VIP after recovering from a failure. Preemption is enabled by default but is often disabled in production to avoid unnecessary double-failovers.
How VRRP Elections Work
- All nodes in a VRID start up and compare priorities.
- The node with the highest effective priority (base priority plus/minus health check adjustments) becomes MASTER.
- The MASTER multicasts VRRP advertisements to `224.0.0.18` (or unicasts to configured peers) every `advert_int` seconds.
- BACKUP nodes listen for these advertisements. If a BACKUP misses `(3 * advert_int) + skew_time` worth of advertisements, it assumes the MASTER is down.
- The BACKUP with the highest priority transitions to MASTER, binds the VIP, and sends gratuitous ARPs.
- If multiple BACKUPs exist, the highest-priority one wins. Ties are broken by the highest physical IP address.
The skew time prevents simultaneous transitions when multiple BACKUPs detect the MASTER is down. It is calculated as (256 - priority) / 256 seconds, meaning higher-priority nodes transition faster.
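Plugging this guide's defaults into those formulas shows how long a BACKUP with priority 100 waits before declaring the MASTER dead:

```shell
#!/bin/sh
# master_down_interval = (3 * advert_int) + skew_time
# skew_time = (256 - priority) / 256 seconds
# POSIX shell arithmetic is integer-only, so compute in milliseconds.
advert_int=1      # seconds (the default used in this guide)
priority=100      # the BACKUP node's priority

skew_ms=$(( (256 - priority) * 1000 / 256 ))
down_ms=$(( 3 * advert_int * 1000 + skew_ms ))
echo "skew_time: ${skew_ms} ms"
echo "master_down_interval: ${down_ms} ms"
```

So with `advert_int 1`, a priority-100 BACKUP promotes itself roughly 3.6 seconds after the last advertisement; a priority-110 node would wait slightly less.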
VRRP Packet Structure
A VRRP advertisement contains (this is the VRRPv2 layout from RFC 3768, which Keepalived uses by default):
| Field | Size | Content |
|---|---|---|
| Version | 4 bits | VRRP version (2 for RFC 3768, 3 for RFC 5798) |
| Type | 4 bits | Always 1 (ADVERTISEMENT) |
| Virtual Router ID | 8 bits | VRID (1-255) |
| Priority | 8 bits | Sender's priority (1-254) |
| Count IP Addrs | 8 bits | Number of VIPs |
| Auth Type | 8 bits | Authentication type (removed in VRRPv3) |
| Advert Int | 8 bits | Advertisement interval in seconds (VRRPv3 uses a 12-bit field in centiseconds) |
| Checksum | 16 bits | Standard IP checksum |
| IP Addresses | Variable | List of VIPs |
VRRP uses IP protocol number 112 (not TCP or UDP). This is important for firewall rules -- you must allow protocol 112, not a specific port.
Installing Keepalived
Ubuntu / Debian
sudo apt update
sudo apt install -y keepalived
RHEL / Rocky / Alma
sudo dnf install -y keepalived
Building from Source (Latest Features)
sudo apt install -y build-essential libssl-dev libnl-3-dev libnl-genl-3-dev
wget https://www.keepalived.org/software/keepalived-2.2.8.tar.gz
tar xzf keepalived-2.2.8.tar.gz
cd keepalived-2.2.8
./configure --prefix=/usr --sysconfdir=/etc --enable-json
make -j$(nproc)
sudo make install
Post-Installation Setup
Enable and start the service:
sudo systemctl enable keepalived
sudo systemctl start keepalived
Allow VRRP traffic through the firewall:
# iptables
sudo iptables -A INPUT -p vrrp -j ACCEPT
sudo iptables -A INPUT -d 224.0.0.18/32 -j ACCEPT
# firewalld
sudo firewall-cmd --add-protocol=vrrp --permanent
sudo firewall-cmd --add-rich-rule='rule family="ipv4" destination address="224.0.0.18" accept' --permanent
sudo firewall-cmd --reload
# ufw (no native VRRP protocol support; add a raw rule instead)
# Append to /etc/ufw/before.rules, before the final COMMIT line:
#   -A ufw-before-input -p 112 -j ACCEPT
sudo ufw reload
Enable non-local IP binding so the node can bind the VIP even though it is not its own address:
# /etc/sysctl.d/99-keepalived.conf
net.ipv4.ip_nonlocal_bind = 1
net.ipv4.ip_forward = 1
Apply with sudo sysctl -p /etc/sysctl.d/99-keepalived.conf.
Configuring MASTER and BACKUP Nodes
The configuration file lives at /etc/keepalived/keepalived.conf.
MASTER Node (LB1)
# /etc/keepalived/keepalived.conf on LB1
global_defs {
router_id LB1
enable_script_security
script_user root
max_auto_priority -1
vrrp_garp_master_delay 5
vrrp_garp_master_repeat 3
vrrp_garp_master_refresh 60
vrrp_garp_master_refresh_repeat 2
}
vrrp_script chk_haproxy {
script "/usr/bin/killall -0 haproxy"
interval 2
weight -20
fall 3
rise 2
user root
}
vrrp_script chk_http {
script "/etc/keepalived/check_http.sh"
interval 3
weight -30
fall 2
rise 3
timeout 2
user root
}
vrrp_instance VI_1 {
state MASTER
interface eth0
virtual_router_id 51
priority 110
advert_int 1
authentication {
auth_type PASS
auth_pass MySecret123 # only the first 8 characters are significant
}
virtual_ipaddress {
192.168.1.100/24 dev eth0 label eth0:vip
}
track_script {
chk_haproxy
chk_http
}
track_interface {
eth0 weight -50
}
notify_master "/etc/keepalived/notify.sh MASTER VI_1"
notify_backup "/etc/keepalived/notify.sh BACKUP VI_1"
notify_fault "/etc/keepalived/notify.sh FAULT VI_1"
notify_stop "/etc/keepalived/notify.sh STOP VI_1"
# SMTP notifications (optional)
smtp_alert
}
BACKUP Node (LB2)
# /etc/keepalived/keepalived.conf on LB2
global_defs {
router_id LB2
enable_script_security
script_user root
max_auto_priority -1
vrrp_garp_master_delay 5
vrrp_garp_master_repeat 3
vrrp_garp_master_refresh 60
vrrp_garp_master_refresh_repeat 2
}
vrrp_script chk_haproxy {
script "/usr/bin/killall -0 haproxy"
interval 2
weight -20
fall 3
rise 2
user root
}
vrrp_script chk_http {
script "/etc/keepalived/check_http.sh"
interval 3
weight -30
fall 2
rise 3
timeout 2
user root
}
vrrp_instance VI_1 {
state BACKUP
interface eth0
virtual_router_id 51
priority 100
advert_int 1
authentication {
auth_type PASS
auth_pass MySecret123 # only the first 8 characters are significant
}
virtual_ipaddress {
192.168.1.100/24 dev eth0 label eth0:vip
}
track_script {
chk_haproxy
chk_http
}
track_interface {
eth0 weight -50
}
notify_master "/etc/keepalived/notify.sh MASTER VI_1"
notify_backup "/etc/keepalived/notify.sh BACKUP VI_1"
notify_fault "/etc/keepalived/notify.sh FAULT VI_1"
notify_stop "/etc/keepalived/notify.sh STOP VI_1"
smtp_alert
}
The differences between MASTER and BACKUP configs are minimal: state, priority, and router_id. Everything else is identical. This makes configuration management straightforward.
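To guard against drift, you can diff the two files and confirm that every changed line touches one of the three expected keys. A sketch using stand-in fragments under /tmp (substitute the real /etc/keepalived/keepalived.conf files fetched from each node):

```shell
#!/bin/sh
# The MASTER/BACKUP configs should differ only in router_id, state, and
# priority. Stand-in fragments below; replace with the real files.
cat > /tmp/lb1.conf <<'EOF'
router_id LB1
state MASTER
priority 110
virtual_router_id 51
advert_int 1
EOF
cat > /tmp/lb2.conf <<'EOF'
router_id LB2
state BACKUP
priority 100
virtual_router_id 51
advert_int 1
EOF

# Keep only changed lines, extract the key names, drop the expected ones.
unexpected=$(diff /tmp/lb1.conf /tmp/lb2.conf | grep '^[<>]' \
  | awk '{print $2}' | sort -u \
  | grep -v -e '^router_id$' -e '^state$' -e '^priority$' || true)
if [ -z "$unexpected" ]; then
  echo "configs differ only in expected keys"
else
  echo "unexpected drift in: $unexpected"
fi
```

Running this as part of a deploy pipeline catches accidental edits (e.g. a `virtual_router_id` changed on only one node) before they cause a failed failover.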
Configuration Parameter Reference
| Parameter | MASTER Value | BACKUP Value | Purpose |
|---|---|---|---|
state | MASTER | BACKUP | Initial role on startup |
priority | 110 | 100 | Election priority (higher wins) |
router_id | LB1 | LB2 | Unique identifier in logs |
virtual_router_id | 51 | 51 | Must match on both nodes |
advert_int | 1 | 1 | Must match on both nodes |
auth_pass | MySecret123 | MySecret123 | Must match on both nodes |
interface | eth0 | eth0 | Must be the same network segment |
Gratuitous ARP Tuning
The vrrp_garp_* parameters control how aggressively the new MASTER announces its ownership of the VIP:
| Parameter | Value | Purpose |
|---|---|---|
vrrp_garp_master_delay | 5 | Delay (seconds) before first GARP after becoming MASTER |
vrrp_garp_master_repeat | 3 | Number of GARP packets to send initially |
vrrp_garp_master_refresh | 60 | Send periodic GARPs every 60 seconds |
vrrp_garp_master_refresh_repeat | 2 | Number of GARP packets per refresh |
Periodic GARP refresh is important in environments with switches that have short ARP cache timeouts. Without it, some switches may "forget" the MAC-to-IP mapping and fail to route traffic to the MASTER.
Virtual IP Addresses
The VIP is the address your DNS records point to. When the MASTER takes over, it sends a gratuitous ARP announcing it now owns that IP, causing network switches to update their MAC tables immediately.
Single VIP
virtual_ipaddress {
192.168.1.100/24 dev eth0 label eth0:vip
}
The label eth0:vip makes the VIP visible with a clear name in ip addr show output, which helps during debugging.
Multiple VIPs
You can assign multiple VIPs for different services:
virtual_ipaddress {
192.168.1.100/24 dev eth0 label eth0:web
192.168.1.101/24 dev eth0 label eth0:api
192.168.1.102/24 dev eth0 label eth0:admin
}
Multi-Interface VIPs
For environments with separate networks for different traffic types:
virtual_ipaddress {
192.168.1.100/24 dev eth0 label eth0:vip
10.0.0.100/24 dev eth1 label eth1:internal
}
Active-Active with Multiple VRRP Instances
Instead of having one active and one standby node, you can run active-active by splitting VIPs across two VRRP instances. Each node is MASTER for one VIP and BACKUP for the other:
# On LB1:
vrrp_instance VI_WEB {
state MASTER
virtual_router_id 51
priority 110
virtual_ipaddress {
192.168.1.100/24
}
}
vrrp_instance VI_API {
state BACKUP
virtual_router_id 52
priority 100
virtual_ipaddress {
192.168.1.101/24
}
}
# On LB2:
vrrp_instance VI_WEB {
state BACKUP
virtual_router_id 51
priority 100
virtual_ipaddress {
192.168.1.100/24
}
}
vrrp_instance VI_API {
state MASTER
virtual_router_id 52
priority 110
virtual_ipaddress {
192.168.1.101/24
}
}
With this setup, DNS routes www.example.com to 192.168.1.100 (handled by LB1) and api.example.com to 192.168.1.101 (handled by LB2). Both nodes are actively serving traffic, and either can absorb the other's load during a failure.
Health Check Scripts
The vrrp_script block defines health checks that influence the MASTER election. When a check fails, the node's effective priority drops by the weight value. If this drops the MASTER below the BACKUP's priority, failover occurs.
Basic Process Check
vrrp_script chk_haproxy {
script "/usr/bin/killall -0 haproxy"
interval 2 # Check every 2 seconds
weight -20 # Subtract 20 from priority on failure
fall 3 # Mark as failed after 3 consecutive failures
rise 2 # Mark as recovered after 2 consecutive successes
timeout 2 # Script timeout in seconds
user root # User to run the script as
}
The killall -0 command checks if a process exists without sending any signal. Exit code 0 means the process is running; non-zero means it is not found.
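`kill -0` uses the same mechanism and is easy to demonstrate with the shell's own PID, which is guaranteed to exist:

```shell
#!/bin/sh
# Signal 0 delivers nothing; the kernel only checks that the target process
# exists and that we have permission to signal it.
if kill -0 "$$" 2>/dev/null; then
  echo "process $$ is running"
fi

# A PID that (almost certainly) does not exist yields a non-zero status:
kill -0 4194303 2>/dev/null || echo "no such process"
```

This is why the check is cheap enough to run every 2 seconds, but also why it proves only that a process exists, not that it is serving traffic.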
HTTP Health Check
For a deeper check that verifies the load balancer is actually serving traffic, not just that the process exists:
#!/bin/bash
# /etc/keepalived/check_http.sh
# Check if HAProxy is responding to HTTP requests
response=$(curl -s -o /dev/null -w "%{http_code}" \
http://127.0.0.1:80/health \
--max-time 2 \
--connect-timeout 1)
if [ "$response" = "200" ]; then
exit 0
else
logger -t keepalived "HTTP health check failed: status=$response"
exit 1
fi
HAProxy Stats Socket Check
The most thorough check queries HAProxy's stats socket to verify backends are available:
#!/bin/bash
# /etc/keepalived/check_haproxy_backends.sh
# Check if HAProxy is running
/usr/bin/killall -0 haproxy 2>/dev/null || exit 1
# Check if at least one backend server is UP
backend_status=$(echo "show stat" | socat stdio /run/haproxy/admin.sock 2>/dev/null | \
grep "web_servers" | grep -c "UP")
if [ "$backend_status" -ge 1 ]; then
exit 0
else
logger -t keepalived "No healthy backends in web_servers pool"
exit 1
fi
Nginx Health Check
For Keepalived paired with Nginx instead of HAProxy:
#!/bin/bash
# /etc/keepalived/check_nginx.sh
# Check if Nginx master process is running
if ! /usr/bin/pgrep -x nginx > /dev/null 2>&1; then
logger -t keepalived "Nginx process not found"
exit 1
fi
# Check if Nginx is actually responding
response=$(curl -s -o /dev/null -w "%{http_code}" \
http://127.0.0.1/nginx_status \
--max-time 2 \
--connect-timeout 1)
if [ "$response" = "200" ]; then
exit 0
else
logger -t keepalived "Nginx status check failed: status=$response"
exit 1
fi
Make all scripts executable:
chmod +x /etc/keepalived/check_http.sh
chmod +x /etc/keepalived/check_haproxy_backends.sh
chmod +x /etc/keepalived/check_nginx.sh
Weight Calculation
Understanding how weights interact with priorities is essential for correct failover behavior:
| Scenario | MASTER Priority | Weight | Effective Priority | BACKUP Priority | Result |
|---|---|---|---|---|---|
| All healthy | 110 | 0 | 110 | 100 | LB1 is MASTER |
| HAProxy down (-20) | 110 | -20 | 90 | 100 | LB2 takes over |
| HTTP check down (-30) | 110 | -30 | 80 | 100 | LB2 takes over |
| Both checks fail | 110 | -50 | 60 | 100 | LB2 takes over |
The rule: set weights so that the worst-case effective priority of the MASTER is always below the BACKUP's priority. If the MASTER is 110 and the BACKUP is 100, any weight of -11 or lower (more negative) will trigger failover on a single check failure.
Warning: if the weight is too small (e.g., -5 with priorities 110/100), a health check failure will not cause failover because the MASTER's effective priority (105) is still higher than the BACKUP (100). Always verify your math.
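The arithmetic is worth checking explicitly for every check you define; a worked version of both cases above:

```shell
#!/bin/sh
# Effective priority = base priority + weight (weights are negative).
master_prio=110
backup_prio=100

# chk_haproxy failing (weight -20):
effective=$((master_prio - 20))
echo "HAProxy check down: effective=$effective (BACKUP=$backup_prio)"
[ "$effective" -lt "$backup_prio" ] && echo "-> failover occurs"

# A weight that is too small (-5) never drops below the BACKUP:
effective2=$((master_prio - 5))
echo "weight -5: effective=$effective2 (BACKUP=$backup_prio)"
[ "$effective2" -lt "$backup_prio" ] || echo "-> no failover, weight too small"
```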
Track Script and Track Interface
Track Script
The track_script directive inside vrrp_instance links health checks to the VRRP instance:
vrrp_instance VI_1 {
track_script {
chk_haproxy weight -20
chk_http weight -30
}
}
You can override weights in the track_script block, which takes precedence over the weight defined in the vrrp_script block.
Track Interface
Monitor network interfaces. If a tracked interface goes down, the node immediately transitions to FAULT state:
vrrp_instance VI_1 {
track_interface {
eth0 weight -50
eth1 weight -30
}
}
Interface tracking is faster than script-based checks because Keepalived receives kernel notifications instantly when an interface state changes.
Track File
Keepalived can also track the contents of a file, which is useful for manual failover triggers:
vrrp_track_file manual_override {
file /etc/keepalived/manual_priority
weight -100
}
vrrp_instance VI_1 {
track_file {
manual_override
}
}
Write a value to trigger manual failover:
# Force this node to give up MASTER
echo 1 > /etc/keepalived/manual_priority
# Return to normal
echo 0 > /etc/keepalived/manual_priority
Notification Scripts
Notification scripts run when state transitions occur. Use them for logging, alerting, running post-failover tasks, or triggering external automation:
#!/bin/bash
# /etc/keepalived/notify.sh
STATE=$1
INSTANCE=$2
HOSTNAME=$(hostname)
TIMESTAMP=$(date '+%Y-%m-%d %H:%M:%S')
LOGFILE="/var/log/keepalived-state.log"
# Log the transition
echo "$TIMESTAMP - $HOSTNAME transitioning to $STATE for $INSTANCE" >> "$LOGFILE"
case $STATE in
"MASTER")
logger -t keepalived "$HOSTNAME is now MASTER for $INSTANCE"
# Send Slack notification
if [ -n "$SLACK_WEBHOOK_URL" ]; then
curl -s -X POST "$SLACK_WEBHOOK_URL" \
-H 'Content-type: application/json' \
-d "{\"text\":\"$HOSTNAME is now MASTER for $INSTANCE (VIP 192.168.1.100)\"}" \
> /dev/null 2>&1
fi
# Send PagerDuty event
# curl -s -X POST https://events.pagerduty.com/v2/enqueue ...
# Ensure HAProxy is running on this node
systemctl is-active --quiet haproxy || systemctl start haproxy
;;
"BACKUP")
logger -t keepalived "$HOSTNAME is now BACKUP for $INSTANCE"
# Optional: verify HAProxy is also running on BACKUP
# (should be running and ready to serve if it becomes MASTER)
systemctl is-active --quiet haproxy || systemctl start haproxy
;;
"FAULT")
logger -t keepalived "$HOSTNAME is in FAULT state for $INSTANCE"
# Attempt to recover
systemctl restart haproxy
sleep 2
# If recovery worked, Keepalived will detect it via health checks
# and transition back to BACKUP or MASTER
;;
"STOP")
logger -t keepalived "Keepalived stopped on $HOSTNAME for $INSTANCE"
;;
esac
Make it executable:
chmod +x /etc/keepalived/notify.sh
Preempt vs Nopreempt
By default, VRRP is preemptive -- if the original MASTER recovers, it reclaims the VIP from the BACKUP. This causes two failover events: one when the MASTER fails, and another when it recovers. Each failover disrupts active connections.
Preempt Mode (Default)
T0: LB1 is MASTER (priority 110)
T1: LB1 HAProxy crashes
T2: LB2 becomes MASTER (failover #1 -- connections disrupted)
T5: LB1 HAProxy is restarted
T6: LB1 reclaims MASTER (failover #2 -- connections disrupted again)
Nopreempt Mode (Recommended)
# IMPORTANT: Both nodes must have state BACKUP for nopreempt to work
# LB1:
vrrp_instance VI_1 {
state BACKUP # Not MASTER
nopreempt
priority 110 # Higher priority still wins on simultaneous startup
}
# LB2:
vrrp_instance VI_1 {
state BACKUP
nopreempt
priority 100
}
With nopreempt:
T0: LB1 becomes MASTER on startup (higher priority)
T1: LB1 HAProxy crashes
T2: LB2 becomes MASTER (failover #1 -- connections disrupted)
T5: LB1 HAProxy is restarted
T6: LB1 becomes BACKUP -- VIP stays on LB2 (no second disruption)
Nopreempt is recommended for production because it minimizes unnecessary failovers. The VIP stays on whichever node is currently serving successfully.
Critical note: both nodes must have state BACKUP for nopreempt to work correctly. If you set one to state MASTER, it will always preempt regardless of the nopreempt directive.
Split-Brain Scenarios and Unicast Mode
Split-brain is the most dangerous failure mode in any HA system. It occurs when both nodes believe they are MASTER and both hold the same VIP. This causes IP conflicts, unpredictable routing, and data corruption if both nodes process writes.
What Causes Split-Brain
- Network partition: The link between LB1 and LB2 fails, but both can still reach clients. Neither sees the other's VRRP advertisements, so both promote themselves.
- Multicast filtering: A switch or firewall blocks multicast traffic (224.0.0.18). VRRP advertisements do not reach the peer.
- CPU starvation: The MASTER is so overloaded that it cannot send VRRP advertisements on time. The BACKUP assumes it is dead.
- Firewall misconfiguration: Protocol 112 (VRRP) is blocked between the nodes.
Unicast Mode
The most reliable prevention is to switch from multicast to unicast. Each node sends VRRP advertisements directly to its peer's IP:
# LB1:
vrrp_instance VI_1 {
state BACKUP
interface eth0
virtual_router_id 51
priority 110
nopreempt
unicast_src_ip 192.168.1.10
unicast_peer {
192.168.1.20
}
virtual_ipaddress {
192.168.1.100/24
}
}
# LB2:
vrrp_instance VI_1 {
state BACKUP
interface eth0
virtual_router_id 51
priority 100
nopreempt
unicast_src_ip 192.168.1.20
unicast_peer {
192.168.1.10
}
virtual_ipaddress {
192.168.1.100/24
}
}
Unicast mode is required in most cloud environments (AWS, GCP, Azure, DigitalOcean) where multicast traffic is not supported between instances.
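Note that in those clouds, gratuitous ARP alone will not move traffic: the SDN ignores layer-2 announcements, so the usual pattern is to reassign the VIP via the provider API from a `notify_master` hook. A dry-run sketch for AWS; the ENI ID is a placeholder, and `DRY_RUN=1` only prints the command it would run:

```shell
#!/bin/sh
# Hypothetical notify_master hook for AWS (sketch, not a drop-in script).
# Reassigns the VIP as a secondary private IP to this node's ENI.
DRY_RUN=1
VIP="192.168.1.100"
ENI_ID="eni-0123456789abcdef0"   # placeholder for this node's interface ID

cmd="aws ec2 assign-private-ip-addresses --network-interface-id $ENI_ID --private-ip-addresses $VIP --allow-reassignment"

if [ "$DRY_RUN" = "1" ]; then
  echo "would run: $cmd"
else
  $cmd
fi
```

The instance also needs an IAM role permitting `ec2:AssignPrivateIpAddresses`; equivalent API calls exist on GCP and Azure.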
Gateway-Aware Fencing
A fencing script adds an extra layer of protection. Before a node promotes itself to MASTER, it checks whether it can reach the network gateway. If it cannot, the node is likely isolated and should not take the VIP:
#!/bin/bash
# /etc/keepalived/check_fence.sh
#
# Logic:
# - If we CAN reach the gateway, we are properly connected to the network.
# - If we CANNOT reach the gateway, WE might be the isolated node,
# so we should drop our priority and NOT become MASTER.
GATEWAY="192.168.1.1"
PEER="192.168.1.20"
DNS_SERVER="8.8.8.8"
# Check multiple network endpoints to avoid false positives
gateway_ok=0
peer_ok=0
dns_ok=0
ping -c 1 -W 1 $GATEWAY > /dev/null 2>&1 && gateway_ok=1
ping -c 1 -W 1 $PEER > /dev/null 2>&1 && peer_ok=1
ping -c 1 -W 1 $DNS_SERVER > /dev/null 2>&1 && dns_ok=1
total=$((gateway_ok + peer_ok + dns_ok))
if [ $total -ge 2 ]; then
# We can reach at least 2 out of 3 endpoints -- network is healthy
exit 0
elif [ $gateway_ok -eq 0 ] && [ $dns_ok -eq 0 ]; then
# Cannot reach gateway or DNS -- we are likely isolated
logger -t keepalived "FENCE: Cannot reach gateway or DNS, dropping priority"
exit 1
else
# Ambiguous state -- err on the side of caution
exit 0
fi
vrrp_script chk_fence {
script "/etc/keepalived/check_fence.sh"
interval 5
weight -100
fall 2
rise 3
}
# Remember to add chk_fence to the track_script block of the vrrp_instance
The weight of -100 ensures that an isolated node will always have lower priority than any healthy node, preventing split-brain.
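The 2-of-3 quorum decision in the fence script is worth unit-testing on its own; a standalone sketch with the ping results mocked as arguments:

```shell
#!/bin/sh
# Same decision logic as check_fence.sh, with reachability results passed in
# as 0/1 flags so the branches can be exercised without a network.
decide() {
  gateway_ok=$1; peer_ok=$2; dns_ok=$3
  total=$((gateway_ok + peer_ok + dns_ok))
  if [ "$total" -ge 2 ]; then
    echo healthy        # 2 of 3 reachable: keep normal priority
  elif [ "$gateway_ok" -eq 0 ] && [ "$dns_ok" -eq 0 ]; then
    echo isolated       # gateway AND dns unreachable: drop priority
  else
    echo ambiguous      # err on the side of caution (treated as healthy)
  fi
}

decide 1 1 1   # everything reachable
decide 0 1 0   # only the peer answers: likely isolated
decide 1 0 0   # gateway up, peer and DNS down: ambiguous
```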
Additional Split-Brain Defenses
| Defense | How It Works | Limitation |
|---|---|---|
| Unicast mode | Direct peer communication instead of multicast | Requires knowing peer IPs statically |
| Gateway fencing | Check network connectivity before promoting | Adds latency to failover |
| VRRP sync groups | Group multiple instances that must fail together | Complexity |
| ARP monitoring | Verify ARP table updates after failover | Platform-dependent |
| External witness | Third node or service confirms which node should be MASTER | Additional infrastructure |
Testing Failover
Test failover systematically before trusting it in production. Never deploy Keepalived without testing every failure mode.
Test 1: Service Failure
# On LB1 (MASTER), verify VIP is present
ip addr show eth0 | grep 192.168.1.100
# Stop HAProxy to simulate a crash
sudo systemctl stop haproxy
# On LB2, verify VIP has moved (should take 6-10 seconds)
watch -n 1 'ip addr show eth0 | grep 192.168.1.100'
# Check keepalived logs
sudo journalctl -u keepalived -f
# Restore HAProxy on LB1
sudo systemctl start haproxy
# With nopreempt: VIP stays on LB2
# With preempt: VIP moves back to LB1
Test 2: Full Server Reboot
# Reboot the MASTER
sudo reboot
# On LB2, verify it becomes MASTER within seconds
# After LB1 comes back, verify it stays as BACKUP (with nopreempt)
Test 3: Network Interface Failure
# On LB1, bring down the network interface
sudo ip link set eth0 down
# Verify LB2 takes over immediately (track_interface)
# Bring interface back up
sudo ip link set eth0 up
Test 4: Keepalived Service Failure
# Stop Keepalived on the MASTER -- a graceful stop sends a priority-0
# advertisement, so the VIP is released immediately
sudo systemctl stop keepalived
# LB2 should take over almost instantly
# To drill a hard crash instead, kill -9 the process; the BACKUP must then
# wait out the full master-down interval, and the crashed node keeps the VIP
# bound until you clean it up -- a useful split-brain exercise
Test 5: Simultaneous Startup
# Stop keepalived on both nodes
sudo systemctl stop keepalived # on LB1
sudo systemctl stop keepalived # on LB2
# Start both at the same time
# The higher-priority node should become MASTER
sudo systemctl start keepalived # on both nodes simultaneously
Test 6: Load Under Failover
# Run continuous HTTP requests during failover
while true; do
curl -s -o /dev/null -w "%{http_code} %{time_total}s\n" http://192.168.1.100/health
sleep 0.1
done
# Trigger failover and observe:
# - How many requests fail (expect 1-3)
# - How long until recovery (expect under 10 seconds)
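After the drill, you can quantify the blip from the loop's captured output. A sketch over sample log lines; the /tmp path and the figures are illustrative:

```shell
#!/bin/sh
# Analyze the curl loop's output after a failover drill: each line is
# "<status> <time>s" at roughly 10 requests/second. curl reports 000 when
# the connection itself failed.
cat > /tmp/failover.log <<'EOF'
200 0.012s
200 0.011s
000 2.001s
000 2.002s
200 0.015s
200 0.013s
EOF

failed=$(grep -cv '^200 ' /tmp/failover.log)
total=$(wc -l < /tmp/failover.log)
echo "failed requests: $failed of $total"
```

Comparing the timestamp of the first and last non-200 line gives the outage window; anything beyond the expected 6-10 seconds points at GARP or switch-cache problems.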
Integration with Configuration Management
Keep configurations synchronized between nodes using automation. Manual configuration drift is the number one cause of failover failures.
Ansible Playbook
---
- name: Configure Keepalived HA Pair
hosts: load_balancers
become: true
vars:
vip: "192.168.1.100"
vrid: 51
auth_pass: "{{ vault_keepalived_pass }}"
tasks:
- name: Install Keepalived
apt:
name: keepalived
state: present
- name: Deploy HAProxy config
template:
src: haproxy.cfg.j2
dest: /etc/haproxy/haproxy.cfg
validate: "haproxy -c -f %s"
notify: reload haproxy
- name: Deploy Keepalived config
template:
src: keepalived.conf.j2
dest: /etc/keepalived/keepalived.conf
notify: restart keepalived
- name: Deploy health check scripts
copy:
src: "{{ item }}"
dest: "/etc/keepalived/{{ item }}"
mode: "0755"
loop:
- check_http.sh
- check_fence.sh
- notify.sh
notify: restart keepalived
- name: Enable sysctl settings
sysctl:
name: "{{ item.key }}"
value: "{{ item.value }}"
sysctl_file: /etc/sysctl.d/99-keepalived.conf
reload: true
loop:
- { key: "net.ipv4.ip_nonlocal_bind", value: "1" }
- { key: "net.ipv4.ip_forward", value: "1" }
handlers:
- name: reload haproxy
systemd:
name: haproxy
state: reloaded
- name: restart keepalived
systemd:
name: keepalived
state: restarted
Keepalived Jinja2 Template
# keepalived.conf.j2
global_defs {
router_id {{ inventory_hostname }}
enable_script_security
script_user root
}
vrrp_script chk_haproxy {
script "/usr/bin/killall -0 haproxy"
interval 2
weight -20
fall 3
rise 2
}
vrrp_instance VI_1 {
state BACKUP
nopreempt
interface {{ ansible_default_ipv4.interface }}
virtual_router_id {{ vrid }}
priority {{ 110 if inventory_hostname == groups['load_balancers'][0] else 100 }}
advert_int 1
unicast_src_ip {{ ansible_default_ipv4.address }}
unicast_peer {
{% for host in groups['load_balancers'] %}
{% if host != inventory_hostname %}
{{ hostvars[host].ansible_default_ipv4.address }}
{% endif %}
{% endfor %}
}
authentication {
auth_type PASS
auth_pass {{ auth_pass }}
}
virtual_ipaddress {
{{ vip }}/24
}
track_script {
chk_haproxy
}
notify_master "/etc/keepalived/notify.sh MASTER VI_1"
notify_backup "/etc/keepalived/notify.sh BACKUP VI_1"
notify_fault "/etc/keepalived/notify.sh FAULT VI_1"
}
Monitoring and Observability
Log Monitoring
# Watch state transitions in real time
sudo journalctl -u keepalived -f
# Check current VRRP state
sudo journalctl -u keepalived | grep -i "entering"
# View custom state log
tail -f /var/log/keepalived-state.log
Prometheus Monitoring
Use the keepalived_exporter to expose metrics:
# Install
wget https://github.com/mehdy/keepalived-exporter/releases/latest/download/keepalived-exporter-linux-amd64
chmod +x keepalived-exporter-linux-amd64
sudo mv keepalived-exporter-linux-amd64 /usr/local/bin/keepalived-exporter
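The release is a bare binary, so you will also want a service unit. A minimal sketch, assuming the binary path used above; check which flags (listen address, metrics path) the release you downloaded actually supports:

```ini
# /etc/systemd/system/keepalived-exporter.service (sketch)
[Unit]
Description=Keepalived Prometheus exporter
After=network-online.target keepalived.service

[Service]
ExecStart=/usr/local/bin/keepalived-exporter
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Enable it with `systemctl daemon-reload && systemctl enable --now keepalived-exporter`, then point Prometheus at the exporter's scrape port.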
Key metrics to alert on:
| Metric | Alert Condition | Meaning |
|---|---|---|
keepalived_vrrp_state | Changed unexpectedly | VRRP state transition occurred |
keepalived_vrrp_state{state="master"} | Both nodes show master | Split-brain detected |
keepalived_script_status | 0 (failed) | Health check script failing |
keepalived_gratuitous_arp_delay | High value | ARP announcement delays |
Quick Status Check Script
#!/bin/bash
# /usr/local/bin/keepalived-status.sh
echo "=== Keepalived Status ==="
echo ""
# Service status
systemctl is-active --quiet keepalived && echo "Service: RUNNING" || echo "Service: STOPPED"
# VIP status
if ip addr show | grep -q "192.168.1.100"; then
echo "VIP: HELD (this node is MASTER)"
else
echo "VIP: NOT HELD (this node is BACKUP)"
fi
# HAProxy status
systemctl is-active --quiet haproxy && echo "HAProxy: RUNNING" || echo "HAProxy: STOPPED"
# Recent state changes
echo ""
echo "=== Recent State Changes ==="
tail -5 /var/log/keepalived-state.log 2>/dev/null || echo "No state log found"
Production Checklist
Before going live with Keepalived failover, verify every item on this list:
Configuration
- Identical HAProxy/Nginx configurations on all LB nodes
- Virtual router ID is unique on the network segment
- Authentication password is set and matches on all nodes
- `nopreempt` enabled to avoid unnecessary double-failovers
- Both nodes set to `state BACKUP` when using `nopreempt`
Network
- Unicast mode enabled (required for cloud, recommended everywhere)
- Firewall rules allow VRRP protocol (112) between nodes
- `net.ipv4.ip_nonlocal_bind = 1` set in sysctl
- Gratuitous ARP parameters tuned for your switch environment
Health Checks
- Health check scripts are executable and tested independently
- Weight values calculated correctly: MASTER effective priority after failure must be less than BACKUP priority
- `fall` and `rise` thresholds balance speed vs. stability
- Scripts have timeouts to prevent hanging
Alerting
- Notification scripts working and sending alerts to your on-call channel
- Alerts fire on state transitions (especially unexpected MASTER changes)
- Split-brain detection alert configured (both nodes reporting MASTER)
Testing
- Tested service failure (HAProxy/Nginx crash)
- Tested full server reboot of the MASTER
- Tested network interface failure (`ip link set down`)
- Tested Keepalived service failure
- Tested simultaneous startup of both nodes
- Tested failback behavior (with nopreempt, VIP stays)
- Measured failover time under load
Monitoring
- Logs being collected from `journalctl -u keepalived` and `/var/log/keepalived-state.log`
- VIP ownership tracked in monitoring dashboard
- HAProxy/Nginx health tracked separately from Keepalived state
Key Takeaways
- Keepalived with VRRP gives you sub-ten-second failover for load balancers with minimal complexity. It is the standard solution used across the industry.
- Use `nopreempt` with both nodes set to `state BACKUP` to avoid unnecessary VIP flapping when the original MASTER recovers. This is the single most impactful configuration choice.
- Always use unicast mode in cloud environments where multicast is not available. Even in on-premise environments, unicast is more predictable.
- Health check scripts should verify the actual service is working (HTTP response, backend availability), not just that a process ID exists. A running process that does not respond is worse than a crashed one.
- Set weight values so that a failed health check drops the MASTER's effective priority below the BACKUP's. Always verify your math: MASTER_PRIORITY + WEIGHT (with the weight being negative) must be less than BACKUP_PRIORITY.
- Implement gateway-aware fencing to prevent split-brain. If a node cannot reach the network gateway, it should not hold the VIP.
- Test every failure mode before going to production: service crash, full reboot, network interface down, Keepalived stop, simultaneous startup. Untested failover is not failover -- it is hope.
- Keep configurations synchronized between nodes using Ansible, Salt, or Puppet. Manual configuration drift is the top cause of failover failures.
- Monitor VRRP state transitions. An unexpected MASTER change is often the first sign of a deeper network or application problem.
- Active-active setups with multiple VRRP instances let you utilize both load balancer nodes under normal conditions, not just during failures.