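Keepalived + HAProxy High-Availability Cluster Installation and Deployment Guide
Document version
- Version: v1.0
- Applicable environment: VIP management for a highly available Kubernetes control plane
1. Environment planning
1.1 Server information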
| Hostname | IP address | Role | Keepalived priority | VIP |
|---|---|---|---|---|
| node1 | 100.100.157.10 | Load balancer node 1 | 101 | 100.100.157.200 |
| node2 | 100.100.157.11 | Load balancer node 2 | 100 | 100.100.157.200 |
| node3 | 100.100.157.12 | Load balancer node 3 | 99 | 100.100.157.200 |
1.2 Network planning
- Virtual IP (VIP): 100.100.157.200/24
- Service port: 6443 (Kubernetes API Server)
- VRRP group ID (virtual_router_id): 51
- Authentication password: k8svip
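How the planned priorities compare in an election can be checked with a little arithmetic. The sketch below assumes the health-check weight of 2 used by the Keepalived configuration in section 4.2, where a positive weight is added to the base priority while the check script succeeds. The function names are illustrative only, and this is just the priority comparison: with nopreempt configured, a running MASTER is not necessarily displaced because another node's priority becomes higher.

```shell
#!/bin/sh
# Effective VRRP priority for one node.
#   $1 = base priority, $2 = script weight (positive), $3 = "ok" or "fail"
effective_priority() {
    if [ "$3" = "ok" ]; then
        echo $(( $1 + $2 ))
    else
        echo "$1"
    fi
}

# Print the node with the highest effective priority and that priority.
# Input on stdin: one "name base_priority haproxy_state" triple per line.
# Ties are not resolved here; the first node with the highest value is printed.
expected_master() {
    weight=$1
    best_name=""
    best_prio=-1
    while read -r name base state; do
        prio=$(effective_priority "$base" "$weight" "$state")
        if [ "$prio" -gt "$best_prio" ]; then
            best_prio=$prio
            best_name=$name
        fi
    done
    echo "$best_name $best_prio"
}

# All checks passing: node1 has the highest value, 101 + 2 = 103.
printf 'node1 101 ok\nnode2 100 ok\nnode3 99 ok\n' | expected_master 2
# HAProxy check failing on node1: node2 has 100 + 2 = 102 against node1's 101.
printf 'node1 101 fail\nnode2 100 ok\nnode3 99 ok\n' | expected_master 2
```

Because the priorities are only one apart and the weight is 2, a failed check on any node is enough to drop it below a healthy neighbour.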
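2. Prerequisites
2.1 Run on all nodes
# Update the system
apt update && apt upgrade -y
# Install required tools (socat is used later to query the HAProxy admin socket)
apt install -y curl wget vim net-tools nftables socat
# Disable the firewall, or configure allow rules instead (precise rules are recommended in production)
systemctl stop ufw
systemctl disable ufw
2.2 Network configuration check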
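# Confirm the network interface name (ens34 in this environment)
ip addr show
# Confirm connectivity between the nodes
ping -c 3 100.100.157.10
ping -c 3 100.100.157.11
ping -c 3 100.100.157.12
# Check whether the VRRP protocol (IP protocol number 112) is allowed
nft list ruleset
3. HAProxy installation and configuration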
3.1 Install HAProxy on all nodes
# Install HAProxy
apt install -y haproxy
# Check the version
haproxy -v
3.2 Configure HAProxy (identical on all nodes)
# Back up the original configuration
cp /etc/haproxy/haproxy.cfg /etc/haproxy/haproxy.cfg.bak
# Write the new configuration
cat > /etc/haproxy/haproxy.cfg << 'EOF'
global
    log /dev/log local0
    log /dev/log local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin
    stats timeout 30s
    user haproxy
    group haproxy
    daemon
    maxconn 4096
defaults
    log global
    mode tcp            # the Kubernetes API is proxied in TCP mode
    option tcplog       # TCP log format
    option dontlognull
    timeout connect 5s
    timeout client 30s
    timeout server 30s
    retries 3
# Statistics page (optional)
frontend stats
    bind *:8404
    mode http           # the stats page is served over HTTP, so override the TCP default here
    stats enable
    stats uri /stats
    stats refresh 10s
    stats admin if LOCALHOST
# Kubernetes API Server load balancing
# Note: this bind conflicts with a kube-apiserver that listens on 0.0.0.0:6443 on the same host;
# in that case use a different frontend port
frontend k8s-api
    bind *:6443
    mode tcp
    default_backend k8s-api-backend
backend k8s-api-backend
    mode tcp
    balance roundrobin
    # Health check: plain TCP connect to the API server port
    # (the API server does not answer a text "PING", so a send/expect exchange would mark every backend down)
    option tcp-check
    tcp-check connect port 6443
    server node1 100.100.157.10:6443 check fall 3 rise 2 inter 2000 weight 1
    server node2 100.100.157.11:6443 check fall 3 rise 2 inter 2000 weight 1
    server node3 100.100.157.12:6443 check fall 3 rise 2 inter 2000 weight 1
EOF
3.3 Validate the configuration and start HAProxy
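Since every node is meant to carry the same backend list, one thing worth checking alongside the syntax test is that the `server` lines match the node table in section 1.1. The following is a minimal sketch; `list_backend_servers` is a hypothetical helper name, and it assumes the `server <name> <ip:port> ...` layout used in the configuration above.

```shell
#!/bin/sh
# Print "name address" for every server line in a HAProxy config file.
list_backend_servers() {
    awk '$1 == "server" { print $2, $3 }' "$1"
}

# Demonstration against a small sample file; on a real node pass /etc/haproxy/haproxy.cfg instead.
sample=$(mktemp)
cat > "$sample" << 'EOF'
backend k8s-api-backend
    mode tcp
    server node1 100.100.157.10:6443 check fall 3 rise 2 inter 2000 weight 1
    server node2 100.100.157.11:6443 check fall 3 rise 2 inter 2000 weight 1
    server node3 100.100.157.12:6443 check fall 3 rise 2 inter 2000 weight 1
EOF

# Expected list, taken from the server table in section 1.1.
expected='node1 100.100.157.10:6443
node2 100.100.157.11:6443
node3 100.100.157.12:6443'

if [ "$(list_backend_servers "$sample")" = "$expected" ]; then
    echo "backend list matches the plan"
else
    echo "backend list differs from the plan" >&2
fi
rm -f "$sample"
```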
# Validate the configuration file syntax
haproxy -c -f /etc/haproxy/haproxy.cfg
# Create the haproxy user (if it does not exist)
id haproxy || useradd -r haproxy
# Create the required directory
mkdir -p /var/lib/haproxy
chown -R haproxy:haproxy /var/lib/haproxy
# Start HAProxy and enable it at boot
systemctl start haproxy
systemctl enable haproxy
systemctl status haproxy
# Check the listening ports
ss -tlnp | grep 6443
ss -tlnp | grep 8404
4. Keepalived installation and configuration
4.1 Install Keepalived on all nodes
# Install Keepalived
apt install -y keepalived
# Check the version
keepalived --version
4.2 Per-node configuration
node1 (100.100.157.10) configuration
# Create the configuration file
cat > /etc/keepalived/keepalived.conf << 'EOF'
! Configuration File for keepalived - Kubernetes HA Control Plane VIP
global_defs {
    router_id K8S_VIP_NODE1
    vrrp_skip_check_adv_addr
    #vrrp_strict                # leave disabled with this configuration: strict mode prohibits unicast peers
    enable_script_security
    script_user root
    max_auto_priority
}
vrrp_script chk_haproxy {
    script "/bin/sh -c '/usr/bin/pgrep haproxy > /dev/null 2>&1'"
    interval 2
    weight 2
    fall 3
    rise 2
    timeout 2
    user root
}
vrrp_instance VI_K8S_API {
    state BACKUP                # every node starts as BACKUP; the election is decided by priority
    interface ens34             # change to match the actual interface name
    virtual_router_id 51        # must be identical on all nodes
    priority 101                # node1 has the highest priority
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass k8svip        # same password on all nodes
    }
    # Unicast mode (avoids multicast/broadcast problems)
    unicast_src_ip 100.100.157.10   # this node's IP
    unicast_peer {
        100.100.157.11          # node2
        100.100.157.12          # node3
    }
    virtual_ipaddress {
        100.100.157.200/24 dev ens34 label ens34:vip
    }
    track_script {
        chk_haproxy
    }
    # Non-preemptive mode, to avoid frequent switching
    nopreempt
    preempt_delay 300           # only applies when preemption is enabled; it has no effect while nopreempt is set
    # State-change notification scripts (optional)
    # notify /etc/keepalived/notify.sh
    # notify_master "/etc/keepalived/notify.sh master"
    # notify_backup "/etc/keepalived/notify.sh backup"
    # notify_fault "/etc/keepalived/notify.sh fault"
    # Debug logging (can be turned off in production)
    debug
}
EOF
node2 (100.100.157.11) configuration
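# Same as the node1 configuration except for the following parameters:
#   router_id K8S_VIP_NODE2
#   unicast_src_ip 100.100.157.11
#   unicast_peer: 100.100.157.10 and 100.100.157.12 (the other two nodes)
#   priority 100
node3 (100.100.157.12) configuration
# Same as the node1 configuration except for the following parameters:
#   router_id K8S_VIP_NODE3
#   unicast_src_ip 100.100.157.12
#   unicast_peer: 100.100.157.10 and 100.100.157.11 (the other two nodes)
#   priority 99
4.3 Start and verify Keepalived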
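Before starting Keepalived on each node, the per-node values from section 4.2 can be derived mechanically rather than edited by hand, which helps avoid a node listing itself as a unicast peer. This is a sketch under the assumptions of this document's node table; `node_params` is a hypothetical helper, and its output is a checklist of values, not a ready-made configuration fragment.

```shell
#!/bin/sh
# Planned nodes as "name ip priority", taken from the table in section 1.1.
NODES='node1 100.100.157.10 101
node2 100.100.157.11 100
node3 100.100.157.12 99'

# Print the node-specific keepalived.conf values for the node named $1.
node_params() {
    self=$1
    peers=""
    found=0
    # The node list is fed through a here-document so the loop runs in the
    # current shell and its variables are still set after the loop.
    while read -r name ip prio; do
        if [ "$name" = "$self" ]; then
            found=1
            echo "router_id K8S_VIP_NODE${name#node}"
            echo "unicast_src_ip $ip"
            echo "priority $prio"
        else
            peers="$peers $ip"
        fi
    done << EOF
$NODES
EOF
    if [ "$found" -ne 1 ]; then
        echo "unknown node: $self" >&2
        return 1
    fi
    # Every other node is a unicast peer.
    for p in $peers; do
        echo "unicast_peer $p"
    done
}

node_params node2
```

For node2 this lists router_id K8S_VIP_NODE2, source address 100.100.157.11, priority 100, and the peers 100.100.157.10 and 100.100.157.12.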
# Validate the configuration file syntax
keepalived -t -f /etc/keepalived/keepalived.conf
# Start Keepalived and enable it at boot
systemctl start keepalived
systemctl enable keepalived
systemctl status keepalived
# Follow the live log
journalctl -u keepalived -f
5. Functional verification
5.1 Basic functional checks
# Check the VIP binding (it should be present only on the MASTER node)
ip addr show ens34 | grep 100.100.157.200
# Check Keepalived status
systemctl status keepalived
# Check HAProxy status
systemctl status haproxy
# Check the listening ports
ss -tlnp | grep -E '(6443|8404)'
# List the processes
ps aux | grep -E '(keepalived|haproxy)'
5.2 Failover test
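# 1. Identify the current MASTER node
grep "Entering MASTER" /var/log/syslog | tail -1
# 2. Stop Keepalived on the current MASTER node
systemctl stop keepalived
# 3. See which node has taken over the VIP (wait about 10 seconds)
ip addr show ens34 | grep 100.100.157.200
# 4. Bring the original MASTER node back
systemctl start keepalived
# 5. Verify whether the current MASTER keeps the VIP (nopreempt is configured)
#    Note: in the sample log below, the node returns to BACKUP once 100.100.157.10
#    advertises priority 103, so confirm the actual behaviour in your environment
# 6. Election log during the switchover
journalctl -u keepalived -f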
Jan 16 15:45:15 hxy Keepalived[41811]: Startup complete
Jan 16 15:45:15 hxy systemd[1]: Started keepalived.service - Keepalive Daemon (LVS and VRRP).
Jan 16 15:45:15 hxy Keepalived_vrrp[41812]: (VI_K8S_API) Entering BACKUP STATE (init)
Jan 16 15:45:15 hxy Keepalived_vrrp[41812]: VRRP_Script(chk_haproxy) succeeded
Jan 16 15:45:15 hxy Keepalived_vrrp[41812]: (VI_K8S_API) Changing effective priority from 100 to 102
Jan 16 15:45:15 hxy Keepalived_vrrp[41812]: (VI_K8S_API) received lower priority (101) advert from 100.100.157.10 - discarding
Jan 16 15:45:16 hxy Keepalived_vrrp[41812]: (VI_K8S_API) received lower priority (101) advert from 100.100.157.10 - discarding
Jan 16 15:45:17 hxy Keepalived_vrrp[41812]: (VI_K8S_API) received lower priority (101) advert from 100.100.157.10 - discarding
Jan 16 15:45:18 hxy Keepalived_vrrp[41812]: (VI_K8S_API) received lower priority (101) advert from 100.100.157.10 - discarding
Jan 16 15:45:18 hxy Keepalived_vrrp[41812]: (VI_K8S_API) Entering MASTER STATE
Jan 16 15:47:47 hxy Keepalived_vrrp[41812]: (VI_K8S_API) Master received advert from 100.100.157.10 with higher priority 103, ours 102
Jan 16 15:47:47 hxy Keepalived_vrrp[41812]: (VI_K8S_API) Entering BACKUP STATE
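The state changes are easier to follow once the repeated advert lines are filtered out. As a sketch, the function below keeps only the "Entering ... STATE" and priority-change messages from journal-style lines like the ones above. `vrrp_transitions` is a hypothetical name, and the patterns are based only on the message formats visible in this log.

```shell
#!/bin/sh
# Read keepalived log lines on stdin and print "HH:MM:SS event"
# for state transitions and effective-priority changes.
vrrp_transitions() {
    sed -n \
        -e 's/^[A-Za-z]* *[0-9]* \([0-9:]*\) .*Entering \([A-Z]*\) STATE.*$/\1 -> \2/p' \
        -e 's/^[A-Za-z]* *[0-9]* \([0-9:]*\) .*Changing effective priority from \([0-9]*\) to \([0-9]*\).*$/\1 priority \2 -> \3/p'
}

# Sample input copied from the log above.
vrrp_transitions << 'EOF'
Jan 16 15:45:15 hxy Keepalived_vrrp[41812]: (VI_K8S_API) Entering BACKUP STATE (init)
Jan 16 15:45:15 hxy Keepalived_vrrp[41812]: (VI_K8S_API) Changing effective priority from 100 to 102
Jan 16 15:45:18 hxy Keepalived_vrrp[41812]: (VI_K8S_API) received lower priority (101) advert from 100.100.157.10 - discarding
Jan 16 15:45:18 hxy Keepalived_vrrp[41812]: (VI_K8S_API) Entering MASTER STATE
Jan 16 15:47:47 hxy Keepalived_vrrp[41812]: (VI_K8S_API) Entering BACKUP STATE
EOF
```

On a live node the same function can be fed from `journalctl -u keepalived --no-pager | vrrp_transitions`.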
5.3 HAProxy health check test
# Test the HAProxy statistics page
curl http://100.100.157.200:8404/stats
# Test load balancing
for i in {1..10}; do
    echo "Request $i: $(curl -k -s https://100.100.157.200:6443/healthz 2>/dev/null || echo 'failed')"
done
# View the HAProxy backend status
echo "show stat" | socat /run/haproxy/admin.sock stdio
6. Troubleshooting
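A quick first check when something misbehaves is whether HAProxy itself considers its backends healthy. The `show stat` output from section 5.3 is CSV, with the proxy name, server name and status in columns 1, 2 and 18. The sketch below prints just those for server rows. `backend_status` is a hypothetical helper name, and the sample rows are shortened placeholders rather than real output.

```shell
#!/bin/sh
# Read HAProxy "show stat" CSV on stdin and print "proxy/server STATUS"
# for individual servers, skipping the header and the FRONTEND/BACKEND summary rows.
backend_status() {
    awk -F, '$1 !~ /^#/ && NF >= 18 && $2 != "FRONTEND" && $2 != "BACKEND" {
        print $1 "/" $2, $18
    }'
}

# Placeholder rows in the same column order as the real output.
backend_status << 'EOF'
# pxname,svname,qcur,qmax,scur,smax,slim,stot,bin,bout,dreq,dresp,ereq,econ,eresp,wretr,wredis,status,weight
k8s-api,FRONTEND,,,0,0,4096,0,0,0,0,0,0,,,,,OPEN,
k8s-api-backend,node1,0,0,0,0,,0,0,0,,0,,0,0,0,0,UP,1
k8s-api-backend,node2,0,0,0,0,,0,0,0,,0,,0,0,0,0,DOWN,1
k8s-api-backend,BACKEND,0,0,0,0,410,0,0,0,0,0,,0,0,0,0,UP,1
EOF
```

Against a running instance the input would come from `echo "show stat" | socat /run/haproxy/admin.sock stdio | backend_status`.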
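6.1 Common problems and solutions
Problem 1: VIP not bound
# Check Keepalived status
systemctl status keepalived
journalctl -u keepalived -f
# Inspect the VIP manually
ip addr show ens34
# Check whether the firewall blocks VRRP
nft list ruleset | grep vrrp
Problem 2: Script check fails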
# Run the health-check script manually
/bin/sh -c '/usr/bin/pgrep haproxy > /dev/null 2>&1'
echo $?
# Check the haproxy process
pgrep haproxy
systemctl status haproxy
Problem 3: Inter-node communication issues
# Test connectivity between the nodes
ping -c 3 100.100.157.11
ping -c 3 100.100.157.12
# Check the firewall rules
nft list ruleset
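If the ruleset turns out to drop VRRP, allow rules for the unicast peers are one possible fix. The sketch below only prints candidate `nft` commands for review rather than applying them. It assumes a table named `inet filter` with an `input` chain, which may not match the local ruleset; `vrrp_allow_rules` is a hypothetical helper.

```shell
#!/bin/sh
# Print one nft command per peer that accepts VRRP (IP protocol 112) from that peer.
#   $1 = table (family and name), $2 = chain, remaining arguments = peer IPs
vrrp_allow_rules() {
    table=$1
    chain=$2
    shift 2
    for peer in "$@"; do
        echo "nft add rule $table $chain ip saddr $peer ip protocol 112 accept"
    done
}

# On node1 the peers are node2 and node3 (table and chain names are assumptions).
vrrp_allow_rules "inet filter" input 100.100.157.11 100.100.157.12
```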
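# Check the VRRP unicast configuration
grep -A5 "unicast_" /etc/keepalived/keepalived.conf
Problem 4: Frequent MASTER/BACKUP switching
# Check the priority configuration
grep priority /etc/keepalived/keepalived.conf
# Check network latency
ping -c 10 100.100.157.11 | grep rtt
# Adjust advert_int (increase the advertisement interval), for example:
# advert_int 2
7. Log analysis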
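For the frequent-switching case in particular, counting state transitions gives a quick sense of how unstable an instance is. This sketch counts "Entering ... STATE" messages per state from log lines on stdin. `count_transitions` is a hypothetical name, and the sample lines are taken from the failover log in section 5.2.

```shell
#!/bin/sh
# Count keepalived state transitions found on stdin, grouped by target state.
count_transitions() {
    awk '
        match($0, /Entering [A-Z]+ STATE/) {
            # The matched text is "Entering <STATE> STATE"; the second word is the target state.
            split(substr($0, RSTART, RLENGTH), w, " ")
            n[w[2]]++
        }
        END {
            printf "MASTER=%d BACKUP=%d FAULT=%d\n", n["MASTER"], n["BACKUP"], n["FAULT"]
        }'
}

count_transitions << 'EOF'
Jan 16 15:45:15 hxy Keepalived_vrrp[41812]: (VI_K8S_API) Entering BACKUP STATE (init)
Jan 16 15:45:18 hxy Keepalived_vrrp[41812]: (VI_K8S_API) Entering MASTER STATE
Jan 16 15:47:47 hxy Keepalived_vrrp[41812]: (VI_K8S_API) Entering BACKUP STATE
EOF
```

A MASTER count that keeps growing over a short `journalctl --since` window is a hint that the instance is flapping.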
# View the full log
journalctl -u keepalived --no-pager -n 100
# Filter for the important messages
journalctl -u keepalived | grep -E "(MASTER|BACKUP|FAULT|priority|advert)"
# Live monitoring
journalctl -u keepalived -f
8. Disaster recovery
# If all nodes have failed, recover in this order:
# 1. Start haproxy
systemctl start haproxy
# 2. Start keepalived
systemctl start keepalived
# 3. Check the VIP binding
ip addr show ens34 | grep 100.100.157.200
# 4. Verify the service
curl -k https://100.100.157.200:6443/healthz
9. Appendix
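The recovery steps in section 8 depend on timing: the VIP appears only after Keepalived has finished its election. A small retry helper is one way to wait for it in a script. `retry` and `vip_present` below are generic sketches with hypothetical names, not part of HAProxy or Keepalived.

```shell
#!/bin/sh
# Run a command until it succeeds, at most $1 times, sleeping $2 seconds between attempts.
retry() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}

# Succeeds when address $2 is configured on interface $1.
vip_present() {
    ip -o addr show dev "$1" 2>/dev/null | grep -qF " $2/"
}

# Example: wait up to 15 attempts, 2 seconds apart, for the VIP on ens34.
# retry 15 2 vip_present ens34 100.100.157.200 && echo "VIP is up"
```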
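9.1 Configuration file summary
HAProxy configuration file paths
- Main configuration: /etc/haproxy/haproxy.cfg
- Backup configuration: /etc/haproxy/haproxy.cfg.bak
Keepalived configuration file paths
- Main configuration: /etc/keepalived/keepalived.conf
- Backup configuration: /etc/keepalived/keepalived.conf.backup
- Notification script: /etc/keepalived/notify.sh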
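The Keepalived configuration refers to /etc/keepalived/notify.sh in commented-out notify lines, but this document does not show the script itself. As a sketch only: when a script is invoked through the generic `notify` directive, Keepalived passes the object type, the instance name, the new state and the priority as four arguments. The log file location and the message format below are assumptions.

```shell
#!/bin/sh
# Hypothetical /etc/keepalived/notify.sh, for use with: notify /etc/keepalived/notify.sh
# Arguments from keepalived: $1 = INSTANCE or GROUP, $2 = name, $3 = new state, $4 = priority
LOG_FILE=${LOG_FILE:-/var/log/keepalived-notify.log}   # assumed location

# Append one line per state change to the log file.
log_transition() {
    kind=$1
    name=$2
    state=$3
    prio=$4
    stamp=$(date '+%Y-%m-%d %H:%M:%S')
    case "$state" in
        MASTER|BACKUP|FAULT)
            echo "$stamp $kind $name -> $state (priority $prio)" >> "$LOG_FILE"
            ;;
        *)
            echo "$stamp $kind $name unexpected state: $state" >> "$LOG_FILE"
            return 1
            ;;
    esac
}

# Only act when all four arguments were supplied.
if [ "$#" -eq 4 ]; then
    log_transition "$@"
fi
```

The single-argument form used by the commented `notify_master`, `notify_backup` and `notify_fault` lines would need a different argument check.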