Keepalived + HAProxy High-Availability Cluster: Installation and Deployment Guide

Document version

  • Version: v1.0
  • Target environment: VIP management for a highly available Kubernetes control plane

1. Environment planning

1.1 Server information

Hostname  IP address      Role                  Keepalived priority  VIP
node1     100.100.157.10  Load balancer node 1  101                  100.100.157.200
node2     100.100.157.11  Load balancer node 2  100                  100.100.157.200
node3     100.100.157.12  Load balancer node 3  99                   100.100.157.200

1.2 Network planning

  • Virtual IP (VIP): 100.100.157.200/24
  • Service port: 6443 (Kubernetes API Server)
  • VRRP group ID (virtual_router_id): 51
  • Authentication password: k8svip

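2. Prerequisites

2.1 Run on all nodes

# Update the system
apt update && apt upgrade -y

# Install required tools (socat is used later to query the HAProxy admin socket)
apt install -y curl wget vim net-tools nftables socat

# Disable the firewall, or configure allow rules instead (precise rules are recommended in production)
systemctl stop ufw
systemctl disable ufw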

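2.2 Network configuration checks

# Confirm the network interface name (ens34 in this environment)
ip addr show

# Confirm connectivity between the nodes
ping -c 3 100.100.157.10
ping -c 3 100.100.157.11
ping -c 3 100.100.157.12

# Check that the firewall allows VRRP (IP protocol number 112)
nft list ruleset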
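
If the firewall stays enabled, VRRP and the service ports have to be allowed explicitly. The sketch below only prints candidate nft commands for review and does not apply them. It assumes the ruleset already has a table "inet filter" with a chain "input" (hypothetical names; adjust them to the actual ruleset), and the function name print_vrrp_rules is made up for illustration.

```shell
#!/bin/sh
# Print (not apply) nftables rules that allow VRRP between the load balancer
# nodes and inbound traffic to the API and stats ports.
# Assumption: table "inet filter" with chain "input" already exists.

PEERS="100.100.157.10 100.100.157.11 100.100.157.12"

print_vrrp_rules() {
    for peer in $PEERS; do
        # VRRP is IP protocol 112 (it is neither TCP nor UDP)
        echo "nft add rule inet filter input ip saddr $peer ip protocol 112 accept"
    done
    echo "nft add rule inet filter input tcp dport 6443 accept"
    echo "nft add rule inet filter input tcp dport 8404 accept"
}

# Usage (as root, after reviewing the output):
#   print_vrrp_rules          # show the commands
#   print_vrrp_rules | sh     # apply them
```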

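3. HAProxy installation and configuration

3.1 Install HAProxy on all nodes

# Install HAProxy
apt install -y haproxy

# Check the version
haproxy -v

3.2 Configure HAProxy (identical on all nodes)

# Back up the original configuration
cp /etc/haproxy/haproxy.cfg /etc/haproxy/haproxy.cfg.bak

# Write the new configuration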
cat > /etc/haproxy/haproxy.cfg << 'EOF'
global
    log /dev/log local0
    log /dev/log local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin
    stats timeout 30s
    user haproxy
    group haproxy
    daemon
    maxconn 4096

defaults
    log global
    mode tcp          # The Kubernetes API is proxied in TCP mode
    option tcplog     # TCP log format
    option dontlognull
    timeout connect 5s
    timeout client  30s
    timeout server  30s
    retries 3

# Statistics page (optional)
frontend stats
    bind *:8404
    mode http         # The stats page requires HTTP mode (defaults above use tcp)
    stats enable
    stats uri /stats
    stats refresh 10s
    stats admin if LOCALHOST

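# Kubernetes API Server load balancing
# Note: if kube-apiserver runs on these same hosts and already listens on *:6443,
# this bind conflicts with it; use a different frontend port in that case.
frontend k8s-api
    bind *:6443
    mode tcp
    default_backend k8s-api-backend

backend k8s-api-backend
    mode tcp
    balance roundrobin
    # Health check: an HTTPS GET /healthz must return 200. The API server does not
    # answer a plain-text PING, so a PING/pong tcp-check would mark every server down.
    # Assumes /healthz is readable without authentication (the kubeadm default).
    option httpchk GET /healthz
    http-check expect status 200

    server node1 100.100.157.10:6443 check check-ssl verify none fall 3 rise 2 inter 2000 weight 1
    server node2 100.100.157.11:6443 check check-ssl verify none fall 3 rise 2 inter 2000 weight 1
    server node3 100.100.157.12:6443 check check-ssl verify none fall 3 rise 2 inter 2000 weight 1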
EOF

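3.3 Validate the configuration and start HAProxy

# Validate the configuration file syntax
haproxy -c -f /etc/haproxy/haproxy.cfg

# Create the haproxy user (if it does not exist)
id haproxy || useradd -r haproxy

# Create the required directory
mkdir -p /var/lib/haproxy
chown -R haproxy:haproxy /var/lib/haproxy

# Restart (the package may already have started the service with the old
# configuration) and enable at boot
systemctl restart haproxy
systemctl enable haproxy
systemctl status haproxy

# Check the listening ports
ss -tlnp | grep 6443
ss -tlnp | grep 8404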

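4. Keepalived installation and configuration

4.1 Install Keepalived on all nodes

# Install Keepalived
apt install -y keepalived

# Check the version
keepalived --version

4.2 Per-node configuration

node1 (100.100.157.10) configuration

# Create the configuration file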
cat > /etc/keepalived/keepalived.conf << 'EOF'
! Configuration File for keepalived - Kubernetes HA Control Plane VIP

global_defs {
   router_id K8S_VIP_NODE1
   vrrp_skip_check_adv_addr
   #vrrp_strict  # Left disabled: strict mode does not support unicast peers or authentication, both used below
   enable_script_security
   script_user root
   max_auto_priority
}

vrrp_script chk_haproxy {
    script "/bin/sh -c '/usr/bin/pgrep haproxy > /dev/null 2>&1'"
    interval 2
    weight 2
    fall 3
    rise 2
    timeout 2
    user root
}

vrrp_instance VI_K8S_API {
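    state BACKUP            # All nodes start as BACKUP; the master is elected by priority
    interface ens34         # Change to match the actual interface name
    virtual_router_id 51    # Must be identical on all nodes
    priority 101            # node1 has the highest priority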
  
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass k8svip    # Same password on all nodes
    }
  
    # Unicast mode (avoids multicast/broadcast problems)
    unicast_src_ip 100.100.157.10   # This node's IP
  
    unicast_peer {
        100.100.157.11              # node2
        100.100.157.12              # node3
    }
  
    virtual_ipaddress {
        100.100.157.200/24 dev ens34 label ens34:vip
    }
  
    track_script {
        chk_haproxy
    }
  
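    # Non-preemptive mode, to avoid frequent switchovers.
    # Caution: with nopreempt a backup does not take over merely because the
    # master's priority drops (as losing the chk_haproxy weight does), so confirm
    # in testing that an HAProxy failure on the master actually moves the VIP.
    nopreempt
    # preempt_delay applies only when preemption is enabled, so it is commented out
    # preempt_delay 300

    # State-change notification scripts (optional)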
    # notify /etc/keepalived/notify.sh
    # notify_master "/etc/keepalived/notify.sh master"
    # notify_backup "/etc/keepalived/notify.sh backup"
    # notify_fault "/etc/keepalived/notify.sh fault"
  
    # Debug logging (can be turned off in production)
    debug
}
EOF
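
The commented-out notify_master, notify_backup and notify_fault lines in the configuration above point at /etc/keepalived/notify.sh, which this guide does not include. A minimal sketch of such a script follows. The log path and the notify_state function name are assumptions chosen for illustration. Because enable_script_security is set, the script and every directory on its path should be owned by root and not writable by other users.

```shell
#!/bin/sh
# Hypothetical /etc/keepalived/notify.sh: record VRRP state changes.
# Called as "notify.sh master|backup|fault", matching the commented-out
# notify_* lines in keepalived.conf.

NOTIFY_LOG="${NOTIFY_LOG:-/var/log/keepalived-notify.log}"   # assumed path

notify_state() {
    case "$1" in
        master|backup|fault)
            printf '%s %s entered %s state\n' \
                "$(date '+%Y-%m-%d %H:%M:%S')" "$(uname -n)" "$1"
            ;;
        *)
            echo "usage: notify.sh master|backup|fault" >&2
            return 1
            ;;
    esac
}

# Only act when a state argument was passed (as keepalived does)
if [ "$#" -gt 0 ]; then
    notify_state "$1" >> "$NOTIFY_LOG"
fi
```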

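node2 (100.100.157.11) configuration

# Same as the node1 configuration except for the following parameters:
# router_id K8S_VIP_NODE2
# unicast_src_ip 100.100.157.11
# unicast_peer { 100.100.157.10 and 100.100.157.12 }   (the other two nodes, never the node itself)
# priority 100

node3 (100.100.157.12) configuration

# Same as the node1 configuration except for the following parameters:
# router_id K8S_VIP_NODE3
# unicast_src_ip 100.100.157.12
# unicast_peer { 100.100.157.10 and 100.100.157.11 }   (the other two nodes, never the node itself)
# priority 99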
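
Each node's unicast_peer list has to contain the other two nodes and never its own address, so it changes per node along with router_id, unicast_src_ip and priority. The following sketch prints the per-node values from the planning table in section 1.1. The function name node_params is illustrative, and the output is a summary (one line per peer), not literal keepalived.conf syntax.

```shell
#!/bin/sh
# Print the keepalived values that differ on each node.
# Node IPs and priorities come from the planning table in section 1.1.

NODE_IPS="100.100.157.10 100.100.157.11 100.100.157.12"
NODE_PRIORITIES="101 100 99"

# node_params N  ->  settings for node N (1, 2 or 3)
node_params() {
    idx=$1
    self=$(echo "$NODE_IPS" | cut -d' ' -f"$idx")
    prio=$(echo "$NODE_PRIORITIES" | cut -d' ' -f"$idx")
    echo "router_id K8S_VIP_NODE$idx"
    echo "priority $prio"
    echo "unicast_src_ip $self"
    # Peers are all nodes except this one
    for ip in $NODE_IPS; do
        if [ "$ip" != "$self" ]; then
            echo "unicast_peer $ip"
        fi
    done
}

# Usage:
#   node_params 2
```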

4.3 Start and verify Keepalived

# Validate the configuration file syntax
keepalived -t -f /etc/keepalived/keepalived.conf

# Start Keepalived and enable it at boot
systemctl start keepalived
systemctl enable keepalived
systemctl status keepalived

# Follow the live log
journalctl -u keepalived -f

5. Functional verification

5.1 Basic verification

# Check that the VIP is bound (it should be on the MASTER node)
ip addr show ens34 | grep 100.100.157.200

# Check Keepalived status
systemctl status keepalived

# Check HAProxy status
systemctl status haproxy

# Check the listening ports
ss -tlnp | grep -E '(6443|8404)'

# List the processes
ps aux | grep -E '(keepalived|haproxy)'
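
The VIP check above can be wrapped in a small helper that reports the node's apparent role. This is a sketch only: the function names holds_vip and vip_role are illustrative, and the role is inferred purely from whether the VIP is present in the output of ip, not from Keepalived itself.

```shell
#!/bin/sh
# Report whether this node currently holds the VIP.

VIP="100.100.157.200"

# holds_vip reads "ip -o -4 addr show" output on stdin and succeeds
# if one of the listed addresses is the VIP.
holds_vip() {
    grep -qF "inet $VIP/"
}

# vip_role prints the role implied by the presence of the VIP.
vip_role() {
    if holds_vip; then
        echo "MASTER (holds $VIP)"
    else
        echo "BACKUP or FAULT (no $VIP)"
    fi
}

# Usage:
#   ip -o -4 addr show dev ens34 | vip_role
```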

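5.2 Failover test

# 1. Find when this node last became MASTER (run on each node; the current MASTER is the one holding the VIP)
journalctl -u keepalived --no-pager | grep "Entering MASTER" | tail -1

# 2. Stop Keepalived on the current MASTER node
systemctl stop keepalived

# 3. Check which node has taken over the VIP (wait about 10 seconds)
ip addr show ens34 | grep 100.100.157.200

# 4. Restore the original MASTER node
systemctl start keepalived

# 5. Verify whether the VIP stays on the node that took over (nopreempt is configured)
ip addr show ens34 | grep 100.100.157.200

# 6. Election log during a switchover
journalctl -u keepalived -f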
Jan 16 15:45:15 hxy Keepalived[41811]: Startup complete
Jan 16 15:45:15 hxy systemd[1]: Started keepalived.service - Keepalive Daemon (LVS and VRRP).
Jan 16 15:45:15 hxy Keepalived_vrrp[41812]: (VI_K8S_API) Entering BACKUP STATE (init)
Jan 16 15:45:15 hxy Keepalived_vrrp[41812]: VRRP_Script(chk_haproxy) succeeded
Jan 16 15:45:15 hxy Keepalived_vrrp[41812]: (VI_K8S_API) Changing effective priority from 100 to 102
Jan 16 15:45:15 hxy Keepalived_vrrp[41812]: (VI_K8S_API) received lower priority (101) advert from 100.100.157.10 - discarding
Jan 16 15:45:16 hxy Keepalived_vrrp[41812]: (VI_K8S_API) received lower priority (101) advert from 100.100.157.10 - discarding
Jan 16 15:45:17 hxy Keepalived_vrrp[41812]: (VI_K8S_API) received lower priority (101) advert from 100.100.157.10 - discarding
Jan 16 15:45:18 hxy Keepalived_vrrp[41812]: (VI_K8S_API) received lower priority (101) advert from 100.100.157.10 - discarding
Jan 16 15:45:18 hxy Keepalived_vrrp[41812]: (VI_K8S_API) Entering MASTER STATE
Jan 16 15:47:47 hxy Keepalived_vrrp[41812]: (VI_K8S_API) Master received advert from 100.100.157.10 with higher priority 103, ours 102
Jan 16 15:47:47 hxy Keepalived_vrrp[41812]: (VI_K8S_API) Entering BACKUP STATE
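
The priority values in this log follow from the configuration: while the chk_haproxy check succeeds, a node's effective priority is its base priority plus the script weight (2), which is why the log shows 100 changing to 102 and a peer advertising 103. The worked calculation below is a sketch; the function name effective_priority is illustrative.

```shell
#!/bin/sh
# Effective VRRP priority = base priority + script weight while the check passes.
# WEIGHT matches "weight 2" in vrrp_script chk_haproxy.

WEIGHT=2

# effective_priority BASE CHECK_OK   (CHECK_OK: 1 = HAProxy running, 0 = not)
effective_priority() {
    if [ "$2" -eq 1 ]; then
        echo $(( $1 + WEIGHT ))
    else
        echo "$1"
    fi
}

# Usage:
#   effective_priority 100 1   # node2, HAProxy running
#   effective_priority 101 1   # node1, HAProxy running
#   effective_priority 101 0   # node1, HAProxy stopped
```

With HAProxy stopped on node1, its effective priority (101) falls below that of a healthy node2 (102).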

5.3 HAProxy health check test

# Test the HAProxy statistics page
curl http://100.100.157.200:8404/stats

# Test load balancing
for i in {1..10}; do
    echo "Request $i: $(curl -k -s https://100.100.157.200:6443/healthz 2>/dev/null || echo 'failed')"
done

# View the HAProxy backend status
echo "show stat" | socat /run/haproxy/admin.sock stdio
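
The "show stat" command returns CSV with many columns, which is hard to read directly. As a sketch, the helper below reduces it to one line per server with its status. It locates the pxname, svname and status columns from the CSV header rather than by position; the function name server_status is illustrative.

```shell
#!/bin/sh
# Summarise server status from HAProxy "show stat" CSV read on stdin.

# server_status BACKEND  ->  prints "<server> <status>" for each server line
server_status() {
    awk -F, -v backend="$1" '
        NR == 1 {
            # Header looks like "# pxname,svname,...": locate columns by name
            sub(/^# /, "", $1)
            for (i = 1; i <= NF; i++) {
                if ($i == "pxname") px = i
                if ($i == "svname") sv = i
                if ($i == "status") st = i
            }
            next
        }
        $px == backend && $sv != "FRONTEND" && $sv != "BACKEND" {
            print $sv, $st
        }
    '
}

# Usage:
#   echo "show stat" | socat /run/haproxy/admin.sock stdio | server_status k8s-api-backend
```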

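6. Troubleshooting (Q&A)

6.1 Common problems and solutions

Problem 1: The VIP is not bound

# Check Keepalived status
systemctl status keepalived
journalctl -u keepalived -f

# Check for the VIP manually
ip addr show ens34

# Check whether the firewall blocks VRRP
nft list ruleset | grep vrrp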

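Problem 2: The check script fails

# Run the health check command manually (exit code 0 means HAProxy is running)
/bin/sh -c '/usr/bin/pgrep haproxy > /dev/null 2>&1'
echo $?

# Check the haproxy process
pgrep haproxy
systemctl status haproxy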

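Problem 3: Communication problems between nodes

# Test connectivity between nodes
ping -c 3 100.100.157.11
ping -c 3 100.100.157.12

# Check the firewall rules
nft list ruleset

# Check the VRRP unicast configuration
grep -A5 "unicast_" /etc/keepalived/keepalived.conf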

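Problem 4: Frequent master/backup switchovers

# Check the priority configuration
grep priority /etc/keepalived/keepalived.conf

# Check network latency
ping -c 10 100.100.157.11 | grep rtt

# Increase advert_int (a longer advertisement interval); use the same value on all nodes
# advert_int 2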
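
Raising advert_int also lengthens the time a backup waits before it declares the master dead. The sketch below estimates that wait with the VRRPv2 formula from RFC 3768 (version 2 is Keepalived's default). Keepalived's internal timers may differ in detail, so treat the result as an approximation; the function name master_down_interval is illustrative.

```shell
#!/bin/sh
# Estimate how long a backup waits before declaring the master dead,
# using the VRRPv2 formula from RFC 3768:
#   Master_Down_Interval = 3 * Advertisement_Interval + Skew_Time
#   Skew_Time            = (256 - Priority) / 256   (seconds)

# master_down_interval ADVERT_INT PRIORITY  ->  seconds, two decimals
master_down_interval() {
    awk -v a="$1" -v p="$2" 'BEGIN { printf "%.2f\n", 3 * a + (256 - p) / 256 }'
}

# Usage:
#   master_down_interval 1 102   # advert_int 1, effective priority 102
#   master_down_interval 2 102   # advert_int 2, effective priority 102
```

With advert_int 1 and an effective priority of 102 this gives about 3.6 seconds; doubling advert_int to 2 gives about 6.6 seconds, so failover becomes slower in exchange for more tolerance of delayed advertisements.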

7. Log analysis

# View the most recent log entries
journalctl -u keepalived --no-pager -n 100

# Filter for the important messages
journalctl -u keepalived | grep -E "(MASTER|BACKUP|FAULT|priority|advert)"

# Monitor in real time
journalctl -u keepalived -f
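
To see at a glance when a node changed state, the log can be condensed to one line per transition. The helper below is a sketch that assumes journalctl's default output format (month, day, time, host, then the message); the function name state_transitions is illustrative.

```shell
#!/bin/sh
# Condense keepalived log lines (stdin) into one line per state transition.

state_transitions() {
    awk '/Entering (MASTER|BACKUP|FAULT) STATE/ {
        for (i = 1; i <= NF; i++) {
            if ($i == "Entering") {
                # $1..$3 are the syslog-style timestamp, $(i+1) is the new state
                print $1, $2, $3, $(i + 1)
            }
        }
    }'
}

# Usage:
#   journalctl -u keepalived --no-pager | state_transitions
```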

8. Disaster recovery

# If all nodes have failed, recover in the following order:
# 1. Start haproxy
systemctl start haproxy

# 2. Start keepalived
systemctl start keepalived

# 3. Check that the VIP is bound
ip addr show ens34 | grep 100.100.157.200

# 4. Verify the service
curl -k https://100.100.157.200:6443/healthz
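
After a full restart the VIP and the API endpoint may take a few seconds to come up, so a single curl can fail even though recovery is working. A small retry helper, sketched below, repeats the verification step until it succeeds. The function name retry and the attempt and delay values in the usage comment are illustrative choices, not values from this guide.

```shell
#!/bin/sh
# retry ATTEMPTS DELAY COMMAND...  ->  run COMMAND until it succeeds,
# at most ATTEMPTS times, sleeping DELAY seconds between tries.
retry() {
    attempts=$1
    delay=$2
    shift 2
    try=1
    while [ "$try" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        try=$((try + 1))
        if [ "$try" -le "$attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Usage: up to 10 tries, 3 seconds apart; -f makes curl fail on HTTP errors
#   retry 10 3 curl -ksf -o /dev/null https://100.100.157.200:6443/healthz
```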

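9. Appendix

9.1 Configuration file summary

HAProxy configuration file paths

  • Main configuration: /etc/haproxy/haproxy.cfg
  • Backup configuration: /etc/haproxy/haproxy.cfg.bak

Keepalived configuration file paths

  • Main configuration: /etc/keepalived/keepalived.conf
  • Backup configuration: /etc/keepalived/keepalived.conf.backup
  • Notification script: /etc/keepalived/notify.sh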
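  • Author: xinyu.he
  • Article title: Keepalived + HAProxy High-Availability Cluster
  • Article URL: https://www.hxy.bj.cn/archives/769/
  • Copyright: unless otherwise noted, all posts are original work of the Xinyu.he blog; please credit the source when reposting.
Last modified: January 17, 2026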