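High-Availability Disaster Recovery Deployment
AntDB uses a replication-group, multi-replica architecture to keep the database highly available. It also supports disaster recovery at the single-host, data-center, and city levels, and offers a dual-center deployment scheme. Choose the scheme that fits your data-center layout and disaster recovery requirements.
Primary Center Deployment
Follow [Distributed High-Availability Cluster Deployment] to build a high-availability cluster with one primary and two standbys.
Once the build completes, the cluster looks like this: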
[antdb@localhost ~]$ adbhamgrctl -c /etc/adbhamgr/adbhamgr_antdbcluster.yaml list
+ Cluster: antdbcluster (7325731413001351384) --+---------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+------------+-------------------+--------------+---------+----+-----------+
| adbhamgr-1 | 10.21.10.175:4567 | Leader | running | 2 | |
| adbhamgr-2 | 10.21.10.176:4567 | Replica | running | 2 | 0 |
| adbhamgr-3 | 10.21.10.177:4567 | Sync Standby | running | 2 | 0 |
+------------+-------------------+--------------+---------+----+-----------+
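The listing above can also be checked by a script rather than by eye. The following is a minimal sketch, assuming the table layout shown above (pipe-separated columns with Role in the third column and State in the fourth); check_cluster_list is a hypothetical helper name, not part of adbhamgrctl.

```shell
# Reads `adbhamgrctl ... list` output on stdin.
# Succeeds only if there is exactly one Leader and every member is "running".
check_cluster_list() {
    awk -F'|' '
        # Member rows contain pipes; borders do not. Skip the header row.
        NF >= 7 && $2 !~ /Member/ {
            role = $4; state = $5
            gsub(/^[ \t]+|[ \t]+$/, "", role)
            gsub(/^[ \t]+|[ \t]+$/, "", state)
            if (role == "Leader") leaders++
            if (state != "running") bad++
            rows++
        }
        END { exit !(rows > 0 && leaders == 1 && bad == 0) }
    '
}

# Usage (on a cluster node):
#   adbhamgrctl -c /etc/adbhamgr/adbhamgr_antdbcluster.yaml list | check_cluster_list \
#       && echo "cluster looks healthy"
```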
Log in to the mgr of the primary center and run:
ADD hba gtmcoord all("host all all 0.0.0.0 0 trust");
ADD hba coordinator all("host all all 0.0.0.0 0 trust");
ADD hba datanode all("host all all 0.0.0.0 0 trust");
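Modify the postgresql.conf file
In the primary center's postgresql.conf, change listen_addresses from its default of localhost to *, or to a list of IP addresses. Set wal_level to replica (hot_standby is the legacy name for the same level). For example: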
listen_addresses = '*'
max_wal_senders = 5
wal_keep_size = 5120
wal_level = replica
hot_standby = on
log_destination = 'csvlog'
logging_collector = on
log_directory = 'pg_log'
mgr_zone = 'local'    # set to the actual zone name of the primary center; the default is local
Restart mgr
mgr_ctl restart -D /home/antdb/data/mgr
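To confirm that the edited values are the ones actually present in the file, they can be read back with a small script. This is a sketch only: conf_value is a hypothetical helper, the file path in the usage comment is assumed from the -D directory used above, and values that contain a # inside quotes are not handled.

```shell
# Print the value of one parameter from a postgresql.conf-style file.
# If the parameter is assigned more than once, the last assignment is printed.
conf_value() {
    # $1 = file, $2 = parameter name
    sed -n "s/^[[:space:]]*${2}[[:space:]]*=[[:space:]]*\([^#]*\).*/\1/p" "$1" |
        tail -n 1 |
        sed "s/[[:space:]]*\$//; s/^'\(.*\)'\$/\1/"
}

# Usage (path is an assumption):
#   conf_value /home/antdb/data/mgr/postgresql.conf wal_level
#   conf_value /home/antdb/data/mgr/postgresql.conf mgr_zone
```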
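Secondary Center Deployment
One-Step Deployment
Follow [Distributed High-Availability Cluster Deployment] to build a one-primary, two-standby high-availability cluster, using the same procedure as for the primary center.
For example, the cluster looks like this. At this point the primary and secondary centers are two independent clusters.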
[antdb@localhost mgr]$ adbhamgrctl -c /etc/adbhamgr/adbhamgr_antdbcluster.yaml list
+ Cluster: antdbcluster (7325731413001351384) ----+---------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+------------+-------------------+----------------+---------+----+-----------+
| adbhamgr-1 | 10.21.10.180:4567 | Leader | running | 2 | |
| adbhamgr-2 | 10.21.10.181:4567 | Replica | running | 2 | 0 |
| adbhamgr-3 | 10.21.10.191:4567 | Sync Standby | running | 2 | 0 |
+------------+-------------------+----------------+---------+----+-----------+
Stop the Secondary Center
- Log in to the secondary center's mgr, stop the existing services, and clean them up:
stop all;
clean all;
stop agent all;
- Stop the adbhamgr service (run on every node):
sudo systemctl stop adbhamgr
- Use the adbhamgr remove command to clear the cluster data (run on one node only):
# The argument after remove, "antdbcluster", is the cluster name
[antdb@localhost ~]$ adbhamgrctl -c /etc/adbhamgr/adbhamgr_antdbcluster.yaml remove antdbcluster
+ Cluster: antdbcluster (7326804842864827307) --+
| Member | Host | Role | State | TL | Lag in MB |
+--------+------+------+-------+----+-----------+
+--------+------+------+-------+----+-----------+
Please confirm the cluster name to remove: antdbcluster
You are about to remove all information in DCS for antdbcluster, please type: "Yes I am aware": Yes I am aware
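Rebuild the Secondary Center
Step 1: Modify the hba file of the primary center
Rebuilding the secondary center runs the adb_basebackup command, so hba must be configured to let the standbys connect to the primary node over streaming replication.
- Edit pg_hba.conf on the primary node, for example: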
host replication all 0.0.0.0/0 trust
- Reload the primary node so that the change takes effect:
mgr_ctl reload -D /home/antdb/data/mgr
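When this step is scripted, the entry is best added only once so that repeated runs do not duplicate it. A minimal sketch follows; ensure_hba_line is a hypothetical helper, and the pg_hba.conf path in the usage comment is a placeholder for the primary node's data directory. Note that trust with 0.0.0.0/0 accepts replication connections from any address without a password, so a narrower address range may be preferable where the network allows it.

```shell
# Append an entry to a pg_hba.conf-style file only if the identical
# line is not already present (safe to run repeatedly).
ensure_hba_line() {
    # $1 = file, $2 = full entry text
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Usage (the path is a placeholder):
#   ensure_hba_line /path/to/primary/pg_hba.conf "host replication all 0.0.0.0/0 trust"
```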
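Step 2: Create a replication slot on the primary node of the primary center (optional):
# Note the name of the slot you create; here it is adbhamgr (user-defined)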
select pg_create_physical_replication_slot('adbhamgr', true);
Step 3: Rebuild the secondary center (run the same operations on every node)
- Clean the data directory:
rm -rf /home/antdb/data
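- Edit the adbhamgr configuration file:
# In the secondary center's adbhamgr yml file, add a standby_cluster block under bootstrap.dcs:
bootstrap:
  # this section will be written into adbdcs:/<namespace>/<scope>/config after initializing new cluster
  # and all other cluster members will use it as a `global configuration`
  dcs:
    ……
    standby_cluster:               # add the standby_cluster settings; keep the indentation aligned
      host: 10.21.10.175           # IP of a node in the primary center: the remote node used for pg_basebackup and streaming replication
      port: 4567                   # port of the remote node
      primary_slot_name: adbhamgr  # replication slot used for streaming replication; if set, create it manually in the primary center (see Step 2)
      create_replica_methods:      # methods used to take the base backup, normally pg_basebackup
        - basebackup
Note: the node given in host must be the node that holds the slot named in primary_slot_name.
Setting host to the primary node of the primary center is recommended, which matches Steps 1 and 2, both performed on that node.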
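Because this block is edited by hand on every node, a misplaced indent can silently move a key out of standby_cluster. The sketch below reads a key back from that block so the nodes can be compared; standby_cluster_value is a hypothetical helper that relies on space indentation only and is not a full YAML parser.

```shell
# Print the value of a key that sits directly inside the standby_cluster block.
standby_cluster_value() {
    # $1 = yaml file, $2 = key name
    awk -v key="$2" '
        # Number of leading spaces on a line.
        function indent(s) { match(s, /^ */); return RLENGTH }
        /^[ ]*standby_cluster:/ { inblock = 1; base = indent($0); next }
        inblock && /^[ ]*(#|$)/ { next }               # skip comments and blank lines
        inblock && indent($0) <= base { inblock = 0 }  # a less-indented line ends the block
        inblock {
            line = $0
            sub(/[ \t]+#.*$/, "", line)                # drop a trailing comment
            sub(/^ +/, "", line)
            if (index(line, key ":") == 1) {
                sub(/^[^:]*:[ \t]*/, "", line)
                print line
                exit
            }
        }
    ' "$1"
}

# Usage:
#   standby_cluster_value /etc/adbhamgr/adbhamgr_antdbcluster.yaml host
#   standby_cluster_value /etc/adbhamgr/adbhamgr_antdbcluster.yaml port
```

An empty result for a key that should be present suggests the key is misspelled or indented outside the block.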
- Finally, start adbhamgr on every node:
sudo systemctl start adbhamgr
# After a successful start, check the cluster status: the secondary center's primary node is the Standby Leader and all other nodes are Replica
[antdb@localhost ~]$ adbhamgrctl -c /etc/adbhamgr/adbhamgr_antdbcluster.yaml list
+ Cluster: antdbcluster (7325731413001351384) ----+---------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+------------+-------------------+----------------+---------+----+-----------+
| adbhamgr-1 | 10.21.10.180:4567 | Standby Leader | running | 2 | |
| adbhamgr-2 | 10.21.10.181:4567 | Replica | running | 2 | 0 |
| adbhamgr-3 | 10.21.10.191:4567 | Replica | running | 2 | 0 |
+------------+-------------------+----------------+---------+----+-----------+
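To check from a script that the rebuilt cluster has come up as a standby cluster, the member holding the Standby Leader role can be extracted from the listing. A minimal sketch, assuming the table layout shown above; member_with_role is a hypothetical helper name.

```shell
# Reads `adbhamgrctl ... list` output on stdin and prints the name of
# every member whose Role column equals the requested role.
member_with_role() {
    # $1 = role text, for example "Standby Leader"
    awk -F'|' -v want="$1" '
        NF >= 7 {
            name = $2; role = $4
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            gsub(/^[ \t]+|[ \t]+$/, "", role)
            if (role == want) print name
        }
    '
}

# Usage:
#   adbhamgrctl -c /etc/adbhamgr/adbhamgr_antdbcluster.yaml list | member_with_role "Standby Leader"
```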
# Check streaming replication on the secondary center's primary node: its replication source is the primary node of the primary center
[antdb@localhost ~]$ adb -d antdb -p 60103
psql (13.3)
Type "help" for help.
antdb=# \x
Expanded display is on.
antdb=# SELECT * FROM pg_stat_wal_receiver;
-[ RECORD 1 ]---------+------------------------------
pid                   | 559514
status                | streaming
receive_start_lsn     | 0/30000000
receive_start_tli     | 1
written_lsn           | 0/30000148
flushed_lsn           | 0/30000148
received_tli          | 1
last_msg_send_time    | 2024-01-22 17:57:19.931312+08
last_msg_receipt_time | 2024-01-22 17:57:19.938772+08
latest_end_lsn        | 0/30000148
latest_end_time       | 2024-01-22 16:09:00.814624+08
slot_name             | ds1c
sender_host           | 10.21.10.175
sender_port           | 60104
conninfo              | user=antdb passfile=/home/antdb/.pgpass channel_binding=prefer dbname=replication host=10.21.10.175 port=60104 application_name=ds1c fallback_application_name=walreceiver sslmode=prefer sslcompression=0 ssl_min_protocol_version=TLSv1.2 gssencmode=disable krbsrvname=postgres target_session_attrs=any contype=6655
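# Check streaming replication on a standby node of the secondary center: its replication source is the primary node of the secondary center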
antdb=# SELECT * FROM pg_stat_wal_receiver;
-[ RECORD 1 ]---------+------------------------------
pid                   | 982561
status                | streaming
receive_start_lsn     | 0/30000000
receive_start_tli     | 1
written_lsn           | 0/30000148
flushed_lsn           | 0/30000148
received_tli          | 1
last_msg_send_time    | 2024-01-22 17:59:50.596492+08
last_msg_receipt_time | 2024-01-22 17:59:50.602242+08
latest_end_lsn        | 0/30000148
latest_end_time       | 2024-01-22 16:09:01.068282+08
slot_name             | ds2c
sender_host           | 10.21.10.176
sender_port           | 60105
conninfo              | user=antdb passfile=/home/antdb/.pgpass channel_binding=prefer dbname=replication host=10.21.10.176 port=60105 application_name=ds2c fallback_application_name=walreceiver sslmode=prefer sslcompression=0 ssl_min_protocol_version=TLSv1.2 gssencmode=disable krbsrvname=postgres target_session_attrs=any contype=6655
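Step 4: Modify the postgresql.conf file (run on every node)
In the postgresql.conf of the secondary center's mgr, change mgr_zone to "zone2", the zone name of the secondary center.
mgr_zone = 'zone2'    # set according to your actual needs
Restart the secondary center's mgr: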
mgr_ctl restart -D /home/antdb/data/mgr
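Deploy the Secondary Center Nodes
Add and Deploy the Secondary Center Hosts
Note: the following operations can only be run on the primary center's mgr.
# Log in to the primary center's mgr and run the following add commands:
# Here, three hosts are added
# Adjust each parameter value to match your deployment environment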
ADD host host01(port=22,protocol='ssh',adbhome='/home/antdb/app',address="10.21.10.180",agentport=8432,user='antdb');
ADD host host02(port=22,protocol='ssh',adbhome='/home/antdb/app',address="10.21.10.181",agentport=8432,user='antdb');
ADD host host03(port=22,protocol='ssh',adbhome='/home/antdb/app',address="10.21.10.191",agentport=8432,user='antdb');
start agent all password 'XXX';
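When the secondary center has more than a few hosts, the ADD host statements can be generated from a list instead of typed by hand. The sketch below reuses the port, adbhome, agentport, and user values from the example above as fixed assumptions; gen_add_host is a hypothetical helper, and its output is meant to be reviewed and then pasted into the primary center's mgr session.

```shell
# Reads "<host_name> <ip>" pairs on stdin and prints one ADD host
# statement per pair, using the same fixed parameters as the example above.
gen_add_host() {
    while read -r name ip; do
        [ -n "$name" ] || continue   # skip blank lines
        printf "ADD host %s(port=22,protocol='ssh',adbhome='/home/antdb/app',address=\"%s\",agentport=8432,user='antdb');\n" "$name" "$ip"
    done
}

# Usage:
#   printf '%s\n' "host01 10.21.10.180" "host02 10.21.10.181" "host03 10.21.10.191" | gen_add_host
```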
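Add the Secondary Center Nodes
Note: the following operations can only be run on the primary center's mgr.
Note: every master node in the primary center must have one directly attached slave node in the secondary center; a slave node may in turn have cascaded slave nodes.
-- For example: the primary center has gtmcoord master gc; coordinator masters coord0 and coord1; datanode masters dm0 and dm1
-- The secondary center must then deploy at least one corresponding standby node for each of these master nodes
-- The zone of each secondary center node must be the secondary center zone set in postgresql.conf in the step above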
ADD gtmcoord slave gc_4 for gc_1(host='host02',port=60103, path='/home/antdb/data/gc_4',zone='zone2');
ADD datanode slave ds1c for dn1(host='host01',port=60104, path='/home/antdb/data/ds1a',zone='zone2');
ADD datanode slave ds2c for dn2(host='host02',port=60104, path='/home/antdb/data/ds2c',zone='zone2');
ADD datanode slave ds3c for dn3(host='host03',port=60104, path='/home/antdb/data/ds3c',zone='zone2');
ADD coordinator slave cna for cn1(path = '/home/antdb/data/cna', host='host01', port=7788,zone='zone2');
ADD coordinator slave cnb for cn2(path = '/home/antdb/data/cnb', host='host02', port=7788,zone='zone2');
ADD coordinator slave cnc for cn3(path = '/home/antdb/data/cnc', host='host03', port=7788,zone='zone2');
List all nodes that belong to a given secondary center:
LIST node zone zone_name;
For example:
antdb=# LIST NODE ZONE zone2;
name | host | type | mastername | port | sync_state | path | initialized | incluster | zone
------+--------+-------------------+------------+-------+------------+------------------------+-------------+-----------+-------
gc_4 | host01 | gtmcoord slave | gc_1 | 60103 | async | /home/antdb/data1/gc_4 | f | f | zone2
cna | host01 | coordinator slave | cn1 | 7788 | async | /home/antdb/data1/cna | f | f | zone2
cnb | host02 | coordinator slave | cn2 | 7788 | async | /home/antdb/data1/cnb | f | f | zone2
cnc | host03 | coordinator slave | cn3 | 7788 | async | /home/antdb/data1/cnc | f | f | zone2
ds1c | host01 | datanode slave | dn1 | 60104 | async | /home/antdb/data1/ds1a | f | f | zone2
ds2c | host02 | datanode slave | dn2 | 60104 | async | /home/antdb/data1/ds2c | f | f | zone2
ds3c | host03 | datanode slave | dn3 | 60104 | async | /home/antdb/data1/ds3c | f | f | zone2
(7 rows)
Initialize the Secondary Center Nodes
Method 1: initialize the secondary center nodes one by one with append. For example:
append gtmcoord slave gc_4;
append datanode slave ds1c;
append datanode slave ds2c;
append datanode slave ds3c;
append coordinator slave cna;
append coordinator slave cnb;
append coordinator slave cnc;
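Each append statement must name the node with its own type (gtmcoord, datanode, or coordinator), which is easy to get wrong when typing the list by hand. One way to avoid that is to derive the statements from the LIST NODE ZONE output, which already carries the type of every node. A minimal sketch, assuming the column layout shown earlier (name in the first column, type in the third); gen_append is a hypothetical helper.

```shell
# Reads `LIST NODE ZONE <zone>;` output on stdin and prints one
# append statement per slave node, using the type column as-is.
gen_append() {
    awk -F'|' '
        NF >= 10 && $3 ~ /slave/ {
            name = $1; type = $3
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            gsub(/^[ \t]+|[ \t]+$/, "", type)
            printf "append %s %s;\n", type, name
        }
    '
}

# Usage: save the LIST NODE ZONE zone2; output to a file, then
#   gen_append < zone2_nodes.txt
```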
Method 2: add all nodes in zone2 at once with the zone init command. For example:
ZONE INIT zone2;
Check the status of the secondary center nodes after initialization:
LIST node zone zone_name;
For example:
antdb=# LIST NODE ZONE zone2;
name | host | type | mastername | port | sync_state | path | initialized | incluster | zone
------+--------+-------------------+------------+-------+------------+------------------------+-------------+-----------+-------
gc_4 | host01 | gtmcoord slave | gc_1 | 60103 | async | /home/antdb/data1/gc_4 | t | t | zone2
cna | host01 | coordinator slave | cn1 | 7788 | async | /home/antdb/data1/cna | t | t | zone2
cnb | host02 | coordinator slave | cn2 | 7788 | async | /home/antdb/data1/cnb | t | t | zone2
cnc | host03 | coordinator slave | cn3 | 7788 | async | /home/antdb/data1/cnc | t | t | zone2
ds1c | host01 | datanode slave | dn1 | 60104 | async | /home/antdb/data1/ds1a | t | t | zone2
ds2c | host02 | datanode slave | dn2 | 60104 | async | /home/antdb/data1/ds2c | t | t | zone2
ds3c | host03 | datanode slave | dn3 | 60104 | async | /home/antdb/data1/ds3c | t | t | zone2
(7 rows)
The secondary center is now set up.
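The final check, that every node in the zone shows t for both initialized and incluster, can be automated. The sketch below prints the names of nodes that are not ready yet, so empty output means the zone is complete. It assumes the LIST NODE ZONE column layout shown above (initialized in the eighth column, incluster in the ninth); zone_not_ready is a hypothetical helper.

```shell
# Reads `LIST NODE ZONE <zone>;` output on stdin and prints the name of
# every node whose initialized or incluster column is not "t".
zone_not_ready() {
    awk -F'|' '
        NF >= 10 && $3 ~ /slave|master/ {
            name = $1; init = $8; incl = $9
            gsub(/[ \t]/, "", name)
            gsub(/[ \t]/, "", init)
            gsub(/[ \t]/, "", incl)
            if (init != "t" || incl != "t") print name
        }
    '
}

# Usage: save the LIST NODE ZONE zone2; output to a file, then
#   zone_not_ready < zone2_nodes.txt
```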