Common Administrative Operations
Single-Node Environment
Start
adb_ctl start [-D DATADIR] [-l FILENAME] [-W] [-t SECS] [-s]
Example: adb_ctl start -D /home/antdb/datapath
Stop
adb_ctl stop [-D DATADIR] [-m SHUTDOWN-MODE] [-W] [-t SECS] [-s]
Shutdown modes are:
smart quit after all clients have disconnected
fast quit directly, with proper shutdown (default)
immediate quit without complete shutdown; will lead to recovery on restart
Example: adb_ctl stop -D /home/antdb/datapath -m f
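As a minimal sketch, the stop invocation above can be wrapped so that a mistyped shutdown mode is caught before anything runs. The function name build_stop_cmd is hypothetical and not part of AntDB; it only prints the command, and it assumes the data directory path contains no spaces.

```shell
# Hypothetical helper (not shipped with AntDB): validate the shutdown mode
# against the three modes listed above, then print the adb_ctl command.
# $1 = data directory, $2 = shutdown mode (defaults to fast).
build_stop_cmd() {
    case "${2:-fast}" in
        smart|fast|immediate) ;;
        *) echo "unknown shutdown mode: $2" >&2; return 1 ;;
    esac
    printf 'adb_ctl stop -D %s -m %s\n' "$1" "${2:-fast}"
}
```

For example, build_stop_cmd /home/antdb/datapath immediate prints the command for an immediate shutdown, which can be reviewed before it is executed.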
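Centralized High-Availability Cluster Environment
Starting and Stopping the adbdcs Cluster
Start the adbdcs cluster
- Log in to each of the three machines (one primary, two standbys) as the AntDB user, for example adb01.
- Run the following command on each machine to start adbdcs.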
sudo systemctl start adbdcs
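Stop the adbdcs cluster
- Log in to each of the three machines (one primary, two standbys) as the AntDB user, for example adb01.
- Run the following command on each machine to stop adbdcs.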
sudo systemctl stop adbdcs
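Check whether the adbdcs nodes are running
- Log in to each of the three machines (one primary, two standbys) as the AntDB user, for example adb01.
- Run the following command on each machine to check the status of adbdcs.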
sudo systemctl status adbdcs
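Troubleshooting
If starting or stopping the adbdcs service fails, use the messages in the log to locate the error.
# adbdcs writes its log to the system log; if something goes wrong, investigate the errors reported there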
tail -f /var/log/messages
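Starting and Stopping the High-Availability Cluster
Start the cluster
- Log in to each of the three machines (one primary, two standbys) as the AntDB user, for example adb01.
- Run the following command on each machine to start adbhamgr.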
sudo systemctl start adbhamgr
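Note
This assumes the cluster is already set up: adbdcs has started successfully and the primary and standbys have been built. For the setup steps, see the centralized installation and deployment manual.
Stop the cluster
- Log in to each of the three machines (one primary, two standbys) as the AntDB user, for example adb01.
- Run the following command on each machine to stop adbhamgr.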
sudo systemctl stop adbhamgr
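Note
The cluster is stopped once adbhamgr has been stopped.
Restart the cluster
Run adbhamgrctl restart followed by the cluster name to restart the whole cluster. Adding --force forces the restart and, as the second example shows, skips the interactive prompts.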
[antdb@host-10-19-28-129 ~]$ adbhamgrctl -c /etc/adbhamgr.yml restart antdb-cluster
+ Cluster: antdb-cluster (7348278630800196973) ---+---------+-----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-------------+--------------------+--------------+---------+-----+-----------+
| adbhamgr-01 | 10.19.28.129:55551 | Replica | running | 467 | 0 |
| adbhamgr-02 | 10.19.36.206:55551 | Sync Standby | running | 467 | 0 |
| adbhamgr-03 | 10.19.36.207:55551 | Leader | running | 467 | |
+-------------+--------------------+--------------+---------+-----+-----------+
When should the restart take place (e.g. 2022-12-27T16:11) [now]:
Are you sure you want to restart members adbhamgr-03, adbhamgr-01, adbhamgr-02? [y/N]: y
Restart if the PostgreSQL version is less than provided (e.g. 9.5.2) []:
Success: restart on member adbhamgr-03
Success: restart on member adbhamgr-01
Success: restart on member adbhamgr-02
# Forced restart with --force
[antdb@host-10-19-28-129 ~]$ adbhamgrctl -c /etc/adbhamgr.yml restart antdb-cluster --force
+ Cluster: antdb-cluster (7348278630800196973) ---+---------+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-------------+--------------------+--------------+---------+-----+-----------+
| adbhamgr-01 | 10.19.28.129:55551 | Sync Standby | running | 467 | 0 |
| adbhamgr-02 | 10.19.36.206:55551 | Replica | running | 467 | 0 |
| adbhamgr-03 | 10.19.36.207:55551 | Leader | running | 467 | |
+-------------+--------------------+--------------+---------+-----+-----------+
Success: restart on member adbhamgr-03
Success: restart on member adbhamgr-01
Success: restart on member adbhamgr-02
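Restart a node
Run adbhamgrctl restart followed by the cluster name and a member name to restart a single node of the cluster. Adding --force forces the restart of that node and, as the second example shows, skips the interactive prompts.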
[antdb@host-10-19-28-129 ~]$ adbhamgrctl -c /etc/adbhamgr.yml restart antdb-cluster adbhamgr-01
+ Cluster: antdb-cluster (7348278630800196973) ---+---------+-----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-------------+--------------------+--------------+---------+-----+-----------+
| adbhamgr-01 | 10.19.28.129:55551 | Sync Standby | running | 467 | 0 |
| adbhamgr-02 | 10.19.36.206:55551 | Replica | running | 467 | 0 |
| adbhamgr-03 | 10.19.36.207:55551 | Leader | running | 467 | |
+-------------+--------------------+--------------+---------+-----+-----------+
When should the restart take place (e.g. 2022-12-27T16:19) [now]:
Are you sure you want to restart members adbhamgr-01? [y/N]: y
Restart if the PostgreSQL version is less than provided (e.g. 9.5.2) []:
Success: restart on member adbhamgr-01
# Forced restart with --force
[antdb@host-10-19-28-129 ~]$ adbhamgrctl -c /etc/adbhamgr.yml restart antdb-cluster adbhamgr-01 --force
+ Cluster: antdb-cluster (7348278630800196973) ---+---------+-----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-------------+--------------------+--------------+---------+-----+-----------+
| adbhamgr-01 | 10.19.28.129:55551 | Replica | running | 467 | 0 |
| adbhamgr-02 | 10.19.36.206:55551 | Sync Standby | running | 467 | 0 |
| adbhamgr-03 | 10.19.36.207:55551 | Leader | running | 467 | |
+-------------+--------------------+--------------+---------+-----+-----------+
Success: restart on member adbhamgr-01
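Troubleshooting
If starting or stopping the adbhamgr service fails, use the messages in the log to locate the error.
# adbhamgr writes its log to the system log; if something goes wrong, investigate the errors reported there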
tail -f /var/log/messages
# Commands for viewing the adbhamgr log:
sudo systemctl status adbhamgr -l
sudo journalctl -f -u adbhamgr
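Because /var/log/messages holds entries from every service on the host, it can help to narrow the output before reading it. The sketch below is only an illustration: service_errors is a hypothetical helper, and it assumes that each log line carries the service name and that problems are marked with words such as error, fatal, or fail, which may not hold for every message.

```shell
# Hypothetical helper: read log text on stdin and keep only the lines that
# mention the given service name ($1) and contain a problem keyword.
# "|| true" keeps the exit status at 0 when nothing matches.
service_errors() {
    grep -F -- "$1" | grep -E -i 'error|fatal|fail' || true
}
```

A possible use is sudo tail -n 500 /var/log/messages | service_errors adbhamgr, with adbdcs substituted when investigating the DCS service.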
Querying Database Status
The adbhamgrctl command provides the following operations for maintaining and querying the cluster.
Usage: adbhamgrctl [OPTIONS] COMMAND [ARGS]...
Options:
-c, --config-file TEXT Configuration file
-d, --dcs TEXT Use this DCS
-k, --insecure Allow connections to SSL sites without certs
--help Show this message and exit.
Commands:
configure Create configuration file
dsn Generate a dsn for the provided member, defaults to a dsn of...
edit-config Edit cluster configuration
failover Failover to a replica
flush Flush scheduled events
list List the adbhamgr members for a given adbhamgr
pause Disable auto failover
query Query a adbhamgr PostgreSQL member
reinit Reinitialize cluster member
reload Reload cluster member configuration
remove Remove cluster from DCS
restart Restart cluster member
resume Resume auto failover
scaffold Create a structure for the cluster in DCS
show-config Show cluster configuration
switchover Switchover to a replica
version Output version of adbhamgrctl command or a running adbhamgr...
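Centralized high availability supports viewing the state of the whole cluster; use the query result to confirm whether the cluster or an individual host is running normally. The command can be run on any host in the cluster and returns the same result.
# Cluster status query command:
adbhamgrctl -c /etc/adbhamgr.yml list
For example, the output below shows that all three roles (Leader, Sync Standby, and Replica) are present and that every State is running, so the cluster is healthy.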
antdb@adb06:~$ adbhamgrctl -c /etc/adbhamgr.yml list
+ Cluster: antdb-cluster (7348278630800196973) -----+---------+-----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-------------+----------------------+--------------+---------+-----+-----------+
| adbhamgr-01 | 192.168.10.101:55551 | Replica | running | 188 | 0 |
| adbhamgr-02 | 192.168.10.106:55551 | Sync Standby | running | 188 | 0 |
| adbhamgr-03 | 192.168.10.103:55551 | Leader | running | 188 | |
+-------------+----------------------+--------------+---------+-----+-----------+
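Field descriptions
Field | Meaning | Value |
---|---|---|
Member | Name of the node member in the cluster | User-defined in adbhamgr.yml |
Host | IP address and port of the node | Set in adbhamgr.yml, in the form IP:PORT |
Role | Role of the node in the cluster | Leader: primary node; Sync Standby: synchronous standby node; Replica: asynchronous standby node |
State | Current state of the node | running: running; crashed: the node has crashed; creating replica: the replica is being created; starting: starting up; stopped: the node is stopped |
TL | Timeline | A new timeline is created each time recovery from archived files completes, to distinguish the newly generated WAL records. |
Lag in MB | Replication lag between nodes | Normally 0, meaning the standbys are in sync with the primary. A value greater than 0 appears when the primary is under write load and a standby has not yet caught up. |
Pending restart | Node is waiting for a restart | This column appears only when some node needs a restart; such nodes are marked with '*'. |
Cluster | Cluster name; for example, Cluster: antdb-cluster means the cluster is named antdb-cluster | User-defined in adbhamgr.yml |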
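The health criterion described above can be checked mechanically. The following is a rough sketch, not an official tool: check_cluster_list is a hypothetical function that reads the list output on stdin and assumes the pipe-delimited table layout shown in the example. It simplifies the rule to exactly one Leader, at least one Sync Standby, and every member in the running state.

```shell
# Hypothetical helper: succeed (exit 0) only if the `list` table on stdin
# has exactly one Leader, at least one Sync Standby, and no member whose
# State differs from "running".
check_cluster_list() {
    awk -F'|' '
        # Data rows contain pipes; border lines do not. Skip the header row.
        NF >= 7 && $2 !~ /^ *Member *$/ {
            role = $4; state = $5
            gsub(/^ +| +$/, "", role)     # trim surrounding spaces
            gsub(/^ +| +$/, "", state)
            if (role == "Leader") leaders++
            if (role == "Sync Standby") syncs++
            if (state != "running") bad++
        }
        END { exit !(leaders == 1 && syncs >= 1 && bad == 0) }
    '
}
```

A possible use is adbhamgrctl -c /etc/adbhamgr.yml list | check_cluster_list && echo healthy. During a switchover a member briefly reports another state, so a single failed check is not necessarily a fault.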
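Primary/Standby Switchover
While the database is running, the administrator may need to switch the primary and standby roles manually, for example to restore the original roles after a failover, or to move the primary off a host with a suspected hardware fault. Either switchover or failover can be used to switch roles manually.
Procedure
Planned switchover: run adbhamgrctl -c /etc/adbhamgr.yml switchover to switch the primary and standby manually.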
[antdb@host-10-19-28-129 ~]$ adbhamgrctl -c /etc/adbhamgr.yml switchover
Master [adbhamgr-02]: adbhamgr-02 # enter the current primary node
Candidate ['adbhamgr-01', 'adbhamgr-03'] []: adbhamgr-01 # enter the current synchronous standby (Sync Standby)
When should the switchover take place (e.g. 2022-12-27T12:14 ) [now]:
Current cluster topology
+ Cluster: antdb-cluster (7348278630800196973) ---+---------+-----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-------------+--------------------+--------------+---------+-----+-----------+
| adbhamgr-01 | 10.19.28.129:55551 | Sync Standby | running | 465 | 0 |
| adbhamgr-02 | 10.19.36.206:55551 | Leader | running | 465 | |
| adbhamgr-03 | 10.19.36.207:55551 | Replica | running | 465 | 0 |
+-------------+--------------------+--------------+---------+-----+-----------+
Are you sure you want to switchover cluster antdb-cluster, demoting current master adbhamgr-02? [y/N]: y
# Check the switchover result:
[antdb@host-10-19-28-129 ~]$ adbhamgrctl -c /etc/adbhamgr.yml list
+ Cluster: antdb-cluster (7348278630800196973) ---+----------+-----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-------------+--------------------+--------------+----------+-----+-----------+
| adbhamgr-01 | 10.19.28.129:55551 | Sync Standby | running | 465 | 0 |
| adbhamgr-02 | 10.19.36.206:55551 | Leader | stopping | | |
| adbhamgr-03 | 10.19.36.207:55551 | Replica | running | 465 | 0 |
+-------------+--------------------+--------------+----------+-----+-----------+
# The Leader has switched from adbhamgr-02 to adbhamgr-01
[antdb@host-10-19-28-129 ~]$ adbhamgrctl -c /etc/adbhamgr.yml list
+ Cluster: antdb-cluster (7348278630800196973) ---+---------+-----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-------------+--------------------+--------------+---------+-----+-----------+
| adbhamgr-01 | 10.19.28.129:55551 | Leader | running | 466 | |
| adbhamgr-02 | 10.19.36.206:55551 | Replica | running | 466 | 0 |
| adbhamgr-03 | 10.19.36.207:55551 | Sync Standby | running | 466 | 0 |
+-------------+--------------------+--------------+---------+-----+-----------+
Failover: run adbhamgrctl -c /etc/adbhamgr.yml failover to switch the primary and standby manually.
[antdb@host-10-19-28-129 ~]$ adbhamgrctl -c /etc/adbhamgr.yml failover
Candidate ['adbhamgr-02', 'adbhamgr-03'] []: adbhamgr-03 # enter the current synchronous standby (Sync Standby)
Current cluster topology
+ Cluster: antdb-cluster (7348278630800196973) ---+---------+-----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-------------+--------------------+--------------+---------+-----+-----------+
| adbhamgr-01 | 10.19.28.129:55551 | Leader | running | 466 | |
| adbhamgr-02 | 10.19.36.206:55551 | Replica | running | 466 | 0 |
| adbhamgr-03 | 10.19.36.207:55551 | Sync Standby | running | 466 | 0 |
+-------------+--------------------+--------------+---------+-----+-----------+
Are you sure you want to failover cluster antdb-cluster, demoting current master adbhamgr-01? [y/N]: y
# Check the failover result:
[antdb@host-10-19-28-129 ~]$ adbhamgrctl -c /etc/adbhamgr.yml list
+ Cluster: antdb-cluster (7348278630800196973) ---+----------+-----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-------------+--------------------+--------------+----------+-----+-----------+
| adbhamgr-01 | 10.19.28.129:55551 | Leader | stopping | | |
| adbhamgr-02 | 10.19.36.206:55551 | Replica | running | 466 | 0 |
| adbhamgr-03 | 10.19.36.207:55551 | Sync Standby | running | 466 | 0 |
+-------------+--------------------+--------------+----------+-----+-----------+
[antdb@host-10-19-28-129 ~]$ adbhamgrctl -c /etc/adbhamgr.yml list
+ Cluster: antdb-cluster (7348278630800196973) ---+---------+-----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-------------+--------------------+--------------+---------+-----+-----------+
| adbhamgr-01 | 10.19.28.129:55551 | Replica | stopped | | unknown |
| adbhamgr-02 | 10.19.36.206:55551 | Sync Standby | running | 467 | 0 |
| adbhamgr-03 | 10.19.36.207:55551 | Leader | running | 467 | |
+-------------+--------------------+--------------+---------+-----+-----------+
# The Leader has switched from adbhamgr-01 to adbhamgr-03
[antdb@host-10-19-28-129 ~]$ adbhamgrctl -c /etc/adbhamgr.yml list
+ Cluster: antdb-cluster (7348278630800196973) ---+---------+-----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-------------+--------------------+--------------+---------+-----+-----------+
| adbhamgr-01 | 10.19.28.129:55551 | Replica | running | 467 | 0 |
| adbhamgr-02 | 10.19.36.206:55551 | Sync Standby | running | 467 | 0 |
| adbhamgr-03 | 10.19.36.207:55551 | Leader | running | 467 | |
+-------------+--------------------+--------------+---------+-----+-----------+
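Both procedures prompt for member names by role: the current primary for switchover, and a Sync Standby as the candidate. As a small sketch, the names can be looked up from the list output before answering the prompts. members_with_role is a hypothetical helper, and it assumes the pipe-delimited table layout shown in the transcripts.

```shell
# Hypothetical helper: read `list` output on stdin and print the Member
# name of every row whose Role column equals the first argument.
members_with_role() {
    awk -F'|' -v want="$1" '
        NF >= 7 {
            name = $2; role = $4
            gsub(/^ +| +$/, "", name)    # trim surrounding spaces
            gsub(/^ +| +$/, "", role)
            if (role == want) print name
        }
    '
}
```

For example, adbhamgrctl -c /etc/adbhamgr.yml list | members_with_role "Sync Standby" prints the member to enter at the Candidate prompt.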
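Reinitialize a node
Run adbhamgrctl reinit followed by the cluster name, then select the target member, to reinitialize one node of the cluster. Adding --force forces the reinitialization.
# The member to reinitialize can be selected at the interactive prompt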
[antdb@host-10-19-28-129 ~]$ adbhamgrctl -c /etc/adbhamgr.yml reinit antdb-cluster
+ Cluster: antdb-cluster (7348278630800196973) ---+---------+-----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-------------+--------------------+--------------+---------+-----+-----------+
| adbhamgr-01 | 10.19.28.129:55551 | Replica | running | 467 | 0 |
| adbhamgr-02 | 10.19.36.206:55551 | Sync Standby | running | 467 | 0 |
| adbhamgr-03 | 10.19.36.207:55551 | Leader | running | 467 | |
+-------------+--------------------+--------------+---------+-----+-----------+
Which member do you want to reinitialize [adbhamgr-02, adbhamgr-03, adbhamgr-01]? []: adbhamgr-01
Are you sure you want to reinitialize members adbhamgr-01? [y/N]: y
Success: reinitialize for member adbhamgr-01
# The member to reinitialize can also be given directly on the command line; --force forces the reinitialization.
[antdb@host-10-19-28-129 ~]$ adbhamgrctl -c /etc/adbhamgr.yml reinit antdb-cluster adbhamgr-01 --force
Success: reinitialize for member adbhamgr-01
Reference
Common ADBDCS Operations
Querying cluster information
The adbdcsctl command provides the following operations for maintaining and querying the adbdcs cluster.
NAME:
adbdcsctl - A simple command line client for adbdcs.
WARNING:
Environment variable adbdcsCTL_API is not set; defaults to adbdcsctl v2.
Set environment variable adbdcsCTL_API=3 to use v3 API or adbdcsCTL_API=2 to use v2 API.
USAGE:
adbdcsctl [global options] command [command options] [arguments...]
VERSION:
3.3.18
COMMANDS:
backup backup an adbdcs directory
cluster-health check the health of the adbdcs cluster
mk make a new key with a given value
mkdir make a new directory
rm remove a key or a directory
rmdir removes the key if it is an empty directory or a key-value pair
get retrieve the value of a key
ls retrieve a directory
set set the value of a key
setdir create a new directory or update an existing directory TTL
update update an existing key with a given value
updatedir update an existing directory
watch watch a key for changes
exec-watch watch a key for changes and exec an executable
member member add, remove and list subcommands
user user add, grant and revoke subcommands
role role add, grant and revoke subcommands
auth overall auth controls
help, h Shows a list of commands or help for one command
GLOBAL OPTIONS:
--debug output cURL commands which can be used to reproduce the request
--no-sync don't synchronize cluster information before sending request
--output simple, -o simple output response in the given format (simple, `extended` or `json`) (default: "simple")
--discovery-srv value, -D value domain name to query for SRV records describing cluster endpoints
--insecure-discovery accept insecure SRV records describing cluster endpoints
--peers value, -C value DEPRECATED - "--endpoints" should be used instead
--endpoint value DEPRECATED - "--endpoints" should be used instead
--endpoints value a comma-delimited list of machine addresses in the cluster (default: "http://127.0.0.1:2379,http://127.0.0.1:4001")
--cert-file value identify HTTPS client using this SSL certificate file
--key-file value identify HTTPS client using this SSL key file
--ca-file value verify certificates of HTTPS-enabled servers using this CA bundle
--username value, -u value provide username[:password] and prompt if password is not supplied.
--timeout value connection timeout per request (default: 2s)
--total-timeout value timeout for the command execution (except watch) (default: 5s)
--help, -h show help
--version, -v print the version
- Use the member list subcommand to view the members of the adbdcs cluster:
# --endpoints takes the list of machine addresses in the cluster. Here 127.0.0.1 is the local host and 12379 is the port.
[antdb@localhost ~]$ adbdcsctl --endpoints=http://127.0.0.1:12379 member list
338f9fdae9331534: name=adbdcs-2 peerURLs=http://10.21.10.242:12380 clientURLs=http://10.21.10.242:12379,http://127.0.0.1:12379 isLeader=true
9ab50241714c014f: name=adbdcs-3 peerURLs=http://10.21.10.243:12380 clientURLs=http://10.21.10.243:12379,http://127.0.0.1:12379 isLeader=false
d97b22cbde6ee848: name=adbdcs-1 peerURLs=http://10.21.10.241:12380 clientURLs=http://10.21.10.241:12379,http://127.0.0.1:12379 isLeader=false
- Use the cluster-health subcommand to check the health of the adbdcs cluster:
[antdb@localhost ~]$ adbdcsctl --endpoints=http://127.0.0.1:12379 cluster-health
member 338f9fdae9331534 is healthy: got healthy result from http://10.21.10.242:12379
member 9ab50241714c014f is healthy: got healthy result from http://10.21.10.243:12379
member d97b22cbde6ee848 is healthy: got healthy result from http://10.21.10.241:12379
- Use the ls subcommand to view the directory structure of the data stored in the adbdcs cluster:
[antdb@localhost ~]$ adbdcsctl --endpoints=http://127.0.0.1:12379 ls
/service
[antdb@localhost ~]$ adbdcsctl --endpoints=http://127.0.0.1:12379 ls /service
/service/antdbcluster
[antdb@localhost ~]$ adbdcsctl --endpoints=http://127.0.0.1:12379 ls /service/antdbcluster
/service/antdbcluster/sync
/service/antdbcluster/config
/service/antdbcluster/status
/service/antdbcluster/history
/service/antdbcluster/members
/service/antdbcluster/initialize
/service/antdbcluster/leader
- Use the get subcommand to retrieve the node information stored in the adbdcs cluster:
[antdb@localhost ~]$ adbdcsctl --endpoints=http://127.0.0.1:12379 ls /service/antdbcluster/members
/service/antdbcluster/members/adbhamgr-2
/service/antdbcluster/members/adbhamgr-3
/service/antdbcluster/members/adbhamgr-1
[antdb@localhost ~]$ adbdcsctl --endpoints=http://127.0.0.1:12379 get /service/antdbcluster/members/adbhamgr-1
{"conn_url":"postgres://10.21.10.241:55551/postgres","api_url":"http://10.21.10.241:8008/adbhamgr","state":"running","role":"master","version":"2.1.5","is_far_sync":false,"xlog_location":83886968,"timeline":4}
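The cluster-health and get outputs shown above can also be consumed by scripts. The sketch below makes two assumptions: that cluster-health prints one line per member beginning with "member", as in the sample, and that the member record is a single-line JSON object with string values, as in the sample. Both function names are hypothetical, and the sed expression is a convenience, not a real JSON parser.

```shell
# Hypothetical helper: read cluster-health output on stdin, print every
# member line that is not reported as healthy, and fail if any exist
# (or if no member lines were seen at all).
unhealthy_members() {
    awk '
        /^member / {
            total++
            if ($0 !~ / is healthy/) { print; bad++ }
        }
        END { exit !(total > 0 && bad == 0) }
    '
}

# Hypothetical helper: print the value of one string-valued field ($1)
# from the single-line JSON record on stdin.
json_string_field() {
    sed -n 's/.*"'"$1"'":"\([^"]*\)".*/\1/p'
}
```

For example, piping the get output for a member key into json_string_field state prints the stored state of that member. Numeric and boolean fields such as timeline or is_far_sync are not handled by this expression; a JSON-aware tool would be more robust where one is available.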