Common Management Operations

Last updated: 2024-07-01 14:39:42

Single-node environment

Start

adb_ctl start [-D DATADIR] [-l FILENAME] [-W] [-t SECS] [-s]

Example: adb_ctl start -D /home/antdb/datapath

Stop

adb_ctl stop [-D DATADIR] [-m SHUTDOWN-MODE] [-W] [-t SECS] [-s]
Shutdown modes are:
  smart       quit after all clients have disconnected
  fast        quit directly, with proper shutdown (default)
  immediate   quit without complete shutdown; will lead to recovery on restart
  
Example: adb_ctl stop -D /home/antdb/datapath -m f
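
The three shutdown modes listed above can be checked before the stop command is issued. The sketch below is only an illustration: `stop_cmd` is a hypothetical helper name, it accepts only the full mode names, and it prints the command for review instead of running it.

```shell
#!/usr/bin/env bash
# stop_cmd DATADIR [MODE]
# Hypothetical helper: validates MODE against the shutdown modes listed
# in the adb_ctl usage text and prints the resulting command line.
# MODE defaults to "fast", the documented default.
stop_cmd() {
  local datadir=$1 mode=${2:-fast}
  case "$mode" in
    smart|fast|immediate) ;;
    *) echo "unknown shutdown mode: $mode" >&2; return 2 ;;
  esac
  printf 'adb_ctl stop -D %s -m %s\n' "$datadir" "$mode"
}

# Example (prints the command only):
#   stop_cmd /home/antdb/datapath smart
```

Printing the command first keeps the helper safe to try on a production host; pipe the output to `sh` only after reviewing it.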

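Centralized high-availability cluster environment

Starting and stopping the adbdcs cluster

Start the adbdcs cluster

  • Log in to each of the three machines (one primary, two standbys) as the AntDB user, for example adb01.

  • Run the following command on each machine to start adbdcs.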

sudo systemctl start adbdcs

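Stop the adbdcs cluster

  • Log in to each of the three machines (one primary, two standbys) as the AntDB user, for example adb01.

  • Run the following command on each machine to stop adbdcs.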

sudo systemctl stop adbdcs

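Check whether the adbdcs nodes are running

  • Log in to each of the three machines (one primary, two standbys) as the AntDB user, for example adb01.

  • Run the following command on each machine to check the status of adbdcs.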

sudo systemctl status adbdcs

Troubleshooting

If the adbdcs service fails to start or stop, use the messages in the log to locate the error.

# adbdcs writes to the system log; check the errors reported there
tail -f /var/log/messages 

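Starting and stopping the high-availability cluster

Start the cluster

  • Log in to each of the three machines (one primary, two standbys) as the AntDB user, for example adb01.

  • Run the following command on each machine to start adbhamgr.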

sudo systemctl start adbhamgr

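Note

These steps assume the cluster has already been built: adbdcs is running and primary/standby replication is set up. For the build procedure, see the centralized installation and deployment manual.

Stop the cluster

  • Log in to each of the three machines (one primary, two standbys) as the AntDB user, for example adb01.

  • Run the following command on each machine to stop adbhamgr.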

sudo systemctl stop adbhamgr

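Note

Once adbhamgr has been stopped, the cluster is stopped.

Restart the cluster

Running adbhamgrctl restart followed by the cluster name restarts the whole cluster. With --force, the restart proceeds without the interactive confirmation prompts, as the second example below shows.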

[antdb@host-10-19-28-129 ~]$ adbhamgrctl -c /etc/adbhamgr.yml restart antdb-cluster
+ Cluster: antdb-cluster (7348278630800196973) ---+---------+-----+-----------+
| Member      | Host               | Role         | State   | TL  | Lag in MB |
+-------------+--------------------+--------------+---------+-----+-----------+
| adbhamgr-01 | 10.19.28.129:55551 | Replica      | running | 467 |         0 |
| adbhamgr-02 | 10.19.36.206:55551 | Sync Standby | running | 467 |         0 |
| adbhamgr-03 | 10.19.36.207:55551 | Leader       | running | 467 |           |
+-------------+--------------------+--------------+---------+-----+-----------+
When should the restart take place (e.g. 2022-12-27T16:11)  [now]:
Are you sure you want to restart members adbhamgr-03, adbhamgr-01, adbhamgr-02? [y/N]: y
Restart if the PostgreSQL version is less than provided (e.g. 9.5.2)  []:
Success: restart on member adbhamgr-03
Success: restart on member adbhamgr-01
Success: restart on member adbhamgr-02

# Forced restart with --force
[antdb@host-10-19-28-129 ~]$ adbhamgrctl -c /etc/adbhamgr.yml restart antdb-cluster --force
+ Cluster: antdb-cluster (7348278630800196973) ---+---------+-----+-----------+
| Member      | Host               | Role         | State   | TL  | Lag in MB |
+-------------+--------------------+--------------+---------+-----+-----------+
| adbhamgr-01 | 10.19.28.129:55551 | Sync Standby | running | 467 |         0 |
| adbhamgr-02 | 10.19.36.206:55551 | Replica      | running | 467 |         0 |
| adbhamgr-03 | 10.19.36.207:55551 | Leader       | running | 467 |           |
+-------------+--------------------+--------------+---------+-----+-----------+
Success: restart on member adbhamgr-03
Success: restart on member adbhamgr-01
Success: restart on member adbhamgr-02

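Restart a node

Running adbhamgrctl restart followed by the cluster name and a member name restarts only that node. With --force, the node is restarted without the interactive confirmation prompts.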

[antdb@host-10-19-28-129 ~]$ adbhamgrctl -c /etc/adbhamgr.yml restart antdb-cluster adbhamgr-01
+ Cluster: antdb-cluster (7348278630800196973) ---+---------+-----+-----------+
| Member      | Host               | Role         | State   | TL  | Lag in MB |
+-------------+--------------------+--------------+---------+-----+-----------+
| adbhamgr-01 | 10.19.28.129:55551 | Sync Standby | running | 467 |         0 |
| adbhamgr-02 | 10.19.36.206:55551 | Replica      | running | 467 |         0 |
| adbhamgr-03 | 10.19.36.207:55551 | Leader       | running | 467 |           |
+-------------+--------------------+--------------+---------+-----+-----------+
When should the restart take place (e.g. 2022-12-27T16:19)  [now]:
Are you sure you want to restart members adbhamgr-01? [y/N]: y
Restart if the PostgreSQL version is less than provided (e.g. 9.5.2)  []:
Success: restart on member adbhamgr-01

# Forced restart with --force
[antdb@host-10-19-28-129 ~]$ adbhamgrctl -c /etc/adbhamgr.yml restart antdb-cluster adbhamgr-01 --force
+ Cluster: antdb-cluster (7348278630800196973) ---+---------+-----+-----------+
| Member      | Host               | Role         | State   | TL  | Lag in MB |
+-------------+--------------------+--------------+---------+-----+-----------+
| adbhamgr-01 | 10.19.28.129:55551 | Replica      | running | 467 |         0 |
| adbhamgr-02 | 10.19.36.206:55551 | Sync Standby | running | 467 |         0 |
| adbhamgr-03 | 10.19.36.207:55551 | Leader       | running | 467 |           |
+-------------+--------------------+--------------+---------+-----+-----------+
Success: restart on member adbhamgr-01

Troubleshooting

If the adbhamgr service fails to start or stop, use the messages in the log to locate the error.

# adbhamgr writes to the system log; check the errors reported there
tail -f /var/log/messages 

# Commands for viewing the adbhamgr log:
sudo systemctl status adbhamgr -l
sudo journalctl -f -u adbhamgr  
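
Because adbhamgr relies on adbdcs being up, it can help to check both services in that order before reading the logs. The sketch below is an assumption-laden illustration: `check_services` is a hypothetical helper, and it relies only on the standard `systemctl is-active --quiet` call and the `journalctl -u ... -n` option.

```shell
#!/usr/bin/env bash
# check_services
# Hypothetical helper: reports whether adbdcs and adbhamgr are active on
# the local host, checking adbdcs first because adbhamgr depends on it.
# Returns non-zero if either service is not active.
check_services() {
  local unit rc=0
  for unit in adbdcs adbhamgr; do
    if systemctl is-active --quiet "$unit"; then
      echo "$unit: active"
    else
      echo "$unit: NOT active (see: sudo journalctl -u $unit -n 50)"
      rc=1
    fi
  done
  return "$rc"
}

# Example:
#   check_services || echo "fix the services listed above first"
```

Run it on each of the three machines; a failing adbdcs usually needs attention before adbhamgr does.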

Querying database status

The adbhamgrctl command supports the following operations for maintaining and querying the cluster.

Usage: adbhamgrctl [OPTIONS] COMMAND [ARGS]...

Options:
  -c, --config-file TEXT  Configuration file
  -d, --dcs TEXT          Use this DCS
  -k, --insecure          Allow connections to SSL sites without certs
  --help                  Show this message and exit.

Commands:
  configure    Create configuration file
  dsn          Generate a dsn for the provided member, defaults to a dsn of...
  edit-config  Edit cluster configuration
  failover     Failover to a replica
  flush        Flush scheduled events
  list         List the adbhamgr members for a given adbhamgr
  pause        Disable auto failover
  query        Query a adbhamgr PostgreSQL member
  reinit       Reinitialize cluster member
  reload       Reload cluster member configuration
  remove       Remove cluster from DCS
  restart      Restart cluster member
  resume       Resume auto failover
  scaffold     Create a structure for the cluster in DCS
  show-config  Show cluster configuration
  switchover   Switchover to a replica
  version      Output version of adbhamgrctl command or a running adbhamgr...

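The centralized high-availability solution can report the state of the whole cluster, so you can confirm whether the cluster or an individual host is running normally. The command returns the same result on any host in the cluster.

# Query the cluster state:
adbhamgrctl -c /etc/adbhamgr.yml list

For example, the output below shows that all three members (Leader, Sync Standby and Replica) are present and that State is running for each, which means the cluster is healthy.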
antdb@adb06:~$ adbhamgrctl -c /etc/adbhamgr.yml list
+ Cluster: antdb-cluster (7348278630800196973) -----+---------+-----+-----------+
| Member      | Host                 | Role         | State   | TL  | Lag in MB |
+-------------+----------------------+--------------+---------+-----+-----------+
| adbhamgr-01 | 192.168.10.101:55551 | Replica      | running | 188 |         0 |
| adbhamgr-02 | 192.168.10.106:55551 | Sync Standby | running | 188 |         0 |
| adbhamgr-03 | 192.168.10.103:55551 | Leader       | running | 188 |           |
+-------------+----------------------+--------------+---------+-----+-----------+
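
The manual check described above (one Leader present, every member running) can be scripted. The following is a minimal sketch, assuming the column layout shown in the sample output; `check_cluster` is a hypothetical name, not part of adbhamgrctl.

```shell
#!/usr/bin/env bash
# check_cluster
# Hypothetical helper: reads `adbhamgrctl ... list` output on stdin.
# Prints "OK" when exactly one Leader exists and every member is running;
# otherwise prints the reason and returns non-zero.
check_cluster() {
  awk -F'|' '
    # Data rows start with "|"; border and title rows start with "+".
    /^\|/ {
      member = $2; role = $4; state = $5
      gsub(/^ +| +$/, "", member)
      gsub(/^ +| +$/, "", role)
      gsub(/^ +| +$/, "", state)
      if (member == "Member") next          # skip the header row
      total++
      if (role == "Leader") leaders++
      if (state != "running") bad = bad " " member "(" state ")"
    }
    END {
      if (total == 0)   { print "no members found"; exit 1 }
      if (leaders != 1) { print "expected 1 Leader, found " (leaders + 0); exit 1 }
      if (bad != "")    { print "not running:" bad; exit 1 }
      print "OK"
    }'
}

# Example:
#   adbhamgrctl -c /etc/adbhamgr.yml list | check_cluster
```

The helper only looks at the Role and State columns, so it does not distinguish a cluster with a Sync Standby from one that has only asynchronous replicas.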

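Field descriptions

| Field | Meaning | Value |
|---|---|---|
| Member | Name of the cluster member | Defined in adbhamgr.yml |
| Host | IP address and port of the member | Set in adbhamgr.yml, in the form IP:PORT |
| Role | Role of the member in the cluster | Leader: primary node; Sync Standby: synchronous standby; Replica: asynchronous standby |
| State | Current state of the member | running: running; crashed: the node has crashed; creating replica: the replica is being created; starting: starting up; stopped: the node is stopped |
| TL | Timeline | A new timeline is created each time archive recovery completes, to distinguish the newly generated WAL records. |
| Lag in MB | Replication offset between nodes | Normally 0, meaning the standby is in sync with the primary. A value greater than 0 appears when the primary is taking writes and the standby has not yet caught up. |
| Pending restart | Waiting for a restart | The column appears only when some node needs a restart; such nodes are marked with '*'. |
| Cluster | Cluster name; for example, Cluster: antdb-cluster means the cluster is named antdb-cluster | Defined in adbhamgr.yml |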
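
Since a non-zero Lag in MB means a standby has fallen behind, a small filter can flag members above a chosen threshold. This is a sketch under the assumption that Lag in MB is the sixth column, as in the sample output; `lagging_members` is a hypothetical name.

```shell
#!/usr/bin/env bash
# lagging_members [MAX_MB]
# Hypothetical helper: reads `adbhamgrctl ... list` output on stdin and
# prints "member lag" for every member whose Lag in MB exceeds MAX_MB
# (default 0). Rows with an empty or non-numeric lag, such as the Leader
# or a stopped node reporting "unknown", are ignored.
lagging_members() {
  awk -F'|' -v max="${1:-0}" '
    /^\|/ {
      member = $2; lag = $7
      gsub(/ /, "", member); gsub(/ /, "", lag)
      if (member == "Member") next          # skip the header row
      if (lag ~ /^[0-9]+$/ && lag + 0 > max + 0) print member, lag
    }'
}

# Example: list standbys that are more than 10 MB behind
#   adbhamgrctl -c /etc/adbhamgr.yml list | lagging_members 10
```

An empty result means no standby exceeds the threshold.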

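Primary/standby switchover

While the database is running, an administrator may need to switch the primary and standby roles by hand, for example to restore the original roles after a failover, or to move the primary away from a host with suspected hardware problems. Either switchover or failover can be used for this.

Procedure

Planned switchover (no failure): run adbhamgrctl -c /etc/adbhamgr.yml switchover to switch the primary and standby manually.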

[antdb@host-10-19-28-129 ~]$ adbhamgrctl -c /etc/adbhamgr.yml switchover
Master [adbhamgr-02]: adbhamgr-02                                            # enter the current primary node
Candidate ['adbhamgr-01', 'adbhamgr-03'] []: adbhamgr-01                     # enter the current synchronous standby (Sync Standby)
When should the switchover take place (e.g. 2022-12-27T12:14 )  [now]:
Current cluster topology
+ Cluster: antdb-cluster (7348278630800196973) ---+---------+-----+-----------+
| Member      | Host               | Role         | State   |  TL | Lag in MB |
+-------------+--------------------+--------------+---------+-----+-----------+
| adbhamgr-01 | 10.19.28.129:55551 | Sync Standby | running | 465 |         0 |
| adbhamgr-02 | 10.19.36.206:55551 | Leader       | running | 465 |           |
| adbhamgr-03 | 10.19.36.207:55551 | Replica      | running | 465 |         0 |
+-------------+--------------------+--------------+---------+-----+-----------+
Are you sure you want to switchover cluster antdb-cluster, demoting current master adbhamgr-02? [y/N]: y

# Check the switchover result:
[antdb@host-10-19-28-129 ~]$ adbhamgrctl -c /etc/adbhamgr.yml list
+ Cluster: antdb-cluster (7348278630800196973) ---+----------+-----+-----------+
| Member      | Host               | Role         |  State   | TL  | Lag in MB |
+-------------+--------------------+--------------+----------+-----+-----------+
| adbhamgr-01 | 10.19.28.129:55551 | Sync Standby | running  | 465 |         0 |
| adbhamgr-02 | 10.19.36.206:55551 | Leader       | stopping |     |           |
| adbhamgr-03 | 10.19.36.207:55551 | Replica      | running  | 465 |         0 |
+-------------+--------------------+--------------+----------+-----+-----------+

# The Leader has moved from adbhamgr-02 to adbhamgr-01
[antdb@host-10-19-28-129 ~]$ adbhamgrctl -c /etc/adbhamgr.yml list
+ Cluster: antdb-cluster (7348278630800196973) ---+---------+-----+-----------+
| Member      | Host               | Role         | State   |  TL | Lag in MB |
+-------------+--------------------+--------------+---------+-----+-----------+
| adbhamgr-01 | 10.19.28.129:55551 | Leader       | running | 466 |           |
| adbhamgr-02 | 10.19.36.206:55551 | Replica      | running | 466 |         0 |
| adbhamgr-03 | 10.19.36.207:55551 | Sync Standby | running | 466 |         0 |
+-------------+--------------------+--------------+---------+-----+-----------+

Failover: run adbhamgrctl -c /etc/adbhamgr.yml failover to switch the primary and standby manually.

[antdb@host-10-19-28-129 ~]$ adbhamgrctl -c /etc/adbhamgr.yml failover
Candidate ['adbhamgr-02', 'adbhamgr-03'] []: adbhamgr-03            # enter the current synchronous standby (Sync Standby)
Current cluster topology
+ Cluster: antdb-cluster (7348278630800196973) ---+---------+-----+-----------+
| Member      | Host               | Role         | State   |  TL | Lag in MB |
+-------------+--------------------+--------------+---------+-----+-----------+
| adbhamgr-01 | 10.19.28.129:55551 | Leader       | running | 466 |           |
| adbhamgr-02 | 10.19.36.206:55551 | Replica      | running | 466 |         0 |
| adbhamgr-03 | 10.19.36.207:55551 | Sync Standby | running | 466 |         0 |
+-------------+--------------------+--------------+---------+-----+-----------+
Are you sure you want to failover cluster antdb-cluster, demoting current master adbhamgr-01? [y/N]: y

# Check the failover result:
[antdb@host-10-19-28-129 ~]$ adbhamgrctl -c /etc/adbhamgr.yml list
+ Cluster: antdb-cluster (7348278630800196973) ---+----------+-----+-----------+
|  Member     | Host               | Role         | State    |  TL | Lag in MB |
+-------------+--------------------+--------------+----------+-----+-----------+
| adbhamgr-01 | 10.19.28.129:55551 | Leader       | stopping |     |           |
| adbhamgr-02 | 10.19.36.206:55551 | Replica      | running  | 466 |         0 |
| adbhamgr-03 | 10.19.36.207:55551 | Sync Standby | running  | 466 |         0 |
+-------------+--------------------+--------------+----------+-----+-----------+
[antdb@host-10-19-28-129 ~]$ adbhamgrctl -c /etc/adbhamgr.yml list
+ Cluster: antdb-cluster (7348278630800196973) ---+---------+-----+-----------+
|  Member     | Host               | Role         | State   | TL  | Lag in MB |
+-------------+--------------------+--------------+---------+-----+-----------+
| adbhamgr-01 | 10.19.28.129:55551 | Replica      | stopped |     |   unknown |
| adbhamgr-02 | 10.19.36.206:55551 | Sync Standby | running | 467 |         0 |
| adbhamgr-03 | 10.19.36.207:55551 | Leader       | running | 467 |           |
+-------------+--------------------+--------------+---------+-----+-----------+

# The Leader has moved from adbhamgr-01 to adbhamgr-03
[antdb@host-10-19-28-129 ~]$ adbhamgrctl -c /etc/adbhamgr.yml list
+ Cluster: antdb-cluster (7348278630800196973) ---+---------+-----+-----------+
| Member      | Host               | Role         | State   | TL  | Lag in MB |
+-------------+--------------------+--------------+---------+-----+-----------+
| adbhamgr-01 | 10.19.28.129:55551 | Replica      | running | 467 |         0 |
| adbhamgr-02 | 10.19.36.206:55551 | Sync Standby | running | 467 |         0 |
| adbhamgr-03 | 10.19.36.207:55551 | Leader       | running | 467 |           |
+-------------+--------------------+--------------+---------+-----+-----------+
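
As the outputs above show, the old Leader passes through stopping before the new Leader appears, so a script should wait for the target node to become a running Leader before continuing. The following is a minimal polling sketch; `list_cluster`, `running_leader` and `wait_for_leader` are hypothetical names, and the 2-second interval is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the list command used in this manual.
list_cluster() { adbhamgrctl -c /etc/adbhamgr.yml list; }

# running_leader: reads list output on stdin and prints the name of the
# member whose Role is Leader and whose State is running (if any).
running_leader() {
  awk -F'|' '/^\|/ {
    name = $2; role = $4; state = $5
    gsub(/ /, "", name); gsub(/^ +| +$/, "", role); gsub(/ /, "", state)
    if (role == "Leader" && state == "running") print name
  }'
}

# wait_for_leader EXPECTED [TIMEOUT_SECS]
# Polls every 2 seconds until EXPECTED is the running Leader, or gives up
# after TIMEOUT_SECS (default 60) and returns non-zero.
wait_for_leader() {
  local expected=$1 timeout=${2:-60} waited=0
  while [ "$waited" -lt "$timeout" ]; do
    if [ "$(list_cluster 2>/dev/null | running_leader)" = "$expected" ]; then
      echo "leader is $expected"
      return 0
    fi
    sleep 2
    waited=$((waited + 2))
  done
  echo "timed out waiting for $expected" >&2
  return 1
}

# Example, after answering the switchover prompts:
#   wait_for_leader adbhamgr-01 120
```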

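Reinitialize a node

Running adbhamgrctl reinit followed by the cluster name, and then selecting a node, reinitializes that cluster member. With --force, the reinitialization proceeds without the confirmation prompt.

# Select the node to reinitialize at the interactive prompt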
[antdb@host-10-19-28-129 ~]$ adbhamgrctl -c /etc/adbhamgr.yml reinit antdb-cluster
+ Cluster: antdb-cluster (7348278630800196973) ---+---------+-----+-----------+
| Member      | Host               | Role         | State   | TL  | Lag in MB |
+-------------+--------------------+--------------+---------+-----+-----------+
| adbhamgr-01 | 10.19.28.129:55551 | Replica      | running | 467 |         0 |
| adbhamgr-02 | 10.19.36.206:55551 | Sync Standby | running | 467 |         0 |
| adbhamgr-03 | 10.19.36.207:55551 | Leader       | running | 467 |           |
+-------------+--------------------+--------------+---------+-----+-----------+
Which member do you want to reinitialize [adbhamgr-02, adbhamgr-03, adbhamgr-01]? []: adbhamgr-01
Are you sure you want to reinitialize members adbhamgr-01? [y/N]: y
Success: reinitialize for member adbhamgr-01

# The node to reinitialize can also be given on the command line; --force forces the reinitialization.
[antdb@host-10-19-28-129 ~]$ adbhamgrctl -c /etc/adbhamgr.yml reinit antdb-cluster adbhamgr-01 --force
Success: reinitialize for member adbhamgr-01

Reference

Common ADBDCS operations

Querying cluster information

Use the adbdcsctl command to maintain and query the adbdcs cluster. It supports the following operations.

NAME:
   adbdcsctl - A simple command line client for adbdcs.

WARNING:
   Environment variable adbdcsCTL_API is not set; defaults to adbdcsctl v2.
   Set environment variable adbdcsCTL_API=3 to use v3 API or adbdcsCTL_API=2 to use v2 API.

USAGE:
   adbdcsctl [global options] command [command options] [arguments...]

VERSION:
   3.3.18

COMMANDS:
     backup          backup an adbdcs directory
     cluster-health  check the health of the adbdcs cluster
     mk              make a new key with a given value
     mkdir           make a new directory
     rm              remove a key or a directory
     rmdir           removes the key if it is an empty directory or a key-value pair
     get             retrieve the value of a key
     ls              retrieve a directory
     set             set the value of a key
     setdir          create a new directory or update an existing directory TTL
     update          update an existing key with a given value
     updatedir       update an existing directory
     watch           watch a key for changes
     exec-watch      watch a key for changes and exec an executable
     member          member add, remove and list subcommands
     user            user add, grant and revoke subcommands
     role            role add, grant and revoke subcommands
     auth            overall auth controls
     help, h         Shows a list of commands or help for one command

GLOBAL OPTIONS:
   --debug                          output cURL commands which can be used to reproduce the request
   --no-sync                        don't synchronize cluster information before sending request
   --output simple, -o simple       output response in the given format (simple, `extended` or `json`) (default: "simple")
   --discovery-srv value, -D value  domain name to query for SRV records describing cluster endpoints
   --insecure-discovery             accept insecure SRV records describing cluster endpoints
   --peers value, -C value          DEPRECATED - "--endpoints" should be used instead
   --endpoint value                 DEPRECATED - "--endpoints" should be used instead
   --endpoints value                a comma-delimited list of machine addresses in the cluster (default: "http://127.0.0.1:2379,http://127.0.0.1:4001")
   --cert-file value                identify HTTPS client using this SSL certificate file
   --key-file value                 identify HTTPS client using this SSL key file
   --ca-file value                  verify certificates of HTTPS-enabled servers using this CA bundle
   --username value, -u value       provide username[:password] and prompt if password is not supplied.
   --timeout value                  connection timeout per request (default: 2s)
   --total-timeout value            timeout for the command execution (except watch) (default: 5s)
   --help, -h                       show help
   --version, -v                    print the version

  • Use the member list subcommand to view the members of the adbdcs cluster:
# --endpoints takes the list of machine addresses in the cluster. Here 127.0.0.1 is the local host and 12379 is the port.
[antdb@localhost ~]$ adbdcsctl --endpoints=http://127.0.0.1:12379 member list
338f9fdae9331534: name=adbdcs-2 peerURLs=http://10.21.10.242:12380 clientURLs=http://10.21.10.242:12379,http://127.0.0.1:12379 isLeader=true
9ab50241714c014f: name=adbdcs-3 peerURLs=http://10.21.10.243:12380 clientURLs=http://10.21.10.243:12379,http://127.0.0.1:12379 isLeader=false
d97b22cbde6ee848: name=adbdcs-1 peerURLs=http://10.21.10.241:12380 clientURLs=http://10.21.10.241:12379,http://127.0.0.1:12379 isLeader=false

  • Use the cluster-health subcommand to check the health of the adbdcs cluster:
[antdb@localhost ~]$ adbdcsctl --endpoints=http://127.0.0.1:12379 cluster-health
member 338f9fdae9331534 is healthy: got healthy result from http://10.21.10.242:12379
member 9ab50241714c014f is healthy: got healthy result from http://10.21.10.243:12379
member d97b22cbde6ee848 is healthy: got healthy result from http://10.21.10.241:12379

  • Use the ls subcommand to view the directory structure of the data stored in the adbdcs cluster:
[antdb@localhost ~]$ adbdcsctl --endpoints=http://127.0.0.1:12379 ls
/service
[antdb@localhost ~]$ adbdcsctl --endpoints=http://127.0.0.1:12379 ls /service
/service/antdbcluster
[antdb@localhost ~]$ adbdcsctl --endpoints=http://127.0.0.1:12379 ls /service/antdbcluster
/service/antdbcluster/sync
/service/antdbcluster/config
/service/antdbcluster/status
/service/antdbcluster/history
/service/antdbcluster/members
/service/antdbcluster/initialize
/service/antdbcluster/leader

  • Use the get subcommand to retrieve the member information stored in the adbdcs cluster:
[antdb@localhost ~]$ adbdcsctl --endpoints=http://127.0.0.1:12379 ls /service/antdbcluster/members
/service/antdbcluster/members/adbhamgr-2
/service/antdbcluster/members/adbhamgr-3
/service/antdbcluster/members/adbhamgr-1
[antdb@localhost ~]$ adbdcsctl --endpoints=http://127.0.0.1:12379 get /service/antdbcluster/members/adbhamgr-1
{"conn_url":"postgres://10.21.10.241:55551/postgres","api_url":"http://10.21.10.241:8008/adbhamgr","state":"running","role":"master","version":"2.1.5","is_far_sync":false,"xlog_location":83886968,"timeline":4}
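
The cluster-health and get outputs above are easy to post-process in scripts. The sketch below shows two hypothetical helpers: `unhealthy_members` assumes that a problem member is reported on a line of the form "member ID is ...:" with a word other than healthy, which is not shown in this manual, and `member_field` is a deliberately simple text match that works only for string-valued fields in a single-line record like the one above.

```shell
#!/usr/bin/env bash
# unhealthy_members: reads `adbdcsctl ... cluster-health` output on stdin
# and prints the ID of every member not reported as healthy.
# Assumption: unhealthy members appear as "member <id> is <status>: ...".
unhealthy_members() {
  awk '$1 == "member" && $3 == "is" && $4 != "healthy:" { print $2 }'
}

# member_field NAME: reads one member record (the JSON printed by
# `adbdcsctl ... get`) on stdin and prints the value of string field NAME.
# Numeric or boolean fields such as "timeline" are not handled.
member_field() {
  grep -o "\"$1\":\"[^\"]*\"" | head -n 1 | cut -d'"' -f4
}

# Examples:
#   adbdcsctl --endpoints=http://127.0.0.1:12379 cluster-health | unhealthy_members
#   adbdcsctl --endpoints=http://127.0.0.1:12379 get /service/antdbcluster/members/adbhamgr-1 | member_field role
```

For anything beyond a quick check, a real JSON parser is a safer choice than the text match used in `member_field`.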