前言
MasterSlave 的数据库机构解决了很多问题特别是read/write 比较高的应用结构如图
写操作全部在Master 结点执行并由Slave 数据库结点定时(默认s)读取Master 的binlog
将众多的用户读请求分散到更多的数据库节点从而减轻了单点的压力
它的缺点是
Slave 实时性的保障对于实时性很高的场合可能需要做一些处理
高可用性问题Master 就是那个致命点(SPOF:Single point of failure)
本文主要讨论的是如何解决第 个缺点
解决方案如下图
使用两个MySQL 主库mastermaster数据存在共享设备上用heartbeat 进行监控当master 发生故障时将资源切换到master
故障发生后无需对slave 进行修改slave 自动切到master(断电切换需要手工同步slave)
网络设置
磁盘设置
mysql 安装
到官方网址 下载最新版本mysql找到对应的包在haha 分别安装
# cd /usr/local/
# tar xzvf mysqllinuxx_glibctargz
# mv mysqllinuxx_glibc mysql
# groupadd mysql
# useradd g mysql mysql
#passwd mysql
修改/etc/mycf数据文件路径设置在共享磁盘修改相关参数这 个参数在haha上必须一样
datadir=/u/data #数据文件路径
serverid = #数据库ID
logbin=/u/data/master # binlog 路径
下面步骤先在ha 上执行然后在ha 执行
先在ha 把磁盘mount
[root@ha mysql]# mount /dev/sdb /u
[root@ha u]# mkdir data
[root@ha u]# chown R mysqlmysql data/
安装mysql
[root@ha u]# cd /usr/local/mysql
# /scripts/mysql_install_db user=mysql
# cp supportfiles/mysqlserver /etc/rcd/initd/mysqld
# chmod +x /etc/rcd/initd/mysqld
# chkconfig add mysqld
# /etc/rcd/initd/mysqld start
在ha 把共享磁盘umount在ha 上mount把上面的数据删除后执行 步骤完成后ha 也umount
在slave 上安装mysql数据放在slave 本地过程略
heartbeat 安装
官方网址分别在haha 安装确认下列包安装
[root@ha ~]# rpm ivh libnetx_rpm
[root@ha ~]# rpm ivh heartbeatpilsx_rpm
[root@ha ~]# rpm ivh heartbeatstonithx_rpm
[root@ha ~]# rpm ivh perlTimeDateelnoarchrpm
[root@ha ~]# rpm ivh heartbeatx_rpm
[root@ha ~]# rpm ivh heartbeatdevelx_rpm
开始编辑配置文件(haha 都执行)
[root@ha local]# cp /usr/share/doc/packages/heartbeat/hacf /etc/had/
[root@ha local]# cp /usr/share/doc/packages/heartbeat/authkeys /etc/had/
编辑/etc/had/authkeys使用的是第 种认证方式(crc)接着把文件的权限改为
cat /etc/had/authkeys
显示
auth
crc
更改文件权限
chmod /etc/had/authkeys
[root@ha ~]# cat /etc/had/hacf
debugfile /var/log/hadebug
logfile /var/log/halog
logfacility local
keepalive
deadtime
warntime
udpport
crm yes
node ha ha
bcast eth
auto_failback off
apiauth cibmon uid=hacluster
respawn hacluster /usr/lib/heartbeat/cibmon –d
配置资源共享IP共享磁盘MYSQL 个服务组成 组资源
[root@ha ~]# cat /etc/had/haresources
ha Filesystem::/dev/sdb::/u::ext mysqld
启动的时候从左到右依次运行脚本关闭的时候从右到左依次关闭
这个文件原名为haresources 在x 上使用不过为了区别使用此名称
将资源文件转换成cibxmlx 里编译好后自带有转换脚本
[root@ha ~]# cd /var/lib/heartbeat/crm/
[root@ha crm]# rm rf cibxml*
[root@ha crm]#
/usr/lib/heartbeat/haresourcescibpy stout c /etc/had/hacf /etc/had/haresources
[root@ha crm]# cat cibxml|grep mysql
<primitive class=lsb id=mysqld_ provider=heartbeat type=mysqld>
<op id=mysqld__mon interval=s name=monitor timeout=s/>
即每 秒检测资源运行情况如果发现资源不在则尝试启动资源如果s 后还未启动
成功则资源切换向另节点可根据业务进行修改
启动heartbeat在ha 和ha 都启动
[root@ha ~]# /etc/initd/heartbeat start
查看资源情况
============
Last updated: Tue Feb ::
Current DC: ha (cdebcbbbbeef)
Nodes configured
Resources configured
============
Node: ha (cdebcbbbbeef): online
Node: ha (abdfbeacfcfc): online
Resource Group: group_
IPaddr____ (ocf::heartbeat:IPaddr): Started ha
Filesystem_ (ocf::heartbeat:Filesystem): Started ha
mysqld_ (lsb:mysqld): Started ha
将heartbeat 设置成开机自动重启
[root@ha ~]# chkconfig add heartbeat
[root@ha ~]# chkconfig level heartbeat on
[root@ha ~]# chkconfig list heartbeat
heartbeat :off :off :on :on :on :on :off
mysql slave 配置
在master 赋予slave 权限(任一节点操作)
[root@ha ~]# /usr/local/mysql/bin/mysql u root p
Enter password:
Welcome to the MySQL monitor Commands end with ; or \g
Your MySQL connection id is
Server version: log MySQL Community Server (GPL)
Type help; or \h for help Type \c to clear the buffer
mysql> GRANT RELOAD REPLICATION SLAVE ON ** TO slave@
IDENTIFIED BY nslave;
mysql> show master status\G;
*************************** row ***************************
File: master
Position:
Binlog_Do_DB:
Binlog_Ignore_DB:
row in set ( sec)
slave 同步
因为没有数据所以省略数据同步的步骤在slalve 执行下列命令
mysql> CHANGE MASTER TO MASTER_HOST=
MASTER_PORT =
MASTER_USER=slave
MASTER_PASSWORD=nslave
MASTER_LOG_FILE=master
MASTER_LOG_POS = ;
Query OK rows affected ( sec)
mysql> start slave;
Query OK rows affected ( sec)
mysql> show slave status\G;
*************************** row ***************************
Slave_IO_State:Waiting for master to send event
Master_Host:
Master_User: slave
Master_Port:
Connect_Retry:
Master_Log_File: master
Read_Master_Log_Pos:
Relay_Log_File: mysqlrelaybin
Relay_Log_Pos:
Relay_Master_Log_File: master
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
row in set ( sec)
失败测试
网络切换测试
将ha 的网线拔掉在ha 看资源的切换情况
Node: ha (cdebcbbbbeef): online
Node: ha (abdfbeacfcfc): OFFLINE
Resource Group: group_
IPaddr____ (ocf::heartbeat:IPaddr): Started ha
Filesystem_ (ocf::heartbeat:Filesystem): Started ha
mysqld_ (lsb:mysqld): Started ha
没有问题把ha 网线插上资源又回切到ha
Node: ha (cdebcbbbbeef): online
Node: ha (abdfbeacfcfc): online
Resource Group: group_
IPaddr____ (ocf::heartbeat:IPaddr): Started ha
Filesystem_ (ocf::heartbeat:Filesystem): Started ha
mysqld_ (lsb:mysqld): Started ha
宕机切换
将ha 强制关机在开机这和 的情况差不多能正常切换
服务切换
将ha 的mysql 服务强制中断修改/etc/mycf
datadir=/u/data #实际这个路径不存在
[root@ha ~]# ps ef|grep mysql
root : ? :: /bin/sh /bin/mysqld_safe
datadir=/u/data pidfile=/u/data/hapid
mysql : ? :: /usr/local/mysql/bin/mysqld
basedir=/usr/local/mysql datadir=/u/data user=mysql logerror=/u/data/haerr
pidfile=/u/data/hapid socket=/tmp/mysqlsock port=
root : ? :: /bin/sh /etc/initd/mysqld start
root : pts/ :: grep mysql
[root@ha ~]# kill
[root@ha ~]# ps ef|grep mysql
root : pts/ :: grep mysql
在ha 上查看资源情况已经切换过来
Node: ha (cdebcbbbbeef): online
Node: ha (abdfbeacfcfc): online
Resource Group: group_
IPaddr____ (ocf::heartbeat:IPaddr): Started ha
Filesystem_ (ocf::heartbeat:Filesystem): Started ha
mysqld_ (lsb:mysqld): Started ha
Failed actions:
mysqld__monitor_ (node=ha call= rc=): complete
mysqld__start_ (node=ha call= rc=): complete
slave 同步测试
将ha 的heartbeat 关闭
[root@ha data]# /etc/initd/heartbeat stop
在ha 看资源情况
Node: ha (cdebcbbbbeef): online
Node: ha (abdfbeacfcfc): OFFLINE
Resource Group: group_
IPaddr____ (ocf::heartbeat:IPaddr): Started ha
Filesystem_ (ocf::heartbeat:Filesystem): Started ha
mysqld_ (lsb:mysqld): Started ha
资源全部切换到ha我们执行一些操作看是否同步
[root@ha ~]# /usr/local/mysql/bin/mysql u root p D test
Enter password:
Welcome to the MySQL monitor Commands end with ; or \g
Your MySQL connection id is
Server version: log MySQL Community Server (GPL)
Type help; or \h for help Type \c to clear the buffer
root@test >create table t(t int);
root@test >insert into t values();
root@test >select * from t;
++
| t |
++
| |
++
row in set ( sec)
在slave 查看是否同步
[root@mysql ~]# /root/cron/lgsh
Welcome to the MySQL monitor Commands end with ; or \g
Your MySQL connection id is
Server version: log MySQL Community Server (GPL)
Type help; or \h for help Type \c to clear the buffer
mysql> use test
Database changed
mysql> select * from t;
++
| t |
++
| |
++
row in set ( sec)
在将资源切回到ha在进行操作看slave 的情况
[root@ha ~]# /etc/initd/heartbeat start
Node: ha (cdebcbbbbeef): online
Node: ha (abdfbeacfcfc): online
Resource Group: group_
IPaddr____ (ocf::heartbeat:IPaddr): Started ha
Filesystem_ (ocf::heartbeat:Filesystem): Started ha
mysqld_ (lsb:mysqld): Started ha
资源已经切回到ha对其进行操作
[root@ha ~]# /usr/local/mysql/bin/mysql u root p D test
Enter password:
Welcome to the MySQL monitor Commands end with ; or \g
Your MySQL connection id is
Server version: log MySQL Community Server (GPL)
Type help; or \h for help Type \c to clear the buffer
mysql> insert into t values();
mysql> insert into t values();
mysql> select * from t;
++
| t |
++
| |
| |
| |
++
rows in set ( sec)
在slave 查看是否同步
mysql> select * from t;
++
| t |
++
| |
| |
| |
++
rows in set ( sec)
mysql> show slave status\G;
*************************** row ***************************
Slave_IO_State:Waiting for master to send event
Master_Host:
Master_User: slave
Master_Port:
Connect_Retry:
Master_Log_File: master
Read_Master_Log_Pos:
Relay_Log_File: mysqlrelaybin
Relay_Log_Pos:
Relay_Master_Log_File: master
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Master 切换后slave 无需任何人工介入自动同步
模拟写入切换测试
写个简单的循环
[root@mysql cron]# cat testsh
for ((num=;num<;num=num+))
do
echo
/usr/local/mysql/bin/mysql u sg psg h D test e
insert into t values($num);
if (( $? )); then
echo $num:no
else
echo $num:ok
fi
done
在写入过程中手工切换手工reboot 等操作和 的测试结果一样在写入过程中强制ha 断电需要手工处理slave 同步将ha 断电后资源切到ha 后在slave 会报错
mysql> show slave status\G;
*************************** row ***************************
Slave_IO_State:
Master_Host:
Master_User: slave
Master_Port:
Connect_Retry:
Master_Log_File: master
Read_Master_Log_Pos:
Relay_Log_File: mysqlrelaybin
Relay_Log_Pos:
Relay_Master_Log_File: master
Slave_IO_Running: No
Slave_SQL_Running: Yes
查看slave 报错日志
:: [Note] Slave I/O thread: Failed reading log event reconnecting to retry log
master at postion
:: [ERROR] Error reading packet from server: Client requested master to start
replication from impossible position ( server_errno=)
:: [ERROR] Got fatal error : Client requested master to start replication from
impossible position from master when reading data from binary log
:: [Note] Slave I/O thread exiting read up to log master position
在master 中找不到position
查看master 的binlog
[root@ha data]# /usr/local/mysql/bin/mysqlbinlog master|tail
/*!*/;
# at
# :: server id end_log_pos Query thread_id=
exec_time= error_code=
SET TIMESTAMP=/*!*/;
insert into t values()
/*!*/;
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*! SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
在master 上master 最后值是而在slave 却是Slave 的Log_Pos比master 大应该是在断电的时候binlog 有部分没有及时写入磁盘但从库已经读入我们需要在slave 执行change maseter 操作MASTER_LOG_FILE 在原来的基础上加Log_Pos 为
mysql> stop slave;
Query OK rows affected ( sec)
mysql> CHANGE MASTER TO MASTER_HOST=MASTER_PORT =
MASTER_USER
=slaveMASTER_PASSWORD=nslaveMASTER_LOG_FILE=masterMAST
ER_LOG_POS = ;
Query OK rows affected ( sec)
mysql> start slave;
Query OK rows affected ( sec)
mysql> show slave status\G;
*************************** row ***************************
Slave_IO_State:Waiting for master to send event
Master_Host:
Master_User: slave
Master_Port:
Connect_Retry:
Master_Log_File: master
Read_Master_Log_Pos:
Relay_Log_File: mysqlrelaybin
Relay_Log_Pos:
Relay_Master_Log_File: master
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
在这次断电的切换过程中将会有 减去 条binlog 操作在slave 存在而在masert 没有在后面的复制中可能会产生错误不过这些错误影响不大如果对数据的同步要求很严格可以在主库添加参数sync_binlog=这样最多将只会有 条数据错误不过这样将影响mysql 写性能
heartbeat crm 常用命令
查看资源状态
#crm_mon –i
查看节点资源
#crm_resource L
查看资源在那个节点上运行
# crm_resource W r mysqld_
启动/停止资源
#crm_resource r mysqld_ p target_role v started
#crm_resource r mysqld _p target_role v stopped
将资源组从当前节点转移到另个节点
#crm_resource M r group_
将资源组转移到指定节点
#crm_resource M r group_ H ha
允许资源组回到正常的节点
#crm_resource U r group_
将资源从CRM 中删除
#crm_resource D r mysqld_ t primitive
将资源组从CRM 中删除
#crm_resource D –r group_ t group
将资源从CRM 中禁用
#crm_resource p is_managed r mysqld_ t primitive v off
将资源从新从CRM 中启用
#crm_resource p is_managed r mysqld _t primitive v on
重启资源
#crm_resource C H ha r mysqld_
检查所有节点上未在CRM 中的资源
#crm_resource P
检查指定节点上未在CRM 中的资源
#crm_resource P H ha
检查所有节点上未在CRM 中的资源
#crm_resource P
检查指定节点上未在CRM 中的资源
#crm_resource P H ha
八讨论
对写要求不是很高的应用我觉得可以考虑用NFS 来代替共享存储设备结构图如下
将数据文件放在NFS 上这样几台廉价的PC 机就能实现相对成本降低
参数sync_binlog=对写的性能有多少影响个人始终觉得会带来很大的写性能问题在数据完整和性能之间做个平衡有得有失吧
此方案相对网上流传的MasterMaster Replication 方案个人觉得实施起来相对简单维护也相对简单缺点是需要一个共享设备备机在处于空闲状态在master 断电的时候MMM 方案也应该存在主机的内存binlog 丢失的问题当然这些我并没有测试过只是自己的推测