FAQ : Cluster monitoring
Scope
Testing and Commissioning Procedure of Cluster
Description
A server cluster consists of two strictly identical servers configured as an active / backup high-availability pair. The server that runs the services in normal operation is called the “primary”; the backup server is called the “secondary”.
Prerequisites
At a minimum, each server uses 3 network adapters configured as follows:
- ETH1 = Main Network Interface = IP_Server
- ETH2 = bridged network interface for virtual machines = IP_Br0
- ETH3 = “Private” server synchronization network interface, direct link between the cluster nodes.
On HP servers, the HP_ILO management interface can also be configured so that monitoring includes the physical state of the server (see the ILO monitoring documentation).
The two servers are connected by a link that allows them to be installed in two separate, distant technical rooms. This protects the equipment physically and prevents damage in one room from spreading to the other.
Connection Schema
Functioning of the cluster
The Linux services used for the Cluster are:
- Drbd = data replication between disk spaces
- Corosync = Configuration and scheduling of Cluster services
- Pacemaker = Monitoring cluster services
The services configured and monitored by the Cluster are:
- Apache = Web server
- MySQL = Database
- Samba = File Sharing
- Libvirtd = KVM Virtualization Engine
- Libvirtguest = Virtualization Management Tools
- IP Cluster / Route Cluster = Active Network Node
Server's supervision web page
Checking the synchronization of cluster data
In a terminal or over SSH on one of the cluster nodes, run the drbd-overview command.
Here the primary and secondary servers are fully synchronized at the data level, since the disk state is UpToDate on both servers.
In the output, Primary / Secondary gives the role of each node and UpToDate / UpToDate gives the synchronization state of the two nodes of the cluster.
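The check described above can also be scripted. The following is a minimal sketch, assuming that each resource line printed by drbd-overview carries a role pair (for example Primary/Secondary) and a disk-state pair (for example UpToDate/UpToDate). The helper name drbd_in_sync and the sample line are hypothetical and only illustrate the idea.

```shell
#!/bin/sh
# Sketch: succeed only if every DRBD resource line read from stdin reports
# UpToDate on both nodes. drbd_in_sync is a hypothetical helper name.
drbd_in_sync() {
    awk '
        # Only lines carrying a role pair are treated as resource lines.
        /(Primary|Secondary)\/(Primary|Secondary|Unknown)/ {
            seen++
            if ($0 !~ /UpToDate\/UpToDate/) bad++
        }
        # Fail when no resource line was seen or any line is out of sync.
        END { if (seen > 0 && bad == 0) exit 0; exit 1 }
    '
}

# Illustrative input only; the exact column layout is an assumption.
sample=' 0:drbdserv/0 Connected Primary/Secondary UpToDate/UpToDate'
if printf '%s\n' "$sample" | drbd_in_sync; then
    echo "DRBD: in sync"
else
    echo "DRBD: NOT in sync"
fi
```

On a cluster node the helper would be fed with the real output, for example `drbd-overview | drbd_in_sync`.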
In case the DRBD service is not started correctly (Cluster out of service), it is possible to restart the server data synchronization service via the following command:
# service drbdserv --full-restart
VALIDATION OF THE CORRECT FUNCTIONING OF THE CLUSTER (COROSYNC)
To check the state of the services managed by the cluster, from a terminal or over SSH, use the command crm status.
The command returns a summary of the cluster configuration and the cluster status.
DESCRIPTION OF THE COMMAND OUTPUT
The first block indicates the state of the cluster
Last updated: Sun Sep 23 08:21:21 2018
Last change: Tue Aug 28 09:42:27 2018 via crm_attribute on dzacupsvr2
Stack: corosync
Current DC: dzacupsvr2 (34212362) - partition with quorum
Version: 1.1.7-2.mga1-ee0730e13d124c3d58f00016c3376a1de5323cff
2 Nodes configured, unknown expected votes
11 Resources configured.
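The header block can be checked from a script as well. This is a minimal sketch, assuming the header keeps the layout shown above; the helper names crm_dc and crm_has_quorum are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers that read `crm status` text from stdin.

# Print the node name that follows "Current DC:".
crm_dc() {
    sed -n 's/^ *Current DC: *\([^ ]*\).*/\1/p'
}

# Succeed only if the header reports "partition with quorum".
crm_has_quorum() {
    grep -q 'partition with quorum'
}

# Header lines copied from the example above.
header='Stack: corosync
Current DC: dzacupsvr2 (34212362) - partition with quorum
2 Nodes configured, unknown expected votes'

echo "DC: $(printf '%s\n' "$header" | crm_dc)"
if printf '%s\n' "$header" | crm_has_quorum; then
    echo "quorum: yes"
else
    echo "quorum: no"
fi
```

On a cluster node the same helpers would read the live output, for example `crm status | crm_has_quorum`.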
The second block shows which node is the primary and where the services are running
Online: [ dzacupsvr dzacupsvr2 ]
Resource Group: services
samba (lsb:smb): Started dzacupsvr
apache (ocf::heartbeat:apache): Started dzacupsvr
mysql (ocf::heartbeat:mysql): Started dzacupsvr
libvirtd (lsb:libvirtd): Started dzacupsvr
libvirt-guests (lsb:libvirt-guests): Started dzacupsvr
Master/Slave Set: drbdservClone [drbdserv]
Masters: [ dzacupsvr ]
Slaves: [ dzacupsvr2 ]
fsserv (ocf::heartbeat:Filesystem): Started dzacupsvr
Resource Group: iphd
clusterip (ocf::heartbeat:IPaddr2): Started dzacupsvr
clusterroute (ocf::heartbeat:Route): Started dzacupsvr
⇒ the 2 servers are “online”, and each service is operational on the primary.
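The reading of the second block can be automated. The sketch below assumes that resource lines keep the shape "name (class:provider:type): State node" seen in the example output; the helper names crm_resource_states and crm_all_started_on are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers that read `crm status` text from stdin.

# Print "resource state node" for each primitive resource line, i.e. lines
# shaped like: name (class:provider:type): State node
crm_resource_states() {
    awk '$2 ~ /^\(.*\):$/ { print $1, $3, (NF >= 4 ? $4 : "-") }'
}

# Succeed only if every listed resource is Started on the node given as $1;
# print the resources that are not.
crm_all_started_on() {
    crm_resource_states | awk -v want="$1" '
        { total++ }
        $2 != "Started" || $3 != want { bad++; print "not started on " want ": " $1 }
        END { if (total > 0 && bad == 0) exit 0; exit 1 }
    '
}

# Excerpt of the example output above.
status='Online: [ dzacupsvr dzacupsvr2 ]
Resource Group: services
samba (lsb:smb): Started dzacupsvr
apache (ocf::heartbeat:apache): Started dzacupsvr
Master/Slave Set: drbdservClone [drbdserv]
Masters: [ dzacupsvr ]
fsserv (ocf::heartbeat:Filesystem): Started dzacupsvr
clusterip (ocf::heartbeat:IPaddr2): Started dzacupsvr'

printf '%s\n' "$status" | crm_resource_states
printf '%s\n' "$status" | crm_all_started_on dzacupsvr && echo "all services on dzacupsvr"
```

Passing the name of the expected primary makes the check fail as soon as a service is stopped or has moved to the other node.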
VERIFYING THE CORRECT FUNCTIONING OF THE CLUSTER
To see the cluster configuration, use the following command:
# crm configure show
Example of a configuration file of the Abidjan cluster:
COMMANDS FOR VERIFYING THE CORRECT FUNCTIONING OF THE CLUSTER
The examples below use the node “abjairsvr2”.
DESIRED Action | SYSTEM Command |
Checking the cluster status | service corosync status |
See cluster nodes | crm node show |
See the cluster configuration | crm configure show |
Edit cluster configuration | crm configure edit |
Put a cluster node in standby before changing a configuration | crm node standby abjairsvr2 |
Put a cluster node back in service (here the Abidjan secondary) | crm node online abjairsvr2 |
Change a cluster configuration parameter | crm configure rsc_defaults resource-stickiness=100 |
View the status of a cluster service | crm resource status libvirt-guests |
Clean up the failure state of a cluster service that does not start | crm resource cleanup libvirt-guests |
Check whether or not a split brain exists (service that has migrated to a non-operational node) | grep -i "split-brain" /var/log/syslog |
Move a service from one node to another (in the case of a split brain) | crm resource move libvirt-guests abjairsvr2 |
Reattach a service to the cluster | crm resource manage libvirt-guests |
Check that the configuration files are identical between the nodes of the cluster | crm cluster diff /etc/samba/smb.conf |
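The split-brain check from the table can be wrapped in a small helper for use in a monitoring script. This is a sketch only: the helper name split_brain_count is hypothetical, the log lines in the demonstration are illustrative, and the case-insensitive match is a choice made here so that spellings such as "Split-Brain" are also counted.

```shell
#!/bin/sh
# Hypothetical helper: count the lines of a log file that mention a
# split brain, ignoring case.
split_brain_count() {
    if [ ! -r "$1" ]; then
        echo "cannot read $1" >&2
        return 2
    fi
    # grep -c exits non-zero when the count is 0; that is not an error here.
    grep -ci 'split-brain' "$1" || true
}

# Demonstration on a temporary file; the log lines are illustrative only.
log=$(mktemp)
cat > "$log" <<'EOF'
Sep 23 08:20:01 dzacupsvr kernel: block drbd0: Split-Brain detected but unresolved, dropping connection!
Sep 23 08:21:21 dzacupsvr cron[123]: job started
EOF
echo "split-brain lines: $(split_brain_count "$log")"
rm -f "$log"
```

On a node the helper would be pointed at the system log, for example `split_brain_count /var/log/syslog`; any count above 0 calls for a closer look at the DRBD state.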
VERIFICATION OF CLUSTER MANAGEMENT TOOLS
DESIRED Action | SYSTEM Commands | |
Check the status of the Pacemaker service | systemctl status pacemaker | |
Check the Pacemaker unit file for errors | systemd-analyze verify pacemaker.service | |
Reload the Pacemaker service | systemctl reload pacemaker.service | |
Show overridden or extended unit files | systemd-delta pacemaker.service | |
Browse the log of the Pacemaker service | journalctl -u pacemaker | more |
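For a quick scripted check of the management services, the status query can be reduced to an exit code. The sketch below assumes the unit names corosync and pacemaker used in this document; the helper name cluster_services_active and the SYSTEMCTL override variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report whether the corosync and pacemaker units are
# active. SYSTEMCTL can be overridden (for example in tests); it defaults
# to the real systemctl binary.
cluster_services_active() {
    rc=0
    for unit in corosync pacemaker; do
        if "${SYSTEMCTL:-systemctl}" is-active --quiet "$unit"; then
            echo "$unit: active"
        else
            echo "$unit: NOT active"
            rc=1
        fi
    done
    return "$rc"
}
```

Calling `cluster_services_active` on a node prints one line per unit and returns a non-zero status if either unit is not active, which makes it easy to call from a cron job or a supervision probe.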