Procedure: Cluster monitoring

Scope

Testing and Commissioning Procedure of Cluster

Description

A server cluster is composed of 2 strictly identical servers configured for high availability in active/backup mode. The server that normally runs the services is called “primary”; the backup server is called “secondary”.

Prerequisites

At a minimum, each server uses 3 network adapters configured as follows:

ETH1 = Main Network Interface = IP_Server
ETH2 = bridged network interface for virtual machines = IP_Br0
ETH3 = “Private” server synchronization network interface, direct link between the cluster nodes.

On HP servers, the HP_ILO management interface can be configured for machine monitoring so that the server's physical state is also reported (see the ILO monitoring documentation).

The 2 servers are connected to each other by a link that allows them to be installed in 2 separate, distant technical rooms. This protects the physical integrity of the equipment and prevents physical damage in one room from spreading to the other.

Connection Schema

Functioning of the cluster

The Linux services used for the Cluster are:

DRBD = Data replication between disk spaces
Corosync = Configuration and scheduling of Cluster services
Pacemaker = Monitoring cluster services

The services configured and monitored by the Cluster are:

Apache = Web server
MySQL = Database
Samba = File Sharing
Libvirtd = KVM Virtualization Engine
Libvirt-guests = Virtualization Management Tools
IP Cluster / Route Cluster = Active Network Node

All of these Linux services are controlled by the cluster (Corosync and Pacemaker). Do not manage them with the standard Linux daemon tools: do not use the “service” or “systemctl” commands, or init scripts such as “samba”. Starting a service with this type of command takes it out of the monitoring performed by Pacemaker and Corosync.

Server's supervision web page

Go to « http://ip_server/web/system/ezmonitor »

Checking the synchronization of cluster data

In a terminal, or over ssh on one of the cluster nodes, run the drbd-overview command.

The primary and secondary servers are fully synchronized at the data level when the status UpToDate is reported for both servers.

The fields Primary/Secondary and UpToDate/UpToDate show the role and the synchronization status of the 2 nodes of the cluster (local node first, peer node second).
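The check described above can be scripted. The following is a minimal sketch, not part of the original procedure: the function name drbd_in_sync is hypothetical, and it assumes that each resource line printed by drbd-overview starts with "<minor>:<name>" and contains the connection state and the disk states as plain words.

```shell
#!/bin/sh
# Sketch: read drbd-overview output on stdin and succeed only if every
# resource line reports both "Connected" and "UpToDate/UpToDate".
# Assumption: resource lines start with "<minor>:<name>".
drbd_in_sync() {
    awk '
        /^[ \t]*[0-9]+:/ {
            seen++
            if ($0 !~ /Connected/ || $0 !~ /UpToDate\/UpToDate/) {
                print "NOT IN SYNC: " $0
                bad++
            }
        }
        END {
            # No resource line at all is treated as an error (exit 2).
            if (seen == 0) { print "no DRBD resource found"; exit 2 }
            exit (bad > 0 ? 1 : 0)
        }
    '
}
```

On a cluster node it would be used as: drbd-overview | drbd_in_sync && echo "DRBD synchronized"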

If the DRBD service did not start correctly (cluster out of service), the server data synchronization service can be restarted with the following command:

# service drbdserv --full-restart

Validation of the correct functioning of the cluster (Corosync)

To see the state of the services managed by the cluster, run the crm status command in a terminal or over ssh.

The command returns the cluster configuration summary and status.

Description of the command output

The first block indicates the state of the cluster

Last updated: Sun Sep 23 08:21:21 2018
Last change: Tue Aug 28 09:42:27 2018 via crm_attribute on dzacupsvr2
Stack: corosync
Current DC: dzacupsvr2 (34212362) - partition with quorum
Version: 1.1.7-2.mga1-ee0730e13d124c3d58f00016c3376a1de5323cff
2 Nodes configured, unknown expected votes
11 Resources configured.
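The header block can also be checked from a script. The sketch below is an illustration only: the function names has_quorum and current_dc are hypothetical, and it assumes the "Current DC:" line has the layout shown in the output above.

```shell
#!/bin/sh
# Sketch: both functions read `crm status` output on stdin.

# has_quorum: succeed if the "Current DC" line reports "partition with quorum".
has_quorum() {
    grep -q '^[[:space:]]*Current DC:.*partition with quorum'
}

# current_dc: print the name of the node currently acting as
# Designated Coordinator (first word after "Current DC:").
current_dc() {
    sed -n 's/^[[:space:]]*Current DC:[[:space:]]*\([^[:space:]]*\).*/\1/p'
}
```

On a cluster node it would be used as: crm status | has_quorum || echo "cluster has lost quorum"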

The second block shows which node is the primary and where each service is running

Online: [ dzacupsvr dzacupsvr2 ]

 Resource Group: services
     samba (lsb:smb): Started dzacupsvr
     apache (ocf::heartbeat:apache): Started dzacupsvr
     mysql (ocf::heartbeat:mysql): Started dzacupsvr
     libvirtd (lsb:libvirtd): Started dzacupsvr
     libvirt-guests (lsb:libvirt-guests): Started dzacupsvr
 Master/Slave Set: drbdservClone [drbdserv]
     Masters: [ dzacupsvr ]
     Slaves: [ dzacupsvr2 ]
 fsserv (ocf::heartbeat:Filesystem): Started dzacupsvr
 Resource Group: iphd
     clusterip (ocf::heartbeat:IPaddr2): Started dzacupsvr
     clusterroute (ocf::heartbeat:Route): Started dzacupsvr

⇒ Both servers are “online”, and every service is running on the primary (here dzacupsvr).
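The same verification can be sketched as a small filter over the crm status output. This is an illustrative helper, not part of the original procedure: the function name resources_off_node is hypothetical, and it assumes that resource lines have the form "name (class:type): State node" as in the output above.

```shell
#!/bin/sh
# Sketch: read `crm status` output on stdin and print every resource line
# that is not in state "Started <node>". Succeeds when no such line exists.
# Usage: crm status | resources_off_node <node>
resources_off_node() {
    awk -v node="$1" '
        # Resource lines contain "(class:type):"; header lines do not.
        /\):/ {
            if ($0 !~ ("Started " node "[ \t]*$")) { print; bad = 1 }
        }
        END { exit (bad ? 1 : 0) }
    '
}
```

On the example cluster it would be used as: crm status | resources_off_node dzacupsvr. Any line it prints is a service that is stopped or running on the other node.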

Verifying the correct functioning of the cluster

To view the cluster configuration, use the following command:

# crm configure show

Example of a configuration file of the Abidjan cluster:

Commands for verifying the correct functioning of the cluster

The examples below use the node « abjairsvr2 ».

DESIRED Action	SYSTEM Command
Check the cluster status	service corosync status
List the cluster nodes	crm node show
View the cluster configuration	crm configure show
Edit the cluster configuration	crm configure edit
Put a cluster node in standby while changing a configuration	crm node standby abjairsvr2
Put a cluster node back in service (here the Abidjan secondary)	crm node online abjairsvr2
Change a cluster configuration parameter	crm configure rsc_defaults resource-stickiness=100
View the status of a cluster service	crm resource status libvirt-guests
Clean up a cluster service that does not start	crm resource cleanup libvirt-guests
Check whether a split brain has occurred (service that has migrated to a non-operational node)	grep -i "split-brain" /var/log/syslog
Move a service from one node to another (in the case of a split brain)	crm resource move libvirt-guests abjairsvr2
Reattach a service to the cluster	crm resource manage libvirt-guests
Check that a configuration file is identical on all nodes of the cluster	crm cluster diff /etc/samba/smb.conf
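The split-brain check from the table can be wrapped in a small function for use in a monitoring script. This is a sketch under assumptions: the function name check_split_brain is hypothetical, the default log path is the one used in the table, and the match is case-insensitive on the assumption that the DRBD kernel messages may be written as "Split-Brain".

```shell
#!/bin/sh
# Sketch: print log lines that mention a split brain.
# Returns 0 if none is found, 1 if at least one is found,
# 2 if the log file cannot be read.
# Usage: check_split_brain [logfile]   (default: /var/log/syslog)
check_split_brain() {
    logfile=${1:-/var/log/syslog}
    if [ ! -r "$logfile" ]; then
        echo "cannot read $logfile" >&2
        return 2
    fi
    # grep succeeds when it finds a match, which here means a problem.
    if grep -i 'split-brain' "$logfile"; then
        return 1
    fi
    return 0
}
```

For example: check_split_brain /var/log/syslog || echo "split brain reported, check DRBD before moving any service"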

Verification of cluster management tools

DESIRED Action	SYSTEM Command
Check the status of the Pacemaker service	systemctl status pacemaker
Verify the Pacemaker unit file	systemd-analyze verify pacemaker.service
Reload the Pacemaker service	systemctl reload pacemaker.service
Show overridden systemd unit files	systemd-delta pacemaker.service
View the Pacemaker logs	journalctl -u pacemaker | more