Axigen with Linux-HA and DRBD - Solution Deployment
From Axigen Wiki
yum install kmod-drbd83
(The drbd83 package should be pulled automatically as a dependency)
Configuration
After a successful installation of the DRBD package on both nodes, it has to be configured by editing the /etc/drbd.conf file. At the end, the same DRBD configuration must be present on both nodes.
Below is an example configuration for /etc/drbd.conf on both nodes, according to the data from the corresponding section.
global {
usage-count no;
}
resource AxigenData {
protocol C;
startup {
wfc-timeout 120;
degr-wfc-timeout 120;
}
disk {
on-io-error pass_on;
no-disk-barrier;
no-disk-flushes;
no-disk-drain;
}
syncer {
rate 2000;
al-extents 257;
}
net {
max-buffers 2048;
cram-hmac-alg "sha1";
shared-secret "%Ax1G3N^MAIL#Server?";
unplug-watermark 2048;
}
on n1.cl.axilab.local {
device /dev/drbd0;
disk /dev/sda5;
address 10.9.9.91:7788;
meta-disk internal;
}
on n2.cl.axilab.local {
device /dev/drbd0;
disk /dev/sda5;
address 10.9.9.92:7788;
meta-disk internal;
}
}
Disk setup
After you have successfully modified the DRBD configuration file, you can create the DRBD virtual disk(s) that will store the Axigen data directory. The commands must be issued on both nodes.
First, in order to be able to work with DRBD, you have to load the drbd kernel module:
modprobe drbd
Next, you have to initialize the DRBD resource meta data. This needs to be done before a DRBD resource can be taken online for the first time, thus only on initial device creation:
drbdadm create-md AxigenData
A successful meta data initialization makes the DRBD resource ready for attaching to the backing device:
drbdadm attach AxigenData
Finally, connect the DRBD resource to its peer device by issuing:
drbdadm connect AxigenData
After completing the above disk setup on the second node as well, both nodes should report an Inconsistent/Inconsistent disk state, either in the output of the drbdadm dstate AxigenData command or in the ds: field of the /proc/drbd file.
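As a quick check of the disk state on each node, the ds: field can be pulled out of /proc/drbd with a small helper. This is only a minimal sketch: the function name drbd_disk_states is hypothetical, and it assumes the DRBD 8.3 status layout in which every device line carries cs:, ro: and ds: fields.

```shell
# Hypothetical helper (not part of DRBD): print the ds: (disk state) value
# of every device line found in a /proc/drbd style file.
# Assumed line layout (DRBD 8.3):
#  0: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r----
drbd_disk_states() {
    status_file="${1:-/proc/drbd}"   # default to the live status file
    sed -n 's/.* ds:\([^ ]*\).*/\1/p' "$status_file"
}
```

Called without arguments on a node, the helper reads /proc/drbd directly; a file name can be passed instead to inspect a saved copy of the status.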
Disk synchronization
The initial full synchronization must be started only once, at initial resource configuration, and from one node only: the node you selected as the synchronization source. This node is the one you will treat as the primary node in the future cluster setup. To perform this step, issue the following command on that node:
drbdadm -- --overwrite-data-of-peer primary AxigenData
After issuing this command, the initial full synchronization will commence. You will be able to monitor its progress via /proc/drbd. It may take some time depending on the size of the device and overall disk and network performance. The synchronization is logged with the following two syslog messages:
kernel: block drbd0: Began resync as SyncSource ...
kernel: block drbd0: Resync done ...
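Instead of watching /proc/drbd by hand, the wait for the end of the resync could be scripted. The sketch below is an illustration only: both function names are hypothetical, the polling interval is arbitrary, and it assumes that a finished synchronization is reported as ds:UpToDate/UpToDate in /proc/drbd.

```shell
# Hypothetical helper: succeed once both sides report an up-to-date disk.
# $1 = status file to inspect (defaults to /proc/drbd).
drbd_in_sync() {
    grep -q 'ds:UpToDate/UpToDate' "${1:-/proc/drbd}"
}

# Hypothetical helper: block until the resync has finished.
# $1 = status file (optional), $2 = polling interval in seconds (optional).
wait_for_drbd_sync() {
    while ! drbd_in_sync "$1"; do
        sleep "${2:-10}"
    done
}
```

Running wait_for_drbd_sync on the synchronization source before creating the file system is one possible use; with a single DRBD device, as in this example, the simple pattern match is sufficient.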
File system creation
Finally, create a file system of your choice on the DRBD resource.
In our example, the filesystem has been created as ext3:
mkfs.ext3 /dev/drbd0
Axigen
Package installation
Install the Axigen package on both nodes, without starting or configuring it.
After the package installation finishes successfully, the axigen and axigenfilters init scripts must be prevented from starting at system boot, in all runlevels:
chkconfig --del axigen
chkconfig --del axigenfilters
Init script
First, you need to prevent the regular axigen init script from being started accidentally, by editing its configuration file, /etc/sysconfig/axigen, and literally setting:
AXIGEN_BACKEND="<DISABLED>"
A hard link must be created, which will point to the original package init script:
ln -v /etc/init.d/axigen /etc/init.d/axigen-ha
Then, the corresponding configuration file must be created as /etc/sysconfig/axigen-ha, with the following configuration:
# The AXIGEN_BACKEND variable can be named as follows
AXIGEN_BACKEND="HA"
# The following AXIGEN_* variables are common on all nodes
AXIGEN_DATA_DIR="/var/clusterfs/data/axigen"
AXIGEN_PID_FILE="${AXIGEN_DATA_DIR}/run/axigen.pid"
AXIGEN_DAEMON_OPT="-W ${AXIGEN_DATA_DIR}"
AXIGEN_SSL_CERT="${AXIGEN_DATA_DIR}/axigen_cert.pem"
AXIGEN_SSL_DH="${AXIGEN_DATA_DIR}/axigen_dh.pem"
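To see what the derived settings resolve to, the same assignments can be evaluated in a plain shell. This sketch assumes that the init script reads /etc/sysconfig/axigen-ha as ordinary shell variable assignments; the helper name show_axigen_ha_paths is hypothetical.

```shell
# Same assignments as in /etc/sysconfig/axigen-ha, evaluated by the shell.
AXIGEN_BACKEND="HA"
AXIGEN_DATA_DIR="/var/clusterfs/data/axigen"
AXIGEN_PID_FILE="${AXIGEN_DATA_DIR}/run/axigen.pid"
AXIGEN_DAEMON_OPT="-W ${AXIGEN_DATA_DIR}"
AXIGEN_SSL_CERT="${AXIGEN_DATA_DIR}/axigen_cert.pem"
AXIGEN_SSL_DH="${AXIGEN_DATA_DIR}/axigen_dh.pem"

# Hypothetical helper: print the values after variable expansion.
show_axigen_ha_paths() {
    printf 'pid file:    %s\n' "$AXIGEN_PID_FILE"
    printf 'daemon opts: %s\n' "$AXIGEN_DAEMON_OPT"
    printf 'certificate: %s\n' "$AXIGEN_SSL_CERT"
    printf 'DH params:   %s\n' "$AXIGEN_SSL_DH"
}
```

Every path ends up below the data directory, so all of them live on the replicated DRBD file system once it is mounted under /var/clusterfs/data.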
Storage preparation
On both nodes, you have to create a common mount point directory for the Axigen DRBD resource. In our example setup we have used the /var/clusterfs/data directory for mounting the AxigenData DRBD resource. Create the mount points on both nodes using:
mkdir -v /var/clusterfs/data
On the primary node, mount the DRBD resource and copy the Axigen data directory to it, as follows:
mount /dev/drbd0 /var/clusterfs/data
cp -rav /var/opt/axigen /var/clusterfs/data
Admin password
To be able to log in to the Axigen administrative interfaces, you need to set the password for the top-level administrative user, called admin. The following command does this:
/opt/axigen/bin/axigen -W /var/clusterfs/data/axigen -A your-password
Service start
By default, all enabled Axigen services listen on the local loopback interface, 127.0.0.1. To be able to use the WebAdmin interface via the cluster floating IP address, you must first add that address to the corresponding network interface:
/sbin/ip -f inet addr add dev eth0 10.9.9.90
You can see it set if it appears in the output of the following command:
/sbin/ip -f inet addr show dev eth0
On the first node, the output should be similar to the following:
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
inet 10.9.9.91/24 brd 10.9.9.255 scope global eth0
inet 10.9.9.90/24 scope global secondary eth0
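The same check can be done in a script rather than by eye. The following is a minimal sketch; the function name has_inet_addr is hypothetical, and it only assumes that each address appears on a line of the form "inet ADDRESS/PREFIX ...", as in the output above.

```shell
# Hypothetical helper: succeed if IPv4 address $1 appears in the
# "ip -f inet addr show" output supplied on standard input.
has_inet_addr() {
    awk -v ip="$1" '
        # Address lines look like: inet 10.9.9.90/24 scope global secondary eth0
        $1 == "inet" { split($2, parts, "/"); if (parts[1] == ip) found = 1 }
        END { exit !found }
    '
}
```

For example, /sbin/ip -f inet addr show dev eth0 | has_inet_addr 10.9.9.90 returns success only while the floating address is configured on eth0.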
Start the Axigen service with the following command:
service axigen-ha start
WebAdmin setup
Then, enable the WebAdmin listener on the cluster service floating IP address, using the CLI service:
$ telnet localhost 7000
Trying 127.0.0.1...
Connected to localhost.localdomain (127.0.0.1).
Escape character is '^]'.
Welcome to AXIGEN's Command Line Interface
You must login first.
For a list of available commands, type HELP
<login> user admin
<password> your-password
For a list of available commands, type HELP
+OK: Authentication successful
<#> config server
+OK: command successful
<server#> config webadmin
+OK: command successful
<server-webadmin#> add listener address 10.9.9.90:9000
+OK: command successful
<server-webadmin-listener#> commit
committing changes and switching back to previous context.
+OK: command successful (listener is closed)
<server-webadmin#> commit
committing changes and switching back to previous context.
+OK: command successful
<server#> commit
committing changes and switching back to previous context.
+OK: command successful
<#> save config
+OK: command successful
<#> quit
WARNING: all changes made and not committed are lost
connection to AXIGEN closing.
+OK: have a nice day
Connection closed by foreign host.
Point your browser at the floating IP address of the failover domain or at its corresponding DNS name, for example http://10.9.9.90:9000 (http://mail.cl.axilab.local:9000), and try to log in using the admin username and the password you set earlier.
If the login is successful, you can set the listeners for the Axigen services you want to use, including but not limited to SMTP, IMAP, POP3 and WebMail. Use the floating cluster IP address (10.9.9.90 in our example) for the listener addresses of these services, as you did for the WebAdmin service above.
Cleanup
You can safely stop the Axigen service and its related resources and continue with the cluster setup:
service axigen-ha stop
/sbin/ip -f inet addr del dev eth0 10.9.9.90
umount /var/clusterfs/data
service drbd stop
On the secondary node, the drbd daemon must also be stopped:
service drbd stop
Heartbeat
The Heartbeat project from Linux-HA.org had multiple development stages:
- Heartbeat 1.x (Legacy), with all the components integrated in the same package and a very simplistic configuration style (v1), allowing only two nodes to be clustered at the same time and lacking resource monitoring.
- Heartbeat 2.x (CRM), with all the components integrated, including a CRM (cluster resource manager) component, up until version 2.1.4. The configuration file was based on XML, with few administrative GUI tools, raising the complexity of configuring the cluster.
- Heartbeat 3.x (Pacemaker), with all the main components having their own development cycle, with different versions, and including a command line administration interface. The CRM component was split off into a separate project called Pacemaker.
Although this guide covers all three configuration styles, we recommend the latest v3 style, because the older versions are currently unsupported. The latest versions are also well documented on both the Linux-HA and ClusterLabs sites.
Installation
Heartbeat 3
The latest versions of the Linux-HA and Pacemaker suites can be installed by following the installation instructions from the official ClusterLabs online documentation, on the installation page, or from the Linux-HA User's Guide, on the Heartbeat installation page.
Heartbeat 2
This version of the heartbeat package set is present in both the CentOS Extras and EPEL repositories. After you set up yum to use either of them, you can install heartbeat and its dependencies by issuing the following command on both nodes:
yum install heartbeat
Configuration
v3, Pacemaker configuration style
In order to be able to configure the Pacemaker based cluster, there is a set of official online documents that contain very detailed instructions. You can find these documents at:
- Clusters from Scratch, ClusterLabs
- ClusterLabs Documentation Wiki page
- The Linux-HA User's Guide, publican edition
- The Linux-HA User's Guide
Below is a step-by-step set of instructions on how to configure a two-node active/passive cluster, based on the example configuration we have described in this document.
The ha.cf file
Right after the installation, the first step is to configure the /etc/ha.d/ha.cf configuration file for the Heartbeat cluster messaging layer. The following example is a small and simple ha.cf file:
autojoin none
bcast eth0
warntime 5
deadtime 15
initdead 60
keepalive 2
node n1.cl.axilab.local
node n2.cl.axilab.local
pacemaker respawn
The autojoin none setting disables cluster node auto-discovery and requires that cluster nodes be listed explicitly, using the node directives defined at the bottom of the file. This setting speeds up cluster start-up in clusters with a fixed, small number of nodes.
The bcast eth0 setting configures eth0 as the interface on which Heartbeat sends UDP broadcast traffic.
The next options configure node failure detection. They set the time after which Heartbeat issues a warning that a no longer available peer node may be dead (warntime), the time after which Heartbeat considers a node confirmed dead (deadtime), and the maximum time it waits for other nodes to check in at cluster startup (initdead). The keepalive directive sets the interval at which Heartbeat keep-alive packets are sent. All these options are given in seconds.
The node directive identifies cluster members. The option values listed here must match the exact host names of cluster nodes as given by uname -n.
The pacemaker directive set to the respawn value enables the Pacemaker cluster manager.
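Since the four timing directives only make sense in relation to each other, a small sanity check can catch typos before the file is propagated. The sketch below is not part of Heartbeat: the function name check_ha_timings is hypothetical, and the ordering it enforces (keepalive < warntime < deadtime <= initdead) is an assumption derived from the descriptions above, not a rule stated by this guide.

```shell
# Hypothetical helper: succeed if the timing directives in the ha.cf file
# given as $1 satisfy keepalive < warntime < deadtime <= initdead.
# The ordering itself is an assumption made for this sketch.
check_ha_timings() {
    awk '
        $1 == "keepalive" { k = $2 + 0 }
        $1 == "warntime"  { w = $2 + 0 }
        $1 == "deadtime"  { d = $2 + 0 }
        $1 == "initdead"  { i = $2 + 0 }
        END { exit !(k < w && w < d && d <= i) }
    ' "$1"
}
```

For example, check_ha_timings /etc/ha.d/ha.cf || echo "suspicious timing values" could be run on each node before starting the service. The check only understands plain numbers of seconds, as used in the example file.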
The authkeys file
The /etc/ha.d/authkeys file contains the pre-shared secrets used for mutual cluster node authentication. It consists of an auth line selecting the identifier of the key to use, followed by one line per key giving the key identifier, the hashing algorithm and the secret.
An example used in our setup is:
auth 1
1 sha1 Ax1G3N^MAIL#Server
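Because this file holds a secret, it is worth creating it with restrictive permissions from the start; Heartbeat is generally expected to reject an authkeys file that other users can read. The helper below is a minimal sketch with a hypothetical name, write_authkeys, producing a single sha1 key with identifier 1.

```shell
# Hypothetical helper: write a single-key authkeys file readable by its
# owner only.
# $1 = target path, $2 = shared secret.
write_authkeys() {
    # The subshell keeps the restrictive umask from leaking to the caller.
    ( umask 077; printf 'auth 1\n1 sha1 %s\n' "$2" > "$1" ) && chmod 600 "$1"
}
```

For example, write_authkeys /etc/ha.d/authkeys 'Ax1G3N^MAIL#Server' would recreate the example file; the single quotes keep the shell from interpreting the special characters in the secret.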
Configuration propagation
You must copy the /etc/ha.d/ha.cf and /etc/ha.d/authkeys files to the second node, so that both nodes have exactly the same content.
You can also use the ha_propagate tool, which uses scp to copy the files to the remote node(s) in the cluster. This tool can be found in either the /usr/share/heartbeat/ or /usr/lib/heartbeat/ directories, depending on the package distribution you have installed.
Service startup
Make sure you set the heartbeat init script to start at boot time, using the command:
chkconfig heartbeat on
Starting the heartbeat services is as simple as:
service heartbeat start
Please issue the above commands also on the second node in the cluster.
Verify that the service has started successfully with the crm_mon command, for example:
crm_mon -1
Cluster preparations
For data safety, the cluster has STONITH enabled by default. We will disable it for now and configure it at a later point, by setting the stonith-enabled cluster option to false:
crm configure property stonith-enabled=false
After this, the live cluster configuration verification command, crm_verify -L, will return no error.
Because the cluster starts resources immediately after they are created, we will put both nodes in standby and bring them back online only after the resource configuration is finished. To put both nodes in standby, issue the following two commands:
crm_standby -U n1.cl.axilab.local -v on
crm_standby -U n2.cl.axilab.local -v on
The crm configure show command will list nodes with their standby attribute set to on.
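With more nodes, or when the same step has to be repeated to bring the nodes back online, the commands can be generated in a loop. This is only a sketch: the function name standby_commands is hypothetical, and it prints the crm_standby invocations instead of running them, so they can be reviewed first.

```shell
# Hypothetical helper: print (rather than run) one crm_standby command
# per node.
# $1 = on|off, remaining arguments = node names as given by uname -n.
standby_commands() {
    value="$1"
    shift
    for node in "$@"; do
        printf 'crm_standby -U %s -v %s\n' "$node" "$value"
    done
}
```

Piping the output to sh, as in standby_commands on n1.cl.axilab.local n2.cl.axilab.local | sh, would execute the generated commands.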
In order to reduce the possibility of data corruption, Pacemaker's default behavior is to stop all resources if the cluster does not have quorum. Because a cluster is said to have quorum when more than half the known or expected nodes are online, a two-node cluster only has quorum when both nodes are running, which is no longer the case for our cluster. It is possible to control how Pacemaker behaves when quorum is lost. In particular, we can tell the cluster to simply ignore quorum altogether:
crm configure property no-quorum-policy=ignore
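The quorum rule quoted above (more than half of the expected nodes online) can be written out as a one-line test, which makes the two-node problem easy to see. The function name has_quorum is hypothetical, and the sketch models only the majority rule, not Pacemaker's actual implementation.

```shell
# Hypothetical helper: succeed if $2 online nodes out of $1 expected
# nodes form a strict majority (online > expected / 2).
has_quorum() {
    [ $(( $2 * 2 )) -gt "$1" ]
}
```

With two expected nodes, has_quorum 2 1 fails while has_quorum 2 2 succeeds, so the loss of either node costs the cluster its quorum; in a three-node cluster, has_quorum 3 2 still succeeds.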
DRBD resource
The first resource you need to add to your cluster is the DRBD file system you have previously created. This functionality is provided by the ocf:linbit:drbd resource agent, as follows:
crm configure primitive drbd_axigen_ha \
ocf:linbit:drbd \
params drbd_resource=AxigenData \
op monitor interval=60s
The above resource, named drbd_axigen_ha, specifies only the DRBD resource as parameter and a monitoring interval of 60 seconds.
Next, we need to create a master/slave resource, which tells the cluster manager to run the drbd_axigen_ha resource on both nodes and to promote it to Master (DRBD primary) on only one of them:
crm configure ms ms_drbd_axigen_ha drbd_axigen_ha \
meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
The third resource related to DRBD is the file system mount itself, provided by the ocf:heartbeat:Filesystem resource agent, configured with parameters specifying the device to mount, the mount point and the file system type.
crm configure primitive fs_axigen_ha \
ocf:heartbeat:Filesystem \
params \
device="/dev/drbd/by-res/AxigenData" \
directory="/var/clusterfs/data" \
fstype="ext3"
Finally, we have to specify that the file system resource must run on the Master node, and that the mount action must take place only after the master/slave resource has been promoted on that node:
crm configure colocation fs_on_drbd inf: fs_axigen_ha ms_drbd_axigen_ha:Master
crm configure order fs_after_drbd inf: ms_drbd_axigen_ha:promote fs_axigen_ha:start
IP resource
A floating IP address must be assigned to the active node in the cluster, to ensure transparency for the Axigen services. This can be achieved by defining a resource based on the ocf:heartbeat:IPaddr2 agent, as follows:
crm configure primitive ip_axigen_ha \
ocf:heartbeat:IPaddr2 \
params ip=10.9.9.90 cidr_netmask=32 \
op monitor interval=30s
Axigen service resource
The last resource is the Axigen init script configured above, which should be added like in the example below:
crm configure primitive script_axigen_ha \
lsb::axigen-ha \
op monitor interval=30s
Resource ordering
Because the successful startup of the defined resources depends on their order, you have to add ordering constraints that ensure the following order: File system -> IP Address -> Axigen init script.
crm configure order ip_after_fs inf: fs_axigen_ha ip_axigen_ha
crm configure order axigen_after_ip mandatory: ip_axigen_ha script_axigen_ha
Location preference
Besides being started in the right order, the resources also need to run on the same machine. To achieve this, the Axigen init script resource is colocated with the IP address and file system resources:
crm configure colocation axigen_with_ip inf: script_axigen_ha ip_axigen_ha
crm configure colocation axigen_with_fs inf: script_axigen_ha fs_axigen_ha
You can also set up a preferred node for running the cluster resources, by specifying a location constraint. In our example, you can set n1.cl.axilab.local as the preferred node for running the Axigen init script resource (and its dependencies):
crm configure location prefer_n1 script_axigen_ha 50: n1.cl.axilab.local
Sometimes a failed node eventually comes back online. To avoid the resources being transferred back to it (which would cause additional downtime), you can set up a general cluster resource stickiness with a higher score than the node preference defined above, as follows:
crm configure rsc_defaults resource-stickiness=100
Fencing
STONITH is an acronym for Shoot The Other Node In The Head and it protects your data from being corrupted by rogue nodes or concurrent access. With Pacemaker, STONITH is a node fencing daemon which also must be configured to achieve full data safety.
Using the example configuration from this document, we have defined a STONITH fencing device as follows:
crm configure primitive apc_fencing stonith::apcmaster \
params ipaddr=10.9.9.99 login=fenceadm password=P@ssw0rd \
op monitor interval="60s"
crm configure clone Fencing apc_fencing
Re-enable the stonith-enabled cluster property that you switched off at the beginning of the cluster setup:
crm configure property stonith-enabled="true"
Moving resources
You can move a resource to a specific node by using the following command example:
crm resource move script_axigen_ha n2.cl.axilab.local
To let the cluster decide again where to run the resources, use:
crm resource unmove script_axigen_ha
Removing resources
If you need to remove a specific resource from the cluster configuration, you can use the crm command line, in the following order:
crm resource stop <resource-name>
crm configure delete <resource-name>
crm resource cleanup <resource-name>
v2, CRM configuration style
The current heartbeat 2 packages ship an OCF provider script for DRBD that calls drbdadm with the deprecated state command, which breaks the script. The drbd OCF provider must therefore be modified to use the role command instead. Issue the following command on both nodes:
sed -i-orig 's/ state / role /g' /usr/lib/ocf/resource.d/heartbeat/drbd
sed keeps a backup copy of the original OCF provider script, named drbd-orig, in the same directory.
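The effect of the -i-orig option can be tried safely on a scratch file before touching the real script. The sketch below assumes GNU sed; the function name demo_sed_backup and the sample line written to the scratch file are made up for the illustration and are not taken from the actual OCF script.

```shell
# Hypothetical demonstration on a scratch copy, not the real OCF script.
# $1 = an existing, empty directory to work in.
demo_sed_backup() {
    # Made-up sample line standing in for a drbdadm call in the script.
    printf '%s\n' 'drbdadm state "$RESOURCE"' > "$1/drbd"
    # Same substitution as above; the suffix after -i names the backup.
    sed -i-orig 's/ state / role /g' "$1/drbd"
}
```

After demo_sed_backup "$(mktemp -d)", the scratch directory holds the edited drbd file next to an untouched drbd-orig copy, which can be restored with mv if needed.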
To configure the heartbeat cluster, modify or create the /etc/ha.d/ha.cf file so that it contains the heartbeat communication timeouts, the two nodes and the broadcast network interface. Follow the guidelines in the online documentation. Please note that this version is no longer supported, so the documentation may be incomplete or unmaintained.
Below is the heartbeat configuration for the example we have described in this document:
- /etc/ha.d/ha.cf
keepalive 2
deadtime 30
warntime 10
initdead 30
crm yes
node n1.cl.axilab.local n2.cl.axilab.local
bcast eth0
- /etc/ha.d/authkeys
auth 1
1 sha1 Ax1G3N^MAIL#Server
Start the heartbeat service with:
service heartbeat start
The CRM-enabled cluster can now be configured using the following commands:
- Set up stickiness:
crm_attribute --type crm_config --attr-name default-resource-failure-stickiness --attr-value INFINITY
crm_attribute --type crm_config --attr-name default-resource-stickiness --attr-value INFINITY
- Put both nodes in stand-by:
crm_standby -U n1.cl.axilab.local -v on
crm_standby -U n2.cl.axilab.local -v on
- Create a temporary file, /tmp/resources.xml, with the following contents:
<resources>
<master_slave id="ms_drbd_axigen_ha">
<meta_attributes>
<attributes>
<nvpair name="notify" value="yes"/>
<nvpair name="globally_unique" value="true"/>
</attributes>
</meta_attributes>
<primitive class="ocf" provider="heartbeat" type="drbd" id="drbd_axigen_ha_AxigenData">
<instance_attributes>
<attributes>
<nvpair name="drbd_resource" value="AxigenData"/>
<nvpair name="clone_overrides_hostname" value="no"/>
<nvpair name="target_role" value="Started"/>
</attributes>
</instance_attributes>
<operations>
<op name="monitor" interval="25s" timeout="10s" role="Started"/>
<op name="monitor" interval="30s" timeout="10s" role="Slave"/>
<op name="monitor" interval="35s" timeout="10s" role="Master"/>
</operations>
</primitive>
</master_slave>
<group id="rg_axigen_ha">
<primitive class="ocf" type="Filesystem" provider="heartbeat" id="fs_axigen_ha">
<instance_attributes>
<attributes>
<nvpair name="device" value="/dev/drbd0"/>
<nvpair name="directory" value="/var/clusterfs/data"/>
<nvpair name="type" value="ext3"/>
</attributes>
</instance_attributes>
<operations>
<op name="monitor" interval="30s" timeout="10s" role="Started"/>
</operations>
</primitive>
<primitive class="ocf" type="IPaddr2" provider="heartbeat" id="ip_axigen_ha">
<instance_attributes>
<attributes>
<nvpair name="ip" value="10.9.9.90"/>
<nvpair name="nic" value="eth0"/>
<nvpair name="cidr_netmask" value="24"/>
</attributes>
</instance_attributes>
<operations>
<op name="monitor" interval="30s" timeout="10s" role="Started"/>
</operations>
</primitive>
<primitive class="lsb" type="axigen-ha" provider="heartbeat" id="script_axigen_ha">
<operations>
<op name="monitor" interval="30s" timeout="10s" role="Started"/>
</operations>
</primitive>
</group>
</resources>
- Create a /tmp/constraints.xml file, with the following contents:
<constraints>
<rsc_order from="rg_axigen_ha" action="start" to="ms_drbd_axigen_ha" to_action="promote" type="after"/>
<rsc_colocation to="ms_drbd_axigen_ha" to_role="master" from="rg_axigen_ha" score="INFINITY"/>
</constraints>
- Update the cluster resources and constraints:
cibadmin -o resources -R -x /tmp/resources.xml
cibadmin -o constraints -R -x /tmp/constraints.xml
- Bring both nodes up from stand-by:
crm_standby -U n1.cl.axilab.local -v off
crm_standby -U n2.cl.axilab.local -v off
v1, Legacy configuration style
To have heartbeat use the legacy configuration style, you have to specify crm no in the /etc/ha.d/ha.cf file. Follow the guidelines in the online documentation. Please note that this version is deprecated and no longer supported, so the documentation may be incomplete or unmaintained.
Below is the heartbeat configuration for the example we have described in this document.
- /etc/ha.d/ha.cf
keepalive 2
deadtime 30
warntime 10
initdead 30
bcast eth0
node n1.cl.axilab.local
node n2.cl.axilab.local
crm no
auto_failback on
respawn hacluster /usr/lib/heartbeat/ipfail
ping 10.9.9.1 # this is an IP address visible to both nodes, i.e. router
- /etc/ha.d/haresources
n1.cl.axilab.local \
drbddisk::AxigenData \
Filesystem::/dev/drbd0::/var/clusterfs/data \
IPaddr2::10.9.9.90/24/eth0 \
axigen-ha
- /etc/ha.d/authkeys
auth 1
1 sha1 Ax1G3N^MAIL#Server
After setting up identical configuration files on both nodes, start the drbd and heartbeat system services on both nodes:
service drbd start
service heartbeat start
