Axigen with Linux-HA and DRBD - Solution Deployment

From Axigen Wiki

Example configuration

This document uses an example setup configuration, explained below. Thus, the configuration options may refer to these particular resources.


Operating system

Although DRBD and Linux-HA are supported on many Linux flavors, this document refers mainly to commands and instructions specific to Red Hat Enterprise Linux/CentOS. It may be extended in the future to include specific instructions for other Linux flavors supported by Axigen.


IPs and DNS

The two nodes which will be part of the cluster have the following IP addresses and DNS registered hostnames:

  • n1.cl.axilab.local, eth0/10.9.9.91
  • n2.cl.axilab.local, eth0/10.9.9.92

The active node in the cluster will have a floating IP address assigned, 10.9.9.90, registered in the DNS with the name mail.cl.axilab.local.


Storage

Each node has been configured with a partition, accessible as /dev/sda5, which will hold the entire Axigen data directory on an ext3 file system. These partitions are being synchronized with DRBD.


Fence device

The nodes are handled by an APC power switch, available at the address 10.9.9.99. The fenceadm user will be used for power cycling the nodes, if considered necessary by the Linux-HA/heartbeat suite.


DRBD

Please use this content only as a guideline. For a detailed installation and configuration guide, please read the official DRBD documentation.


Installation

Install the latest version of the DRBD package by following the official instructions. At the time of this writing, the latest DRBD version is 8.3. The same DRBD packages must be installed on both nodes.

If you are using a CentOS 5 platform, you have the drbd83 package and its kernel kmod-drbd83 counterpart, from the extras repository, which is active by default. If you are using Red Hat Enterprise Linux 5 or 6, you can also include the CentOS Extras repository in your yum configuration. After this, installing DRBD kernel and user space packages is as simple as:

yum install kmod-drbd83

(The drbd83 package should be pulled automatically as a dependency)


Configuration

After a successful installation of the DRBD package on both nodes, it has to be configured by editing the /etc/drbd.conf file. At the end, the same DRBD configuration must be present on both nodes.

Below is an example configuration for /etc/drbd.conf on both nodes, according to the data from the corresponding section.

 global {
   usage-count no;
 }
 resource AxigenData {
   protocol C;
   startup {
     wfc-timeout 120;
     degr-wfc-timeout 120;
   }
   disk {
     on-io-error pass_on;
     no-disk-barrier;
     no-disk-flushes;
     no-disk-drain;
   }
   syncer {
     rate 2000;
     al-extents 257;
   }
   net {
     max-buffers 2048;
     cram-hmac-alg "sha1";
     shared-secret "%Ax1G3N^MAIL#Server?";
     unplug-watermark 2048;
   }
   on n1.cl.axilab.local {
     device /dev/drbd0;
     disk /dev/sda5;
     address 10.9.9.91:7788;
     meta-disk internal;
   }
   on n2.cl.axilab.local {
     device /dev/drbd0;
     disk /dev/sda5;
     address 10.9.9.92:7788;
     meta-disk internal;
   }
 }

Note: The site-specific values in this example (such as the host names, disk devices, IP addresses and shared secret) must be replaced with actual data from your specific setup.


Disk setup

After you have successfully modified the DRBD configuration file, you can proceed to create the DRBD virtual disk(s), which will store the Axigen data directory. The commands must be issued on both nodes.

First, in order to be able to work with DRBD, you have to load the drbd kernel module:

modprobe drbd

Next, you have to initialize the DRBD resource meta data. This needs to be done before a DRBD resource can be taken online for the first time, thus only on initial device creation:

drbdadm create-md AxigenData

A successful meta data initialization makes the DRBD resource ready for attaching to the backing device:

drbdadm attach AxigenData

At the end, perform DRBD connection to the peer device, by issuing:

drbdadm connect AxigenData

After doing the above disk setup on the second node, both nodes should show an Inconsistent/Inconsistent disk state in the output of the drbdadm dstate command or in the ds: field of the /proc/drbd file.
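
The disk state check described above can be scripted. The following is only a minimal sketch: the drbd_disk_state function name is hypothetical (it is not part of DRBD), and it assumes the DRBD 8.3 layout of /proc/drbd, where each device line starts with the minor number followed by a colon and carries a ds: field.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with DRBD): print the disk state (ds: field)
# of one DRBD minor from a /proc/drbd-style status file.
# Arguments: $1 = minor number (default 0), $2 = status file (default /proc/drbd).
drbd_disk_state() {
    minor="${1:-0}"
    status_file="${2:-/proc/drbd}"
    # Assumed device line layout (DRBD 8.3):
    #  0: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r----
    awk -v m="${minor}:" '$1 == m {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^ds:/) { sub(/^ds:/, "", $i); print $i }
    }' "$status_file"
}
```

On a node at this stage of the setup, running drbd_disk_state 0 should print Inconsistent/Inconsistent for the /dev/drbd0 device used in this example.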

Note: After this stage, you will need to perform the operations described only on the primary node.


Disk synchronization

The initial full synchronization of the two nodes must be performed on only one node, only on initial resource configuration, and only on the node you selected as the synchronization source. This node is the one you will consider the primary node in the future cluster setup. To perform this step, issue this command:

drbdadm -- --overwrite-data-of-peer primary AxigenData

After issuing this command, the initial full synchronization will commence. You will be able to monitor its progress via /proc/drbd. It may take some time depending on the size of the device and overall disk and network performance. The synchronization is logged with the following two syslog messages:

kernel: block drbd0: Began resync as SyncSource ...
kernel: block drbd0: Resync done ...

Warning: Do not attempt to perform the same synchronization on the secondary node. It must be performed only once, on the primary node.


File system creation

Finally, create a file system of your choice on the DRBD resource.

In our example, the filesystem has been created as ext3:

mkfs.ext3 /dev/drbd0

Note: Remember to perform this step only on the primary node.


Axigen

Package installation

Axigen must be installed on both nodes, just the package, without being started or configured.

After the package installation process finishes successfully, the axigen and axigenfilters init scripts must be prevented from starting at system boot in all runlevels:

chkconfig --del axigen
chkconfig --del axigenfilters


Init script

Note: This step must be performed on both nodes.

First, you need to prevent the regular axigen init script from starting accidentally, by editing its corresponding configuration file, /etc/sysconfig/axigen, and literally setting:

AXIGEN_BACKEND="<DISABLED>"

A hard link must be created, which will point to the original package init script:

ln -v /etc/init.d/axigen /etc/init.d/axigen-ha

Warning: Do not attempt to set the cluster init scripts (like the axigen-ha defined above) to start at system boot. They will be started, stopped and generally managed by the clustering management software.

Then, the corresponding configuration file must be created as /etc/sysconfig/axigen-ha, with the following configuration:

# The AXIGEN_BACKEND variable can be named as follows
AXIGEN_BACKEND="HA"

# The following AXIGEN_* variables are common on all nodes
AXIGEN_DATA_DIR="/var/clusterfs/data/axigen"
AXIGEN_PID_FILE="${AXIGEN_DATA_DIR}/run/axigen.pid"
AXIGEN_DAEMON_OPT="-W ${AXIGEN_DATA_DIR}"
AXIGEN_SSL_CERT="${AXIGEN_DATA_DIR}/axigen_cert.pem"
AXIGEN_SSL_DH="${AXIGEN_DATA_DIR}/axigen_dh.pem"


Storage preparation

On both nodes, you have to create a common mount point directory for the Axigen DRBD resource. In our example setup we have used the /var/clusterfs/data directory for mounting the AxigenData DRBD resource. Create the mount points on both nodes using:

mkdir -pv /var/clusterfs/data

On the primary node, mount the DRBD resource and copy the Axigen data directory created by the package installation onto it, as follows:

mount /dev/drbd0 /var/clusterfs/data
cp -rav /var/opt/axigen /var/clusterfs/data


Admin password

Note: This step and the following steps for configuring Axigen must be performed only on the primary node.

To be able to log in to the Axigen administrative interfaces, you need to set the password for the top level administrative user, called admin. The following command helps you with this step:

/opt/axigen/bin/axigen -W /var/clusterfs/data/axigen -A your-password


Service start

By default, all enabled Axigen services will listen on the local loopback interface, 127.0.0.1. In order to be able to use the WebAdmin interface via the cluster floating IP address, you must first assign this address to the corresponding network interface:

/sbin/ip -f inet addr add dev eth0 10.9.9.90/24

You can see it set if it appears in the output of the following command:

/sbin/ip -f inet addr show dev eth0

The output should be similar to the following, on the first node:

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
    inet 10.9.9.91/24 brd 10.9.9.255 scope global eth0
    inet 10.9.9.90/24 scope global secondary eth0

Start the Axigen service with the following command:

service axigen-ha start


WebAdmin setup

Then, enable the WebAdmin listener on the cluster service floating IP address, using the CLI service:

$ telnet localhost 7000
Trying 127.0.0.1...
Connected to localhost.localdomain (127.0.0.1).
Escape character is '^]'.
Welcome to AXIGEN's Command Line Interface
You must login first. For a list of available commands, type HELP
<login> user admin
<password> your-password
For a list of available commands, type HELP
+OK: Authentication successful
<#> config server
+OK: command successful
<server#> config webadmin
+OK: command successful
<server-webadmin#> add listener address 10.9.9.90:9000
+OK: command successful
<server-webadmin-listener#> commit
committing changes and switching back to previous context.
+OK: command successful (listener is closed)
<server-webadmin#> commit
committing changes and switching back to previous context.
+OK: command successful
<server#> commit
committing changes and switching back to previous context.
+OK: command successful
<#> save config
+OK: command successful
<#> quit
WARNING: all changes made and not committed are lost
connection to AXIGEN closing.
+OK: have a nice day
Connection closed by foreign host.

Point your browser at the floating IP address of the cluster or its corresponding DNS name, for example http://10.9.9.90:9000 (http://mail.cl.axilab.local:9000), and try to log in using the admin username and the password you have set earlier.

If the login is successful, you can set the listeners for the Axigen services you want to use, including but not limited to: SMTP, IMAP, POP3, WebMail. Please use the floating cluster IP address (10.9.9.90 in our example) for the listener addresses of these services, as you did for the WebAdmin service above.


Cleanup

You can safely stop the Axigen service and its related resources and continue with the cluster setup:

service axigen-ha stop
/sbin/ip -f inet addr del dev eth0 10.9.9.90
umount /var/clusterfs/data
service drbd stop

On the secondary node, the drbd daemon must also be stopped:

service drbd stop


Heartbeat

The Heartbeat project from Linux-HA.org had multiple development stages:

  1. Heartbeat 1.x (Legacy), with all the components integrated in the same package and a very simplistic configuration style (v1), allowing only two nodes to be clustered at the same time and lacking resource monitoring.
  2. Heartbeat 2.x (CRM), with all the components integrated, including a CRM (cluster resource manager) component, up until version 2.1.4. The configuration file was based on XML, with few administrative GUI tools, raising the complexity of configuring the cluster.
  3. Heartbeat 3.x (Pacemaker), with all the main components having their own development cycle, with different versions, and including a command line administration interface. The CRM component was split off into a separate project called Pacemaker.

Although this guide covers all the configuration styles, we recommend using the latest v3 style, because the older versions are no longer supported. Also, the latest versions are well documented, on both the Linux-HA and ClusterLabs sites.


Installation

Heartbeat 3

The latest versions of the Linux-HA and Pacemaker suites can be installed by following the installation instructions from the official ClusterLabs online documentation, on the installation page, or from the Linux-HA User's Guide, on the Heartbeat installation page.


Heartbeat 2

This version of the heartbeat package set is present in both the CentOS Extras and EPEL repositories. After you set up yum to use either of them, you can install heartbeat and its dependencies by issuing on both nodes:

yum install heartbeat


Configuration

v3, Pacemaker configuration style

A set of official online documents contains very detailed instructions on how to configure a Pacemaker based cluster. You can find these documents on the ClusterLabs and Linux-HA sites.

Below is a step-by-step set of instructions on how to configure a two-node active/passive cluster, based on the example configuration we have described in this document.


The ha.cf file

Just after the installation, the first step is to configure the /etc/ha.d/ha.cf configuration file for the Heartbeat cluster messaging layer. The following example is a small and simple ha.cf file:

autojoin none
bcast eth0
warntime 5
deadtime 15
initdead 60
keepalive 2
node n1.cl.axilab.local
node n2.cl.axilab.local
pacemaker respawn

The autojoin none setting disables cluster node auto-discovery and requires that cluster nodes be listed explicitly, using the node directives, defined at the bottom in the file. This setting speeds up cluster start-up in clusters with a fixed small number of nodes.

The bcast eth0 setting configures eth0 as the interface on which Heartbeat sends UDP broadcast traffic.

The next options configure node failure detection. They set the time after which Heartbeat issues a warning that a no longer available peer node may be dead (warntime), the time after which Heartbeat considers a node confirmed dead (deadtime), and the maximum time it waits for other nodes to check in at cluster startup (initdead). The keepalive directive sets the interval at which Heartbeat keep-alive packets are sent. All these options are given in seconds.

The node directive identifies cluster members. The option values listed here must match the exact host names of cluster nodes as given by uname -n.

The pacemaker directive set to the respawn value enables the Pacemaker cluster manager.

Note: Prior to Heartbeat release 3.0.4, the pacemaker keyword was named crm. Newer versions still retain the old name as a compatibility alias, but the pacemaker directive is preferred by upstream developers.
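
The relationship between the failure detection timers described above can be checked with a small script. This is only a sketch: the ha_cf_value and check_ha_timings function names are hypothetical, the values are assumed to be plain integer seconds (as in the example file), and the rule keepalive < warntime < deadtime is a plausible sanity check that follows from the meaning of these options, not an official Heartbeat requirement.

```shell
#!/bin/sh
# Hypothetical helper: read the value of a single-valued ha.cf directive,
# for example "deadtime 15". $1 = directive name, $2 = ha.cf path.
ha_cf_value() {
    awk -v key="$1" '$1 == key { print $2; exit }' "$2"
}

# Hypothetical helper: succeed when keepalive < warntime < deadtime.
# Assumes plain integer seconds; a missing value is reported as suspicious.
check_ha_timings() {
    cf="${1:-/etc/ha.d/ha.cf}"
    keepalive=$(ha_cf_value keepalive "$cf")
    warntime=$(ha_cf_value warntime "$cf")
    deadtime=$(ha_cf_value deadtime "$cf")
    if [ "$keepalive" -lt "$warntime" ] 2>/dev/null &&
       [ "$warntime" -lt "$deadtime" ] 2>/dev/null; then
        echo "timings OK: keepalive=$keepalive warntime=$warntime deadtime=$deadtime"
    else
        echo "suspicious timings: keepalive=$keepalive warntime=$warntime deadtime=$deadtime" >&2
        return 1
    fi
}
```

With the example ha.cf from this section (keepalive 2, warntime 5, deadtime 15), check_ha_timings /etc/ha.d/ha.cf succeeds, because a warning is issued before a node is declared dead and several keep-alive packets fit into the warning interval.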


The authkeys file

The /etc/ha.d/authkeys file contains the pre-shared secrets used for mutual cluster node authentication. It consists of an auth line, which selects the identifier of the key to use, followed by one line per key that specifies the key identifier, its hashing algorithm and the secret.

An example used in our setup is:

auth 1
1 sha1 Ax1G3N^MAIL#Server
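
Instead of a hand-written secret, the file can be generated with a random one. The sketch below is only an illustration: the write_authkeys function name is hypothetical, and the use of /dev/urandom with sha1sum to derive the secret is an assumption, not something this setup requires. The chmod step reflects the fact that Heartbeat expects the authkeys file to be readable only by its owner (mode 600).

```shell
#!/bin/sh
# Hypothetical helper: write an authkeys file with one sha1 key and a random secret.
# $1 = target path (default /etc/ha.d/authkeys).
write_authkeys() {
    target="${1:-/etc/ha.d/authkeys}"
    # 20 random bytes hashed into a 40-character hexadecimal secret (assumed scheme).
    secret=$(dd if=/dev/urandom bs=20 count=1 2>/dev/null | sha1sum | cut -d' ' -f1)
    # Create the file with restrictive permissions from the start.
    ( umask 077; printf 'auth 1\n1 sha1 %s\n' "$secret" > "$target" )
    chmod 600 "$target"
}
```

Run write_authkeys on one node only, then copy the resulting file to the other node, because both nodes must share the same secret.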


Configuration propagation

You must copy the /etc/ha.d/ha.cf and /etc/ha.d/authkeys files to the second node, so that both nodes have exactly the same content.

You can also use the ha_propagate tool, which uses scp to copy the files to the remote node(s) in the cluster. This tool can be found in either the /usr/share/heartbeat/ or /usr/lib/heartbeat/ directories, depending on the package distribution you have installed.

Service startup

Make sure you set the heartbeat init script to start at boot time, using the command:

chkconfig heartbeat on

Starting the heartbeat services is as simple as:

service heartbeat start

Please issue the above commands also on the second node in the cluster.

Verify that the service has started successfully with the crm_mon command, for example:

crm_mon -1


Cluster preparations

For data safety, the cluster has STONITH enabled by default. We will disable it for now and configure it at a later point, by setting the stonith-enabled cluster option to false:

crm configure property stonith-enabled=false

After this, the live cluster configuration verification command, crm_verify -L, will return no error.

Because the resources are started by the cluster immediately after their creation, we will put both nodes in stand-by and bring them online after finishing the resource configuration, so that no resource is started before its configuration is complete. To put both nodes in stand-by, just issue the following two commands:

crm_standby -U n1.cl.axilab.local -v on
crm_standby -U n2.cl.axilab.local -v on

The crm configure show command will list nodes with their standby attribute set to on.

In order to reduce the possibility of data corruption, Pacemaker's default behavior is to stop all resources if the cluster does not have quorum. Because a cluster is said to have quorum when more than half of the known or expected nodes are online, a two-node cluster only has quorum when both nodes are running, so the surviving node would stop its resources whenever its peer fails. It is possible to control how Pacemaker behaves when quorum is lost. In particular, we can tell the cluster to simply ignore quorum altogether:

crm configure property no-quorum-policy=ignore
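
The quorum rule quoted above (more than half of the expected nodes online) is plain arithmetic, sketched below for illustration. The has_quorum function name is hypothetical and the function only restates the rule; it does not query the cluster.

```shell
#!/bin/sh
# Hypothetical helper: succeed when the online nodes are more than half
# of the expected nodes. $1 = online nodes, $2 = expected nodes.
has_quorum() {
    online="$1"
    expected="$2"
    # "more than half" without fractions: 2 * online > expected
    [ $((online * 2)) -gt "$expected" ]
}

# Show the two-node case: quorum is lost as soon as one node is down.
for online in 2 1; do
    if has_quorum "$online" 2; then
        echo "$online of 2 nodes online: quorum"
    else
        echo "$online of 2 nodes online: no quorum"
    fi
done
```

For a two-node cluster, one online node gives 2 * 1 = 2, which is not greater than 2, so quorum is lost. This is why the no-quorum-policy=ignore setting is used in this setup.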


DRBD resource

The first resource you need to add to your cluster is the DRBD file system you have previously created. This functionality is provided by the ocf:linbit:drbd resource agent, as follows:

crm configure primitive drbd_axigen_ha \
  ocf:linbit:drbd \
  params drbd_resource=AxigenData \
  op monitor interval=60s

The above resource, named drbd_axigen_ha, specifies only the DRBD resource as parameter and a monitoring interval of 60 seconds.

Next, we need to create a master/slave resource, which tells the cluster manager to run the drbd_axigen_ha resource on both nodes and to promote it to the DRBD primary role on only one of them.

crm configure ms ms_drbd_axigen_ha drbd_axigen_ha \
  meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true

The third resource related to DRBD is the file system mount itself, provided by the ocf:heartbeat:Filesystem resource agent, configured with parameters specifying the device to mount, the mount point and the file system type.

crm configure primitive fs_axigen_ha \
  ocf:heartbeat:Filesystem \
  params \
    device="/dev/drbd/by-res/AxigenData" \
    directory="/var/clusterfs/data" \
    fstype="ext3"

Finally, we have to specify that the file system resource must run on the node where the master/slave resource has been promoted to Master, and that the mount action must take place only after the promotion:

crm configure colocation fs_on_drbd inf: fs_axigen_ha ms_drbd_axigen_ha:Master
crm configure order fs_after_drbd inf: ms_drbd_axigen_ha:promote fs_axigen_ha:start


IP resource

A floating IP address must be assigned to the active node in the cluster, to ensure transparency for the Axigen services. This can be achieved by defining a resource based on the ocf:heartbeat:IPaddr2 agent, as follows:

crm configure primitive ip_axigen_ha \
  ocf:heartbeat:IPaddr2 \
  params ip=10.9.9.90 cidr_netmask=32 \
  op monitor interval=30s


Axigen service resource

The last resource is the Axigen init script configured above, which should be added like in the example below:

crm configure primitive script_axigen_ha \
  lsb:axigen-ha \
  op monitor interval=30s


Resource ordering

Because the successful startup of the defined resources depends on their order, you have to add some ordering constraints, which will ensure the following order: File system -> IP Address -> Axigen init script.

crm configure order ip_after_fs inf: fs_axigen_ha ip_axigen_ha
crm configure order axigen_after_ip mandatory: ip_axigen_ha script_axigen_ha


Location preference

Besides being started in a particular order, the resources also need to run on the same machine. To achieve this, the Axigen init script resource is constrained to run on the same node as the IP address and file system resources:

crm configure colocation axigen_with_ip inf: script_axigen_ha ip_axigen_ha
crm configure colocation axigen_with_fs inf: script_axigen_ha fs_axigen_ha

You can also set up a preferred node for running the cluster resources, by specifying a location constraint. In our example, you can set n1.cl.axilab.local as the preferred node for running the Axigen init script resource (and its dependencies):

crm configure location prefer_n1 script_axigen_ha 50: n1.cl.axilab.local

Sometimes, after a node fails, it eventually comes back alive. To avoid the resources being transferred back to it (generating additional downtime), you can set up a general cluster resource stickiness with a higher score than the node preference defined above, as follows:

crm configure rsc_defaults resource-stickiness=100
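
The interplay between the location preference (50) and the resource stickiness (100) can be illustrated with a deliberately simplified model. This is only a sketch, not Pacemaker's actual placement algorithm: the placement function name is hypothetical, only these two scores are considered, and a tie is arbitrarily treated as a move to the preferred node.

```shell
#!/bin/sh
# Hypothetical, simplified model: print the node a resource would run on.
# $1 = node currently running the resource, $2 = preferred node,
# $3 = location preference score, $4 = resource-stickiness.
placement() {
    current="$1"; preferred="$2"; pref="$3"; sticky="$4"
    if [ "$current" = "$preferred" ] || [ "$sticky" -gt "$pref" ]; then
        # Staying put scores higher than moving to the preferred node.
        echo "$current"
    else
        echo "$preferred"
    fi
}

# After a failover to n2, with the scores used in this document:
placement n2.cl.axilab.local n1.cl.axilab.local 50 100
```

With a stickiness of 100 and a preference of 50, the resource stays on n2.cl.axilab.local after n1.cl.axilab.local comes back. With a stickiness of 0, the same model would move it back to n1.cl.axilab.local.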


Fencing

STONITH is an acronym for Shoot The Other Node In The Head and it protects your data from being corrupted by rogue nodes or concurrent access. With Pacemaker, STONITH is a node fencing daemon which also must be configured to achieve full data safety.

Using the example configuration from this document, we have defined a STONITH fencing device as follows:

crm configure primitive apc_fencing stonith:apcmaster \
  params ipaddr=10.9.9.99 login=fenceadm password=P@ssw0rd \
  op monitor interval="60s"
crm configure clone Fencing apc_fencing

Then re-enable the stonith-enabled cluster property that you switched off at the beginning of the cluster setup:

crm configure property stonith-enabled="true"


Moving resources

You can move a resource to a specific node by using the following command example:

crm resource move script_axigen_ha n2.cl.axilab.local

To let the cluster decide again where to run the resources, use:

crm resource unmove script_axigen_ha


Removing resources

If you need to remove a specific resource from the cluster configuration, stop it, delete it from the configuration and then clean up its leftover status, using the crm command line in the following order:

crm resource stop <resource-name>
crm configure delete <resource-name>
crm resource cleanup <resource-name>


v2, CRM configuration style

Current packages of heartbeat 2 ship an OCF script for DRBD that passes a deprecated command to the drbdadm application, which breaks the functionality of the script. Thus, the drbd OCF script must be modified to use the role command instead of the deprecated state one. Please issue the following command on both nodes:

sed -i-orig 's/ state / role /g' /usr/lib/ocf/resource.d/heartbeat/drbd

sed keeps a backup copy of the original OCF script under the name drbd-orig.

Note: In case of an update of the heartbeat package, the above command must be issued again, because the package manager will overwrite the script file.

In order to configure the heartbeat cluster, modify or create the /etc/ha.d/ha.cf file to contain the heartbeat communication timeouts, the two nodes and the broadcast network interface, following the guidelines from the online documentation. Please note that this version is no longer supported, thus the documentation may be incomplete or unmaintained. The following resources can be useful:

Below is the heartbeat configuration for the example we have described in this document:

  • /etc/ha.d/ha.cf
keepalive 2
deadtime 30
warntime 10
initdead 30
crm yes
node n1.cl.axilab.local n2.cl.axilab.local
bcast eth0
  • /etc/ha.d/authkeys
auth 1
1 sha1 Ax1G3N^MAIL#Server

Warning: At this point, make sure the drbd system service is stopped on both nodes and is not set to be started at system boot time. The entire DRBD interaction is being handled by the heartbeat service itself.

Start the heartbeat service with:

service heartbeat start

The CRM-enabled cluster can now be configured using the following commands:

  • Setup stickiness
crm_attribute --type crm_config --attr-name default-resource-failure-stickiness --attr-value INFINITY
crm_attribute --type crm_config --attr-name default-resource-stickiness --attr-value INFINITY
  • Put both nodes in stand-by:
crm_standby -U n1.cl.axilab.local -v on
crm_standby -U n2.cl.axilab.local -v on
  • Create a temporary file, /tmp/resources.xml, with the following contents:
<resources>
  <master_slave id="ms_drbd_axigen_ha">
    <meta_attributes>
      <attributes>
        <nvpair name="notify" value="yes"/>
        <nvpair name="globally_unique" value="true"/>
      </attributes>
    </meta_attributes>
    <primitive class="ocf" provider="heartbeat" type="drbd" id="drbd_axigen_ha_AxigenData">
      <instance_attributes>
        <attributes>
          <nvpair name="drbd_resource" value="AxigenData"/>
          <nvpair name="clone_overrides_hostname" value="no"/>
          <nvpair name="target_role" value="Started"/>
        </attributes>
      </instance_attributes>
      <operations>
        <op name="monitor" interval="25s" timeout="10s" role="Started"/>
        <op name="monitor" interval="30s" timeout="10s" role="Slave"/>
        <op name="monitor" interval="35s" timeout="10s" role="Master"/>
      </operations>
    </primitive>
  </master_slave>
  <group id="rg_axigen_ha">
    <primitive class="ocf" type="Filesystem" provider="heartbeat" id="fs_axigen_ha">
      <instance_attributes>
        <attributes>
          <nvpair name="device" value="/dev/drbd0"/>
          <nvpair name="directory" value="/var/clusterfs/data"/>
          <nvpair name="type" value="ext3"/>
        </attributes>
      </instance_attributes>
      <operations>
        <op name="monitor" interval="30s" timeout="10s" role="Started"/>
      </operations>
    </primitive>
    <primitive class="ocf" type="IPaddr2" provider="heartbeat" id="ip_axigen_ha">
      <instance_attributes>
        <attributes>
          <nvpair name="ip" value="10.9.9.90"/>
          <nvpair name="nic" value="eth0"/>
          <nvpair name="cidr_netmask" value="24"/>
        </attributes>
      </instance_attributes>
      <operations>
        <op name="monitor" interval="30s" timeout="10s" role="Started"/>
      </operations>
    </primitive>
    <primitive class="lsb" type="axigen-ha" provider="heartbeat" id="script_axigen_ha">
      <operations>
        <op name="monitor" interval="30s" timeout="10s" role="Started"/>
      </operations>
    </primitive>
  </group>
</resources>
  • Create a /tmp/constraints.xml file, with the following contents:
<constraints>
  <rsc_order from="rg_axigen_ha" action="start" to="ms_drbd_axigen_ha" to_action="promote" type="after"/>
  <rsc_colocation to="ms_drbd_axigen_ha" to_role="master" from="rg_axigen_ha" score="INFINITY"/>
</constraints>
  • Update the cluster resources and constraints:
cibadmin -o resources -R -x /tmp/resources.xml
cibadmin -o constraints -R -x /tmp/constraints.xml
  • Bring both nodes up from stand-by:
crm_standby -U n1.cl.axilab.local -v off
crm_standby -U n2.cl.axilab.local -v off


v1, Legacy configuration style

In order to have heartbeat use the legacy configuration style, you have to specify crm no in the /etc/ha.d/ha.cf file, following the guidelines from the online documentation. Please note that this version is deprecated and no longer supported, thus the documentation may be incomplete or unmaintained. The following resources can be useful:

Below is the heartbeat configuration for the example we have described in this document.

  • /etc/ha.d/ha.cf
keepalive 2
deadtime 30
warntime 10
initdead 30
bcast   eth0
node n1.cl.axilab.local
node n2.cl.axilab.local
crm no
auto_failback on
respawn hacluster /usr/lib/heartbeat/ipfail
ping 10.9.9.1 # this is an IP address visible to both nodes, i.e. router
  • /etc/ha.d/haresources
n1.cl.axilab.local \
      drbddisk::AxigenData \
      Filesystem::/dev/drbd0::/var/clusterfs/data \
      IPaddr2::10.9.9.90/24/eth0 \
      axigen-ha
  • /etc/ha.d/authkeys
auth 1
1 sha1 Ax1G3N^MAIL#Server

After setting up identical configuration files on both nodes, start the drbd and heartbeat system services on both nodes:

service drbd start
service heartbeat start

Note: Make sure you also set the drbd and heartbeat services to start at boot time.