Live Chat

Axigen Highly Available & Scalable Solution Architecture

From Axigen Wiki

Jump to: navigation, search

Introduction

This document describes the design and features of a standard three tier large-scale messaging solution relying on the Axigen Mail Server software and the Cluster Management software. The base architecture has been tailored having in mind the need for high performance, scalability and the fault-tolerance of the systems and services part of the resulting environment. The global architecture for the solution is described along with process and information flow explanations.


Solution overview

Definitions, terms, and abbreviations

The following definitions, terms or abbreviations are used in this document, with a brief description available:

  • Vertical scalability - potential increase in processing capacity of a machine attainable by hardware upgrades
  • Horizontal scalability - potential increase in processing capacity of a cluster attainable by increasing the number of nodes (machines)
  • Stateful services - services that provide access to persistent information (e.g. account configuration and mailbox) over multiple sessions. Typically refers to a service in the back-end tier. (e.g.. IMAP services for an account)
  • Stateless services - services that do not store persistent information over multiple sessions. Typically refers to the services in the front-end tier (e.g IMAP Proxy)
  • Front-end tier - Subnet, medium security level, provides proxy and routing services
  • Back-end tier - Subnet, high security level, provides e-mail data storage and directory services
  • Front-end node - Machine residing in the front-end network tier
  • Back-end node - Machine residing in the back-end network tier, participating in the high availability cluster
  • Service instance - A running service and storage for a specific set of accounts
  • Cluster node - A machine, capable of running one (typically) or more service instances.


Scalability

Non-distributed email solutions, where account information (configuration and messages) is stored on a single machine allow vertical scalability through hardware upgrades (CPU, RAM, additional hard disks). However, due to limitations in a typical machine (e.g. max 2 CPU, max 4 GB RAM etc) an upper limit is eventually reached where one can no longer upgrade one machine - we shall refer to this as vertical scalability limit.

When the vertical scalability limit is reached, the only solution available is to distribute account information (configuration and mailbox) on more than one machine - we shall refer to this as horizontal scalability. Since information for one account is atomic and cannot be spread across more machines, the solution is to distribute accounts on more than one machine. This way, for a single account, there will be one machine responding to requests (i.e. IMAP, SMTP etc.) for that specific account. Thus, when the overall capacity (in terms of active accounts and traffic) of the messaging solution is reached, adding one more machine to the solution and making sure new accounts are created, provides a capacity upgrade, therefore allowing virtually unlimited horizontal scalability.

The solution architecture described here is oriented towards this horizontal scalability concept and aims to provide practically unlimited expansion abilities of the messaging system. It should be noted, though, that there are both physical and logical (software related) limits that must be considered.

Since each account of the system is serviced by a specific node, a centralized location directory must be available to provide location services. In this case, an LDAP service system will store information about which node is able to service requests for a specific account.


Stateless services

Stateless services include any proxy services and all other content independent services (such as web content services) that do not store or keep track of the information exchange between clients and themselves. These systems do not require storage redundancy and their hardware failures do not affect user data, settings or other information.

Since stateless services do not store information over multiple sessions, we can assume that two different machines are able to serve requests for the same account. This way, horizontal scalability can be achieved by simply adding more machines providing the same service in the exact same configuration.

The only remaining requirement is to ensure that requests to a specific service are distributed evenly throughout the machines providing that specific service (i.e. if the front-end tier contains two machines providing IMAP proxy services, half of the incoming IMAP connections must reach one of the machines and the other half of the connections must reach the other machine). This distribution functionality is provided by a load balancer.


Fault tolerance and stateful services

For stateful services, all requests to an account are sent to one specific back-end node (mandatory), where the back-end service resides at that moment in time. If that back-end node experiences a fault and can no longer respond to requests, none of the other back-end nodes are able to serve requests for the respective account. A mechanism is required to ensure that, in the event of a catastrophic failure on one back-end node, some other node must takeover the task of serving requests for that account, thus providing high-availability.

The Cluster Management software provides this exact functionality. It ensures that if one back-end node (running a stateful service) fails, another cluster node will automatically detect the fault and start the required service, in-place of the failed node, providing minimal downtime of that service. This detection process is achieved through the heartbeat mechanism deployed along with the clustering software.

In the case of stateless services, any of the nodes providing the same service is able to respond to requests for any account. Accordingly, the only requirement is to make sure the request distribution device (load balancer) can detect when one of the front-end nodes no longer responds to requests, in order to cease the distribution of service requests to that specific front-end node. The total request processing capacity is decreased (the entire clustering system will respond slower, since one node can no longer process requests), but all service requests can still be processed.


Multi-tier structure

The solution uses three tiers to provide the required functionality.

The load balancer tier provides services for network layer 4 (transport), TCP connections, and is completely unaware of account information; it only provides distribution of connections to the nodes in the front-end tier.

The front-end tier comprises of nodes running proxy and SMTP routing services. Its task is to ensure that messages and connections are routed to the appropriate node (depending on the account for which a request is being performed) in the back-end tier.

Finally, the back-end tier provides access to persistent data (such as account configuration and mailbox data); each node in the back-end tier is capable of responding to requests for a set of accounts. No node in the back-end tier is capable of serving requests for an account that is hosted on a different node. The simplest way to identify the systems that have to run in the back-end is to check if they require storage access. All systems that require storage access are part of the back-end, while the proxy or front-end nodes will not require access to the storage resource.


High availability

Depending on the service type (stateful or stateless), high availability is achieved differently.

In the case of stateless services (services in the front-end tier), the load distribution system also provides high availability capabilities transparently. The load balancer automatically distributes requests (at TCP level – network layer 4) to all the nodes in the front-end tier, based on the configured algorithm. If one node in the front-end tier fails, the load balancer will automatically detect the failed node and will no longer distribute connections to it. A major advantage over the stateful services high availability mechanism is that, due to its active/active nature, a node failure causes no service downtime. Once a front-end node recovers, it will automatically be re-included in the load balancing mechanism and will start processing requests immediately.

High-availability in the back-end tier is achieved by making sure that in the event of failure of a physical node, the service instance is automatically started on a different cluster node. Currently, there are a number of software packages providing this functionality. Please see the Cluster management software section for a list of supported cluster management software.

Also, the high availability can be implemented for all the other devices and connections from the solution, like redundant network adapters, an additional load balancer. However, these are out of this document scope.


Mailbox node failure recovery

In the event of a failure one of the back-end nodes will be unable for any reason (software or hardware) to serve incoming requests. Heartbeat, the mechanism deployed to identify this failure, will report the issue to all of the other nodes that are part of the cluster and trigger the service migration from the failed node to the hot-standby node. The service migration can last between 10 and 60 seconds, resulting in less than a minute between service recovery. The failure of one such node will affect only a limited section of the entire pool of accounts. The quantity of user accounts affected is equal to those that were stored on the failed back-end node and connected to a Axigen service at the same time.

The hot stand-by node can take over as many failed nodes as it is configured to replace. However, the resources available to this node and how many services it can run with minimal impact on the overall system performance should be taken into consideration. The best practice, recommended in this situation, is to run no more than two services on any hot stand-by node.

In addition, the solution can be designed to have as many hot stand-by nodes as required in order to have zero performance impact in case multiple back-end tier nodes failed. However, most situations require only one hot stand-by node.


No single point of failure

The high availability used at service level provides a full no-single-point-of-failure. However, any faulty hardware component in a node causes that node to be unusable thus diminishing the total processing power (hence the total transaction capacity) the solution provides. There is a mechanism which makes sure that, even in the case of an I/O controller failure, the node can continue to provide the service; this relies on having duplicate I/O controllers on each node and a software method of failover (rerouting I/O traffic from the faulty controller to the healthy one).

The I/O high availability can be used for disk I/O and network I/O fault tolerance, provided that duplicate controllers are available on the nodes. This reduces the occurrence of service downtime in the case of stateful services; if an I/O controller fails, the service would need to be restarted on a different node.

Account provisioning

When a new account is being created, one Axigen back-end node must be selected to hold its related information. The provisioning interface must be implemented to select the back-end node based on one of the following algorithms:

  1. Random: Each new account is created on one of the back-end node, picked randomly; as an enhancement, a weighted-random distribution algorithm may be used to allow creating more accounts on some of the back-end cluster nodes than on others.
  2. Least used: The provisioning interface must be aware of the number of accounts that exist on each back-end node, so that, each time a new account is created, the back-end cluster node that has the least number of accounts is used.
  3. Domain based: Each domain is placed on one of the back-end cluster node; the provisioning interface must have configured a domain/back-end service instance table in order to be able to select a specific back-end node when creating a new account. Also, each domain will have a "home" back-end node assigned.

The first and second distribution algorithms have the advantage of a better spread of the accounts to the back-end service instances. The disadvantage resides in the fact that each domain must be created on all the back-end service instances and that domain-wide settings for each domain must be kept in sync on all the other back-end systems.

The third distribution algorithm simplifies the management of the accounts (the domain is only created on the specific back-end node that will host that domain, changes to domain configuration are performed only on the domain-wise home back-end service instance). Moreover, routing can be performed with one home back-end entry per domain instead of one entry per account.

Documentation-note.png Note: Groupware, sharing, address book and other common resources cannot be accessed between users from the same domain, if the domain is split across multiple back-end nodes.

For compatibility with other external systems, account synchronization, and also to complete the high availability and clustering functionalities, an LDAP service is required.

Axigen can authenticate out of the box against any external standard LDAP service. More than this, the current version can also synchronize its account base with specific LDAP services, namely Microsoft Active Directory and OpenLDAP. The synchronization can be made either Axigen to LDAP which means that any account created in Axigen will also be pushed and synchronized into the LDAP service, LDAP to Axigen, which means that any account created in the LDAP service (for example by a provisioning interface) will be created in Axigen internal database, or both ways synchronization, which means that the synchronization between the LDAP service and Axigen will be made in both ways: any account created in either of LDAP or Axigen will also be synchronized in the other database. In Axigen, the LDAP connectors can be configured for each domain, or generic LDAP connectors can be configured use place holders, expanded by Axigen into specific data (i.e. %d is replaced with the domain name).

In a clustered environment where multiple back-ends share accounts from the same domain, besides the user authentication and synchronization, SMTP routing is also possible using LDAP queries.


Solution architecture diagram

In the schematic diagram we have included a representation of the adjacent nodes such as the hot-standby node required for high availability purposes and the LDAP server, required for connection routing purposes.

Axigen Architecture RHEL Standard.png
Documentation-note.png Note: The above diagram is only a graphical representation of the overall solution architecture and may not reflect the actual number of machines.


Hardware requirements

Servers

The hardware recommendation for all of the solution nodes should match the performance of the following system configuration. More powerful hardware configurations can also be used and the hardware configuration should be accepted as a baseline example:

  • CPU: Intel Xeon Quad-Core (>2 GHz, 1333 MHz FSB)
  • RAM: 8 GB Fully Buffered
  • Array Controller (RAID 0/1/1+0/5)
  • 2x 72GB 15,000 RPM SAS Hot Plug Hard Drive
  • 2x Gigabit Network Adapters
Documentation-note.png Note: As an alternative to the configuration example above, any compatible hardware parts and systems from any different vendor may be used.
Documentation-note.png Note: Only heavy duty and reliable equipment (server framework / rackable) must be deployed within the clustering environment.


Load balancer

Any layer 3-7 compatible hardware load balancer can be used to provide request connection balancing. The following device types can be used as the baseline, in terms of feature availability and performance. Compatible device counterparts can be used instead, if a different vendor is preferred:

  • Nortel Alteon Application Switch series (3408E, 2424E, 2424-SSL E, 2216E, 2208E)
  • Cisco CSS 11500 Content Services Switch series (11506, 11503, 11501)
  • Sun N2000 and N1000 Application Switch series (N1400, N2040, N2120)
Documentation-note.png Note: Only one load balancer is required to have a functional solution, but two have to be used to achieve full redundancy for the entire messaging system.


Cluster storage

The clustered configuration requires all active/passive paired nodes in the cluster to access the same storage system. The storage device that can be used is either a Fiber Optics Storage (SAN / Storage Area Network) or SCSI attached storage.

The baseline hardware recommendation for the cluster storage should match the performance of the following configuration. More powerful hardware configurations can also be used instead, as well as compatible hardware from any vendors:

  • Storage HDD support: Fiber Channel drives
  • Drive speed: 15,000 RPM
  • RAID support: 1/5/6 (6 is recommended)
  • Host connections: Minimum 4 Gb Fiber Channel / iSCSI
  • Dual controller support
  • Hot-plug support

In addition to the above specifications, the maximum HDD count supported and the maximum raw capacity of the storage needs to be scaled appropriately. The storage raw capacity must accommodate the size of the existing mailboxes, if a migration is planned. In addition to the expected space requirements, at least a 25% slack (free) space has to be available to accommodate the binary format overhead for the data stored. For example, if the initial storage requirement estimation is 1 TB, the storage must be scaled to accommodate 1.25 TB of data to meet the requirements.

Documentation-note.png Note: Each of the back-end nodes connected to the shared storage device must have at least one (2 recommended for 100% redundancy) Host Based Adapter (HBA) installed. Any 4Gb fiber channel HBA type can be used for this purpose.


Fence devices

Fence devices allow a failed node to be isolated from the storage so that, at no time, two nodes may write on the same partition on the shared storage. There are two types of fence devices:

  • Remote power switches (allow the cluster software to remotely power down/reboot a node that has failed);
  • I/O barriers (allow the cluster software to block access to the shared storage for a node that has failed)
Documentation-note.png Note: I/O barriers (fiber channel fabric switches) are preferred instead of the power switches as the latter type of device will hard reset the systems to restore functionality. This behavior can cause issues related to the debugging process later on, as the state of the system is not preserved.

These components are required for the back-end tier of the solution. At least one fence device port is required for each node pair (active/passive) in the back-end tier. For example, while using fence devices with 4 ports in a solution which has 8 nodes in the back-end tier, at least 2 fence devices are required.

The following requirements must be met by the fiber channel fabric switches to fit the proposed architecture design:

  • Host connections: Minimum 4 Gb Fiber Channel
  • Minimum number of ports: 1 per back-end node (2 to achieve high availability)


Email filtering

E-mail filtering requires additional software or hardware resources added to the standard solution, and is generally performed on the Axigen nodes from the front-end tier.


Anti-virus / Anti-spam

Axigen supports a wide range of anti-virus and anti-spam products. Additionally, the SMTP service supports connecting via the standard Milter interface, thus supporting most of the world class anti-virus and anti-spam vendors, both commercial and free. The full list of supported filtering products is accessible at http://www.axigen.com/mail-server/antivirus-antispam/.

Starting with version 7.5.0, Axigen includes a higher level integration with Kaspersky Anti-Virus and Anti-Spam components, allowing the administrator to configure the filtering options via the WebAdmin or CLI interfaces.


Archiving

Axigen provides archiving capabilities through integration with third-party dedicated solutions (software or hardware).

MailArchiva is an e-mail archiving software developed by Stimulus Software, compatible with Axigen via the SMTP milter interface, available in two editions:

  • Open Source Edition - it is free and limited edition, with no commercial support included, only community support available (online resources, like forums and knowledge base)
  • Enterprise Edition - it contains more advanced features than the Open Source Edition and also includes commercial support

Archiving is also possible through dedicated hardware appliances. An appliance provider for this purpose is Arcmail.


Cluster management software

Axigen has been tested and is officially supported in a clustered three-tier solution with the following Cluster Management software:

  • Red Hat Cluster Suite
  • Windows Server Failover Cluster
  • Veritas/Symantec Cluster Server


Red Hat Cluster Suite

Red Hat Enterprise Linux is licensed on a per-host model, so a license is required for each node in the cluster (for both the back-end and the front-end). Each node in the back-end tier requires the High Availability Add-On for Red Hat Enterprise Linux license.

CentOS is a free Linux distribution alternative with community driven support, built from Red Hat Enterprise Linux published sources. While it only offers community support, it also provides all the official Red Hat products, rebranded as CentOS, but using the same code base. This includes the Cluster Suite, needed for the back-end tier nodes, which is re-compiled from the very same sources as the Red Hat Cluster Suite.

Specific documentation is present for Axigen deployment in the load balancer and front-end tiers, as well as for the back-end tier.

Personal tools
Namespaces
Variants
Actions