Fault domains were introduced to protect the cluster from failures at the level of server racks or disk enclosures, which are logically grouped into these domains. When this mechanism is enabled, data is distributed for fault tolerance not across individual nodes but across domains, so the cluster can survive the failure of an entire domain, that is, of all the nodes grouped in it (for example, a server rack), because object replicas are always placed on nodes from different fault domains.
The smallest fault domain is a disk group: a logically bound set of disks. Each disk group contains two types of media, cache and capacity. Only solid-state drives can be used as cache devices, while both magnetic and solid-state drives can serve as capacity devices. The cache devices accelerate the magnetic disks and reduce latency when accessing data.
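To make the media rules concrete, here is a minimal Python sketch that validates a disk group layout (the Disk class and validate_disk_group function are illustrative only, not part of any vSAN API):

```python
from dataclasses import dataclass

@dataclass
class Disk:
    device: str
    ssd: bool   # True for a solid-state drive, False for a magnetic disk

def validate_disk_group(cache: Disk, capacity: list[Disk]) -> None:
    """Check the media rules described above: the cache device must be an SSD,
    capacity devices may be magnetic or solid-state."""
    if not cache.ssd:
        raise ValueError(f"{cache.device}: cache device must be a solid-state drive")
    if not capacity:
        raise ValueError("a disk group needs at least one capacity device")

# A hybrid disk group: one SSD for cache, two magnetic disks for capacity.
validate_disk_group(Disk("nvme0n1", ssd=True),
                    [Disk("sda", ssd=False), Disk("sdb", ssd=False)])
```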
How many fault domains are recommended in a vSAN cluster?
The number of fault domains is calculated using the formula: the number of fault domains = 2 * number of failures to tolerate + 1.
The minimum vSAN requires is 2 fault domains, each with one or more hosts, although at least four are advised, because only then can data be rebuilt after a failure (2-3 domains leave nowhere to rebuild into). The required number of hosts is determined in the same way as the number of fault domains, starting from the number of failures to tolerate.
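The formula above is easy to turn into a quick sanity check; the sketch below assumes RAID-1 mirroring and uses an illustrative function name:

```python
def min_fault_domains_mirroring(failures_to_tolerate: int) -> int:
    """Minimum fault domains for mirroring (RAID-1): 2 * FTT + 1."""
    if failures_to_tolerate < 1:
        raise ValueError("failures_to_tolerate must be at least 1")
    return 2 * failures_to_tolerate + 1

# FTT=1 -> 3 domains (two replicas plus a witness), FTT=2 -> 5, FTT=3 -> 7.
for ftt in (1, 2, 3):
    print(ftt, "failure(s) to tolerate:", min_fault_domains_mirroring(ftt), "fault domains")
```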
Ideally, each fault domain should contain the same number of hosts, the hosts should have identical configurations, and it is recommended to keep the equivalent of one domain's capacity free for rebuilds (for example, four domains with one failure to tolerate).
The fault domain mechanism works not only for Mirroring (RAID-1) but also for Erasure Coding. In that case, each component of an object must be placed in a different fault domain, and the minimum number of fault domains changes: at least 4 domains for RAID-5 and 6 domains for RAID-6 (the same as the host counts required for Erasure Coding).
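For erasure coding, the minimums quoted above can be expressed the same way (a sketch; the lookup table is illustrative and only covers the schemes mentioned here):

```python
# Minimum fault domains per erasure-coding scheme, as described above:
# RAID-5 (FTT=1) places 3 data + 1 parity components, RAID-6 (FTT=2) places 4 data + 2 parity.
EC_MIN_FAULT_DOMAINS = {"RAID-5": 4, "RAID-6": 6}

def min_fault_domains_erasure_coding(raid_level: str) -> int:
    try:
        return EC_MIN_FAULT_DOMAINS[raid_level]
    except KeyError:
        raise ValueError(f"unsupported erasure-coding scheme: {raid_level}")

print(min_fault_domains_erasure_coding("RAID-5"))   # 4
```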
Design and size of fault-tolerant vSAN structures
To cope with host failures, the PFTT (Primary level of failures to tolerate) attribute must be configured in the virtual machine storage policies. The more failures you plan to tolerate, the more hosts and capacity the cluster requires.
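As a rough illustration of how the tolerated failure count drives capacity, the sketch below uses the commonly cited vSAN overheads: mirroring stores FTT + 1 full copies, RAID-5 adds roughly 1.33x, RAID-6 roughly 1.5x. The function is illustrative, and real sizing should also account for slack space and metadata:

```python
def raw_capacity_needed(usable_gb: float, scheme: str, ftt: int = 1) -> float:
    """Rough raw capacity required to store `usable_gb` of VM data."""
    if scheme == "RAID-1":      # mirroring: FTT + 1 full copies of the data
        factor = ftt + 1
    elif scheme == "RAID-5":    # 3 data + 1 parity components -> ~1.33x
        factor = 4 / 3
    elif scheme == "RAID-6":    # 4 data + 2 parity components -> 1.5x
        factor = 1.5
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return usable_gb * factor

# 1 TB of VM data: ~2 TB raw with RAID-1 FTT=1, ~1.33 TB with RAID-5, ~1.5 TB with RAID-6.
print(raw_capacity_needed(1024, "RAID-1", ftt=1))   # 2048.0
```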
When cluster hosts are installed in server racks, you can organize them into fault domains to increase fault tolerance, in particular to withstand top-of-rack switch failures and the loss of rack power. Fault domains achieve this by distributing redundant components across servers in separate racks. Each fault domain includes at least one host and must meet the hardware requirements.
The best practice in this case is to use at least four fault domains: with only three, some data evacuation modes may not be supported, and vSAN cannot be guaranteed to re-protect data after a failure.
When you enable fault domains, vSAN applies the active VM storage policy to them, not to individual hosts. Note that if a host is not part of a fault domain, vSAN interprets it as a standalone fault domain. When increasing capacity and adding hosts, you can use an existing fault domain configuration or define a separate one.
It is important to correctly balance storage in terms of fault tolerance. To achieve this, consider the following guidelines (a small validation sketch follows the list):
- Provide a sufficient number of fault domains to satisfy the calculated PFTT (minimum 3, ideally 4 or more);
- Assign the same number of hosts to each fault domain;
- Use hosts with the same configurations;
- Allocate one fault domain of free capacity for data recovery after a failure.
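Assuming you can describe the planned layout as a simple mapping of fault domains to hosts, the first, second, and fourth guidelines could be checked with a few lines of Python (a hypothetical sketch, not a vSAN or PowerCLI API):

```python
from collections import Counter

def check_fault_domain_layout(domains: dict[str, list[str]], pftt: int = 1) -> list[str]:
    """Return warnings for a planned layout, e.g. {"rack-a": ["esx01", "esx02"], ...}."""
    warnings = []
    needed = 2 * pftt + 1                        # minimum domains for mirroring
    if len(domains) < needed:
        warnings.append(f"only {len(domains)} fault domains, PFTT={pftt} needs {needed}")
    elif len(domains) < needed + 1:
        warnings.append("no spare fault domain left free for rebuilds")
    host_counts = Counter(len(hosts) for hosts in domains.values())
    if len(host_counts) > 1:
        warnings.append("fault domains hold unequal numbers of hosts")
    return warnings

print(check_fault_domain_layout({
    "rack-a": ["esx01", "esx02"],
    "rack-b": ["esx03", "esx04"],
    "rack-c": ["esx05"],
}))   # flags the missing spare domain and the unbalanced rack-c
```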
Clusters with mixed host configurations are less predictable: performance drops, among other reasons, because of differences in cache device types, and maintenance procedures differ between hosts. A three-host cluster can handle only a single failure; in that case, each of the two required replicas of the virtual machine data is placed on a different host.
In a three-host cluster, if one host enters maintenance, VMware vSAN cannot evacuate its data to the remaining hosts, and any additional failure in this state is catastrophic. In this situation it is recommended to always use the “Ensure accessibility” evacuation option, which keeps the objects available throughout the data migration.
Two- and three-host configurations typically cannot satisfy the standard failure-tolerance policies. However, vSAN offers specific options, such as the 2-node cluster with a witness host, that support these configurations.
Fault domains for 2-node clusters
A typical vSAN cluster consists of at least three hosts, each contributing to the total capacity. For a 2-node cluster, VMware vSAN requires an external Witness host. A vSAN 2-node cluster with a Witness host refers to a deployment in which a user sets up a two-host vSAN cluster at a single site. The two vSAN nodes connect to a switch or, in some versions, directly to each other.
The vSAN Witness node, which provides quorum for the two nodes, can be placed at a third site reachable over low-bandwidth/high-latency links, or on alternate infrastructure at the same site. Each node is configured as a vSAN fault domain. The supported configuration is 1 + 1 + 1 (two nodes plus the vSAN Witness host).
Prior to vSAN 7 Update 1, a dedicated Witness was needed for every 2-node configuration. Starting with vSAN 7 Update 1, one or more 2-node setups can share a single Witness: one Witness Appliance can serve as many as 64 two-node clusters. This significantly simplifies design, management, and operation.
When VMs are deployed on a 2-node vSAN cluster with two fault domains, their data is typically mirrored for protection, with one replica of the data on host 1 and a second replica on host 2. A witness component is placed on the vSAN Witness host or vSAN Witness Appliance.
In the event of a host or device failure, a complete replica of the VM data is still accessible on the surviving host. The VM remains accessible on the vSAN datastore because the witness component and the remaining replica are still reachable and maintain quorum.
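The availability rule at work here can be modeled as a simple majority check over an object's components (a simplified sketch: real vSAN assigns votes to components, but the idea is the same; the names below are illustrative):

```python
def object_accessible(components: dict[str, bool]) -> bool:
    """An object stays accessible while a strict majority of its components are reachable."""
    reachable = sum(components.values())
    return reachable > len(components) / 2

# Default 2-node layout: one replica per data node plus a witness component.
after_host2_failure = {"replica-host1": True, "replica-host2": False, "witness": True}
print(object_accessible(after_host2_failure))   # True: surviving replica + witness keep quorum
```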
Disclaimer: This is a sponsored article.