VDI Planning Guide: Highly Available Desktops
A key advantage of virtualization is the ability to decouple an operating system from its hardware layer. This improves flexibility and maximizes uptime. Virtual server infrastructure commonly utilizes this flexibility to provide high availability, moving virtual servers between hosts. When planning and deploying a virtual desktop infrastructure (VDI) solution, the question you must ask is: "How do we design the desktops to be highly available?"
First, you must determine whether you need your desktops to be highly available. This depends on several factors: whether desktops are assigned or pooled, whether they are persistent or non-persistent, and how you handle profile management and user data management. For example, non-persistent desktops pooled for multiple users have different requirements than dedicated persistent desktops.
To determine the need for high availability and the associated costs (both capital and operating expenses), you must first understand the components of high availability as well as the various levels of high availability.
Components of High Availability
The primary component of high availability for virtual desktops is the underlying virtualization infrastructure. This will most likely be VMware vSphere, Citrix XenServer or Microsoft Hyper-V. All of these platforms offer high availability features such as live migration -- the ability to move a running virtual machine from one physical host to another. This is a major benefit, because it allows the virtual desktops to be moved among physical hosts based on capacity, performance and maintenance windows.
Another component is your storage infrastructure. To migrate machines from one host to another, the virtual machines must be stored on shared storage (SAN or NAS). If you want to implement tools such as live migration, each host in the cluster must be able to read from and write to the same shared storage repositories. It is also ideal to separate user data and user profiles from the virtual desktops. This user information should live on the network, which makes it available outside of the virtual desktops and increases the flexibility of user data access.
The underlying VDI management components, such as Citrix XenDesktop controllers or VMware View brokers and their supporting databases, should be configured for high availability and fault tolerance to ensure that the virtual desktops can be managed and that users can connect to them at all times.
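The shared-storage requirement can be checked with simple set logic. The following is a minimal sketch, not a vendor tool: the host names, datastore names and both function names are hypothetical, and in practice the inventory would come from your hypervisor's management interface.

```python
from typing import Dict, List, Set


def datastores_shared_by_all(host_datastores: Dict[str, Set[str]]) -> Set[str]:
    """Return the datastores that every host in the cluster can see."""
    if not host_datastores:
        return set()
    return set.intersection(*(set(ds) for ds in host_datastores.values()))


def hosts_missing(host_datastores: Dict[str, Set[str]], datastore: str) -> List[str]:
    """Return the hosts that cannot see the given datastore."""
    return sorted(h for h, ds in host_datastores.items() if datastore not in ds)


# Hypothetical inventory: host-03 has not been presented the desktop datastore.
cluster = {
    "host-01": {"vdi-desktops", "vdi-profiles"},
    "host-02": {"vdi-desktops", "vdi-profiles"},
    "host-03": {"vdi-profiles"},
}

print("Shared by all hosts:", sorted(datastores_shared_by_all(cluster)))
print("Hosts missing vdi-desktops:", hosts_missing(cluster, "vdi-desktops"))
```

In this made-up inventory, desktops stored on vdi-desktops could not be migrated to or restarted on host-03 until that datastore is presented to it.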
Levels of High Availability
When planning for high availability, start with a fundamental understanding of how critical your system is. In other words, what kind of risks are you willing to take? How long can your network be down? These questions are of paramount importance, since planning for 99.999 percent availability is significantly different (more complex and more expensive) than planning for 95 percent availability.
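To make the gap between those two targets concrete, an availability percentage can be converted into a yearly downtime budget. This is a rough sketch that assumes a 365-day year and round-the-clock service; the function name is illustrative only.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; ignores leap years


def allowed_downtime_hours(availability_percent: float) -> float:
    """Return the yearly downtime budget, in hours, for an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


for target in (95.0, 99.0, 99.9, 99.999):
    hours = allowed_downtime_hours(target)
    print(f"{target}% -> {hours:.2f} hours/year ({hours * 60:.1f} minutes)")
```

Under these assumptions, 95 percent availability tolerates roughly 438 hours (more than 18 days) of downtime per year, while 99.999 percent leaves a little over five minutes.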
When designing virtualization projects, I tend to group a company's need for high availability into three buckets: none, restart only and always on.
None: This implies no high availability of the virtual desktop. This may include a lack of redundancy on the controller level, a lack of shared storage, or possibly just no level of guarantee. I generally don't recommend this level except for testing units that are not critical to daily functions and can suffer one or more days of downtime for a total rebuild -- if required.
Restart Only: This level allows for limited downtime but assumes recovery within a minimal window (one hour or less) and typically requires a system reboot for maintenance. This is the most common type of high availability for desktops (and commonly for servers as well). The restart-only level allows for failure of a component such as a host or storage array. Failures generally manifest as the equivalent of a "blue screen" for the virtual desktop. Consider what happens when one of your physical desktops crashes -- do the users generate a help desk ticket or do they simply restart their system?
Always On: This level allows for the greatest level of high availability and fault tolerance. It is designed to support the most mission-critical systems. If any component should fail, there is a redundant partner or real-time recovery in place to prevent outages. I see this scenario rarely used for virtual desktops but commonly used for critical system servers.
Since restart only is the most common scenario, I'll delve a bit deeper into that design. The easiest and most cost-effective method is using pooled, non-persistent desktops combined with user profiles and data management. That allows the virtual desktop to be truly volatile.
In a pooled scenario, all desktops are based on the same single image. If you have 1,000 virtual desktops defined in a pool, a user can connect to any one of those 1,000 desktops with the same user experience. Combining this with user data management (such as folder redirection) as well as profile management (roaming profiles, Citrix Profile Management, AppSense Environment Manager, VMware Persona Management, etc.) makes the random assignment transparent to the end user.
In the event of a virtual machine crash, the user will lose any unsaved work, just as with a local workstation crash. To recover, the user does not need to wait for the virtual desktop to recover but should be able to launch a connection to a different desktop through the connection broker. All of the user's applications and data will be available, assuming there are available virtual desktops to be assigned.
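The recovery flow for a pooled desktop can be illustrated with a toy model. This is only a sketch of the idea, not how any particular connection broker is implemented; the DesktopPool class, its methods and the desktop names are all hypothetical.

```python
class DesktopPool:
    """Toy model of a pool of identical, non-persistent desktops."""

    def __init__(self, desktop_names):
        self.free = list(desktop_names)  # desktops waiting for a user
        self.assigned = {}               # user -> desktop currently in use

    def connect(self, user):
        """Return the user's current desktop, or assign any free one."""
        if user in self.assigned:
            return self.assigned[user]
        if not self.free:
            raise RuntimeError("no desktops available in the pool")
        desktop = self.free.pop(0)
        self.assigned[user] = desktop
        return desktop

    def desktop_crashed(self, desktop):
        """Take a crashed desktop out of service and drop its session."""
        if desktop in self.free:
            self.free.remove(desktop)
        for user, current in list(self.assigned.items()):
            if current == desktop:
                del self.assigned[user]


pool = DesktopPool(["vd-001", "vd-002", "vd-003"])
first = pool.connect("alice")
pool.desktop_crashed(first)      # unsaved work on this desktop is lost
second = pool.connect("alice")   # the user reconnects to another desktop
print(f"alice moved from {first} to {second}")
```

The RuntimeError branch corresponds to the caveat above: reconnection only works while the pool still has spare desktops, so the pool should be sized with some headroom.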
In a dedicated or persistent model, the user can still reconnect but will have to wait until the virtual desktop recovers and is available for a new connection. This is very similar to waiting for a local desktop to reboot.
Choosing the right level of highly available desktops depends on a number of factors, including any existing service level agreements with your users and your risk mitigation requirements. You may need to plan on multiple tiers of high availability based on the needs of the target user groups and the types of virtual desktops being deployed.