Closing the Server-Storage Virtualization Gap
Server virtualization technologies for Linux have advanced rapidly, with VMware and Citrix (Xen) initially leading the way and Red Hat now joining them with significant strategic investments.
Unfortunately, the storage side of the equation has lagged behind. Several trends, such as the explosion of unstructured data and the emergence of cloud computing, have put a spotlight on the gap and made many realize that it is holding the industry back from achieving a fully virtualized data center. Linux is proving to be a better hypervisor than even a microkernel-based VMware implementation, while having borrowed powerful ideas from microkernel design early in its development.
This article will discuss the current state of Linux virtualization and provide best practices, focused on storage, aimed at closing the server-storage virtualization gap.
In the early days of virtualization, the primary objective was to improve server utilization by consolidating relatively static services, such as DHCP and DNS, and development environments. This was achieved by creating a hypervisor -- a virtual software layer between the hardware and the operating system (OS).
It worked, and adoption took off. Hardware vendors got involved as well, with Intel and AMD introducing virtualization support built into the processor. This brought a second adoption surge, and IT managers moved to virtualize the entire data center and manage it from a centralized console: server virtualization.
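As a rough illustration of the processor support described above: on x86 Linux systems, the Intel and AMD virtualization extensions appear as the vmx and svm flags in /proc/cpuinfo. The sketch below checks for them. The function names are our own, chosen for illustration, and the check assumes an x86 machine; other architectures report their features differently.

```python
def hardware_virt_support(cpuinfo_text):
    """Return 'Intel VT-x', 'AMD-V', or None, based on x86 CPU flags."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags = set(value.split())
            if "vmx" in flags:      # Intel virtualization extensions
                return "Intel VT-x"
            if "svm" in flags:      # AMD virtualization extensions
                return "AMD-V"
    return None


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the CPU description file; return '' if it is unavailable."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return ""
```

Calling hardware_virt_support(read_cpuinfo()) on a Linux host returns None when neither flag is present, which usually means the feature is missing or disabled in the BIOS.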
The Linux Kernel as a Hypervisor
In terms of performance, hardware-assisted virtualization leveled the playing field for hypervisor implementations. VMware and Xen used their own microkernel-based hypervisor with Linux device-driver emulation. Red Hat KVM took a different approach, implementing a loadable Linux kernel module and a modified QEMU for device emulation.
This makes sense for a number of reasons. The hypervisor needs to support a wide range of devices, scale to many cores and huge memory, and manage all of these resources securely and efficiently -- proven capabilities of the kernel.
This approach is also supported by a thriving open source development community. In the near future, guest OS kernels will get thinner, since the underlying hypervisor will emulate standard hardware chipsets and handle complicated functions such as memory management, network I/O and OS security.
For networking, 10 gigabit Ethernet will offload TCP/IP and iSCSI to network cards. Storage virtualization will be handled separately by scale-out NAS and object storage systems.
Virtualization Evolves in the Cloud
Cloud computing is an architectural evolution of the data center tightly coupled with the need for virtualization, and the focus of innovation has shifted here. Now that hypervisors are mature, management tools, monitoring capabilities and standards are evolving. Resources in the cloud are more dynamic, multi-tenant and large-scale. Virtualization vendors are rapidly adapting to cloud requirements.
RHEV 2.2 was a bold new step by Red Hat; Citrix is talking about open sourcing XenServer; Rackspace released OpenStack as free software under the Apache license; Cloud.com (previously VMOps) and Eucalyptus also released their cloud stacks under the GNU GPLv3 license.
As standards emerge, there will be many options to choose from, and consolidation is inevitable.
Storage Virtualization Is Left Behind
Full data center virtualization and the cloud cannot be complete without virtualizing the storage layer. Storage was often considered an afterthought, and systems designed for transaction-oriented databases were a poor match for new demands.
The storage layer has to scale linearly in capacity and performance; throwing hardware at the problem is not a solution. Storage virtualization demands an entirely new software-based approach that leverages a scale-out architecture to deliver petabytes of capacity and GB/s of throughput.
Commodity Hardware Has Arrived
Commodity storage hardware is quickly approaching enterprise-class capabilities. Features once limited to SCSI disks are now available in low-cost, high-capacity SATA drives. RAID controllers support 6 Gbps SAS connectivity and automatic tiering. In networking, 10GbE is unifying storage and computing I/O (eliminating expensive and complicated Fibre Channel networking is a relief for IT organizations).
You can build a 500 terabyte super-storage configuration with 10 storage nodes, SATA drives, and 10GbE for a fraction of the price of proprietary offerings.
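To make that arithmetic concrete, here is a minimal sizing sketch. The drive count and drive size per node (25 drives of 2 TB) are assumptions picked to reach 500 TB across 10 nodes, not a recommended configuration, and the throughput figure is only the theoretical line rate of one 10GbE port per node, ignoring protocol and disk overhead.

```python
def cluster_estimate(nodes, drives_per_node, drive_tb, nic_gbps=10, replicas=1):
    """Estimate raw/usable capacity and the network throughput ceiling."""
    raw_tb = nodes * drives_per_node * drive_tb
    return {
        "raw_tb": raw_tb,
        # Replication stores every file `replicas` times.
        "usable_tb": raw_tb / replicas,
        # Line-rate ceiling: gigabits per second -> gigabytes per second.
        "network_gb_per_s": nodes * nic_gbps / 8,
    }


# Hypothetical configuration: 10 nodes, 25 x 2 TB SATA drives each.
estimate = cluster_estimate(nodes=10, drives_per_node=25, drive_tb=2)
print(estimate)
```

Both capacity and the network ceiling grow in proportion to the node count, which is the linear scaling a scale-out design aims for. Setting replicas=2 halves the usable capacity.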
Storage Is a Software Problem
Today's filesystems need to handle more than data blocks. A complete storage OS stack is needed that handles volume management, software RAID, network protocols and a host of other functions. Similar to compute virtualization, it makes sense to implement most of this functionality in userspace in a virtualized container.
The FUSE interface allows filesystems and virtual block devices to achieve this. Modern multicore processors are optimized to run multiple OSes in userspace concurrently, and the old arguments for monolithic kernels no longer apply. The Gluster filesystem takes this approach to implement a powerful storage virtualization layer on top of commodity hardware.
Linux Direct Attached Storage
The default Linux root filesystem, Ext3, is aging, although Ext4 addresses some of its limitations.
A new filesystem named "Btrfs" is under active development and will be the Linux answer to Solaris ZFS. It supports powerful features such as snapshots, volume management, software RAID, online fsck and compression.
The primary limitation of disk filesystems such as Btrfs and ZFS is that they do not scale beyond a single server.
Standalone Linux iSCSI or NFS servers built using OpenFiler provide NAS/SAN access via the NFS, CIFS, FTP and iSCSI protocols to share storage resources across multiple compute nodes.
However, standalone storage is a single point of failure.
DRBD (Distributed Replicated Block Device) provides network RAID-1 across two storage servers using an active-passive HA configuration.
Linux Scale-Out NAS
The older generation of scale-out filesystems such as Oracle Lustre, Red Hat GFS, Oracle OCFS2 and SGI CXFS took a kernel-based approach. They were complex to deploy and manage and have not penetrated far outside of high-performance computing into enterprise primary storage. The newer generation of scale-out filesystems took a userspace approach. The notable ones to consider are scale-out NFS/CIFS, Ceph and GlusterFS.
- Scale-out NFS: NFS v4.1 (pNFS) was officially assigned an RFC number in January of 2010. Unfortunately, Linux pNFS also falls under the first-generation category because of its centralized metadata and kernel-based approach. Enterprise adoption of pNFS will likely be slow. Since NFSv3 over TCP is the most widely supported NAS protocol, it is better to implement scale-out NAS storage based on round-robin DNS or virtual IPs. NFSv3 is supported by RHEV, VMware and Xen. Virtualization and cloud users are beginning to move toward scale-out NAS from proprietary SAN-based storage solutions.
- Scale-out CIFS: Samba implements CIFS for Linux using SMB1 and SMB2 protocols. SMB2 addresses performance issues in SMB1 but will not be truly ready until next year. Even Microsoft does not recommend CIFS for Hyper-V.
- Ceph: Ceph is under active development and looks promising. The kernel-space client ships with Linux starting in v2.6.34, and the server side is implemented in userspace. Ceph uses a distributed metadata architecture, which adds the complexity of managing distributed, replicated metadata across multiple servers. Given time and large community support, it should overcome these challenges. Ceph relies on Btrfs for the backend storage, and until NFS re-export is supported, it will take some time before server virtualization vendors support the client natively in hypervisors.
- GlusterFS: GlusterFS is a complete storage OS stack implemented in userspace. Functions such as the volume manager, replication, striping, network protocols, I/O schedulers, threads and performance modules are implemented as stackable modules. Gluster eliminates the need for metadata servers using its unique elastic hashing algorithm. It also supports online self-healing. Similar to Linux NFS, files and folders are stored on the backend disks using standard disk filesystems. Gluster supports multiple NAS protocols such as NFSv3, CIFS, WebDAV, FTP and native Gluster (via FUSE). Gluster has been deployed widely in a variety of application environments, from Amazon EC2 to VMware. Its architecture was inspired by the Mach microkernel-based GNU Hurd OS.
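The idea of locating files by hashing rather than by asking a metadata server can be sketched in a few lines. This is a simplified illustration of hash-based placement in general, not Gluster's actual elastic hashing algorithm; the brick names and the locate function are hypothetical.

```python
import hashlib

# Hypothetical storage bricks (server:/export path pairs).
bricks = [
    "server1:/export/brick",
    "server2:/export/brick",
    "server3:/export/brick",
    "server4:/export/brick",
]


def locate(filename, bricks):
    """Pick the brick for a file from its name alone, with no lookup table."""
    digest = hashlib.sha1(filename.encode("utf-8")).digest()
    h = int.from_bytes(digest[:4], "big")   # 32-bit hash value
    # Split the 32-bit hash space into equal ranges, one per brick.
    index = h * len(bricks) // 2**32
    return bricks[index]


print(locate("vm-images/web01.img", bricks))
```

Every client that knows the brick list computes the same answer independently, so no central server sits in the data path. A production system also has to handle adding and removing bricks without moving most of the data, which this sketch does not attempt.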
Non-POSIX Storage
A whole new breed of storage solutions has emerged that falls between POSIX-based NAS and SQL databases. Typically referred to as "NoSQL," they form a group of solutions including distributed object stores, document stores, key-value stores and scale-out object databases. These solutions do require modifications to application source code. Since they are purpose-built, they offer advantages in their own application space.
For example, Hadoop is designed for large-scale search analytics or data mining applications. Cassandra and MongoDB are similar to Amazon Dynamo or Google's BigTable. Redis, Memcached and Memcachedb provide a distributed key-value data store.
The future is looking bright for Linux virtualization in both the storage and computing space. The power of free software and open source is in its diversity. And when the dust settles, a few innovative solutions will emerge.
We look forward to tracking the next phase of Linux server virtualization, and expect enterprises to find it increasingly beneficial to storage as the server-storage virtualization gap narrows.
AB Periasamy is CTO and cofounder of Gluster.