Storage: The Achilles' Heel of Data Center Virtualization
Virtual machines (VMs) have dramatically transformed the way IT organizations approach computing architecture. Initially adopted for greater efficiency, server virtualization quickly revealed that its true benefit lies in building an agile data center where resources can be started and stopped as necessary, files are transferred more easily, and disaster recovery and failover are simpler and more reliable. In other words, server virtualization embodies the defining traits of the modern, dynamic data center and cloud environment.
The flexibility and efficiency benefits of virtual environments have led to widespread adoption and explosive growth of virtual servers. However, storage technologies designed for physical servers have struggled to keep up with the modern demands of virtual servers, resulting in I/O bottlenecks, highly complex management, and an increase in required storage capacity.
Storage has become the Achilles' heel, preventing the virtualization of the entire data center.
Feeling the Strain
Unstructured data is growing by 60 percent per year, according to industry analyst firm IDC, creating a need for vastly different storage economics. More than 20 percent of all compute workloads are now virtualized, creating a need for vastly different shared storage architectures. And the cloud is expected to account for 14 percent of all IT spending by 2014, creating a need for storage that works both on premises and in the cloud.
Organizations must find a way to store and manage this unstructured data and build a truly virtual storage environment to complement their server virtualization. Businesses must decide which platform will best support their increasingly dynamic data center environment, because the answer is not as simple as taking yesterday's storage area network (SAN) and adapting it to today's virtualized data center.
When server virtualization was still in its infancy, servers primarily used block-based SANs for network disk storage, and that was sufficient at the time. Then things began to change. Every VM required a dedicated logical unit number (LUN) to be provisioned, so as the number of VMs grew, so did the number of LUNs.
The underlying SAN infrastructure began to feel the strain, and attempts to rectify the resulting management and scalability issues have been limited in their effectiveness. Storage is still trying to catch up to the advancements made in server virtualization.
Network-attached storage (NAS), however, has emerged as a viable option for VM storage. iSCSI also has potential, offering cost and simplicity advantages over traditional Fibre Channel (FC) SAN, but because it is also block-based, it shares some of SAN's issues. SAN has inherent limitations in virtual server environments, while NAS has inherent strengths.
Scalable NAS is the way to overcome the Achilles' heel of storage. NAS has proved itself to be the storage solution of choice because of its inherent scalability and sharing capabilities, in addition to delivering cost savings and ease of use.
To support the modern data center, storage must be:
- Scalable on demand
- Able to scale out
- Free software/open source
- Standards-based (Ethernet, x64)
Challenges of SAN
First, let us look at those inherent SAN limitations mentioned earlier:
- Scalability: Because it lacks a global namespace, SAN cannot easily manage the terabytes and petabytes of data created by cloud-based applications. Managing data at that scale requires a scalable, high-performance file system.
- Manageability: Virtualized environments are intensely dynamic; hundreds of VMs can be provisioned in a matter of minutes. SANs, on the other hand, work best in static environments, where a relatively stable set of LUNs can be provisioned and managed easily. Given the number of LUNs involved, provisioning, backup, and recovery in the cloud become extremely complex, and SAN administration requires significant expertise. This complexity is incompatible with the cloud environment, where the benefits can only be achieved through large-scale automation.
- Data-sharing: For shared hosting of virtual disks and application data, LUNs must be accessed concurrently across physical and virtual machines. SAN, however, has scaling and sharing limitations across LUNs, even though clustered applications need shared access to the same data partition.
- Cost: SAN was designed for mission-critical database environments with full redundancy at the hardware level, whereas cloud architects consider storage expensive at anything above 50 US cents per GB. Matching the massive scale of storage in the new environment requires new economics, which is why it makes sense to treat storage as a software problem that can leverage commodity scale-out architecture.
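To put the 50-cent ceiling in perspective at petabyte scale, here is a quick worked calculation. It assumes decimal units (1 PB = 10^6 GB) and counts raw capacity at exactly the ceiling price, ignoring power, space, and administration:

```latex
\begin{align*}
C_{\text{PB}} &= 10^{6}\,\text{GB} \times \$0.50/\text{GB} = \$500{,}000 \\
\Delta C_{\text{PB}} &= 10^{6}\,\text{GB} \times \$0.01/\text{GB} = \$10{,}000 \text{ per additional cent per GB}
\end{align*}
```

Even at the ceiling, a single petabyte is a half-million-dollar line item, and every extra cent per gigabyte adds another $10,000 per petabyte.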
Advantages of NAS
Next, let us examine the advantages that NAS offers to the virtualized data center environment.
- Can scale to petabytes: By the end of 2010, it is predicted that 1,200 exabytes of data will have been created. The beauty of NAS is that it can scale seamlessly, regardless of whether an organization needs to store a petabyte of data, a thousand virtual disk images, or both. Ideally, a virtualized data center would use a scalable NAS solution with a unified global namespace that automatically load-balances data across multiple storage servers.
- Shared data access: NAS volumes can be mounted simultaneously across thousands of servers, which allows the hypervisor and applications to share the same storage. This in turn allows VMs to migrate across a large pool of servers without any concern for storage access. NAS lets multiple VMs access data concurrently to distribute I/O, and it provides simultaneous read/write access to files. Even a simple setup of VMs serving HTML and image files requires a shared volume. Separate NAS volumes are required only for multitenancy (i.e., partitioning different groups of applications and users), and fewer volumes mean far fewer administrative tasks.
- Cost and ease of use: NAS systems using protocols such as NFS and CIFS are easier to deploy and use than Fibre Channel SAN. Backup and recovery are familiar processes, because VM disk images are simply files on a NAS volume. Scalable NAS solutions with software redundancy on commodity hardware can cost as little as $8,000 for 24 TB of storage.
- High performance: NAS now supports 10 Gigabit Ethernet (10GigE) and 40 Gb/s InfiniBand networking, both faster than 8 Gb/s Fibre Channel, and it scales across multiple storage servers to deliver hundreds of gigabits per second of aggregate throughput at 10 µs latency. Other important features are replication and thin provisioning. Scalable NAS solutions support built-in mirroring and replication with snapshots and cloning, to cover hardware failure and other disaster recovery needs. Thin provisioning uses disk space more efficiently by allocating capacity on demand, which makes it easy to expand NAS volumes when needed.
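A scalable NAS volume with a unified namespace needs a rule that lets every client find a file's storage server without consulting a central lookup table. One well-known technique for this is consistent hashing. The sketch below illustrates that general idea and is not presented as how Gluster or any other product implements its namespace. The `HashRing` class, the server names, the paths, the use of MD5, and the 100 virtual nodes per server are all hypothetical choices made for illustration.

```python
import bisect
import hashlib
from collections import Counter


def _hash(key: str) -> int:
    # 64-bit integer taken from an MD5 digest; used for placement, not security.
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring that maps file paths to storage servers.

    Every client can compute a file's location on its own, so the
    shared namespace needs no central lookup table.
    """

    def __init__(self, servers, vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (hash, server) points
        for server in servers:
            self.add(server)

    def add(self, server):
        # Give each server many points on the ring so load evens out.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{server}#{i}"), server))

    def locate(self, path):
        # The first ring point at or after the path's hash owns the file;
        # past the last point, wrap around to the start of the ring.
        i = bisect.bisect_left(self._ring, (_hash(path), ""))
        return self._ring[i % len(self._ring)][1]


if __name__ == "__main__":
    paths = [f"/vol0/images/vm{i:04d}.img" for i in range(1000)]
    ring = HashRing(["server1", "server2", "server3", "server4"])
    before = {p: ring.locate(p) for p in paths}
    print("files per server:", Counter(before.values()))

    # Scale out: add a fifth server and count the files whose owner changes.
    ring.add("server5")
    moved = [p for p in paths if ring.locate(p) != before[p]]
    print(f"{len(moved)} of {len(paths)} files move; all to server5:",
          all(ring.locate(p) == "server5" for p in moved))
```

The ring is a deliberate design choice. With plain `hash % number_of_servers`, adding a fifth server would relocate most files. With the ring, only the files claimed by the new server's points move, about a fifth on average in this example. That property is what lets a scale-out volume grow without reshuffling all of its data.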
Within the next few years, next-generation scale-out NAS systems are going to play a significant role in providing an enterprise-wide, global unified namespace for all unstructured data, helping to eliminate the Achilles' heel of storage.
Anand Babu (AB) Periasamy is CTO and cofounder of Gluster.