Data Storage: It's Time to Grow Up
There are two types of data living in the data center: transactional data and reference data. Transactional is dynamic and evolving; reference is mostly static. Application policies can immediately ascertain whether data will be static or transactional, and the most efficient ways of storing each type are very different. But IT managers continue to manage information as they did 20 years ago. Why?
Today's data-storage customer has lost his voice in a sea of vendor jockeying and positioning. This glut of vendors creates a noisy industry and a crowded marketplace, where all vendors sound the same and marketing materials are nearly indistinguishable from vendor to vendor.
As a direct result of this confusion, many data storage customers have stuck with the practices they know, even when those practices harm their storage environment. Customers have often maintained poor storage practices because the alternatives are unclear and confusing.
Storage practices haven't evolved significantly since they were first implemented over 20 years ago, although the characteristics of the data being generated and stored have changed substantially. Bloated backups are causing pain, headaches and budget issues in data centers around the world. The theory that "data is stored on disk and backed up to a less expensive medium" simply isn't enough anymore.
Backup: A Storage Practice That Needs a Facelift
The content explosion, roughly 20 years ago, wreaked havoc on data centers across the globe. This was a new era of information processing, driven by the intent of digital revolutionaries (CIOs) to harness all data for future use.
Disk proliferation became the norm, and frantic IT managers, worried about data loss and corruption, turned to backup. Backup applications created multiple copies of data on a daily, weekly, monthly, quarterly, and yearly basis to protect against disaster, sabotage or accidental deletions. The result was a comfortable solution for all parties involved. CIOs got to harness as much content as possible, and IT managers had a solution to ensure that data was protected over, and over, and over again. The problem was that unchanging content, or reference data, was being backed up multiple times, too.
Transactional Data vs. Reference Data
Today, we know that there are two types of data living in the data center: transactional data and reference data. Transactional data, by nature, is dynamic and evolving. Examples of transactional data can be found in any typical Customer Relationship Management or Sales Force Management application. This data usually resides on fast-spinning disk because users need frequent and immediate access to it, often making a series of edits and modifications. Reference data, on the other hand, is mostly static. This data, once created, rarely changes and is used primarily for recall purposes. Examples of reference data generators include medical imaging, financial records management, and any file associated with compliance.
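As a rough illustration of how an application-level policy might encode this distinction, the sketch below maps the application that produced a file to a data class. The tag names, the `classify` function and the default for unknown applications are all hypothetical; only the example categories come from the paragraph above.

```python
from enum import Enum


class DataClass(Enum):
    TRANSACTIONAL = "transactional"
    REFERENCE = "reference"


# Hypothetical application tags, taken from the examples named in the text.
POLICY = {
    "crm": DataClass.TRANSACTIONAL,
    "sales_force_management": DataClass.TRANSACTIONAL,
    "medical_imaging": DataClass.REFERENCE,
    "financial_records": DataClass.REFERENCE,
    "compliance": DataClass.REFERENCE,
}


def classify(application: str) -> DataClass:
    """Return the data class for the application that produced a file."""
    # Assumption: unknown applications are treated as transactional, so their
    # data stays on primary disk and in backup until someone classifies them.
    return POLICY.get(application.lower(), DataClass.TRANSACTIONAL)
```

Defaulting to transactional is the cautious choice in this sketch: a misclassified file is backed up too often rather than too rarely.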
When IT managers began storing data, the assumption was that all data had similar characteristics. Backup applications would not discriminate, storing multiple copies of files regardless of how the data was created or how often the data was accessed and modified. Over time, it became clear that reference data was accounting for the vast majority of content being generated and therefore being backed up. IT managers quickly determined that reference data was consuming the majority of disk space (upwards of 85 percent) and creating nightmares for backup applications that were creating multiple copies of data that never changed.
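To put rough numbers on that overhead, here is a back-of-the-envelope sketch. The function name and the inputs in the usage note are hypothetical; only the 85 percent reference share is taken from the figure quoted above, and the model assumes every retained backup is a full copy.

```python
def reference_backup_overhead(
    total_tb: float, reference_share: float, retained_copies: int
) -> dict[str, float]:
    """Estimate how much backup capacity unchanging reference data consumes.

    Assumes each retained backup holds a full copy of the reference data,
    which is a simplification of real backup schedules.
    """
    reference_tb = total_tb * reference_share
    backup_tb = reference_tb * retained_copies
    return {
        "reference_tb": reference_tb,
        "backup_tb": backup_tb,
        # Capacity beyond a single protected copy of data that never changed.
        "redundant_tb": backup_tb - reference_tb,
    }
```

For example, `reference_backup_overhead(100, 0.85, 12)` models a hypothetical 100 TB environment that retains 12 full backups; the `redundant_tb` entry is the capacity spent re-copying files that never changed.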
While data protection is still paramount, the new age of IT needs to solve other major problems caused by continued growth and the inefficient practices of data management. Disk proliferation, multiple copies of unchanging data, time and space management, reduced productivity and splintered storage environments can all be solved with more proactive data management implementations. The time for a paradigm shift is now.
Why Is Change So Hard?
Application and management policies can immediately ascertain whether data will be static or transactional. It is universally accepted that the bulk of content being created is reference data. CIOs have more knowledge at their disposal than ever and will freely admit that the practices of storing data need to change. But despite this knowledge base, IT managers continue to store and manage information as they did 20 years ago. Why?
IT managers know that their legacy approach won't result in the loss of their job. They know that their legacy approach is a tried and true practice that has been in place for decades. They know what their legacy approach will cost and plan accordingly. They are willing to deal with growing pains for the sake of data security and job security.
The Industry's Quick-Fix Approach
Vendors are very familiar with customer behavior. Like sharks that taste the first droplet of blood, vendors see complacency as a selling point. Not willing to change? That's fine with us; we will sell you reactive solutions to your existing problems. Think backup as an archive is the least disruptive option? Great, buy our deduplication technology that removes the extra copies of data being created. Feel like just buying more and more disks? Excellent, don't forget to keep a chunk of your budget aside for our disk virtualization technology. Maybe your idea of change is a switch in hardware vendors? We have you covered; our data migration technology will make your forklift upgrades more palatable.
Rather than take a hard look at their backup practices and alter their behavior, IT managers bought reactive technologies to deliver a quick fix. Companies solving data management problems sprouted up like wild weeds, and the storage industry licked its chops in anticipation of a newfound revenue stream.
IT's "bad habits" have become the new gold rush for vendors. New companies are scrambling (sacrificing) to release products in order to keep up with the established players. Today, the terms best used to mine IT gold are "cloud" and "deduplication." Vendors are frantic in their efforts to brand their products as one or the other, and in some cases, as both. But are these real solutions, or just the work of opportunistic vendors, investors and entrepreneurs looking for the next big wave to ride to financial success? Could all these vendors really have had the foresight to be working on next-generation problems, or is it more likely that these technologies were developed in a microwave in an effort to capitalize on a trend? Think about it.
A Better Solution
Why not just fix the core problem? Why treat the problem with aspirin when what data center managers need is an antidote that stops it from recurring? To avoid backup costs and the need for reactive, quick-fix technologies, IT managers should seriously consider implementing a smart archive and stop using their backups as an archive.
A tiered archive will store one copy of each file safely. Archives should be accessible, secure and transparent to the user. Once a copy has been stored, that same file will never be stored again, preventing unnecessary capacity usage. Archiving takes data off primary storage and removes it from the backup process. Backup processes then have fewer files to examine when determining what needs to be backed up. The investment in backup hardware and network infrastructure would be greatly reduced. Good storage practices should reduce data explosion, not enable it.
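The "store it once" behavior described above can be sketched with content hashing. This is a minimal illustration, not a description of any particular product: the `SingleInstanceArchive` class, its methods and the choice of SHA-256 are all assumptions, and it reads each file fully into memory for simplicity.

```python
import hashlib
import tempfile
from pathlib import Path


class SingleInstanceArchive:
    """Hypothetical archive that keeps one copy per unique file content."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)
        self.catalog: dict[str, str] = {}  # original path -> content digest

    def archive(self, source: Path) -> bool:
        """Archive a file; return True only if new bytes were written."""
        data = source.read_bytes()
        digest = hashlib.sha256(data).hexdigest()
        target = self.root / digest
        is_new = not target.exists()
        if is_new:
            target.write_bytes(data)
        # Every archived path is recorded, even when its content was
        # already present, so the file stays retrievable by name.
        self.catalog[str(source)] = digest
        return is_new

    def stored_copies(self) -> int:
        """Count the distinct file contents held in the archive."""
        return sum(1 for _ in self.root.iterdir())


def demo() -> tuple[bool, bool, int]:
    """Archive two files with identical content and report what was stored."""
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        scan_a = base / "scan_a.img"
        scan_b = base / "scan_b.img"
        scan_a.write_bytes(b"same reference content")
        scan_b.write_bytes(b"same reference content")
        store = SingleInstanceArchive(base / "archive")
        first = store.archive(scan_a)
        second = store.archive(scan_b)
        return first, second, store.stored_copies()
```

In `demo()`, the second file has the same content as the first, so the archive records its path in the catalog but writes no additional bytes.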
Quick-fix solutions wouldn't be necessary if the industry would simply nip the bad backup habit in the bud. Active archiving eliminates duplicate copies and stores information intelligently from Day 1, without having to spend a large portion of the IT budget on backup software and hardware infrastructure. No headaches, and certainly, no aspirin needed.
Bobby Moulton is president of Seven10 Storage Software.