The Next Frontier for Packet Switching
Enterprise IT departments are creaking under the strain of the massive and growing amounts of data they need to archive. The sheer volume of data only increases the chances that something will be lost, corrupted or exposed. One solution for efficient, secure and reliable data storage may lie in information dispersal.
05/12/09 4:00 AM PT
Packet switching, in short, is a method of delivering data across a computer network by breaking it into small packets that travel independently and are reassembled at the destination. Its impact cannot be overstated: it is essentially what makes today's Internet function.
Looking ahead, packet switching is poised to disrupt another technology: storage, which is drawing more and more attention in the enterprise as the amount of digital data that must be stored grows exponentially.
Over the past few years, the data that needs to be stored has shifted from smaller files and structured application data to large digital content such as media and imaging. On top of this change in data type, there is simply more data than there used to be: IDC estimated that the digital universe exceeded 281 exabytes in 2007 and would grow tenfold by 2011. This rapid growth creates major management and storage headaches for large enterprises. The more data there is, and the larger the individual files, the more important efficient storage becomes.
What's the aspirin for this ever-present problem? Look to information dispersal, which applies packet switching to slices of virtualized data, to revolutionize the way digital content is stored and delivered.
Expect RAID to Fade
The architecture of storage hasn't changed in 50 years: it still relies on storing full copies of data. Replication is neither practical nor economical for rich digital media because of the high overhead of maintaining duplicate files. As a result, many organizations today are simply "playing the odds" that they will not lose critical digital assets, because adequately protecting them is too expensive. In many cases, the odds have not been kind: Privacyrights.org lists more than 250 announced security breaches since 2004 alone.
It is therefore likely that storage technologies based on replication and parity, specifically RAID (Redundant Array of Inexpensive Disks), will decline significantly in popularity over the coming years. Why? Because with one-terabyte drives, RAID is reaching a mathematical breaking point. RAID 6, which relies on dual parity, cannot recover from more than two simultaneous drive failures, or from two failures plus an unrecoverable bit error.
Typical SATA (Serial Advanced Technology Attachment) drives have a published unrecoverable bit error rate (BER) of one in 10^14, meaning that roughly once every 100,000,000,000,000 bits read, a bit will be unrecoverable. Although this rate seems insignificant, a 100-terabyte read spans 8 × 10^14 bits, so it is nearly certain to hit an unreadable bit. If that read happens during a rebuild, data will be lost.
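The arithmetic behind this claim can be checked with a short calculation. The sketch below assumes, for simplicity, that bit errors are independent and occur at exactly the published rate of one per 10^14 bits, which real drives only approximate. The function name `p_unrecoverable_read` is hypothetical.

```python
import math


def p_unrecoverable_read(bytes_read: float, ber: float = 1e-14) -> float:
    """Probability of hitting at least one unrecoverable bit while reading
    `bytes_read` bytes, assuming each bit fails independently with probability `ber`."""
    bits = bytes_read * 8
    # 1 - (1 - ber) ** bits, rewritten with log1p/expm1 so it stays accurate
    # when ber is tiny and bits is huge.
    return -math.expm1(bits * math.log1p(-ber))


TB = 10**12  # decimal terabyte, as drive vendors count it
for size_tb in (1, 12.5, 100):
    p = p_unrecoverable_read(size_tb * TB)
    print(f"{size_tb:>5} TB read: P(at least one bad bit) = {p:.4f}")
```

Under that model, reading 12.5 terabytes (exactly 10^14 bits) fails with a probability of about 63%, and reading 100 terabytes fails with a probability above 99.9%. This is why rebuilding a large RAID array is such a risky moment.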
What Is Information Dispersal?
Think of "Willy Wonka & the Chocolate Factory," in which the character Mike Teavee is transmitted across a room in bits, from real life into TV life. Information dispersal works in a similar way. Data is sliced into pieces too small to be useful on their own, and the slices are distributed to multiple storage nodes on a network of local or remote servers. Unlike the boy in the movie, who comes out shrunken after transmission, data retrieved through information dispersal is always bit-perfect: it is read back exactly as it was stored.
How did this simple but game-changing idea for digital storage come to life in the first place?
A quick anecdote: A few years ago, I was looking for a safe, effective way to store my personal data, nearly 30 GB (equivalent to a library of more than 22,000 books) of music, photos and documents that I had been meaning to organize for years. Fortunately, I found a gem in my reading on early encryption strategies: cryptographers kept information secure by dividing it into pieces and dispersing them. Combining that idea with some research into a high-performance variant of Reed-Solomon coding, I developed information dispersal algorithms that scale in the same way the Internet was architected to scale. (Also of note: similar mathematical methods underpin digital mobile telephony.)
One of the most important benefits of storage via information dispersal is that it tolerates multiple failures of hardware, locations or administrators while still keeping data secure. Each individual slice contains too little information to be useful, yet any threshold number of slices can be used to perfectly recreate the original data. Because this approach does not make copies of data, it also reduces the storage capacity required for high-availability storage, saving money on expensive recurring capital expenditures. By virtualizing data at the bit level, information dispersal decouples storage from specific hardware, which makes it possible to store data for centuries.
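The threshold property can be illustrated with a small k-of-n dispersal sketch. This is not Cleversafe's algorithm, which the article does not spell out. It is a textbook polynomial-interpolation construction in the spirit of Rabin's information dispersal algorithm and Reed-Solomon erasure codes.

Each stripe of k bytes defines a polynomial of degree less than k over the prime field GF(257). Each of the n slices stores one evaluation of that polynomial, so any k slices determine the polynomial and therefore the data. The names `disperse` and `reconstruct` are hypothetical.

Two simplifications apply. First, the sketch uses GF(257) for brevity, so a slice value can be 256 and does not fit in a byte; a production system would typically work in GF(2^8). Second, the sketch demonstrates fault tolerance only. It is not encryption, and its slices are not secret on their own.

```python
P = 257  # smallest prime above 255, so every byte value is a field element


def _interpolate(points, x):
    """Evaluate, at x, the unique polynomial of degree < len(points) that
    passes through `points` (a list of (xi, yi) pairs), working mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, P - 2, P) is the modular inverse of den (Fermat's little theorem)
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total


def disperse(data: bytes, k: int, n: int) -> dict[int, list[int]]:
    """Split data into n slices so that any k of them can rebuild it (hypothetical helper)."""
    if not 1 <= k <= n or n + k >= P:
        raise ValueError("need 1 <= k <= n and n + k < 257")
    padded = data + bytes(-len(data) % k)  # zero-pad to a whole number of stripes
    slices = {x: [] for x in range(1, n + 1)}  # slice IDs are evaluation points 1..n
    for s in range(0, len(padded), k):
        # The stripe's k data bytes are the polynomial's values at x = n+1 .. n+k,
        # so for k >= 2 the slices hold computed values rather than copies of the data.
        stripe = [(n + 1 + j, padded[s + j]) for j in range(k)]
        for x in slices:
            slices[x].append(_interpolate(stripe, x))
    return slices


def reconstruct(slices: dict[int, list[int]], k: int, n: int, length: int) -> bytes:
    """Rebuild the original bytes from any k surviving slices (hypothetical helper)."""
    chosen = list(slices.items())[:k]
    if len(chosen) < k:
        raise ValueError(f"need at least {k} slices, got {len(chosen)}")
    out = bytearray()
    for s in range(len(chosen[0][1])):
        points = [(x, values[s]) for x, values in chosen]
        # Re-evaluate the stripe polynomial at the data positions n+1 .. n+k.
        out.extend(_interpolate(points, n + 1 + j) for j in range(k))
    return bytes(out[:length])  # drop the zero padding


msg = b"dispersed, not copied"
slices = disperse(msg, k=3, n=5)
del slices[2], slices[4]  # lose any two of the five slices
assert reconstruct(slices, k=3, n=5, length=len(msg)) == msg
```

With k = 3 and n = 5, the raw overhead is 5/3, or about 1.67 times the original size (setting aside the field-size caveat above), and any two slices can be lost. Tolerating two losses with full copies would instead take three copies, or 3 times the original size.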
Beyond Storage: How Information Dispersal Affects Content Delivery
Information dispersal radically changes not only how storage is viewed but also how digital content is delivered. Today's content delivery networks (CDNs) cannot dynamically switch networks if those networks slow down during delivery. Dispersal can, thousands of times per second, which lets it route content for efficiency and speed.
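The article does not describe how dispersal switches paths. The k-of-n property does suggest one plausible mechanism: request more slices than needed from different nodes, then rebuild from whichever k arrive first, so that slow or failed nodes are routed around automatically. The asyncio sketch below assumes that mechanism. `fetch_first_k` and the simulated `node` coroutine are hypothetical, and sleeps stand in for network latency.

```python
import asyncio
from functools import partial


async def fetch_first_k(fetchers, k):
    """Start every slice request at once and return the first k successful results."""
    tasks = [asyncio.ensure_future(fetch()) for fetch in fetchers]
    results = []
    try:
        for next_done in asyncio.as_completed(tasks):
            try:
                results.append(await next_done)
            except Exception:
                continue  # a failed or unreachable node is simply skipped
            if len(results) == k:
                return results
        raise RuntimeError(f"only {len(results)} of {k} required slices arrived")
    finally:
        for task in tasks:
            task.cancel()  # abandon requests still pending on slower nodes


async def node(slice_id, delay, fail=False):
    """Simulated storage node: answers after `delay` seconds, or errors out."""
    await asyncio.sleep(delay)
    if fail:
        raise ConnectionError(f"node holding slice {slice_id} is down")
    return slice_id


async def demo():
    delays = {1: 0.5, 2: 0.1, 3: 0.4, 4: 0.2, 5: 0.3}  # seconds per node
    fetchers = [partial(node, i, d, fail=(i == 4)) for i, d in delays.items()]
    return await fetch_first_k(fetchers, k=3)


print(sorted(asyncio.run(demo())))
```

In the demo, node 4 fails and node 1 is the slowest, so the rebuild proceeds from slices 2, 5 and 3. Requesting every slice costs extra bandwidth, so a real client might ask for k plus a small margin rather than all n.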
Consider Chicago's Museum of Broadcast Communications (MBC), a nonprofit that collects, preserves and presents more than 100,000 hours of historic and contemporary radio and television content. Unable to afford sophisticated CDN systems, MBC uses information dispersal techniques to save money and physical space without sacrificing performance or scalability for its end users. To see how smoothly the videos play, visit the MBC's website, register for a free account, and pull up a classic show like the first-ever episode of "The Late Show with David Letterman."
Just as packet-switching networks enabled data communications to scale via the Internet, dispersed storage will enable organizations to scale cost-effectively, without the bandwidth overhead and extra capacity that a copy-based storage system requires. Even in a seemingly mature technology like storage, innovation continues to win the day.
Chris Gladwin is CEO of Cleversafe, a storage technology vendor based in Chicago.