

Managing the Long Tail of Digital Storage

“The IRS isn’t interested in the show that flopped.” If that line is unfamiliar, it comes straight from the Broadway musical The Producers, a famously over-the-top story about turning a profit on a show that is meant to fail. The comic premise, that money can be made from products that never strike it big, has since permeated the thinking of marketers and businesspeople everywhere, due largely to Chris Anderson.

In his 2004 Wired article, aptly titled “The Long Tail,” Chris Anderson took a basic premise of business and applied it to the tech sector. Many Web properties and applications in media, retail, advertising and search, for example, take advantage of a theoretically infinite amount of “shelf space.”

The Internet allows you to sell a limitless number of unique items, each in relatively small quantities: quantities so small that they would have been considered a “flop” or “miss” before the era of e-commerce.

Naturally, the Web’s never-ending warehouse has an advantage over physical brick-and-mortar stores, which are shackled by economic scarcity and physical location and are forced to carry only stock (books, DVDs, etc.) that generates enough demand to earn its keep.

The Internet’s limitless space, combined with real-time information about buying trends and public opinion, generates significant real revenue. The source of that revenue is not “hits” such as teen-pop music or under-acted, special-effects-laden summer blockbusters, but more niche, obscure, harder-to-find products.

Hits and Misses

The online world of limitless product has changed the game — it’s a world where “misses,” alongside hits, usually make money too. As Anderson notes, “A hit and a miss are on equal economic footing, both just entries in a database called up on demand, both equally worthy of being carried. Suddenly, popularity no longer has a monopoly on profitability.”

So while Britney Spears will continue to sell an order of magnitude more than, say, Face to Face, the cumulative size of the long tail and all its titles can produce revenue that rivals or exceeds that of “the hits” themselves.

[Figure: Cleversafe chart of the long tail]

In the chart, the hits that dominate appear in green on the left, and the long tail appears in yellow on the right. Extend the yellow tail far enough, and its cumulative volume eventually exceeds that of the finite number of more popular items.
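To make the claim concrete, the sketch below assumes (the article does not say this) that demand follows a Zipf distribution, in which the title at popularity rank r sells in proportion to 1/r. The catalog size and the "shelf" sizes of a physical store are hypothetical. Summing the demand beyond the top titles shows how the tail can rival or exceed the hits.

```python
# Sketch under an assumed Zipf (exponent 1) demand curve: the title at
# popularity rank r sells in proportion to 1/r. Sizes are hypothetical.


def zipf_sales(rank: int) -> float:
    """Relative sales of the title at the given popularity rank (1 = top)."""
    return 1.0 / rank


def head_and_tail(catalog_size: int, head_size: int) -> tuple[float, float]:
    """Return (sales of the top head_size titles, sales of all remaining titles)."""
    head = sum(zipf_sales(r) for r in range(1, head_size + 1))
    tail = sum(zipf_sales(r) for r in range(head_size + 1, catalog_size + 1))
    return head, tail


if __name__ == "__main__":
    catalog = 1_000_000          # hypothetical online catalog
    for shelf in (100, 1_000):   # hypothetical shelf sizes of a physical store
        hits, tail = head_and_tail(catalog, shelf)
        print(f"top {shelf:,} titles: hits={hits:.2f} tail={tail:.2f} "
              f"tail/hits={tail / hits:.2f}")
```

Under this assumption, with a one-million-title catalog the tail beyond the top 1,000 titles comes to roughly 92 percent of the hits' volume, and the tail beyond the top 100 is larger than the hits. Real demand curves differ, so treat the numbers as illustrative only.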

The Role of Storage Within “The Long Tail”

While the Internet can, in theory, carry limitless products (whether hits or misses is irrelevant), there are significant challenges and cost considerations in supporting the storage and content delivery infrastructure a true long-tail business requires. Although the Internet has “broken the tyranny of physical space,” as Anderson writes, it is still bound by the technologies available on the market. When you combine traditional storage and content delivery technologies with petabytes of data (especially unstructured data such as video or music files), the long-tail business loses some of its sheen.

IDC estimated that the digital universe (i.e., the amount of digital content and replicated data) reached 281 exabytes in 2007 and will grow tenfold by 2011. The traditional approach to storage, which relies on making copies, is not suited to handle this deluge of data. In the near future, the rising cost and management overhead of duplicating large files of unstructured data are going to cause major problems.

Data protection today, based for example on RAID 6, relies on antiquated techniques. Simply do the math: RAID 6 stores two drives' worth of parity per array, so when more than two drives in an array fail before it can rebuild, the data is unrecoverable. To address RAID’s shortcomings, organizations use replication, the technique of making additional copies of data. Replication can help with failure scenarios such as site failure, power outages and bandwidth unavailability, but making just one additional copy requires 2.5 times the original data stored (assuming the use of RAID arrays), along with 2.5 times the bandwidth, storage equipment, cooling, power and floor space. Bottom line: RAID and replication are cost-prohibitive at scale.
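One way to reproduce the 2.5x figure is sketched below. It assumes (the article does not specify the layout) RAID 6 arrays of 10 drives, 8 holding data and 2 holding parity, so each site needs 1.25 units of raw capacity per unit of data, and a second, replicated site doubles that to 2.5. The function names are illustrative.

```python
# Capacity arithmetic behind the 2.5x figure, under an assumed layout of
# 10-drive RAID 6 arrays (8 data drives + 2 parity drives).


def raid6_overhead(drives_per_array: int) -> float:
    """Raw capacity needed per unit of user data in one RAID 6 array."""
    if drives_per_array < 4:
        raise ValueError("RAID 6 needs at least 4 drives")
    data_drives = drives_per_array - 2  # two drives' worth of parity
    return drives_per_array / data_drives


def replicated_raid6_overhead(drives_per_array: int, copies: int) -> float:
    """Raw capacity when the RAID 6 data set is kept at `copies` sites."""
    return copies * raid6_overhead(drives_per_array)


def raid6_survives(failed_drives: int) -> bool:
    """An array stays recoverable with at most two concurrent drive failures."""
    return failed_drives <= 2


if __name__ == "__main__":
    print(raid6_overhead(10))                      # 1.25
    print(replicated_raid6_overhead(10, 2))        # 2.5
    print([raid6_survives(n) for n in range(4)])   # [True, True, True, False]
```

Wider arrays lower the per-site overhead (a 12-drive array needs 1.2x), but every full replica still multiplies the whole footprint.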

Fulfilling the Technological ‘Long Tail’

In storage, the problem of supporting “the long tail” for multi-terabyte to multi-petabyte collections of unstructured content (videos, images, audio files) is being tackled by a storage method called “dispersal.” Instead of copying data, dispersal divides it into “slices” and disperses them across a secure network to different geographic locations. Each slice on its own contains too little information to be useful, but any threshold number of slices can be used to perfectly re-create the original data. Even so, all the slices together take up less space than multiple copies of the original data.
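The threshold property can be illustrated with a Reed-Solomon-style k-of-n erasure code, in the spirit of Rabin's information dispersal algorithm. This is a minimal sketch, not Cleversafe's implementation: the prime field GF(257), the threshold of 10 and the width of 16 slices are assumptions, and `disperse`/`reconstruct` are hypothetical names. It shows that any k slices rebuild the data and that the raw footprint is about n/k. It does not by itself guarantee that a single slice reveals nothing; that requires transforming or encrypting the data before it is sliced.

```python
# k-of-n dispersal sketch over the prime field GF(257). Assumption: a real
# system would use GF(2^8) so every symbol fits in one byte; here a slice
# symbol can equal 256. Each group of k data bytes is read as the
# coefficients of a polynomial of degree < k; slice i stores its value at x = i + 1.

P = 257  # prime modulus; every byte value 0..255 is a field element


def disperse(data: bytes, k: int, n: int) -> list[list[int]]:
    """Encode data into n slices, any k of which can rebuild it."""
    if not 0 < k <= n < P:
        raise ValueError("need 0 < k <= n < 257")
    padded = data + bytes(-len(data) % k)  # zero-pad to a multiple of k
    slices: list[list[int]] = [[] for _ in range(n)]
    for start in range(0, len(padded), k):
        coeffs = padded[start:start + k]
        for i in range(n):
            x, y = i + 1, 0
            for c in reversed(coeffs):  # Horner's rule, mod P
                y = (y * x + c) % P
            slices[i].append(y)
    return slices


def _invert_mod(matrix: list[list[int]]) -> list[list[int]]:
    """Gauss-Jordan inversion of a square matrix over GF(P)."""
    size = len(matrix)
    aug = [row[:] + [int(r == c) for c in range(size)] for r, row in enumerate(matrix)]
    for col in range(size):
        pivot = next(r for r in range(col, size) if aug[r][col])
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], P - 2, P)  # inverse via Fermat's little theorem
        aug[col] = [v * inv % P for v in aug[col]]
        for r in range(size):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [(a - f * b) % P for a, b in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]


def reconstruct(available: dict[int, list[int]], k: int, length: int) -> bytes:
    """Rebuild the first `length` bytes from any k slices, keyed by slice index."""
    if len(available) < k:
        raise ValueError("fewer than k slices available")
    indices = sorted(available)[:k]
    vandermonde = [[pow(i + 1, j, P) for j in range(k)] for i in indices]
    inverse = _invert_mod(vandermonde)  # distinct x values make this invertible
    out = bytearray()
    for group in range(len(available[indices[0]])):
        ys = [available[i][group] for i in indices]
        out.extend(sum(a * y for a, y in zip(row, ys)) % P for row in inverse)
    return bytes(out[:length])


if __name__ == "__main__":
    message = b"an obscure title from the long tail"
    k, n = 10, 16  # hypothetical threshold and number of slices
    slices = disperse(message, k, n)
    lost = {0, 3, 5, 9, 12, 15}  # any n - k = 6 slices can go missing
    survivors = {i: s for i, s in enumerate(slices) if i not in lost}
    print(reconstruct(survivors, k, len(message)) == message)
    print(f"raw footprint: {n / k:.1f}x the original data")
```

With these assumed parameters the slices occupy about 1.6 times the original data (ignoring padding and the oversized GF(257) symbols) while tolerating the loss of any six slices, compared with the 2.5 times cited above for RAID plus one replica.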

Beyond storage, long-tail businesses need to be able to manage and distribute these digital assets anywhere. Some storage vendors are now venturing into next-generation content delivery networks (CDNs) that tackle these requirements in a different, less costly way. Since most of the content on an origin server is not a hit, it does not necessarily need to be routed through a traditional CDN designed to handle a massive volume of parallel requests for the same content.
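A minimal sketch of that routing idea, assuming a simple popularity rule: objects whose request count reaches a threshold are served through a conventional CDN cache, and everything else is served directly from the origin store. `HypotheticalRouter`, `hit_threshold` and the `"cdn"`/`"origin"` labels are hypothetical names for illustration, not any vendor's API.

```python
from collections import Counter


class HypotheticalRouter:
    """Route popular objects to a CDN and long-tail objects to the origin."""

    def __init__(self, hit_threshold: int) -> None:
        self.hit_threshold = hit_threshold  # requests needed to count as a "hit"
        self.requests = Counter()           # request count per object id

    def route(self, object_id: str) -> str:
        """Record a request and return where it should be served from."""
        self.requests[object_id] += 1
        if self.requests[object_id] >= self.hit_threshold:
            return "cdn"
        return "origin"


if __name__ == "__main__":
    router = HypotheticalRouter(hit_threshold=3)
    for name in ["hit.mp4", "hit.mp4", "obscure.mp3", "hit.mp4"]:
        print(name, router.route(name))
```

A production system would likely decay counts over time and make the decision at the edge, but the point stands: the rarely requested bulk of a long-tail catalog never needs CDN capacity.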

Admittedly, when I recently re-read Anderson’s article, my initial thought (with the benefit of hindsight) was: “Innovative perspective on expanding market size, but hard to tackle practically.” However, the more I thought about it, the clearer it became that dispersal-based storage and content delivery is important in bridging the long tail as a concept and the long tail as a business reality.

Chris Gladwin is CEO of Cleversafe, a storage technology vendor based in Chicago.
