Data Management

Study: Dark Data Shadow Follows Everyone

The “digital universe” of data was bigger than expected in 2007 and is continuing to explode in size, according to a new study from IDC.

The study, sponsored by EMC and titled “The Diverse and Exploding Digital Universe: An Updated Forecast of Worldwide Information Growth Through 2011,” found that there were about 281 billion gigabytes (or 281 exabytes) in the digital universe in 2007, exceeding original estimates by about 10 percent.

With a compound annual growth rate of almost 60 percent, meanwhile, the digital universe is also growing faster than was previously thought, and is projected to increase tenfold over the next five years to reach nearly 1.8 zettabytes — or 1,800 exabytes — in 2011, the study’s authors predicted.
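The 2011 projection can be checked against the growth rate with a few lines of arithmetic. The sketch below is only a sanity check, not IDC's model: it assumes a flat 60 percent rate (the study says "almost 60 percent") compounded over the four years from 2007 to 2011, and the function name is illustrative.

```python
def project_size(start_eb: float, annual_growth: float, years: int) -> float:
    """Compound a starting size forward at a fixed annual growth rate."""
    return start_eb * (1.0 + annual_growth) ** years


size_2007_eb = 281.0   # exabytes in 2007, per the study
cagr = 0.60            # assumption: exactly 60%; the study says "almost 60 percent"
years = 2011 - 2007    # four compounding periods

size_2011_eb = project_size(size_2007_eb, cagr, years)
growth_multiple = size_2011_eb / size_2007_eb

print(f"Projected 2011 size: {size_2011_eb:,.0f} EB ({size_2011_eb / 1000:.2f} ZB)")
print(f"Growth multiple from 2007: {growth_multiple:.1f}x")
```

Under those assumptions the result is roughly 1,840 exabytes, in line with the "nearly 1.8 zettabytes" forecast. The multiple from the 2007 total is about 6.6, so the tenfold figure appears to be measured from an earlier baseline, which the article does not specify.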

“Society is already feeling the early effects of the world’s digital information explosion,” said Joe Tucci, chairman, president and CEO of EMC. “Organizations need to plan for the limitless opportunities to use information in new ways and for the challenges of information governance.”

45 GB per Person

In 2007 the digital universe was equal to almost 45 gigabytes of digital information for every person on earth, IDC said, or the equivalent of more than 17 billion 8 GB iPhones.

Accelerated growth in worldwide shipments of digital cameras, digital surveillance cameras and digital televisions is among the factors behind the information explosion, IDC found.

Other fast-growing corners of the digital universe include those related to Internet access in emerging countries, sensor-based applications, data centers supporting “cloud computing” and social networks made up of digital content created by many millions of online users, the study found.

The Digital Shadow

Of the wealth of data that exists about individuals, the majority is now created by entities other than the individuals themselves, the study found.

“We discovered that only about half of your digital footprint is related to your individual actions — taking pictures, sending e-mails, or making digital voice calls,” explained John Gantz, chief research officer and senior vice president with IDC.

“The other half is what we call the ‘digital shadow’ — information about you — names in financial records, names on mailing lists, Web surfing histories or images taken of you by security cameras in airports or urban centers,” Gantz added. “For the first time, your digital shadow is larger than the digital information you actively create about yourself.”

New External Focus

With so much data in general and so much information about virtually every individual on the planet, security, privacy protection, reliability and legal compliance will all draw increased attention, IDC said.

For corporate IT departments, one of the biggest transitions will be from focusing purely on internally generated data to also managing data that comes from outside the company, Dave Reinsel, group vice president for storage and semiconductor research with IDC and a coauthor of the study, told TechNewsWorld.

“All of a sudden, companies providing structures for Web 2.0 or other service-oriented architectures are becoming custodians for someone else’s data,” Reinsel explained.

More Unstructured Data

Expiration concerns will be among the issues that emerge as a result, Reinsel said.

For example, “if a customer wants data deleted, it will have to be removed off the primary database but also through the entire infrastructure,” he said.

An increasing proportion of unstructured data, meanwhile, will make it difficult to maintain relevancy, Reinsel added. “With structured data, it’s nicely organized, but when it’s unstructured, many times we don’t even know where it is,” he explained.

IDC also found that the number of individual information packets is growing even faster than the simple amount of information, Reinsel noted. “Managing that influx is going to be very difficult,” he warned. “Companies will need protection schemes and good information management to understand what that data is.”

Privacy Concerns

Privacy advocates, not surprisingly, worry about the effect of all this data on individual privacy.

“My big concern is that pretty soon these organizations that have collected so much information about us will know more about us than we do about ourselves,” Marc Rotenberg, executive director of the Electronic Privacy Information Center (EPIC), told TechNewsWorld. “We need to start thinking about this, particularly as ID theft becomes more widespread.”

Possible approaches to protecting privacy could include limiting the amount of data retained, making companies more transparent about the information they collect, and making it more difficult for companies to collect it in the first place, Rotenberg said.

“We don’t think the ‘notice and choice’ approach is correct,” he added. “Information needs to be made less personally identifiable.”

More by Katherine Noyes