How to Drink from the Big Data Firehose Without Drowning
Trends in IT are opening up new frontiers in the enterprise, and that is forcing information architects to jettison old ways of thinking about the volume and speed of the data that's streaming into their organizations. It's not just how to store and study all that information, but how it ties into their security procedures, their mobile workforce, and how much to store in the cloud.
The arrival of big data means the end of the status quo for the enterprise.
It means more than just sorting the sheer volume and velocity of information now available to companies. To be able to derive the most value from big data, large enterprises now have to consider its impact on issues such as security, risk and governance. They have to weigh related factors such as cloud computing and mobile applications when they are expanding their IT processes.
In short, big data necessitates rethinking the standard methods of analyzing and using data. Companies may have to go back to the whiteboard to sketch out new models of data architecture. They also have to think about the data archives they're creating for those who come after them in the organization; will they be able to find value from what's left behind?
Those points and more were covered during a panel discussion about big data's impact on the enterprise at the recent Open Group Conference. Members of the panel included Robert Weisman, CEO and chief enterprise architect at Build The Vision; Andras Szakal, vice president and CTO of IBM's Federal Division; Jim Hietala, vice president for Security at The Open Group; and Chris Gerty, deputy program manager at the Open Innovation Program at NASA.
The discussion was led by Dana Gardner, principal analyst at Interarbor Solutions.
Here are some excerpts:
Dana Gardner: You mentioned that big data to you is not a factor of the size, because NASA's dealing with so much. It's when you run out of steam, as it were, with the methodologies. Maybe you could explain more. When do you know that you've actually run out of steam with the methodologies?
Chris Gerty: When we collect data, we have some sort of goal in mind of what we might get out of it. When we put the pieces from the data together, it either doesn't fit as well as you thought, or you are successful and you continue to do the same thing, gathering archives of information.
At that point, where you realize there might even be something else that you want to do with the data, different than what you planned originally, that's when we have to pivot a little bit and say, "Now I need to treat this as a living archive. It's a 'may live beyond me' type of thing." At that point, I think you treat it as setting up the infrastructure for being used later, whether it be by you or someone else. That's an important transition to make and might be what one could define as big data.
Gardner: Andras, does that square with where you are in your government interactions -- that data now becomes a different type of resource, and that you need to know when to do things differently?
Andras Szakal: The importance of data hasn't changed. The data itself, the veracity of the data, is still important. Transactional data will always need to exist. The difference is that you have certainly the three or four V's, depending on how you look at it, but the importance of data is in its veracity, and your ability to understand or to be able to use that data before the data's shelf life runs out.
Some data has a shelf life that's long lived. Other data has very little shelf life, and you would use different approaches to being able to utilize that information. It's ultimately not about the data itself, but it's about gaining deep insight into that data. So it's not storing data or manipulating data, but applying those analytical capabilities to data.
Gardner: Bob, we've seen the price points on storage go down so dramatically. We've seen people just decide to hold on to data that they wouldn't have before, simply because they can and they can afford to do so. That means we need to try to extract value and use that data. From the perspective of an enterprise architect, how are things different now, vis-a-vis this much larger set of data and variety of data, when it comes to planning and executing as architects?
Robert Weisman: One of the major issues is that normally organizations are holding two orders of magnitude more data than they need. It's a huge overhead, both in terms of the applications architecture, which has a code base larger than it should be, and also from the technology architecture that is supporting a horrendous number of servers and a whole bunch of technology stuff that they don't need.
The issue for the architect is to figure out what data is useful, institute a governance process, so that you can have data lifecycle management, have a proper disposition, focus the organization on information, data, and knowledge that is basically going to provide business value to the organization, and help them innovate and have a competitive advantage.
How Much Data to Keep
And in terms of government, just improve service delivery, because there's waste right now on information infrastructure, and we can't afford it anymore.
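To make the idea of data lifecycle management and "proper disposition" concrete, here is a minimal Python sketch of a retention policy check. It is an illustration only, not something the panel described: the category names, the retention periods, and the `Record` and `disposition` names are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical retention periods per data category.
# These values are assumptions for illustration, not recommendations.
RETENTION = {
    "transactional": timedelta(days=7 * 365),
    "sensor": timedelta(days=90),
    "web_log": timedelta(days=30),
}


@dataclass
class Record:
    name: str
    category: str
    created: date


def disposition(record: Record, today: date) -> str:
    """Return 'keep', 'dispose', or 'review' for a record under RETENTION."""
    period = RETENTION.get(record.category)
    if period is None:
        # No policy covers this category, so flag it for a governance decision
        # instead of silently keeping or deleting it.
        return "review"
    age = today - record.created
    return "keep" if age <= period else "dispose"
```

Calling `disposition(Record("clickstream", "web_log", date(2013, 1, 1)), date(2013, 6, 1))` would mark the record for disposal, because it is older than the assumed 30-day period. Returning "review" for unknown categories reflects the governance step: data with no policy gets a decision rather than a default.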
Gardner: So it's difficult to know what to keep and what not to keep. I've actually spoken to a few people lately who want to keep everything, just because they want to mine it, and they are willing to spend the money and effort to do that.
Jim, when people do get to this point of trying to decide what to keep, what not to keep, and how to architect properly for that, they also need to factor in security. It shouldn't come later in the process. It should come early. What are some of the precepts that you think are important in applying good security practices to big data?
Jim Hietala: One of the big challenges is that many of the big data platforms weren't built from the get-go with security in mind. So some of the controls that you've had available in your relational databases, for instance, you move over to the big data platforms and the access control authorizations and mechanisms are not there today.
Planning the architecture, looking at bringing in third-party controls to give you the security mechanisms that you are used to in your older platforms, is something that organizations are going to have to do. It's really an evolving and emerging thing at this point.
Gardner: There are a lot of unknown unknowns out there, as we discovered with our tweet chat last month. Some people think that the data is just data, and you apply the same security to it. Do you think that's the case with big data? Is it just another follow-through of what you always did with data in the first place?
Hietala: I would say yes, at a conceptual level, but it's like what we saw with virtualization. When there was a mad rush to virtualize everything, many of those traditional security controls didn't translate directly into the virtualized world. The same thing is true with big data.
When you're talking about those volumes of data, applying encryption, applying various security controls, you have to think about how those things are going to scale. That may require new solutions from new technologies and that sort of thing.
Gardner: Chris Gerty, when it comes to that governance, security, and access control, are there any lessons that you've learned that you are aware of in terms of the best of openness, but also with the ability to manage the spigot?
Gerty: Spigot is probably a dangerous term to use, because it implies that all data is treated the same. The sooner that you can tag the data as either sensitive or not, mostly coming from the person or team that's developed or originated the data, the better.
Kicking the Can
Once you have it on a hard drive, once you get crazy about storing everything, if you don't know where it came from, you're forced to put it into a secure environment. And that's just kicking the can down the road. It's really a disservice to people who might use the data in a useful way to address their problems.
We constantly have satellites that are made for one purpose. They send all the data down. It's controlled either for security or for intellectual property so someone can write a paper. Then, after the project doesn't get funded or it just comes to a nice graceful close, there is that extra step, which is almost a responsibility of the originators, to make it useful to the rest of the world.
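Gerty's point about tagging data at the source can be sketched in a few lines of Python. This is a hedged illustration under stated assumptions, not NASA's practice: the `Dataset`, `Sensitivity`, and `storage_tier` names, and the two storage tiers, are hypothetical. The sketch shows why untagged data ends up locked away: without an originator and a sensitivity tag, the only safe default is the restricted tier.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Sensitivity(Enum):
    PUBLIC = "public"
    SENSITIVE = "sensitive"


@dataclass(frozen=True)
class Dataset:
    name: str
    # Both fields are meant to be filled in by the person or team
    # that originated the data (hypothetical schema).
    originator: Optional[str] = None
    sensitivity: Optional[Sensitivity] = None


def storage_tier(ds: Dataset) -> str:
    """Pick a storage tier: 'open' only for data explicitly tagged public."""
    if ds.originator is None or ds.sensitivity is None:
        # Unknown provenance or missing tag: forced into the secure environment.
        return "restricted"
    if ds.sensitivity is Sensitivity.PUBLIC:
        return "open"
    return "restricted"
```

For example, `storage_tier(Dataset("imagery", "mission-team", Sensitivity.PUBLIC))` returns "open", while `storage_tier(Dataset("old-archive"))` returns "restricted" even if the contents are harmless, which is the "kicking the can" outcome described above.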