When Data Met Processing
There's a fundamental change going on in how we approach the data that's coming from IT systems in order to get a better monitoring and analysis capability. "The data has to be analyzed in real-time," said AccelOps' Mahesh Kumar. "By 'real-time,' I mean in streaming mode before the data hits the disk."
How are new data and analysis approaches significantly improving IT operations monitoring and providing stronger security?
AccelOps has developed technology that correlates events with relevant data across IT systems, so that operators can gain much better insights faster, and then learn as they go to better predict future problems before they emerge. That's because advances in big data analytics and complex events processing (CEP) can come together to provide deep and real-time, pattern-based insights into large-scale IT operations.
Here to explain how these new solutions can drive better IT monitoring and remediation response -- and keep those critical systems performing at their best -- is Mahesh Kumar, vice president of marketing at AccelOps. The discussion is moderated by Dana Gardner, principal analyst at Interarbor Solutions.
Listen to the podcast (33:48 minutes).
Here are some excerpts:
Dana Gardner: Is there a fundamental change in how we approach the data that's coming from IT systems in order to get a better monitoring and analysis capability?
MaheshKumar: The data has to be analyzed in real-time. By "real-time," I mean in streaming mode before the data hits the disk. You need to be able to analyze it and make decisions. That's actually a very efficient way of analyzing information. Because you avoid a lot of data sync issues and duplicate data, you can react immediately in real time to remediate systems or provide very early warnings in terms of what is going wrong.
The challenges in doing this streaming-mode analysis are scale and speed. The traditional approaches with pure relational databases alone are not equipped to analyze data in this manner. You need new thinking and new approaches to tackle this analysis problem.
Gardner: Also for issues of security, offenders are trying different types of attacks. So this needs to be in real-time as well?
Kumar: You might be familiar with advanced persistent threats (APTs). These are attacks where the attacker tries their best to be invisible. These are not the brute-force attacks that we have witnessed in the past. Attackers may hijack an account or gain access to a server, and then over time, stealthily, be able to collect or capture the information that they are after.
These kinds of threats cannot be effectively handled only by looking at data historically, because these are activities that are happening in real-time, and there are very, very weak signals that need to be interpreted, and there is a time element of what else is happening at that time. This too calls for streaming-mode analysis.
If you notice, for example, someone accessing a server, a database administrator accessing a server for which they have an admin account, it gives you a certain amount of feedback around that activity. But if on the other hand, you learn that a user is accessing a database server for which they don't have the right level of privileges, it may be a red flag.
You need to be able to connect this red flag that you identify in one instance with the same user trying to do other activity in different kinds of systems. And you need to do that over long periods of time in order to defend yourself against APTs.
Gardner: It's always been difficult to gain accurate analysis of large-scale IT operations, but it seems that this is getting more difficult. Why?
Kumar: If you look at trends, there are on average about 10 virtual machines (VMs) to a physical server. Predictions are that this is going to increase to about 50 to 1, maybe higher, with advances in hardware and virtualization technologies. The increase in density of VMs is a complicating factor for capacity planning, capacity management, performance management and security.
In a very short period of time, you have in effect seen a doubling of the size of the IT management problem. So there are a huge number of VMs to manage and that introduces complexity and a lot of data that is created.
Cloud computing is another big trend. All analyst research and customer feedback suggests that we're moving to a hybrid model, where you have some workloads on a public cloud, some in a private cloud, and some running in a traditional data center. For this, monitoring has to work in a distributed environment, across multiple controlling parties.
Last but certainly not the least, in a hybrid environment, there is absolutely no clear perimeter that you need to defend from a security perspective. Security has to be pervasive.
Given these new realities, it's no longer possible to separate performance monitoring aspects from security monitoring aspects, because of the distributed nature of the problem. ... So change is happening much more quickly and rapidly than ever before. At the very least, you need monitoring and management that can keep pace with today's rate of change.
The basic problem you need to address is one of analysis. Why is that? As we discussed earlier, the scale of systems is really high. The pace of change is very high. The sheer number of configurations that need to be managed is very large. So there's data explosion here.
Since you have a plethora of information coming at you, the challenge is no longer collection of that information. It's how you analyze that information in a holistic manner and provide consumable and actionable data to your business, so that you're able to actually then prevent problems in the future or respond to any issues in real-time or in near real-time.
You need to nail the real-time analytics problem and this has to be the centerpiece of any monitoring or management platform going forward.
Gardner: So we have the modern data center, we have issues of complexity and virtualization, we have scale, we have data as a deluge, and we need to do something fast in real-time and consistently to learn and relearn and derive correlations.
It turns out that there are some advances in IT over the past several years that have been applied to solve other problems that can be brought to bear here. You've looked at what's being done with big data and in-memory architectures, and you've also looked at some of the great work that's been done in services-oriented architecture (SOA) and CEP, and you've put these together in an interesting way.
Kumar: Clearly there is a big-data angle to this.
Doug Laney, a META and a Gartner analyst, probably put it best when he highlighted that big data is about volume, the velocity or the speed with which the data comes in and out, and the variety or the number of different data types and sources that are being indexed and managed.
For example, in an IT management paradigm, a single configuration setting can have a security implication, a performance implication, an availability implication, and even a capacity implication in some cases. Just a small change in data has multiple decision points that are affected by it. From our angle, all these different types of criteria affect the big data problem.
There are a couple of approaches. Some companies are doing some really interesting work around big-data analysis for IT operations.
They primarily focus on gathering the data, heavily indexing it, and making it available for search, thereby derive analytical results. It allows you to do forensic analysis that you were not easily able to with traditional monitoring systems.
The challenge with that approach is that it swings the pendulum all the way to the other end. Previously we had a very rigid, well-defined relational data-models or data structures, and the index and search approach is much more of a free form. So the pure index-and-search type of an approach is sort of the other end of the spectrum.
What you really need is something that incorporates the best of both worlds and puts that together, and I can explain to you how that can be accomplished with a more modern architecture. To start with, we can't do away with this whole concept of a model or a relationship diagram or entity relationship map. It's really critical for us to maintain that.
I'll give you an example. When you say that a server is part of a network segment, and a server is connected to a switch in a particular way, it conveys certain meaning. And because of that meaning, you can now automatically apply policies, rules, patterns and automatically exploit the meaning that you capture purely from that relationship. You can automate a lot of things just by knowing that.
If you stick to a pure index-and-search approach, you basically zero out a lot of this meaning and you lose information in the process. Then it's the operators who have to handcraft these queries to have to then reestablish this meaning that's already out there. That can get very, very expensive pretty quickly.
Our approach to this big-data analytics problem is to take a hybrid approach. You need a flexible and extensible model that you start with as a foundation, that allows you to then apply meaning on top of that model to all the extended data that you capture and that can be kept in flat files and searched and indexed. You need that hybrid approach in order to get a handle on this problem.