TechNewsWorld.com

PODCAST
Fixing IT Problems Before They Occur





Complexity in today's IT systems makes previous error prevention approaches inefficient and costly for operators. IT staff are expensive to retain and increasingly hard to find. Operators also lack sufficient information about what's going on across an entire systems setup.

Operators are using manual processes -- in reactive firefighting mode -- to maintain critical service levels. It simply takes too long to interpret and resolve IT failures and glitches. We now see 70-plus percent of the IT operations budget spent on labor costs.

IT executives are therefore seeking more automated approaches, not only to remediate problems but also to detect them earlier. These same operators don't want to replace their systems management investments. They want to use them more cohesively, learn more from them, and better extract the information that these systems emit.

To help better understand the new solutions and approaches to detection and remediation of IT operations issues, I recently chatted with Steve Henning, the vice president of products for Integrien, in a sponsored BriefingsDirect podcast.


Listen to the podcast (29:53 minutes).

Responding Before the Problem

Here are some excerpts:

Steve Henning: IT operations is being told to either keep their budgets static or to reduce them. Traditionally, the way that the vice president of IT operations has been able to keep problems from occurring in these environments has been by throwing more people at it.

This is just not scalable. There is no way ... [to] possibly hire the people to support that. Even with the budget, he couldn't find the people today.

If you look at most IT environments today, the IT people will tell you that three or four minutes before a problem occurs, they will start to understand that little pattern of events that leads to the problem.

But most of the people that I speak to tell me that's too late. By the time they identify the pattern that repeats and leads to a particular problem -- for example, a slowdown of a particular critical transaction -- it's too late. Either the system goes down or the slowdown is such that they are losing business.

Complexity Equals Challenge

Service-oriented architecture (SOA) and virtualization increase the management problem by at least a factor of three. So you can see that this is a more complex and challenging environment to manage.

So it's a very troubling environment these days. It's really what's pushing people toward looking at different approaches, of taking more of a probabilistic look, measuring variables, and looking at probable outcomes -- rather than trying to do things in a deterministic way, measuring every possible variable, looking at it as quickly as possible, and hoping that problems just don't slip by.

If you look at the applications that are being delivered today, monitoring everything from a silo standpoint and hoping to be able to solve problems in that environment is absolutely impossible. There has to be some way for all of the data to be analyzed in a holistic fashion, understanding the normal behaviors of each of the metrics that are being collected by these monitoring systems. Once you have that normal behavior, you're alerting only to abnormal behaviors that are the real precursors to problems.

One of the alternatives is separating the wheat from the chaff and learning the normal behavior of the system. If you look at Integrien Alive, we use sophisticated, dynamic thresholding algorithms. We have multiple algorithms looking at the data to determine that normal behavior and then alerting only to abnormal precursors of problems.
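The interview does not describe the algorithms inside Integrien Alive, so the following is only a minimal sketch of the general idea of dynamic thresholding: learn a metric's recent normal range, then flag values that fall well outside it. The class name `RollingBaseline`, the helper `find_abnormal`, and the parameters `window`, `k`, and `min_samples` are all hypothetical, and a rolling mean with a standard-deviation band is an assumed stand-in for whatever the product actually uses.

```python
from collections import deque
from statistics import mean, stdev


class RollingBaseline:
    """Learns a metric's recent 'normal' range and flags values outside it.

    Illustrative only: a rolling mean +/- k standard deviations, not the
    algorithm of any particular product.
    """

    def __init__(self, window=30, k=3.0, min_samples=5):
        self.samples = deque(maxlen=window)  # most recent normal readings
        self.k = k                           # width of the normal band, in std devs
        self.min_samples = min_samples       # readings needed before alerting

    def observe(self, value):
        """Return True if value is abnormal relative to the learned baseline."""
        abnormal = False
        if len(self.samples) >= self.min_samples:
            mu = mean(self.samples)
            sigma = stdev(self.samples)
            # Guard against a zero-width band when the metric has been constant.
            abnormal = abs(value - mu) > self.k * max(sigma, 1e-9)
        if not abnormal:
            # Only normal readings update the baseline, so a spike
            # does not widen the band and hide the next spike.
            self.samples.append(value)
        return abnormal


def find_abnormal(values, **kwargs):
    """Return the indexes of readings flagged as abnormal."""
    baseline = RollingBaseline(**kwargs)
    return [i for i, v in enumerate(values) if baseline.observe(v)]
```

For a hypothetical response-time series such as `[100, 102, 98, 101, 99, 100, 103, 97, 250, 101]`, `find_abnormal` flags only index 8, the reading of 250. Because the threshold follows the metric's own recent history instead of a fixed limit, ordinary fluctuations do not raise alerts.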

Nip It in the Bud

Once you've learned the normal behavior of the system, these abnormal behaviors far downstream of where the problem actually occurs are the earliest precursors to these problems. We can pick up that these problems are going to occur, sometimes an hour before the problem actually happens.

The ability to get predictive alerts ... that's kind of the nirvana of IT operations. Once you've captured models of the recurring problems in the IT environment, a product like Integrien Alive can see the incoming stream of real-time data and compare that against the models in the library.

If it sees a match with a high enough probability, it can let you know up to an hour ahead of time that you are going to have a particular problem that has previously occurred. You can also record exactly how you diagnosed the problem and what you did to solve it, so that you can solve it when it recurs.
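As a rough illustration of matching incoming events against a library of problem models, the sketch below scores each stored model by how many of its precursor events are currently being observed. Everything in it is an assumption made for illustration: the `ProblemModel` structure, the `match_models` function, the event names, and the overlap score, which is only a crude stand-in for the match probability Henning mentions. The interview does not say how Integrien Alive builds or scores its models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProblemModel:
    """A hypothetical record of a problem that has occurred before."""
    name: str
    precursors: frozenset  # abnormal events previously seen before the problem
    resolution: str        # notes recorded the last time it was solved


def match_models(recent_events, library, min_score=0.75):
    """Return (score, model) pairs for models that match recent events.

    The score is the fraction of a model's precursor events that are
    currently observed; it is a simple proxy for a match probability.
    """
    seen = set(recent_events)
    matches = []
    for model in library:
        if not model.precursors:
            continue  # a model without precursors cannot be matched
        score = len(model.precursors & seen) / len(model.precursors)
        if score >= min_score:
            matches.append((score, model))
    # Best matches first, so the most likely problem is reported first.
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return matches
```

Storing the resolution notes alongside each model means that a predictive alert can carry the recorded fix with it, which is the pairing of early warning and captured expertise described above.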

We're actually enhancing the expertise of these folks. You're always going to need experts in there. You're always going to need the folks who have the tribal knowledge of the application. What we are doing, though, is enabling them to do their job better, with earlier understanding of where the problems are occurring, by solving this massive data correlation issue when a problem occurs.


Dana Gardner is president and principal analyst at Interarbor Solutions, which tracks trends, delivers forecasts and interprets the competitive landscape of enterprise applications and software infrastructure markets for clients. He also produces BriefingsDirect sponsored podcasts. Disclosure: Integrien sponsored this podcast.

