Most of you reading this have spent significant time building your brand equity through offline channels. Manufacturing goods that appeal to the target audience, marketing through catalogues, and running ongoing promotions and offers all contribute to a strong brand. Moreover, significant due diligence has gone into ensuring that offline support staff — whether store staff or customer service representatives — are available to process all end-user transactions (purchases, customer queries, complaints and so on) smoothly and within a reasonable turnaround time.
But what about your Web site, the online revenue channel?
Your end-user-facing Web applications are a storefront with no boundaries. Traffic can surge unpredictably, leading to abandoned shopping carts, application failures during checkout, and user confusion and frustration.
All of these problems can severely dilute the brand equity you’ve worked so hard to build over the years. Worst of all, customers who face such performance issues with your Web channel may simply turn to your competitors, and you may not even find out about the problems before a substantial chunk of customers has already been lost.
Did you know that even today a typical Web application experiences 170 hours of downtime every year, according to Gartner Group? Furthermore, it takes 25 hours on average to resolve such Web application issues, with an average of $18,500 lost per hour of downtime and up to $160,000 per hour for large retail eBusinesses.
Avoiding the Pitfalls
As your Web applications grow more complex to support a diverse customer base, how do you ensure that your revenue generating transactions — the business throughput of your Web infrastructure — are completed successfully on a consistent basis?
The important word here is consistent. When problems do occur, how do you proactively identify them and streamline the resolution process to mitigate the risk of customer dissatisfaction, all while leveraging your existing Web investments? Here are a few common pitfalls.
Load testing the site prepares the site for “primetime,” but does not eliminate the need for ongoing performance monitoring. Just as you can’t crash test a car and never have to worry about monitoring the ongoing health of the vehicle, you can’t just load test a Web site and not check the ongoing user-experience.
Moreover, load testing assumes a certain traffic profile (volume/mix) which could be dramatically different in real life.
As traffic patterns on the Web channel diverge from the pretested patterns, Web site performance can change dramatically. Consequently, it is important to check the health of the Web application realistically and on an ongoing basis, by measuring its availability and its responsiveness for critical Web transactions.
Your site can only work as fast as its most limiting bottleneck. A city freeway might have eight lanes, but if even a small stretch is under unscheduled maintenance with four lanes closed, traffic can back up for miles very quickly. What if one of the two J2EE application servers processing all checkout transactions is down for maintenance, while hundreds of Web servers are operating at less than 10 percent capacity?
Moreover, there is one constant you can count on, for better or worse: change! What if you launch a promotional campaign for a product, and the checkout transaction volume for that product goes through the roof? In this scenario, the application or database instance handling that transaction becomes the new bottleneck and slows the transaction down, even though the rest of the infrastructure is running under 10 percent capacity.
How do you minimize the risk of volatile traffic patterns affecting your end-user experience? How can you make the most of your Web investments and use existing spare capacity for revenue growth?
Monitoring and Enhancing
Enhancing the end-user experience at your online channels and minimizing degradation events starts by monitoring just that — the end-user experience — on a regular basis. You can’t manage what you can’t measure!
In the offline world, you might conduct a dry run before launching a new product or before every major event to be ready for the expected traffic. Your online channel is comparable to a store, only there are no geographic boundaries. Traffic patterns change all the time, within seconds in many cases, requiring an ongoing dry run to ensure that the user experience is never compromised.
Rather than waiting for users to complain about degrading site responsiveness, proactively test with “litmus-test” users who follow the same steps — search, purchase, etc. — as actual users to exercise the end-to-end application regularly.
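As an illustrative sketch only, such a scripted "litmus-test" transaction could be timed step by step. The step names and actions below are stand-ins; in practice each step would drive a real HTTP request or scripted browser session against your own site:

```python
import time

def run_synthetic_transaction(steps, sla_seconds=5.0):
    """Execute each scripted step of a user transaction (search,
    add to cart, checkout, ...), timing it and flagging any step
    that fails or breaches the end-user SLA."""
    results = []
    for name, action in steps:
        start = time.perf_counter()
        ok = True
        try:
            action()  # in practice: fetch a URL or drive a browser
        except Exception:
            ok = False
        elapsed = time.perf_counter() - start
        results.append({"step": name, "ok": ok,
                        "seconds": round(elapsed, 3),
                        "within_sla": ok and elapsed <= sla_seconds})
    return results

# Placeholder steps standing in for real site interactions.
steps = [("search", lambda: time.sleep(0.01)),
         ("checkout", lambda: time.sleep(0.01))]
report = run_synthetic_transaction(steps)
```

Run on a schedule from multiple geographies, a report like this catches a failing or slow step before real customers do.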
By tracking the user experience proactively from various end-user geographies, your IT support team can then baseline the constituent elements — connection time, ad download time, redirection to ad servers and so forth — and help you define the end-user experience standards.
Furthermore, by tracking similar transactions on your competitors’ sites, and within the context of industry averages, you can determine realistic end-user performance service-level agreements (SLAs) that drive IT management. Such end-user SLAs for your Web channel performance can then be used as an effective competitive weapon to further enhance your offline brand.
Nothing is 100 percent perfect. At times, there will still be long lines at your brick and mortar stores no matter how well you’ve planned the store staffing.
For your online channel, as well, volatile traffic patterns, frequently changing content, and IT maintenance events may lead to performance issues at times. Under these circumstances, how do you establish a repeatable, streamlined process to restore performance to acceptable levels, all within a reasonable timeline, as defined by an SLA with your internal IT support staff or external provider? How do you use such SLAs as a bridge to collaborate with your IT providers and enhance responsiveness to reported issues?
Getting Perspective on Problems
By monitoring the user experience from various representative geographies, you can already get a head start on the resolution process. You can prioritize events that affect the end-user experience over resource-related issues that don’t impact end-users.
Using an offline-store analogy, a dysfunctional cash register might be of concern, but if all buyers in a queue can still be helped within a five-minute window, it may not be urgent. Similarly for your Web channel, Web servers operating at 80 percent CPU without breaching a five-second SLA for end-user experience may matter less than a database instance that appears to be using only 10 percent of its capacity but is slowing down the purchase process dramatically.
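To make that prioritization concrete, here is a minimal sketch; the event records and the five-second SLA threshold are assumptions for illustration. It ranks incidents by end-user impact rather than raw resource utilization:

```python
def prioritize(events, sla_seconds=5.0):
    """Sort incidents so that anything breaching the end-user SLA
    comes first (worst response time at the top), while events that
    stay within the SLA rank last regardless of resource usage."""
    return sorted(events,
                  key=lambda e: (e["user_response_s"] <= sla_seconds,
                                 -e["user_response_s"]))

events = [
    {"name": "web pool at 80% CPU", "user_response_s": 3.2},          # within SLA
    {"name": "slow db instance (10% util)", "user_response_s": 12.0}, # SLA breach
]
ranked = prioritize(events)  # the database slowdown ranks first
```

The busy-looking Web pool drops to the bottom of the queue because end-users are still being served within the SLA.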
Ultimately, it is the number of satisfied users — the business throughput of your Web infrastructure — that should drive most resolution activity.
Once the event affecting end-users is prioritized, breaking down the end-user transaction into the constituent elements will help triage the problem area.
For instance, if the initial connect time for page 1 of a five-page purchase transaction is above a four-week dynamic baseline, then the ISP network is most likely at fault. However, if content download time on the product browse page is high, then you can drill down to the individual objects on the page to determine whether a static bitmap image or a dynamic table from the database back end is causing the slowdown.
Once you triage the problem, correlating it with resource usage or relevant application metrics will streamline the analysis process. For example, if Web server number 2 shows high network usage whenever a very large image served up by the Web server pool downloads slowly, or the database connection pool has very few connections available while most dynamic elements on the product browse page take longer than usual, you may be close to isolating the root cause.
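A lightweight way to quantify such a correlation, sketched with made-up samples (real monitoring data would align timestamps across the two metric streams):

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between an end-user metric (e.g. dynamic
    element download time) and a resource metric (e.g. database
    connections in use); values near 1.0 suggest a linked cause."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Illustrative samples: page-element latency rises as the pool fills.
latency_s   = [0.8, 0.9, 1.4, 2.1, 3.0]
pool_in_use = [10, 12, 30, 45, 60]
r = pearson(latency_s, pool_in_use)  # strongly positive (~0.99)
```

A near-1.0 score across paired samples would point the investigation at the connection pool rather than, say, the network.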
At times, historical data may not yield enough information to conduct decisive analysis. In such scenarios, creating troubleshooting transactions and variations thereof to exercise the Web element at fault will help decisively pinpoint the root cause.
Following the resolution process, the user experience with the improved application element should be validated to ensure closed-loop analysis. No matter what approach your IT team follows, starting with the end-user experience and breaking it down into granular elements will certainly streamline the resolution process.
All the above resolution steps are possible with several tools today, but doing so while leveraging existing assets is not all that common. Many performance management solutions entail several components that overlap with existing tools but need to be replaced because of inconsistency with the new management solution.
More than the tools themselves, the best practices and methodologies built over the years are rendered of little value by the new solution. When you order an appetizer that doesn’t go with your five-course meal, you don’t throw away the meal; you order a different appetizer to go with it. As trivial as that sounds, the analogy applies to many IT environments today.
There are some solutions available in the market that focus primarily on the user-experience monitoring, performance triage and correlation, while leveraging your existing resource and network monitoring tools for their core competencies. Selecting such focused performance monitoring solutions will enable your IT organization to build a comprehensive performance platform to deliver a quality user experience and enhance your brand, while maximizing returns on your IT investments to date.
Many organizations engaged in e-commerce face the classic challenge of retaining and further enhancing their offline brand through their online channels. While online channels represent a significant revenue opportunity, they also entail a new set of performance management issues, which, if not managed effectively, could severely compromise your customer satisfaction and brand.
The typical load testing of an application before a major launch and the over-provisioning of infrastructure prepare you for prime time, but they cannot handle the inherent unpredictability of visitor traffic that characterizes the Web channel.
By monitoring typical user interactions from global locations and comparing them to industry indices and competitive benchmarks, you can define end-user service-level agreements (SLAs) to drive your IT organization. Furthermore, by breaking down user-performance metrics into their constituent elements and correlating them with your existing IT tools — resource and network monitoring tools — your IT organization can enhance its responsiveness to performance issues.
Complementing existing tools with such user-performance monitoring solutions enables you to build an effective IT process that keeps transactions flowing smoothly and generating new revenue with minimal hiccups, enhancing your online brand and making the most of IT investments to date.
Dharmesh Thakker is the senior product manager for application performance management solutions at Keynote Systems.