If you are running serious business applications on the Web, downtime or Web site slowness equals lost dollars. This is true either directly, by missed sales if you are running an e-commerce store, or indirectly, through lost customer trust if you are providing a Web-based solution to clients.
When a Web site becomes an important business tool, companies must tackle the issues of uptime and performance. The more important a Web site becomes to your operation, the less bad performance, or worse, downtime, you can afford.
This explains the popularity of high availability hosting solutions. High availability hosting solutions provide server redundancy, both to protect against downtime if one server crashes and to speed up Web page load times when all of the servers are working properly.
These solutions come in a few different shapes and sizes. Some of the most popular are Round Robin DNS and Load Balancing at the Web and application server level, and Replication, Log Shipping and Database Clustering at the database level.
Which of these, if any, is the “right” solution for your business? That depends on what you are trying to accomplish and what’s at stake.
Round Robin DNS
People looking to distribute load between two servers on the cheap might implement Round Robin DNS. In Round Robin DNS, the DNS server holds several addresses for the same host name and rotates the order in which it returns them, so visitors are spread across two or more servers. This effectively splits up a Web site’s load, but DNS has no idea whether a server is actually up: if one of two servers goes down, roughly half of your visitors will still be sent to it and will see the site as down. There’s no intelligence behind the load distribution.
Additional problems may arise if you are running a shopping cart or any other session-based function on your site: if your visitor’s network does not cache DNS lookups, the visitor could be shifted from one server to another during a single session, and the second server will not have the session data stored on the first.
While this won’t happen in most cases, because DNS caching is the norm, it would make filling out multi-part forms, such as online order forms, impossible for those visitors. Round Robin DNS is not recommended for most business uses because it can create more problems than it solves.
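To see why a dead server keeps receiving traffic, here is a toy simulation in Python. It assumes, for simplicity, that every request triggers a fresh lookup and that the DNS server simply rotates through its address list; real resolvers cache answers, so the split in practice is rougher. The addresses are hypothetical placeholders from the range reserved for documentation.

```python
from itertools import cycle

# Hypothetical A records for one host name (192.0.2.0/24 is reserved
# for documentation, so these addresses are placeholders).
A_RECORDS = ["192.0.2.10", "192.0.2.11"]


def simulate_round_robin(records, down, requests):
    """Hand out addresses in rotation, as a round-robin DNS server does,
    and count how many requests land on a server that is down.

    Assumes one fresh lookup per request (no resolver caching). DNS has
    no notion of server health, so a dead address keeps its turn."""
    rotation = cycle(records)
    failed = 0
    for _ in range(requests):
        if next(rotation) in down:
            failed += 1
    return failed


failed = simulate_round_robin(A_RECORDS, down={"192.0.2.11"}, requests=1000)
print(f"{failed} of 1000 requests reached the dead server")
# prints: 500 of 1000 requests reached the dead server
```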
Load Balancing
Unlike Round Robin DNS, which blindly rotates visitors among servers, Load Balancing uses either a hardware switch or software to intelligently direct traffic between two or more servers. Traffic can be routed to whichever server currently has the lighter load. More importantly, if one server goes down, a load balancing solution can route all traffic to the servers that are still functioning properly.
Software solutions such as Macromedia’s ColdFusion ClusterCATS and Windows Server 2003 Enterprise Edition offer load balancing, but in my experience a hardware solution, such as the Coyote Point Equalizer series of load balancers, operates more efficiently. Because the load balancing process runs on a separate device, it adds no overhead to the servers themselves. That matters, since the servers are probably heavily taxed already if a load balancing solution is under consideration in the first place.
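The core decision a load balancer makes can be sketched in a few lines. This is a minimal illustration, not how any particular product works: it assumes a "least connections" policy as the measure of lighter load, and the `Server` and `LeastConnectionsBalancer` names are hypothetical. Unlike the round-robin rotation, servers that fail their health check are skipped entirely.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    healthy: bool = True          # result of the latest (hypothetical) health check
    active_connections: int = 0


class LeastConnectionsBalancer:
    """Skip servers that fail health checks, then send the request to the
    healthy server with the fewest active connections."""

    def __init__(self, servers):
        self.servers = servers

    def pick(self):
        candidates = [s for s in self.servers if s.healthy]
        if not candidates:
            raise RuntimeError("no healthy servers available")
        # min() returns the first server on ties, so equal loads alternate.
        chosen = min(candidates, key=lambda s: s.active_connections)
        chosen.active_connections += 1
        return chosen

    def release(self, server):
        # Call when a request finishes so the load count stays accurate.
        server.active_connections -= 1


web1, web2 = Server("web1"), Server("web2")
lb = LeastConnectionsBalancer([web1, web2])
print([lb.pick().name for _ in range(4)])  # ['web1', 'web2', 'web1', 'web2']
web2.healthy = False                       # web2 fails its health check
print([lb.pick().name for _ in range(3)])  # ['web1', 'web1', 'web1']
```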
Alternative Web & Application Scaling Options
One more option is worth considering as an alternative to the two solutions described above. If distributing load, rather than intelligent server failover, is the goal, then splitting pieces of a Web site onto different servers can be very effective.
Take, for example, a business that processes financial data online. It may have two sets of users: one that inputs market data, and another that runs reports on that data. By putting the data-input functions on one server and the report-generation functions on another, the business can split its load without investing in a Load Balancing solution. Another benefit is that if one of the two servers crashes, only one of the user groups is affected.
If you like, you can also keep the code for both applications on each server. Then, during a long outage, you can activate the downed application on the surviving server and run both applications from that one machine until the failed server is back online.
The appeal of this method is that it distributes risk. If one part of your site is down, other parts are still available to your customers. It’s not right for everyone, but it can be a good intermediate step for many businesses, and as the business grows, each separated function can later be moved onto its own Load Balancing setup as needed.
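The split described above amounts to a routing table keyed by function, plus a manual step to repoint one function during a long outage. In this sketch the URL prefixes and server names are hypothetical, and `fail_over` only works if the standby server already has the other application's code deployed.

```python
# Hypothetical mapping of each application function to its own server.
routes = {"/input": "app-server-a", "/reports": "app-server-b"}


def route(path, routes):
    """Send each request to the server that hosts that function."""
    for prefix, server in routes.items():
        if path.startswith(prefix):
            return server
    raise LookupError(f"no route for {path}")


def fail_over(routes, failed_server, standby_server):
    """Manual step for a long outage: point every function hosted on the
    failed server at the standby. Returns a new table; the old one is unchanged."""
    return {prefix: (standby_server if server == failed_server else server)
            for prefix, server in routes.items()}


print(route("/reports/daily", routes))   # app-server-b
routes = fail_over(routes, "app-server-b", "app-server-a")
print(route("/reports/daily", routes))   # app-server-a
```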
Database Replication
Setting up high availability for databases can be a bit more complex than doing so for Web and application servers. Used effectively, database replication can be an excellent, low-cost redundancy solution, whether you are working with MySQL or Microsoft SQL Server. Database Replication copies data from one database server (the master) to another (the slave) in near real time.
If your primary database server crashes, you’ll have a copy of your data on another server. The key thing to understand is that replication does not provide automatic failover. Until you tell your Web and application servers to use the secondary database server instead of the primary, your database will be unavailable. However, this change is easy to make, and you can have your database back online in short order.
Although Database Replication is fundamentally a redundancy solution, it can also be used to improve performance for high-load sites. To do this, send all database writes to the master and serve all database reads from the slave servers. If properly set up, this effectively splits the database load, and should the master database server go down, one of the slaves can take its place after a few configuration changes.
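The read/write split can be sketched as a small router that sits in the application layer. This is a hypothetical, simplified illustration: it treats any statement starting with `SELECT` as a read (a naive rule that a real application would refine), the host names are placeholders, and it ignores replication lag and the work of pointing the remaining slaves at a newly promoted master.

```python
from itertools import cycle


class ReplicatedDatabaseRouter:
    """Route writes to the master and spread reads across the slaves.

    Host names are placeholders; a real application would pass the chosen
    host to its database driver's connect call."""

    def __init__(self, master, slaves):
        self.master = master
        self.slaves = list(slaves)
        self._reads = cycle(self.slaves)

    def host_for(self, sql):
        # Naive classification: SELECT statements are reads, all else is a write.
        if sql.lstrip().upper().startswith("SELECT") and self.slaves:
            return next(self._reads)
        return self.master

    def promote(self, slave):
        """The 'few configuration changes' after a master failure:
        make one slave the new master and stop sending reads to it."""
        self.slaves.remove(slave)
        self.master = slave
        self._reads = cycle(self.slaves)


router = ReplicatedDatabaseRouter("db-master", ["db-slave1", "db-slave2"])
print(router.host_for("INSERT INTO trades VALUES (1)"))  # db-master
print(router.host_for("SELECT * FROM trades"))           # db-slave1
print(router.host_for("select count(*) from trades"))    # db-slave2
router.promote("db-slave1")                               # master has failed
print(router.host_for("UPDATE trades SET qty = 2"))      # db-slave1
print(router.host_for("SELECT * FROM trades"))           # db-slave2
```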
Transaction Log Shipping
In the Microsoft SQL Server world, Transaction Log Shipping is an alternative to replication. The main reason to use it is that it puts the load of running the data transfer process on the server receiving the data rather than on the server sending it.
Since the sending server is already under load as the live server whose performance your customers feel, moving the process to the secondary server is beneficial (though the benefit may only be noticeable under extremely high load).
The drawback of Transaction Log Shipping is that it transfers data on a schedule rather than continuously, typically every 15 minutes, so a business using this method risks losing up to 15 minutes of data.
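The size of that exposure is simple arithmetic: anything written since the last completed transfer exists only on the failed primary. The sketch below assumes transfers run exactly on a fixed interval and finish instantly; the function name and times are hypothetical, and real backup, copy and restore jobs take time, which widens the window.

```python
from datetime import datetime, timedelta


def data_at_risk(schedule_start, interval, failure_time):
    """Return how much recent activity exists only on the failed primary.

    Assumes transfers run exactly every `interval` starting at
    `schedule_start` and complete instantly."""
    elapsed = failure_time - schedule_start
    # timedelta // timedelta gives the number of completed intervals.
    last_transfer = schedule_start + (elapsed // interval) * interval
    return failure_time - last_transfer


start = datetime(2024, 1, 1, 12, 0)
every_15_min = timedelta(minutes=15)
print(data_at_risk(start, every_15_min, datetime(2024, 1, 1, 12, 44)))  # 0:14:00
print(data_at_risk(start, every_15_min, datetime(2024, 1, 1, 12, 31)))  # 0:01:00
```

The worst case approaches the full interval, which is why the schedule length is the number to weigh against how much data the business can afford to lose.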
Database Clustering
Database Clusters, such as the Microsoft SQL Server clustering model, offer failover for databases that happens automatically, unlike the replication model described above, which requires manual intervention. Two or more database servers share one common storage system, and should the primary database server fail, the secondary will take its place in a heartbeat.
The advantage of a system working this way is obvious: no data is lost if the primary database server crashes.
There are two significant drawbacks, however. The first is that performance can suffer, because clustered servers cannot use processor cache the way a single server can: the primary server in the cluster must constantly write all data to disk in case it fails and the secondary server needs to take over. The second is that scaling the solution further to handle more load incurs exponential rather than linear costs.
In my experience, Database Clusters are an excellent solution for low-volume but mission-critical setups that require absolutely no downtime, because the performance difference will not be noticeable in such a scenario. For businesses that need both fast performance and redundancy, a better place to look is the replication model described above, with database writes going to a master and reads split across multiple slaves.
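Why no data is lost becomes clearer in a toy model of active/passive failover over shared storage. Everything here is a hypothetical simplification: the class names, node names and the "three missed heartbeats" threshold are assumptions, and real cluster software also handles quorum, address takeover and much more.

```python
class SharedStorage:
    """One disk array that every node can reach; only the active node writes."""
    def __init__(self):
        self.rows = []


class ClusterNode:
    def __init__(self, name, storage):
        self.name = name
        self.storage = storage
        self.alive = True


class FailoverCluster:
    """The passive node watches the active node's heartbeat and takes over
    after `max_missed` consecutive missed beats (hypothetical threshold)."""

    def __init__(self, active, passive, max_missed=3):
        self.active, self.passive = active, passive
        self.max_missed = max_missed
        self.missed = 0

    def heartbeat_tick(self):
        if self.active.alive:
            self.missed = 0
            return
        self.missed += 1
        if self.missed >= self.max_missed:
            # Swap roles; the new active node sees the same shared disk.
            self.active, self.passive = self.passive, self.active
            self.missed = 0

    def write(self, row):
        if not self.active.alive:
            raise ConnectionError(f"{self.active.name} is down; failover pending")
        self.active.storage.rows.append(row)


disk = SharedStorage()
cluster = FailoverCluster(ClusterNode("sql1", disk), ClusterNode("sql2", disk))
cluster.write("order 1001")
cluster.active.alive = False          # primary node crashes
for _ in range(3):
    cluster.heartbeat_tick()
cluster.write("order 1002")
print(cluster.active.name, disk.rows)  # sql2 ['order 1001', 'order 1002']
```

Because both nodes use the same storage, the row written before the crash is still there after failover; the brief gap while heartbeats are missed is the only interruption.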
Planning for Growth
Problems caused by growth are generally considered good problems to have. While this is true, snap decisions made in a crisis can lead to greater expenses and more problems down the road. Many companies, under pressure from potentially lost revenue, react by signing up for or internally implementing a hosting solution that can handle the load, whether or not that solution is right for their business in the long run.
Even if it functions properly, a solution can still be wrong for a business if its growth pattern drives costs up disproportionately as the business’s Web site becomes more and more successful. If proper forethought is given to growth before a snap decision is necessary, businesses save money and spare headaches both for their customers and for the employees who would be tasked with solving the problem.
The right growth plan for an individual business is not something that can be generalized in an article like this. It should be developed in consultation with an expert on hosting infrastructure, paying particular attention to the needs and goals of the business in question.
Chris Kivlehan is marketing manager for INetU Managed Hosting, an award-winning Web hosting provider that specializes in managed dedicated hosting for businesses nationwide.