Recovery point objective (RPO) metrics are commonly defined as how much data a system can afford to lose without endangering business processes. Let’s face facts: Some applications require zero-data-loss protection both within one site and between sites. These applications move millions of dollars or potentially company-crashing information in every byte they process, and so justify the complexity and cost of zero-byte RPO.
However, using the same type of protection solution to provide recovery for all of your corporate data intra- and inter-site can produce an unworkable solution that costs much more than the data is worth. Establishing realistic RPO numbers will allow you to plan for what the systems need without breaking the bank or the corporate data systems.
This article surveys what’s available in the disaster recovery (DR) market in general and helps you determine which DR tools your systems will need.
The Wheres and Hows
First, are you planning on performing local or remote data protection and recovery, or both? Local solutions don’t have to worry about bandwidth and latency, and they have the added benefit of not requiring middleware and end-users to seek a new IP address for resources that have failed over. Remote recovery allows for applications to survive a site failure, but must play by much tighter bandwidth and latency restrictions than local networks provide.
Next, do your applications require zero-byte RPO, or not? In the majority of cases, the answer is a solid “no.” First, keep in mind that systems such as file servers can’t leverage this technology effectively to begin with. Microsoft Office and many other popular document applications cache all changes locally on the user’s workstation until the document or file is saved, at which point they write the whole thing to disk. So even a zero-byte replication solution will not prevent the loss of changes that users haven’t saved yet. Because those write operations happen periodically rather than in real time, a real-time synchronous solution (see below) offers no significant advantage over other solution sets for these systems.
Second, many applications do write business data continuously, but most businesses can afford to lose up to several minutes of that data without significant impact on the business as a whole. Deciding that some applications will not recover every byte of data is tough, but the savings in infrastructure costs justify it.
Synchronous or Asynchronous?
There are two types of continuous replication solutions widely available on the market today: synchronous data-replication systems (commonly part of disk-based tools) and asynchronous solutions (typically host-based, software-based tools). Both have advantages and disadvantages, and both may have a place within your organization when you do not need true zero-byte RPO for everything.
Synchronous solutions ensure that every I/O (input/output) operation destined for the primary disk system is first written to the secondary. These two-phase-commit solution sets have the benefit of offering true zero-byte RPO for any system that can run on them. The drawback is that you must have extremely high bandwidth and extremely low latency between the two disk systems.
Locally, where LAN (local area network) speeds and fiber fabric networks are common, this isn’t an issue. Across a WAN (wide area network) connection, though, you could cause significant slowdowns in performance as the application is forced to wait for disk writes to the remote system before it can write to the local system. Also note that you typically need the same disk systems at both sides of the equation, which can significantly raise your budget numbers.
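To make the latency penalty concrete, here is a minimal Python sketch. It assumes a deliberately simplified model in which the remote write and its acknowledgement must finish before the local write, so the costs simply add up. The function names and all of the timing figures are illustrative assumptions, not measurements from any product.

```python
def sync_write_latency_ms(local_write_ms, remote_write_ms, round_trip_ms):
    """Latency the application sees for one synchronously replicated write.

    Simplified model (an assumption): the remote write and its acknowledgement
    must complete before the local write, so the three costs add up.
    """
    return round_trip_ms + remote_write_ms + local_write_ms


def max_serial_writes_per_sec(latency_ms):
    """Upper bound on back-to-back dependent writes at a given per-write latency."""
    return 1000.0 / latency_ms


# Illustrative numbers only, not measurements.
lan_ms = sync_write_latency_ms(local_write_ms=0.5, remote_write_ms=0.5, round_trip_ms=0.2)
wan_ms = sync_write_latency_ms(local_write_ms=0.5, remote_write_ms=0.5, round_trip_ms=40.0)

print(f"LAN: {lan_ms:.1f} ms per write, about {max_serial_writes_per_sec(lan_ms):.0f} serial writes/s")
print(f"WAN: {wan_ms:.1f} ms per write, about {max_serial_writes_per_sec(wan_ms):.0f} serial writes/s")
```

With these assumed figures, a 40 ms WAN round trip dominates everything else: each write takes dozens of times longer than it would on the LAN, and an application that issues writes one after another slows down accordingly.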
Asynchronous solutions are typically byte- or block-level systems that intercept I/O operations in the file system of the operating system itself. They then allow the original I/O to commit to disk while they send a copy to the secondary system and make the write happen there. The benefit of this type of solution is that it works over much higher-latency and lower-bandwidth links than synchronous tools can tolerate.
The drawback is that there could be seconds or minutes of RPO, just due to the nature of networking and latency. Modern systems allow for retransmission and buffering, which addresses worries about data consistency, but you may still lose any data not yet replicated to the secondary system if the primary should fail. Unlike synchronous systems, host-based solutions also tend to tolerate differences in hardware well.
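The exposure described above is simply whatever has been written locally but not yet sent. The following Python sketch models that backlog second by second. The function name, the one-second granularity, and the traffic figures are assumptions chosen for illustration; real replication products buffer and schedule transfers in their own ways.

```python
def replication_backlog(writes_per_sec, link_per_sec):
    """Return the unreplicated backlog at the end of each second.

    writes_per_sec: amount written locally in each one-second interval (e.g. MB).
    link_per_sec:   how much the link can ship to the secondary per second.
    The backlog is the data that would be lost if the primary failed right then.
    """
    backlog = 0
    history = []
    for written in writes_per_sec:
        backlog += written                   # local writes commit immediately
        backlog -= min(backlog, link_per_sec)  # the link drains what it can
        history.append(backlog)
    return history


# Hypothetical workload: a two-second burst that exceeds a 10 MB/s link.
history = replication_backlog([5, 5, 20, 20, 5, 0, 0], link_per_sec=10)
print("Backlog per second (MB):", history)
print("Worst-case data at risk (MB):", max(history))
```

In this made-up workload the backlog is zero while writes stay under the link rate, grows during the burst, and drains once the burst ends. The peak of that curve is the practical RPO exposure to compare against what the business can tolerate.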
Now to put it all together: If your systems require zero-byte RPO due to business or logistic concerns, and the applications in question can make effective use of synchronous solutions, then you can provide real-time, zero-byte RPO by leveraging disk-based replication in synchronous mode.
If any other combination of constraints is in place (applications don’t take advantage, latency is too high, business case doesn’t justify infrastructure, etc.), then asynchronous solutions will permit effective protection with an acceptable RPO metric. You will probably find that a combination of both types of protection is worthwhile, and knowing which systems should leverage which type of protection then becomes vital to your budget process. One size does not fit all, so determine your RPOs realistically to find the right fit for your organization.
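The decision rule in the last two paragraphs can be written down as a short Python sketch. Everything here is hypothetical: the function name, the parameter names, and especially the 5 ms round-trip ceiling, which is a placeholder assumption rather than a figure from any vendor. Substitute whatever limit your own storage and applications actually tolerate.

```python
def choose_replication(requires_zero_rpo, app_benefits_from_sync,
                       round_trip_ms, budget_allows_matched_storage,
                       max_sync_round_trip_ms=5.0):
    """Pick a replication mode for one system.

    Synchronous only when every condition holds; otherwise asynchronous.
    max_sync_round_trip_ms is an assumed placeholder threshold.
    """
    if (requires_zero_rpo
            and app_benefits_from_sync
            and round_trip_ms <= max_sync_round_trip_ms
            and budget_allows_matched_storage):
        return "synchronous"
    return "asynchronous"


# Hypothetical inventory: (name, zero RPO?, benefits from sync?, round trip ms, budget ok?)
systems = [
    ("payments-db", True, True, 1.0, True),
    ("file-server", False, False, 1.0, True),
    ("orders-db-remote", True, True, 40.0, True),
]
for name, zero_rpo, benefits, rtt, budget in systems:
    print(name, "->", choose_replication(zero_rpo, benefits, rtt, budget))
```

Running each system through a rule like this, one at a time, is what produces the mixed deployment the article recommends instead of a single protection scheme for everything.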
Mike Talon is an enterprise systems engineer at Double-Take Software.