Data migrations have become a necessary evil in every IT environment. The rapid rate at which hardware and software become outdated, coupled with the need to cut costs by taking old assets off the books as soon as possible, means that data migrations are something no one can avoid.
Vendors have tried to provide tools to make this task as seamless as possible. However, at the end of the day, the data owner is the final authority on the migration and its end result. No matter what kind of tool is used, it behooves the data owner to ensure that data integrity and security are maintained during and after the migration.
First, an understanding of three terms is essential:
Data migration: This is the act of moving data from one location to another. It could be as simple as moving a set of files from one drive to another on the same computer, or as involved as moving several terabytes of data from one data center to another. The word “data” here has no boundaries, and the term “migration” means a one-time movement of data to change its permanent resting location.
Data integrity: This is the assurance that the structure of the data remains consistent with the manner in which it is maintained and accessed. Generally speaking, data integrity cannot be considered in a vacuum, as it is intimately tied to the manner in which the data is accessed and the layer at which the intelligence for this access resides. When data integrity is compromised, it is called “data corruption.” For example, from a filesystem perspective, data integrity may be intact — i.e., there are no file access errors — but the application accessing these files, such as a database, may consider their contents corrupt.
Data security: Every “chunk” of data has security attributes associated with it. The layer at which the data is accessed, and the manner in which it is accessed, determine the type of attributes that apply at that layer. Moreover, data security itself is like a layered cake — there is security at every access tier, and each layer is important. For example, when data is accessed via a shared SAN, it is important to ensure that host-access security (via zoning and LUN masking) is maintained. However, that does not mean that compromises will not occur at the higher levels — filesystems, file attributes and the like. Then there is user-level security, and there is network-level security — so on and so forth.
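To make the integrity definition concrete, a common low-level check is to compare cryptographic checksums of a file before and after it is moved. A minimal sketch in Python, assuming ordinary local files (the file names and the demonstration are illustrative, not from any particular migration tool):

```python
# Hedged sketch: detect bit-level corruption by comparing checksums.
# Standard library only; file names are illustrative.
import hashlib
import os
import tempfile

def file_checksum(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Demonstration on temporary files: an exact copy matches,
# a single altered byte does not.
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "source.dat")
target = os.path.join(workdir, "target.dat")
with open(source, "wb") as f:
    f.write(b"payload" * 1024)
with open(target, "wb") as f:
    f.write(b"payload" * 1024)
checksums_match = file_checksum(source) == file_checksum(target)
with open(target, "r+b") as f:
    f.seek(0)
    f.write(b"X")  # corrupt the first byte of the copy
match_after_corruption = file_checksum(source) == file_checksum(target)
```

Note that a matching checksum only establishes integrity at the byte level; as described above, an application may still consider the data corrupt at its own layer.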
How They Relate
How do data migration, data integrity and data security impact each other? In a generic sense, a data migration involves a process of moving data from one location to another. This generally happens via some kind of an “engine” that reads data from the source, performs an internal mapping of this data and then writes it to the target. This engine can be of any form and can reside anywhere in the access stack.
For example, it could be software running on a host, in a dedicated appliance, embedded in the network or in the storage array, or simply a copy/paste tool that a user controls with a mouse. Similarly, the source and target locations for this data can be local — i.e., the same server or array — or geographically distant. The internal mapping in the engine has the intelligence to ensure that the translation or copy of this data (blocks, files, etc.) maintains the attributes of the data at the level where it’s read. As mentioned above, these attributes — when considered holistically — ensure the security and integrity of this data.
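The read-map-write loop of such an engine can be sketched in a few lines of Python. This is a toy host-based engine for local directory trees, assuming POSIX-style attributes; `shutil.copy2` carries permission bits and timestamps, while a production engine would also handle ownership, ACLs, sparse files and retries:

```python
# Toy host-based migration engine: walk the source tree, recreate the
# layout at the target, and copy each file with shutil.copy2 so that
# permission bits and timestamps travel with the data. Illustrative
# only -- ownership, ACLs and error handling are omitted.
import os
import shutil
import tempfile

def migrate_tree(source_root, target_root):
    """Copy every file under source_root to target_root."""
    copied = 0
    for dirpath, _dirnames, filenames in os.walk(source_root):
        rel = os.path.relpath(dirpath, source_root)
        dest_dir = os.path.normpath(os.path.join(target_root, rel))
        os.makedirs(dest_dir, exist_ok=True)
        for name in filenames:
            shutil.copy2(os.path.join(dirpath, name),
                         os.path.join(dest_dir, name))
            copied += 1
    return copied

# Demonstration: two files in a nested source tree.
src_root = tempfile.mkdtemp()
dst_root = tempfile.mkdtemp()
os.makedirs(os.path.join(src_root, "reports"))
for rel in ("readme.txt", os.path.join("reports", "q1.csv")):
    with open(os.path.join(src_root, rel), "w") as f:
        f.write("example data")
files_copied = migrate_tree(src_root, dst_root)
target_has_nested_file = os.path.exists(
    os.path.join(dst_root, "reports", "q1.csv"))
```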
However, the big “if” in this situation surrounds the intelligence in the engine that is so critical to the migration. In most cases, a tried-and-tested engine will function as promised, but that does not mean it should be used in a data migration without proper testing. Practically all data migration projects therefore need to include a data validation phase, in which various teams verify that these attributes are the same between the source and the target.
Tools are available that can probe data at different levels and provide a report on any missing or corrupted files. Similarly, there are tools that can verify whether all the security attributes are intact on the target location when the data copy is complete.
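A simple version of such a validation pass can be sketched as follows, assuming both locations are mounted as local directory trees. It walks the source and reports target files that are missing or whose contents differ by digest; the function and file names are hypothetical, not a specific vendor product:

```python
# Hedged sketch of a post-migration validation pass: walk the source
# tree and report target files that are missing or whose contents
# (by SHA-256 digest) do not match. Standard library only.
import hashlib
import os
import tempfile

def _digest(path):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def validate_trees(source_root, target_root):
    """Return (missing, corrupted) lists of source-relative paths."""
    missing, corrupted = [], []
    for dirpath, _dirs, filenames in os.walk(source_root):
        for name in filenames:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, source_root)
            dst = os.path.join(target_root, rel)
            if not os.path.exists(dst):
                missing.append(rel)
            elif _digest(src) != _digest(dst):
                corrupted.append(rel)
    return sorted(missing), sorted(corrupted)

# Demonstration: one good copy, one missing file, one altered file.
src_root, dst_root = tempfile.mkdtemp(), tempfile.mkdtemp()
for name, src_data, dst_data in (("ok.txt", "same", "same"),
                                 ("lost.txt", "data", None),
                                 ("bad.txt", "data", "DATA")):
    with open(os.path.join(src_root, name), "w") as f:
        f.write(src_data)
    if dst_data is not None:
        with open(os.path.join(dst_root, name), "w") as f:
            f.write(dst_data)
missing_files, corrupted_files = validate_trees(src_root, dst_root)
```

A real tool would also compare permissions, ownership and timestamps, not just contents.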
Application-Based or Agnostic
In most modern-day data migrations, data is either migrated from within the application itself or in a manner that is agnostic to (and transparent to) the application. The benefit of the former — i.e., application-based migration — is that the application itself ensures that its data security and integrity attributes are maintained during the migration. For example, Oracle Data Guard is an application-level utility that can be used for database migrations.
The benefit of the latter is similar. Since the migration occurs at a lower level in the IO stack, all data is treated the same way, regardless of its type, thereby ensuring that all attributes are carried over as is. These types of migrations — in which the migration method is agnostic to the type of data and the manner in which it is accessed, and the migration occurs at a lower layer in the IO stack — are known as “block level migrations.”
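The principle can be illustrated with a toy block-level copy: the engine moves fixed-size blocks and never inspects what they contain, so a database file and a text file are treated identically. A sketch in Python, with an arbitrary 4 KB block size (real block-level migrations operate on raw devices or LUNs rather than ordinary files):

```python
# Toy block-level copy: read fixed-size blocks from a source and write
# them to a target with no awareness of files or applications. The
# 4 KB block size and the .img file names are illustrative.
import os
import tempfile

def block_copy(source_path, target_path, block_size=4096):
    """Copy source to target block by block; return blocks written."""
    blocks = 0
    with open(source_path, "rb") as src, open(target_path, "wb") as dst:
        while True:
            block = src.read(block_size)
            if not block:
                break
            dst.write(block)
            blocks += 1
    return blocks

# Demonstration: 3.5 blocks of data take four writes.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "lun0.img")
dst_path = os.path.join(workdir, "lun1.img")
with open(src_path, "wb") as f:
    f.write(os.urandom(4096 * 3 + 2048))
blocks_written = block_copy(src_path, dst_path)
with open(src_path, "rb") as a, open(dst_path, "rb") as b:
    contents_identical = a.read() == b.read()
```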
Migrating data at the file level or changing the manner in which this data is accessed can present its own sets of challenges. For example, when data is copied between different vendors’ network attached storage arrays, preserving these attributes during and after the migration can be a nightmare. This is mostly because of interoperability issues.
The same can hold true if data is copied from a Unix server to a Windows server, or if the access mechanism is changed from NFS (Network File System) to CIFS (Common Internet File System). Of course, there are tools available that can make the migration easier or minimize issues with integrity and security, but they are not perfect. These migrations, therefore, tend to take a long time.
Good Backup Is Critical
In other types of migrations, the data is actually moved instead of copied. In other words, there is a point after which the migration cannot be cancelled or reverted. These migrations often require an intermediate go/no-go checkpoint, at which a preliminary data validation is performed. Only if everything checks out does the process of moving the data begin. If something goes wrong after the checkpoint, the only recourse is to restore from backups.
The single most important element of a data migration is a good backup. A good backup is critical not only in the event of data corruption, but also because it allows data validation to occur post-migration. For example, if a problem with user-level permissions is discovered after migrating data, the affected files can be compared with the attributes of the backed-up data and fixed on a case-by-case basis. Most backup software does a good job of backing up all standard security and integrity attributes of the data.
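That comparison can be sketched as a two-step process: snapshot each file's attributes before the migration (as backup software effectively does in its catalog), then diff the migrated tree against the snapshot. The fields recorded here (mode, owner, group) are illustrative; real backup catalogs hold far more metadata:

```python
# Hedged sketch: record permission-related attributes before a
# migration, then report files whose attributes changed afterwards.
# Assumes a POSIX filesystem; mode/uid/gid are illustrative fields.
import os
import stat
import tempfile

def snapshot_attributes(root):
    """Map each file's relative path to (mode, uid, gid)."""
    manifest = {}
    for dirpath, _dirs, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            rel = os.path.relpath(path, root)
            manifest[rel] = (stat.S_IMODE(st.st_mode), st.st_uid, st.st_gid)
    return manifest

def diff_attributes(manifest, root):
    """Return the files whose recorded attributes no longer match."""
    changed = []
    for rel, recorded in sorted(manifest.items()):
        st = os.stat(os.path.join(root, rel))
        if (stat.S_IMODE(st.st_mode), st.st_uid, st.st_gid) != recorded:
            changed.append(rel)
    return changed

# Demonstration: snapshot two files, then change one file's mode.
root = tempfile.mkdtemp()
for name in ("app.cfg", "notes.txt"):
    with open(os.path.join(root, name), "w") as f:
        f.write("x")
    os.chmod(os.path.join(root, name), 0o640)
manifest = snapshot_attributes(root)
os.chmod(os.path.join(root, "app.cfg"), 0o600)
changed_files = diff_attributes(manifest, root)
```

With such a manifest, a permissions problem found after the migration can be fixed file by file rather than by guesswork.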
Data migrations are complex projects. Maintaining the validity of data is one of the most important but unwritten assumptions of a migration. No one really talks about it, but everyone always assumes that the data will maintain all of its properties post migration. No one likes to be told that something happened during the migration and has resulted in this rule being violated. That’s called a failed data migration.
Ashish Nadkarni is a principal consultant at GlassHouse Technologies.