Google Tapes Up Gmail Sprain
Google says it's turned to a tape backup system to restore information some Gmail users lost during a service meltdown last weekend. The problem, according to the company, was caused by a storage software update that introduced an unexpected bug. However, since restoring data from tape backups is a relatively slow process, it may be a while before affected users see their information returned.
Google says it has restored email access to some Gmail users who lost it over the weekend.
"We're still working fast and furious to restore account access," Google spokesperson Jessica Kositz told TechNewsWorld.
Google said 0.02 percent of Gmail users were affected, but Kositz once again declined to state how many users its email service has.
She referred TechNewsWorld to a post Monday afternoon on the Gmail blog by Ben Treynor, Google's site reliability czar, stating that things should be back to normal for everyone soon.
Restoring the Lost Ones
Google is restoring the lost data from offline tape backups.
Email sent to the affected Gmail users between 6 p.m. PST Sunday and 2 p.m. PST Monday was probably not delivered to their mailboxes, and the senders would have received a notification that their messages weren't delivered, Treynor said.
A check on the Google Apps Status Dashboard at press time found that Gmail still showed a service disruption.
The Cause of the Outage
The Gmail outage was caused by a storage software update that introduced an unexpected bug, Treynor said. When Google discovered that the update had caused some Gmail users to lose access to their email, it stopped the deployment and reverted to the older version of the software, Treynor added.
The tapes were protected from the software bug because they're offline, but restoring data from them takes longer than restoring data from a backup data center.
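The reasoning behind that claim can be illustrated with a toy model. The sketch below is not a description of Google's systems: the mailbox contents, the `buggy_update` and `restore` functions, and the assumption that the faulty update reaches every live copy are all hypothetical, chosen only to show why a copy that running software cannot touch survives a software fault.

```python
import copy

# Hypothetical mailbox data; names and structure are illustrative only.
live_replicas = [
    {"alice": ["msg1", "msg2"]},
    {"alice": ["msg1", "msg2"]},
]

# An offline copy is a snapshot that running software cannot modify.
offline_tape = copy.deepcopy(live_replicas[0])

def buggy_update(replica: dict) -> None:
    """Stand-in for a faulty software update that wipes a mailbox."""
    replica["alice"] = []

def restore(replica: dict, backup: dict) -> None:
    """Copy mailbox contents back from the offline snapshot."""
    for user, messages in backup.items():
        replica[user] = list(messages)

# Assumption: the update is applied to every live replica.
for replica in live_replicas:
    buggy_update(replica)

# Both live copies are now empty, so neither can repair the other.
lost = all(r["alice"] == [] for r in live_replicas)

# The offline snapshot was never touched and can refill both replicas.
for replica in live_replicas:
    restore(replica, offline_tape)
```

In this model the live copies cannot repair each other once the same fault has hit both, so the untouched snapshot is the only source left to restore from.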
However, Google's explanation doesn't wash with Rob Enderle, principal analyst at the Enderle Group.
"A backup process should never delete what it's backing up," Enderle told TechNewsWorld. "That's a going-out-of-business problem."
Backup products are heavily tested to make sure they don't destroy what they're trying to save, so it could be that Google's update wasn't adequately tested, Enderle suggested.
Google replicates users' data simultaneously in two data centers so that if one fails, the other can take over immediately, according to Google's blog post on disaster recovery.
So, is data backed up to offline tapes later? Were the tapes offline when the bug was introduced? If they were, doesn't that mean emails sent while the bug was active were never in the system and, therefore, were not backed up?
Google isn't saying, but Kositz promised the company will disclose all when it has finished restoring the lost emails.
"We will release a full incident report within 48 hours of resolving the issue," Kositz said.
A tape drive provides sequential access storage, while a disk drive provides random access storage. Perhaps it's the need to access email sequentially that led Google to select tape drives.
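The difference between the two access patterns can be sketched with a toy cost model. The model below is illustrative only: record positions are abstract numbers, the function names are hypothetical, and it ignores real device timings such as tape loading or disk seek latency.

```python
# Toy model only: positions are abstract "records", not real device timings.

def tape_steps(current_pos: int, target: int) -> int:
    """Sequential access: the head passes over every record between
    its current position and the target."""
    return abs(target - current_pos)

def disk_steps(current_pos: int, target: int) -> int:
    """Random access: any other record is reachable in one step."""
    return 0 if current_pos == target else 1

def total_steps(step_fn, targets):
    """Sum the steps needed to visit targets in order, starting at record 0."""
    pos, total = 0, 0
    for t in targets:
        total += step_fn(pos, t)
        pos = t
    return total

scattered = [900, 10, 500, 20]   # hypothetical record positions
in_order = sorted(scattered)

print(total_steps(tape_steps, scattered))  # 2760
print(total_steps(tape_steps, in_order))   # 900
print(total_steps(disk_steps, scattered))  # 4
```

In this model, tape is reasonably efficient when records are read in the order they were written and far more costly when the wanted records are scattered, while the cost on disk does not depend on the order at all.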
Companies began moving from tape to magnetic disk storage back in the 1990s because tape had physical limitations and the price of disk storage fell dramatically, Enderle said.
"It was found that tapes were easily misplaced or damaged, and restoration was unreliable," Enderle stated.
Gartner has reportedly estimated that 10 to 50 percent of all tape restores fail.
For long-term storage, tape has historically been considered less expensive on a per-megabyte basis, but that depends on whom you're talking to.
"Companies like EMC argue that when you calculate the total cost of ownership for tape, which takes into account relatively low data rates and potentially lengthy recovery times, you'll find it's no longer cost-effective," Enderle pointed out.
That lengthy restoration process from tape is why it's taking Google so long to get all the affected Gmail users back up and running, Treynor stated on the Gmail blog.