A glitch in VMware’s most recent update had customers scrambling this week. Code carried over from a beta version of the software, which engineers had failed to remove or deactivate, prevented VMware users from powering on virtual machines running on the hypervisor.
The offending code, known as a “time bomb,” is something developers insert in beta software to push users to upgrade to an application’s final version. Time bombs are a common tool, but they must be removed before the software’s final release.
The virtualization software maker quickly responded, releasing an “express patch” Wednesday. However, the incident has shaken some VMware customers and given the company a black eye.
The outlook for VMware after the fiasco is “definitely not very good,” according to Gary Chen, a Yankee Group analyst.
“This is the most publicized issue they’ve had in their history, and it’s really the sort of [an] embarrassing bug that never should have made it past QA (quality assurance),” he told TechNewsWorld.
All About QA
“Last night, we became aware of a code issue with the recently released update to ESX 3.5 and ESXi 3.5 (Update 2),” wrote Paul Maritz, VMware’s recently appointed chief executive officer, in a letter posted on the company’s blog.
When the system clock on a server running the updated ESX 3.5 or ESXi 3.5 software reached 12:00 a.m. on August 12, 2008, the code caused the product license to expire, according to Maritz. As a result, powered-off virtual machines could not be turned on, suspended machines could not be resumed, and machines could not be migrated using VMotion.
The problem also affects a recently released patch for ESX 3.5 and ESXi 3.5 Update 2. The company has begun a review of its QA processes, Maritz said.
However, it is to VMware’s credit that it took less than 24 hours to come up with a patch that seems to have corrected the problem, said Chen.
“From what I’ve heard, the patch fixes the problem. You do have to give kudos to VMware for addressing the issue so quickly,” he noted.
Some users have turned to VMware’s Communities discussion pages to vent. “As a VMware Enterprise Partner and VMware Authorized Consultant, I can tell you this IS a big deal for VMware to release a product that has such grave consequences for even a relatively small portion of the total VMware user population,” wrote a user under the handle “wwcusa.” “A small percentage does not diminish the severity of the problem for affected users, and the utmost urgency is expected from a company that caters to enterprise customers who don’t have ‘downtime’ in their corporate dictionary anymore.
“Bugs happen,” the poster continued. “However, I believe this could have been prevented by not rushing an update to market which was intended to be free and compete with [Microsoft’s] Hyper-V. This will no doubt teach VMware a lesson and unfortunately will cast doubt about the reliability of VMware in the enterprise. It’s a shame a clearly superior product is going to get bad publicity from this oversight. Let’s give them credit and hope they learn from their mistakes.”
Effect on Virtualization
Customers generally feel that VMware has responded to the issue as best it could, Chen pointed out. “The issue was fixed quickly, and there was lots of communication as to the status, cause and future changes to prevent another incident,” he said.
“However, some faith has been lost, as most customers I’ve talked to are disappointed that a bug like this made it past QA. Many admins have been pushing virtualization to their executives, and this doesn’t help their case,” Chen added.
“Virtualization is still in the emerging stages, and enterprise reliability is a huge issue that can only be proven over time,” said Chen. “Vendors have been pushing the idea that it is enterprise-ready, and an incident like this hurts not only VMware but the entire virtualization movement. Virtualization is inevitable and will certainly continue to proceed, but people will slow down and think more about how to protect themselves against things like this.”
While VMware engineers immediately quashed any speculation that the glitch was security-related, Chen said that “it does raise the question of reliance on virtualization.
“More and more people are using it, and a major incident, whether a bug or a security hack, could freeze your entire infrastructure. I think people will begin to reevaluate their options and contingency plans for an incident like this, including perhaps diversifying their infrastructure and adopting multiple hypervisors,” Chen concluded.