Does Your IaaS Environment Have Sleeper Cells?
IaaS gives you plenty of rope. It's up to you not to hang yourself. For example, consider how IaaS allows you to rapidly create and deploy new virtual machines within the production environment. Without proper care and feeding, this can quickly result in VM sprawl. If a VM remains dormant for a long time and sits out many rounds of updates and patches, what happens when it finally reawakens?
As many active users of IaaS (Infrastructure as a Service) can tell you, IaaS, whether implemented by an external service provider or provided by an internal service provider team, arguably grants you much more control of the underlying technology "substrate" than other cloud deployment models. In some cases, this is a good thing; for example, when you have unique legacy constraints or technology requirements that must be satisfied for applications to work properly.
Customers generally perceive this control over the environment as a security and operational benefit. However, there are situations in which added control over the operation of the platform has a downside as well. In other words, there are times when additional operational control and transparency can actually increase the likelihood of security problems and make the environment harder to use.
How can that happen, you ask? Because each degree of added control that the organization has over the environment represents a decision point -- a point at which the organization can choose to make a decision that will bolster security or one that will detract from it.
To see this in action, consider how IaaS allows you to rapidly create and deploy new virtual machines within the production environment. The upside of this feature is speed: new systems can be brought online as soon as they're needed. The downside is that it allows organizations to choose uncontrolled, disorganized "sprawl" instead of controlled, managed and disciplined growth.
Alternatively, consider how IaaS allows organizations to quickly move virtual images from hypervisor to hypervisor within the environment. This allows organizations to do some pretty cool things, like transfer a live, running machine image without causing downtime. However, it also allows organizations to "jumble up" their environment and toss systems of high and low sensitivity, or systems containing a mix of regulated and public data, into one unorganized heap. The exact same set of parameters and attributes represents both a feature and a challenge.
The short version is: IaaS gives you enough rope; it's up to you not to hang yourself.
Sleeper Cells in Your IaaS
One particularly challenging area in which IaaS gives organizations the opportunity to inadvertently damage themselves is "dormant" virtual machine images.
Specifically, these are images that are created at some point in time, used for a while, and then "spun down" and left to stagnate. Once they get created in the environment, they may persist there indefinitely unless the organization has some specific mechanism to identify them, tag them and make them go away.
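To make "some specific mechanism to identify them" a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `VMRecord` fields, the `find_dormant` helper, the 90-day threshold and the sample inventory are all hypothetical, and in practice the records would come from whatever inventory your provider or hypervisor management tooling exposes.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class VMRecord:
    """Hypothetical inventory entry; real field names depend on your tooling."""
    name: str
    powered_on: bool
    last_active: date  # last day the VM was known to be running


def find_dormant(inventory, today, max_idle_days=90):
    """Return names of powered-off VMs idle for more than max_idle_days."""
    cutoff = today - timedelta(days=max_idle_days)
    return [
        vm.name
        for vm in inventory
        if not vm.powered_on and vm.last_active < cutoff
    ]


# Made-up sample inventory, purely for illustration.
inventory = [
    VMRecord("web-01", True, date(2024, 6, 1)),
    VMRecord("test-old", False, date(2023, 1, 15)),
    VMRecord("batch-02", False, date(2024, 5, 20)),
]

dormant = find_dormant(inventory, today=date(2024, 6, 1))
print(dormant)
```

The resulting list is what you would then tag, review or retire. The right idle threshold is a policy decision for your shop, not something this sketch can settle.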
What's the problem with these dormant images? The problem is that they have exactly the same characteristics as active machines: potentially containing sensitive data, potentially holding service accounts (with or without administrative privileges), potentially with access into back-end systems, or maybe just located at a point in the network where network filtering controls are less restrictive. The point is that these images are to some degree "trusted." But instead of being kept in an active state where security hygiene activities like patching and antivirus are kept current, they're "frozen in amber." They exist in a kind of suspended animation, and while they do, the overall effectiveness of the security controls within them degrades steadily. After six months, missed malware signatures and patches have piled up. After a year, maybe whole versions of the AV and scanning tools have gone by. Years down the road, maybe the operating system itself is no longer supported.
To illustrate this through a (granted, totally overblown) example, consider what would happen if a Windows NT 4.0 machine, patched at 1998 levels and running an anti-malware scanner from that same era, were suddenly "unfrozen" and dropped into the middle of your virtual infrastructure. How long do you think it'd be able to withstand attack? Oh, and by the way, that system has direct administrative-level access into back-end data stores, it contains your customers' credit card data and PII, and it's in a network zone where it has unfiltered access to the rest of the computing environment.
So Check Already
You can see where I'm going with this.
These dormant images represent sleeper cells in your environment. Best case, they get unfrozen and examined during an audit. If that happens, you may fail the audit or get dinged for failing to maintain appropriate security controls (for example, due to failure to patch, failure to keep AV software up to date, etc.). Worst case, these problematic images stay there and don't get caught. In that case, they represent an "unrealized potential" for negative security impact to your environment, one that compounds the longer they sit there.
Any organization that makes use of IaaS should have a plan in effect that specifically addresses this problem, just as it has plans in place to address issues like sprawl, network monitoring, patching, movement and inventory of VMs, etc.
Specific planning options can vary. For example, if yours is the type of shop that has good discipline (but isn't strong in scripting or programming), a strategy might be to do a periodic manual validation of all images in the environment that have been dormant over a certain age. If yours is the type of shop that is stronger in automation but light on manpower, another strategy might be an automated one: You could, for example, write custom scripts to periodically activate dormant images so that security maintenance tasks like patching and virus scanning can happen.
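For the automated route, the custom script might be shaped something like the sketch below. It's a rough outline under stated assumptions, not a drop-in tool: `run_maintenance` is a hypothetical helper, and `power_on`, `patch` and `power_off` are placeholders for whatever your provider's or hypervisor's tooling actually offers. They're passed in as plain functions here so the loop can be shown (and tried out) without a real environment behind it.

```python
def run_maintenance(dormant_vms, power_on, patch, power_off):
    """Wake each dormant VM, run maintenance, then put it back to sleep.

    power_on, patch and power_off are caller-supplied functions
    (hypothetical stand-ins for real provider or hypervisor calls).
    Returns a dict mapping VM name to an outcome string.
    """
    results = {}
    for vm in dormant_vms:
        power_on(vm)
        try:
            patch(vm)  # e.g. OS updates, AV signature refresh, scan
            results[vm] = "patched"
        except Exception as exc:
            # Record the failure so a human can follow up.
            results[vm] = f"failed: {exc}"
        finally:
            # Always return the VM to its dormant state.
            power_off(vm)
    return results


# Stand-in functions that only record what would have happened.
log = []


def fake_patch(vm):
    if vm == "broken-vm":
        raise RuntimeError("update server unreachable")
    log.append(("patch", vm))


results = run_maintenance(
    ["test-old", "broken-vm"],
    power_on=lambda vm: log.append(("on", vm)),
    patch=fake_patch,
    power_off=lambda vm: log.append(("off", vm)),
)
print(results)
```

The `finally` clause is the one design choice worth noting: a VM that fails to patch still gets powered back down rather than left running in its stale state, and the failure is recorded for follow-up.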
The point isn't what you do -- it's that you do something. Put in some mechanism that makes sense in your culture to address this issue. Preventing this problem isn't rocket science, which is why it's a shame that it occurs so often in the trenches and gives so many cloud customers heartburn.