System administrators are well aware of the pain associated with maintenance windows. The work usually takes place off-hours and requires difficult coordination between different activities and, often, personnel from different departments.
Although almost every system administrator would gladly pass on the project, maintenance windows are essential for keeping the infrastructure up to date and for mitigating the risk of an unplanned outage.
What Worries System Administrators?
While completing the task is essential to mitigating future risk, system administrators must be aware of any factors that may present a problem during the maintenance window. Several issues consistently lead to failed projects:
- Lack of experience handling unexpected behavior at a very late hour (without sleep);
- The work may take longer than planned because minor but time-consuming steps were not taken into consideration; and
- Miscommunication with the business and customers.
Because of these issues, some companies prefer to minimize their risk and avoid maintenance windows as much as possible. However, in doing so, many companies open themselves up to additional risks and larger problems down the road.
Greater Risks Exist
It’s often said that the No. 1 cause of an unplanned outage is an environment that is not up to date. I tend to agree with that observation. I also believe that maintenance windows do not have to be painful, especially now that advanced cluster capabilities, improved storage availability and resilient network topologies are available. With these in place, organizations can plan a maintenance window as a relatively safe process with minimal risk.
Proper planning is the key to a successful maintenance window, and it helps identify the critical success factors. Planning also helps communicate the right message to customers and managers, which keeps the entire IT department on the same page and helps avoid a failed project.
Steps to Success
After a project is assessed and a plan for the maintenance window is developed, it is time to examine the specific steps behind the process. Here are key steps for a successful maintenance window:
- Minimize downtime by doing as much of the work as possible ahead of time.
- Plan for everything to go wrong. Make sure you have a valid backup and know how much time it will take to recover or roll back.
- Split the maintenance window activities if necessary — reduce complexity as much as possible.
- Write a clear and detailed plan that covers all the pre- and post-steps. Add every task to the actual work plan (even the obvious ones), obtain consensus from your team, and provide an estimated time for each step (allow more time than needed, and add risk time). This will allow you to estimate the total time and set expectations.
- Communicate as clearly as possible, so your managers and the business/customers understand what you are trying to do. Your communications plan should also include notifications sent before the activities begin and follow-up notifications sent after they are completed.
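The time-estimation advice above can be turned into simple arithmetic. The following is a minimal Python sketch, not a prescribed method: the `Step` class, the `window_minutes` function, the step names and every number in it (the 25 percent padding, the 30 minutes of risk time, the 45-minute rollback) are hypothetical values chosen for illustration. Substitute the estimates your own team agrees on.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    estimate_min: int  # the team's agreed estimate, in minutes


def window_minutes(steps, padding=0.25, risk_min=30, rollback_min=0):
    """Return the window length to request, in whole minutes.

    Each step estimate is padded by `padding` (allow more time than
    needed), then flat risk time and the time needed to roll back
    are added on top. All defaults are illustrative assumptions.
    """
    padded = sum(s.estimate_min * (1 + padding) for s in steps)
    return round(padded + risk_min + rollback_min)


# Hypothetical work plan, including the "obvious" steps.
plan = [
    Step("Send start notification", 5),
    Step("Verify backup", 20),
    Step("Apply update", 60),
    Step("Run post-checks", 15),
    Step("Send completion notification", 5),
]

total = window_minutes(plan, padding=0.25, risk_min=30, rollback_min=45)
print(f"Request a window of about {total} minutes")
```

Listing the notification steps alongside the technical ones keeps them from being forgotten, and including the rollback time in the total means the window is still long enough if the change has to be backed out.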
It’s All About the Plan
A good plan will help you to execute a successful maintenance window and mitigate any risks associated with the process.
System administrators need to be aware that putting off the process altogether to avoid risks only leads to future issues that could be more devastating.
The plan is a critical component, and everyone involved in the process should know it.
Being responsive and following up with users will help you do a better job and reduce potential risks.
Ronny Front is a senior consultant at GlassHouse Technologies.