Reducing human error

Data centre outages are not always caused by equipment failure; human error can be just as much to blame. Accidental shutdowns, for example, are still a leading cause of data centre outages.

A number of measures can be put in place to reduce human error. Besides training staff properly, data centre operators can enforce stricter food and beverage policies so that people don’t eat or drink near equipment, shield emergency ‘off’ buttons, and document maintenance procedures. Everyone working in a data centre should also be familiar with the IT equipment within the facility.

Detailed scenario planning

Scenario planning can be complex as it needs to address a wide range of possible disruptions to the data centre. Detailed scenario planning should address everything from the physical infrastructure and building location to power generation, critical systems and network infrastructure. Scenario planning should entail:

Going through the planned operations of the data centre, highlighting typical disruptions so that operators can understand which systems can be impacted by a specific event
Defining potential planned disruptions, such as capacity expansion, scheduled maintenance or end-of-life replacement
Creating an action plan for each event
Undertaking a walk-through and rehearsal of this action plan
Refining and improving the action plan
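As a rough illustration of the first step above (understanding which systems can be impacted by a specific event), the sketch below walks a simple dependency map. The system names and the `FEEDS` map are hypothetical, and the model assumes no redundancy: any system downstream of a failure is treated as impacted. A real facility would need a far more detailed model.

```python
from collections import deque

# Hypothetical map (an assumption for illustration only):
# each system lists the systems that directly depend on it.
FEEDS = {
    "utility_power": ["ups"],
    "ups": ["cooling_plant", "network_core", "server_racks"],
    "cooling_plant": ["server_racks"],
    "network_core": ["server_racks"],
    "server_racks": [],
}


def impacted_systems(failed, feeds=FEEDS):
    """Return every system downstream of `failed`, in breadth-first order."""
    seen = {failed}
    queue = deque([failed])
    impacted = []
    while queue:
        current = queue.popleft()
        for dependant in feeds.get(current, []):
            if dependant not in seen:
                seen.add(dependant)
                impacted.append(dependant)
                queue.append(dependant)
    return impacted


# Example: list what a loss of utility power would affect in this toy model.
for system in impacted_systems("utility_power"):
    print(system)
```

A list like this could serve as the starting point for the action plan for each event, with redundancy and failover rules layered on top.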

Building information modelling (BIM) can also be used to simulate ‘what if’ scenarios. This type of virtualisation can give insight into the strategies that need to be implemented to avoid system failures. Aurecon has developed a Work Method Statement that is being used successfully by data centres to minimise failures.

A combination of engineering specifications, design documents, operational matrices and implementation plans needs to be developed to help data centre operators understand how their facility is meant to operate, how the individual systems can fail, how each failure impacts the entire data centre and how these failures will be identified. Understanding all of these aspects is a critical part of avoiding mission-critical outages.

Contributor: Shayne Parkin
