Availability & Reliability Considerations for UPS Installations
14-10-2014
Today, ICT equipment is typically involved in e-commerce, market transactions, financial settlement processes or other activities with quality-of-service commitments. In such situations, loss of its availability is intolerable – and as ICT equipment is entirely dependent on its power supply, UPS availability also becomes mission critical. With this in mind, this article looks at what availability actually means and how best to achieve it. We also see why increasing reliability, although important, is only one step in achieving the availability that’s required.
Availability and reliability
Data centre operators care primarily about availability because it is a measure of how much time per year their ICT resource is operational and available. It is formally defined as
Availability = MTBF / (MTBF + MTTR)
where MTBF = Mean Time Between Failures and MTTR = Mean Time To Repair. This equation shows that we can increase availability by reducing MTTR as well as by increasing MTBF, and we will see that the best results come from employing both of these strategies.
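The effect of each term can be checked with a few lines of Python. This is a minimal sketch of the formula above, not a sizing tool. The MTBF values fall within the single-unit range quoted later in this article, while the MTTR figures (6 hours and 0.5 hours) and the helper names are assumptions chosen purely for illustration.

```python
HOURS_PER_YEAR = 8760  # non-leap year


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Availability = MTBF / (MTBF + MTTR), as a fraction between 0 and 1."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR must be non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_hours_per_year(avail: float) -> float:
    """Expected unavailable hours per year for a given availability."""
    return (1.0 - avail) * HOURS_PER_YEAR


# Illustrative inputs only: the MTTR values are assumed, not taken from the article.
scenarios = {
    "baseline (MTBF 100,000 h, MTTR 6 h)": availability(100_000, 6),
    "higher MTBF (200,000 h, MTTR 6 h)": availability(200_000, 6),
    "shorter MTTR (100,000 h, MTTR 0.5 h)": availability(100_000, 0.5),
}

for name, a in scenarios.items():
    print(f"{name}: availability {a:.7f}, "
          f"downtime {downtime_hours_per_year(a):.3f} h/year")
```

With these assumed figures, both doubling MTBF and shortening MTTR raise availability relative to the baseline, which is the point the equation makes.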
MTBF is ultimately based on reliability, so we should start by increasing this. A UPS system’s reliability is the probability that it can perform its designed function of supplying uninterrupted, clean power over a given time period. This reliability is driven by the quality of the components used, and improves with better-quality, more expensive devices. However, as spending on components rises, the gains level off: beyond a certain point, further spending is no longer rewarded by further reliability, because even the best components reach a limit of improvement. At this point we need another tool to drive further increases in MTBF.
Fault tolerance and availability
The answer is to build a fault-tolerant system: one that will continue to deliver uninterrupted power to its critical load even if one component fails. Fault tolerance can be achieved using redundant configurations. Imagine, for example, a 120 kVA load served by two free-standing UPS units, each of 120 kVA capacity. Either unit can continue to fully support the load if the other fails. Through such fault tolerance, the MTBF of the total UPS installation is significantly better than that of a single unit, which is entirely dependent on the reliability of its own components. While a single UPS unit might achieve an MTBF of 50,000 to 200,000 hours, a fault-tolerant redundant system could achieve 1,250,000 hours, depending on its configuration.
Such configurations are generically known as N+n redundant systems, where N (Need) is the number of UPS units essential to support the critical load, and n is the number of redundant units. Accordingly, our example comprises a 1+1 redundant configuration. Although, as we have shown, this improves MTBF and therefore availability, it’s not the best possible solution in terms of efficiency and cost. Data centre managers are constantly under pressure to extract the best possible power availability from the least possible budget and floor space, and with the UPS technology now available we can take more steps to help them.
Firstly, consider our 1+1 redundant configuration: by definition, it can never be more than 50% loaded. This is highly inefficient in both capital cost and operating cost terms. A better solution is to configure a 4+1 system, which can run at up to 80% loading. Increasing the load like this can improve the UPS’s efficiency and reduce running costs, while capital expenditure is reduced as less excess capacity is being purchased. For our 120 kVA example, we could achieve a 4+1 configuration using five free-standing 30 kVA units, any four of which could deliver 120 kVA if one unit fails. In this scenario, the 4+1 configuration does have one disadvantage compared with its 1+1 alternative: as it has more components, its MTBF is reduced from 1,250,000 to 500,000 hours. We have therefore improved efficiency at the cost of reduced MTBF and availability.
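The loading and capacity figures for the two configurations follow from simple arithmetic, sketched below in Python. The function names are hypothetical, and the sketch assumes that all units in a configuration are identical and share the load equally, which the article implies but does not state.

```python
def max_load_fraction(needed: int, redundant: int) -> float:
    """Highest per-unit loading of an N+n system that still tolerates n failures."""
    return needed / (needed + redundant)


def unit_rating_kva(load_kva: float, needed: int) -> float:
    """Rating of each unit so that N units alone can carry the full load."""
    return load_kva / needed


def installed_capacity_kva(load_kva: float, needed: int, redundant: int) -> float:
    """Total purchased capacity across all N+n units."""
    return unit_rating_kva(load_kva, needed) * (needed + redundant)


LOAD_KVA = 120  # the article's example load

for needed, redundant in [(1, 1), (4, 1)]:
    total = installed_capacity_kva(LOAD_KVA, needed, redundant)
    print(f"{needed}+{redundant}: "
          f"{needed + redundant} x {unit_rating_kva(LOAD_KVA, needed):.0f} kVA units, "
          f"max loading {max_load_fraction(needed, redundant):.0%}, "
          f"installed {total:.0f} kVA, excess {total - LOAD_KVA:.0f} kVA")
```

For the 120 kVA load, the 1+1 arrangement installs 240 kVA in total, whereas the 4+1 arrangement installs 150 kVA, which is where the capital saving comes from.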