Willkommen! - Bienvenido! - Welcome!

Digital customer information blog of Tux&Cía.
Main blog: Tux & Cía.
Technical blog (multilingual): TecniCambalandia
May the source be with you!

Sunday, March 28, 2010

Availability of a System

Calculating Availability of Individual Components

System Availability
System Availability is calculated by modeling the system as an interconnection of parts in series and parallel. The following rules are used to decide if components should be placed in series or parallel:
  • If failure of a part leads to the combination becoming inoperable, the two parts are considered to be operating in series.
  • If failure of a part leads to the other part taking over the operations of the failed part, the two parts are considered to be operating in parallel.
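
The two rules correspond to the standard series and parallel availability formulas. As a sketch, assuming the parts fail independently (an assumption not stated above) and writing $A_1$ and $A_2$ for the availabilities of the two parts:

```latex
\begin{align*}
A_{\text{series}}   &= A_1 \, A_2 \\
A_{\text{parallel}} &= 1 - (1 - A_1)(1 - A_2)
\end{align*}
```

In series, both parts must be up for the combination to work. In parallel, the combination is down only when both parts are down at the same time.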

The third step is to compute the availability of the individual components. MTBF (mean time between failures) and MTTR (mean time to repair) values are estimated for each component (see the Reliability and Availability Basics article for details). For hardware components, MTBF information can be obtained from the hardware manufacturer's data sheets. If the hardware was developed in house, the hardware group provides the MTBF information for the board. MTTR estimates for hardware depend on how closely operators monitor the system. Here we estimate the hardware MTTR to be around 2 hours.
Once MTBF and MTTR are known, the availability of the component can be calculated using the following formula:
$$\text{Availability} = \frac{\text{MTBF}}{\text{MTBF} + \text{MTTR}}$$
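
The calculation from MTBF and MTTR can be sketched in a few lines of Python. The function names and the `HOURS_PER_YEAR` constant are illustrative choices, not part of the original article. The inputs are the signal processor hardware figures used in this post (MTBF 10,000 hours, MTTR 2 hours).

```python
HOURS_PER_YEAR = 8760  # 365 days * 24 hours


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time a component is up: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_hours_per_year(avail: float) -> float:
    """Expected hours per year during which the component is unavailable."""
    return (1.0 - avail) * HOURS_PER_YEAR


# Signal processor hardware: MTBF 10,000 hours, MTTR 2 hours
hw = availability(10_000, 2)
print(f"availability = {hw:.4%}")
print(f"downtime     = {downtime_hours_per_year(hw):.2f} hours/year")
```

Both values must be in the same unit, so an MTTR given in minutes has to be converted to hours before calling `availability`.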
Estimating software MTBF is tricky. Software MTBF is effectively the time between successive reboots of the software. This interval may be estimated from the defect rate of the system or from previous experience with similar systems. Here we estimate the MTBF to be around 2190 hours (about four failures per year). The MTTR is the time taken to reboot the failed processor. Our processor supports automatic reboot, so we estimate the software MTTR to be around 5 minutes. That may seem high, but the MTTR should include the following:
  • Time wasted in activities aborted due to signal processor software crash
  • Time taken to detect signal processor failure
  • Time taken by the failed processor to reboot and come back in service

Component                 | MTBF          | MTTR      | Availability | Downtime
Input Transducer          | 100,000 hours | 2 hours   | 99.998%      | 10.51 minutes/year
Signal Processor Hardware | 10,000 hours  | 2 hours   | 99.98%       | 1.75 hours/year
Signal Processor Software | 2,190 hours   | 5 minutes | 99.9962%     | 20 minutes/year
Output Transducer         | 100,000 hours | 2 hours   | 99.998%      | 10.51 minutes/year
Things to note from the above table are:
  • Software availability is higher even though the hardware MTBF is higher, mainly because software has a much lower MTTR. In other words, the software fails more often but recovers quickly, so it has less impact on system availability.
  • The input and output transducers have fairly high availability, so fairly high availability can be achieved even without redundant components.
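
The first point is easier to see by splitting downtime into how often a component fails and how long each failure lasts. The sketch below uses the approximation downtime ≈ failures per year × MTTR, which ignores the small amount of time spent in repair. The function names are hypothetical.

```python
HOURS_PER_YEAR = 8760  # 365 days * 24 hours


def failures_per_year(mtbf_hours: float) -> float:
    """Expected number of failures in one year of operation."""
    return HOURS_PER_YEAR / mtbf_hours


def approx_downtime_minutes(mtbf_hours: float, mttr_minutes: float) -> float:
    """Approximate yearly downtime: each failure costs one MTTR."""
    return failures_per_year(mtbf_hours) * mttr_minutes


# Hardware: fails rarely, but each repair takes 2 hours (120 minutes)
hardware = approx_downtime_minutes(10_000, 120)
# Software: fails about four times a year, but recovers in 5 minutes
software = approx_downtime_minutes(2_190, 5)

print(f"hardware: {failures_per_year(10_000):.2f} failures/year, {hardware:.1f} min/year")
print(f"software: {failures_per_year(2_190):.2f} failures/year, {software:.1f} min/year")
```

The software fails several times more often than the hardware, yet its yearly downtime is much smaller because each outage is so short.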
Calculating System Availability
The last step is to compute the availability of the entire system, using the series and parallel availability formulas.
Component                                                 | Availability | Downtime
Signal Processing Complex (software + hardware)           | 99.9762%     | 2.08 hours/year
Signal Processing Complexes 0 and 1 operating in parallel | 99.99999%    | 3.15 seconds/year
Complete System                                           | 99.9960%     | 21.08 minutes/year
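
The system figures can be reproduced with a short sketch. It assumes a topology that is inferred from the table rather than stated in this post: input transducer, then two identical signal processing complexes in parallel, then output transducer, all with independent failures. The helper names `series` and `parallel` are illustrative.

```python
import math


def series(*avails: float) -> float:
    """All parts must be up, so multiply the availabilities."""
    return math.prod(avails)


def parallel(*avails: float) -> float:
    """Down only if every part is down: 1 - product of unavailabilities."""
    return 1.0 - math.prod(1.0 - a for a in avails)


# Component availabilities taken from the component table
transducer = 0.99998
sp_hardware = 0.9998
sp_software = 0.999962

# Software and hardware of one complex operate in series
sp_complex = series(sp_hardware, sp_software)
# Complexes 0 and 1 back each other up (assumed identical)
redundant_complex = parallel(sp_complex, sp_complex)
# Assumed chain: input transducer -> redundant complexes -> output transducer
system = series(transducer, redundant_complex, transducer)

print(f"signal processing complex: {sp_complex:.4%}")
print(f"complexes in parallel:     {redundant_complex:.6%}")
print(f"complete system:           {system:.4%}")
```

The table's downtime figures for the parallel pair and the complete system appear to be derived from the rounded 99.99999% value, so a full-precision run gives slightly smaller downtimes.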
