Downtime Management
Introduction
High availability is a system design protocol, together with its associated implementation, that ensures a specified absolute degree of operational continuity over a given measurement period. Availability refers to the ability of the user community to access the system, whether to submit new work, update or alter existing work, or collect the results of previous work. If a user cannot access the system, the system is said to be unavailable. The term downtime refers to the period during which the system is unavailable.
Downtime
Planned downtime is typically the result of maintenance that disrupts system operation and usually cannot be avoided with the system configuration currently installed. Examples of events that cause planned downtime include patches to system software that require a reboot and configuration changes that take effect only after a reboot. In general, planned downtime results from a logical or management-initiated event.
Unplanned downtime typically arises from some physical event, such as a hardware or software failure or an environmental anomaly. Examples of events that cause unplanned downtime include power failures, CPU or RAM component failures, a crash due to overheating, a logical or physical break in network connections, security breaches, and failures in the operating system, applications, or middleware.
Many computing sites exclude planned downtime from availability calculations, assuming, rightly or wrongly, that planned downtime has little or no impact on the user community. By excluding planned downtime, many systems can claim very high availability, which can create the false impression of continuous availability. Systems that exhibit true continuous availability are comparatively rare and expensive; they typically feature carefully implemented designs that eliminate every single point of failure and allow hardware, network, operating system, middleware, and application upgrades, patches, and replacements to be performed online.
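As a rough illustration of how excluding planned downtime inflates the reported figure, the following sketch computes availability over a 525,600-minute (non-leap) year twice: once counting only unplanned downtime, and once counting planned and unplanned downtime together. The function name `availability_pct` and the downtime figures are hypothetical, chosen only for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def availability_pct(downtime_minutes: float,
                     period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Return availability as a percentage: 100 * (1 - downtime / period)."""
    if not 0 <= downtime_minutes <= period_minutes:
        raise ValueError("downtime must lie between 0 and the length of the period")
    return 100.0 * (1.0 - downtime_minutes / period_minutes)


# Hypothetical downtime figures for one year of operation.
unplanned = 50.0   # minutes lost to failures
planned = 480.0    # minutes lost to scheduled maintenance windows

excluding_planned = availability_pct(unplanned)
including_planned = availability_pct(unplanned + planned)

print(f"unplanned only:      {excluding_planned:.4f}%")
print(f"planned + unplanned: {including_planned:.4f}%")
```

With these assumed numbers, the first figure is about 99.990% while the second is about 99.899%, just under 99.9%. Eight hours of maintenance windows thus multiply the counted downtime by more than ten, even though the unplanned failures are unchanged.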
Percentage calculations
Availability is usually expressed as a percentage of uptime in a given year. The minutes of unplanned downtime recorded for a system over the year are summed, and that total is divided by the number of minutes in the year (525,600 in a non-leap year) to give the percentage of downtime. Its complement, the percentage of uptime, is what is called the system's availability. Common availability values, typically stated as a number of "nines" for highly available systems, are: