In systems analysis, redundancy is the provision of more than one resource, normally with characteristics similar to those of an existing one, to perform the same task when that task is considered critical or high-priority. It is used in security systems and high-availability systems.
Not to be confused with perimeter security or operational security, which relate to intrusions and system vulnerabilities.
Failure
A failure is the interruption of the function of a system, subsystem, part, device, component, or element that forms it.
By number, a failure can be single or multiple.
By nature, a failure can be visible or invisible.
• - A visible failure means that the situation has resulted in an error and that the error has been detected. An error in a system is the representation or manifestation of a failure that occurred in the current state or in an earlier state.
• - An invisible failure is a failure that does not result in an error, either because there is no error or because the error has not been detected. A double failure can produce an apparently correct situation when the system is in fact doubly unsafe. The time that elapses between a failure and the manifestation of its error is called the failure latency.
Degraded mode
Generally, a failure leads to a situation called a degraded mode of operation. This mode imposes functional, operational, technical, or security limitations that are defined at design time.
Security systems
Security is a very broad concept that includes everything necessary to establish a relationship of trust between the system and the operator.
• - Security in communications.
• - Safety in operation.
• - Security in data protection.
• - Perimeter security and access control.
• - Security in data storage.
• - Security based on privileges and profiles.
• - Safety of operation in dangerous environments.
• - Safety of use.
In general, security can be defined in terms of a set of situations that must be avoided. To do this, those situations are defined and the system is analyzed mathematically.
Security systems in process analysis are systems in which a process runs as a finite state machine whose allowed states are defined in advance. Restricting the process to these states makes the operation secure, because the future situation of every system resource is known beforehand. Knowing all the system variables and their associated resources in advance allows effective use and safe operation. In this mode of operation, the system also has the resources it needs to move between states without interruptions other than those imposed by the internal or external priorities of the running process. Finally, a security system must prevent failures and, when failures do occur, provide the means to keep the resulting damage as small as is materially possible.
A security system does not produce errors in the event of a failure, even when it is operating in degraded mode. One of the mechanisms used is to stop the operation. In computing, the safest computer is considered to be one that is switched off.
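The idea of a process restricted to states and transitions fixed at design time can be sketched as follows. This is only an illustration: the state names and the transition table are hypothetical and do not come from any particular system.

```python
from enum import Enum


class State(Enum):
    STOPPED = "stopped"
    RUNNING = "running"
    DEGRADED = "degraded"


# Allowed transitions, fixed at design time (hypothetical example values).
ALLOWED = {
    State.STOPPED: {State.RUNNING},
    State.RUNNING: {State.DEGRADED, State.STOPPED},
    State.DEGRADED: {State.RUNNING, State.STOPPED},
}


class Process:
    """A process that may only move along transitions declared in ALLOWED."""

    def __init__(self):
        self.state = State.STOPPED

    def move_to(self, target):
        if target not in ALLOWED[self.state]:
            raise ValueError(
                f"transition {self.state.name} -> {target.name} is not allowed"
            )
        self.state = target


process = Process()
process.move_to(State.RUNNING)
process.move_to(State.DEGRADED)
print(process.state.name)  # DEGRADED
```

Because every reachable state is listed in the table, any request outside it is rejected instead of leaving the process in an undefined situation.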
Generally, systems are protected against a single failure, but there are architectures that defend the system against a double failure.
Safety systems are commonly called intrinsically safe, not to be confused with intrinsic safety systems, since safety is a quantifiable functional parameter, like the speed, dimensions, or mass of an object, that forms part of its design and mode of operation.
Integrity
A system is secure when it is capable of maintaining its integrity in favor of security. Quantitative criteria assess how safe a system is through Safety Integrity Levels (SIL).
The most common mechanism for maintaining a SIL is the fail-safe mechanism: in the event of a failure, the system moves to a safe, known, and stable mode, so that the failure does not affect the security of higher-order systems or, ultimately, the security of the operation.
Such mechanisms keep the level of security within margins that are conventionally accepted as tolerable.
Return
If a logical process machine is in a protected failure mode, it has the resources it needs to change to a safe state. Protected failures are alterations to the system process or architecture that still offer a path from an unsafe state back to a safe state. These modifications to normal function give the system failure integrity.
If a state machine is in an unprotected failure, it does not have the resources to change the situation.
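One way to read the distinction is as a reachability question on the state graph: a failure state is protected if some safe state can be reached from it. The sketch below assumes that reading; the function name `is_protected` and the example states are hypothetical.

```python
from collections import deque


def is_protected(transitions, failure_state, safe_states):
    """Return True if some safe state is reachable from failure_state."""
    seen = {failure_state}
    queue = deque([failure_state])
    while queue:
        state = queue.popleft()
        if state in safe_states:
            return True
        for nxt in transitions.get(state, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Hypothetical state graph: each key lists the states reachable in one step.
TRANSITIONS = {
    "running": ["sensor_fault", "power_fault"],
    "sensor_fault": ["safe_stop"],  # a path back exists
    "power_fault": [],              # no way out without outside help
    "safe_stop": ["running"],
}
SAFE = {"safe_stop"}

print(is_protected(TRANSITIONS, "sensor_fault", SAFE))  # True
print(is_protected(TRANSITIONS, "power_fault", SAFE))   # False
```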
The task of returning to a safe state is called repair and consists of replacing failed elements with functionally correct ones.
Depending on how the replacement is performed, it can be:
• - Hot replacement, if it is not necessary to stop the service.
• - Cold replacement, if it is necessary to disconnect the machine or one of its parts for the intervention.
Redundancy
Redundancy is the use of resources beyond those that are strictly necessary.
Linguistics
Linguistic redundancies and repetition result in emphasis. This resource is effective in transmitting orders or instructions between people.
Checking
A check is a verification of a state against a theoretical state, model, or reference information.
Checks are risk mitigation tools in security processes of any nature, where risk is understood as the uncertainty that a negative event may occur within a given scope of space and time. Risk analysis assesses the possible threats to a system, the criticality of their consequences, and the mechanisms needed to reduce or eliminate those risks.
Verification is a natural safety and quality mechanism that uses resources to confirm that a process or function does not deviate from what is considered normal for that system.
Sequential checking consists of spending additional time to perform a task again in order to verify that the results are the same.
When doing an extensive mathematical calculation, we can do it again to be sure that we have done it correctly. Certainty in mathematical terms is the statistical probability that each of the operations has been performed correctly. This is a self-checking mechanism in which the probability of making the same error in different iterations is considered zero for a person with experience in the task.
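As a minimal sketch of sequential checking, the helper below runs the same task more than once and accepts the result only if every run agrees. The names `checked` and `total` are hypothetical. Note that on a deterministic machine a repeat run only exposes transient faults, not a systematic error in the task itself.

```python
def checked(task, *args, repeats=2):
    """Run task `repeats` times and return the result only if all runs agree."""
    results = [task(*args) for _ in range(repeats)]
    first = results[0]
    if any(result != first for result in results[1:]):
        raise RuntimeError(f"results disagree: {results}")
    return first


def total(values):
    return sum(values)


print(checked(total, [1200, 350, 75]))  # 1625
```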
In railway engineering, sequential checking is an implicit safety task, forming part of the usual vocabulary of Command and Signaling systems. Interlocking controls the points through Command and Check interfaces, in which each movement order has an associated check.
In space operations, double checking is used: an action is performed and then checked twice to confirm that it has been done correctly.
In civil aviation, cross-checking is used: two cabin crew members each check that their colleague has correctly performed the task corresponding to the safety instruction given by the commander.
In logical redundancy, control resources are used both to perform tasks and to implement logical checks that thoroughly confirm that each task has been performed correctly. In railway engineering, the logical redundancies known as 2 of 2 and 2 of 3 are used: systems working simultaneously in parallel (both of two, or at least two of three) must offer identical readings, information, and decisions before a movement is authorized.
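The 2 of 2 and 2 of 3 schemes can be illustrated with a small voter. This is a sketch only: the function names and the "proceed"/"stop" readings are hypothetical, and readings are assumed to be hashable values that are never `None`.

```python
from collections import Counter


def vote(readings, required):
    """Return the value reported by at least `required` channels, else None."""
    value, count = Counter(readings).most_common(1)[0]
    return value if count >= required else None


def two_of_two(a, b):
    # Both channels must agree; any disagreement withholds authorization.
    return vote([a, b], required=2)


def two_of_three(a, b, c):
    # One channel may fail or disagree and a decision is still reached.
    return vote([a, b, c], required=2)


print(two_of_two("proceed", "proceed"))            # proceed
print(two_of_two("proceed", "stop"))               # None
print(two_of_three("proceed", "stop", "proceed"))  # proceed
```

A result of `None` means that no decision is authorized. The practical difference is that 2 of 2 stops on a single channel failure, while 2 of 3 keeps operating with one faulty channel.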
Physical Redundancy
Physical redundancy involves reserving more resources than are necessary to perform a task. This is what is called fault-tolerant design. This design prevents the failure of a node or link from causing a service failure.
Redundancy mechanisms are used to address two kinds of situation:
• - Own failures, when the system being designed itself fails or is damaged.
• - Third-party failures, when the failure is caused by third parties, such as suppliers.
In both cases, Contingency Plans are prepared. They anticipate the most unfavorable situations so that the necessary mitigation mechanisms can be applied according to the security, availability, and degraded-mode functionality required.
• - parallel series redundancy.
As a result, different mechanisms have been created to protect widely used topologies such as the tree network. This configuration reaches a large number of users with few hierarchical levels, but the failure of a single node or link isolates entire groups.
Availability in a tree configuration can be improved by adding redundant links, so that functionality is maintained after a single failure. This applies equally to energy, communications, protection, and transportation links. Telecommunications networks are organized around meshed configurations with redundant links, and energy networks form meshes for the safe distribution of supply. The existence of redundant links necessarily implies the existence of elements that manage those links.
• - links without redundancy.
• - redundant links.
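The effect of a redundant link on a tree can be checked with a small connectivity test: remove each link in turn and see whether every node is still reachable from the root. The node names and link lists below are hypothetical.

```python
def reachable(nodes, links, start):
    """Return the set of nodes reachable from start over undirected links."""
    neighbours = {node: set() for node in nodes}
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, stack = {start}, [start]
    while stack:
        for nxt in neighbours[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def survives_any_single_link_failure(nodes, links, root):
    """True if every node stays reachable from root after any one link fails."""
    return all(
        reachable(nodes, [link for link in links if link != failed], root)
        == set(nodes)
        for failed in links
    )


NODES = ["root", "a", "b", "c"]
TREE = [("root", "a"), ("root", "b"), ("a", "c")]
MESHED = TREE + [("b", "c")]  # one redundant link closes a loop

print(survives_any_single_link_failure(NODES, TREE, "root"))    # False
print(survives_any_single_link_failure(NODES, MESHED, "root"))  # True
```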
The ring configuration is a design model whose arrangement appears to create an infinite loop within a process.
In these systems the links are modified in real time according to the boundary conditions.
In operation the loops are eliminated; the ring is a theoretical, mnemonic device.
The ring configuration is therefore a dynamic architectural model: a tree whose branches vary dynamically.
• - Ring topology. Concept.
• - Ring topology. Main mode of work.
• - Ring topology. Failure event.
• - Ring topology. Degraded mode of operation.
The Ring Topology Operation in Main Mode image shows the nodes as they work in normal mode in a system with dynamic link allocation.
In the event of a failure, there is a reordering of the nodes based on alternative or reserve links, leaving the failure on an inactive link awaiting a solution. Degraded mode generally implies an imbalance of loads and a reduction in the reliability of the system working in this mode, since it lacks protection for a second failure event.
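A minimal sketch of this behavior, assuming a ring in which one link is held in reserve so that no loop is active. The `Ring` class and the node names are hypothetical, and load balancing is not modeled.

```python
class Ring:
    """Ring of links with one link held in reserve so that no loop is active."""

    def __init__(self, links, reserve):
        self.links = list(links)
        self.reserve = reserve  # inactive link in main mode
        self.failed = None

    def active_links(self):
        return [
            link for link in self.links
            if link not in (self.reserve, self.failed)
        ]

    def mode(self):
        return "main" if self.failed is None else "degraded"

    def fail(self, link):
        """Park the failure on an inactive link and bring the reserve into use."""
        if self.failed is not None:
            raise RuntimeError("no protection against a second failure")
        self.failed = link
        self.reserve = None  # the reserve is now in service (or has itself failed)


ring = Ring([("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")], reserve=("D", "A"))
print(ring.mode(), ring.active_links())
ring.fail(("B", "C"))
print(ring.mode(), ring.active_links())
```

After the failure, all four nodes remain connected through the former reserve link, but a second call to `fail` raises, which mirrors the loss of protection in degraded mode.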
Emergency systems
Emergency systems are a vital part of security systems. They are additional systems designed to remain on standby and to act only under specific conditions of declared risk.
By function, they mainly enable three tasks.
Emergency stop
Redundancy generally works in favor of operation. However, many systems work to stop the operation in case of danger when there is verifiable data that the operation would result in damage.
The function of an emergency stop is to halt a malfunctioning system in order to prevent further damage. These systems ensure that processes stop regardless of the failure conditions.
Emergency systems use resources external to the system they protect, to act independently by performing tasks such as braking, fluid shutoff, supply interruption, and access denial. The stop can be automatic or manual. The system can replace human decisions when the decision depends on numerous variables at the same time and must be made in short periods of time. They are protection tools against an imminent failure and are used in dangerous situations.
Emergency thermostats are common on instrumented pressure vessels. They are set to operate above the temperature of the normal working thermostat. This is a series protection function: the intervention of either thermostat stops the fluid, which prevents a single faulty contact from creating an unsafe situation.
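The series arrangement can be sketched as two contacts that must both be closed for the fluid to flow. The temperature limits and the `working_stuck` flag, which simulates a working contact that fails closed, are hypothetical illustration values.

```python
def fluid_flows(temperature, working_limit, emergency_limit, working_stuck=False):
    """Both contacts are in series: fluid flows only if both are closed."""
    working_closed = working_stuck or temperature < working_limit
    emergency_closed = temperature < emergency_limit
    return working_closed and emergency_closed


print(fluid_flows(70, 80, 95))                      # True: normal operation
print(fluid_flows(85, 80, 95))                      # False: working thermostat opens
print(fluid_flows(90, 80, 95, working_stuck=True))  # True: working contact has failed
print(fluid_flows(96, 80, 95, working_stuck=True))  # False: emergency thermostat opens
```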
Emergency actions are manifestations of a declared risk, so after the intervention, the system usually requires rearming, since the action of the emergency system implies that the operating systems have not met their required reliability.
Examples include the emergency brake on railways, the emergency brake and descent on elevators, emergency braking on vehicles, and the emergency stop on escalators and automatic production lines. Generally, all systems with an associated movement have an emergency stop mechanism.
Bypass
Bypass systems establish alternative routes in the event of failure of the main systems. They generally disable the functions of the main system so that those functions can be performed manually or semi-automatically under manual control. The term override is also used.
Evacuation
Evacuation systems allow evacuation to a safe state after a failure. Examples include opening doors, releasing electromagnetic closures, operating motors with manual mechanisms, lifeboats on boats, and emergency exits.
Functional tests
Redundant systems undergo a series of functional tests called commissioning. These tests verify that the main system draws resources from the reserve correctly and within the estimated intervals, and that it returns to normal once the fault has been cleared. They check all possible states and the ranges of all variables.
Functional tests are sometimes repeated systematically to verify that the systems maintain their functionality, particularly in emergency systems, whose correct function is verified in the protocols and drills carried out in accordance with the emergency plans.
Resilience
Resilience is a characteristic of a functionally active system by which the system itself has sufficient resources to recover from failures, returning from an active reserve in degraded mode to the main system without external intervention. Because the word system is so broad, resilience can apply to a material, in what is known as elastic behavior; to a person, as the ability to overcome adversity; to an organization, as the ability to recover after unfavorable situations; or to the environment. In a system, resilience implies that the system itself has self-configuration and self-checking mechanisms to restore the service without intervention.
Redundancy as a source of failures
There are redundancies that could be the source of system failures.
Primary source
Duplication of information involves using a scarce resource, such as storage or memory, to maintain the information necessary for a process.
Quality systems require that the information within a process have primary sources, so that each item of information has a single origin. This precondition saves space and allows periodic revisions, enhancements, and tree updates, in which a revision to the source updates all dependent content.
An example of a primary source is the referential integrity used in SQL-type databases, where a data set is accessible through a combination of references or keys that are organized in a tree and where the primary key gives access to all the information in the data set. A secondary key gives access to only part of the data set. Ensuring that no two keys are equal, whether primary, secondary, or n-ary, is the responsibility of the designer in the first instance and of the database engine during operation.
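The single-origin idea can be illustrated with Python's standard `sqlite3` module. The `supplier` and `part` tables are hypothetical; the supplier name is stored once and referenced by key, and the engine rejects duplicate keys and dangling references.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE supplier (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE part ("
    " id INTEGER PRIMARY KEY,"
    " supplier_id INTEGER NOT NULL REFERENCES supplier(id),"
    " label TEXT NOT NULL)"
)
conn.execute("INSERT INTO supplier (id, name) VALUES (1, 'Acme')")
conn.execute("INSERT INTO part (id, supplier_id, label) VALUES (10, 1, 'valve')")


def try_insert(sql):
    """Return 'accepted' or 'rejected' depending on the integrity checks."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


# A duplicate primary key and a reference to a missing supplier both fail.
print(try_insert("INSERT INTO supplier (id, name) VALUES (1, 'Duplicate')"))
print(try_insert("INSERT INTO part (id, supplier_id, label) VALUES (11, 99, 'pump')"))

# A revision at the single source is seen by everything that references it.
conn.execute("UPDATE supplier SET name = 'Acme Ltd' WHERE id = 1")
row = conn.execute(
    "SELECT part.label, supplier.name FROM part"
    " JOIN supplier ON supplier.id = part.supplier_id"
).fetchone()
print(row)  # ('valve', 'Acme Ltd')
```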
Balancing redundancies
Redundancy balancing is a mechanism that allows a system as a whole to better perform its function.
If Fn is the reliability of the nth system of a chain and FT is the total reliability, then for a chain of four systems FT = F1 × F2 × F3 × F4.
Because no factor can exceed 1, a chain with unbalanced redundancy is never more reliable than its least reliable system, and is usually less reliable.
This condition implies that strengthening a segment of a chain is not efficient when there are weaker segments.
By using resources to improve less reliable systems, the system gains overall reliability.
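A short worked example of the product rule, using hypothetical reliabilities for a four-system chain. It compares halving the failure probability of an already strong segment with halving that of the weakest one.

```python
from math import prod


def chain_reliability(reliabilities):
    """Total reliability of systems in series: the product of the parts."""
    return prod(reliabilities)


chain = [0.99, 0.99, 0.90, 0.99]  # hypothetical values; the third is weakest
base = chain_reliability(chain)

# Halve the failure probability of a strong segment (0.99 -> 0.995) ...
stronger_strong = chain_reliability([0.995, 0.99, 0.90, 0.99])
# ... or of the weakest segment (0.90 -> 0.95).
stronger_weak = chain_reliability([0.99, 0.99, 0.95, 0.99])

print(f"{base:.4f}")             # 0.8733
print(f"{stronger_strong:.4f}")  # 0.8777
print(f"{stronger_weak:.4f}")    # 0.9218
```

The total stays below the weakest segment in every case, and the same relative effort yields a much larger gain when applied to the weakest segment.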
The linear configuration of a system does not necessarily imply a bus topology. Rings called planes are implemented in which the components of the ring use the links provided by the ring, within a given geographical or organizational linear distribution.
• - Improved Bus-Bus Topology - Linear Ring.
• - Redundant bus. Link fault tolerance. No tolerance to node failure.
• - Linear ring. Simple fault tolerance.
There are systems that do not have a dynamic configuration of links, such as ground network rings, where the connections are buried and fixed.
These systems can work simultaneously or sequentially.
Depending on the operation of the redundant system, they can be:
• - Active reserve, if bringing the secondary system into operation does not require starting up the reserve.
• - Inactive reserve, if the device is turned off and needs to be started. This configuration is commonly called cold standby, in contrast to hot standby, in which the backup system is already running alongside the primary system.
• - Spare. If the reservation system is stored. The correct restocking process in the case of spare parts minimizes the MTTR or mean time to repair.
Depending on the space-time coordination, they can be:.
This protection establishes criteria to mitigate third-party failures, such as power supply failure. It has a receiving point with n supply alternatives.
In this redundancy, the alternative system is generally:
• - idle.
• - active in other functions. This arrangement is what is called capacity reserve.
Depending on its origin, the capacity reserve can be:
• - internal capacity.
The device has more capacity than that used in normal operation.
This characteristic is achieved by oversizing the device at design time. Such systems maintain the original design capacity, or slightly less, and guarantee the essential actions needed to return to a higher level of safety. Internal capacity is used when one of the subsystems is inactive for maintenance or because of a failure.
• - external capacity.
The system relies on collateral systems to perform tasks that its damaged or inactive functionality does not allow it to perform.
Capacity reserve applies when a system in operation reserves physical, functional or operational capacity, with pre-assigned resources and procedures, to meet the demand from clients with high availability requirements.
The power supplies of critical-function computer rooms are configured so that a failure in any one supply network does not make the system unavailable. The alternative supplies allow operation to continue after a single failure and even after multiple failures. To maintain the service, the system steps down from a network of high availability and high energy efficiency, such as the conventional electrical network, to more limited and less efficient sources such as diesel generators.
As the higher-order networks return to normal, the system switches back in reverse order until it reaches its initial situation. Network changes are transparent to the subsystems that make up the receiver. Transparent here means that operation is not stopped, affected or modified, at least not substantially, although strictly speaking each change is associated with a so-called micro cut, a brief interruption of the electrical supply.[1]
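The step-down and step-back behaviour described above can be sketched as a priority-ordered selection. This is an illustrative sketch only: the source names in `SOURCES` and the function `select_supply` are hypothetical, and a real installation would rely on dedicated transfer-switch equipment rather than software like this.

```python
def select_supply(sources, available):
    """Return the most preferred source that is currently available.

    `sources` is ordered from most to least preferred; `available` is the
    set of sources currently delivering power. Returns None if every
    source has failed.
    """
    for source in sources:
        if source in available:
            return source
    return None

# Hypothetical priority order, from the most efficient network downwards.
SOURCES = ["mains_grid", "secondary_feed", "diesel_generator"]

# Normal operation: the preferred network is used.
print(select_supply(SOURCES, {"mains_grid", "secondary_feed", "diesel_generator"}))

# Double failure: only the generator remains, so the system steps down to it.
print(select_supply(SOURCES, {"diesel_generator"}))

# The mains return: the same rule switches back to the initial situation.
print(select_supply(SOURCES, {"mains_grid", "diesel_generator"}))
```

Because the selection is stateless, the return to the higher-order network needs no separate logic: re-evaluating the same rule restores the initial situation.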
At an organizational level, having two or more suppliers, when resources allow, is a protection mechanism against unavailability events. Quality systems recommend performing certain supplier-related administrative tasks in triplicate, specifically when evaluating a supply, budget, quote or offer, so that more and better alternatives are always available.
In systems with fluid or gas storage, redundant evacuation points are designed as an alternative in case a main valve is blocked.
In information systems with active standby, this configuration is called a mirror configuration. The redundant system has the information from the main system to continue the function in case of failure. The efficiency is 50%, since twice as many resources are used as for a basic system without such protection.
• - 1+1 with functional simultaneity.
These systems are called hot standby, since the redundant device operates at the same time as the main system.
The n-of-m systems have m devices of which n must work, while m-n can fail sequentially or even simultaneously.
The relationship between n and m is given by the redundancy that is applied to the process itself.
• - n of m.
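The reliability of an n-of-m arrangement can be estimated with the binomial distribution. The sketch below assumes that all m devices are independent and share the same reliability p, which the text does not state; the function name `n_of_m_reliability` is hypothetical.

```python
from math import comb

def n_of_m_reliability(n, m, p):
    """Probability that at least n of m devices work.

    Assumes independent devices, each working with the same probability p.
    """
    if not 0 <= n <= m:
        raise ValueError("n must satisfy 0 <= n <= m")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    # Sum the probabilities of exactly k working devices for k = n..m.
    return sum(comb(m, k) * p**k * (1 - p) ** (m - k) for k in range(n, m + 1))

# 1-of-2 (a 1+1 pair), 2-of-3, and 2-of-2 (no redundancy) with p = 0.9.
one_of_two = n_of_m_reliability(1, 2, 0.9)
two_of_three = n_of_m_reliability(2, 3, 0.9)
two_of_two = n_of_m_reliability(2, 2, 0.9)
```

Under these assumptions, the arrangements that tolerate a failure (1-of-2 and 2-of-3) come out more reliable than a single device, while 2-of-2, where both devices must work, comes out less reliable.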
The most common redundancy protection mechanism is RAID (Redundant Array of Independent Disks). Thanks to this system, disks in a data center can fail without information being lost: the information is divided among the disks, and part of each disk holds information derived from the rest of the disks, so that any one of them can be replaced in case of failure.
This mechanism is another type of protection by active reserve, applied in systems that work in parallel in a collaborative manner, that is, each adding capacity to the total system.
Mass storage units have traditionally consisted of elements with moving parts, which are one of the main causes of failure, breakdowns and corrective maintenance actions.
Extended parity systems add extra data that relates the information to itself. Lost data can then be rescued from its known relationships with the data that remains, using external references.
The image shows elements that are related to each other, together with extra elements added to create further relationships that allow data retrieval.
The system memorizes positions and relationships in such a way that recovering data means applying logical functions to the information that has remained intact.
• - Parity explained with colors.
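One logical function commonly used for this kind of recovery is exclusive OR (XOR). The following is a minimal sketch, assuming three equally sized data blocks and one parity block; the block contents and the helper `xor_blocks` are hypothetical, and real RAID levels organize data and parity in more elaborate ways.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data blocks, one per disk.
data = [b"ABCD", b"EFGH", b"IJKL"]

# The parity block relates all data blocks to each other.
parity = xor_blocks(data)

# Suppose the disk holding data[1] fails. XOR of the surviving blocks
# and the parity block rebuilds the lost block.
survivors = [data[0], data[2], parity]
recovered = xor_blocks(survivors)
print(recovered == data[1])
```

The recovery works because XOR-ing a value with itself cancels it out, so combining the parity with every surviving block leaves only the missing one.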
In data transmissions, adding a redundant bit, known as a parity bit, makes it possible to know whether the transmission has been carried out correctly. Such schemes are called error detection systems; with additional redundant bits, errors can also be corrected.
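A single redundant bit can be illustrated with even parity. This is a sketch under the assumption of even parity over a list of bits; the function names `add_parity_bit` and `parity_ok` are hypothetical.

```python
def add_parity_bit(bits):
    """Append one bit so that the total number of 1s in the frame is even."""
    return bits + [sum(bits) % 2]

def parity_ok(frame):
    """Return True if the frame still contains an even number of 1s."""
    return sum(frame) % 2 == 0

frame = add_parity_bit([1, 0, 1, 1])

# A single flipped bit is detected.
one_error = frame.copy()
one_error[0] ^= 1

# Two flipped bits cancel out and go unnoticed.
two_errors = frame.copy()
two_errors[0] ^= 1
two_errors[1] ^= 1

print(parity_ok(frame), parity_ok(one_error), parity_ok(two_errors))
```

The check shows the limit of one redundant bit: it reveals an odd number of flipped bits but cannot locate them, and an even number of flips passes unnoticed.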
Timers are systems added to processes to verify that operation stays within normal statistical margins.
They are independent systems that run on their own clocks and form an additional security layer on top of those already in place. Systems such as a watchdog rely on the criterion that each function takes a bounded amount of time to complete; exceeding that time is treated as a fault.
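The watchdog criterion can be sketched in software as follows. This is an illustrative sketch, not a description of any particular watchdog product: the class `Watchdog`, its `kick` and `expired` methods, and the timeout value are hypothetical, and hardware watchdogs typically reset the system on expiry rather than merely reporting it.

```python
import time

class Watchdog:
    """Reports a fault if the supervised function stops checking in."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout          # maximum allowed silence, in seconds
        self._clock = clock             # the watchdog's own clock source
        self._last_kick = clock()

    def kick(self):
        """Called by the supervised function to signal normal progress."""
        self._last_kick = self._clock()

    def expired(self):
        """True if more than `timeout` seconds passed since the last kick."""
        return self._clock() - self._last_kick > self.timeout

# Usage with a simulated clock, so the example runs instantly.
now = [0.0]
watchdog = Watchdog(timeout=2.0, clock=lambda: now[0])

now[0] = 1.5
watchdog.kick()                         # function checks in on time
now[0] = 3.0
on_time = watchdog.expired()            # only 1.5 s since the last kick
now[0] = 4.0
too_late = watchdog.expired()           # 2.5 s of silence exceeds the limit
```

Passing the clock in as a parameter reflects the idea that the timer runs on its own time source, and it also makes the behaviour easy to test without waiting.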