Systems Failure Mode Analysis

The closest I can come to explaining what I see when presented with a system is that it is like looking at a human through the eyes of a seasoned doctor or traditional healer.

It’s not just a physique, an expression or a posture. It’s all of that, the physical and the mental, the demeanour, and then further: an understanding of their environment and their diet. What makes them and what breaks them; what they are not just at their best but also at their worst.

System builders depend on pre-established interfaces and standardised APIs to pass information between building blocks. The more deeply the system builder understands the history and logic behind these standardised interfaces, the better the system can be understood. As such, system builders building holistic systems need pass criteria for systems at high performance, not just pass marks for achieving integration goals.

Interfaces and databases that enable components to communicate with each other often carry more data than is efficient for systemic communication. Blocks within a system also sit at different development milestones, and some are legacy building blocks. These have to be studied in detail to understand what the combined system’s ideal operational parameters are.

Machines and systems are built with a particular function in mind and some pre-planned performance outcomes. Engineering often builds to these specifications without considering the environmental effects on those specifications. Every component undergoes various modes of decay (and degraded performance) before actual failure occurs.
These changes in operating environments become amplified as more nodes are incorporated into an integrated system. If differences in human interaction and human errors are thrown into the mix, we potentially have chaotic performance environments built in. Often in such system designs, engineering targets more stringent performance criteria: a smaller tolerance level.
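
As a rough illustration of how this amplification compounds, here is a minimal sketch under a simplifying assumption the text does not make: the system is within tolerance only when every node is, and nodes drift independently of one another. The function name and the 99% figure are hypothetical, chosen only to show the trend.

```python
def system_in_tolerance(p_node: float, n_nodes: int) -> float:
    """Probability that all nodes are within tolerance at the same time.

    Assumes independent nodes that must all be in tolerance together,
    which is a simplification for illustration only.
    """
    if not 0.0 <= p_node <= 1.0:
        raise ValueError("p_node must be a probability between 0 and 1")
    if n_nodes < 0:
        raise ValueError("n_nodes must be non-negative")
    return p_node ** n_nodes


# Hypothetical example: each node is within tolerance 99% of the time.
for n in (1, 10, 50):
    print(n, round(system_in_tolerance(0.99, n), 3))
```

Under these assumptions, fifty nodes that are each within tolerance 99% of the time are all within tolerance together only about 60% of the time, which is one way to read why integrated designs reach for tighter per-component tolerances.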


Building tolerances into a living system that works in tandem with humans and is subject to systemic decay is a little harder, given that the environments and the tolerance units themselves are unpredictable and changing.

One way to get a more holistic understanding of such systems is to map out what I call their failure modes. The theory is that every system has its actual programmed or designed modes and various out-of-design, non-compliant or failure modes.

In multi-node systems, these failure modes can actually outnumber the designed modes. If the most frequent failure modes are tested and recognised, troubleshooting and resolving future systemic issues can be quick. The key is that failure mode analysis is carried out continuously and the performance envelope is constantly monitored and updated.
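
One possible way to keep such a running record is sketched below. It is not a prescribed method: the class names, the metric name and the envelope bounds are all hypothetical, and a real system would track many metrics and revise its envelopes over time. The sketch only shows the two ideas from the paragraph above: checking readings against a performance envelope, and tallying which failure modes occur most often.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Envelope:
    """Acceptable range for one monitored metric (hypothetical bounds)."""
    low: float
    high: float

    def contains(self, reading: float) -> bool:
        return self.low <= reading <= self.high


@dataclass
class FailureModeLog:
    """Running tally of observed failure modes."""
    counts: Counter = field(default_factory=Counter)

    def observe(self, metric: str, reading: float, envelope: Envelope) -> bool:
        """Return True if the reading is inside the envelope.

        Otherwise record a failure mode named after the metric and the
        side of the envelope that was breached, and return False.
        """
        if envelope.contains(reading):
            return True
        side = "below" if reading < envelope.low else "above"
        self.counts[f"{metric}:{side}_envelope"] += 1
        return False

    def most_frequent(self, k: int = 3) -> list[tuple[str, int]]:
        """The k most frequently observed failure modes, most common first."""
        return self.counts.most_common(k)


log = FailureModeLog()
latency = Envelope(low=0.0, high=200.0)  # milliseconds, made-up bounds
for reading in (120.0, 250.0, 310.0, 90.0, -5.0):
    log.observe("latency_ms", reading, latency)

print(log.most_frequent())
```

Keeping the tally separate from the envelope means the envelope can be replaced as the system ages without losing the history of which failure modes have been seen.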


The system then evolves through component decay, parts replacement and evolution (upgrades). Failure modes then change whilst performance modes are mostly maintained. So to observe and predict long-term evolutionary performance envelopes for a system, failure mode becomes the primary lens of observation.