Using Failure Rate Functions to Drive Early Design Decisions [transcript]

We have good requirements for the reliability of our design. We also have a preliminary design with ideas of how we’re going to manufacture it. Is our design idea good enough? Are there things we should do to improve its performance and reliability? Can our Reliability Engineering friends help us in this situation? Absolutely! One of the ways they can help us is with an analysis of a failure rate function. More about this and the bathtub curve after this brief introduction.

Hello and welcome to Quality During Design, the place to use quality thinking to create products others love, for less. My name is Dianna. I’m a senior-level quality professional and engineer with over 20 years of experience in manufacturing and design. Listen in and then join the conversation at QualityDuringDesign.com.

For a physical product, there are three general stages in its life cycle. In many cases, the failure rates of physical products can be represented by a reliability bathtub curve. This curve is really a plot of a hazard rate function, also known as a failure rate function. On the x-axis is time (or distance, or cycles), and on the y-axis is the failure rate (the instantaneous rate of failure at a certain point in time). The curve looks like a bathtub: a decreasing failure rate in the early failure stage (the initial product life), a steady-looking failure rate during the useful life stage (the flat bottom of our bathtub), and then an increasing failure rate in the wear-out stage. The podcast blog has a picture of a typical bathtub curve.
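
To make the hazard function a little more concrete: it is the probability density of failure at a given time divided by the probability of surviving to that time, h(t) = f(t) / R(t). One common way to sketch a bathtub shape, assumed here purely for illustration, is to add three Weibull hazard functions, treating them as independent competing failure mechanisms. A Weibull shape parameter beta < 1 gives a decreasing rate, beta = 1 a constant rate, and beta > 1 an increasing rate. The parameter values below are hypothetical, not data for any real product.

```python
def weibull_hazard(t: float, beta: float, eta: float) -> float:
    """Weibull hazard rate h(t) = (beta / eta) * (t / eta) ** (beta - 1), defined for t > 0."""
    if t <= 0:
        raise ValueError("t must be positive")
    return (beta / eta) * (t / eta) ** (beta - 1)


def bathtub_hazard(t: float) -> float:
    """Hypothetical bathtub curve: sum of three competing Weibull hazards (t in hours)."""
    early = weibull_hazard(t, beta=0.5, eta=625_000.0)    # beta < 1: decreasing, early failures
    useful_life = weibull_hazard(t, beta=1.0, eta=10_000.0)  # beta = 1: constant, useful life
    wear_out = weibull_hazard(t, beta=5.0, eta=20_000.0)  # beta > 1: increasing, wear out
    return early + useful_life + wear_out


# Print the combined failure rate at a few ages to trace the bathtub shape.
for hours in (1, 10, 100, 1_000, 5_000, 10_000, 20_000, 30_000):
    print(f"{hours:>6} h: {bathtub_hazard(hours):.2e} failures per hour")
```

With these assumed parameters, the rate starts high, levels off near 1e-4 failures per hour between roughly 1,000 and 10,000 hours, and then climbs steeply as wear-out takes over.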

Starting from the left of the plot, the decreasing failure rate in the earliest stage of product use is the Early Failure Stage. You may hear Reliability Engineers talk about ‘infant mortality’ or the burn-in period. They’re talking about this early failure stage. These early failures in product use are caused by flaws or defects in our product, likely from errors in manufacturing. We’ll talk more about what burn-in is in a bit.

The flat, bottom part of the curve has a steady failure rate, or at least an approximately constant one. It represents the time period when our product is within its Useful Life Phase. A steady failure rate doesn’t mean there are no failures in this phase. Special causes of failure are likely to cause us headaches here: environmental issues, random loads, human error, and chance events all contribute to failures in the useful life of our product. A constant failure rate implies that failures are random and independent over time. Each of these independent failure causes may have its own failure rate function over time that isn’t steady. But they are so numerous, random, and independent of each other that, together, they approximate a constant failure rate.
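
The constant failure rate of the useful life phase has a handy mathematical consequence. Assuming h(t) = λ over the period of interest, reliability follows the exponential distribution, and the mean time to failure (MTTF) is the reciprocal of the rate:

```latex
R(t) = \exp\!\left(-\int_0^t h(u)\,du\right)
\;\xrightarrow{\;h(u)=\lambda\;}\;
R(t) = e^{-\lambda t},
\qquad
\mathrm{MTTF} = \int_0^\infty e^{-\lambda t}\,dt = \frac{1}{\lambda}
```

For example, an assumed rate of λ = 1 × 10^-4 failures per hour gives R(1,000 h) = e^(-0.1) ≈ 0.905 and an MTTF of 10,000 hours. The exponential model is also memoryless: the probability of surviving the next t hours doesn’t depend on how long the unit has already run, which matches the idea of random, independent failures.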

The last phase, the Wear Out Phase, with its increasing failure rate, is an easy one to describe because it’s when things, well, wear out. It could be from aging, friction, corrosion, fatigue, or even cyclical loading.

Reliability Engineers can use data to estimate the shape of this bathtub curve for the product being developed. Our goal is to change the curve the way we want by improving either the product itself or how it’s manufactured. We have the most influence on this curve during the concept and design stages of the product development cycle, because at that point it’s not too late to change the design, its components, or how we manufacture it.

Even though we might not have a finished product in hand, there are still ways to collect data about it. A Reliability Engineer can piece together several failure distributions to make a composite bathtub curve for a product. Given some assumptions (for example, components in series that fail independently), the failure rate of a complicated system is the sum of the individual failure rates of its components. So we can analyze the failure rates of subsystems or individual components to get the whole system’s failure rate. In early design stages, our data could come from testing proposed prototypes or components. If we’re designing a system where one component’s failure can cause a series of other problems, then we may want to focus attention on that component and test it if we can.
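
Here is a minimal sketch of the sum-of-failure-rates idea, assuming a series system (every component must work for the system to work), independent failures, and a constant failure rate for each component. The component names and rates are made up for illustration; in practice they would come from prototype tests, supplier data, field data, or published tables.

```python
import math


def series_failure_rate(component_rates: dict[str, float]) -> float:
    """System failure rate of a series system: the sum of the component rates.

    Assumes independent failures and constant rates (exponential model).
    """
    return sum(component_rates.values())


def reliability(rate: float, hours: float) -> float:
    """Probability of surviving `hours` at a constant failure rate."""
    return math.exp(-rate * hours)


# Hypothetical component failure rates, in failures per hour.
rates = {
    "motor": 2.0e-5,
    "controller_board": 1.5e-5,
    "pump_seal": 4.0e-5,
    "sensor": 0.5e-5,
}

system_rate = series_failure_rate(rates)
print(f"System failure rate: {system_rate:.1e} failures per hour")
print(f"System MTTF: {1 / system_rate:,.0f} hours")
print(f"Reliability over 1,000 hours: {reliability(system_rate, 1_000):.3f}")

# Rank components by their share of the system rate to see where to focus.
for name, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:>16}: {rate / system_rate:.0%} of the system failure rate")
```

In this made-up example, the pump seal contributes half of the system failure rate, so it would be the first candidate for closer testing or a design change.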

We can also gather data from component manufacturers. A lot of times, manufacturers test their products and report the failure rate or reliability of those products under certain conditions, or they test to a particular standard. This is information we can use to help build a model of the system.

We can also get data from evaluating similar devices on the market. If we’re designing a third-generation product and have field data on the 1st and 2nd generation of that product, then we’ll use that data. Or maybe we’re designing something new but using a component from a different product, in a similar way. Look for ways we can use field data if it’s available.

Another way we can collect data is to use standard tables of mechanical device failure rates. There are military standards, government standards, and other guides that have published failure rate estimates for mechanical components [and also electrical components].

Considering all of these data collection methods, if we can get specific failure data by failure mode, then Reliability Engineers can really start to get specific about what kind of design changes would improve reliability. The more specific we can get with failure data, the better the reliability estimates of the system will be.

If we’ve pieced together a bathtub curve from our data, how is that going to help us make decisions for improving the design? That depends on what part of the performance we want, and can, improve.

Maybe we want to improve the early failure stage by flattening its slope and shortening its length. We would want to do this to decrease early failures in the hands of our customers. It can also shorten the burn-in time, if we’re doing burn-in. Burn-in is a manufacturing technique where we operate our product in-house through this early failure period before it ships. The purpose is to wear in the units and weed out the ones that would fail early. Stress screening methods are also used in production for these purposes. If burn-in isn’t something we can or want to do, we can focus on the process parameters used in our own proposed production or in our suppliers’ production. We can study and tighten those process parameters to flatten the slope of this early failure stage.
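
Why burn-in helps only when the early failure rate is decreasing can be shown with a Weibull model. Assuming an early-failure population with a hypothetical shape parameter beta = 0.5 and scale eta = 50,000 hours, the reliability of units that survived a burn-in of T0 hours is the conditional reliability R(T0 + t) / R(T0). The burn-in durations and the 500-hour field period are also hypothetical.

```python
import math


def weibull_reliability(t: float, beta: float, eta: float) -> float:
    """Weibull reliability R(t) = exp(-(t / eta) ** beta)."""
    return math.exp(-((t / eta) ** beta))


def reliability_after_burn_in(mission_hours: float, burn_in_hours: float,
                              beta: float, eta: float) -> float:
    """Conditional reliability R(T0 + t) / R(T0) for units that survived burn-in."""
    return (weibull_reliability(burn_in_hours + mission_hours, beta, eta)
            / weibull_reliability(burn_in_hours, beta, eta))


BETA, ETA = 0.5, 50_000.0  # hypothetical early-failure population (beta < 1)
MISSION = 500.0            # hypothetical first 500 hours in the customer's hands

for burn_in in (0.0, 24.0, 48.0, 96.0):
    r = reliability_after_burn_in(MISSION, burn_in, BETA, ETA)
    print(f"burn-in {burn_in:>4.0f} h -> reliability over first {MISSION:.0f} field hours = {r:.4f}")
```

With beta < 1, longer burn-in raises the field reliability of the units that survive it. With beta = 1 (a constant rate) the result doesn’t change at all, and with beta > 1 burn-in only uses up life, which is why it targets the early failure stage.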

Instead, or maybe in addition, we can improve the useful life stage of our product’s failure rate curve. We can decrease the overall failure rate. Doing this increases the life expectancy of our product and reduces failures in the field. We can do this by analyzing failure data. If we understand the most frequent failure types, we may be able to change the product design and production process design to decrease the failure rate. Changes to our design could include redundancy, derating, or a different design solution. A way to improve the weakest link of our design is to perform HALT, highly accelerated life testing. This is an iterative test-redesign-test process that we can use to make our design more robust. I discuss HALT methods in a previous episode of this podcast.
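
Redundancy is one of the design changes mentioned here. Below is a minimal sketch of active parallel redundancy, assuming the units fail independently (no common-cause failures) and each has a constant failure rate; the unit rate and mission time are hypothetical.

```python
import math


def parallel_reliability(unit_reliabilities: list[float]) -> float:
    """Active parallel redundancy: the system fails only if every unit fails.

    Assumes independent unit failures (no common-cause failures).
    """
    prob_all_fail = 1.0
    for r in unit_reliabilities:
        prob_all_fail *= 1.0 - r
    return 1.0 - prob_all_fail


UNIT_RATE = 1.0e-4  # hypothetical failures per hour for a single unit
MISSION = 2_000.0   # hypothetical mission time in hours
r_unit = math.exp(-UNIT_RATE * MISSION)  # constant-rate (exponential) unit reliability

print(f"single unit:     R = {r_unit:.4f}")
print(f"two in parallel: R = {parallel_reliability([r_unit, r_unit]):.4f}")
```

The redundant pair is far more reliable over the mission, although its failure rate is no longer constant: it starts near zero and rises toward the single-unit rate as time goes on.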

With our bathtub curve, we may instead want to improve the wear-out stage: delay when it starts and flatten its slope. This extends the useful life of our product. We can do this by changing preventive maintenance, including component replacement schedules. If the Reliability Engineers have the data to understand the failure rate function, then perhaps maintenance can be scheduled to occur more often as the component ages.
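
One way a failure rate model can feed a replacement schedule: assuming a component’s wear-out follows a Weibull distribution with beta > 1, solve R(t) = exp(-(t / eta)^beta) for the age at which reliability falls to a chosen target, and replace the component before then. The seal, its parameters, and the targets below are hypothetical, and cost-based replacement models are a common alternative to this simple reliability-target rule.

```python
import math


def replacement_interval(target_reliability: float, beta: float, eta: float) -> float:
    """Age at which Weibull reliability falls to the target: t = eta * (-ln R) ** (1 / beta)."""
    if not 0.0 < target_reliability < 1.0:
        raise ValueError("target_reliability must be between 0 and 1")
    return eta * (-math.log(target_reliability)) ** (1.0 / beta)


# Hypothetical wear-out data for a seal: beta > 1 means an increasing failure rate.
BETA, ETA = 3.0, 8_000.0  # shape and characteristic life in hours (assumed)

for target in (0.99, 0.95, 0.90):
    hours = replacement_interval(target, BETA, ETA)
    print(f"replace by {hours:,.0f} h to keep R >= {target:.2f}")
```

If beta were 1 (a constant failure rate), replacing a working part early wouldn’t lower its failure rate, so preventive replacement pays off mainly in the wear-out stage.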

What if our design doesn’t fit with the bathtub curve? The ideas are the same. If we can gather some data about hazard or failure rates of the components and are able to make some assumptions, a Reliability Engineer can build out a model to help analyze the failure rate function. Even if the plot of our hazard function only looks like one or two phases of the bathtub curve, we can still use it to help us decide what types of actions to take to improve reliability of our system.

What if we’re evaluating software? Can we use reliability curves for that? Yes, there are typical curves for software, too. Instead of early life, useful life, and wear-out phases, we’ll look at a test phase and a useful life phase.

What is our insight to action today? Reliability Engineers can help with the early phases of a design in many ways. One of those ways is by analyzing the hazard rate, or failure rate, of our design choices. We can start to estimate the reliability of our system early in the design process. This gives us a sense of our design: is it going to meet the reliability requirements we have? If it doesn’t look good, the failure rate model could point to ways we can improve reliability, including production, design, and maintenance options. We can review our products’ reliability specs with our Reliability Engineering friends and get them involved in the concept phases of our design. Ask if they can evaluate the failure rate of the system and help us make the choices so we can design products that others love, for less.

Please visit this podcast blog and others at qualityduringdesign.com. Subscribe to the weekly newsletter to keep in touch. If you like this podcast or have a suggestion for an upcoming episode, let me know. You can find me at qualityduringdesign.com, on LinkedIn, or you could leave me a voicemail at 484-341-0238. This has been a production of Denney Enterprises. Thanks for listening!