Saturday, July 20, 2013

From Outliers to Process Anomalies: Predictive SPC. Part 1

Predictive SPC. Part 1
Statistical Process Control (SPC) is a well-described framework used to identify weak points in a process and to predict its probability of failure.  The distribution parameters of process metrics have been translated into process capability, which evolved in the 1990s into the Six Sigma methodology in a number of incarnations. However, all techniques derived for SPC have two important weaknesses: they assume that the process metric is in a steady state, and they assume that it is normally distributed or can be transformed to a normal distribution.  The concepts and ideas outlined here make it possible to overcome these two shortcomings. This method has been developed and validated in collaboration with Josep Ferrandiz.  This post starts a series on Predictive SPC.



Definitions are here

Part 1

State of the Art Today: Axioms and Assumptions

There are a number of excellent references on SPC in the literature and in cyberspace, from George Box's classic books to a variety of Six Sigma web sites, including www.isixsigma.com.


Any reasonable cost-benefit analysis of using SPC says in no uncertain terms that when done right, it pays to keep the mean of the process on the target, and the control limits within the specification limits.  


Six Sigma is merely a goal that, according to its champions, every organization that cares about the quality of its products should pursue. This goal states that the variance of the processes should be such that there will be six standard deviations between the process target and either one of the specification limits.  That translates into 99.9997 percent of the product being within the specification limits, meaning that no more than 3 in a million units will be defective.
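
As a quick sanity check of this arithmetic, here is a minimal Python sketch.  It assumes the conventional Six Sigma allowance for a 1.5-sigma long-term drift of the mean (a convention not stated in the text), which is where the familiar "no more than 3-4 per million" figure comes from:

```python
from scipy.stats import norm

def dpmo_at_sigma_level(k, shift=1.5):
    """Defects per million opportunities when the spec limits sit k sigmas
    from the target and the mean is allowed to drift by `shift` sigmas
    (the 1.5-sigma drift is a Six Sigma convention, assumed here)."""
    p_outside = norm.sf(k - shift) + norm.cdf(-(k + shift))  # both tails
    return p_outside * 1e6

print(dpmo_at_sigma_level(6.0))  # ~3.4 DPMO, i.e. "no more than 3-4 in a million"
print(dpmo_at_sigma_level(3.0))  # ~66,800 DPMO at the 3-sigma level
```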


Extending this statement brings us to an important concept.  If each unit of product is one opportunity for a defect, then with six standard deviations we expect no more than 3 defects per million opportunities, or 3 DPMO.  But if each unit represents multiple opportunities to be considered defective, then the DPMO rating will change.


For example, suppose we are measuring the three dimensions of a box, and each dimension has the same specification limits.  If we are trying to fit six standard deviations into the specification limits of each dimension, we are talking about 3 defects per 3 million opportunities (one for each dimension), or 1 DPMO: the unit is defective if any one of the dimensions is off.  In other words, when we judge the defectiveness of an item by any one of many dimensions, we can relax the requirement on the number of standard deviations, going, e.g., from 6 Sigma to 3.
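
The DPMO bookkeeping in the box example fits in a couple of lines; a sketch (the counts are illustrative only):

```python
def dpmo(defects, units, opportunities_per_unit=1):
    """Defects per million opportunities: the denominator grows with the
    number of ways each unit can be judged defective."""
    return 1e6 * defects / (units * opportunities_per_unit)

# 3 defective units out of 1 million boxes, judged on 3 dimensions each:
print(dpmo(defects=3, units=1_000_000, opportunities_per_unit=3))  # 1.0 DPMO
```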

Specification Limits and Target

Watch out:  Axioms and Assumptions (A&As)!

The goal of process control is to ensure that the process metric distribution stays within specifications.


Nonparametric analysis is usually computationally complex.  For this and other, mostly historical, reasons (the rivalry between Fisherian and Bayesian statistics being a large part of them), the primary methodology for SPC has evolved around the simplifying assumption of a normal distribution, characterized by only two parameters: the mean and the standard deviation.


It is a convenient simplification that works in most cases, especially considering that many skewed distributions can be made approximately normal by applying the Box-Cox or other standard transformations, and that random sampling puts the Central Limit Theorem on the analyst's side, driving the distribution of sample means toward normal.
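
For instance, a right-skewed metric can often be brought close to normal with a Box-Cox transformation; a minimal SciPy sketch (the lognormal data here are simulated purely for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
skewed = rng.lognormal(mean=1.0, sigma=0.7, size=2000)  # a right-skewed "process metric"

# Box-Cox requires strictly positive data; it returns the transformed series
# and the lambda that makes the result as close to normal as possible.
transformed, lam = stats.boxcox(skewed)

print(f"lambda = {lam:.3f}")
print("skewness before:", stats.skew(skewed).round(3))
print("skewness after: ", stats.skew(transformed).round(3))
```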


This seemingly innocuous abstraction is very powerful, as it has made it possible to break the vast majority of SPC problems down into three categories:


1. Are we concerned about being off target?
2. Are we concerned about being outside the specification limits?
3. Are we concerned about both?


Answers to these questions are critical, because different statistical techniques have been developed for evaluating the processes, and care must be taken to use the right methods for each.
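
To make "different techniques for different questions" concrete, here is a hedged sketch: a one-sample t-test addresses question 1 (are we off target?), while the Cp/Cpk capability indices address question 2 (are we outside the specification limits?).  The simulated metric, target, and specification limits below are made up for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
metric = rng.normal(loc=10.2, scale=0.5, size=200)   # simulated process metric
target, lsl, usl = 10.0, 8.5, 11.5                   # illustrative target and spec limits

# Question 1: are we off target?  (classic one-sample t-test)
t_stat, p_value = stats.ttest_1samp(metric, popmean=target)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

# Question 2: are we outside the spec limits?  (capability indices)
mu, sigma = metric.mean(), metric.std(ddof=1)
cp  = (usl - lsl) / (6 * sigma)                      # potential capability
cpk = min(usl - mu, mu - lsl) / (3 * sigma)          # capability, accounting for centering
print(f"Cp = {cp:.2f}, Cpk = {cpk:.2f}")
```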


But there always is a fourth question:
4. Are we concerned about something else?


More on A&As

SPC rests on a solid theoretical foundation, and any theory rests on some basic assumptions.  They may be implicit, but they are nevertheless always present.  Since the goal of SPC is to implement stochastic control of variables that have a significant random component, we have to be aware of the assumptions implicit in the standard statistical tests.


The main assumption is that the underlying distribution is normal (Gaussian).  If that is not true, then generally speaking, the T-test and the F-test will not work.
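
A quick way to see why is a small simulation: draw many small samples from a skewed (clearly non-Gaussian) population whose true mean we know, run the standard one-sample t-test against that true mean, and compare the observed false-alarm rate to the nominal 5%.  A sketch (the lognormal population and sample size are arbitrary choices for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_mean = np.exp(0.5)          # mean of a lognormal(0, 1) population
n, trials, alpha = 5, 20_000, 0.05

false_alarms = 0
for _ in range(trials):
    sample = rng.lognormal(mean=0.0, sigma=1.0, size=n)
    _, p = stats.ttest_1samp(sample, popmean=true_mean)
    false_alarms += p < alpha

# Under normality this would hover near 5%; for skewed data it drifts away.
print(f"observed false-alarm rate: {false_alarms / trials:.3f} (nominal {alpha})")
```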


There are ways to make a distribution appear normal, and there is the often-abused Central Limit Theorem; however, it is important to remember that even a successfully applied Box-Cox transformation does not change the fact that the underlying data are not Gaussian, and that the symmetrical bell curve does not describe such data (more on that, e.g., here and here).


From a practical standpoint, this means that we cannot blindly generalize the methods derived for "classic" SPC, and that before we draw any conclusions about the data's behavior, we always have to check the distribution for normality.
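
In practice, that check can be as simple as running a Shapiro-Wilk test on the metric before trusting any normal-theory control limits; a minimal sketch (the lognormal data are simulated for illustration):

```python
import numpy as np
from scipy import stats

def looks_normal(x, alpha=0.05):
    """Crude normality gate via the Shapiro-Wilk test.  A p-value below alpha
    is evidence against normality; a p-value above it is NOT proof of
    normality, only an absence of evidence against it."""
    stat, p = stats.shapiro(x)
    return p >= alpha, p

rng = np.random.default_rng(7)
ok, p = looks_normal(rng.lognormal(size=500))
print(f"normal-ish: {ok}, p = {p:.2e}")   # expect a rejection for lognormal data
```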


Parenthetically, even normality is not a guarantee of success: the samples have to be random and independent, and when it comes to sampling a time series, we run into the bigger issues of sample interarrival time distribution, distribution of sample durations, etc.


If we find (with or without the transformation) that the distribution is indeed normal, we can claim victory, because it means that the simple statistical tests derived by the classics will work, and they are what modern SPC is using.

A few more words about the Central Limit Theorem and distributions

The Central Limit Theorem (CLT) states that, no matter what the population's distribution is, the means of sufficiently large random samples taken from it will be approximately normally distributed.  It does not say that, thanks to the CLT, “big data” will assuredly be normally distributed.  Nor does it say that we do not need to worry about the actual shape of the population’s distribution.  Ultimately, it all boils down to being able to detect data behavior in the distribution's tails, and this is where the skewness and the kurtosis (tail-heaviness) of the distribution matter.


What the CLT does mean in practice, however, is that if we look only at the means of a sufficiently large number of sufficiently random samples from a data population, we can all too easily deceive ourselves into thinking that the population’s distribution is normal.
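
A sketch of that self-deception: the raw data come from an exponential (decidedly non-normal) population, yet the means of many random samples drawn from it typically sail through a normality test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2013)

# An exponential "population": heavily skewed, decidedly non-normal.
population = rng.exponential(scale=1.0, size=100_000)

# Means of many random samples drawn from it: the CLT pushes these toward normal.
sample_means = np.array([
    rng.choice(population, size=500, replace=False).mean() for _ in range(200)
])

print("raw data:     Shapiro p =", f"{stats.shapiro(population[:500])[1]:.1e}")  # rejected
print("sample means: Shapiro p =", f"{stats.shapiro(sample_means)[1]:.2f}")      # usually not rejected
```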

Example 1:

The Poisson distribution is known to describe a wide class of processes where we measure the frequency of events. For more details, see [1].  What it means in practice is that the interarrival times of events described by a Poisson distribution are distributed exponentially (we will not go through the mathematical proof of this statement here; any statistics textbook will have it).
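
A small simulation makes the connection tangible: generate exponentially distributed interarrival times, count the arrivals per unit interval, and the counts behave like a Poisson variable (equal mean and variance).  A sketch with an arbitrary rate:

```python
import numpy as np

rng = np.random.default_rng(3)
rate = 4.0                                   # arbitrary: 4 events per unit time
interarrival = rng.exponential(scale=1 / rate, size=200_000)
arrival_times = np.cumsum(interarrival)

# Count arrivals in each whole unit-time interval.
horizon = int(arrival_times[-1])
counts = np.histogram(arrival_times, bins=np.arange(horizon + 1))[0]

# For a Poisson distribution the mean and the variance are (nearly) equal.
print(f"mean of counts:     {counts.mean():.2f}")   # ~4.0
print(f"variance of counts: {counts.var():.2f}")    # ~4.0
```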


This exponential interarrival property is the basis of, e.g., Erlang’s models used in traffic analysis.  (Erlang’s models connect the offered traffic, the number of parallel lines to serve the traffic, and the blocking probability. Initially created for use in telephony, they have by now entered a wide variety of applications, from highway design to IT capacity planning.  For more details, see [Josep Ferrandiz, Alex Gilgur. Level of Service Based Capacity Planning. - 38th International Conference of the Computer Measurement Group (CMG’12), Las Vegas, NV, December 2012].)
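
For a flavor of what an Erlang model looks like, here is a minimal sketch of the standard recursion for the Erlang B blocking probability (the offered traffic and line count are made-up numbers):

```python
def erlang_b(offered_erlangs: float, lines: int) -> float:
    """Blocking probability for a given offered traffic (in Erlangs) and a
    number of parallel lines, via the standard numerically stable recursion."""
    b = 1.0
    for m in range(1, lines + 1):
        b = (offered_erlangs * b) / (m + offered_erlangs * b)
    return b

# E.g.: 10 Erlangs of offered traffic on 15 lines.
print(f"blocking probability: {erlang_b(10.0, 15):.4f}")
```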


However, if we have hundreds of transactions arriving and being processed within an hour, and the only metrics we have are the average arrival rate for the hour and the average processing time, we will be deceived into thinking that we cannot apply Erlang’s models - which is not necessarily true.  For more details, see, e.g., [Josep Ferrandiz, Alex Gilgur. Level of Service Based Capacity Planning. - 38th International Conference of the Computer Measurement Group (CMG’12), Las Vegas, NV, December 2012].


Finally (about A&As)

It is a useful deception, and we will use it to describe the SPC concepts.  Later in this series, though, we will see if we can take off the blinders and use the same principles with distributions that we know cannot be normal (or that we know nothing about).


Stay tuned! The next post will outline the Anatomy of an SPC Chart.

