
Detecting outliers in complex profiles using a χ² control chart method


The quality of products or manufacturing processes is sometimes characterized by profiles or functions. A method is proposed to identify outlier profiles among a set of complex profiles that are difficult to model with explicit functions. It treats profiles as vectors in a high-dimensional space and applies a χ² control chart to identify outliers. This method is useful in Statistical Process Control (SPC) in two ways: (i) identifying outliers in SPC baseline data; and (ii) the on-line monitoring of profiles. The method requires neither an explicit expression of the function between the response and explanatory variables nor the fitting of regression models. It is especially useful, and sometimes the only option, when profiles are very complex. Given a set of profiles (high-dimensional vectors), the median of these vectors is computed in each dimension to give a center vector. The variance among profiles is estimated by considering the pair-wise differences between profiles. A χ² statistic is derived to compare each profile to the center vector. A simulation experiment and manufacturing data are used to illustrate applications of the method. A comparison with an existing non-linear regression method shows that the proposed method performs better: it misidentifies fewer non-outlier profiles as outliers than the non-linear regression method, and misidentifies similarly small fractions of outlier profiles as non-outliers.

[Supplementary materials are available for this article. Go to the publisher's online edition of IIE Transactions for the following free supplemental resource: Appendix]

Keywords: Data mining, outlier detection, profile, statistical process control, χ² control chart

1. Introduction

In many Statistical Process Control (SPC) applications, a manufacturing process or product is characterized by a profile, i.e., responses as a function of one or more explanatory variables. Examples of profiles include the percent of a pharmaceutical dissolved as a function of time, and the density of a wood product as a function of the depth into the plank.

From the definition of profiles, one may expect to see smooth curves or hyper-planes depicting the functions. However, if we loosen the definition of a function by letting it take any form (even one that is not smooth or has no physical meaning), vectors can be transformed into profiles. This is done by taking the index of each dimension as the explanatory variable and the value in each dimension as the value of the response variable. In Albazzaz et al. (2005), high-dimensional vectors are transformed into profiles for detailed visual interpretation.

This leads us to ask whether we can reverse the process and transform profiles into vectors, by taking the index of each value of the explanatory variable as the dimension index and the value of the response variable as the value in that dimension. If so, vector analysis methods can be used to analyze profiles.
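A minimal sketch of both directions of the transformation, assuming a profile is stored as a Python dict that maps each explanatory-variable value to its response. The function names `profile_to_vector` and `vector_to_profile` and the sample temperatures are hypothetical, chosen only for illustration.

```python
def profile_to_vector(profile):
    """Profile -> vector: order the responses by explanatory-variable value,
    so the i-th smallest x value becomes dimension i."""
    return [profile[x] for x in sorted(profile)]


def vector_to_profile(vector):
    """Vector -> profile: use each dimension index as the explanatory variable."""
    return {i: value for i, value in enumerate(vector)}


# Hypothetical temperature profile observed at four positions (unordered input)
profile = {0.5: 181.0, 0.0: 175.0, 1.5: 190.0, 1.0: 186.0}
vec = profile_to_vector(profile)
print(vec)                     # [175.0, 181.0, 186.0, 190.0]
print(vector_to_profile(vec))  # {0: 175.0, 1: 181.0, 2: 186.0, 3: 190.0}
```

The round trip discards the original x values and keeps only their order, which is why all profiles must share the same fixed set of explanatory-variable values for their vectors to be comparable.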

We have worked with a real-world industrial oven process in which engineers use profiles and vectors interchangeably. The engineers believed that product quality was determined by the temperature profile that a product experienced in the oven. In this case, the profile is simply the temperature as a function of the 14 thermocouple locations. There is no explicit expression for this function; the profile is effectively a 14-dimensional vector.

Treating profiles as vectors is especially useful, and sometimes the only option, when profiles are highly complex. It is usually hard, if not impossible, to fit a regression function to express the complex relationship between the response and explanatory variables. In the remainder of this paper, we use profiles and vectors interchangeably.

By treating profiles as high-dimensional vectors, we apply a χ² control chart to identify outliers. We assume that all profiles are observed at the same fixed values of the explanatory variable, so that when profiles are treated as vectors, all vectors lie in the same space.

The application of the χ² control chart to identify outlier profiles is valuable in SPC in two ways.

1. It can be used in Phase I to identify and remove outliers in the baseline data, enabling the creation of a better model.

2. It can be used for on-line monitoring of processes in Phase II by determining whether a newly observed profile is different from the baseline profile, i.e., out of control.

The χ² control chart method works as follows. Given a set of profiles, we treat it as a set of vectors in a high-dimensional space. A central vector is derived by finding the median in each dimension. The variance among profiles is estimated by considering the pair-wise differences between profiles. Each profile is then compared to the central vector, and a χ² statistic is developed to measure their difference. If the χ² statistic exceeds a threshold value, the profile is labeled as an outlier.
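The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation, since this section does not give their exact estimator or control limit. The sketch assumes that (a) the variance in each dimension is half the mean squared pairwise difference, because for two independent profiles with common variance σ² in a dimension, E[(x_i − x_k)²] = 2σ²; (b) dimensions are treated as independent, so the statistic is a sum of squared standardized deviations with p degrees of freedom; and (c) the threshold is the upper-α χ² quantile, computed here with the Wilson-Hilferty approximation to avoid a SciPy dependency. All function names and the simulated data are hypothetical.

```python
import random
from itertools import combinations
from statistics import NormalDist, median


def center_vector(profiles):
    """Central vector: the median of the profiles in each dimension."""
    return [median(column) for column in zip(*profiles)]


def pairwise_variance(profiles):
    """ASSUMED estimator: per-dimension variance = mean squared pairwise
    difference / 2, since E[(x_i - x_k)^2] = 2 * sigma^2 for independent
    profiles. The paper's exact pairwise estimator may differ."""
    pairs = list(combinations(profiles, 2))
    p = len(profiles[0])
    return [sum((a[j] - b[j]) ** 2 for a, b in pairs) / (2 * len(pairs))
            for j in range(p)]


def chi2_statistic(profile, center, variance):
    """Sum over dimensions of squared deviations from the center, each
    standardized by that dimension's variance (assumes independence)."""
    return sum((x - m) ** 2 / v for x, m, v in zip(profile, center, variance))


def chi2_threshold(df, alpha=0.01):
    """Upper-alpha chi-square quantile via the Wilson-Hilferty approximation."""
    z = NormalDist().inv_cdf(1 - alpha)
    c = 2 / (9 * df)
    return df * (1 - c + z * c ** 0.5) ** 3


def flag_outliers(profiles, alpha=0.01):
    """Phase I style check: flag profiles whose statistic exceeds the limit."""
    center = center_vector(profiles)
    variance = pairwise_variance(profiles)
    limit = chi2_threshold(len(center), alpha)
    return [chi2_statistic(x, center, variance) > limit for x in profiles]


# Hypothetical data: 20 noisy copies of a smooth shape plus one shifted profile
random.seed(1)
base = [10 * (j / 20) ** 2 for j in range(21)]
profiles = [[b + random.gauss(0, 0.5) for b in base] for _ in range(20)]
profiles.append([b + 3.0 for b in base])
flags = flag_outliers(profiles)
print([i for i, f in enumerate(flags) if f])  # [20]: only the shifted profile
```

For Phase II monitoring, `chi2_statistic` can be applied to a newly observed profile using the center and variance estimated from the cleaned Phase I baseline. Outliers left in the baseline inflate this pairwise variance estimate, which makes the sketch conservative. A dimension with zero spread would also cause a division by zero, so real data may need a guard.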

We assume that there is only one response variable and one explanatory variable; however, the χ² control chart method can also be applied with one response variable and multiple explanatory variables. We also assume that the explanatory variable takes the same fixed set of values in all profiles. If this assumption is violated in practice, linear interpolation can be used to derive a profile dataset in which all profiles share a fixed set of values for the explanatory variable.
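The interpolation step can be sketched as follows, assuming each profile is given as strictly increasing x values with matching responses and that the common grid lies inside every profile's observed range. The helper `interpolate_profile` and the sample values are hypothetical. For one-dimensional data, NumPy's `numpy.interp(x, xp, fp)` performs the same linear interpolation. How the common grid is chosen (for example, regular spacing over the shared range) is left to the analyst.

```python
from bisect import bisect_left


def interpolate_profile(xs, ys, grid):
    """Linearly interpolate a profile observed at xs onto the grid points.

    Assumes xs is strictly increasing; grid points outside [xs[0], xs[-1]]
    raise ValueError because no extrapolation is attempted.
    """
    out = []
    for g in grid:
        if not xs[0] <= g <= xs[-1]:
            raise ValueError(f"grid point {g} outside observed range")
        k = bisect_left(xs, g)  # first index with xs[k] >= g
        if xs[k] == g:          # grid point coincides with an observation
            out.append(ys[k])
            continue
        x0, x1, y0, y1 = xs[k - 1], xs[k], ys[k - 1], ys[k]
        out.append(y0 + (y1 - y0) * (g - x0) / (x1 - x0))
    return out


# A profile observed at x = 0, 1, 3 mapped onto a shared grid
print(interpolate_profile([0.0, 1.0, 3.0], [2.0, 4.0, 8.0], [0.0, 0.5, 2.0, 3.0]))
# [2.0, 3.0, 6.0, 8.0]
```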

One may think that we can apply outlier detection methods from the data mining area, such as the local outlier factor method (Breunig et al., 2000). However, these methods usually require the number of vectors to be large compared to the number of dimensions. This requirement may not be satisfied by a profile baseline dataset such as the Vertical Density Profile data in Section 3, which has only 24 vectors in a 314-dimensional space.

In this paper we apply the χ² control chart method to simulated and real data. The simulated profiles are generated from a highly non-linear, complex equation. The χ² control chart method is able to identify outliers that are generally too high or too low relative to the preponderance of the profiles. It can also identify an outlier that is near "the middle of the pack" but has the wrong shape. The results of the χ² control chart method on simulated and real data also show that it performs well even when the dimension of the profile (the number of fixed values of the explanatory variable) is large.

When using simulated profiles, Type I and Type II errors are computed to measure the performance of the proposed χ² control chart method and to compare it with existing methods. Type I error is the percent of non-outlier profiles identified as outliers. Type II error is the percent of outlier profiles that are identified as non-outliers. In contrast, Mahmoud and Woodall (2004) assess the performance of several methods to detect outliers for linear profiles by considering the probability of identifying at least one outlier, regardless of the number present.
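With simulated profiles the true labels are known, so both error rates follow directly from the method's flags. The helper `error_rates` below is a hypothetical illustration, and the toy flags and labels are made up. They are not results from the paper.

```python
def error_rates(flags, is_outlier):
    """Return (Type I, Type II) error rates in percent.

    Type I:  percent of true non-outliers that were flagged as outliers.
    Type II: percent of true outliers that were not flagged.
    """
    normal_flags = [f for f, o in zip(flags, is_outlier) if not o]
    outlier_flags = [f for f, o in zip(flags, is_outlier) if o]
    type1 = 100 * sum(normal_flags) / len(normal_flags)
    type2 = 100 * sum(not f for f in outlier_flags) / len(outlier_flags)
    return type1, type2


flags      = [False, True, False, False, True, False, True, False]
is_outlier = [False, False, False, False, True, True, True, False]
type1, type2 = error_rates(flags, is_outlier)
print(f"Type I: {type1:.1f}%, Type II: {type2:.1f}%")  # Type I: 20.0%, Type II: 33.3%
```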

We also apply the χ² control chart method to data giving the density profile of a wood product as a function of the depth into the plank. These data were originally presented in Walker and Wright (2002) and are used in Williams et al. (2003) to test an outlier detection method based on nonlinear regression. In contrast to the method in Williams et al. (2003), the χ² control chart method identifies outliers that have the wrong shape even when they are masked by other profiles. Also, the χ² control chart method does not require qualitative judgment to determine the outliers, as the method in Williams et al. (2003) does.

There is a growing body of research about profiles. Regression-based methods fit an explicit model relating the response and explanatory variables and focus on the coefficients of the model to determine outliers. Other methods, including the χ² control chart method and wavelet transformations, do not create an explicit function and can be used when the profiles are complex and regression would involve too many regression parameters. The power to detect outliers drops significantly when the number of parameters is large, as discussed in Jeong et al. (2006).

Focusing on linear profiles, Mahmoud and Woodall (2004) compare their outlier detection method to those proposed by Stover and Brill (1998), Kang and Albin (2000) and Kim et al. (2003). Kang and Albin (2000) simultaneously monitor the slope and intercept of a linear profile with a T² chart. Kim et al. (2003) remove the correlation between the slope and intercept by coding X values such that the mean is zero and separately monitor the slope, intercept and error variance. Mahmoud and Woodall (2004) create two multivariate linear models: one gives the response as a function of the explanatory variable and the other includes an indicator function for each profile as additional explanatory variables. They conclude there are no outliers if the two models are not statistically significantly different. Mahmoud and Woodall (2004) compare these methods for linear profiles on simulated data and conclude that their own method and the method in Kim et al. (2003) perform best.
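Why coding X to mean zero removes the correlation can be seen from ordinary least squares. With coded values u = x − x̄, the intercept estimate is simply the mean response, and the covariance of the intercept and slope estimates is proportional to the mean of u, which is zero. The sketch below, with a hypothetical helper name, shows only this coding step. It does not implement the control charts of Kim et al. (2003).

```python
from statistics import fmean


def fit_centered_line(xs, ys):
    """Least-squares fit of y = b0 + b1 * (x - xbar), with x coded to mean zero.

    Because the coded values sum to zero, b0 is the mean of y and the
    slope is sum(u * y) / sum(u * u).
    """
    xbar = fmean(xs)
    u = [x - xbar for x in xs]
    b0 = fmean(ys)
    b1 = sum(ui * y for ui, y in zip(u, ys)) / sum(ui * ui for ui in u)
    return b0, b1


# Exact line y = 1 + 2x: the coded intercept is y at x = xbar = 3, i.e. 7
print(fit_centered_line([1, 2, 3, 4, 5], [3, 5, 7, 9, 11]))  # (7.0, 2.0)
```

The intercept in the original units is recovered as b0 − b1·x̄ (here 7 − 2·3 = 1). Each profile's coded intercept, slope and residual variance could then be charted separately.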

Considering non-linear profiles, Williams et al. (2003) detect outlier profiles by creating a non-linear regression model and identifying outliers with four T² charts. Jin and Shi (2001) and Lada et al. (2002) use wavelet transformations for non-linear profiles. They focus on a subset of coefficients chosen using engineering knowledge. Jeong et al. (2006) also use wavelet transformations but select the key coefficients with an adaptive procedure. Wavelet methods handle complex profiles well but can be somewhat difficult to interpret. Woodall et al. (2004) give a good overview of the literature on applying SPC to linear or non-linear profiles.
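To make the idea of "a subset of coefficients" concrete, the sketch below applies a generic orthonormal Haar wavelet transform to a profile and rebuilds it from only a few coefficients. It is an assumption-laden illustration. Haar is just one wavelet family, and neither the wavelets nor the selection rules of the cited papers (engineering knowledge, or the adaptive procedure of Jeong et al. (2006)) are reproduced. The largest-magnitude rule in `keep_largest` is a hypothetical stand-in, and all names and data are illustrative.

```python
import math

SQRT2 = math.sqrt(2)


def haar_dwt(signal):
    """Full orthonormal Haar decomposition (length must be a power of 2).

    Returns [overall approximation] + detail coefficients, coarsest first.
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of 2")
    approx, details = list(signal), []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        details = [(a - b) / SQRT2 for a, b in pairs] + details  # finer levels go last
        approx = [(a + b) / SQRT2 for a, b in pairs]
    return approx + details


def haar_idwt(coeffs):
    """Invert haar_dwt by undoing one level at a time, coarsest first."""
    approx, pos = coeffs[:1], 1
    while pos < len(coeffs):
        detail = coeffs[pos:pos + len(approx)]
        pos += len(approx)
        approx = [v for a, d in zip(approx, detail)
                  for v in ((a + d) / SQRT2, (a - d) / SQRT2)]
    return approx


def keep_largest(coeffs, k):
    """Hypothetical selection rule: zero all but the k largest-magnitude coefficients."""
    keep = set(sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)[:k])
    return [c if i in keep else 0.0 for i, c in enumerate(coeffs)]


profile = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]  # hypothetical 8-point profile
coeffs = haar_dwt(profile)
smoothed = haar_idwt(keep_largest(coeffs, 3))  # rough profile from 3 coefficients
exact = haar_idwt(coeffs)                      # equals the profile up to rounding
```

Because the transform is orthonormal, the sum of squared coefficients equals the sum of squared profile values. This is one reason coefficient magnitudes are a natural basis for selection.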

COPYRIGHT 2009 Institute of Industrial Engineers, Inc. (IIE) Reproduced with permission of the copyright holder. Further reproduction or distribution is prohibited without permission.


NOTE: All illustrations and photos have been removed from this article.
