Just by eyeballing the plot above, we can see a central cluster of part signatures, a few scattered signatures on the outskirts, and some that lie way far off. The ones that are way far off are our anomalies. The ones on the outskirts may reflect measurement error or minor deviations from normal machining activity, such as a slight change in load or spindle speed caused by ambient conditions.
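To make the idea of "far from the central cluster" concrete, here is a minimal Python sketch, assuming each part signature is one row of equal-length samples. It projects the raw signatures onto their first two principal components using NumPy's SVD and scores each part by its distance from the median of the 2D cloud. This is a swapped-in simplification: it does not reproduce the anomalous package's feature extraction or its density-based scoring, and the synthetic data, the injected spike, and the helper names `pca_2d_scores` and `distance_from_center` are all hypothetical.

```python
import numpy as np

def pca_2d_scores(signatures: np.ndarray) -> np.ndarray:
    """Project each row (one part signature) onto its first two principal components."""
    centered = signatures - signatures.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

def distance_from_center(scores_2d: np.ndarray) -> np.ndarray:
    """Euclidean distance of each point from the median of the 2D cloud."""
    # The median is used so that the anomalies themselves barely move the center.
    center = np.median(scores_2d, axis=0)
    return np.linalg.norm(scores_2d - center, axis=1)

# Hypothetical data: 50 normal signatures plus one with an injected spike.
rng = np.random.default_rng(0)
t = np.linspace(0, 4 * np.pi, 200)
signatures = np.sin(t) + rng.normal(0, 0.05, size=(51, t.size))
anomaly_idx = 50
signatures[anomaly_idx, 100:110] += 5.0

d = distance_from_center(pca_2d_scores(signatures))
print("most anomalous part:", int(np.argmax(d)))
```

On this synthetic set, the part with the injected spike should have by far the largest distance. Using the median rather than the mean as the center keeps a handful of extreme parts from dragging the center toward themselves.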
Why didn’t we transform the data?
We also tested several transformations of our data: the log, rolling means, rolling standard deviations, and the first derivative. We found that the untransformed signature is the most effective input for detecting anomalies, where "most effective" means that the true anomalies are separated furthest from the other points in 2D PCA space. This also makes sense theoretically:
- the log of each series compresses its spikes, which could discard critical information
- the rolling mean does the same, smoothing away key features that the anomalous package detects
- the rolling standard deviation may neglect important features while amplifying variance-related attributes, which may not be what we want
- the first derivative may accentuate relatively minor changes in the signature, obscuring its true nature
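The theoretical argument above can be illustrated with a small Python sketch, assuming a strictly positive load signature and a 10-sample rolling window (both hypothetical choices). It applies the four transformations to a synthetic trace containing a short spike and compares a crude peak-to-baseline ratio (maximum divided by median) for each. The helper names and the ratio itself are illustrative assumptions, not part of the original analysis. The first derivative is left out of the comparison because it hovers around zero, which makes a ratio to its median meaningless.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def transform(signature: np.ndarray, window: int = 10) -> dict:
    """Return the raw signature and the four candidate transformations."""
    windows = sliding_window_view(signature, window)  # shape (n - window + 1, window)
    return {
        "raw": signature,
        "log": np.log(signature),  # requires strictly positive values
        "rolling_mean": windows.mean(axis=1),
        "rolling_std": windows.std(axis=1),
        "first_derivative": np.diff(signature),
    }

def peak_to_baseline(x: np.ndarray) -> float:
    """How far the largest value stands above the typical (median) level, as a ratio."""
    return float(x.max() / np.median(x))

# Hypothetical signature: a load trace near 10 units with a 3-sample spike to about 30.
rng = np.random.default_rng(1)
signature = 10.0 + rng.normal(0, 0.1, 200)
signature[100:103] += 20.0

series = transform(signature)
# First derivative omitted here: its median is near zero, so the ratio is meaningless.
ratios = {name: peak_to_baseline(series[name])
          for name in ("raw", "log", "rolling_mean", "rolling_std")}
for name, ratio in ratios.items():
    print(f"{name:>13}: {ratio:.2f}")
```

On this trace, the log and rolling-mean ratios come out well below the raw ratio (the spike is flattened), while the rolling standard deviation's ratio is far larger than the raw one (variance is amplified).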