Predictive Analytics - Implementation Strategies
- Paul
- Apr 30, 2020
- 3 min read
Introduction
This article focuses on predictive analytics: the forecasting of a future event based on historical data. At Cloudforest, we've been working on a cycles-to-failure prediction model for turbofan jet engines, using a publicly available dataset from NASA, and I wanted to share some of what we learned along the way with our audience.

Within the engineering industry, data is increasingly being used to predict system Remaining Useful Life (RUL), reducing unnecessary servicing and increasing uptime. Predictive analytics is a growth area worldwide, with businesses across a range of verticals targeting reductions in operating costs through the optimisation of system servicing and maintenance.
One of the big challenges of predicting failure in the engineering sector is the relatively small amount of model-suitable training data: most operators are averse to running multiple systems to failure, given the considerable expense this entails.
Developing a high-fidelity digital twin can help here: simulations can generate a 'sensor dataset' for various failure scenarios, and these datasets can be used to train a predictive algorithm. In fact, this is exactly what the NASA project did.
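To make the idea concrete, here's a minimal, purely illustrative sketch in Python. It is not the NASA simulator, and it is far simpler than a real digital twin: it just fabricates run-to-failure 'sensor' traces with an exponential degradation signature plus noise, in the same spirit as a simulated training set.

```python
import numpy as np
import pandas as pd

def simulate_run_to_failure(n_engines=100, seed=0):
    """Generate toy 'sensor' traces for engines run to failure.

    Each engine lasts a random number of cycles; a single sensor
    reading drifts exponentially towards failure, plus noise. A real
    digital twin would model the underlying physics properly.
    """
    rng = np.random.default_rng(seed)
    rows = []
    for unit in range(1, n_engines + 1):
        life = rng.integers(150, 300)            # cycles to failure
        for cycle in range(1, life + 1):
            wear = np.exp(5.0 * cycle / life)    # degradation signature
            sensor = 520.0 + 0.1 * wear + rng.normal(0, 0.5)
            rows.append((unit, cycle, life - cycle, sensor))
    return pd.DataFrame(rows, columns=["unit", "cycle", "RUL", "sensor_1"])

df_sim = simulate_run_to_failure()
print(df_sim.head())
```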
Predictive Analytics - Approach Strategies
So how can organisations start bringing predictive analytics into their engineering programmes? Projects typically break down into three key stages: Pre-algorithm Foundations (Data Collection), Algorithm Development, and Algorithm Deployment & Improvement.

Pre-algorithm Foundations
Predictive algorithms are trained on data, and this data is needed before modelling work can begin.
Data principally comes from two sources:
- Real-world data, sourced from systems that have failed. This can be costly and time-consuming to acquire, and hardware is often serviced before the failure event, limiting the predictive value of the data.
- Simulation data, where a simulation generates 'sensor' outputs representative of an oncoming failure event. Whilst this circumvents the cost and complexity of physical testing, multi-domain simulations are complex to build and validate.
In either case, an organisation needs to make a strategic decision about where to source the training data. Factoring this decision early into the programme creates time for meaningful, high-quality data to be gathered.
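As an example of what the data-collection stage hands over to the modelling team, here's how the publicly available NASA turbofan files can be loaded and labelled with pandas. The file name and column layout below assume the standard whitespace-delimited C-MAPSS format (engine unit, cycle count, three operational settings, then 21 sensor channels):

```python
import pandas as pd

# C-MAPSS column layout: unit, cycle, 3 op settings, 21 sensors.
cols = (["unit", "cycle"]
        + [f"op_setting_{i}" for i in range(1, 4)]
        + [f"sensor_{i}" for i in range(1, 22)])

df = pd.read_csv("train_FD001.txt", sep=r"\s+", header=None, names=cols)

# Training label: remaining useful life = cycles left before that
# engine's final recorded cycle (training engines are run to failure).
df["RUL"] = df.groupby("unit")["cycle"].transform("max") - df["cycle"]
```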
Algorithm Development
Once training data has been sourced, algorithm development can begin. The primary goal of the algorithm is to predict when a system will fail over a variety of operating conditions, and independently of manufacturing and assembly variation.
Machine Learning (ML) is a powerful tool for developing a predictive algorithm, and there are a number of model types that can be rapidly trained and evaluated, from simple regression models to different types of neural networks.

The modelling approach starts with a simple ML technique such as linear regression to establish a baseline performance figure, against which other types of model (such as neural networks) and tuning parameters can be evaluated.
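Continuing from the labelled dataframe above, a baseline might look something like the following scikit-learn sketch, which holds out whole engines (to avoid leakage between cycles of the same engine) and scores a linear regression. The split parameters and error metric are illustrative choices, not our exact setup:

```python
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import GroupShuffleSplit

# df: the labelled C-MAPSS frame from the loading sketch above.
features = [c for c in df.columns if c.startswith(("op_setting", "sensor"))]

# Hold out entire engines rather than random rows.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, test_idx = next(splitter.split(df, groups=df["unit"]))
train, test = df.iloc[train_idx], df.iloc[test_idx]

baseline = LinearRegression().fit(train[features], train["RUL"])
pred = baseline.predict(test[features])
print(f"Baseline MAE: {mean_absolute_error(test['RUL'], pred):.1f} cycles")
```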
On the NASA training dataset, our initial models predicted cycles to failure with an error of around 34%, and our best-performing model brought this down to about 18%:

[Fig. 4: comparison of prediction error across the models evaluated]
The PCA referred to in Fig. 4 stands for Principal Component Analysis, a dimensionality-reduction technique that compresses many correlated sensor channels into a few informative features.
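As a hedged illustration of how PCA slots into the pipeline (the 95% variance threshold below is an arbitrary choice for the sketch, not the figure our models used), the sensor channels from the split above can be standardised and compressed before being fed to a regressor:

```python
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Standardise, then keep enough principal components to explain 95%
# of the variance; the compressed features feed the downstream model.
pca_pipeline = make_pipeline(StandardScaler(), PCA(n_components=0.95))
train_pcs = pca_pipeline.fit_transform(train[features])
test_pcs = pca_pipeline.transform(test[features])
print(f"Sensors compressed to {train_pcs.shape[1]} components")
```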
Whilst the best-performing model could predict failure to within a few tens of engine cycles - accurate enough to support a predictive maintenance programme - this can almost certainly be improved upon through further optimisation and improvements to the input data.
Algorithm Deployment
Once a predictive algorithm has been developed, the next steps are Deployment and Continuous Improvement.
Model deployment involves provisioning a service through which the wider engineering, production and deployment team can request and receive RUL estimates for supplied usage data. This usually takes the form of a web service, with users sending data over the internet and receiving a prediction from the algorithm in return.
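A minimal sketch of such a service, assuming a model persisted with joblib and a made-up /predict endpoint (the file name and payload shape here are illustrative, not our production setup), might look like this with FastAPI:

```python
import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = joblib.load("rul_model.joblib")  # hypothetical persisted model

class UsageData(BaseModel):
    # One snapshot of operating settings and sensor readings.
    op_settings: list[float]
    sensors: list[float]

@app.post("/predict")
def predict_rul(data: UsageData):
    features = [data.op_settings + data.sensors]
    rul = model.predict(features)[0]
    return {"estimated_rul_cycles": float(rul)}
```

In practice the service would also need authentication, validation of inputs against expected sensor ranges, and logging of requests to feed later model improvement.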
At the same time, work continues to improve the speed and accuracy of the algorithm, by collecting further real-world data, comparing new modelling approaches against the incumbent, and reworking the code to improve robustness. As new models are developed, they go through quality control and are provisioned for users.
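The 'compare against the incumbent' step can be as simple as a promotion gate on held-out error; a toy version (the function name and promotion criterion are illustrative) might look like:

```python
from sklearn.metrics import mean_absolute_error

def challenger_beats_incumbent(incumbent, challenger, X_test, y_test):
    """Promote a new model only if it improves held-out error."""
    mae_inc = mean_absolute_error(y_test, incumbent.predict(X_test))
    mae_cha = mean_absolute_error(y_test, challenger.predict(X_test))
    return mae_cha < mae_inc
```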
Summary
This month's post looked at how to get started with predictive analytics and Remaining Useful Life estimation, covering some of the common challenges and the specific strategies that can be used to develop a useful algorithm.
This is a growth area for data science in engineering applications, and creates opportunities for companies developing and operating products and systems to optimise service and maintenance activities.
The approach also generalises beyond failure prediction: predictive algorithms can be developed for any forecastable quantity, such as fuel consumption, system performance or a myriad of other interesting parameters.
I hope you found this month's post useful and informative. If you'd like to know more, please do get in touch.
Thanks,
Paul