How Can Data Analytics Improve Plant Investment Returns?
As with nearly all electricity generators, the financial business case for a solar plant is built on selling the generated power for more than the combined cost of building and running the plant, and any opportunity to improve how a plant runs can greatly enhance the investment return. Developments in data analytics and machine learning are opening up new avenues to realise these opportunities.
One way data analysis can help is by detecting when the plant isn't working efficiently (a panel may be degrading, an inverter may be faulty, or the photovoltaic cells may need cleaning) and enabling a maintenance intervention.
Another is predicting future generation, so that other grid systems (generators, storage, and large consumers) can be primed to make the best use of low-cost energy and smooth fluctuations in the grid.
At Cloudforest, we recently built an analytics solution to support solar power plant operations: maintenance-detection algorithms for finding faults and identifying panels in need of cleaning. We found challenges both in and around the raw datasets. Here's what we did, what we found, and what we learned.
What We Did
We analysed two months' worth of performance data (AC and DC power outputs) and weather data (including a suite of internal temperatures) from two solar arrays at a plant in India, searching the datasets for trends in panel performance and building models to forecast future output.
Preparing the Data
Although we live in a time of unparalleled data availability, you can still run into numerous quality issues when you try to use it: missing values, incorrect formats, and a lack of context can all cause projects to stumble.
An initial exploration of this project's data surfaced several issues: some entries were clearly erroneous (generation figures during hours of darkness, for example), and others were missing entirely. We were also confronted by a lack of context, such as maintenance and plant-operation records, which made trends and insights more ambiguous and uncertain.
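As an illustration, the snippet below sketches how such night-time readings might be dropped with pandas. The file name, column names, and fixed daylight window are all assumptions for the example, not details of the plant's actual schema:

```python
import pandas as pd

# Load the raw generation data (hypothetical file and column names).
df = pd.read_csv("plant_generation.csv", parse_dates=["DATE_TIME"])

# Flag rows that report generation during hours of darkness.
# A fixed 06:00-19:00 daylight window is a crude assumption;
# computed sunrise/sunset times for the site would be better.
hour = df["DATE_TIME"].dt.hour
is_dark = (hour < 6) | (hour >= 19)
erroneous = is_dark & (df["DC_POWER"] > 0)

print(f"Dropping {erroneous.sum()} suspect night-time rows")
df = df[~erroneous]
```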
Sometimes you are better off dropping questionable data and focusing on fewer, higher-quality metrics for modelling, which is what we did with this dataset. We also engineered extra features, such as hourly and daily performance metrics, to help spot abnormal and off-trend behaviour. It is often these engineered metrics that provide the most useful insights.
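Continuing the sketch above, hourly and daily metrics of this kind are straightforward to derive with pandas resampling (SOURCE_KEY here is a hypothetical panel identifier):

```python
# Continue from the cleaned frame `df` in the previous sketch,
# indexing by timestamp so we can resample per panel.
df = df.set_index("DATE_TIME")

# Hourly mean output per panel: useful for spotting short-lived dips.
hourly = (
    df.groupby("SOURCE_KEY")["DC_POWER"]
      .resample("1H")
      .mean()
      .rename("HOURLY_MEAN_DC")
)

# Daily total per panel: smooths out noise and highlights
# longer-term, off-trend behaviour such as gradual degradation.
daily = (
    df.groupby("SOURCE_KEY")["DC_POWER"]
      .resample("1D")
      .sum()
      .rename("DAILY_TOTAL_DC")
)
```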
Identifying Failing Panels
How do you start looking for evidence of failure in a system dataset? Principally, you apply a little domain-specific knowledge to develop some hypotheses, and then you set out to explore and test them.
For example, in this project we assumed that faulty panels would have zero output rather than low output (inverters tend to disconnect if the supply voltage drops too far), so we looked for panels with a large number of zero-output periods compared to the peer group. This type of problem lends itself well to clustering or classification routines, but we were actually able to visualise suspect panels with a straightforward scatter graph.
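A minimal sketch of that counting-and-plotting step, continuing with the hypothetical columns and daylight window from the earlier examples:

```python
import matplotlib.pyplot as plt

# Count daylight periods with zero output for each panel,
# using the timestamp-indexed frame from the earlier sketch.
daylight = df[(df.index.hour >= 6) & (df.index.hour < 19)]
zero_counts = (
    daylight.assign(is_zero=daylight["DC_POWER"] == 0)
            .groupby("SOURCE_KEY")["is_zero"]
            .sum()
)

# Panels well above the peer group stand out on a simple scatter.
plt.scatter(range(len(zero_counts)), zero_counts.values)
plt.xlabel("Panel index")
plt.ylabel("Zero-output daylight periods")
plt.title("Zero-output periods by panel")
plt.show()
```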
At this point, we ran into a challenge with the raw dataset: we had no maintenance records to verify whether these panels were actually faulty. However, we now have a prototype routine we can start using with real-time plant data, and tune as we gather maintenance information.
Identifying When Panels Need Cleaning
A similar, but subtly different, strategy can be applied to determine when panels need cleaning: rather than look for 'black sheep' panels with unique behaviour, we look for periods when the output of the peer group is lower than expected.
This involved training machine learning models to predict the mean output of the plant under given conditions, and then looking for large deviations in the datasets. We trained a simple linear regression algorithm as a baseline model, and added a deep neural network as a comparator.
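The baseline half of that setup might look something like the sketch below. The `weather` frame, its column names, and the 20% shortfall threshold are all assumptions for illustration; the deep-neural-network comparator is omitted:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical feature matrix: weather and temperature readings
# aligned with the plant's mean output for each time period.
X = weather[["IRRADIATION", "AMBIENT_TEMP", "MODULE_TEMP"]].values
y = weather["MEAN_DC_POWER"].values

# Hold out the most recent periods (no shuffling: time-ordered data).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, shuffle=False
)

# Baseline: linear regression of mean output on conditions.
baseline = LinearRegression().fit(X_train, y_train)
pred = baseline.predict(X_test)

# Flag periods where real output falls well below the prediction
# for effectively the same conditions; a 20% shortfall threshold
# is an illustrative choice, not a tuned one.
shortfall = (y_test - pred) / np.maximum(pred, 1e-6)
suspect = shortfall < -0.20
print(f"{suspect.sum()} periods flagged as underperforming")
```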
This approach seemed to work quite well: we were quickly able to identify periods where the real collective output was down compared to predictions for effectively the same conditions, and subsequently recovered (possibly due to cleaning?).
As with identifying failing panels, the lack of cleaning data hinders full validation, but we have been able to identify a group of panels that collectively underperform and then recover, which may be a response to something like a fixed cleaning cycle.
What We Learned
This project created some excellent learning outcomes in terms of working with, and thinking critically about, a limited dataset, because this is the type of data you work with all the time on real-world business and technology projects.
Data is very rarely perfect, and a considerable amount of time and resources will be spent processing and massaging the raw inputs; knowing how to do this efficiently and effectively to achieve project outcomes is hugely valuable.
Wrapping up the project, we identified some key learning points worth carrying forward into future work:
The bulk of the project was spent cleaning and configuring the dataset for analysis. This was the most important step to get right, and it was the foundation for all subsequent steps, including deeper analysis and predictive modelling.
Sometimes no amount of data cleaning or feature engineering will provide all the context needed to solve a problem; you need to step back and collect additional information about the problem at hand to make progress. Doing this frequently, as part of analysis sprints, significantly improves the efficiency and progression of a project.
Having data from secondary and tertiary sources (cleaning and maintenance records, for example) can play a crucial role in developing predictions, and should receive appropriate focus. Not all data needs to come from digital sensors; sometimes written records and information are just as valuable.
You can find out more about the project, including the original source code, on the Cloudforest GitHub page: https://github.com/CloudforestTechnologies/solar-power-generation-project
In a future post, I'll go over some of the other common problems that can be encountered during these types of data projects, together with solution options.
As always, thanks for reading.
Paul