Helping Users Find Value in Unstructured Plant Data

Three types of plant data coexist, and unstructured data is not fully utilized

In present-day facilities, a large amount of data is created continuously, 24 hours a day and 7 days a week, by production equipment, control devices, process sensors and operators. Besides the growing volume of daily data, many facilities are also making serious efforts to digitalize existing hardcopy data, which can amount to several gigabytes. This data ranges from real-time numerical data from machines to text recorded by operators in logbooks. By collecting all of it, we can extract value in the form of what is known today as industrial big data.

In big data, we always talk about the 4 V’s [1]:

  1. Volume: the size of generated and stored data.
  2. Variety: the type and nature of the data.
  3. Velocity: the generation speed of data.
  4. Veracity: the quality and reliability of data.

Generally, big data can be divided into different types depending on whether it has a pre-defined data model. Data with a pre-defined data model is known as structured data. Data that follows only partial rules is semi-structured data. All other data, that is, data without any pre-defined rules, is known as unstructured data. The figure below compares structured and unstructured data from the viewpoint of the 4 V’s; semi-structured data is omitted there, so the comparison focuses on these two kinds.


Figure 1. The 4 characteristics of big data in a plant: structured data vs. unstructured data (semi-structured data omitted)


Let us take a closer look at these three types:

  1. Structured data consists of process control data and time-series data from sensors; basically, all types of data that can be stored in a relational database after a numerical conversion. Fortunately, with the development of PIMS and powerful data analysis and prediction tools, plant managers and operators can use this type of data to its full value to increase production efficiency.
  2. Semi-structured data follows some rules but is not fully structured. Examples include operation logs or incident reports in XML format and attribute data in JSON. This type of data is easy to exchange and interpret across different systems and organizations, but its core contents are almost always unstructured.
  3. Unstructured data consists of all textual data from operation logs and incident or maintenance reports, facility pictures taken for maintenance records, and even text alert messages or maintenance logs from the DCS and field devices. Unstructured data has no pre-defined data model; it is like a firsthand snapshot of the plant, showing only a particular moment in time. Without a structured or unified model, data analysis and statistics become a bottleneck to its full use. But with the many techniques that have developed alongside big data, such as text mining, image recognition and natural language processing, this type of data can also be considered a potential data source, chock-full of undiscovered patterns.
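
To make the distinction concrete, here is a minimal Python sketch of a semi-structured log entry. The XML layout, the tag and attribute names (log, equipment, note, shift) and the sample text are all invented for illustration; they do not come from any particular logbook product. The attributes can be read like structured fields, while the operator's note is still free text that would need text mining.

```python
import xml.etree.ElementTree as ET

# Hypothetical operation-log entry: tag names, attributes and values are assumptions.
SAMPLE_ENTRY = """
<log timestamp="2017-05-14T08:30:00" shift="day">
  <equipment id="P-101">Feed pump</equipment>
  <note>Unusual noise near the bearing housing, will monitor next shift.</note>
</log>
"""

def split_entry(xml_text):
    """Separate the rule-following fields from the free-text core."""
    root = ET.fromstring(xml_text.strip())
    structured = {
        "timestamp": root.get("timestamp"),
        "shift": root.get("shift"),
        "equipment_id": root.find("equipment").get("id"),
    }
    unstructured = root.find("note").text.strip()
    return structured, unstructured

fields, free_text = split_entry(SAMPLE_ENTRY)
print(fields)
print(free_text)
```

The wrapper is easy for another system to parse, but the part that carries the operator's observation remains a plain sentence.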

Figure 2. Importance of unstructured data. According to earlier research, much usable information may come from unstructured data.

Led by big data leaders such as Google and IBM, many internet companies have begun to provide analysis tools for unstructured data (e.g. pictures and textual data in SNS) for business applications. This trend also influences data analysis in the Operations Management of a processing plant, because a great amount of usable information may come from unstructured plant data [2].

Operations Management solutions must help users find value in unstructured data

As we enjoy daily lives full of smart devices and cloud technologies, huge amounts of unstructured data (photos, blogs, tweets, messages, etc.) are created on the internet every second, and many web companies have shown good examples of how to extract value from this kind of data. We can see this clearly when searching by keyword or using Google Trends, which can find the most popular topics in a specific area. It is also widely known that many marketing companies analyze what people tweet and share on social media, looking for valuable information and trends to learn more about their customers' preferences and decision-making patterns. These are common practices in the internet world, but the situation is a little different in a plant. Most unstructured data is saved on the intranet in different formats and styles, and most analysis-based solutions focus on structured process data, which is numerical and therefore more convenient to process and more reliable. Few applications provide solutions for the analysis or statistics of unstructured data in logbooks or reports, especially in the Operations Management area. The good news is that most plants are digitalizing their operation logs and reports, which lays the foundation for the high-value use of unstructured data.

Figure 3. The use of unstructured data becomes easier

It is true that a machine can provide more accurate and reliable information than operator-written text in a logbook or report, but in Operations Management the operator plays a key role. The operator's insight and experience cannot be replaced by machines and should be fully utilized, including what is captured in unstructured data. Fortunately, more powerful techniques for doing this emerge every year. By taking advantage of technologies like text mining, image recognition, machine learning and even AI, the untapped value hidden in logbooks, maintenance reports, photos, alert messages and even conversations between operators (if recorded) can be found and delivered to plant managers and operations leaders.

For example, text mining can find the topics or equipment that many operators have recently been reporting on, and can recognize text or word patterns in maintenance reports to predict possible failures. In addition, when a lot of alert messages are generated, text mining with pattern matching can extract the critical messages to help the operator make the right judgment. A positive/negative (sentiment) analysis of operation logs may also uncover potential operation problems in a plant early, before an incident occurs.
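
As a rough illustration of extracting critical messages with pattern matching, the Python sketch below filters a flood of alerts using regular expressions. The patterns, tag names and message texts are hypothetical; a real DCS would have its own message formats and its own definition of "critical".

```python
import re

# Hypothetical patterns for messages an operator should see first.
CRITICAL_PATTERNS = [
    re.compile(r"\btrip(ped)?\b", re.IGNORECASE),
    re.compile(r"\bhigh[- ]high\b", re.IGNORECASE),
    re.compile(r"\bfail(ed|ure)?\b", re.IGNORECASE),
]

def extract_critical(messages):
    """Return the messages that match at least one critical pattern, in order."""
    return [m for m in messages if any(p.search(m) for p in CRITICAL_PATTERNS)]

# Invented sample alerts for illustration only.
alerts = [
    "FIC-201 deviation low",
    "P-101 motor TRIPPED",
    "TI-305 high-high temperature",
    "LIC-110 returned to normal",
    "XV-402 valve failure to close",
]

critical = extract_critical(alerts)
for message in critical:
    print(message)
```

Keeping the patterns in one list makes it easy for plant staff to review and extend them without touching the filtering logic.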

Figure 4 shows an example of the historical trend of hot words for a particular production line. Specific keywords that correlate with failures (for example, “Noise”) may show up in a regular pattern, which can help the plant management team make a maintenance plan.


Figure 4. Keyword trend history of a production line, obtained with text trend analysis
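
A keyword trend of this kind can be approximated by counting, per month, how many log entries mention each keyword. The following Python sketch assumes log entries are available as (date, text) pairs; the entries, dates and keyword list are made up for illustration and are not the data behind the figure.

```python
from collections import Counter, defaultdict

# Hypothetical log entries: (ISO date, operator text).
LOGS = [
    ("2017-01-12", "Noise from conveyor motor"),
    ("2017-01-20", "Routine inspection, no issues"),
    ("2017-02-03", "Noise and vibration at gearbox"),
    ("2017-02-17", "Gearbox noise again, louder"),
    ("2017-03-02", "Gearbox replaced"),
]

def keyword_trend(logs, keywords):
    """Count, per month, how many log entries mention each keyword."""
    trend = defaultdict(Counter)
    for date, text in logs:
        month = date[:7]  # "YYYY-MM"
        words = set(text.lower().replace(",", " ").split())
        for kw in keywords:
            if kw in words:
                trend[month][kw] += 1
    return trend

trend = keyword_trend(LOGS, ["noise", "vibration"])
for month in sorted(trend):
    print(month, dict(trend[month]))
```

Plotting these monthly counts over a longer period would give a trend line per keyword, which is where a recurring pattern would become visible.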


Another case is that important reports or logs with valuable information are sometimes buried in large amounts of daily logs and reports, and, like a needle in a haystack, a manager may miss them. Consider a person who lives in the desert and watches the weather forecast every day: 99 times out of 100 the forecast will be “sunny tomorrow,” which is not very helpful, whereas the one forecast that says “tomorrow will be rainy” is understandably far more valuable. Similarly, techniques for textual data analysis can find the logs with important information and notify the plant manager ahead of time.
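
The desert analogy is close to the idea of self-information: a message that occurs with probability p carries -log2(p) bits, so rare messages score high. The Python sketch below applies that scoring to the forecast example. Treating identical text as the same message is a simplifying assumption; real log entries would first need to be grouped by topic or wording.

```python
import math
from collections import Counter

def surprise_scores(messages):
    """Score each distinct message by its self-information, -log2(frequency)."""
    counts = Counter(messages)
    total = len(messages)
    return {msg: -math.log2(n / total) for msg, n in counts.items()}

# The desert forecast from the analogy: 99 sunny days, 1 rainy day.
forecasts = ["sunny tomorrow"] * 99 + ["tomorrow will be rainy"]
scores = surprise_scores(forecasts)

# Print the most surprising (rarest) messages first.
for msg, bits in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{bits:5.2f} bits  {msg}")
```

Sorting logs by such a score is one simple way to push the unusual entries to the top of a manager's reading list.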

The image below shows an example of how a word cloud can give a visual, statistical summary of equipment lines A and B in a plant. It can help the management team grasp the character of a line at a glance.


Figure 5. An equipment log (text mining result shown as a word cloud)
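
Behind a word cloud there is usually just a table of word frequencies, with the most frequent words drawn largest. The Python sketch below computes such a table for two equipment lines. The log entries and the short stop-word list are assumptions for illustration, and the rendering step itself is left out.

```python
import re
from collections import Counter

# A small, assumed stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "at", "of", "on", "and", "in", "is", "was"}

def term_frequencies(entries, top_n=3):
    """Return the top_n most frequent non-stop-words across the entries."""
    words = []
    for entry in entries:
        words += [w for w in re.findall(r"[a-z]+", entry.lower())
                  if w not in STOP_WORDS]
    return Counter(words).most_common(top_n)

# Hypothetical log entries for two equipment lines.
line_a = ["Leak at the valve", "Valve leak repaired", "Valve stuck"]
line_b = ["Belt slipping", "Belt tension adjusted", "Noise on the belt"]

print("Line A:", term_frequencies(line_a))
print("Line B:", term_frequencies(line_b))
```

Even in this toy data the two lines show different dominant words, which is the kind of contrast a word cloud makes visible.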


The last case I will make for unstructured data analysis is the use of images taken for maintenance. Maintenance teams may take photos of the facility at regular intervals to check for corrosion and decide when to repaint. I believe that with image recognition technology and some machine learning, the corrosion ratio of each part of the facility can be calculated from photos, and a repaint plan can be made and added to the maintenance team’s to-do list. An order for painting materials could even be placed automatically. Google was able to teach a computer to recognize cat faces; I presume that having a machine recognize corrosion ratios from maintenance images would be much easier.
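
To show what a corrosion ratio could mean in practice, here is a deliberately naive Python sketch that flags rust-colored pixels with a fixed color threshold and reports their share of the image. This is not the image recognition or machine learning approach envisioned above, only a stand-in for it. The threshold values, the synthetic "photo" and the repaint trigger level are all assumptions.

```python
def is_rust_colored(pixel):
    """Very rough heuristic: reddish-brown pixels count as rust (an assumption)."""
    r, g, b = pixel
    return r > 120 and r > g * 1.3 and g > b * 1.2

def corrosion_ratio(pixels):
    """Fraction of pixels flagged as rust-colored."""
    pixels = list(pixels)
    if not pixels:
        return 0.0
    rusty = sum(1 for p in pixels if is_rust_colored(p))
    return rusty / len(pixels)

# A tiny synthetic "photo": 3 rust-brown pixels and 7 grey paint pixels (RGB).
RUST = (150, 75, 30)
PAINT = (128, 128, 128)
photo = [RUST] * 3 + [PAINT] * 7

ratio = corrosion_ratio(photo)
REPAINT_THRESHOLD = 0.25  # hypothetical trigger level
print(f"corrosion ratio: {ratio:.0%}, repaint: {ratio > REPAINT_THRESHOLD}")
```

A trained model would replace is_rust_colored, but the surrounding logic (compute a ratio per part, compare it with a threshold, create a to-do item) would stay much the same.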


Plant managers do not need to be data scientists to make full use of the unstructured data. With the commoditization of data analysis modules, AI, and machine learning tools, Operations Management solution providers are able to bring user-friendly and effective data analysis solutions at a reasonable cost.

By helping users process and analyze their unstructured data, an Operations Management solution will bring high-level business value out of the hidden data that is stored, which will be a big differentiator in how Operations Management solutions are marketed and sold in the future.

P.S. "Apple acquires AI company Lattice Data, a specialist in unstructured ‘dark data’, for $200M". It looks like the big players are also beginning to take the value of unstructured data seriously.


[1] Hilbert, Martin. "Big Data for Development: A Review of Promises and Challenges." Development Policy Review. Retrieved 2015-10-07.
[2] Barone, Linda; Monteleone, Mario; Silberztein, Max. Automatic Processing of Natural-Language Electronic Texts with NooJ. Springer, 2016.