Analytical Insights #2 - Operational Data Types

Part 2 of a 9-part educational series by Dave Shook, Microsoft Most Valuable Professional and Fusion’s Chief Data Officer, covering topics for asset-intensive industrial companies that need analytical insights to improve operations. This second part covers operational data types.

Read the video transcript below for your own convenience:

There are several specific types of operational data, and the fact that different ones exist, and how many there are, makes life a little bit complicated. The single most common type of data that people think about when they talk about operational data is what we call time series. Time series data is the data that comes from instruments or that goes out as commands. The thing about time series data is there is always an identifier that we typically refer to as a tag. It takes on values over time, so it's a time series, and it's always within the context of a single tag or possibly multiple tags together. But one time series is one identifier.

This data comes in packages consisting of an identifier, a timestamp, a value, and the quality. The quality tells us whether the value itself is of any use, and if not, why not. But it always comes in as this sort of bundle of data.

Alarms and events are quite different. They can be things like high temperatures, or they can be things like batch complete. The thing is that alarm and event data are not structured the same way. They don't have a single identifier and a single timestamp. Often these things will have a start time and an end time. The fact that an event is set up with an interval means that it's not a time series. Also, there'll be some sort of object (might be a batch, an instrument, or a piece of equipment), and there will often be some sort of topic, such as alarm, or even high alarm and low alarm. Then there'll be a bunch of values, and the values are a function of the topic. So we now get into this world where it’s more relational than this simple time series structure.
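A minimal Python sketch of the two shapes described above. The class names, field names, quality codes, and the example tag `TI-101.PV` are hypothetical illustrations, not the schema of any particular historian or alarm system.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Sample:
    """One time series value: identifier, timestamp, value, quality."""
    tag: str
    timestamp: datetime
    value: float
    quality: str = "good"  # hypothetical codes, e.g. "good", "bad:sensor_fault"

    def is_usable(self) -> bool:
        return self.quality == "good"


@dataclass
class EventRecord:
    """An alarm or event: an object, a topic, an interval, topic-specific values."""
    source_object: str               # batch, instrument, or piece of equipment
    topic: str                       # e.g. "high_alarm", "batch_complete"
    start: datetime
    end: Optional[datetime] = None   # None while the event is still active
    attributes: dict = field(default_factory=dict)  # contents depend on the topic

    def duration_seconds(self) -> Optional[float]:
        if self.end is None:
            return None
        return (self.end - self.start).total_seconds()


sample = Sample("TI-101.PV", datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc), 87.5)
alarm = EventRecord(
    source_object="TI-101",
    topic="high_alarm",
    start=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
    end=datetime(2024, 1, 1, 12, 5, tzinfo=timezone.utc),
    attributes={"limit": 85.0, "peak_value": 91.2},
)
```

The time series sample is flat and fixed, while the event record carries an interval and a free-form set of attributes keyed by topic, which is why it tends to fit a relational layout better.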

For example, look at a vibration monitoring system, which is more like an IoT device. What we have here is an identifier, and that'll be for the sensor or the location. But the problem is that it doesn't send raw data, because the data that comes in here might come in at 10,000 hertz, so 10,000 values a second. It doesn't make sense to send all that data up to the cloud. Instead, at some timestamp, it will put together a bundle of data, like an array of 2048 raw values, an FFT or power spectrum, which is 2048 outputs, plus a bunch of features. The features depend on configuration as well as the specific manufacturer and model. These are things like the total power or what frequencies the biggest peaks are at, that kind of thing. This is a much more complex data set than the alarms and events and far more complex than a simple time series. This is just one example of the type of complex data structure that can come up from operations.
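To make the bundle concrete, here is a small Python sketch of such a record. Everything in it is an assumption for illustration: the `VibrationRecord` layout, the sensor name, the two features chosen, and the naive Fourier transform (a real device would use an FFT and its own spectrum layout). The sketch keeps only the one-sided spectrum and uses 64 samples instead of 2048 so it runs quickly.

```python
import cmath
import math
from dataclasses import dataclass


@dataclass
class VibrationRecord:
    sensor_id: str
    timestamp: str          # ISO 8601 string, for simplicity
    sample_rate_hz: float
    raw: list               # raw waveform samples
    spectrum: list          # power per frequency bin, bins 0..n/2
    features: dict          # summary values derived from the waveform


def compute_power_spectrum(raw):
    """Naive O(n^2) discrete Fourier transform; returns power for bins 0..n/2."""
    n = len(raw)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(raw[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) ** 2 / n ** 2)
    return spectrum


def build_record(sensor_id, timestamp, sample_rate_hz, raw):
    spectrum = compute_power_spectrum(raw)
    # Skip bin 0 (the constant offset) when looking for the biggest peak.
    peak_bin = max(range(1, len(spectrum)), key=lambda k: spectrum[k])
    features = {
        "total_power": sum(x * x for x in raw) / len(raw),
        "peak_frequency_hz": peak_bin * sample_rate_hz / len(raw),
    }
    return VibrationRecord(sensor_id, timestamp, sample_rate_hz, raw, spectrum, features)


# 64 samples at 6400 Hz containing a pure 500 Hz tone.
raw = [math.sin(2 * math.pi * 500 * t / 6400) for t in range(64)]
record = build_record("PUMP1-DE-H", "2024-01-01T12:00:00Z", 6400.0, raw)
print(record.features)
```

One record therefore holds arrays and a configuration-dependent dictionary alongside the identifier and timestamp, which is what sets it apart from a plain time series sample.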

In addition to this, there is metadata. The simplest form of metadata for any given measurement is the engineering units. It matters whether this is in degrees Celsius or degrees Fahrenheit. It matters whether it's in feet per second, meters per second, and so on. There are also things like validity limits. A lot of instruments have hard limits on the minimum and maximum that they can measure accurately. Provenance, the origin of the data, is also important. This metadata is associated with any single measurement or value that comes in. It changes from time to time, so we need to retrieve it.
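As a rough sketch of how this metadata might be carried alongside a tag, the Python below keeps a dated history of metadata versions, since the metadata can change over time. The `TagMetadata` fields, the unit strings `degC` and `degF`, and the source name `DCS-A` are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TagMetadata:
    tag: str
    units: str                # engineering units, e.g. "degC"
    valid_min: float          # lowest value the instrument measures accurately
    valid_max: float          # highest value the instrument measures accurately
    source: str               # provenance: where the data originates
    effective_from: datetime  # when this version of the metadata took effect


def metadata_at(history, when):
    """Return the most recent metadata version in effect at `when`, or None."""
    candidates = [m for m in history if m.effective_from <= when]
    return max(candidates, key=lambda m: m.effective_from) if candidates else None


def within_limits(value, meta):
    return meta.valid_min <= value <= meta.valid_max


def to_celsius(value, units):
    if units == "degC":
        return value
    if units == "degF":
        return (value - 32.0) * 5.0 / 9.0
    raise ValueError(f"unsupported units: {units}")


# The same tag was reconfigured from Fahrenheit to Celsius in mid-2023.
history = [
    TagMetadata("TI-101.PV", "degF", 32.0, 400.0, "DCS-A", datetime(2020, 1, 1)),
    TagMetadata("TI-101.PV", "degC", 0.0, 200.0, "DCS-A", datetime(2023, 6, 1)),
]

meta = metadata_at(history, datetime(2022, 3, 15))
print(meta.units, to_celsius(212.0, meta.units))
```

Looking up the metadata by timestamp rather than taking the latest version matters here: a value of 212 recorded in 2022 and the same number recorded in 2024 mean different temperatures.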

And finally, there's something called context. The issue with context is this: if we look at this vibration monitoring system, it is configured to understand that these two vibration sensors are 90 degrees apart on a particular bearing, and it will probably be configured to know that these two bearings are on the same pump. It may be configured to know that these bearings on this pump are connected to a shaft from that motor, which also has these measurements.

That's as far as it goes. The context information within the vibration monitoring system knows nothing about the process context. It only knows about the equipment context for the equipment it is monitoring.

But for us as engineers or people analyzing what's going on, the process operating context is as important as the equipment context. The problem here is that each source system has a data model. But there's more than one, and they will overlap because they refer to the same physical environment. They are not necessarily configured in such a way that they know about one another, and they're not necessarily configured consistently. So this context information and these data models also need to be acquired and brought up to the cloud.
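A toy Python sketch of the overlap problem: two source systems describe the same pump under different names, and a mapping is needed before their context can be combined. All identifiers, field names, and the alias table are invented for illustration; in practice the mapping itself usually has to be built and maintained by someone.

```python
# Equipment context as a vibration monitoring system might hold it:
# which bearing and machine each sensor sits on, and at what angle.
equipment_context = {
    "VIB-001": {"bearing": "P-101-DE", "equipment": "P-101", "orientation_deg": 0},
    "VIB-002": {"bearing": "P-101-DE", "equipment": "P-101", "orientation_deg": 90},
}

# Process context from a different source system, which names the same
# physical pump differently and knows nothing about its bearings.
process_context = {
    "PUMP_101": {"process_unit": "Crude Feed", "service": "feed pump"},
}

# Hypothetical mapping that reconciles the two naming schemes.
equipment_aliases = {"P-101": "PUMP_101"}


def merged_context(sensor_id):
    """Combine equipment and process context for one sensor, where known."""
    equipment = equipment_context.get(sensor_id)
    if equipment is None:
        return None
    merged = dict(equipment)
    process_key = equipment_aliases.get(equipment["equipment"])
    merged.update(process_context.get(process_key, {}))
    return merged


print(merged_context("VIB-002"))
```

If the alias is missing, the sensor keeps only its equipment context, which mirrors the situation described above where one system knows nothing about the other.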

In summary, we’ve covered five quite distinct types of data: time series, alarms and events, complex sensor device records, metadata, and context.
