Analytical Insights #5 - OT Data Ingestion to Cloud

Mar 6

This is part 5 of a 9-part educational series by Dave Shook, Microsoft Most Valuable Professional and Fusion’s Chief Data Officer, covering topics for asset-intensive industrial companies that need analytical insights to improve operations. This fifth part covers OT data ingestion to the cloud.

For your convenience, the video transcript is below:

Why do we want to move data to the cloud?

There are a number of different reasons.

You can lose the data if you don't keep it.
Much of the data about configuration, about changes to the control system's configuration, and even the instrument data itself has a finite lifetime. For compliance or long-term performance reasons, you might want to hold on to that data for a long period of time, and retaining data long-term in the cloud can be very inexpensive.
The cloud is a very useful place to combine data from multiple sources.
You don't have to worry about networking among different systems, because all the data has been brought up to the cloud. Also, some data from third-party systems, such as vibration monitoring systems, often lands in the cloud to begin with: it is transmitted directly from those systems through a cellular or direct network connection to the vendor's cloud. That data is already resident in the cloud.
You may need lots of compute.
On-site compute is owned; it is a capital asset, which means it is constrained. If you want to run a computationally heavy job, such as a simulation of an oil reservoir over a period of years using seismic data, you will need either a great many computers or a very long time. In the cloud, you can rent those CPUs only when you need them.
Some services are only available online.
Certain analytics and AI/ML services are really only available online. In that case, you have to get the data up to the cloud just to be able to use them.
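Of the reasons above, combining data from multiple sources is the easiest to illustrate. It usually means lining up time series that were sampled on different schedules. As a minimal sketch, assuming both sources share a common timestamp base and are sorted by time, an as-of join attaches to each historian sample the most recent vibration reading at or before it. The names Sample and asof_join, the tolerance parameter, and the sample values are hypothetical, for illustration only.

```python
from __future__ import annotations

from bisect import bisect_right
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Sample:
    timestamp: float  # seconds on a clock shared by both sources (assumed)
    value: float


def asof_join(
    left: list[Sample], right: list[Sample], tolerance: float
) -> list[tuple[float, float, Optional[float]]]:
    """For each left sample, attach the latest right sample at or before it.

    Both inputs must be sorted by timestamp. A right sample older than
    `tolerance` seconds is treated as missing (None).
    """
    right_times = [s.timestamp for s in right]
    joined = []
    for sample in left:
        # Index of the first right sample strictly later than this timestamp.
        idx = bisect_right(right_times, sample.timestamp)
        match = None
        if idx > 0:
            candidate = right[idx - 1]
            if sample.timestamp - candidate.timestamp <= tolerance:
                match = candidate.value
        joined.append((sample.timestamp, sample.value, match))
    return joined


if __name__ == "__main__":
    historian = [Sample(0.0, 101.2), Sample(60.0, 101.5), Sample(120.0, 101.9)]
    vibration = [Sample(55.0, 3.1), Sample(118.0, 3.4)]
    for row in asof_join(historian, vibration, tolerance=30.0):
        print(row)
```

With a 30-second tolerance, the first historian sample has no earlier vibration reading and gets None, while the samples at 60 s and 120 s pick up the readings taken at 55 s and 118 s. Doing this in one place is only practical once both sources have landed somewhere they can be queried together.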

Do you bring it up in real-time, or do you bring it up in a batch mechanism?

Near real-time data can be extremely useful. That is the streaming use case: as data is created on site, our data acquisition client reads it rapidly and pushes it up to the cloud so that it can be analyzed with low latency.

For example, say streaming delivers data from source to cloud in under a minute. That can be very effective if you're looking to do high-speed analytics, but industrially, most analytics that must run that fast in production tend to run on site in the DMZ, or even in the control system itself. So the value of bringing streaming data up to the cloud in an industrial environment depends on the use case.

Streaming also has a downside when you acquire data from a source such as a historian. If we are bringing data up once a minute or more often, the historian can receive late updates, that is, values with older timestamps that took longer than usual to reach it. Depending on the architecture and design of that particular system, those values may never show up in the streaming data set. Batch data transfer is typically done at longer intervals but has a better chance of delivering a complete data set. Practically speaking, you tend to need both streaming and batch for industrial data acquisition to the cloud.
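The late-arrival problem, and why a batch pass complements streaming, can be sketched in a few lines. This assumes, hypothetically, a streaming client that keeps a per-tag high-water mark and forwards only values newer than the last one it sent; that is one common polling design, not the behavior of any particular historian or product. A later batch pass over the same time window then finds the values the stream skipped. The class and function names and the tag FIC101.PV are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    tag: str
    timestamp: int  # source timestamp in seconds
    value: float


class StreamingReader:
    """Hypothetical streaming client that forwards only 'new' values per tag.

    A value that reaches the historian late, carrying a source timestamp
    older than one already forwarded, is skipped by this design.
    """

    def __init__(self) -> None:
        self.high_water: dict[str, int] = {}
        self.sent: list[Record] = []

    def on_arrival(self, record: Record) -> bool:
        last = self.high_water.get(record.tag)
        if last is not None and record.timestamp <= last:
            return False  # late arrival: silently dropped
        self.high_water[record.tag] = record.timestamp
        self.sent.append(record)
        return True


def batch_backfill(historian_window: list[Record], streamed: list[Record]) -> list[Record]:
    """Return records in the historian's window that the stream never delivered."""
    seen = {(r.tag, r.timestamp) for r in streamed}
    return [r for r in historian_window if (r.tag, r.timestamp) not in seen]


if __name__ == "__main__":
    reader = StreamingReader()
    # The 60 s value reaches the historian after the 120 s value.
    arrivals = [
        Record("FIC101.PV", 0, 10.0),
        Record("FIC101.PV", 120, 10.4),
        Record("FIC101.PV", 60, 10.2),
    ]
    for record in arrivals:
        reader.on_arrival(record)
    # Later, the batch job reads the full window back from the historian.
    window = sorted(arrivals, key=lambda r: r.timestamp)
    print(batch_backfill(window, reader.sent))
```

Running the demo, the stream delivers the 0 s and 120 s values, and the batch pass reports only the 60 s value as missing. That gap is the one a periodic batch transfer is there to close.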

How do you acquire this data?

MQTT
MQTT is a data transport protocol. It is relatively lightweight: it doesn't use much CPU, it isn't hard to program, and it doesn't impose a heavy burden on the network.
OPC
In contrast, OPC is much heavier but carries a lot more information about the source system, so we can think of it as more complete. The two tend to be used in different domains. OPC is used extensively where you have a good network, decent power out in the field, and intelligent machines: you've got power, both electrical and computing, and robust networks. MQTT, for its part, is widely used in the process industry SCADA world. Both protocols have strengths and weaknesses, but they are two of the most commonly used.
AMQP
AMQP was extensively used in banking and probably still is, but it never really caught on in the industrial IoT space.
Proprietary
There are also a number of proprietary protocols. Quite honestly, they are often a good fit for their purpose, but the problem with proprietary protocols is always interoperability.
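To make "lightweight" concrete for MQTT, here is a sketch of how an MQTT 3.1.1 PUBLISH packet with QoS 0 is framed: one byte for the packet type and flags, a variable-length "remaining length" field, a two-byte topic length, the topic itself, and the raw payload. The topic string and payload are hypothetical, and the sketch deliberately leaves out the CONNECT handshake, TCP/IP overhead, and any TLS layer.

```python
from __future__ import annotations


def encode_remaining_length(n: int) -> bytes:
    """MQTT variable-length integer: 7 bits per byte, high bit means 'more bytes follow'."""
    if not 0 <= n <= 268_435_455:  # largest value that fits in four bytes
        raise ValueError("remaining length out of range")
    out = bytearray()
    while True:
        byte = n % 128
        n //= 128
        if n > 0:
            byte |= 0x80  # continuation bit
        out.append(byte)
        if n == 0:
            return bytes(out)


def encode_publish_qos0(topic: str, payload: bytes) -> bytes:
    """Build an MQTT 3.1.1 PUBLISH packet with QoS 0 (no packet identifier)."""
    topic_bytes = topic.encode("utf-8")
    if len(topic_bytes) > 0xFFFF:
        raise ValueError("topic too long")
    # Variable header: two-byte big-endian topic length, then the topic.
    variable_header = len(topic_bytes).to_bytes(2, "big") + topic_bytes
    body = variable_header + payload
    # 0x30 = packet type 3 (PUBLISH) in the high nibble; DUP, QoS and RETAIN all 0.
    fixed_header = bytes([0x30]) + encode_remaining_length(len(body))
    return fixed_header + body


if __name__ == "__main__":
    packet = encode_publish_qos0("site1/unit2/FIC101/pv", b"10.4")
    print(len(packet), packet.hex())
```

For a 21-byte topic and a 4-byte payload the packet is 29 bytes, so the framing itself is only 4 bytes: the two-byte fixed header and the two-byte topic length. Richer metadata about the source, which OPC carries, has to be added by the application when using MQTT.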

Why not move data up to the cloud?

One concern is that moving data to the cloud opens up possible attack vectors into the operational environment. However, that security concern is actually fairly well managed now; these days the question is less about security and more about cost.

Consider the computing that happens down in the controller: the data is high-frequency (multiple values a second), the compute latency is low (multiple calculations a second), and the consequences of failure are significant. These three dimensions largely determine where you want to put compute. This is where the Purdue Reference Model comes in: as you move further away from the instrument, both the time span over which you make a decision and the scope of that decision increase. At the bottom, you are dealing with one instrument over milliseconds; as you move up to higher and higher levels, you reach broader and broader scopes and longer and longer cycles.
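The three dimensions in the paragraph above can be written down as a simple placement rule. This is a minimal sketch with hypothetical thresholds chosen only for illustration (one second of latency, one sample per second, one minute); real placement decisions depend on the site, the control system, and its risk assessment. The tier labels loosely follow the Purdue levels and are assumptions, as are the function and parameter names.

```python
from __future__ import annotations

from enum import Enum


class Tier(Enum):
    CONTROLLER = "controller (Purdue levels 0-1)"
    SITE = "on-site supervisory / DMZ (Purdue levels 2-3.5)"
    CLOUD = "enterprise / cloud (level 4 and above)"


def place_compute(
    samples_per_second: float, max_latency_s: float, high_consequence: bool
) -> Tier:
    """Pick a compute tier from data rate, required latency, and consequence of failure.

    All thresholds below are hypothetical and for illustration only.
    """
    # Fast or high-consequence logic stays next to the instrument.
    if high_consequence or max_latency_s < 1.0:
        return Tier.CONTROLLER
    # High-rate data or sub-minute answers stay on site.
    if samples_per_second >= 1.0 or max_latency_s < 60.0:
        return Tier.SITE
    # Slow, broad-scope work (model building, integrated decisions) can go up.
    return Tier.CLOUD


if __name__ == "__main__":
    print(place_compute(10.0, 0.1, True))        # a control loop
    print(place_compute(1 / 60, 30.0, False))     # a sub-minute alerting job
    print(place_compute(1 / 60, 3600.0, False))   # an hourly cross-site report
```

The checks run from the most to the least demanding tier, so a high-consequence calculation stays in the controller regardless of its data rate, and only work that is slow, low-rate, and tolerant of failure falls through to the cloud.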

So in the cloud, the data should primarily be used either to build or configure the calculations that will run on-premises, or to give people integrated data to support their decision-making.

Stay tuned for parts 6, 7, 8, and 9.

Dave Shook
