IT vs OT: Challenges and Opportunities – Art Conklin

Art Conklin

The differences between IT and OT teams and approaches both make life difficult and represent opportunities to improve industrial operations.

Dr. Art Conklin is the Director of the Center for Information Security Research and Education at the University of Houston.

READ HERE THE FULL TRANSCRIPT:

Intro: The Industrial Security podcast with Andrew Ginter and Nate Nelson, sponsored by Waterfall Security Solutions.

Nate: Welcome to the Industrial Security podcast, I’m Nate Nelson here with Andrew Ginter, vice president of Industrial Security at Waterfall Security Solutions. I’m going to give it to him, he’s going to introduce our guest for today. Andrew.

Andrew: Thanks, Nate. Hello, everyone, our guest for today is Professor Art Conklin, he is the director of the Center for Information Security Research and Education at the University of Houston, and he is a longtime specialist in industrial control system cyber-security.

Nate: Then without further ado, Art Conklin.

The subject of today’s episode, the convergence of OT with IT security. Art, my first question to you, in industry, we see a lot of integration of OT and IT security teams into a single team with corporate wide responsibilities. Is this a good thing or a bad thing?

Art: Well, here’s the thing, it’s happening and it’s happening across many different industries, it’s being driven by management. So, if you look at it from a management perspective, it’s a good thing. They have fewer senior people managing wider swaths and you’re combining 2 different types of security functions under a common eventual reporting structure at the very top. But it’s also potentially very bad because, from an operational perspective, these 2 systems are completely different. They have different security objectives, and because of these different objectives, the OT security team operates completely differently than an IT security team.

Nate: So, Andrew, I asked this of Art now I ask it of you, is IT/OT integration a good thing or a bad thing?

Andrew: Well, it’s not so much that it’s a good thing or a bad thing, it’s a thing that’s happening. The Gartner Group coined the phrase back in I think it was ‘95 or so, and back then, this was 20 years ago, they observed that even back then, increasingly, industrial control systems were using the same technology as IT systems, the same computers. Back then, this was the day of Solaris on SPARC and HP-UX on PA-RISC, there were a lot of CPUs in play. Nowadays, I’m sorry, it’s all Intel or maybe AMD. It’s the same hardware, it’s the same operating systems, it’s a lot of the same platform software; the same databases, the same applications. And so, when there’s all of this technology congruence coming, the Gartner Group observed that it makes sense to have the people who are managing all of this do it the same way. So, they suggested integrating the people because the skill sets are the same, integrating the business processes, the purchasing processes. If we have a corporate license for Oracle, why does it make sense to buy a SQL Server when we need a relational database in the plant? And so, they observed that, because the technology was converging, it makes sense to converge the people as well. It was very controversial in the day, but they have been proven right, it is happening a lot.

Nate: I see. The next question I asked Art was about response protocols, how all this fits into that picture. Let’s hear what he had to say.

You mentioned that IT and OT systems are fundamentally different. How does this affect each of their respective security response protocols?

Art: Oh, it matters a ton. First and foremost, the most important thing to understand is that in the world of operational technology, that’s SCADA, that’s any cyber-physical system, any place where computers are controlling actual equipment, your number 1 problem is that your operational technology, your IT function so to speak, is part of that process. It’s not ancillary, it is part of the running process. If it stops working, the process can stop working. If it goes awry, the process can go awry. And so, we’ve seen mega-disasters because of this, pipeline explosions, things like that, solely because, oops, the IT function didn’t quite work as planned. The other aspect about how they’re different is different protocols. OT operates under a completely different set of protocols than the IT function, and with those protocols come things inherent to the protocols. They operate with different architectures. Why different architectures? Because they have different security objectives. And I think this is the biggest one: OT has one set of things they’re trying to achieve while IT has a completely different set.

Nate: So, that sounds like a lot of differences, I thought the whole point here was that there were commonalities that we can take advantage of.

Andrew: Well, it is a lot of differences, but if you think about the differences that Art was pointing out, he’s pointing out differences in priorities. He’s talking about differences in objectives, he’s talking about different architectures which is different ways of arranging the same technology. He’s not talking about technology differences. The Gartner Group pointed out that IT/OT integration is coming because the technology is the same. I think the lesson here, this is the lesson of any organization doing IT/OT integration, we have to deal with these differences in order to reap the benefits of commonizing the technology.

Nate: And Art expanded on what he was talking about, let’s return to him.

Can you speak more to the differences in security objectives with OT and IT?

Art: Oh, sure. IT security objectives, I think pretty much everybody in the world by this point is used to what they are, it’s built around the concept of CIA, confidentiality, integrity and availability of your data. It’s really built around the movement of data, what moves the data, things about the data. In the operational technology world, it’s a completely different aspect. They define something as being secure if it’s running in a safe and optimal manner, “Is the process doing what the process is supposed to be doing safely?” And that safety factor is something that’s completely separate from the IT world. Nobody died from missing an email, although, okay, maybe you didn’t get the email, you didn’t get the job you wanted, so you jumped out your window. It’s a little different than if your signal showed up 2 milliseconds late, so your pipeline explodes. In the OT world, because the operational technology is an integrated part of the physical system, when it goes down, your physical system can go down. When it goes wrong, your physical system can go wrong. And we’ve got lots of documentation of things where, because a signaling function on the operational technology network was in error, things get broken, things get damaged, you can hurt people, you can hurt revenue, you’re going to end up with fines, the environment can be damaged, depends on the system you’re running. But the scope of what can go wrong is way wider, hence the need for safe and proper operations.

Nate: It seems we have a few balls in the air. Andrew, could you sort of condense this question? What is the essential point that Art is making?

Andrew: Well, Art is talking here about priorities and about consequences. When somebody steals my personal information, the consequences are they might impersonate me, they might steal money from me, there are business consequences to IT breaches. What Art’s talking about is physical consequences. And so, the priorities, the priority at every industrial site I’ve ever visited is safety first. You go there, the first thing you do is take the safety training before you’re allowed to set foot in the site. And Art points out the classic IT security priority is, “Protect the information.” The priority at industrial sites is, “Protect the physical operation from the information.” All cyber-attacks are information. We don’t need to protect the information at industrial sites, we need to protect physical operations, physical safety, physical continuity and reliability and quality of output from attacks that might be embedded in the information; different priorities.

Nate: Now, let’s move on from this discussion. The next thing I asked Art about was architecture, let’s hear what he has to say.

Now that you’ve spoken to the differences in architectures between OT and IT, what are the main differences and why do they matter?

Art: Again, I think you start with the architecture most people know and understand, and I could say love, or they may not, and that’s the IT architecture where everything literally is connected to everything. The beauty of Internet Protocol is it sorts out all your traffic, it puts the right conversations to the right machines, to the right ports, and you don’t have to do anything. Networking folks love large flat networks because, the fewer interruptions to the network, the easier their job is to keep all the bits flowing. And so, at the end of the day, your networking team in IT tries to get every machine connected to every machine and let the Internet Protocol sort it all out. They’re not fans of segmentation, they’re not fans of firewalls in excessive numbers or any of these things, simply put because they just want everything to talk to everything. In the operational technology world, it’s the exact opposite case; we don’t want everything to talk to everything. The operational technology world is built around a real-time communication system that’s part of the process. And so, timing is a critical element, timing was planned when the system was designed and built. So, we don’t want outside traffic, we only want to connect the speakers that need to talk to the devices that need to listen. And all those conversations were defined when the system was designed and built. And so, you get very limited communications, specifically between certain pieces of equipment for certain purposes due to their protocols. This has led to a completely different architecture. Rather than being a big flat architecture, it leads to highly segmented architectures, it keeps conversations in their own little network segments. This led to something called the Purdue Network Architecture, and that network architecture has become a major part of the security solution.
And remember, security here is being defined as keeping the system running safely and within the proper realm of operations. And so, in the case of OT, the highly segmented network is part of what makes the system operate and run, and this has really large ramifications when it comes to how normal security operations are performed on a system.
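
Editor’s illustration: Art’s point that every OT conversation is defined at design time can be pictured as an allowlist, where anything not explicitly listed is refused. The sketch below is only an assumption-laden example, not something described in the episode: the device names and the `is_allowed` helper are hypothetical, and the ports (502 for Modbus TCP, 4840 for OPC UA) are chosen purely for illustration.

```python
from typing import NamedTuple


class Conversation(NamedTuple):
    """One permitted flow: who talks, who listens, and on which port."""
    source: str
    destination: str
    port: int


# Hypothetical design-time list of conversations for one small network segment.
# Anything that is not in this set is treated as outside traffic.
ALLOWED = frozenset({
    Conversation("hmi-1", "plc-pump-a", 502),       # operator screen to controller
    Conversation("plc-pump-a", "historian", 4840),  # controller to data recorder
})


def is_allowed(source: str, destination: str, port: int) -> bool:
    """Return True only if this exact conversation was defined at design time."""
    return Conversation(source, destination, port) in ALLOWED


if __name__ == "__main__":
    print(is_allowed("hmi-1", "plc-pump-a", 502))          # a planned conversation
    print(is_allowed("office-laptop", "plc-pump-a", 502))  # never part of the design
```

The default-deny shape is the design choice worth noticing: a flat IT network asks "is there a reason to block this?", while the segmented OT view asks "was this conversation ever planned?"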

Nate: Personally, I think that segmentation is always one of the most interesting aspects of any of these discussions.

Andrew: It is, in a sense, it’s intrinsic to industrial control systems. Segmentation is a big deal. You look at the very first industrial cyber-security standard that came out that I’m familiar with, the 2007 IEC 62443-1-1, back then, it was ISA SP99. It spent something like 20% of the first standard talking about zones and conduits, which is their terminology for segmentation. And I think Art’s observation here is that segmentation is naturally part of control systems, it’s the principle of protecting components from accidental outside interference, protecting sensitive equipment from accidental interference. This is an older principle than protecting that equipment from deliberate outside interference, but in both cases, it’s talking about protecting from the outside.

Nate: Right. And it’s so interesting hearing about this sort of subject from different guests of our show. The next subject that I wanted to build on with Art was about IIOT, something that’s come up previously on the show and has different implications on that question of segmentation. Here it is.

So, you’ve spoken to a fundamental difference in paradigm between IT and OT, that the former tends to be diffuse and interconnected, and the latter quite segmented. Now, previously on this show we’ve had some guests speak to the integration of IIOT technology, that is, the industrial Internet of Things, that would presumably further connect industrial systems. Have you heard about this? What are your views on it? Is this a good thing or potentially dangerous?

Art: Well, I think the most important thing to remember is that, even if I suddenly call it the industrial Internet of Things, there was a purpose when you put something in place. I’m not connecting things just because I can, I’m connecting things because I need to for specific purposes. And so, as long as you remember those aspects and you make the connections between all these different devices because they are purposeful as part of your solution, then it makes sense and it works. If I just do it so that, hey, my refrigerator at home can connect to my neighbor’s refrigerator and it could also connect to the local Best Buy store, or anybody out there who has connected a refrigerator, just because, that does not make sense. And that’s where we run awry, assuming, “No one will connect to me just because I’m there.”

Nate: So, the way Art describes it, it seems quite obvious and sensible what he’s saying, but we’re talking about connecting devices and refrigerators. Andrew, I’m curious, exactly what types of devices are we talking about here? Are we still talking about operational devices or are we talking about IT now?

Andrew: Well, I think we’re talking about operational devices. The IIOT is still in its infancy. A lot of people are talking about it, but there’s not that many people deploying stuff. The classic application, the IIOT application that everyone talks about is predictive maintenance, predicting when equipment needs to be maintained so that it doesn’t just fail and shut us down unexpectedly. So, for example, a new kind of IIOT application I saw a little while ago was monitoring engine oil quality on big diesel engines. You might have these big diesel engines driving compressors or pumps in oil and gas pipelines. You might imagine let’s say we put the IIOT sensors on these diesel engines, to Art’s point, they need to be connected through the internet to the vendor who’s the expert on these engines who’s going to monitor the engines and give us advice as to when we should schedule maintenance for them. So, that’s a connection that’s needed. If we look at it and say, “Well, these engines are part of the control system, how about I connect the sensors on the engines to the control system, and through the control system, to the IT network and then straight out to the internet?” that’s not a connection that’s needed. We need to connect these sensors to the internet, it may be convenient to connect the sensors to the IT network to get out to the internet. We don’t need to connect these sensors to the control network and introduce a new attack pathway from the internet through the sensor into our control systems.
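
Editor’s illustration: Andrew’s two wiring options can be compared by asking which network zones become reachable from the internet in each design. The sketch below is a deliberately simplified model under stated assumptions: the zone names are hypothetical, every link is treated as two-way, and firewalls or one-way devices are ignored.

```python
from collections import deque


def reachable(links, start):
    """Return the set of zones reachable from `start` over two-way links."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    seen = {start}
    queue = deque([start])
    while queue:  # breadth-first search over the zone graph
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


# Design A: the oil sensor reaches the vendor through the IT network only.
design_a = [("internet", "it-network"), ("it-network", "oil-sensor")]

# Design B: the sensor is wired into the control network, which is then
# connected through the IT network out to the internet.
design_b = [
    ("internet", "it-network"),
    ("it-network", "control-network"),
    ("control-network", "oil-sensor"),
]

if __name__ == "__main__":
    for name, design in (("A", design_a), ("B", design_b)):
        exposed = "control-network" in reachable(design, "internet")
        print(f"Design {name}: control network reachable from internet = {exposed}")
```

In this toy model the sensor gets its internet connection in both designs, but only Design B adds a pathway from the internet into the control network, which is the connection Andrew describes as not needed.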

Nate: So, then can this question be seen as a risk management problem? Because connecting oil diesel trucks seems like a low risk situation, whereas what you just said, it seems like what we’re talking about here is trying to connect things that, should they be vulnerable, wouldn’t harm essential systems. I phrased that poorly, but you get the point.

Andrew: Yeah, absolutely. I take Art’s point which is, we don’t connect things because we can, we don’t connect things because it’s convenient to do so, we connect things because we need to do so. And that’s a major principle of controlling the number of paths that we have to attack our critical systems. And we need to come back to Art’s interview.

Nate: Fair, then let’s do that.

In what ways does the integration of OT and IT teams go wrong typically?

Art: Well, there are many, many different ways that things can go wrong, but when we look at the fact that IT and OT are completely different in how they’re built and how they run, the number 1 issue is when an IT person who’s doing a security function assumes that this OT system is just like his IT system. So, when the IT security person looks at a box on the OT side and wonders, “Why isn’t it authenticating to me?” well, it never will because there are very limited authentication systems built into some of the equipment, and so authentication is not a response used between boxes. If the IT security person who’s now working on an OT network asks, “Why isn’t it encrypted?” well, guess what? Again, encryption is something we just don’t do in the OT network space because the low-level communications between different pieces of equipment typically don’t support that. The number one thing to look at is, what did IT look like 30 years ago, 20 years ago? That’s what OT typically looks like, because a lot of the equipment in OT is a lot older than IT. It doesn’t have a 3 year equipment refresh cycle, it has a 30 year equipment refresh cycle in some cases. And so, when you have a piece of equipment that’s running for that long, you can’t say that, “Oh, current state-of-the-art says we’re going to do X,” because you’re looking at a piece of equipment that was installed in 2001 and it’s just now reaching midlife. And so, this is a huge problem where people just make that mistake in thinking, “Well, we’ll just patch it.” We don’t do patching in OT. We will, but only when the system itself is turned off. How do I hash something to verify my piece of data, my code, whatever, is correct? Well, we do hash checks all the time when updating things and looking at files in IT; those very functions don’t exist in the OT world. So, that’s your first mistake, just assuming that, “Hey, it’s a white box, it’s just like every one I’ve ever seen.”

The second, and it’s related to this, is when a security person treats an OT element like it’s an ancillary support process, like it’s an email server, it’s a web server. There are obviously other servers that’ll take over its capabilities if I slow it down for some reason or I have to go take it offline or reboot something. And in this case, consider it sort of like you’re driving down the highway and you decide to make these sorts of changes in your car while it’s running, you’re going to reconsider completely doing an engine overhaul or, “Hey, I’m going to do something that’s going to turn off the engine while I’m driving down the highway.” But yet, in the OT world, when the process is running, the IT component, the OT piece that looks like IT, is a functioning part of, in essence, that car going down the highway. And so, that little, “Oops, oh, I had to reboot a computer,” that’s, “I had to reboot your plant. I had to reboot your traffic signal. I had to reboot your pipeline. Oh, your airliner engine that’s running on the side of the airliner at 33,000 feet, yeah, I was getting some security data and, well, we just rebooted it.” So, the places these go wrong revolve around assumptions of, “I think I understand your system,” without any clue of the complete system. And so, that’s one of the big hurdles in trying to join these 2 teams together.

Andrew: So, Nate, this is classic. The IT guy on night shift logs in to the plant to apply the new IT security policy, all computers company-wide will be rebooted at 2:00 in the morning and the latest security updates applied. So, he presses the buttons and the whole plant shuts down. It would be funny if it didn’t cost so much. The simplest IT class changes can impair operations. You hear these examples, the IT guy logs in and says, “New policy, antivirus everywhere,” fine, he puts antivirus everywhere, it runs a scan at 2:00 in the morning and impairs the operation of the process historian. The historian has to gather values from the process second-by-second and record them. If you take it down for 20 minutes because you’re scanning the hard drive and that historian really can’t do anything in that time because all of the disk I/O bandwidth is being taken up by the antivirus scan, you’ll lose the batch record. In some industries, this is a very big deal. In pharmaceuticals, if you lose the batch record, if there’s a hole in your batch record, you cannot sell the product, you have to throw it away. I was working with a pharma plant once, their batches took 3 months. It was a very high tech plant, it took 3 months to produce 1 batch of stuff. Each batch of stuff was worth a quarter billion dollars, 250 million dollars. So, yeah, a plant shutdown is the simplest consequence, as Art points out, there can be much more serious consequences for different kinds of malfunction.
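
Editor’s illustration: the "hole in the batch record" Andrew describes can be pictured as a gap between consecutive sample timestamps. The sketch below is a minimal example under assumptions, not how any particular historian product works: the `find_gaps` helper and the one-second tolerance are hypothetical, taken only from the "second-by-second" description above.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, max_interval=timedelta(seconds=1)):
    """Return (earlier, later) pairs where consecutive samples are too far apart."""
    ordered = sorted(timestamps)
    return [
        (earlier, later)
        for earlier, later in zip(ordered, ordered[1:])
        if later - earlier > max_interval
    ]


# Hypothetical record: five one-second samples, then nothing until 20 minutes
# after the start (say, while an antivirus scan saturates the disk), then five more.
start = datetime(2024, 1, 1, 2, 0, 0)
samples = [start + timedelta(seconds=s) for s in range(5)]
samples += [start + timedelta(minutes=20, seconds=s) for s in range(5)]

gaps = find_gaps(samples)

if __name__ == "__main__":
    for earlier, later in gaps:
        print(f"gap from {earlier} to {later} ({later - earlier})")
```

A check like this only detects the hole after the fact; it cannot recover the missing values, which is why the scan itself has to be kept away from the historian during a batch.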

Nate: Alright, let’s get back to Art.

I think you’ve hit on something that feels like a sort of theme of this podcast, which is trying to integrate modern solutions, modern emerging tech, IT tech with older aging OT infrastructure. What are the right ways of doing this? How do we solve this problem?

Art: Well, I think there’s an opportunity for a lot of cross-pollination, so to speak. For instance, if you ask an IT security person for a list of IP addresses that your equipment should never talk to, that my web server or my clients or things like that should never go to, any security team worth their salt will have a mechanism (and there’s a wide variety of different ones) that’ll say, “Hey, we don’t go to these sites. These addresses are nope, nope, nope, nope, just nope.” That concept should apply to OT as well, should anything from one of these banned sites try to connect, and on the OT network’s external connections it’s essential. So, there are aspects of IT security that could really, really help the OT people, they’re not necessarily up-to-date on (unclear) [25:26] greatest of how some of those happen. Likewise, things like, we’re going to have a third party connect in, part of the contract is they have to maintain some of our equipment. So, how are they going to connect in? There have been a lot of advances between VPNs, remote desktop, jump hosts that are well understood and well managed in the IT world, and again, the OT people can benefit from that; how to structure the extra connectivity to achieve those things, there are places where the IT people can bring significant capability to the OT team.
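
Editor’s illustration: the "list of IP addresses we never talk to" that Art mentions can be sketched with Python’s standard `ipaddress` module. This is a minimal example under assumptions rather than any team’s actual mechanism: the `is_blocked` helper is hypothetical, and the listed addresses come from ranges reserved for documentation, standing in for a real threat feed.

```python
import ipaddress

# Hypothetical deny list. These ranges are reserved for documentation and
# stand in for whatever addresses a real security team would supply.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),   # a whole banned range
    ipaddress.ip_network("198.51.100.7/32"),  # a single banned host
]


def is_blocked(address: str) -> bool:
    """Return True if the address falls inside any blocked network."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in BLOCKED_NETWORKS)


if __name__ == "__main__":
    for candidate in ("203.0.113.42", "198.51.100.7", "198.51.100.8"):
        verdict = "nope" if is_blocked(candidate) else "ok"
        print(f"{candidate}: {verdict}")
```

The same check applies in both directions at the OT boundary: to outbound connections from plant equipment and to inbound connection attempts arriving on the external links.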

Nate: Okay, so here finally are the benefits of cross-pollination, right?

Andrew: That’s right. We’ve talked a lot about how OT is different, OT is special, but IT does have knowledge and skills and experience and even technology to bring to play. This sort of cliché is that what IT has to learn is that you can’t put antivirus and security updates everywhere, some of the OT systems are special. What OT has to learn is that, not all of their systems are special, and we really should put antivirus and updates almost everywhere. And in particular, to Art’s point, any equipment that interacts with the outside world, that gear is almost certainly not critical. Almost all industrial control systems are designed to run indefinitely correctly without any input from the outside world. And so, if we have equipment at the interface to the outside world, it’s to serve a business function, it’s the interface to the more dangerous internet exposed networks, those systems should be secured fully to IT standards. And we can do so safely because the control system is designed not to depend on those systems. If they go down because the update doesn’t work and they need to be reloaded for a day with a backup of the operating system, that’s fine, the plant keeps working.

Nate: And my final question for Art was about general takeaways, here it is.

Art, is there anything we’ve yet left on the table or do you have any major takeaways from our discussion?

Art: Well, there are a couple of takeaways. I think, first of all, we have to realize that integration, convergence, is going to happen to these teams whether you resist it, want it, don’t want it, it’s a business function. From a management perspective, it actually makes senior management more focused on the things that need to happen, there’s some streamlining, having 2 different security functions going all the way to the board of directors just would not make sense. From a worker training perspective, as we talked about, there are some advantages that IT security has through their tactics, techniques, and procedures that can be adapted and brought into the OT world to improve OT. It also helps with the training bench, the more workers, it’s hard to find OT security people, it’s hard to find IT security people. So, when you start getting them, the more you can cross train them and help build a better cross-functional team, the more ability you’re going to have in times of crisis or times of problems. So, there’s a lot of good that can come from smart integration. And what I’m going to say is, the key is that it’s smart integration, it’s not a takeover, it’s not like, “Well, our IT security team has 85 people and your OT security team has 8, therefore, we’re 10 times more important than you and you’re going to do it our way now.” Well, suddenly, you realize that you would never have 85 IT security people in your physical processes running them, your operations department would say like, “No, no, no, no, no, no.” And I joke because, for teams that don’t do this smartly, the easiest way to get an opening for a chief security officer is to have an unsmart integration and you shut down the revenue generation of your company or you cause a major disaster because your IT team did not understand what they were doing and they broke OT.
And that’s what happens when you forget why we had OT security, it’s been around since the beginning, it’s not something new, the OT people have always cared about the safety and the proper operation of their plant. And as long as you never lose sight of that, you don’t have a problem. When you lose sight, well, somebody has to be fired and it usually goes pretty high up the food chain, like the CSO’s told, “Thank you.” So, I would say that there are some serious takeaways. It’s going to happen, it could make life better if it runs in a smart way, but you really, really have to pay attention to the pitfalls because a pitfall here is much more expensive than, “Oh, I lost a server.”

Nate: This makes a lot of sense to me. So, it’s not that you have to integrate or you have to integrate in a certain way, it’s that you have to be smart about integrating and take the good and consider the possible bad, and that companies can benefit, not just by doing these things, but by really thinking them through, right?

Andrew: That’s right. And a lot of organizations have been going down this road for over 20 years now; some started sooner, some started later. What we observe in the most mature organizations is that the IT and OT functions are cooperating very closely, whether they’re 1 group or separate groups is irrelevant, they’re working very closely together, they’re using the same business processes, and they have a healthy respect for each other. Each of them understands the domains where they need to call on the other’s expertise to produce the best result for the business. And of course, others are still working this out. So, back in the day, the Gartner Group predicted that, within 20 years, something like I think it was 70% of industrial businesses would have merged their IT and OT teams. That was 20 years ago, they were mostly right. The process is complete at some businesses and is still in progress at others.

Nate: Okay, that’ll be all today. Thank you to Art Conklin for speaking with me. Thank you, Andrew, for speaking with us.

Andrew: Always a pleasure, Nate.

Nate: Until next time, this has been the Industrial Security podcast, I’m Nate Nelson, thanks for listening.