Cybersecurity Risk Assessment using IEC 62443 | Episode 104


Waterfall team

Risk assessments are a staple of industrial security programs.

Paul Piotrowski, a Principal OT Cybersecurity Engineer at Shell, walks us through a deep dive into his experience using IEC 62443-3-2 risk assessments and the lessons he’s learned, with lots of examples.

The Industrial Security Podcast – Hosted by Nate Nelson & Andrew Ginter

The Industrial Security Podcast takes a deep-dive into the most pressing emerging issues in SCADA technologies today. But don’t just take our word for it: each new episode of the show features a leading voice in the world of industrial control systems security.

You’ll hear from executives, engineers, researchers and more, each with their own unique take on what’s wrong with how we do things today, and how to fix it. ICS security is complicated. Here is where it all comes together.

What You Will Learn:

  • The Drivers of OT Cyber Risk Assessments: Understand the regulatory, operational, and financial factors compelling organizations to evaluate industrial control system (ICS) risks.

  • A Breakdown of IEC 62443-3-2: Discover the step-by-step process of executing the industry-standard methodology for system design risk assessments.

  • Effective ICS Zoning: Learn how to define the “system under consideration” and break environments down into logical, functional, or physical network zones.

  • Applying the Risk Assessment Matrix (RAM): See how to evaluate unmitigated risk based on likelihood and consequence (impact to people, assets, community, environment, and business loss) and apply controls to reduce it.

  • Building the Right Team: Identify the key stakeholders—from operators and safety engineers to finance professionals—needed for a comprehensive workshop.

  • The Cyber-Safety Intersection: Explore the critical relationship between process hazard analysis (HAZOPs) and cybersecurity, and why relying solely on hackable controls for physical safety is dangerous.

  • Uncovering Hidden Threats: Learn about surprising attack vectors in industrial environments, such as USB license dongles, handheld calibration units, and HVAC systems.

Transcript:

Thank you. [Music] And you go through different threat scenarios, there’s an authentication attack, right? What if malware is introduced? Is there any intellectual property that might be stolen? [Music]

Welcome listeners to the Industrial Security Podcast. My name is Nate Nelson. I’m here with Andrew Ginter, the Vice President of Industrial Security at Waterfall Security Solutions. He’s going to introduce the subject and guest of our show today. Andrew, how’s it going?

I’m very well, thank you, Nate. Our guest today is Paul Piotrowski. He is a Principal OT Cyber Security Engineer with global responsibility at Shell, and he’s going to be talking to us about risk assessments. I mean, risk assessments are a first step in a lot of security programs, and they’re a recurring task once the program’s established. Paul walks us through a deep dive into IEC 62443-3-2. 3-2 is the standard for risk assessments; mostly we talk about 3-3, which is the standard for, you know, how to prevent attacks. So he’s going to take us through risk assessments, best practices, some first-hand experience, and some examples. Then without further ado, here’s your conversation with Paul.

Hello Paul, and welcome to the podcast. Before we get started, could you say a few words about yourself, about your background, and about the good work that you’re doing at Shell?

Yeah, thank you, Andrew. Thank you for the invite, glad to be here to talk about my experiences doing cyber security risk assessments at Shell. So yes, I’ve been in this industry for a long time now, 20-plus years at Shell. I started my career in the operations discipline, running firewalls and networking, and then eventually moved into cyber security. For the last 10 years I have been in a global OT security role, working globally for our joint ventures and non-operated joint ventures. I’m also involved in SANS: I helped create the ICS410 class a decade ago, I teach that class, and I work with the SANS organization on numerous different initiatives around the world.

Our topic today is cyber security risk assessments, and you know, the, the needs, the drivers that you see, uh, in that space. Can you, can you, you know, start at the top, um, talk about risk assessments?

Yeah, so the last few years there’s been a lot of focus on cyber security risk assessments, and from our vantage point there are many drivers for why an organization would do one. To iterate through them, the first one I see is regulatory requirements. Certain disciplines, like functional safety under IEC 61511, actually require it in their standard. So the functional safety discipline is driving a lot of the work that we’re seeing internally within our organization. But more widely, there’s also an appetite to understand your operational risk. We all know cyber security in ICS is difficult, you can’t patch everything. So you really have to hone in on the high-risk items in your environment. Doing a cyber security risk assessment allows you to do that and allows you to apply your finite resources where it matters. We also see risk assessments being used for internal and external assurance activities. Every organization has an audit arm, and we use those as well to provide the necessary assurance across the organizational lines. And in unique situations, we also use cyber security risk assessments to justify or refute investment decisions. We’ve had a few cases where we do a cyber security risk assessment and it actually goes against what the business capital investment plans are, which is a unique kind of observation that we found in doing some risk assessments.

You mentioned a situation where, uh, um, a risk assessment, uh, contradicted a, a, an investment initiative. Um, can you give us just a little more insight into this? Um, you know, was this a security investment that the risk assessment said was not necessary? Was this a, you know, uh, building a, uh, I don’t know, a, a facility for processing hydrocarbons that, um, you know, the design proved too dangerous cyber-wise? What, can you give us just a little more insight into what, what…

Yeah, yeah, absolutely. So it was more of an obsolescence issue, Andrew. The business decided that they wanted to spend x amount to update some obsolescent fielded devices for a SCADA application. Once we did the cyber security risk assessment, we determined that we had lots of surplus equipment, our philosophy around sparing was very sound, and the overall cyber security and operational risks were very low. So we actually went back to the business and presented that, and we’re kind of working through that. But that was an interesting observation, where the cyber security risk assessment activity determined that it was a low risk, that we were mitigating the risks of hardware failure, etc. And we deemed that it wasn’t in the best interest of the organization to do a large capital investment, and that the finite capital could be deployed somewhere else that is more value-add to the organization.

I would say that’s a relatively unique thing to say on this podcast, Andrew, and in past episodes whenever it comes up, I’ve always heard you harp on the, uh, the issue of shaking loose budget, that there’s always too little money and how do you convince people of the necessity and the, the specifics of industrial security problems. In this case, he’s saying, you know, you know, they had a case where they didn’t need the money, so you could go use it elsewhere.

That’s, that’s true, but, um, you know, what was it, last episode, a couple episodes ago, uh, Matthew Malone was on from, from Yokogawa. He was talking about shaking money loose, and he was saying, if you remember, you know, he’s working in the oil and gas industry, but he’s working with the small to medium-sized, uh, organizations, uh, you know. He said unlike the majors and the super majors, um, you know, it’s really hard to shake money loose in these sort of, uh, underfunded organizations. Shell is not a small organization. So yeah, they are one of the leaders in the space, they’ve been doing cyber security since the beginning. Um, and so, you know, you know, this kind of example is, is, uh, you know, kind of what we expect from the leaders in the field as opposed to the, the small outfits that are, are scraping by on the border of profitability.

You folks, you know, you personally use the IEC 62443 standards routinely in your risk assessments, especially the, uh, what was it, the 3-2 standard, uh, which is about risk assessments. Um, we’ve never really had somebody walk us through the standard. Can you talk about what is 3-2 and, and, uh, you know, sort of what’s in there, what, what do you use?

Yeah, so a lot of people think that IEC 62443 is just one document, but it’s actually a suite of documents, and 3-2 speaks to security risk assessments for system design. We decided to use that standard because, as you could understand, we’re a big organization and the engineering discipline really gravitates to the IEC standards. We felt that we could scale the use of it globally. As well, it’s more well understood in the engineering discipline than other risk assessment methodologies. And there’s numerous cases where our internal team is not big enough to do all these cyber security risk assessments across the globe. So you can easily go to a vendor, a third party, and ask them to follow the IEC 62443 methodology, and usually you’ll get good results, because it’s well understood and you have the same basis for discussion, which allows you to execute these risk assessments.

That, that makes sense. I mean, it makes sense that you use the, the 62443, um, but you haven’t really told us about what the process looks like. Can you, can you dig into 62443-3-2 for us? You know, you show up on site, what happens? What sort of big picture, what, what does one of these assessments look like?

Yeah, that’s a good question. Each assessment, I would say, is a little bit unique and you have to customize it. Usually in our organization, a request would come in from a line of business or a particular asset where they would say, “We need some assistance doing a cyber security risk assessment, we don’t have the skill set on site to facilitate it,” and they would call our organization, my division, to help facilitate that. So we would have a kickoff meeting. During COVID it was obviously virtual, but now we’re going to sites more often. I usually like to have a kickoff meeting and bring in all the necessary people that are going to be involved and really frame the discussion of what we’re trying to do. And what we’re trying to do is identify the system under consideration. So what exactly are we assessing? Are we assessing a pipeline infrastructure SCADA network? Are we trying to cyber-assess a DCS system by a particular vendor? Is it a small manufacturing facility? Is it a Gulf of Mexico platform? You really have to hone down on where the scope ends. You have to draw a circle around the system you’re actually going to be looking at. That might or might not include third-party connected systems, etc. You have to draw the boundary of what you’re actually going to be looking at.

And then you want to do some reviews of previous audits or observations that the site has. You need to look at network diagrams. You want to walk down the site to understand how it’s physically connected, whether it’s serial connections, TCP, wireless connectivity, you name it. If there’s any standalone hosts, you really need to get an asset inventory of what is in your system under consideration. Once you have a good understanding of what you’re going to be cyber-assessing, you break it down into logical or physical or functional zones. So if you have a wireless deployment of field sensors, that might be one particular zone. If you have transient assets like technical laptops, that might be another zone. Then there are engineering workstations, controllers, etc. So you really need to break it down into different zones.
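As a rough illustration of the zoning step described above, an asset inventory for a system under consideration could be grouped into zones as in the sketch below. This is not taken from IEC 62443-3-2 or from the speaker’s tooling: the `Asset` record, the hostnames, the IP addresses, and the zone labels are all hypothetical and only show the shape of the data.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Asset:
    """One entry in the asset inventory (hypothetical fields)."""
    hostname: str
    ip: Optional[str]  # None for serial-only or standalone devices
    zone: str          # logical, physical, or functional zone label


# Hypothetical inventory gathered from network diagrams and a site walk-down.
INVENTORY = [
    Asset("ews-01", "10.10.1.11", "engineering workstations"),
    Asset("ews-02", "10.10.1.12", "engineering workstations"),
    Asset("tech-laptop-07", None, "transient assets"),
    Asset("wsensor-gw-01", "10.10.5.1", "wireless field sensors"),
]


def group_by_zone(assets: List[Asset]) -> Dict[str, List[Asset]]:
    """Group inventory entries by zone so each zone can be assessed on its own."""
    zones: Dict[str, List[Asset]] = defaultdict(list)
    for asset in assets:
        zones[asset.zone].append(asset)
    return dict(zones)


if __name__ == "__main__":
    for zone, members in group_by_zone(INVENTORY).items():
        print(zone, [a.hostname for a in members])
```

Each resulting zone, with its hosts and addresses, is then the unit that the workshop team walks through threat scenarios for.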

Once you have that identified and you have the host names and IP addresses and VLANs and the technical details of these zones, you would sit down with people that are knowledgeable in those systems and have a discussion about the different threats and vulnerabilities that might exist for a particular zone. In my initial kickoff meetings I always use ransomware: everybody knows ransomware, we’ve seen pictures and lots of publications in the news around ransomware, and you have a discussion. What if this engineering workstation was encrypted with malware, or with BitLocker, whatever it might be, asking you for cryptocurrency? What would be the impact to the asset of this machine or this network going down? Then you would start talking about the likelihood of that occurring and the consequences of that. Then you start talking about how we can mitigate it. And then you rank that. You have a RAM, a Risk Assessment Matrix, for your organization; we’ll talk more about that later in this podcast.

But once you have assessed the unmitigated risk of that occurring, you start talking about what controls you have from an organizational standpoint that would mitigate that machine being encrypted with ransomware. Then you have those discussions: how well are those controls being operated, etc. Hopefully you can reduce the likelihood of that occurring, and get validation that the controls you have in your organization are effective, or maybe not effective, in which case you have to put additional controls in place to mitigate the risk of ransomware encrypting a particular machine.

And so some of the, uh, the main two concepts that we use when we do this whole activity is talk about likelihood and consequences, right? So the likelihood, you have to determine whether it’s going to be a certain likelihood of a particular scenario materializing, and then the actual consequences, right? What can happen, right, if that engineering workstation gets encrypted with ransomware, right? Is there a consequence to people, to assets, to community, to the environment, you know, consequential business loss, right, of you rebuilding that machine, loss of production? All those questions have to be answered as part of the 62443 risk assessment methodology and process that you will execute in your organizations.

So I mean, this sounds, you’re going to drill down in a minute, this sounds like a lot of work. Can you, can you tell me, you know, how long does one of these engagements typically take? Are we talking months, are we talking years, are we talking days? What sort of, what’s, what’s the, the, the scale here?

Yeah, so it depends on the actual engagement, but let me give you some examples. For a big capital project, let’s say you’re building a new plant or you’re doing a big brownfield project, it’s not uncommon for it to take quite a bit of time. 100-plus hours, right? Preparation is key, and doing those workshops either in person or in a virtual environment does take time. Let’s not sugarcoat this: doing a proper cyber security risk assessment on a larger asset takes a long time. I gave you a few examples of zones: the engineering workstations, the transient laptops, controllers, etc. But we’ve had larger cyber security risk assessments where it’s 40-plus zones that we’re assessing. And the biggest one that I have facilitated included over 35 people globally. We had people dialing in from a MAC vendor, a main automation contractor, in India, providing feedback and input on their DCS infrastructure, etc.

So, and then obviously on the smaller side, I’m involved in a smaller cyber security risk assessment, it’s still taking quite a bit of time, but we only have six people, right, in that particular risk assessment because those six people are very knowledgeable, you know, on the process, on the asset, on all the connected systems. And they’re able to make those final calls with regards to the consequences, right, of a particular event, threat scenario materializing. And, uh, we’ve, we felt that we don’t need additional people because the six people involved, um, will suffice, all right? But sometimes, you know, we do have to call in some other, other members from other teams to get involved. So kind of words of wisdom is, you know, involve the right amount of people, you know. Don’t, uh, don’t invite people that don’t need to be there that are just listening in. You want people to be active contributors in the workshop or in the conference calls that you’re, uh, that you’re facilitating.

From what Paul says here, my understanding is that one of these assessments typically takes a few weeks, I don’t know, three weeks, six weeks at a large site. It was interesting to me to contrast this with what we heard from Sarah Freeman. Sarah was talking about the CCE methodology, which is primarily a risk assessment methodology as well. I regard the CCE methodology as sort of the, what’s the right word, the gold standard in the industry for industrial risk assessment. She said that a typical CCE engagement will take three or four months, which is longer than what I understand the longest 3-2 assessment takes here. And it’s ironic, because when I asked Sarah why the CCE methodology was created, she said it was because risk assessments were taking too long. She was talking about military-grade risk assessments taking one to two years, and by the time the risk assessment is done, the world has changed so much that the results of the assessment are meaningless. And so they shortened that heavy-duty risk assessment down to a mere three or four months. Yet here, 3-2 seems to be, what’s the right word, what most of the industry uses, and, like I said, Idaho National Labs is trying to persuade people to go to the more thorough CCE sometime down the road, which takes rather longer.

Yeah, and it seems great in theory that we’re implementing so much efficiency that a once two-year process becomes three or four months, becomes three to six weeks. But what exactly is it that we are shortening? What shortcuts are we taking, or what are we making so much more efficient that we could make such a vast improvement?

I don’t know if it’s an improvement, I think it’s a different approach. I didn’t ask Paul this question, but my understanding of the difference between 3-2 and CCE is what CCE calls system of systems analysis. And there’s an even more thorough mechanism: we had the gentleman on from Amenaza who does the attack trees. That’s an even more thorough analysis of attacks. System of systems analysis is looking for a sequence of events by which an industrial control system could be compromised. Think of it as a path through the system that results in compromise, an attack path. Attack trees sit down and enumerate all, or as close to all as is mathematically possible, of the attack paths, and we’re talking as many as over a billion, 10 to the ninth, attack paths for a given site. What CCE tries to do is do that not algorithmically with technology; they try to do it sort of manually, but they go through the analysis and they try to find choke points. They don’t try to enumerate by hand all the billion attack paths. You can’t do that, you need technology for that, that’s why Amenaza has a market. But you can do it sort of qualitatively and look through and say, you know, there’s a choke point here, there’s only three ways to get into the system, you have to come through one of these choke points. And then they focus their defenses on the choke points in order to interrupt the largest number of attacks.

Whereas what I understand 3-2 does is a little more qualitative. It looks at an asset like the engineering workstation and the ransomware scenario that Paul was talking about and says, “Okay, how heavily defended is the asset? Have we got passwords and antivirus? How many layers of firewalls do we have out to the internet, the source of all evil?” They do a more qualitative assessment of how heavily defended it is. The more heavily defended something is, and in a sense the less common the code base is, the more code you have to write to attack the asset, the more difficult it is to attack, and the less likely a qualitative score you get. So 3-2 does more of a qualitative analysis, CCE does attack paths and choke points, and an attack tree will be comprehensive.
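To make the attack path and choke point idea above concrete, here is a toy sketch. It is not how CCE or any attack tree product works; it only enumerates the simple paths through a small, entirely hypothetical network graph and reports the nodes that every path must cross. The graph, node names, and function names are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Set

# Hypothetical connectivity: an edge means "an attacker on A can reach B".
GRAPH: Dict[str, List[str]] = {
    "internet": ["it_network"],
    "it_network": ["it_ot_firewall", "vendor_vpn"],
    "vendor_vpn": ["it_ot_firewall"],
    "it_ot_firewall": ["historian", "eng_workstation"],
    "historian": ["eng_workstation"],
    "eng_workstation": ["controller"],
}


def attack_paths(graph: Dict[str, List[str]], source: str, target: str,
                 path: Optional[List[str]] = None) -> List[List[str]]:
    """Enumerate every simple (loop-free) path from source to target."""
    path = (path or []) + [source]
    if source == target:
        return [path]
    found: List[List[str]] = []
    for nxt in graph.get(source, []):
        if nxt not in path:  # never revisit a node on the current path
            found.extend(attack_paths(graph, nxt, target, path))
    return found


def choke_points(paths: List[List[str]]) -> Set[str]:
    """Intermediate nodes that appear on every enumerated path."""
    if not paths:
        return set()
    common = set(paths[0][1:-1])
    for p in paths[1:]:
        common &= set(p[1:-1])
    return common


if __name__ == "__main__":
    paths = attack_paths(GRAPH, "internet", "controller")
    print(len(paths), "paths; choke points:", sorted(choke_points(paths)))
```

On a real site the number of paths grows far too quickly for this kind of brute-force enumeration by hand, which is the point made above about needing either tooling or a qualitative search for choke points.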

So then in what scenarios is 3-2, uh, appropriate versus when is, uh, CCE necessary, or if, if need be, the military-grade one-to-two year kind of assessment?

That’s a good question, and that is something that’s debated in the industry. Everyone that I ask believes that what they’re doing is the right thing to do. I didn’t ask Paul this question, maybe I should have, so I don’t want to put words in his mouth, but my guess is that the majors generally are always open to new ideas. I know these people; some of them are evaluating CCE, they’re sending their people to the training, they’re trying to figure out where to use which methodology. I know that if you have an asset where the consequences of compromise are truly unacceptable, I don’t know, if you have a power plant or a refinery close to a population center, you might want to be even a little bit more cautious and take on the attack tree tool in addition to the CCE. This is not a space that I’m really familiar with, but I know that the factors that play into it are consequence, especially consequences for public safety, and just sort of understanding. I mean, the space continues to evolve, the threat environment continues to evolve, and owners and operators in the space are evaluating these tools. Basically, in response to the worsening threat environment, everybody I talk to is becoming more and more thorough over time with everything they do, everything from the smallest organizations who say, “We got to get started,” to the largest organizations who say, “We got to take the next step.” Where is the whole industry, where are different industries in that space? I can’t say. We need to get an expert on who’s more familiar with risk assessments in lots of different industries.

In the beginning of the industrial security revolution, engineers were told to use IT security principles, protect the information. We were told, we knew this was a poor fit, but it was all we had. Today, the top security priority at industrial sites is safety: don’t kill anyone, don’t cause an environmental disaster. And the second priority is reliability: do not shut down our factory or infrastructure. Today, safe and reliable operations use unhackable protections from cyber risks, not just cyber security. For a deeper look at the evolution of the revolution, we invite you to download Waterfall’s report on the Emerging Consensus for Industrial Security Engineering. You can access the report at the Waterfall website: waterfall-security.com/engineering-consensus or just go to the resources menu and click on white papers and ebooks.

You’ve been using the 62443-3-2, the, the risk assessment methodology for some time, and you know, that document is, is a number of years old now. Uh, you know, people learn by using the document, it eventually gets revised. Have you modified the methodology at all? Is, is there stuff that, that you see that, uh, you know, arguably needs to be changed in a future version of the document?

No, so we’ve tried not to change it too much, because the problem that we’ve encountered is that if you change it, then you need to explain it to our vendors and our partners that are doing these cyber security risk assessments. So we try not to change it as much as possible. Internally, we have changed the color coding that we use on the RAM matrix, because there was some confusion with the HSSE RAM matrix. So we made those changes to differentiate between the two matrices. And then we also added a separate column around consequential business loss, because our businesses wanted to understand the financial cost, and they wanted to see it as a consequence if a particular unit or a production line is off-spec or is shut down due to a threat, etc. So those are the two major ones that we’ve made, and they actually helped us facilitate the cyber security risk assessments in a better fashion.

Now you’ve mentioned these matrices a couple of times. Can you go a little deeper? Um, you know, what, what is this matrix and, uh, you know, you’ve talked about sort of network zones, you, I think you have a matrix per zone, you know, how does this work?

Yeah, so we actually have one matrix that we refer back to when we do these cyber security risk assessments, and our matrix is a 5×5 matrix. One axis is the likelihood, and we have five categories: highly unlikely, unlikely, probable, likely, and certain. So getting back to my previous comments, with ransomware or malware infecting a Windows machine via USB or whatever, we consider that certain to happen. So it’s a combination of likelihood and then the consequences on the other axis. And the consequences that we look at, from an HSSE standpoint, are the impact to people, the impact to the physical asset, the impact to the community, the impact to the environment, and then any consequential business loss from a particular unit or process going down. What’s the cost associated with not being able to produce?

Okay, so you’ve, you’ve mentioned the risk matrix, you’ve talked about zones, you know, are you talking about zones in the sense of 62443, you know, network zones which sort of everyone else calls network segments, or, you know, is a zone something else and, and how do you apply the risk matrix to a zone? What, you know, walk us through dealing with the zone, please.

So from the system under consideration step, right, once we’ve identified what we want to assess, we break the, the ICS environment into different zones. Now the zones could be a functional zone, it could be a network segment, network VLAN, right, it really depends on what the team thinks is, uh, is appropriate. And so it could consist of one host or multiple hosts or multiple VLANs, etc., right? So once you have a zone defined, right, it could be an engineering workstation or workstations, it can be a whole bunch of wireless devices, it can be calibration units, you know, whatever it might be, switch infrastructure, right. Once you have a zone defined and you understand what the IP addresses are, what the physical devices look like, etc., all right, you say we’re going to hone in on that particular zone, right? We’re going to assess that zone from a consequence and a likelihood perspective and map it back to the risk assessment matrix that you as an organization have picked. Now the RAM matrix can be a 3×3 matrix, it can be a 4×4. Ours is a 5×5 matrix that has different categories for likelihood, we use highly unlikely, unlikely, probable, likely, and certain. And then we have different consequences as well, right, from a severity impact perspective related to people, assets, community, environment, and consequential business loss.
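A minimal sketch of how a 5×5 matrix like the one described above could be represented follows. The likelihood categories and the five consequence categories come from the discussion; everything else is an assumption made for illustration, including the 1 to 5 severity scale, taking the worst severity across categories, and the low/medium/high bands. A real organization’s RAM defines its own cells and colors.

```python
from enum import IntEnum
from typing import Dict, Tuple


class Likelihood(IntEnum):
    HIGHLY_UNLIKELY = 1
    UNLIKELY = 2
    PROBABLE = 3
    LIKELY = 4
    CERTAIN = 5


# Consequence categories named in the discussion.
CONSEQUENCE_CATEGORIES = ("people", "assets", "community", "environment", "business_loss")


def ram_cell(likelihood: Likelihood, severities: Dict[str, int]) -> Tuple[int, int]:
    """Locate a threat scenario on the matrix as (severity, likelihood).

    Assumption: severity is scored 1 (minor) to 5 (severe) per category,
    and the worst category decides the row.
    """
    missing = set(CONSEQUENCE_CATEGORIES) - set(severities)
    if missing:
        raise ValueError(f"missing consequence categories: {sorted(missing)}")
    if not all(1 <= severities[c] <= 5 for c in CONSEQUENCE_CATEGORIES):
        raise ValueError("severity scores must be between 1 and 5")
    worst = max(severities[c] for c in CONSEQUENCE_CATEGORIES)
    return worst, int(likelihood)


def rating(cell: Tuple[int, int]) -> str:
    """Hypothetical banding of matrix cells; the thresholds are illustrative only."""
    severity, likelihood = cell
    score = severity * likelihood
    if score >= 15:
        return "high"
    if score >= 6:
        return "medium"
    return "low"


if __name__ == "__main__":
    scenario = {"people": 1, "assets": 3, "community": 1, "environment": 1, "business_loss": 4}
    cell = ram_cell(Likelihood.CERTAIN, scenario)
    print(cell, rating(cell))
```

Requiring a score for every consequence category mirrors the workshop practice of answering each question (people, assets, community, environment, business loss) for every threat scenario rather than only the obvious one.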

And so once you have a zone, you sit down with your team, consisting of however many people are able to attend and are knowledgeable on the system, and you go through different threat scenarios. What if malware is introduced? Is there any intellectual property that might be stolen from a particular zone? What if there’s an authentication attack on that particular zone? Those types of things. And then you go through the matrix. What is the consequence of that happening, of that materializing? You first assess the unmitigated risk, the risk with no controls whatsoever applied. So you have a consequence, and then you have a discussion about the likelihood. Is it a Windows machine? Is it a particular switch that has firmware on it, etc.? How exploitable is it? And then you essentially find a box: you have a discussion with the people and find where it needs to be on the RAM matrix.

And then once you have the unmitigated consequence and the likelihood, you start talking about what controls your organization has. Do you have antivirus installed on that Windows machine? How are you doing physical access controls to that machine? Do you have proper governance for that particular machine? And hopefully, with those controls that you understand and that are operating effectively, the likelihood of that threat scenario materializing is going to be significantly less. The consequence will never change. You can’t change the consequence from the unmitigated consequence, but you can reduce the likelihood of that threat materializing by doing patching, by having antivirus installed, by having the machine physically secure in a data center, whatever it might be. Incident management, right? All those controls that we live and breathe as cyber security professionals should be implemented in some capacity, which will reduce the likelihood to an acceptable level in your organization.
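The unmitigated-versus-mitigated step above can be sketched as follows: controls move a scenario along the likelihood axis, while the consequence stays where the unmitigated assessment put it. The per-control credit values are purely hypothetical (a real workshop decides likelihood reduction by discussion, not a lookup table), and only controls confirmed to be operating effectively are counted.

```python
from typing import Dict, Tuple

LIKELIHOOD_LEVELS = ["highly unlikely", "unlikely", "probable", "likely", "certain"]

# Hypothetical: number of likelihood steps credited to each effective control.
CONTROL_CREDIT: Dict[str, int] = {
    "antivirus": 1,
    "patching": 1,
    "physical_security": 1,
}


def mitigate(unmitigated_likelihood: str, severity: int,
             controls: Dict[str, bool]) -> Tuple[str, int]:
    """Return (mitigated likelihood, severity).

    `controls` maps a control name to whether it is operating effectively.
    Severity is passed through unchanged: controls only reduce likelihood.
    """
    idx = LIKELIHOOD_LEVELS.index(unmitigated_likelihood)
    credit = sum(CONTROL_CREDIT.get(name, 0)
                 for name, effective in controls.items() if effective)
    mitigated_idx = max(0, idx - credit)  # cannot go below the lowest category
    return LIKELIHOOD_LEVELS[mitigated_idx], severity


if __name__ == "__main__":
    # Ransomware on an engineering workstation; physical security not yet verified.
    print(mitigate("certain", 4,
                   {"antivirus": True, "patching": True, "physical_security": False}))
```

Treating an unverified control as ineffective is a deliberate choice in this sketch: a control that is listed on paper but not actually operating should not earn any likelihood reduction.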

All right, and you’ve said that you try to avoid changing 3-2 because it needs to be understood the same way by everybody in a large organization worldwide. Can I ask you maybe a different question? You’ve been doing this for a while, you’ve been using 3-2 for a while. Before you started the process, when you got into this business, what do you wish someone would have told you? What’s the big piece of advice you can give to someone coming into this, saying, here’s what you need to watch out for?

Yeah, I think one of the first things that I would recommend organizations do is have a really good understanding of what controls your organization has in the OT environment, who’s operating them, and how mature they are from an effectiveness standpoint. There have been too many cases where I’ve started off a workshop and started talking about controls, and then somebody pipes up and says, “Well, it’s not working here because we haven’t onboarded that system, or that host is not part of the domain, or this vendor hasn’t given us the go-ahead to maintain it ourselves, they’re still maintaining it,” etc., etc. And you start going down a rabbit hole, and it’s a huge deterrent, right, to moving forward in the cyber security risk assessment activity. So spend the time, front-load a lot of this work, and do spot checks to ensure whatever controls you have listed are operating effectively in your organization. That will make the whole workshop and this whole activity go much smoother.

So it sounds like in his example, Paul’s describing an organization where security is sort of, the process has been started, there’s stuff in place, but it’s not actually done yet, like the picture isn’t complete. Is this something that we would expect commonly?

Unfortunately, I think the answer is yes. Industrial control systems are frequently really messy environments. There’s an incredible number of vendors involved, there’s a huge number of devices involved, and it’s not that there aren’t tens of thousands of laptops on an IT network, it’s just that these devices tend to be more different than the same. Out of my own experience, I remember sitting in a planning process for a deployment. Waterfall sells unidirectional gateways, and we were replacing an IT/OT firewall with a unidirectional gateway. One of the things you have to do is figure out exactly what’s going through the firewall so that you can make sure you replicate all that stuff out to the IT network through the gateway. So we’re going through, reviewing the firewall rules, one rule after another. “That system, yeah, we talked about it already. That’s the PI system.” “Oh, okay.” “That rule, that’s a system that we used to have five years ago and we don’t use anymore.” And people are looking at each other in the room saying, “Well, are the servers still deployed?” And they’re going, “Well, I think they are, but who’s receiving that information on the IT network? Are the receivers still deployed?” “We might have erased them and thrown them in the garbage.” And they’re trying to figure stuff out on the fly, and the next rule comes and they say, “What’s this? I don’t recognize this at all.” “No, that’s a leftover from the Cisco firewall that we had, the previous firewall, not the current firewall, and it meant blah blah blah.” And they try and figure out, is it still around? Move on to the next rule.
“That’s a Barracuda rule.” And the question is, “I thought the previous firewall was a Cisco?” “No, that’s the one from before the previous firewall, that was the Barracuda management system, I think. Well, can we get…” So like I said, this is more common than you might think. These can be noisy, messy environments, and I think Paul’s point is, do your homework before you waste the time of 12 people sitting around the room trying to do a risk assessment while you figure out in real time what still exists in your plant.

You’ve talked about the risk assessment team a few times. Who do you like to have on that team? How do you assemble that team?

Yeah, so you need to have the right people in the room, and that’s a very broad statement, but you need to get views from the operators, from the maintenance teams, from the safety system engineers, and from network engineers, so that there are people in the room who can talk about how the network is architected. You need cyber security professionals, you need risk professionals, and you also need a finance view, right? If a certain unit goes down, what are the financial implications to our organization of not being able to produce, or of having product that is off-spec, etc.? So it really depends on the activity, but you need people who can speak to the people, asset, community, environment, and consequential business loss impacts in your organization.

And when you get this team together, I mean, we’ve talked about the big lesson before you start. Once you get into the weeds, have you found some surprises down there? What should we be watching out for when we’re deep into the process?

Yeah, so certain zones kind of surprised us, right? There were USB license dongles. Dongles and USB still exist, and if that USB device is stolen or misplaced, there’s lots of work, right? You have to re-license the whole DCS environment, possibly purchase new licenses, etc. So from a physical threat perspective, the USB license dongles were their own separate zone, and that was a bit of a surprise to us. Handheld calibration units were a surprise to us too, because operations stated that these calibration devices in the wrong hands could actually reprogram a field safety system even if your PLC is locked with the key position, etc. So it bypasses a lot of the traditional controls, and that was an interesting kind of zone that we assessed as well. The HVAC system is traditionally not part of your OT environment, but in certain refineries and chemical plants, the HVAC system is really critical from a people perspective because it keeps the building pressurized against explosions and toxic gases. So we’ve actually come across that a few times, where the cyber security controls around the HVAC, the heating, ventilation, and air conditioning systems, are not well understood, and so we’ve raised that as well. And then physical failures too, right? We’ve started looking at that because HP had a firmware bug on some of their SSDs where the drives failed after 40,000 hours because the firmware was not being updated, which is kind of unique, right? Not a lot of people think about that. So as you start doing these cyber security risk assessments, all these interesting kinds of scenarios start to percolate up.

Andrew, correct me if I’m wrong, it sounds like a lot of what Paul’s talking about is operational and has maybe less to do with cyber specifically.

You know, you’re right to an extent. The consequences of cyber security threats are not all safety; some of them are just the plant goes down. And so these risk assessments, in my understanding, do tend to drift a little bit sometimes. Mostly they’re focused on security, they’re focused on deliberate attacks, but they’re touching on other cyber issues, operational issues that might take the plant down. And people generally welcome being alerted to issues like that, which they might not have taken into account earlier. But another reason, in my estimation, why people don’t mind drifting across that line and then drifting back is insiders. I mean, Paul gave the example of the USB dongle. You might think, hey, if someone misplaces the license dongle and the whole control system stops working and the plant goes down, that’s an operational issue, isn’t it? That’s not really a cyber security issue, unless it’s an insider who’s deliberately misplaced the dongle to take the plant down. So the operational space is where insider attacks start getting fuzzy. Is this an operational issue, or could it be a deliberate issue? And so to me it is relevant, and in my understanding of the space, it’s a line people don’t mind occasionally crossing.

Okay, so those are surprises on the nasty side. Have you run into any surprises on the good side, in terms of controls or mitigations that you discovered were more effective than you expected?

Yeah, definitely. This is one of the reasons why I like doing these cyber security risk assessments, because I’m always surprised and I always learn from assets what approach and what controls they have implemented. So from a cyber security perspective, most of us gravitate towards buying new equipment or new licenses, etc., but sometimes the simple controls are the most effective, and I’ll give you a few examples here. An asset was using a Windows XP machine and they virtualized it, so that was great, right? Windows XP machines are still alive and well in many industrial control systems, but the cool control that they’ve implemented is that they shut down this virtual machine, which is an engineering workstation, when it’s not in use. Engineering workstations, depending on your deployment, are not always used, right? And this asset said, “Well, we’re not using it, so we’re actually going to remove it from our attack surface by shutting down the virtual machine.” It’s only brought up when engineering functions need to be used, they put that under the permit-to-work process, and they also have a testing procedure to make sure that the VM does start up, which I thought was really cool. On firewall rules, I’ve seen another asset implement timed firewall rules. They have a machine on their network that does periodic scanning, and they didn’t want that to be compromised by an adversary to launch a destructive scan of the OT network, so they implemented a one-hour timed rule that allows the scan to run only during the scheduled scan window at that particular asset. So I thought that was kind of cool.
And then decommissioning hosts, consolidating hosts, virtualizing hosts, dealing with contracts, updating contract terms, implementing DIP switches, additional training, all those kinds of things are controls that don’t cost anything, are very easy to implement, and are very effective from a cyber security standpoint.
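The timed firewall rule comes down to a time-window check: traffic from the scanner is permitted only while the scheduled window is open. The sketch below shows that logic in Python for illustration only. The function name, the 02:00 start time, and the use of naive local timestamps are assumptions; the transcript gives only the one-hour duration, and a real deployment would use the firewall’s own scheduling feature rather than a script.

```python
from datetime import datetime, time, timedelta


def scan_rule_active(now: datetime, start: time, duration: timedelta) -> bool:
    """Return True if `now` falls inside the daily scan window.

    The window opens at `start` each day and stays open for `duration`
    (assumed to be at most 24 hours). Yesterday's window is checked as well,
    so a window that crosses midnight is handled correctly.
    """
    for days_back in (0, 1):
        window_start = datetime.combine(now.date() - timedelta(days=days_back), start)
        if window_start <= now < window_start + duration:
            return True
    return False


# Hypothetical schedule: scans start at 02:00 and the rule stays open for one hour.
SCAN_START = time(2, 0)
SCAN_DURATION = timedelta(hours=1)

inside = scan_rule_active(datetime(2023, 5, 1, 2, 30), SCAN_START, SCAN_DURATION)
outside = scan_rule_active(datetime(2023, 5, 1, 14, 0), SCAN_START, SCAN_DURATION)
print(inside, outside)
```

Outside the window the scanner has no path into the OT network, so a compromised scanner can do damage only during one hour of the day instead of all twenty-four.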

And you’ve mentioned safety a few times. Obviously, cyber vulnerabilities that can result in casualties or environmental damage are very serious, but you’re dealing with sites that have done HAZOPs and process hazard assessments in the past. How does cyber security fit with safety?

Yeah, it fits very well and there’s more importance on it, Andrew. I think in the coming years we’re going to see more synergies and more dovetailing of those disciplines. Traditionally the functional safety teams are not cyber literate, right, or that’s not their subject matter expertise, it’s more functional safety. But with architectures blending, and numerous vendors out there that have integrated SIS and BPCS infrastructures, the risk is substantial. This is kind of uncharted territory currently in the organization, but one of the things that you want to strive for in a mature cyber security risk assessment is to understand if any of your HAZOP scenarios are 100% cyber vulnerable. And what I mean by that is that the initiating event and all the controls that are implemented are cyber vulnerable, or hackable, right? Ideally you would not want that, because you’re putting a lot of reliance on your cyber security controls. So ideally you would like to have some kind of mechanical device to protect you from a particular scenario, whether it’s overpressurization, leaks, explosions, etc. So you’d want something like a pressure sensing valve, maybe a relay, or some kind of rupture disc, or a mechanical overspeed switch, etc. But a lot of the HAZOPs and the functional safety teams haven’t really considered this and do put a lot of reliance on cyber security controls in their HAZOPs.
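The “100% cyber vulnerable” test Paul defines is simple to state as a check over HAZOP records: a scenario is flagged when its initiating event is hackable and every safeguard credited against it is hackable too. The Python sketch below illustrates that check under assumed data. The class names, fields, and example scenarios are hypothetical, and deciding whether a given safeguard is truly hackable remains a judgment for the functional safety and cyber teams together.

```python
from dataclasses import dataclass, field


@dataclass
class Safeguard:
    name: str
    hackable: bool  # True if a cyber attack could defeat or disable it


@dataclass
class HazopScenario:
    name: str
    initiating_event_hackable: bool
    safeguards: list[Safeguard] = field(default_factory=list)


def fully_cyber_vulnerable(scenario: HazopScenario) -> bool:
    """True when the initiating event and all safeguards are hackable.

    A scenario with a hackable initiating event and no safeguards at all
    is also flagged, because all() of an empty list is True.
    """
    return scenario.initiating_event_hackable and all(
        s.hackable for s in scenario.safeguards
    )


# Hypothetical scenarios for illustration.
digital_only = HazopScenario(
    name="vessel overpressure",
    initiating_event_hackable=True,
    safeguards=[Safeguard("BPCS high-pressure alarm", True),
                Safeguard("SIS trip", True)],
)
with_mechanical = HazopScenario(
    name="vessel overpressure with rupture disc",
    initiating_event_hackable=True,
    safeguards=[Safeguard("SIS trip", True),
                Safeguard("rupture disc", False)],
)

flagged = [s.name for s in (digital_only, with_mechanical) if fully_cyber_vulnerable(s)]
print(flagged)
```

Scenarios that come back flagged are the ones to take into the discussion about accepting the risk, funding stronger cyber controls, or adding a physical device at the next HAZOP review.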

So you said that sometimes when you look at a safety scenario, you conclude that a cyber attack could in the worst case bring about that scenario, and there aren’t physical mitigations in place; all of the mitigations in place are cyber. When you hit that, what do you do?

Yeah, this is an excellent question. The first thing, the hardest thing, is finding those kinds of scenarios, right? It’s very time intensive, you have to read the hazard reports, which sometimes run to hundreds of pages, and cyber security individuals are not functional safety experts. So this is where you need to sit down with your functional safety people and really go through with a fine-tooth comb to understand those scenarios. And once you have those threat lines, you can have more in-depth discussions on whether that is acceptable to the organization or not. There’s no real easy answer to this. This is not an if-then statement, right? You have to have those discussions with the organization, with the understanding that we’re putting a lot of emphasis on our cyber security controls to keep our asset in safe production and safe manufacturing scenarios. And maybe that will require you to go back and make a justification as part of your business case to improve or increase your cyber security funding. Maybe you would have to go back and look at your HAZOP for unacceptable scenarios and maybe put in a physical device next time you review the HAZOP with the organization. But this is kind of uncharted territory, right? A lot of organizations are still at the cusp of understanding this, looking at this, and working out next steps on how they should be doing this.

So this has been great, and we’ve been talking for a while. I’m going to ask you in a moment to sum up, but before we get there, is there anything we’ve missed? What other lessons would you like to impart from your long experience using the 62443 risk assessment methodology?

Yeah, so I think one of the things that was surprising to me when we started doing this is that there were actually some controls that our vendors were applying without our knowledge, right? So yes, we have our own standard and we think we do a good job implementing it and operationalizing it, but we were surprised that some of our third-party contractors were actually implementing cyber security controls without our knowledge that we could take credit for. One of our third-party contractors was upgrading their compressors without our knowledge. They were doing maintenance on an annual basis and updating the firmware on their devices, which was a surprise to us. They were doing it as part of their maintenance cycles, which we thought was definitely enterprise first and was very welcome. So my recommendation is to also look outside; don’t have tunnel vision only internally within your own organization. Contact your third parties, and hopefully they have an interest in this and are willing to have those discussions, and ask: are they doing firmware updates? Do they have any additional security features in their protocols that can be enabled, etc.? And you’d be surprised, right? Sometimes you will find a gem, and you will be surprised by their responses.

Well Paul, this has been great. Thank you so much for joining us and sharing your insights. Before we let you go, can you sum up for us? What are the key takeaways here? What should we be thinking about when we’re doing these risk assessments?

Yeah, so I think the theme that comes to mind is being realistic, Andrew, right? And I mean that in multiple ways. So first, I think we have to be realistic that we’re early on in this process, and the trajectory that we’re on is going to require development of skills, development of process, right? We have to build that organizational and industry muscle to be able to do these cyber security risk assessments. So that’s point number one. Point number two is when you do these workshops, be realistic and have an operational mindset because when you’re doing

Key Takeaways:

  • Audit Your Controls Before You Start: Document and verify the effectiveness of your existing OT security controls before gathering the assessment team. Trying to figure out what is currently deployed during a workshop leads to immediate roadblocks.

  • Simple Controls Can Be Highly Effective: Security doesn’t always mean buying new equipment. Simple practices, such as shutting down engineering virtual machines (VMs) when not in use or implementing timed firewall rules for scheduled scans, cost little or nothing and can be very effective.

  • Beware of “100% Cyber-Vulnerable” Scenarios: Ensure that your safety systems have mechanical or physical fail-safes (e.g., rupture discs, pressure-sensing valves). If the initiating event and every mitigation in a safety scenario are digital, a cyber attack alone could bring that scenario about.

  • Leverage Third-Party Vendor Maintenance: Don’t work in a silo. Check with your contractors; they may already be performing firmware updates or utilizing secure protocols during routine maintenance that you can take credit for in your risk posture.

  • Maintain an Operational Mindset: A security control might look great on paper, but you must verify field feasibility. Always consider physical constraints like DIN rail space, cabinet power, and long-term supportability before recommending a solution.

  • Avoid Analysis Paralysis: Your first cyber risk assessment will not be perfect. The goal is to start the process, learn by doing, and build the organizational muscle memory required to improve subsequent assessments.
