Cybersecurity Risk Assessment using IEC 62443

Episode 104 of the Industrial Security podcast about the experience of using IEC 62443 for Risk Assessments

In this episode, Paul Piotrowski from Shell explains how and why risk assessments are needed and conducted for OT. IEC 62443 is a family of international standards that address cybersecurity for operational technology (OT).

Within the IEC 62443 family of standards, Part 3-2 (IEC 62443-3-2) covers OT risk assessments. The standard is divided into sections and describes both technical and process-related aspects of automation and control systems for the purpose of assessing cybersecurity risks, especially threats and vulnerabilities.

IEC 62443-3-2 assesses the cybersecurity risks for various roles:

  • Operators
  • Integration service providers
  • Maintenance service providers
  • Component providers
  • System manufacturers.

Each of these stakeholders needs to adhere to a risk-based approach to help prevent and manage cybersecurity risks according to the scope of their activities.

One of the first steps in this process is to understand where vulnerabilities pose a possible threat and what can be done to mitigate them.

This podcast delves into the details of such risk assessments.


About Paul Piotrowski

Paul Piotrowski is currently a Principal OT Cyber Security Engineer in Shell’s Global OT Security Discipline.  Paul consults on Global Capital Projects and supports Shell Operated and Non-Operated Assets across all business units.  Paul has spent over 22 years at Shell in various security roles including network operations, risk governance & compliance, audit, incident management, forensics, pen testing and project management.

Paul Piotrowski Shell Oil and Gas Cybersecurity engineer

IEC 62443-3-2 Risk Assessments

The episode starts by diving into the details of an IEC 62443-3-2 risk assessment.

  • Overview of cybersecurity risk assessments and the needs and drivers in that space.
  • Why an organization would do a cybersecurity risk assessment. 
  • Safety discipline, standards, requirements that prompt most assessments. 
  • Prioritizing resources where they will make the most difference, since not every risk can be attended to.

Here’s the transcript of this episode:

Please note: This transcript was auto-generated and then edited by a person. In the case of any inconsistencies, please refer to the recording as the source.

Nathaniel Nelson: Welcome, listeners, to the Industrial Security Podcast. My name is Nate Nelson. I’m here with Andrew Ginter, the VP of Industrial Security at Waterfall Security Solutions. He’s going to introduce the subject and guest of our show today. Andrew, how’s it going?

Andrew Ginter: I’m very well, thank you, Nate. Our guest today is Paul Piotrowski. He’s a principal OT cybersecurity engineer with global responsibility at Shell, and he’s going to be talking to us about risk assessments. I mean, risk assessments are a first step in a lot of security programs. They’re a recurring task once the program’s established. Paul walks us through a deep dive into IEC 62443-3-2. 3-2 is the standard for risk assessments. You know, most of us talk about 3-3, which is the standard for how to prevent attacks. So he’s going to talk to us about risk assessments: best practices, some firsthand experiences, and some examples.

62443 Risk Assessment

Nathaniel Nelson: Then without further ado, here’s your conversation with Paul Piotrowski. I would say that’s a relatively unique thing to say on this podcast, Andrew. In past episodes, whenever it comes up, I’ve always heard you harp on the issue of shaking loose budget: that there’s always too little money, and how do you convince people of the necessity and the specifics of industrial security problems. In this case, he’s saying they had a case where they didn’t need the money, so you could go use it elsewhere.

Andrew Ginter: That’s true, but, you know, what was it, last episode, a couple of episodes ago? Matthew Malone was on from Yokogawa. He was talking about shaking money loose, and he was saying, if you remember, he’s working in the oil and gas industry, but he’s working with the small to medium-sized organizations. He said unlike the majors and the super majors, it’s really hard to shake money loose in these sort of underfunded organizations. Shell is not a small organization! So yeah, they are one of the leaders in the space. They’ve been doing cybersecurity since the beginning. And so this kind of example is kind of what we expect from the leaders in the field, as opposed to the smaller outfits that are scraping by on the border of profitability…

Andrew Ginter: Hello Paul, and welcome to the podcast. Before we get started, could you say a few words about yourself, about your background, and about the good work that you’re doing at Shell?

Paul Piotrowski: Yeah, thank you, Andrew, thank you for the invite. Glad to be here to talk about my experiences doing cybersecurity risk assessments at Shell. So yes, I’ve been in this industry for a long time now, twenty-plus years at Shell. I started my career early on in the operations kind of discipline, running firewalls and networking, and then eventually moved into cybersecurity. The last ten years I have spent in a global OT security role, working globally for our joint ventures and non-operated joint ventures. I’m also involved in SANS as well: I helped create the ICS410 class a decade ago, I also teach that class, and I work with the SANS organization on numerous different initiatives around the world.

Andrew Ginter: Okay, thank you for that. Our topic today is cybersecurity risk assessments and the needs, the drivers that you see in that space. Can you start at the top and talk about risk assessments?

Paul Piotrowski: Yeah, so the last few years there has been a lot of focus on cybersecurity risk assessments, and there are many drivers from our vantage point on why an organization would do a cybersecurity risk assessment. So I’ll kind of iterate through them. The first one I see is regulatory requirements. Certain industries, like functional safety with IEC 61511, actually require it in their standard. So the functional safety discipline is driving a lot of the work that we’re seeing internally within our organization. But more widely, there’s also an appetite to understand your operational risk. We all know cybersecurity in ICS is difficult. You can’t patch everything, so you really have to hone in on the high-risk items in your environment. Doing a cybersecurity risk assessment allows you to do that and allows you to apply your finite resources to where it matters. We also see risk assessments being used for internal and external assurance activities. Every organization has an audit arm, and we use those as well to provide that necessary assurance across the organizational lines.

Paul Piotrowski: And in unique situations we also use cybersecurity risk assessments to justify and refute investment decisions. We’ve had a few cases where we do a cybersecurity risk assessment and it actually goes against what the business capital investment plans are, which is a unique kind of observation that we found in doing some risk assessments.

Andrew Ginter: Okay, and your other example: you mentioned a situation where a risk assessment contradicted an investment initiative. Can you give us just a little more insight into this? Was this a security investment that the risk assessment said was not necessary? Was this, I don’t know, building a facility for processing hydrocarbons where the design proved too dangerous cyber-wise? Can you give us just a little more insight?

Paul Piotrowski: Yeah, absolutely. So it was more of an obsolescence issue, Andrew. The business decided that they wanted to spend X amount to update some obsolescent field devices for a SCADA application. Once we did this cybersecurity risk assessment, we determined that we had lots of surplus equipment, our philosophy around sparing was very sound, and the overall cybersecurity and operational risks were very low. So we actually went back to the business and presented that, and we’re kind of working through that. But that was an interesting kind of observation, where the cybersecurity risk assessment activity determined that it was a low risk, that we were mitigating the risks of hardware failure, etc. We deemed that it wasn’t in the best interest of the organization to do a large capital investment, and the finite capital could be deployed somewhere else that is more value-add to the organization.

Andrew Ginter: Well, that is unusual. But coming back to the topic here: you folks, you personally, use the IEC 62443 standards routinely in your risk assessments, especially the 3-2 standard, which is about risk assessments. We’ve never really had somebody walk us through the standard. Can you talk about what 3-2 is and sort of what’s in there? What do you use?

Paul Piotrowski: Yeah, so a lot of people think that IEC 62443 is just kind of one set or one document, but it’s actually a suite of documents, and 3-2 speaks to security risk assessments for system design. We decided to use that standard because, as you could understand, we’re a big organization, the engineering discipline really gravitates to the IEC standards, and we felt that we could scale the use of it globally. As well, it’s better understood in the engineering discipline than other risk assessment methodologies. There are numerous cases where our internal team is not big enough to do all these cybersecurity risk assessments across the globe. And so you can easily go to a vendor, a third party, and ask them to follow the IEC 62443 methodology, and usually you’ll get good results because it’s well understood and you have kind of the same basis for discussion. That allows you to execute these risk assessments.

Andrew Ginter: That makes sense. I mean, it makes sense that you use IEC 62443, but you haven’t really told us what the process looks like. Can you dig into 62443-3-2 for us? You show up on site. What happens? Sort of big picture, what does one of these assessments look like?

Paul Piotrowski: Yeah, that’s a good question. Each assessment, I would say, is a little bit unique and you have to kind of customize it. Usually in our organization a request would come in from a line of business or a particular asset, where they would say: we need some assistance doing a cybersecurity risk assessment, we don’t have the skill set on site to facilitate it. They would call our organization, my division, to help facilitate that. So we would have a kickoff meeting. During COVID it was obviously virtual, but now we’re going to site more often. I usually like to have a kickoff meeting and bring in all the necessary people that are going to be involved and really frame the discussion of what we’re trying to do. And what we’re trying to do is identify the system under consideration. So what exactly are we assessing? Are we assessing a pipeline and SCADA networks? Are we trying to cyber-assess a DCS system by a particular vendor? Is it a small manufacturing facility? Is it a Gulf of Mexico platform? You really have to hone down on where the scope ends. You have to kind of draw a circle around which system you’re actually going to be looking at, which might or might not include third-party connected systems, etc. You have to draw the barrier of what you’re actually going to be looking at….

…You want to do some reviews of previous audits or observations that the site has. You need to look at network diagrams. You want to walk down the site to understand how it’s physically connected, whether it’s serial connections, TCP, wireless connectivity, you name it. If there are any standalone hosts, you really need to get an inventory of what is in your system under consideration. Once you have a good understanding of what you’re going to be cyber-assessing, you break it down into logical or physical or functional kind of zones.

So if you have a wireless deployment of field sensors, that might be one particular zone. If you have transient assets like technical laptops, that might be another zone. You have engineering workstations, controllers, etc. So you really need to break it down into different zones. Once you have that identified, and you have the host names and IP addresses and VLANs and the technical details of these zones, you would sit down with people that are knowledgeable in those systems and have a discussion about the different threats and vulnerabilities that might exist for a particular zone. I always use this in my initial kickoff meetings: everybody knows ransomware. We’ve seen pictures and lots of publications in the news around ransomware, and you have a discussion.

62443-3-2 Risk Assessment for Cybersecurity threats and vulnerabilities

What if this engineering workstation was encrypted with malware, or with BitLocker, whatever it might be, asking you for cryptocurrencies? What would be the impact to the asset of this machine or this network going down? And then you would start talking about the likelihood of that occurring, the consequences of that, and then you start talking: how can we mitigate? And then you rank that. You have a RAM, a risk assessment matrix, for your organization. We’ll talk more about that later in this podcast. Once you’ve assessed the unmitigated risk of that occurring, you start talking about what controls you have from an organizational standpoint that would keep that machine from being encrypted with ransomware, and then you have those discussions: how well are those controls being operated, et cetera. Then you hopefully can reduce the likelihood of that occurring and get validation that the controls you have in your organization are effective. Or maybe they are not effective, and then you have to put additional controls in place to mitigate the risk of ransomware encrypting a particular machine. Will that suffice for you, Andrew?

And so the two main concepts that we use when we do this whole activity are likelihood and consequences. For the likelihood, you have to determine how likely a particular scenario is to materialize, and we’ll talk more about that in the coming questions. And then the actual consequences: what can happen if that engineering workstation gets encrypted with ransomware? Is there a consequence to people, to assets, to community, to the environment, or a consequential business loss from rebuilding that machine, loss of production? All those questions have to be answered as part of the 62443 risk assessment methodology and process that you will execute in your organizations.
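The consequence side of that discussion can be captured in a small record per threat scenario. The following is a minimal Python sketch, not Shell's tooling or anything prescribed by IEC 62443-3-2: the `ThreatScenario` class, the 1 to 5 severity scale and the example ratings are all illustrative assumptions. Only the five consequence categories come from the conversation.

```python
from dataclasses import dataclass, field

# Consequence categories named in the interview; the 1-5 severity scale
# used below is an assumption made for illustration only.
CATEGORIES = ("people", "assets", "community", "environment", "business_loss")


@dataclass
class ThreatScenario:
    zone: str
    description: str
    severity: dict = field(default_factory=dict)  # category -> 1..5

    def worst_consequence(self):
        """Return (category, severity) for the highest-rated category."""
        rated = {c: self.severity.get(c, 0) for c in CATEGORIES}
        worst = max(rated, key=rated.get)
        return worst, rated[worst]


# Hypothetical ratings a workshop might agree on for the ransomware example.
scenario = ThreatScenario(
    zone="engineering workstations",
    description="ransomware encrypts the workstation",
    severity={"people": 1, "assets": 2, "business_loss": 4},
)
print(scenario.worst_consequence())
```

Recording every category, rather than a single number, keeps the workshop's reasoning visible when the scenario is later placed on the matrix.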

Andrew Ginter: Okay, so I mean, you’re going to drill down in a minute, but this sounds like a lot of work. Can you tell me how long one of these engagements typically takes? Are we talking months? Are we talking years? Are we talking days? What’s the scale here?

Paul Piotrowski: Yeah, so it depends on the engagement, but let me give you some examples. For a big capital project, let’s say you’re building a new plant or you’re doing a big brownfield project, it’s not uncommon for it to take quite a bit of time, one hundred plus hours. Preparation is key, and doing those workshops either in person or in a virtual environment does take time. Let’s not sugarcoat this: doing a proper cybersecurity risk assessment on a larger asset takes a long time. We’ve done risk assessments where we’ve identified forty zones. I gave you a few examples: the engineering workstations, technical transient laptops, controllers, etc. Those are some of the zones. But we’ve had larger cybersecurity risk assessments where it’s 40-plus zones that we’re assessing, and the biggest one that I have facilitated included over thirty-five people globally. We had people dialing in from a MAC (main automation contractor) vendor in India, providing feedback and input on their DCS infrastructure.

And then on the smaller side, I’m involved in a smaller cybersecurity risk assessment. It’s still taking quite a bit of time, but we only have six people in that particular risk assessment, because those six people are very knowledgeable on the process, on the asset, on all the connected systems, and they’re able to make those final calls with regards to the consequences of a particular threat scenario materializing. We felt that we don’t need additional people because the six people involved will suffice, but sometimes we do have to call in some members from other teams to get involved. So, kind of words of wisdom: involve the right number of people, and don’t invite people that don’t need to be there, that are just listening in. You want people to be active contributors in the workshop or in the conference calls that you’re facilitating.

Andrew Ginter: Another question I had: you mentioned that the outcome of the process was about consequence and the likelihood of attacks causing consequences. Can you talk about likelihood? Are you talking about common malware winding up on some salesperson’s machine on the IT network when you’ve got 50,000 people in your organization? There you’ve got statistics. You’ve got history. You can look backward and say this is the likelihood: we’re likely to have 8 of these next year because that’s how many we had last year. Is that what you’re looking at going forward? What does likelihood mean when you’re looking at sort of high-impact, low-frequency events?

Paul Piotrowski: Yeah, this is a hard thing, and like you said, it’s hard to get good statistics. There are not a lot of data points that are easily accessible for the OT environment. That being said, one of the things that IEC 62443 kind of states for the risk assessment methodology is to develop your own risk assessment matrix and assign likelihoods or probabilities of certain threat scenarios materializing in your organization. The way that we have looked at it from a likelihood perspective is: how likely is that threat? What’s the likelihood of that materializing? What exploitable, easily available tools exist? And how easy is it for an adversary to exploit that system? So let’s use an example. If it’s a Windows 7 or Windows 10 machine, there are lots of vulnerabilities. There are lots of known exploits; using Kali Linux, there are lots of exploits that exist. So the certainty or the likelihood of that materializing is pretty high with no controls whatsoever. On the other hand, if you have a switch that runs a proprietary version UX that is kind of locked up or encrypted, that is very low risk. Low down in your architecture, [bad actors] would have to write custom code to be able to compromise that industrial switch or that PLC and exploit it at all. So the likelihood of that occurring is much less, because they’d have to invest a great amount of time to be able to exploit that. So the way that we look at it, the likelihood of that happening in the organization is much less. Yes, it is possible! We’ve seen 7 or 8 specific ICS attacks that have occurred, but they’d need to be an APT (advanced persistent threat) with unlimited resources to make that a reality. So that’s kind of our view on it.

I know a lot of organizations struggle with the likelihood and how to frame it. But we kind of look at it as: how easily is it exploitable, depending on the operating system and depending on where in the architecture it is.
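That way of framing likelihood, by exploitability and position in the architecture rather than by historical statistics, can be sketched as a simple scoring heuristic. The function below is a hypothetical illustration: the inputs, the weights and the mapping onto the five likelihood labels are assumptions, and a real workshop would settle the rating by discussion, not by formula.

```python
LIKELIHOODS = ["Highly Unlikely", "Unlikely", "Probable", "Likely", "Certain"]


def estimate_likelihood(commodity_os: bool, public_exploits: bool,
                        deep_in_architecture: bool) -> str:
    """Rough unmitigated likelihood from exploitability and network position.

    The weights are illustrative assumptions, not values from IEC 62443-3-2.
    """
    score = 0
    if commodity_os:          # e.g. a Windows machine
        score += 2
    if public_exploits:       # exploits available in common toolkits
        score += 2
    if deep_in_architecture:  # attacker must first get low in the network
        score -= 1
    # Clamp the score onto the five-step likelihood scale.
    return LIKELIHOODS[min(max(score, 0), len(LIKELIHOODS) - 1)]


print(estimate_likelihood(True, True, False))    # Windows workstation
print(estimate_likelihood(False, False, True))   # proprietary industrial switch
```

With these assumed weights the Windows workstation lands on "Certain" and the proprietary switch on "Highly Unlikely", which mirrors the two examples given in the conversation.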

Andrew Ginter: So you’ve been using 62443-3-2, the risk assessment methodology, for some time, and that document is a number of years old now. People learn by using the document, and it eventually gets revised. Have you modified the methodology at all? Is there stuff that you see that arguably needs to be changed in a future version of the document?

Paul Piotrowski: No, we’ve tried not to change it too much, because the problem that we’ve encountered is that if you change it, then you need to explain it to our vendors and our partners that are doing these cybersecurity risk assessments. We try not to change it as much as possible. Internally we have changed the color coding that we use on the RAM matrix, because there was some confusion with the HSSC RAM matrix, and so we made those changes to differentiate between the two RAM matrices. And then we also added a separate column around consequential business loss, because our businesses wanted to understand the financial cost, and they wanted to see it as a consequence if a particular unit or a production line is off spec or is shut down due to a threat, etc. So those are kind of the two major ones that we’ve made, and they actually helped us facilitate the cybersecurity risk assessments in a better fashion.

Andrew Ginter: Okay, and now you’ve mentioned these matrices a couple of times. Can you go a little deeper? What is this matrix? You’ve talked about sort of network zones. I think you have a matrix per zone? How does this work?

Paul Piotrowski: Yeah, so we actually have one matrix that we refer back to when we do these cybersecurity risk assessments. Our matrix is a 5 x 5 kind of matrix. One axis is the likelihood, and we have 5 categories: Highly Unlikely, Unlikely, Probable, Likely and Certain. So getting back to my previous comments: if ransomware or malware infects a Windows machine via USB, we consider that a Certain, that is certain to happen. So you have the likelihood, and then the consequences on the other axis. The consequences that we look at are from an HSSC standpoint. Is there impact to the people? Is there impact to the physical asset? Is there impact to the community? Is there impact to the environment? And then is there any consequential business loss from a particular unit or process going down? What’s the cost associated with not being able to produce?
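A 5 x 5 matrix of this kind is easy to express as a lookup. The sketch below uses the five likelihood labels from the conversation; the numeric severity scale, the multiplication of the two axes and the Low/Medium/High bands are assumptions chosen for illustration, since each organization defines its own cells and colors.

```python
LIKELIHOODS = ["Highly Unlikely", "Unlikely", "Probable", "Likely", "Certain"]
SEVERITIES = [1, 2, 3, 4, 5]  # assumed numeric consequence scale


def risk_rating(likelihood: str, severity: int) -> str:
    """Map one (likelihood, severity) cell to a rating using assumed bands."""
    if severity not in SEVERITIES:
        raise ValueError(f"severity must be one of {SEVERITIES}")
    score = (LIKELIHOODS.index(likelihood) + 1) * severity  # 1..25
    if score >= 15:
        return "High"
    if score >= 6:
        return "Medium"
    return "Low"


# Print the whole grid, one row per likelihood category.
for likelihood in reversed(LIKELIHOODS):
    row = [risk_rating(likelihood, s) for s in SEVERITIES]
    print(f"{likelihood:>15}: {row}")
```

A table of hand-picked cells would work just as well as the multiplication used here; the point is only that every scenario resolves to one box on a single shared matrix.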

Andrew Ginter: Okay, so that’s an assessment. You’ve mentioned the risk matrix and you’ve talked about zones. Are you talking about zones in the sense of 62443? Network zones which sort of everyone else calls “network segments” or is a zone something else and how do you apply the risk matrix to a zone? Can you walk us through dealing with a zone please?

Paul Piotrowski: Yes, absolutely! From the system under consideration step, once we’ve identified what we want to assess, we break down the ICS environment into different zones. Now the zones could be a functional zone. It could be a network segment or a network VLAN. It really depends on what the team thinks is appropriate, and so it could consist of one host or multiple hosts or multiple VLANs, etc. Once you have a zone defined… it could be an engineering workstation or workstations. It can be a whole bunch of wireless devices. It can be calibration units. Whatever it might be, even switch infrastructure. Once you have a zone defined and you understand what the IP addresses are and what the physical devices look like, you say: we’re going to hone in on that particular zone. We’re going to assess that zone from a consequence and a likelihood perspective and map it back to the risk assessment matrix that you as an organization have picked. Now, the RAM matrix can be a 3 x 3 matrix. It can be a 4 x 4. Ours is a 5 x 5 matrix that has different categories for likelihood. We use Highly Unlikely, Unlikely, Probable, Likely and Certain, and then we have different consequences as well, from a severity and impact perspective related to people, assets, community, environment, and consequential business loss.

And so once you have a zone, you would sit down with your team, consisting of however many are able to attend and who are knowledgeable on the system, and you go through different threat scenarios. What if malware is introduced? Is there any intellectual property that might be stolen from a particular zone? What if there’s an authentication attack on that particular zone? And then you go through the matrix. What is the consequence of that materializing? You first assess the unmitigated risk, the risk with no controls whatsoever applied. Then you have a consequence, and then you have a discussion about the likelihood. Is it a Windows machine? Is it a particular switch that has firmware on it? How exploitable is it? And then you essentially find a box: you have a discussion with the people and find where it needs to be on the RAM matrix.

Once you have the unmitigated consequence and the likelihood, then you start talking about what controls your organization has. Do you have antivirus installed on that Windows machine? How are you doing physical access controls to that machine? Do you have proper governance for that particular machine? And hopefully, with those controls that you understand and that are operating effectively, the likelihood of that threat scenario materializing is going to be significantly less. The consequence will never change. You can’t change the consequence from the unmitigated consequence, but you can reduce the likelihood of that threat materializing by doing patching, by having antivirus installed, by having it physically secure in a data center, whatever it might be, incident management. All those controls that we live and breathe as ‘Cybersecurity Professionals’ should be implemented in some capacity, which will reduce the likelihood to an acceptable level in your organization.
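The key rule in that walkthrough is that controls move a scenario along the likelihood axis only, while the consequence stays where the unmitigated assessment put it. A minimal Python sketch of that rule follows. The idea that each effective control is worth a fixed number of likelihood steps is a simplifying assumption made here; in practice the team agrees the residual likelihood in discussion.

```python
LIKELIHOODS = ["Highly Unlikely", "Unlikely", "Probable", "Likely", "Certain"]


def residual_likelihood(unmitigated: str, control_steps: dict) -> str:
    """Lower the likelihood by the steps credited to each effective control.

    control_steps maps a control name to an assumed step reduction; the
    result never drops below the lowest likelihood category.
    """
    steps = sum(control_steps.values())
    index = max(0, LIKELIHOODS.index(unmitigated) - steps)
    return LIKELIHOODS[index]


# Hypothetical workshop outcome for ransomware on a Windows workstation.
unmitigated = ("Certain", 4)  # (likelihood, consequence severity)
controls = {"antivirus": 1, "patching": 1, "physical access control": 1}

# Only the likelihood changes; the consequence severity is carried over.
mitigated = (residual_likelihood(unmitigated[0], controls), unmitigated[1])
print(unmitigated, "->", mitigated)
```

A control that is listed but not actually operating should be credited zero steps, which is why the effectiveness checks mentioned in the interview matter before the workshop starts.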

Andrew Ginter: And you’ve said that you try to avoid changing 3-2, because it needs to be understood the same way by everybody in a large organization worldwide. Can I ask you maybe a different question? You’ve been doing this for a while, you’ve been using 3-2 for a while. Before you started the process, when you got into this business: what do you wish someone would have told you? What’s the big piece of advice you can give to someone coming into this? Here’s what you want, here’s what you need to watch out for.

Paul Piotrowski: Yeah, I think one of the first things that I would recommend organizations do is have a really good understanding of what controls your organization has in the OT environment, who’s operating them, and how mature they are from an effectiveness standpoint. There have been too many cases where I’ve started off a workshop and started talking about controls, and then somebody pipes up and says: well, it’s not working here because we haven’t onboarded that system, or that host is not part of the domain, or this vendor hasn’t given us the go-ahead to maintain it ourselves, they’re still maintaining it, etc. And you start going down a rabbit hole, and it’s a huge deterrent for moving forward in the cybersecurity risk assessment activity. So spend the time, front-load a lot of this work, and do spot checks to ensure whatever controls you have listed are operating effectively in your organization. That will make the whole workshop and this whole activity go much smoother.

Andrew Ginter: All right, and you’ve talked about sort of the risk assessment team a few times. Who do you like to have on that team? How do you assemble that team?

Paul Piotrowski: Yeah, so you need to have the right people in the right room, and that’s a very broad statement, but you need to understand and get views from the operators, from the maintenance teams, from the safety system engineers, and network engineers. People need to be in the room that can talk about how the network is architected. You need cybersecurity professionals. You need risk professionals. You also need a finance view! What if a certain unit goes down? What are the financial implications to our organization of not being able to produce, or of having product that is off spec, etc.? So it really depends on the activity, but you need to have people that can talk to the people, asset, community, environment and consequential business loss impacts in your organization.

Andrew Ginter: Okay, and when you get this team together. I mean, we’ve talked about sort of the big lesson before you start. Once you get into the weeds, have you found some surprises down there? What should we be watching out for when we’re deep into the process?

Paul Piotrowski: Yeah, so certain zones kind of surprised us. There were USB license dongles. Dongles and USB still exist, and if that USB device is stolen or misplaced, there’s lots of work. You have to relicense the whole DCS environment, possibly purchase new licenses, etc. So from a physical threat perspective the USB license dongles were their own separate zone, and that was a bit of a surprise to us. Handheld calibration units were a surprise to us too, because operations stated that these calibration devices in the wrong hands could actually reprogram a field safety system, even if your PLC is locked with the key position, etc. So it bypasses a lot of the traditional controls, and that was an interesting kind of zone that we assessed as well. The HVAC system is traditionally not part of your OT environment. But in certain refineries and chemical plants the HVAC system is really critical from a people perspective, because it keeps the building pressurized against explosions and toxic gases. We’ve actually come across that a few times, where the cybersecurity controls around the HVAC, the heating, ventilation and air conditioning systems, are not well understood, and so we’ve raised that as well. And then physical failures too. We’ve started looking at that because, you know, HP had a firmware bug after 40,000 hours on some of their SSDs. The drives failed because the firmware was not being updated on the SSDs, which is kind of unique; not a lot of people think about that. So as you start doing these cybersecurity risk assessments, all these interesting kinds of scenarios start to percolate up.

Andrew Ginter: Okay, so those are sort of surprises on the nasty side. Have you run into any surprises on the good side in terms of you know, controls or mitigations that you you discovered that sort of were more effective than you expected?

Paul Piotrowski: Yeah, definitely. This is one of the reasons why I like reviewing these cybersecurity controls: I’m always surprised, and I always learn from assets what approach they have taken and what controls they have implemented. From a cybersecurity perspective, most of us gravitate towards buying new equipment or new licenses, etc. But sometimes the simple controls are the most effective, and I’ll give you a few examples here. An asset was using a Windows XP machine and they virtualized it, so that was great. Windows XP machines are still alive and well in many industrial control systems. But the cool control that they implemented is that they shut down this virtual machine, which is an engineering workstation, when it is not in use. Engineering workstations, depending on your deployment, are not always used. And this asset said, “Well, we’re not using it, so we’re going to remove it from our attack surface by shutting down the virtual machine.” It’s only brought up when engineering functions need to be used, and they put that under the permit-to-work process. They also have a testing procedure to make sure that the VM does start up, which I thought was really cool. Firewall rules: I’ve seen another asset implement timed firewall rules. They have a machine on their network that does periodic scanning, and they didn’t want that to be compromised by an adversary to launch a destructive scan of the OT network, so they implemented a 1-hour timed rule that allows the scan to initiate only during the scheduled scan at that particular asset. So I thought that was kind of cool. And then decommissioning hosts, consolidating hosts, virtualizing hosts, updating contract terms: all are kind of simple. Implementing DIP switches, additional training, all those kinds of things are controls that don’t cost anything, are very easy to implement, and are very effective from a cybersecurity standpoint.
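To make the timed firewall rule concrete, here is a minimal Python sketch of the idea: a default-deny check that lets the scanning host through only during its scheduled one-hour window. The host name `scanner-01`, the 02:00 start time, and the function names are assumptions for illustration; the episode does not say how the asset actually implemented its rule.

```python
from datetime import datetime, time, timedelta

# Hypothetical schedule: the scan host may reach the OT network for one hour
# starting at 02:00 local time. Both values are assumptions for illustration.
SCAN_WINDOW_START = time(2, 0)
SCAN_WINDOW_LENGTH = timedelta(hours=1)


def scan_rule_active(now: datetime) -> bool:
    """Return True only while the timed allow rule should be in effect."""
    window_open = datetime.combine(now.date(), SCAN_WINDOW_START)
    return window_open <= now < window_open + SCAN_WINDOW_LENGTH


def evaluate(src: str, now: datetime, scanner: str = "scanner-01") -> str:
    """Default-deny: allow the scanner only inside its scheduled window."""
    if src == scanner and scan_rule_active(now):
        return "allow"
    return "deny"
```

Outside the window the scanner is treated like any other host, so a compromised scanner cannot launch a scan at an arbitrary time. Shutting down the engineering workstation VM follows the same reasoning: the asset is simply unreachable except when it is deliberately brought up.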

Andrew Ginter: You’ve mentioned safety a few times. Obviously, cyber vulnerabilities that can result in casualties or environmental damage are very serious. But you’re dealing with sites that have done HAZOPs and process hazard assessments in the past. How does cybersecurity fit with safety?

Paul Piotrowski: Yeah, it fits very well, and there’s going to be more importance placed on it. Andrew, I think in the coming years we’re going to see more synergies and more dovetailing of those disciplines. Traditionally, the functional safety teams are not cyber literate, or that’s not their subject matter expertise, right? It’s more functional safety. But with architectures blending, there are numerous vendors out there that have integrated SIS and BPCS infrastructures, and the risk is substantial. This is kind of uncharted territory currently in the organization. But one of the things that you want to strive for in a mature cybersecurity risk assessment is to understand whether any of your HAZOP scenarios are 100% cyber vulnerable. What I mean by that is that the initiating event and all the controls that are implemented are cyber vulnerable, or hackable. Ideally, you would not want that, because you’re putting a lot of reliance on your cybersecurity controls. Ideally, you would like to have some kind of mechanical device to protect you from a particular scenario, whether it’s overpressurization, leaks, explosions, etc. So you’d want a pressure-sensing valve, maybe a relay, or some kind of rupture disk or a mechanical overspeed switch, etc. But a lot of the HAZOP and functional safety teams haven’t really considered this, and they do put a lot of reliance on cybersecurity controls in their HAZOPs.
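The "100% cyber vulnerable" test Paul describes can be sketched in a few lines of Python. This is an illustrative model only: the `Scenario` and `Safeguard` classes, the `hackable` flag, and the two sample scenarios are assumptions, not data from a real HAZOP or a structure defined by IEC 62443-3-2.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Safeguard:
    name: str
    hackable: bool  # True if it depends on programmable or networked equipment


@dataclass
class Scenario:
    name: str
    initiator_hackable: bool  # can a cyber attack cause the initiating event?
    safeguards: List[Safeguard] = field(default_factory=list)


def fully_cyber_vulnerable(s: Scenario) -> bool:
    """Flag scenarios where the initiating event and every safeguard are hackable.

    A scenario with no safeguards at all is also flagged, since all() of an
    empty list is True.
    """
    return s.initiator_hackable and all(g.hackable for g in s.safeguards)


# Hypothetical HAZOP extract for illustration.
SCENARIOS = [
    Scenario("vessel overpressure", True, [
        Safeguard("BPCS high-pressure alarm", True),
        Safeguard("SIS trip", True),
        Safeguard("rupture disk", False),  # mechanical, not hackable
    ]),
    Scenario("compressor overspeed", True, [
        Safeguard("BPCS speed control", True),
        Safeguard("SIS trip", True),
    ]),
]

flagged = [s.name for s in SCENARIOS if fully_cyber_vulnerable(s)]
print(flagged)
```

In this sample, the overpressure scenario is not flagged because the rupture disk is a mechanical safeguard, while the overspeed scenario is flagged because every layer is programmable.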

Andrew Ginter: Okay! You said that sometimes, when you look at a safety scenario, you conclude that a cyber attack could, in the worst case, bring about that scenario if there aren’t physical mitigations in place. All of the mitigations in place are cyber. When you hit that, what do you do?

Paul Piotrowski: Yeah, this is an excellent question. The first thing, and the hardest thing, is finding those kinds of scenarios, right? It’s very time intensive, and you have to read the HAZOP reports, which sometimes run to hundreds of pages. And cybersecurity individuals are not functional safety experts either. So this is where you need to sit down with your functional safety people and really go through the reports with a fine-tooth comb to understand those scenarios. Once you have those threat lines, you can have more in-depth discussions on whether that is acceptable to the organization or not. There’s no real easy answer to this; it’s not an “if-then” statement. You have to have those discussions with the organization, with the understanding that we’re putting a lot of emphasis on our cybersecurity controls to keep our asset in safe production and safe manufacturing scenarios. Maybe that will require you to go back and make a justification, as part of your business case, to increase your cybersecurity funding. Maybe you would have to go back and look at your HAZOP and, for unacceptable scenarios, put in a physical device the next time you review the HAZOP with the organization. But this is kind of uncharted territory. A lot of organizations are still at the cusp of understanding this, looking at this, and working out next steps on how they should be doing this.

Andrew Ginter: So this has been great! And we’ve been talking for a while. I’m going to ask you in a moment to sum up, but before we get there, is there anything we’ve missed? What lessons would you like to impart from your long experience using the 62443 risk assessment methodology?

Paul Piotrowski: Yeah, so I think one of the things that was surprising to me when we started doing this is that there were actually some controls that our vendors were implementing without our knowledge. Yes, we have our own standard, and we think we do a good job implementing it and operationalizing it, but we were surprised that some of our third-party contractors were actually implementing cybersecurity controls without our knowledge that we could take credit for! So my recommendation is to also look outside; don’t have tunnel vision, looking only internally within your own organization. Contact your third parties, and hopefully they have an interest in this and are willing to have those discussions, and ask: Are they doing firmware updates? Do they have any additional security features in their protocols that can be enabled, etc.? And you’d be surprised. Sometimes you will find a gem, and you will be surprised by their responses.

Andrew Ginter: Okay, you’re on.

Paul Piotrowski: So one of our third-party contractors was actually upgrading their compressors without our knowledge. Periodically, on an annual basis, they were doing maintenance and updating the firmware on their devices, which was a surprise to us. They were doing it without our knowledge as part of their maintenance cycles, which we thought was definitely enterprise first and was very welcome. So yeah.

Andrew Ginter: Well Paul, this has been great. Thank you so much for joining us and sharing your insights. Before we let you go, can you sum up for us the key takeaways here? What should we be thinking about when we’re doing these risk assessments?

Paul Piotrowski: Yeah, so I think the theme that comes to mind is being realistic, Andrew. And I mean that in multiple ways. First, I think we have to be realistic that we’re early on in this process, and the trajectory that we’re on is going to require development of skills and development of process, right? We have to build that organizational and industry muscle to be able to do these cybersecurity risk assessments. So that’s point number 1. Point number 2 is, when you do these workshops, be realistic and have an operational mindset. When you’re doing these risk assessments in the admin building, or in head office somewhere in a conference room, or on Microsoft Teams or Zoom, a lot of the things on paper might seem like a great idea at the time. Let’s roll out a new industrial firewall or a data diode in the field, or whatever it might be. But have you considered: is there enough space on the DIN rail? Is there enough power to the cabinet? Who’s going to do it? Who’s going to pay for it? Who’s going to support it? Etc., etc. So you have to have an operational mindset and have those maintenance and operations teams as part of your discussion. Yes, this is a good idea, but is it supportable, or is it going to die on the vine? Have that at the back of your head, and don’t start implementing or recommending things that are not feasible to do in the field. And then, be realistic: don’t just talk about it. Don’t get stuck in analysis paralysis! It’s not going to be perfect; learn by doing. Yes, the first iterations of your cybersecurity risk assessment are not going to be perfect. But at least you’ll have one under your belt, or 2 or 3 under your belt. You’ll have learnings, and you can roll them into subsequent cybersecurity risk assessments. The only real way to learn is by doing, so I encourage you to get involved and contribute as much as your schedules allow to your organizational cybersecurity risk assessments. And with that, thank you very much for the invite, Andrew, I really appreciate it. I am on LinkedIn, so I encourage anybody that has additional questions or comments to feel free to reach out to me and make a connection.

Andrew Ginter: So Nate, paraphrasing what I heard Paul say here: in my understanding, one of these assessments typically takes a few weeks, maybe three to six weeks at a large site. It was interesting to me to contrast this with what we heard from Sarah Freeman. Sarah was talking about the CCE methodology, which is primarily a risk assessment methodology as well. I regard the CCE methodology as sort of the gold standard in the industry for industrial risk assessment. She said that a typical CCE engagement will take three or four months, which is longer than the longest that I understand a 3-2 assessment takes. And it’s ironic: when I asked Sarah why the methodology was created, she said it was because risk assessments were taking too long. She was talking about military-grade risk assessments taking one to two years, and by the time the risk assessment is done, the world has changed so much that the results of the assessment are meaningless. So they shortened that heavy-duty risk assessment down to a mere three or four months. Yet 3-2 seems to be what most of the industry uses. And like I said, Idaho National Labs is trying to persuade people to go to the more thorough CCE sometime down the road, even though it takes rather longer.

Nathaniel Nelson: Yeah, and it seems great in theory that we’re implementing so much efficiency that a one-to-two-year process becomes three or four months, which becomes three to six weeks. But what exactly is it that we are shortening? What shortcuts are we taking, or what are we making so much more efficient, that we could make such a vast improvement?

Andrew Ginter: I don’t know if it’s an improvement. I think it’s a different approach. I didn’t ask Paul this question, but my understanding of the difference between 3-2 and CCE is what CCE calls system-of-systems analysis. And there is an even more thorough mechanism: we had the gentleman on from Amenaza who does attack trees. That’s an even more thorough analysis of attacks. System-of-systems analysis looks for a sequence of events by which an industrial control system could be compromised. Think of it as a path through the system that results in compromise: an attack path. Attack trees sit down and enumerate all, or as close to all as is mathematically possible, of the attack paths, and we’re talking as many as over a billion, 10 to the ninth, attack paths for a given site. What CCE tries to do is that same thing, not algorithmically with technology, but sort of manually. They go through the analysis and they try to find choke points. They don’t try to enumerate by hand all the billion attack paths. You can’t do that; you need technology for that. That’s why Amenaza has a market. But you can do it sort of qualitatively and look through and say, there’s a choke point here. There are only three ways to get into the system; you have to come through one of these choke points. Then they focus their defenses on the choke points in order to interrupt the largest number of attacks. Whereas what I understand 3-2 does is a little more qualitative. It looks at an asset, like the engineering workstation in the ransomware scenario that Paul was talking about, and asks: how heavily defended is the asset? Have we got passwords and antivirus? How many layers of firewalls do we have out to the internet, the source of all evil? They do a more qualitative assessment of how heavily defended it is. The more heavily defended something is, and in a sense the less common the codebase is, the more code you have to write to attack the asset, the more difficult it is to attack the asset, and the lower the qualitative likelihood score you get. So 3-2 does more of a qualitative analysis, CCE does attack paths and choke points, whereas an attack tree will be comprehensive.
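The difference between enumerating attack paths and looking for choke points can be illustrated with a toy example. The Python sketch below is not the CCE method or any vendor's attack tree tool; the network graph, the node names, and both function names are assumptions made up for illustration. It enumerates every simple path from the internet to a PLC and then reports the intermediate nodes that every path must cross.

```python
def attack_paths(graph, src, dst, path=None):
    """Enumerate all simple (loop-free) paths from src to dst by depth-first search."""
    path = (path or []) + [src]
    if src == dst:
        return [path]
    found = []
    for nxt in graph.get(src, []):
        if nxt not in path:  # never revisit a node on the current path
            found.extend(attack_paths(graph, nxt, dst, path))
    return found


def choke_points(paths):
    """Return the intermediate nodes that appear on every attack path."""
    if not paths:
        return set()
    return set.intersection(*(set(p[1:-1]) for p in paths))


# Hypothetical site: an edge means "an attacker on A can reach B".
GRAPH = {
    "internet": ["it_laptop", "vendor_vpn"],
    "it_laptop": ["it_ot_firewall"],
    "vendor_vpn": ["it_ot_firewall"],
    "it_ot_firewall": ["historian", "eng_workstation"],
    "historian": ["eng_workstation"],
    "eng_workstation": ["plc"],
}

paths = attack_paths(GRAPH, "internet", "plc")
print(len(paths), sorted(choke_points(paths)))
```

Even this six-node graph has four distinct paths, which hints at why exhaustive enumeration at a real site needs tooling, while the two choke points (the IT/OT firewall and the engineering workstation) are where added defenses would interrupt every path in this toy model.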

Nathaniel Nelson: So then, in what scenarios is 3-2 appropriate, versus when is CCE necessary, or, if need be, the military-grade one-to-two-year kind of assessment?

Andrew Ginter: That’s a good question, and it is something that’s debated in the industry. Everyone that I ask believes that what they’re doing is the right thing to do. I didn’t ask Paul this question; maybe I should have. So I don’t want to put words in his mouth, but my guess is that the majors are generally always open to new ideas. I know that some of them are evaluating CCE: they’re sending their people to the training, and they’re trying to figure out where to use which methodology. I know that if you have an asset where the consequences of compromise are truly unacceptable, say a power plant or a refinery close to a population center, you might want to be even a little bit more cautious and take on the attack tree tool in addition to CCE. I don’t know; this is not a space that I’m really familiar with. But I know that factors that play into it are consequences, especially consequences for public safety. The space continues to evolve, the threat environment continues to evolve, and owners and operators in the space are evaluating these tools. In response to the worsening threat environment, everybody I talk to is becoming more and more thorough over time with everything they do, from the smallest organizations, who say, “We’ve got to get started,” to the largest organizations, who say, “We’ve got to take the next step.” Where the whole industry is, or where different industries are, in that space, I can’t say. We need to get an expert on who’s more familiar with risk assessments in lots of different industries.

Nathaniel Nelson: So it sounds like, in his example, Paul’s describing an organization where the security process has been started. There’s stuff in place, but it’s not actually done yet; the picture isn’t complete. Is this something that we would expect commonly?

Andrew Ginter: Unfortunately, I think the answer is yes. Industrial control systems are frequently really messy environments. There’s an incredible number of vendors involved. There’s a huge number of devices involved. It’s not that there aren’t tens of thousands of laptops on an IT network; it’s just that these devices tend to be more different than the same. Out of my own experience, I remember sitting in a planning process for a deployment. I mean, Waterfall sells unidirectional gateways. We were replacing an IT/OT firewall with a unidirectional gateway, and one of the things you have to do is figure out exactly what’s going through the firewall, so that you can make sure you replicate all that stuff out to the IT network through the gateway. So we’re going through, reviewing the firewall rules one after another. This rule, we talked about it already, that’s the PI system. Okay. That rule, that’s a system that we used to have five years ago and we don’t use anymore. And people are looking at each other in the room saying, “Well, are the servers still deployed?” And they’re going, “Well, I think they are, but who’s receiving that information on the IT network? Are the receivers still deployed? We might have erased them and thrown them in the garbage.” They’re trying to figure stuff out on the fly. And the next rule comes and they say, “What’s this? I don’t recognize this at all.” “No, that’s a leftover from the Cisco firewall that we had, the previous firewall, not the current firewall, and it meant blah blah blah.” And they try to figure out whether it’s still around. Move on to the next rule: “That’s the Barracuda rule.” And the question is, “I thought the previous firewall was a Cisco?” “No, that’s the one from before the previous firewall. That was the Barracuda management system, I think. Can we get rid of it?” So like I said, this is more common than you might think. These can be noisy, messy environments, and I think Paul’s point is: do your homework before you waste the time of 12 people sitting around the room trying to do a risk assessment while you figure out in real time what still exists in your plant.
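One way to do that homework before the workshop is to cross-check the firewall rule base against the current asset inventory and flag rules that reference hosts nobody can account for. The Python sketch below assumes a simplified rule format and made-up host names; a real firewall export would need its own parser, and nothing here reflects the rules from the anecdote.

```python
# Hypothetical, simplified rule export: one dict per rule.
RULES = [
    {"id": 1, "src": "pi-server", "dst": "it-historian", "note": "PI system"},
    {"id": 2, "src": "legacy-reporting", "dst": "it-archive", "note": "retired system?"},
    {"id": 3, "src": "any", "dst": "old-fw-mgmt", "note": "left over from a previous firewall"},
]

# Hypothetical inventory of hosts known to be deployed today.
INVENTORY = {"pi-server", "it-historian"}


def stale_rules(rules, inventory):
    """Return rules whose source or destination is missing from the inventory.

    The wildcard "any" is ignored, since it does not name a specific host.
    """
    return [
        rule for rule in rules
        if any(host != "any" and host not in inventory
               for host in (rule["src"], rule["dst"]))
    ]


for rule in stale_rules(RULES, INVENTORY):
    print(f"rule {rule['id']}: review before the workshop ({rule['note']})")
```

A flagged rule is only a prompt for someone to investigate; whether it can be removed is still a decision for the people who own the systems on both sides of the firewall.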

Nathaniel Nelson: Please Andrew correct me if I’m wrong. It sounds like a lot of what Paul’s talking about is operational and has maybe less to do with cyber specifically.

Andrew Ginter: Well, you’re right to an extent. A lot of the consequences of cybersecurity threats are not all safety; some of them are just that the plant goes down. And so these risk assessments, in my recollection and in my understanding, do tend to drift a little bit sometimes. Mostly they’re focused on security; they’re focused on deliberate attacks. But they’re touching on other cyber issues, operational issues that might take the plant down, and people generally welcome being alerted to issues like that, which they might not have taken into account earlier. Another reason, in my estimation, why people don’t mind drifting across that line and then drifting back is insiders. Paul gave the example of the USB dongle. You might think, hey, if someone misplaces the license dongle and the whole control system stops working and the plant goes down, that’s an operational issue. That’s not really a cybersecurity issue, unless it’s an insider who has deliberately misplaced the dongle to take the plant down. So the operational space is where insider attacks start getting fuzzy: is this an operational issue, or could it be deliberate? So to me it is relevant, and it’s a line people don’t mind crossing. I got a lot out of this episode. Paul took us through the basics, and I was grateful for that. I’ve never done a risk assessment myself. I’ve been involved after the risk assessment, planning for mitigations. I’ve been involved in the risk management process at a higher level, saying we need a risk assessment. I haven’t actually done a risk assessment and been through the nuts and bolts of it, so that was useful. He gave us a lot of details. Some of the learnings he gave us are that you can speed up the process and make it more effective by getting the right people in the room, so that when you ask a question about safety, there’s a safety person there who can answer it. You can get more effective use of the time of the 12 people sitting around the table if you ask them to do their homework and come up to speed on what security mechanisms have been installed and what safety mitigations have been installed, so that when you ask a question they have the answer on the tip of their tongue and they’re not wasting the time of everybody in the room scratching their heads and arguing with each other. He mentioned a few very simple mitigations that I think deserve better advertising. The one that struck me was the engineering workstation: turn it off when you’re not using it. This was a piece of advice that I heard first, you know…

Nathaniel Nelson: All right. Well, thank you to Paul for sitting down with you, Andrew, and as always, Andrew, thank you for speaking with me. This has been the Industrial Security Podcast from Waterfall. Thanks to everybody out there listening.

Andrew Ginter: I’m grateful to Paul. Thank you, Nate! It’s always a pleasure.
