Lessons Learned From Incident Response – Episode 139
Waterfall Security Solutions | Wed, 09 Jul 2025

Tune in to 'Lessons Learned From Incident Response', the latest episode of Waterfall Security's OT cybersecurity podcast.


If you didn’t listen to a single thing I said, you can listen to these three things: collaborate, plan, and practice.

Chris Sistrunk, Technical Lead of ICS at Mandiant

Transcript of Lessons Learned From Incident Response | Episode 139

Please note: This transcript was auto-generated and then edited by a person. In the case of any inconsistencies, please refer to the recording as the source.

Nathaniel Nelson
Welcome listeners to the Industrial Security Podcast. My name is Nate Nelson. I’m here with Andrew Ginter, the Vice President of Industrial Security at Waterfall Security Solutions, who’s going to introduce the subject and guest of our show today. Andrew, how are you?

Andrew Ginter
I’m very well, thank you, Nate. Our guest today is Chris Sistrunk. He is the technical lead of the Mandiant ICS or OT security consulting team, whatever you wish to call it.

Google purchased Mandiant in 2022, but they’re still keeping the Mandiant name. So he still identifies as the technical lead of industrial security consulting at Mandiant.

And our topic: as part of their consulting practice, they do a lot of incident response. He’s going to talk about lessons learned from incident response in the industrial security space.

Nathaniel Nelson
Then without further ado, here’s your interview.

Andrew Ginter
Hello, Chris, and welcome to the podcast. Before we get started, can I ask you to say a few words for our listeners about your background and about the good work that you’re doing at Mandiant?

Chris Sistrunk
Okay, thanks, Andrew. I’m at Mandiant on the ICS/OT consulting team, been doing that for over 11 years now, focused on ICS/OT security consulting around the world with every type of critical infrastructure, doing incident response, strategic and technical assessments, and doing training as well.

Before that, I was an electrical engineer, still am, but for a large electric utility,
Entergy. I was there over 11 years as a senior electrical engineer, transmission distribution SCADA, substation automation and distribution design.

So that’s a little bit about me. And again, just working for Mandiant as part of Google Cloud.

Andrew Ginter
And our topic is incidents. It’s lessons from incidents, but let’s talk about the big picture of incidents. I mean, Waterfall puts out a threat report annually. I’m one of the contributors.

We go through thousands of public incident reports looking for the needles in the haystack, the incidents where there were physical consequences, where there were shutdowns, where sometimes equipment was damaged.

And we rely on the public record, on public disclosure. And so I’ve always believed that we were under-reporting, because, again, I don’t have that many confidential disclosures that people tell me about.

But I’m guessing that there’s a lot more out there that never makes it into the public eye. You folks work behind the scenes, you know, without breaching any non-disclosure agreements or anything, can you talk about the big picture? Do you see incidents, especially incidents in the industrial space with physical consequences, incidents that triggered shutdowns, incidents that are not publicly reported?

How many are there? What do they look like? Can you say anything about sort of what I would not see by looking at the public record?

Chris Sistrunk
Sure. Thanks for the question. Yeah, I think we’re talking about cybersecurity incidents here. And there’s many incidents that happen every day, right? But life goes on. Squirrels happen, right, in the grid.

But for cybersecurity incidents, I do believe we’re seeing an increase, though I can’t go into how many. We actually have a report, M-Trends, that Mandiant puts out every year. It’s going to come out later this month, for RSA.

And this is a yearly report. And we report on the different themes, the different targeted victims, the different threat groups, the TTPs. But for cyber attacks that impact, say, production, or cause a company to shut their operations down, I don’t have any hard, fast numbers to talk about, but we have seen an increase. And you can look at not just our report, but also the reports of others: IBM X-Force, Verizon DBIR, Dragos, others. There are increasing reports of these, and a lot of it has to do with things like ransomware, either directly impacting the control system environment, which we have responded to at a manufacturer and a few others. But we have also seen in the public news where a company might have to shut down operations due to indirect impact.

Maybe their enterprise resource planning software or manufacturing execution software was impacted, which is an indirect impact, where the flow of industry-critical data was halted, which means I can’t produce my orders anymore or track shipping or logistics, things like that. So we’re seeing a lot of those.

There’s others in the electric sector that have to be reported in OE-417 reports. If there’s a material impact, they’ll be filed, or they’re supposed to be filed, in the 8-K or 10-K with the SEC.

And so, I think if you take all of those sources together, we see there’s an increase in operational impact. But I think the engineers and the folks that run these systems are doing a good job of minimizing the impact in these situations, especially for electric and water and other critical infrastructure. Manufacturing is critical, and I’d say it is probably the highest targeted outside of healthcare and a few other areas.

Andrew Ginter
So work with me on the numbers just for one more minute. I’m on the record in the Waterfall threat report speculating as to what’s going on with public disclosures. It’s my opinion, but I have limited information to back it up. It’s my opinion that the new disclosure rules from the SEC and other jurisdictions around the world are in fact reducing the amount of information in the public domain rather than increasing it.

And the reason I suggest this is because it seems to me that with the new rules, every incident response team on the planet, roughly, I overgeneralize, has a new step two in their playbook. Step two is call the lawyers.

And what do the lawyers say? They say, say nothing. Because if you disclose improperly, if you fail to disclose widely enough, you can be accused of facilitating insider trading.

If you disclose too much information, you might get sued. I mean, people have been sued for disclosing incorrect information about security to the public. People buy and trade shares, and then they find out the information was incorrect, and they get sued.

And so, to me, the mandate for the lawyers is say the minimum the law requires, because if you say too much, you risk making a mistake and getting sued, and you don’t want to get sued.

And if you say too little, you’re going to get sued. The lawyers minimize, and if you have a material incident, you must report it.

If it turns out the incident is not material to the finances of the company, you don’t have to report it. And again, to minimize the risk of getting sued by reporting incorrect information, you report nothing.

So my sense is that we’re seeing fewer reports because of these mandatory rules, not more of them. What do you see? You see this from the other side. Does this make any sense? Do you have a different perspective?

Chris Sistrunk
Well, I can say that, as an incident responder working with victims in critical infrastructure, but also outside, I think this is a broader question you bring up. I can definitely confirm that we work with external counsel that a victim may have hired to handle a lot of these reporting, or not reporting, requirements.

I can’t say or confirm that the lawyers themselves, external counsel, underreport. I can’t say that. I don’t know.

I’m not a lawyer, nor do I play one on Facebook. So I will just stick to saying, yes, we have worked with external counsel. And usually we, as the incident responder, do not say anything in public unless the victim company or our client asks us to. Because sometimes sharing information is a helpful thing, especially if it’s a big breach, sharing those lessons learned about what has happened with others, just like we did back in the day when we had the SolarWinds breach. So, there’s two ways of thinking about that. And maybe you can pull on that thread with some other experts, but not me. I don’t know about the external counsel part.

Nathaniel Nelson
Andrew, you’d referenced Waterfall’s annual threat report, a report which I’ve covered in the past for Dark Reading. I’m not sure I’ve seen this year’s iteration, so maybe you could tell listeners just a bit about what the report covers and what the numbers are showing lately.

Andrew Ginter
Sure. The report uses a public data set. The entire data set’s in the appendix. You can click through to it if you wish. In our statistics we count deliberate cyber attacks with physical consequences, not “stole some money.” Physical consequences in heavy industry and critical infrastructure, the industries we serve. In the public record. No confidential disclosures. The numbers last year were 72 attacks, I believe, with, I don’t know, some 100, 150, something like that, I forget the numbers, sites affected. Many of the attacks affected multiple sites. This year, we are up from 72. We have 76 attacks affecting a little over 1,000 sites. So there were more sites affected. But the number of attacks did not increase sharply.

And this is why, again, I speculate: why have we sort of seen a plateau? We went up from zero, essentially, give or take, in, let’s say, 2019, to 72, and then in 2024, 76. Why do we seem to have a bit of a plateau? And I’m speculating it has to do with the SEC rules. People are now legally obliged, not just in the United States by the Securities and Exchange Commission, but in other jurisdictions around the world. There’s similar rules around the world.

If you have an incident that is “material,” that any reasonable investor would use as grounds to buy or sell or value shares, you must disclose it.

But I have the sense that we are seeing fewer disclosures, even though by law you’re required to disclose material incidents. And again, I speculate that because the lawyers are involved, we are seeing fewer disclosures. They disclose the material incidents and they squash everything else, is the sense I have.

But, you asked about the numbers. Seventy-six last year. Nation-state attacks are up: there were two the year before, and six last year. You know, is this a trend? It’s still small numbers. Who knows?

And industrial control system capable malware, malware that understands industrial protocols and that is apparently designed to manipulate industrial systems, is up sharply. There were three new different kinds of malware disclosed, or found in the wild, last year that had that capability, versus seven in the preceding 15 years. Again, small numbers. Is it a blip? Is it a trend? Is AI helping these people write stuff? We don’t know. So you look at the numbers and you scratch your head and you go, I wonder what’s going on here? That’s the threat report in a nutshell. There’s other statistics in there, but those are sort of the headlines.

Andrew Ginter
So that makes sense. That leads us into the topic of the show, which is lessons learned from incidents. You folks do incident response all the time. Can you talk to me? What are you seeing out there? Is there an incident or three that sticks in your mind as, Andrew, the most important thing I have to tell you is, or the most recent? Where would you like to start?

Chris Sistrunk
Okay, sure. We have been doing OT incident response since I’ve been here, and I can give you a few examples. Last year, in 2024, we responded to a North American manufacturing company that had their OT network, if we’re looking at a Purdue model it’s the third layer, or level 3, of the network, directly impacted by the Akira ransomware group.

And what had happened was an unknown internet connection was made by this third party who was running the site. They had put in their own Cisco ASA firewall.

And it just so happened that there were two critical vulnerabilities in that firewall at the time. And the Akira ransomware group was targeting those exposed firewalls.

So I don’t necessarily think this was a targeted manufacturing OT attack. It’s just ransomware gangs doing what they do, trying to make money.

And so they were able to log in and get in through these vulnerabilities and deployed the ransomware on directly on the OT network, which was flat.

And every system but about five or six or seven was completely encrypted, including their OT DCS vendors’ systems. And there were multiple, not picking on any one in particular, but GE, ABB, Rockwell, several others that were there.

And the backup server was impacted, and the backup of the backup server was impacted; they were all on the same flat network. So this was a really tough situation. Since the manufacturing company did not have any backups that were offline, the OT vendors I mentioned had to come on site to completely rebuild the Windows servers, the engineering workstations, the HMIs, all the things that were Windows and/or Linux they had to completely rebuild.

Client didn’t pay the ransom, in other words. And so the lessons learned here: work with your OT vendors and OEMs and even your contractors to make sure that your Windows systems and Linux systems have antivirus. Make sure that you have OT backups that are segmented from the main OT network, keep offline backups, and test them on at least a yearly basis. Backups will get you out of a bad day, even if it’s an honest mistake at five o’clock on Friday.

So this is a basic win here, having a good backup strategy. And in this last case, we recommended they eliminate this external firewall and leverage the existing IT/OT DMZ firewall that came in from the main owner of that site. They had a backdoor, essentially, that this third-party contractor had installed on a new internet connection. So get away from the shadow IT; go back to your normal IT/OT DMZ with a jump box, two-factor authentication, and all those things. If you do the basics and do them well, keep good segmentation, have backups, and patch your firewalls on a regular basis, I think that will go a long way, especially in this case.
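Chris’s advice to keep offline backups and test them regularly can be made concrete. Below is a minimal sketch, not any vendor’s tool; the manifest format and file names are hypothetical. It verifies backup files against checksums recorded at backup time, so a silently encrypted or corrupted backup is caught before the bad day arrives:

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large backup images don't exhaust memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_backups(backup_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Return the names of backup files that are missing, or whose checksum
    no longer matches the manifest recorded when the backup was taken."""
    failures = []
    for name, expected in manifest.items():
        f = backup_dir / name
        if not f.exists() or sha256_of(f) != expected:
            failures.append(name)
    return failures
```

Run on a schedule against the offline copy, a non-empty result is the signal to investigate, or to test a restore, long before an incident forces the question.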

Nathaniel Nelson
You know, I feel like I’ve heard a variant of that advice that Chris just gave a million times, and I don’t work in industrial security. So you folks must hear it all the time, or it must just be such basic knowledge that you don’t even think about it.

So are there really industrial sites out there that still need to hear that you shouldn’t be making an internet connection from your critical systems?

Andrew Ginter

Short answer: yes. People who do security assessments, not just the incident response that we’re talking about here, but security assessments, come back and say they regularly find connections out to the IT network and occasionally straight out to the Internet.

The connections to the IT network tend to have been deployed by the engineering team or the IT team to make their lives easier. You know, people with gray hair, enough gray hair, like me, talk about how the systems used to be air-gapped. This was a very long time ago. We’re talking 30, 40 years. The systems used to be air-gapped.

And people with gray hair like me might assume that’s still the case. It’s not. Everybody who does audits reports these connections. The really disturbing stuff, yes, it’s disturbing that there are connections to the IT network that are poorly secured.

But the really disturbing stuff is the vendors going in. If you do an audit on a site, time and time again, I hear people saying, yeah, they discovered three different internet connections the vendors stuck in there.

And you’re going, well, what? Wouldn’t you notice if there was a new internet connection? I mean, no internet service provider gives you a connection for free. You’ve got to run wires. You’ve got to pay for this thing every month. It should be showing up on your bill. But no, it’s not.

There’s a lot of wires being run while stuff is being deployed. You don’t notice a new wire. And the vendors pay month after month for the internet connection. It doesn’t even show up on the bill of the owner and operator, because the vendors are providing a remote management or remote maintenance service, and they want to minimize their costs.

They want to maximize their convenience in terms of getting into the site. So they deploy rogue DSL routers. They deploy rogue firewalls into the site’s internet connection. They might deploy rogue cellular access points where there are not even wires to run. It’s just a box sitting there with a label on it saying, important, do not remove. And of course, that makes it invisible to everybody who’s looking at it. They say, oh, what? That? Don’t touch that one.

Yes, it’s very common. The advice I try to give people is: when you do a risk assessment or a walkthrough or an audit of your site, look for these rogue connections.

Unfortunately, you’re probably going to find one or two of these. Contractual penalties with the vendor help, but they’re no guarantee.
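One low-tech way to hunt for the rogue devices Andrew describes is to compare what a walkthrough, ARP scan, or passive capture actually observes against the approved asset inventory. A minimal sketch, assuming a hypothetical inventory keyed by MAC address (the field layout is illustrative, not from any asset-management product):

```python
def find_unknown_devices(observed: dict[str, str], approved: set[str]) -> dict[str, str]:
    """Return observed MAC -> IP entries whose MAC is not in the approved
    asset inventory: candidates for rogue modems, routers, or gateways."""
    return {mac: ip for mac, ip in observed.items() if mac not in approved}
```

Anything flagged deserves a physical walk-down; a cellular access point will never show up in a wired scan at all, which is why the walkthrough itself still matters.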

Andrew Ginter
So that triggered something. Let me dig a little deeper. You said that the victim decided not to pay the ransom. You know, do you see victims ever paying the ransom to recover an OT network, to recover the HMI, to recover, worse than that, the PLCs and the safety systems?

You know, why would they? Does anyone trust a criminal to take the tool the criminal provides and run it on their safety system to restore it, because they trust the criminal? Does anyone trust a criminal that far? Does that happen?

Chris Sistrunk
Okay, so good question. So we have seen traditional IT systems where they pay the ransom and get access back to these systems.

And some that are OT-adjacent, such as Colonial Pipeline, and in hospitals. We do know that in both of those examples, Colonial Pipeline or name-a-hospital breach, in some cases there was OT-type data, OT-type critical information, that was impacted, and so of course they paid. And if no one trusted these ransomware gangs to do what they say they’re going to do, those ransomware gangs would be out of business. If you pay them and the decryptor doesn’t work, then it’s no good, and that ransomware gang’s business is over with, at least for this thing.

But for OT, in this instance I talked about, they did not pay. I don’t have enough data to know whether, for OT directly, on the control systems themselves, the Windows HMIs, engineering workstations, DCS servers, SCADA servers, they’ve paid in those instances.

But I’d say it’s plausible. And it really comes down to the business decision of the plant owner, the CEO of the company, based on what the engineers at the lower level say: hey, do we have backups? Can we get the vendors to come in?

And so I really don’t have enough information to say whether OT asset owners, like a plant, like a site, someone that directly operates OT, trust these ransomware gangs or not.

It may just roll up higher than them. Also, there’s usually advice from a ransomware negotiator, a third party that specializes in negotiating with ransomware gangs. So they may advise to pay, or not to pay, or to get a reduced payment as well. So it’s very, very complicated.

I know I didn’t answer your question directly, but in the instances we’ve seen, we have seen them not pay and we have seen them pay, whether it’s OT or not.

Andrew Ginter
That makes sense. So coming back to our theme here, lessons from incidents, the lesson from this incident is: get rid of that firewall, use the existing infrastructure, and look at your backups. I mean, if the backups are encrypted, it’s all over. That makes perfect sense.

What else have you got? What else have you been running into lately that’s interesting and noteworthy?

Chris Sistrunk
Yeah, I mean, it basically boils down to either ransomware or commodity malware. So I’ve got another example about ransomware: an electric utility was impacted by ransomware on the IT side.

But they had a good incident response plan, and they severed the IT and OT connections there, even down to the power plant networks. And so that was really amazing, and that’s a good story. And we were able to verify, with the IT team, that the threat actor, the ransomware group Quantum, was scanning the OT DMZ, but they didn’t get a chance to be let through.

We did do a full assessment of their DMZ, and looked at the firewalls and even the domain controller and others down inside the OT networks.

And we found that they were actually pretty lucky, because they had some weaknesses in some of the firewalls. So eventually, if the ransomware actor had persisted long enough, they could have gotten through that firewall and made it to the DMZ. And the Active Directory had some weaknesses as well, so they could have gotten domain access, domain admin, and pivoted to the OT network.

But the great thing is, to highlight again, they had a good incident response plan, they were able to segment quickly, and then their OT vendor, in this case it was Emerson Ovation, was able to go on site. And they were able to not only take the IOCs that we had from the ransomware, but to sweep, because this was in their contract to do so, looking at the PLC logs, the OT workstations, endpoint protection, and all that stuff.

So we all worked in concert together in this incident. And then they hardened the firewalls, hardened the domain controllers, hardened the workstation configurations. Before doing anything else, they did all of that.

Once the IT ransomware was eradicated and the environment hardened, then they said, okay, now we’ll reconnect everything back the way it was. So that was a really great lesson learned from another ransomware incident. It wasn’t a direct impact to OT, but it was a great opportunity to leverage that incident response plan that they had.

Andrew Ginter
So, Nate, the concept of separating IT from OT networks in an emergency is a concept that I see increasingly. I mean, I think we’ve reported on the show here a few times that this is what the TSA demands of pipeline operators, petrochemical pipeline operators, ever since Colonial: the ability to separate the networks in an emergency so that you can keep the pipeline running while IT is being cleaned up.

I haven’t actually read a translation of the Danish law, but apparently in Denmark there’s a recent law, in the last 12 months, saying exactly the same thing.

You know, the TSA rules apply to pipelines and rails. In Denmark, the law applies to critical infrastructure. And it says in an emergency, you have to be able to separate, they call it “islanding,” the industrial control network.

And as Chris points out, it can be effective, but it relies on really rapid intrusion detection and rapid response, because as Chris said, the bad guys had been testing the OT firewall; if they had had just a little bit longer, they could have gotten through. So even though it’s imperfect, it is a measure that I’m seeing increasingly required of critical infrastructure operators, and recommended to non-critical operators, as a measure that helps, especially on the incident response side.

Andrew Ginter
Have you got another example for us? I mean, three is a magic number. You’ve given us sort of two sets of insights. What else have you got for us in terms of lessons learned?

Chris Sistrunk
Yeah. There are lessons learned I can name from just about any attack, right? Making sure that at least your Windows systems have antivirus. In a lot of cases, the OT network didn’t have even basic antivirus, never mind an agent or EDR solution.

If you have those, great, but if you don’t have any antivirus, you need to get at least a supported version of Windows or your operating system, with antivirus.

Having good backups, having good vendor support. Now, this last incident we responded to used a living-off-the-land attack.

So we responded to an electric utility in Ukraine in 2022. It was a distribution utility where the attacker came in through the IT network and deployed their typical wiper malware. This was the group APT44, or Sandworm Team, which has been targeting critical infrastructure around the world for quite a while.

And they were able to pivot to the SCADA system and use a feature of the SCADA system to trip breakers, using a tool that was built into the SCADA system itself.

So just giving it a list of breakers to trip, and calling that executable in the system to trip those breakers on behalf of the attackers. And so the lesson learned here is: targeted attacks are not necessarily going to use malware; they’re going to use the features or the inherent vulnerabilities in an OT network.

Stealing valid credentials, like an operator workstation or an engineering administrator account. And if an attacker can spearphish an engineer or a network admin on the IT network, and you don’t have good segmentation of roles from IT to OT, then that attacker is going to use every one of those tools to evade detection, to bypass your normal detections,

because they’re coming in as a valid user. So the lesson learned there is to limit the amount of administrative access. And this is role-based authentication, right? Does the person that got promoted and now is in a different department still need admin rights?

Does this person have enough control for just their area only? Are the area responsibilities too wide? And now we say, OK, we need to reduce the amount of admin.

Do we require two-factor authentication or even hardware two-factor authentication to really reduce the attacker down to an insider threat?

Because remotely, that’s very hard to do, to bypass hardware-token-based two-factor authentication. And so there are some living-off-the-land guides out there.

The U.S. government, the DOE, has put out a threat hunting guide for living-off-the-land attacks, after the Volt Typhoon announcements last year.

But I would also go a step above and beyond that: learning good ways to detect anomalous logins, even from your own folks. If it’s out of a normal time, out of a normal location, you’re really going to have to have some tuning on some of these detections.

And the only way to really test those is with a red team that’s trying to be quiet and not trigger your detections. And that is what some of the more advanced asset owners and end users are doing: leveraging red teams, hiring red teams like what we do at Mandiant, to come in and see if we can do living-off-the-land attacks to bypass their detections.

Nathaniel Nelson
Since Chris mentioned it but moved on before we could actually define it, let me just, for listeners: living off the land is the process by which an attacker, rather than using their own malicious tooling, makes use of legitimate software or functionality of the system they are attacking to perform malicious actions on it.

It’s been a growing trend in recent years, I believe, because it’s so effective in that it is so difficult to detect.

You know, you could spot malware with certain kinds of tools, but can you spot somebody doing things with legitimate aspects of Windows or whatever you might be using?

It sounds, though, like Chris is talking about detecting living off the land tactics, which seems difficult, Andrew.

Andrew Ginter
That’s right. I mean, I have been following living off the land to a degree. What’s the right word? The short answer is, you run an antivirus scan on a machine that’s been compromised by a living-off-the-land attack, and it comes up squeaky clean.

There’s nothing nasty on the machine. And what I heard Chris say is that this is because the bad guys are using normal mechanisms, especially remote access, to log into these systems as if they were normal users, and use the tools on the machine to attack the network, or to wait for a period of time until it’s opportune and then attack the network.

And what I heard him say is that because it’s a lot of remote access, you can detect this by focusing hard on your remote access system. You can prevent it by throwing in some hardware-based two-factor. That will solve a lot of the problem, not necessarily all of it. There are always vulnerabilities and zero days, but the two-factor helps enormously. It’s way better than not having two-factor.

But that’s preventive. On the detective side, he said, pay attention to your remote access. If normal users are logging in at strange times, that should raise a red flag.

If normal users are logging in from strange places, the IP address coming in is from China. Well, is Fred in China this week? No, he’s not. So, what I heard was, one way to help detect living-off-the-land techniques is to pay close attention in your intrusion detection system to the intelligence that you’re getting about remote users logging in.
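The detection idea Andrew summarizes, flag remote logins at strange times or from strange places, can be sketched in a few lines. Everything below is illustrative: the record fields and the "usual hours and countries" baselines are assumptions, not from any specific SIEM or remote-access product, and a real deployment would learn those baselines per user:

```python
from dataclasses import dataclass

@dataclass
class Login:
    user: str
    country: str   # geolocated from the source IP (hypothetical enrichment)
    hour: int      # local hour of day, 0-23

def flag_anomalous(logins: list[Login],
                   usual_hours: range = range(6, 19),
                   usual_countries: frozenset = frozenset({"US", "CA"})) -> list[Login]:
    """Flag remote logins outside normal working hours, or from countries
    the organization's users do not normally connect from."""
    return [l for l in logins
            if l.hour not in usual_hours or l.country not in usual_countries]
```

The point is the shape of the rule, not the rule itself: as Chris notes, these detections need tuning, and a quiet red team is the only realistic way to test them.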

Andrew Ginter
So one more question. We talk to a lot of folks on the podcast. A lot of them are vendors with technology that we talk about. And sort of a consistent theme for most of these vendors, most of these technologies, is operational benefits.

Yes, the technology, whatever it is, is helping with cybersecurity. But often, this stuff helps with just general operations in sometimes surprising ways.

We’ve been talking about incidents and lessons, and a lot of what you do is incident response. Are there operational benefits that you run into, where people say, I did what you told me and everything is working more smoothly, not just on the security side? Do you have anything like that for us?

Chris Sistrunk
Oh, absolutely. And one of the things that I always tout is looking at your network. Looking at the packet captures in the network can aid in not just cybersecurity benefits, but these operational benefits.

You can see things like switch failures happening, TCP retransmissions happening, all this traffic, maybe your Windows HMIs trying to reach out to Windows Update, but it’s blocked by the IT/OT firewall, or there may not be a connection at all and it keeps trying to reach out. All this unnecessary traffic, or indications of improper configurations, misconfigurations, and things like that.

So just look at your network with some of the tools that are out there: free tools, paid tools, ICS-specific tools, or IT-specific tools, it doesn’t matter. If you take any one of those, even just Wireshark, and look at your OT network, you can get an idea of what traffic doesn’t need to be there, eliminate it, and make your improvements to the system.

And now I have better visibility. If there is an incident, it’s easier to detect whether it’s a cyber incident or whether something’s operationally wrong, like a switch failure.

And so there’s a really great benefit there. It also helps improve reliability. We did an assessment at a company that was having problems with a conveyor belt. If the conveyor belt wasn’t timed exactly right, if they had too much latency on the network, the conveyor belt would stop, everything on it would go everywhere, and it was a disaster.

So we just looked in the network: oh, you’ve got all these TCP retransmissions. And you look at the map in the software and say, oh, it’s coming from these two IP addresses.

Oh, we know what that equipment is. And the network person came over and said, I’ve been trying to figure this out for weeks. Just by using a tool like that, they were able to find and fix the problem, and they fixed their latency issue because of it.
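The retransmission hunt Chris describes boils down to counting retransmissions per source and flagging the noisy talkers. In Wireshark itself, the display filter `tcp.analysis.retransmission` surfaces these packets; the sketch below assumes the capture has already been parsed into simple records, which is an illustrative format, not any tool’s actual output:

```python
from collections import Counter

# Toy records standing in for parsed capture output; in practice you would
# export these from Wireshark/tshark (e.g. packets matching the
# tcp.analysis.retransmission flag) rather than build them by hand.
packets = [
    {"src": "10.0.0.5", "retransmission": True},
    {"src": "10.0.0.5", "retransmission": True},
    {"src": "10.0.0.7", "retransmission": True},
    {"src": "10.0.0.9", "retransmission": False},
]

def retransmitters(packets, threshold=2):
    """Count retransmissions per source IP and keep sources at/above threshold."""
    counts = Counter(p["src"] for p in packets if p["retransmission"])
    return {ip: n for ip, n in counts.items() if n >= threshold}

noisy = retransmitters(packets)
```

With the toy data above, only 10.0.0.5 crosses the threshold: the “two IP addresses” moment from the conveyor-belt story, in miniature.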

So, going back to incident response: a lot of OT already has an incident response plan because of disasters, fires, floods, storms, spills, air releases, and safety issues. That’s all part of their normal disaster recovery or incident response plans.

If you already have one of those, you’ve already done 90% of the work needed for a cyber incident response plan. You just add cyber incident response to it. That’s the whole premise behind things like ICS4ICS, the Incident Command System for Industrial Control Systems.

And have a designated person in charge of cybersecurity for each site: for a paper mill, a power plant, a manufacturing facility, even if that’s not their day-to-day job all the time. If you have multiple plants, have a lead for each of those plants. Every decision will still go through the plant manager, the general manager of the plant or site, but at least you have someone who is in charge of cybersecurity.

Just like you have a designated fire watch person or anything else. So if you take the safety culture that we’ve known about for over 100 years and mold your cybersecurity culture to fit with it, things will make a lot more sense. We’ve already invented this; we’re not reinventing the wheel here. Now we’re just including another paradigm of cybersecurity, network security, and endpoint security in the things we have been doing.

There’s a fire? Okay, let’s put it out. That’s incident response. If you have a plan, that’s great. If you don’t have a plan and you just run around, that’s not good. So if you have a plan, you can at least prepare for it, and sometimes that’s the win. Being prepared is better than not being prepared.

Andrew Ginter
Well, this has been tremendous. Thank you, Chris, for joining us. Before we let you go, can you sum up for our listeners what key points we should take away?

Chris Sistrunk
Sure. If you didn’t listen to a single thing I said, you can listen to these three things: collaborate, plan, and practice. So, collaborate: get your IT teams talking to your OT teams, talking to your manufacturers, and identify the right roles within each of those.

And make sure you get together and talk about these things. Have some donuts and coffee. Collaborating, knowing who is in charge of what and who to call when, is half the battle. Plan: having an incident response plan, or including OT security in your incident response plan and/or engineering procedures, is going to help when an incident impacts OT directly or indirectly.

And then practice. You can even start with a simple question: hey, what would we do in an incident? Or go on to having a tabletop exercise, or collecting security logs from a PLC. How long does that take? How many devices do we have? If the general manager asks how long it’s going to take to pull all the logs from all of our systems, you won’t have to say, I don’t know.

You’ll just know: this will take two hours and 45 minutes, because we’ve tested it. So collaborate, plan, and practice.

If you need help with OT security or IT security, we do that at Mandiant. We offer an incident response retainer that covers both IT and OT; there’s no separate retainer. If you have an IT incident and don’t need OT, not a problem. If you have an OT-only incident, not a problem. If it’s IT, cloud, and OT all at the same time, we can help you, around the world, 24/7.

And lastly, if you want to learn more about this, you can reach out to me: Chris Sistrunk at google.com is my email, or find me on LinkedIn or social media like Bluesky. And check out some of our blogs on the Google Cloud or Mandiant security blog.

We have great content out there that is actually actionable, not marketing fluff. They are actual, actionable reports.

The next M-Trends report is coming out next week, in the RSA timeframe, at the end of April. It’s a free report.

It’s a great report to look at to gain some insights into what we’ve been responding to over the last year. And with that, I appreciate it. Collaborate, plan, and practice.

Nathaniel Nelson
Andrew, that just about concludes your interview with Chris Sistrunk. Do you have any final words to add to his, to take us out with today?

Andrew Ginter
Yeah, I mean, Chris summed it up: collaborate, plan, and practice. What I heard, especially earlier in the interview, was: do the basics. Some people call it basic hygiene. It’s basically: do on an OT network as much as you can of what you would do on an IT network.

Put a little antivirus on the systems that tolerate it. Get some backups, including off-site backups, so that if the bad guys get in, they can’t encrypt the off-site backups; they’re somewhere else. Look for internet connections the vendors left behind, and get rid of them.

And in terms of living off the land, he gave some very concrete advice that I’d never heard before: look, these people are coming in as users. Get two-factor. Two-factor will do a lot toward breaking up living-off-the-land attacks.

And in your intrusion detection systems, look hard at what your remote users are doing. If it seems at all unusual, that’s a clue that you’re being attacked. And in terms of his collaborate, plan, and practice, I really liked the fire warden analogy.

Say, look, if you have an industrial site that is flammable, your fire warden does not just sit on their hands until the place bursts into flames. The fire warden is someone who is active, raising the alarm when they see dangerous practices in this flammable plant. It’s not just a reactive position; it’s also a proactive position.

And we need that for cybersecurity, because basically every site is, in a sense, a flammable cybersecurity situation.

So it’s not just that they sit on their hands until there’s an incident and then take charge. They are actively looking around, just like a fire warden would, saying: we shouldn’t be doing this. My job is not just to put the fire out when it occurs, or to coordinate putting the fire out.

My job is to help prevent these things. I love that analogy; it makes so much sense. Anyhow, that’s what I took from the episode.

Nathaniel Nelson
Sure. Well, thank you, Chris, for speaking with us. And Andrew, as always, thank you for speaking with me.

Andrew Ginter
It’s always a great pleasure. Thank you, Nate.

Nathaniel Nelson
This has been the Industrial Security Podcast from Waterfall. Thanks to everyone out there who’s listening.

Risk in Context: When to Patch, When to Let It Ride | Episode 109
https://waterfall-security.com/ot-insights-center/ot-cybersecurity-insights-center/risk-in-context-when-to-patch-when-to-let-it-ride-episode-109/
Sun, 09 Jul 2023
In this episode, Rick Kaun, the VP of Solutions at Verve Industrial Protection, takes us deep into the world of patching OT systems, how it drastically differs from IT patching, and what the process looks like in action.

We also have a look at how to quantify risk in OT, as that’s part of the process of deciding what to patch first, and how to prioritize a large workload of patching that is heavily backlogged.


About Rick Kaun

Rick Kaun is the VP of Solutions at Verve Industrial Protection, which is an ICS security provider for both software and services. For over 20 years, Rick has travelled the globe, working with clients of all sizes across many industries in his roles at Matrikon, Honeywell and now Verve. 


Risk in Context – When to Patch, When to Let It Ride

Transcript of this podcast episode:

Please note: This transcript was auto-generated and then edited by a person. In the case of any inconsistencies, please refer to the recording as the source.

Nate Nelson:
Welcome everyone to the Industrial Security Podcast. My name is Nate Nelson. I’m here, as usual, with Andrew Ginter, the Vice President of Industrial Security at Waterfall Security Solutions. He’s going to introduce the subject and guest of our show today. Andrew, how are you?

Andrew Ginter:
I’m very well, thank you, Nate. Our guest today is Rick Kaun.
He is the Vice President of Solutions at Verve Industrial Protection, and we’re going to be talking about context for risk: automation to help us figure out when a vulnerability is worth patching and, you know, when we should probably let it ride.

Nate Nelson:
Alright, then, without further ado, here is the interview:

Andrew Ginter:
Hello Rick, and thank you for joining us. Before we get started, could you say a few words about yourself, your background, and the good work you’re doing at Verve Industrial?

Rick Kaun:
Yes, thank you, Andrew. I appreciate the opportunity to share with you today; as plugged in as you are, it’s always a great discussion. My name is Rick Kaun, and I’m the VP of Solutions for Verve Industrial Protection. We’re a vendor-neutral, multi-function shop that provides both software and services exclusively to the OT space. We’ve been around for 30 years, actually: privately funded, organically grown. I personally have been in the OT business for 22 years. I started at a company called Matrikon, building the ICS, or industrial control security, consulting practice. We then got picked up by Honeywell, where I did global business development for a few years, before coming back to a vendor-neutral perspective, where I am now. And again, I appreciate the opportunity today.

Andrew Ginter:
Thanks for that. Our topic today is risk in context. Can you tell us what that is, and maybe give us an example or two?

Rick Kaun:
So yeah, this is the gist of what I want to talk about today; there’s lots to unpack here. The problem that more and more organizations (not just me; we as a company, our industry clients, and their peers) are starting to realize is that the traditional IT approach, where you have very cool individual specialty tools and specialty people, is interesting, but it’s a singular point of perspective. I can give you a list of vulnerabilities and you’d think, great, I now know what my risk is. But the reality in OT is that it’s not always directly applicable. That’s a single external indicator. It doesn’t tell you whether that’s a big deal for you, the owner-operator, or on this particular asset, or in this particular facility. And so what we’re seeing is that people need more and more context, and this is especially true on a couple of fronts. Number one, we don’t have enough staff to address all the vulnerabilities. I mean, we turned on our vulnerability mapping at a midstream gas company the other day, and their flagship site literally had 23,000 vulnerabilities. You’re never going to get that to zero, and you’re not even sure where you should start unless you have some sort of context. So a singular view isn’t good enough, and it’s something we’ve been struggling with for many years. I remember, about ten years ago, going into an air separation unit, and the plant manager said: you’re going to do an assessment, and you’re going to tell me that we don’t change passwords and we don’t patch. How the heck does that help me?


Rick Kaun:
And so, on the one hand there is the need to take those scarce resources and focus them on the things we need to, i.e., the lack of staff and the lack of time; there is the need to look at what truly is effective for OT, which isn’t everything, because everything can’t be Windows 11 and patched every Tuesday; but there is also the nature of OT, in that if we can’t do plan A, i.e., the patching, we need plans B and C and D, or compensating controls, or what some interestingly creative marketing people have started to call virtual patching. If we can’t patch for BlueKeep, can we at least disable remote desktop or the guest account? Well, I don’t know the answer to that unless I know what that system does and what function it provides to operations. So the gist of this is that any singular indicator of risk, an external penetration, you know, a denial-of-service attempt, a list of vulnerabilities, or a patching tool, in and of itself is not enough to give our audience the detail they need to act appropriately, effectively, and efficiently.

Nate Nelson:
Andrew, the way he’s talking there reminds me vaguely of an episode we did some time ago, with a guest who was talking about site-specific vulnerability: that you shouldn’t categorize vulnerabilities in a broad sense, but rather look into specific sites in specific industries and evaluate how vulnerabilities affect those specific plants. It was a different alternative to the usual CVE way of doing things. Can you remind me what the name of that guest was?

Andrew Ginter:
Yeah, that was Thomas Schmidt from the BSI, which is the German Federal Office for Information Security, and he was talking about SSVC, Stakeholder-Specific Vulnerability Categorization. This was, six months ago, a new standard for making decisions about whether to apply patches for vulnerabilities. And Verve Industrial Protection is one of the vendors in this space: they have automation for SSVC, and Rick is going to say much more about that in just a moment.

Andrew Ginter:
So I mean, I agree with all that in principle. These are great principles. But if I have 30 sites, and each of them has 600 PLCs and who knows what other kinds of industrial internet devices (everything has a CPU nowadays), it’s one thing to say I can make risk-based decisions like that. It’s another thing to actually understand all of my tens of thousands of devices and how I should manage each of them. Is there a way to get insight, to get automation, instead of evaluating every one of my 50,000 assets myself?

Rick Kaun:
Yeah, great question, and you’re right. When I talk about the risk index, and an individual measure not being enough, I mean, you look at the NVD score: it’s got a ranking, and it’s got why it got to that ranking; there are components in there. But you’ve raised the other point: how do I take that and scale it? Both in volume, but also, as you know, there are nuances within those 30 or 60 sites and their PLCs; some of them do critical safety things, some do other things. And as soon as you start to unpack this a bit, you realize that you need a great level of detail from multiple perspectives. So let me just walk through how we help clients build that perspective, what we call a three-dimensional view of the asset. First and foremost is going to the source. It absolutely has to start with an inventory. Remember our last podcast, where we talked about building a proper inventory? It matters now more than ever; it’s the most important thing, and we’re seeing that people are starting to realize it. You need to go directly to the asset. The first source of data in building this bigger-picture view, to provide context beyond a single measure, is to pull in multiple measures of the asset. And so the first source is the asset itself: can we figure out what it is? Windows? Linux? Unix? Networking gear? PLCs, relays, controllers?

Rick Kaun:
The manufacturer, whatever we can glean from that endpoint: the more data we can get from it, the better equipped we are to pivot to different scenarios and situations in ascertaining and actually measuring risk for our organization. That’s the first bit of data. The second bit comes once you’ve built that information. And by the way, we advocate for a direct approach, going right onto the OS and talking directly to the PLC, not going through intermediary databases or relying on what I’m hearing through traffic. I want to go to the source; I want to ask, and get answers to, the questions I need to know, not hope that I hear them, or hope that I’m listening to all of the devices. So we get all that data and automate it into a central database, and that central database is where we gather, collect, and aggregate all the other data sources available to these operators. The very first one is really quite simple: take that database, once we’ve built it, and work with operations to put operational-impact or user-defined tags on these assets. So a Triconex at a refinery is clearly a safety system. That’s a high-impact asset; it gets a high-impact label. Then, when we go to look at risk, whether that’s assessing if a risk exists and is important to us, or whether we want to turn to remediation tasks like patching or compensating controls, we can decide which ones to test on, not the safety ones but the redundant, not-so-impactful ones, and which ones we need to prioritize in terms of budgets and resources, and so on.

Rick Kaun:
So we start with the inventory, automatically from the endpoint. We then add user-defined tags and information about the asset. We can calculate or infer these from the vendor name, or from the subnet they’re in, or whatever, but we usually like to make sure we loop the plant people in; that tribal knowledge the veterans have on the plant floor is invaluable, very powerful. So that’s two different data sources, starting to build multiple dimensions on the asset. The third thing we’ll do is go get third-party indicators, and this is where we bring in the National Vulnerability Database. We’ll bring the list of vulnerabilities in, and we can map them one-to-one at the database, offline from the actual operational environment. So we’re very frequently and very thoroughly mapping vulnerabilities against that very detailed inventory. We can also bring in threat feeds and known exploits, so we’re going beyond “by the way, you’ve got a vulnerability”: there are exploits out there for it, right? That may increase or decrease your urgency. And then, of course, we bring in the first line of defense: the patches that may be available for those devices. So: the endpoint data first; the tribal knowledge around the importance, impact, or role of those devices second; third-party indicators of risk in patch, threat-feed, and vulnerability data third. And the fourth thing we’ll do, in that middle-layer database where we’re compiling all these different dimensions of the asset, is go east and west and connect to the basic building blocks of security tools, like the status of your backup or your antivirus. In multi-vendor environments, that means abstracting away whether we’re connecting to Symantec versus McAfee versus, you know, Defender, to see whether your antivirus is up to date or your whitelisting is in lockdown. Aggregating that into a central view is where the real magic comes in.

Now, how we do this, to your point, in the scenario of 30 sites: we cascade every site’s database up to a single reporting dashboard. The reporting dashboard is a singular, read-only view into the entire fleet. So we have all these different data sources, consistently from all of our facilities and across all of our assets. We’ve got multiple dimensions of potential risk, and not just the risk but each asset’s impact and the compensating controls that may mitigate risk or narrow down the concern, and we can see all of this, fleet-wide, with a small, specialized team. This allows an organization to build a standard response, a standard threshold if you will, for the various risks and activities they need to act on, and they can then start to remediate with this same technology, in a way that’s OT-safe, and in stages. But let’s circle back on that one. This is how we want to build the context in the first place: aggregate it across multiple sites, give all the power to a very specialized team, and let them start to see what they’re looking at.
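Rick’s third data source, mapping NVD vulnerabilities against the inventory, can be sketched as a simple join. The records, field names, and CVE identifier below are all made up for illustration; real matching against NVD data is much messier (CPE strings, version ranges, vendor aliases):

```python
# Toy inventory and vulnerability feed. Real data would come from the
# endpoints and the National Vulnerability Database respectively.
inventory = [
    {"asset": "HMI-01", "product": "AcmeView", "version": "2.1", "impact": "high"},
    {"asset": "HIST-01", "product": "AcmeHistorian", "version": "5.0", "impact": "low"},
]
feed = [
    # Placeholder CVE ID, not a real entry.
    {"cve": "CVE-0000-0001", "product": "AcmeView", "version": "2.1", "severity": "critical"},
]

def map_vulns(inventory, feed):
    """Attach the CVE IDs of matching feed entries to each asset."""
    result = {}
    for asset in inventory:
        result[asset["asset"]] = [
            v["cve"] for v in feed
            if v["product"] == asset["product"] and v["version"] == asset["version"]
        ]
    return result

matches = map_vulns(inventory, feed)
```

The point of doing this offline, at the database, is exactly what Rick describes: the matching never touches the operational environment.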

Andrew Ginter:
All right, so that’s a lot of data. It’s useful to have that database; it’s useful to capture tribal knowledge. But still, if we have thousands of assets, and even more data about each asset in a database, that’s just more information for my poor brain to try to take in and make informed risk decisions with. Is there a way to work with the data once you have it?

Rick Kaun:
Yeah, no, that’s an absolutely spot-on observation. I mean, we talked about 23,000 vulnerabilities, and now you’ve added three other dimensions to those 23,000. So what do we do with it? Great question. You can actually do multiple things with it, but the most powerful is to calculate what we call a residual risk score. We do this with clients, and it’s automatically done in the software and reported on the dashboard, but the premise behind it is that we build indicators of risk and assign scores to things we don’t like. And, just like my golf game, it gets really big and ugly really quick. So, for example, if a device is considered high-impact, it gets a score of 10; if it’s considered low-impact, it gets a 5. Tribal knowledge now has a score. If it has a critical vulnerability, it gets 7 points for every critical vulnerability, and it may have multiples. Same if the backup has failed. All these different data sources can be pulled into a calculation, with a score appended to each. And then there’s the true, what we call residual, risk: after we’ve accounted for compensating controls, say the backup is good and whitelisting is in lockdown, those scores come off. So we can build an actual risk score that is OT- or end-user-specific, and those scores can be put into thresholds which then drive governance around behavior. Something that’s considered a critical risk, maybe as an organization we decide it has to be dealt with within 24 hours, whereas something with a low risk can be dealt with within a week or a month.

And because it’s automatically updating the data and recalculating, you are always looking at a live score. So you can extract from the noise the stuff that is a five-alarm fire, the stuff you have to deal with.
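Rick’s scoring can be turned into a worked example. The impact scores (10 and 5) and the 7 points per critical vulnerability come from his description; the size of the compensating-control deductions and the governance thresholds below are invented for illustration, not Verve’s actual formula:

```python
def residual_risk(asset):
    """Residual risk in the spirit of Rick's example: 10 for high impact,
    5 for low, 7 per critical vulnerability, minus credit for healthy
    compensating controls. Deduction sizes here are assumptions."""
    score = 10 if asset["impact"] == "high" else 5
    score += 7 * asset["critical_vulns"]
    if asset.get("backup_ok"):
        score -= 3  # assumed deduction, not from the episode
    if asset.get("whitelist_lockdown"):
        score -= 3  # assumed deduction, not from the episode
    return max(score, 0)

def response_window(score):
    """Map a score to a hypothetical governance threshold."""
    if score >= 20:
        return "24 hours"
    return "1 week" if score >= 10 else "1 month"

# A high-impact HMI with three critical vulns and a good backup:
hmi = {"impact": "high", "critical_vulns": 3,
       "backup_ok": True, "whitelist_lockdown": False}
score = residual_risk(hmi)       # 10 + 21 - 3 = 28
window = response_window(score)  # falls in the "24 hours" band
```

Because the inputs are refreshed automatically, rerunning this over the fleet gives the live, always-current score Rick describes.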

Nate Nelson:
Andrew, does everybody have their own scoring system? I mean, we’ve been talking about SSVC; there’s CVE; now there’s this. I wonder whether all these different systems get confusing after a certain point.

Andrew Ginter:
I think the short answer is yes. The long answer is that, in a sense, this is why, to my understanding, SSVC was invented. The vendors in the space did all have their own systems, and the users, the government especially, said: here are some minimum criteria you should be using for these risk decisions. You may be able to get finer-grained decisions by doing more analysis of the risk in context, but here’s a minimum. And so they put SSVC together. And why wouldn’t the vendors just do SSVC and be done with it? I’ve worked for vendors most of my career, and one of the ways vendors distinguish themselves from each other is the feature set in their product. So every vendor out there wants to say: yes, of course we do SSVC, and we give you even more power, because we do X, Y, and Z in addition. So it’s not surprising that everyone does it differently. This is part of why SSVC exists.

Andrew Ginter:
So, just to clarify: these characteristics, you know, did the backup work, what’s the impact, are these in a sense pre-programmed things that users then populate, or are they user-defined? Can the users define the characteristics? And if the characteristics are user-defined, I would assume the calculation has to be user-defined too. How, what’s the right word, “customized” is this for every site?

Rick Kaun:
Yeah, okay, fair. The calculation itself is something we ship with a basic structure; if an organization wants to change how it’s calculated, or emphasize or de-emphasize certain indicators one way or the other, they can. The precursor to that question, though, was the user-defined properties. Those are usually set once, i.e., it’s a high-impact asset to operations, or it’s a not-inconsequential asset to operations, and they don’t typically change. What does change, and therefore what we automate, is the presence of vulnerabilities, or the version of the software, or the level of patching. So where we can automate gathering technical data, we do; where we need user-defined data, we usually establish that upfront, and it doesn’t typically change. I mean, if it’s a Triconex, it’s a Triconex; it’s a safety system, and it doesn’t suddenly stop being one. And so we have a combination: set up the automation, set up the tags, and then how you want to calculate is absolutely customizable, as are the labels upfront. If we think it’s a high-impact asset and the client disagrees, it’s their database, it’s their threshold; they can customize it.

Andrew Ginter:
So that makes sense. You’ve got automation, you’ve got a calculation, you can see, in your words, where the five-alarm fires are. What do you do about those fires? Are we talking patch everything? Are we talking throw some more firewall rules in there? And if you make those changes, what change do you see in your system? Can you give us an example of how this works?

Rick Kaun:
Yeah, absolutely, and that’s the natural follow-on. As you said in the intro, you can’t always patch; you need to be able to be creative. So this is where we double down on the value of that multidimensional view. My risk score is automatically calculated, I’ve got a bunch of heavy hitters that look high-risk, and I need to act on them. So now I go into those individual assets and look at the components that have added to the problem. It’s a vulnerability, there’s a patch available for it, and it hasn’t been applied. Well, that’s your first path: we want to look at that patch. Can we deploy it? Again, with our architecture, we can take a very safe, staged OT approach. In the first pass of patching, we will patch, you know, redundant systems, domain controllers, file servers, and we’ll bake in a certain version, like 2012, make sure it works, and then potentially pass that on to the HMIs and engineering stations. So we can do a very staged, methodical OT approach. If we can’t do the patch, though, we can look at our systems and ask: well, what is the risk of this? One example we had with a client: BlueKeep came out, and they couldn’t patch for BlueKeep on some of their systems, but they didn’t want to just live with the risk. So the next thing they asked was: well, what’s wrong with BlueKeep? It mostly attacks, you know, the guest account and remote desktop. Well, their 24/7-staffed HMIs probably didn’t need remote desktop enabled, and they certainly didn’t need the guest account. So then we used the technology to start to disable remote desktop and the guest account, again in a very staged approach.


Now, if you build it properly, in the 30-site scenario you shared with us, you can do this from a central team that queues it up, tees it up for the endpoint, and involves the people at the plants so they can go ahead and be part of it. So it’s not a big-brother push, but it creates real precision in deciding what to do and where to do it, and in going beyond patching to compensating controls.

And there are two things in there we may want to dig deeper into, Andrew; let me know which way you want to go. There are more examples of things you can do beyond patching, but there are also the efficiencies gained by the way this is architected.
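Rick’s staged, OT-safe rollout (redundant systems, domain controllers, and file servers first, then HMIs and engineering stations) could be sketched as a simple wave planner. The role names and wave assignments below are illustrative, and holding safety systems out for a separately planned window is an assumption, not something from the episode:

```python
# Hypothetical staging policy: low-consequence, redundant machines in
# wave 1, operator-facing machines in wave 2, safety systems held for a
# manually planned window (None).
WAVES = {"file_server": 1, "domain_controller": 1, "redundant_hmi": 1,
         "hmi": 2, "engineering_station": 2, "safety": None}

def plan_waves(assets):
    """Group assets into ordered patch waves; None means 'plan by hand'."""
    plan = {}
    for asset in assets:
        wave = WAVES.get(asset["role"], 2)  # unknown roles default to wave 2
        plan.setdefault(wave, []).append(asset["name"])
    return plan

plan = plan_waves([
    {"name": "DC-01", "role": "domain_controller"},
    {"name": "HMI-03", "role": "hmi"},
    {"name": "SIS-01", "role": "safety"},
])
```

Wave 1 is where you “bake in” the patch and make sure it works before anything operator-facing is touched.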

Nate Nelson:
Andrew, perhaps this is my naivete, but coming from more of an IT background, I’m not quite sure why a lot of the nuance in this discussion factors into patching at all. I’ve always thought of patching as something you should do as soon as possible, whenever the option is available. But here we’re talking about criticality and severity, and all of these factors seem to suggest that you wouldn’t patch in every given scenario. So why is that?

Andrew Ginter:
I think it’s easiest to answer with some examples.
On your average IT network, and let's leave the specialized IT networks out of it, the banking systems, even the SAP servers that are handling payroll, let's leave those critical systems out.
You're just looking at your desktop network. You've got a bunch of desktop machines.
They are reaching out to the Internet every 5 minutes to fetch email, to send email, to browse the Internet, to download stuff.
They're really very exposed.
And in a sense, they're all the same.
They all have the same level of exposure.
So you use the severity: if it's a severity 9, you patch it automatically.
There's not that much decision making going on, on that kind of IT network, the kind everyone's most familiar with.
Now at the other end of the spectrum, safety systems are the ones that prevent unsafe conditions from killing people:
over-pressure conditions on boilers from blowing up and killing people, over-temperature conditions from causing explosions or fires.
So let's say there's a vulnerability that lets somebody crash the machine.
You send the "magic" message to the machine, the machine crashes and reboots, and it's down for 30 seconds or 5 minutes, depending on how long it takes to reboot.
How important is that? On the main part of the control system, it's fairly important: you're probably going to take the plant down.
It's a reliability threat.
How important is that on your safety system?
Well, if your safety system is down for 5 minutes, human life is at risk for those 5 minutes, and so the safety system is more critical than the rest of the plant systems.
Arguably, a crash-the-machine vulnerability is not as bad as an execute-arbitrary-code-and-do-nasty-stuff vulnerability.
But safety systems are that critical: you should be patching even these lower-priority vulnerabilities on them.
On the other hand, if your safety systems are air gapped, and many of them are, so that you cannot send them a message from any other machine, well then that vulnerability really isn't exposed on them, is it?
And so you might say: yes, normally I would patch it, but these safety systems are air gapped, versus those over there that are exposed, so I don't need to patch these ones as urgently as I do those ones over there that are on the network…

Andrew Ginter:
…So this is an example of where context drives the decision: a given vulnerability that you might patch in some circumstances and not in others.

Andrew Ginter:
Well let’s touch on both of them.

But I guess my question is: let's say I put a firewall rule in, or I install whitelisting, or I take some other measure.

What do I see in the system? Does the system notice that I've done this and recalculate? It's great that the system tells me, "Hey look, I'm in trouble," but does it come back and say, "You're not in trouble anymore, good job"? How does that work?

Rick Kaun:
Sure, it's a great question, and you're right, that's the basic premise, although in that scenario I overlooked some of the simpler approaches. Potentially the risk score says that I have a high risk, and when I dig in, there's a patch needed. Maybe I can do a compensating control. Or maybe the risk, or part of the risk score, is because whitelisting isn't currently in lockdown or isn't installed on that device, or the backup failed last night, or there's a port or service that we really just need to disable and can't do anything else with. So to your point, let's install a firewall or an inline network access control type of thing, or we can rerun the backup, or we can install and enable whitelisting. And to your question: because of the nature of the way we collect the data, we're continually connecting to the endpoints, we're continually remapping vulnerabilities, and we're continually connecting to antivirus and backup databases. So you are correct: once I make those changes, rerun the backup, install the whitelisting, configure the registry, that will then show up in my dashboard and I will see a reduction in risk. I will see fewer devices in that high or critical category. I will see a trend over time: I had a whole bunch of 5-alarm fires, and then I'm down to a couple, or none, and you can actually see the improvement. That's exactly what the dashboard is for: to give you that near real-time view into the current status, which includes improvements as you do these things and recalculate, but also includes obstacles or challenges as new vulnerabilities or threats are injected. So it's a continually moving and evolving line. It's just like looking at a trend line in an operational facility when you're making power or oil: you're doing better, and you see the productivity go up and the tank fill up, and then something goes wrong.
You see it drop down. It's the exact same sort of scenario.
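The multi-indicator risk score described here can be illustrated with a toy calculation. The indicator names and weights below are invented for the example; real products would use far richer models:

```python
# Toy multi-indicator risk score: a vulnerability alone is not the
# whole picture; configuration and operational state (whitelisting,
# backups, exposed services) move the score too. Weights are invented.

WEIGHTS = {
    "unpatched_vuln": 40,
    "whitelisting_off": 25,
    "backup_failed": 20,
    "risky_service_enabled": 15,
}

def risk_score(indicators):
    """Sum the weights of all indicators currently true for the asset."""
    return sum(WEIGHTS[name] for name, active in indicators.items() if active)

hmi = {"unpatched_vuln": True, "whitelisting_off": True,
       "backup_failed": True, "risky_service_enabled": True}
print(risk_score(hmi))            # every indicator firing

hmi["backup_failed"] = False      # re-run the backup
hmi["whitelisting_off"] = False   # install and lock down whitelisting
print(risk_score(hmi))            # score drops as controls land
```

The point of the sketch is the one Rick makes: as controls land, the automatically gathered indicators flip, the score is recalculated, and the dashboard trend moves down without anyone editing it by hand.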

Andrew Ginter:
That makes sense. I was thinking about an example you gave a little while ago, the remote desktop example, and a vulnerability in, let's say, remote desktop. I'm wondering: is your system clever enough to say, hey, the vulnerability is in remote desktop, and look, they disabled remote desktop on these devices, and therefore automatically reduce the risk? Is that how it works?

Rick Kaun:
Yeah, to an extent. In the scenario you just gave, there are a couple of different indicators that would reduce the risk score. Those which are automatically gathered, like the whitelisting status now being in lockdown, or the backup now being a success as opposed to a fail, are automatically gathered and automatically updated, no problem. When you get to the more creative OT needs around compensating controls, like disabling remote desktop, then once we had done that, we would confirm it was complete on however many assets we did it for, and then we would set a flag in the software to not show those systems as still being vulnerable to that particular vulnerability. Because the first pass of mapping to vulnerabilities, asking "is there a patch applied?", would say no, but we know we've applied compensating controls. So when I show the resulting dashboard to an OT practitioner, they see their true risk, not, again, a false positive of "here are all these vulnerabilities," because we know we've got those extra controls. And again, that's just another reinforcement for why you'd want multiple indicators: not just the vulnerability, but the vulnerability plus how the system is configured. Well, now we've got a different answer. We've got the true answer.
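The compensating-control flag just described can be sketched as a simple filter over the first-pass vulnerability mapping. The flagging mechanism here is illustrative, not any product's real data model; the CVE identifiers are real (CVE-2019-0708 is BlueKeep, CVE-2020-1472 is Zerologon), used only as example entries:

```python
# When a compensating control (e.g. Remote Desktop disabled) neutralizes
# a vulnerability, a flag keeps the dashboard from re-reporting it as a
# false positive. The data model here is invented for illustration.

def effective_vulns(mapped_vulns, compensated):
    """Vulnerabilities still live on the asset after compensating controls."""
    return [v for v in mapped_vulns if v not in compensated]

mapped = ["CVE-2019-0708", "CVE-2020-1472"]   # first pass: patch status only
flags = {"CVE-2019-0708"}                     # RDP disabled on this asset
print(effective_vulns(mapped, flags))
```

The first-pass mapping keeps saying "unpatched", but the flagged entry is suppressed, which is the "true risk, not a false positive" view described above.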

Andrew Ginter:
And I wanted to come back: you said there was another topic, I think it was scaling. How do you use these numbers?

Rick Kaun:
Yeah, this is the really powerful part, right? The architecture that we install, and we touched on it a little bit earlier, means that you can bring multiple facilities and all of your assets into a single view, which means you can have one small specialized team look at the fleet on behalf of the whole organization. You don't have individual sites having to dig in and be security experts and figure out what to do about this patch or that risk. And then with that technology you can start to queue up activities. I want to circle back to this especially, because OT always gets really scared when you think some central IT guy is going to do something. I get it. Like I said, I've been doing this for 20 years. I've seen a wrench in a firewall in the field, through the physical hardware, through the chassis. So let me just walk you through a quick case study. We have a generation client that wanted to update a particular piece of software. In fact, they wanted to uninstall the software. They were afraid of its origins and the risks associated with it, and I'm not going to name anybody, but we all know there have been a number of companies that have fallen foul of many people's favor. So there they are on a Saturday afternoon. They've got six coal-fired generation sites, very complex environments, and they need to remove this piece of software, and they're not sure how they're going to do it. There's a deadline! So what they do is go to the dashboard in the software, and they can see there are 146 copies of the software spread across these 6 sites. The central team is able to initiate a request to uninstall the software, and I say request because...

Rick Kaun:
While I said earlier that we can make things like registry changes automatically, we don't always want to be messing in the OT space; we want to be very respectful, and usually uninstalling software requires a reboot. So we send a request to 146 systems to uninstall the software. We send a detailed list to each site of exactly which physical room and rack the devices are in, and when they get to the console they will see a flashing light asking, would you like to accept this action? They are then able to phone the operator in the control room. They are able to follow their own local change process. They're able to uninstall the software, bring the system back online, and move to the next one. Anyone listening to this knows that this sort of activity, removing around 150 copies of software from 6 locations, would probably take weeks or even a couple of months of elapsed time, not expended time, just hunting and searching and finding the time. This activity was completed with one support person at the center and one individual at each of the 6 sites, in and around their day jobs. It was completed in 90 minutes, and because of the update mechanism we could see it in the software dashboard, updating as they went around the fleet. We have a client with 700 facilities and a team of 8 managing it worldwide, around the clock. So this, to me, is the future of where OT security is going, and how Verve is helping its clients get there: there are not enough people, and we need to start doing things, so if we can combine context, the ability to act, and OT-safe methods…

We start to really move the needle.
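The request/acknowledge pattern in that case study, where the central team queues work but nothing runs until someone at the site accepts it through their local change process, can be sketched as a tiny state machine. Class, state, and host names are invented for illustration:

```python
# Sketch of centrally queued, locally accepted OT actions: the center
# queues an uninstall per endpoint, but each one runs only after a
# person on site accepts it. Names and states are illustrative.

from collections import deque

class UninstallRequest:
    def __init__(self, host):
        self.host = host
        self.state = "queued"     # queued -> accepted -> done

    def accept(self):             # operator at the console clicks "accept"
        self.state = "accepted"

    def complete(self):           # software removed, endpoint back online
        self.state = "done"

queue = deque(UninstallRequest(h) for h in ["GEN1-HMI", "GEN2-HMI", "GEN3-EWS"])
done = []
while queue:
    req = queue.popleft()
    req.accept()                  # local change process approves
    req.complete()
    done.append(req.host)
print(done)
```

The dashboard view Rick mentions would simply be a live count of requests in each state, updating as sites work through their queues.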

Andrew Ginter:
So that's powerful. Automation is good, automation on the security side. I've been thinking about this interview, though, and I know there are standards bodies and governments out there, and there's a lot of debate going on about what the right metrics are. How do you measure risk in the OT world? I see people out there saying, well, be careful how you measure risk: if your risk just keeps going up, your board is going to be asking why they're spending money on security if risk keeps going up. So people say, use measures like how many of the top 20 controls I have implemented; when you get to 100%, you're done, and again the question becomes, why am I spending money on this? The whole question of how you measure risk seems hard. Yet you have a dashboard; you are measuring risk. Can you talk about metrics and the value of metrics, and how your stakeholders, your customers, respond to being able to see a number that moves around in a predictable way, reflecting risk?

Rick Kaun:
Yeah, and that's one of the things we haven't talked about yet: the value of this insight. It goes back to the example of the plant manager who told me, "You're here to do an assessment, right? I already know I don't patch and we don't change passwords. How does that help me?" The answer is that once you have this data and you have the context, you can have empirical discussions about trends. And you're right, the risk doesn't always go up; the risk does go down. You can measure risk with what we primarily focus on today, which is pure risk in terms of vulnerabilities on assets and the impact of each asset. But once you go up a level, you can start to say: I need to act on this behavior at this facility, but this flagship facility over here, which either makes us more money or is potentially more catastrophic if it goes wrong, gets a different level of scrutiny and of support or funding. And it's all empirically driven now. You mentioned some of the regulatory standards. Outside of the pure risk score, we are getting more and more calls and more and more engagements to help clients track and monitor compliance. Are we doing enough? Are we showing the auditors, the board, and senior management that we are using the money wisely and making trends go in the right direction? For things like the CIS controls, we have a dashboard that shows: here's your hardware inventory, here's your software, here's your configuration, here are your users. And you can look at it at a glance and show the CIO, "Look, we're doing the right things," or, "Look, we're going the wrong way. We need more help."

Rick Kaun:
And the most recent one, which we're quite proud of: the federal government announced CDM, Continuous Diagnostics and Mitigation, under which all federal entities shall be reporting. We've actually been approved as the OT response for that. So we're not only doing this internally within an organization, or with an organization toward helping them comply, but now we're also starting to share with external and federal entities that consume this data. It's really starting to catch on.

Andrew Ginter:
So Nate, let me let me dive into this just a little bit.
Measuring and visualizing risk is a big debate in the industry.
I gave some examples in my question, but I don't know if my question made sense.
Now that I listen to it again, Rick obviously got it, but let me go through it once more.
Let's say, as some people out there argue, that risk should reflect what's happening in the world.
Well, there are always new vulnerabilities.
Thousands and thousands of vulnerabilities are reported every year; if you count them, your risk indicator keeps going up, your chart keeps going up.
Qualitatively, we get word that the sophistication of the adversary is going up.
They’re using more and more powerful tools.
They’re using more and more powerful techniques.
Again, the risk line keeps going up. If you show the board a risk line that goes up steadily,
they ask the question: why are we spending money on you?
You don't seem to be having any effect.
So you can’t show them that risk line.
I've had people tell me: what you should do is use something like the top 20 controls, and say my goal is to get the top 20 controls implemented on every machine in my network.
So I see a mitigation line going up, saying I'm getting better and better.
Eventually I'm going to hit my 20.
And I flatline. And again the board goes: the strength of your security posture seems to have flatlined.
Why are we spending money on this? Have you not solved this problem?
Can we start spending less money on this?
And of course, you can't.
That's the wrong measure as well.
But consider what Rick has described here, their dashboard.
Imagine what the dashboard looks like.
New vulnerabilities are discovered in the world, and we do the risk calculation.
And some of them are relevant to our most critical systems and our calculation of risk goes up.
You see the trend line going up.
And the OT security team springs into action.
It presses the button, figures out which machines most urgently need to be patched, patches those machines or applies compensating measures to protect them, and the risk line goes down, saying: good job, you've reduced your risk. And then of course there are more vulnerabilities over time.
The risk is going up and down.
You can see progress.
You can see that the money you're spending matters: in the absence of spending money, the risk would go up unboundedly, and you're spending the money.
And so it keeps coming down.
You can see, you know, that something good is happening.
This is a step up from waving your hands at the board.
It’s still a qualitative metric.
I mean, in the engineering space, you might be used to calculating something like safety risk mathematically:
you've got a one-in-a-million chance of a death at the facility in the next year.
You don't have that kind of mathematical precision here, but at least you've got something, it's visual, and in a real sense it makes sense.
So I think this is a step in the right direction.

Andrew Ginter:
Cool. I mean, that is cool. Risk is a dry topic. I speak at conferences; I've proposed many presentations with "risk" in the title. Most of them don't get accepted, and if one does get accepted, nobody shows up.

Rick Kaun:
Yep.

Andrew Ginter:
Because people see it as academic, as not actionable. But metrics people are interested in, and concrete advice as to where the highest-value next investment is: that's huge. So thank you for joining us. To me these are very positive developments in the field and the technology. Before we let you go, did you want to sum up for us? Are there words of wisdom you have for us?

Rick Kaun:
Things that I think are important, whether they're wise or not, let the audience judge. I've been in this business, as I mentioned at the top of the call, for 22 years, and I've seen a lot of attempts at the silver bullet, whether it was whitelisting early on or its other promises since then, and even some notable public speakers saying, well, we can't patch, why bother, let's just give up and hunker down. But the reality is that this can be done. We just need to be prepared to roll our sleeves up and get at it. Why we've been avoiding addressing the technical security debt we continue to amass, while adding more technologies and IoT, and then being surprised at how much risk there is, is a bit baffling to me. You've been on the circuit as long as or longer than I have, Andrew. You know that some of the things we say we've been saying for 20 years, and people nod their heads and write it down as sage advice, right? But I think we're seeing a turn: people are embracing the fact that, yeah, I really do need to get into it, I really do need the data, because I need to make informed decisions. There are only so many people, so many dollars, and there's so much risk. I need to be creative, I need to be precise, and I need to be effective. So I'm quite excited at what we're seeing. And I would heartily recommend that you dig into how you actually get to those endpoints, because that's where the data resides, it's where the risk resides, and it's where the solution ultimately resides as well.

Rick Kaun:
And if you do want to dig deeper, we have case studies, user testimonials, and public presentations of how some of these things work, from the end users, not from us, on our resource page. We just recently published a couple of new white papers on exactly these types of topics, along with use cases from actual frontline industry peers of yours, speaking to the audience here, who are doing this and realizing the benefits. We have some who are accelerating 5-year programs down into two and a half to 3 years, and I think there are some real valuable insights from frontline people, not talking heads like me and Andrew. You really should dig into that, and you can also sign up for a webinar while you're there, because we do those every month as well. So I hope this helps. Please do dig into the educational content we have up there, and if anybody's curious, please feel free to reach out.

Nate Nelson:
Andrew, that was your interview with Rick Kaun.
Do you have any final thoughts to take us out with today?

Andrew Ginter:
I guess so. I was really encouraged by the episode.
I mean, this is automation.
This is automation for security.
It's a truism that our enemies' attacks are getting more sophisticated because they're automating their attacks, using more and more sophisticated tooling.
Here is automation for our defenses.
This kind of automation can take a lot of time off the analysis part of our defenses, and off the implementation part.
Automating the defenses sounds extremely useful.
This sounds like a very effective kind of automation.
So I see this as good news, as a very positive development.

Nate Nelson:
OK.
Well then, thanks to Rick for enlightening us today, and thank you, Andrew, for speaking with me.

Andrew Ginter
It’s always a pleasure.
Thank you, Nate.

Nate Nelson:
This has been the Industrial Security Podcast from Waterfall.
Thanks to everybody out there listening.

The post Risk in Context: When to Patch, When to Let It Ride | Episode 109 appeared first on Waterfall Security Solutions.

]]>
Hacking the CANbus | Episode 108 https://waterfall-security.com/ot-insights-center/ot-cybersecurity-insights-center/hacking-the-canbus-episode-108/ Mon, 26 Jun 2023 00:00:00 +0000 https://waterfall-security.com/ot-insights-center/uncategorized/hacking-the-canbus-episode-108/ The post Hacking the CANbus | Episode 108 appeared first on Waterfall Security Solutions.

]]>
In this episode, Dr. Ken Tindell, the CTO at Canis, joins us to talk about cybersecurity and cars. Modern cars have multiple computer chips in them, and practically all use the CANbus standard to connect everything to those microchips. Ken explains and discusses the vulnerabilities and exploits that car thieves have applied by hacking the CANbus, as well as what can possibly be done to protect against such threats.

Disclaimer:

The actions depicted, and the information provided in this podcast and its transcript are for educational purposes only. It is crucial to note that engaging in any illegal activities, including hacking or unauthorized access to vehicles, is strictly prohibited and punishable by law. Waterfall Security Solutions do not endorse or encourage any illegal activities or misuse of the information provided herein.

It is your responsibility to abide by all applicable laws and regulations regarding vehicle security. Waterfall Security Solutions shall not be held liable for any direct or indirect damages or legal repercussions resulting from the misuse, misinterpretation, or implementation of the information provided herein.

Car owners are strongly advised to consult with authorized professionals, for accurate and up-to-date information regarding their vehicle’s security systems. Implementing security measures or modifications on vehicles should be done with proper authorization, consent, and in accordance with the manufacturer’s guidelines.

By accessing and listening to this podcast or reading this transcript, you acknowledge and agree to the terms of this disclaimer. If you do not agree with these terms, you may not listen to this podcast or read this transcript.

LISTEN NOW OR DOWNLOAD FOR LATER

https://www.youtube.com/watch?v=uR-tORcHqJA

About Dr. Ken Tindell

Dr. Ken Tindell - Canis Automotive

Dr. Ken Tindell is the CTO of Canis Automotive Labs and has been involved with CAN since the 1990s, giving him extensive experience in the automotive industry.

  • Co-founded LiveDevices, which was later acquired by Bosch.
  • Co-founded Volcano Communications Technologies, later acquired by Mentor Graphics.
  • PhD in real-time systems; he produced the first timing analysis for CAN and originated the concept of holistic scheduling to tackle the co-dependency between CPU and bus scheduling.
  • Worked with Volvo Cars on the CAN networking in the P2X platform and was one of the team that in 1999 won the Volvo Technology Award for in-car networking.

Today Dr. Tindell serves as CTO at Canis with a focus on improving CAN for both performance and security with the new CAN-HG protocol and upgrading CAN for today’s challenges. He’s also developing intrusion detection and prevention systems (IDPS) technology for CAN that uses CAN-HG to defeat various attacks on the CAN bus.


Hacking The CANbus

Transcript of this podcast episode:

Please note: This transcript was auto-generated and then edited by a person. In the case of any inconsistencies, please refer to the recording as the source.

Nathaniel Nelson
Welcome, everyone, to the Industrial Security Podcast. My name is Nate Nelson. I'm sitting with Andrew Ginter, the Vice President of Industrial Security at Waterfall Security Solutions, who's going to introduce the subject and guest of our show today. Andrew, how are you?

Andrew Ginter
I'm very well, thank you, Nate. Our guest today is Ken Tindell. He is the Chief Technology Officer at Canis Automotive Labs, and he's going to be talking about hacking the CANbus, the communication system that is used almost universally inside automobiles.

Nathaniel Nelson
All right then, without further ado, here's your conversation with Ken…

Andrew Ginter
Hello Ken, and welcome to the show. Before we start, can I ask you to say a few words about your background and about the good work that you're doing at Canis Automotive Labs?

Dr. Ken Tindell
Hi, yes. My name is Dr. Tindell, and I've been working in automotive since the mid 90s. I co-founded a company to do real-time embedded software, which ended up being sold to Bosch. Since then I've been working at Canis Automotive Labs, where we focus on security of the CANbus inside vehicles.

Andrew Ginter
And we're going to be talking about the CANbus. Can you say a few words: what is the CANbus, who uses it, and where do they use it?

Dr. Ken Tindell
So CANbus, I think it was created in the mid 80s, is a field bus for real-time distributed control systems. It was created by Bosch for the car industry, and today I don't think there's a single manufacturer that doesn't use a CANbus in the car. But it's not just cars. It's been used in all kinds of places: medical equipment, e-bikes, trucks, ships. There's even CANbus orbiting Mars right now. So it's a very ubiquitous protocol.

Andrew Ginter
Okay, and we're going to be talking about CANbus in automobiles. Before we dive into CANbus in automobiles and some of the issues with it, can you introduce the physical process? What does automation in a modern car look like? There must be a CPU or three involved. What's being automated? How is the wiring run? What's it like, automating an automobile?

Dr. Ken Tindell
Yeah, so that's a big question. Okay, so there are a lot of CPUs in cars. Basically there are things called electronic control units; they're the main boxes that control things, so ABS is one, engine management, stuff like that. And then there are lots and lots of other CPUs, little tiny processors that sit talking over very low-speed communication links to those ECUs. So probably most cars have got more than 10, 20, 30, even 100 CPUs. In terms of the main control units, you're looking at twenty, thirty, forty, maybe even 100 electronic control units in the car, and they're all connected together, usually over multiple CANbuses, because there are so many of these control systems, and they run pretty much everything. Each ECU will be connected to a bunch of sensors and a bunch of actuators, and you may take sensor readings from across the CANbus to implement some application that then controls local actuators. A good example of this is the door modules that have control of the wing mirrors. When you put your car into reverse, the transmission control system, which is handling the gearbox, sends a message on the CANbus saying what the gear is. The door modules pick up that message, see that you've gone into reverse, and can then alter the wing mirrors to point down to the back of the car to help you reverse.

Dr. Ken Tindell
So basically, it's a very, very big distributed hard real-time control system.
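To make the wing-mirror example concrete, here is a toy sketch of the broadcast pattern on a CAN-style bus: one ECU publishes a frame, every node sees it, and interested nodes react. The message ID (0x101) and the payload layout (byte 0 encodes the gear, 0xFF for reverse) are invented for illustration; real vehicles use manufacturer-specific IDs and encodings:

```python
# Toy CAN-style broadcast: the transmission ECU publishes a gear-status
# frame; every node on the bus receives it, and the door module reacts
# by tilting the mirror. IDs and payload layout are invented.

TRANSMISSION_MSG_ID = 0x101       # hypothetical ID for gear status frames

class DoorModule:
    def __init__(self):
        self.mirror_tilted = False

    def on_frame(self, can_id, data):
        if can_id == TRANSMISSION_MSG_ID:
            gear = data[0]        # byte 0: current gear, 0xFF = reverse
            self.mirror_tilted = (gear == 0xFF)

def broadcast(nodes, can_id, data):
    """On a bus, every node receives every frame."""
    for node in nodes:
        node.on_frame(can_id, data)

door = DoorModule()
broadcast([door], TRANSMISSION_MSG_ID, bytes([0xFF]))   # driver selects reverse
print(door.mirror_tilted)
```

This broadcast-and-react design is also why the theft device discussed later works at all: any node that can put frames on the bus can impersonate any sender, because classic CAN frames carry no authentication.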

Andrew Ginter
One of today's topics is a hack you found: somebody had hacked the CANbus. Can you take us into what you found, and say a few words about why the hack worked? How is a CANbus normally protected, and how did this attack get around those protections?

Dr. Ken Tindell
Okay, so this hack has been going on for several years. It turns out that somebody understood and reverse engineered, in the specific case we're looking at, Toyota vehicles, and made a box that, when you plug it into a specific CANbus, fires off messages and messes with the bus so that the engine management system thinks the immobilizer has been disabled by the key, even though there's no key anywhere near it. Then another set of messages will open the doors, and the door control system thinks, yep, the key has told me to open the doors, and opens them, again even though there's no key around. And then they just drive off with the car. So it's less of an attack, I suppose, in a security research sense, although it is one; it's theft. It's a device someone built by working out how to attack the CANbus, then packaging it up and selling it to thieves all over the world.

Car thieves hacking the CANbus to steal a car

Andrew Ginter
I mean, that's horrible. How did you come across this? Did you find one of these on the black market? How did you stumble across it?

Dr. Ken Tindell
So a friend of mine, Ian Tabor, is in fact a cyber security researcher for automotive, and his car was stolen. I thought at first it was a trophy hack by someone trying to make a point against the cyber security research community, but actually it was just a random theft, of a kind so frequent that eventually it was going to hit someone like him. So he did a lot of detective legwork to try to work out what they'd done, and eventually worked out that they'd broken into the car through the headlights and used a theft device. With some of his contacts he was able to find the specific theft device and who sold it, and then he bought one, very expensive too, and called me in to help reverse engineer the electronics and the software and the way it hacks the CANbus to steal the car.

Andrew Ginter
Wow. So this thing is participating in the CANbus, and you said it got in through a headlight. Is every part of the car on the CANbus? Why is there a CANbus running out to the headlight? Why is there not just power running out to the headlight?

Dr. Ken Tindell
Yeah, because headlights have not been simple on/off lights for probably 30 to 40 years. Headlights have got multiple light bulbs, they dip, and they have full beam. Lots of modern ones have motors that steer the headlights as you're going into a corner. Then there are diagnostics: if your headlight lamp has failed, the car knows this and can tell you, as the driver, that you're driving around with a broken headlight. And really modern headlights are actually LED-based, with a grid of LEDs, and they're sent commands from another unit in the car that has a camera looking out to see where oncoming vehicles and pedestrians might be; the beam is then altered by changing this matrix of LEDs so as not to dazzle oncoming motorists. So headlights today are not a lamp with a switch; they are extremely complicated systems. And because they are also drawing power, they're part of the power management of the car; you've got to be very careful how you use the battery. When you turn the engine over, an enormous drain is taken off the car battery, so one of the most common features on CAN is to say: I'm just about to turn the engine over, everyone reduce your power consumption as much as possible. They all go into low-power mode, the engine is cranked, and then they all wake up again. So yes, headlights are complicated things now, and that's why they're talking digitally to the rest of the car.

Dr. Ken Tindell
And fundamentally this applies across the whole car. Many functions are now talking to each other. I gave the example of the wing mirror and the transmission gearbox talking to each other, and this is why CAN came into being in the first place. In the old days, if you wanted to do that wing mirror function, you'd run a piece of copper wire from the transmission box to each door module so that the electronics in the door would move the wing mirrors, and there would be a wire for almost every signal.

Dr. Ken Tindell
In the early days of CAN I saw some charts from Volvo with their projections of the number of wires needed, given the growth in the functionality of the car, and they worked out that by the turn of the century their cars would be almost solid copper. Clearly something had to give: either you give up trying to add new functions to cars, or you find a different solution. So the CANbus came along as a way of grouping all of these wires together and replacing them with a digital wire. In fact, in the early days it was called multiplexing; the CANbus was a multiplex solution, and car makers had departments called multiplex departments. That's what CAN does: one wire is used to provide all of the information exchange that used to be done with separate wires. Instead of massive bundles of cables everywhere, which are not just heavy and expensive but are also the things that break (the ends fall off, the connectors break, the cables snap), so cars were going to become even less reliable as these functions grew. CAN was a way of reducing cost and increasing reliability, and that's why it goes everywhere across the vehicle, from every single place where there's a sensor to every single place where there's a motor or some kind of actuator.

Nathaniel Nelson
I see. Andrew, I follow that you can't have hundreds of thousands of wires running throughout the whole car until it becomes totally unwieldy, but it also sounds like we're making things very complicated by having so many CPUs. So what exactly is the thing that reduces the need for all those wires, that makes things less complex here?

Andrew Ginter
Well, I'm reading a little bit into what Ken said, but in my understanding of automation generally, his extreme example was this: if every signal that has to pass from any part of the car to any other part of the car is carried over a separate wire, and you've got a thousand sensors and actuators, you might have a thousand squared wires. That's the worst case. Perhaps a more realistic example would be: why can't we put just one computer in the car, in the engine compartment, run all of the thousand sensors and controls into that computer, and have that computer sense what's going on and send signals to the rest of the car, saying turn these lights on, activate that motor in the mirrors? I think the answer is that even if you did that, it would reduce the wiring, but not enough. Take the example of the headlight that Ken described. He said it's not a light bulb, it's LEDs. Maybe, and I'm making these numbers up, it's 75 LEDs, and you need to control them. You turn on different LEDs when you're cornering versus when you're not. You're not actually moving the light with a little motor; you're just turning on different LEDs in the bank of LEDs so that the light points in the direction you need it to point.

Andrew Ginter
But if you've got 75 LEDs, in the worst case you've got 75 wires, one running from each LED back to the computer, because the computer is controlling the power; it's sending power over those wires to the LEDs. You might be able to reduce that a little, because you might observe that among the hundred different configurations of the headlight there are only, say, 24 banks of LEDs: these four LEDs always come on at the same time, those three always come on together. You might reduce it to 24 wires carrying power, but that's still 24 wires. Now, if instead of carrying power from the central computer you stick a tiny little computer in the headlight, you need only two wires going into the headlight: one sending power, and the second, the CANbus wire, carrying messages to the computer in the headlight saying activate this bank, activate that bank. You've gone from 24 wires carrying power to one wire carrying power and a second wire carrying messages, and the computer in the headlight figures out for itself where to send power within the headlight.
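To make that concrete, here is a rough sketch, in Python, of the kind of message a body computer might put on the bus for a smart headlight. The CAN ID, the bank numbering, and the payload layout here are all invented for illustration; real headlight ECUs use proprietary message formats.

```python
# Hypothetical sketch: one CAN frame replaces dozens of power wires by
# telling the headlight's own small computer which LED banks to power.

HEADLIGHT_CAN_ID = 0x3B0  # made-up arbitration ID for the headlight controller

def headlight_frame(banks_on):
    """Build an 8-byte CAN payload whose first 3 bytes are a bitmask
    selecting which of 24 LED banks should be powered."""
    mask = 0
    for bank in banks_on:
        if not 0 <= bank < 24:
            raise ValueError("bank out of range")
        mask |= 1 << bank
    # little-endian 24-bit mask, padded to a full 8-byte CAN data field
    payload = mask.to_bytes(3, "little") + bytes(5)
    return HEADLIGHT_CAN_ID, payload

# e.g. a cornering pattern that lights banks 0, 3 and 17
can_id, data = headlight_frame([0, 3, 17])
```

One message like this, sent whenever the pattern changes, does the work that would otherwise need a separate power wire per bank.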

Andrew Ginter
Okay. So you folks investigated this. Can we talk about the solution? If the solution is not running more wires, and if the hack did not actually exploit a vulnerability, so there's nothing we can patch, how do you solve this?

Dr. Ken Tindell
That's a good question too. Since this story went crazy around the world, I've had a lot of people suggesting solutions, and of course they don't understand the car industry very well. Someone said: well, put a separate wire out to the headlights, and then a gateway box that will route the messages and won't allow non-headlight messages through. But the trouble is, even if you do that really well and get one of these little boxes added in, it costs money. It might cost as little as, say, $20, but if you're making a million cars a year, that's $20 million a year. Over the lifetime of the car model you could be spending a significant fraction of a billion dollars if you designed it that way. That's why they didn't do that kind of thing: it's just not cost effective. But the CANbus has to go everywhere. So the fundamental weakness is this: there's very strong security between your key and the smart key ECU, as they call it, to authenticate the key, so you can't spoof a key, which used to be a much more common attack. But the message from the smart key receiver saying "I validated the key, you can now deactivate the immobilizer" is unprotected and goes on the CANbus. If you want to address that, it's possible, I guess, to do some kind of special wiring in...

Dr. Ken Tindell
...some very special circumstances. But that's not a great solution, because it adds cost, and there are reliability problems every time you add a cable. Like I said, the ends of cables have to be crimped and put into connectors, and that's where they fall out and break. So it's not ideal. Fundamentally, the way to address this is through cryptography on the messages on the CANbus, at least the security-relevant ones. So instead of sending a plain message to the engine management system saying "deactivate the immobilizer," you send a cryptographically protected message with a key, not a driver's key but a cryptographic key, that's unique to every car and is programmed into the wireless key receiver, the engine management system, and the door controllers. Then when a message says the key has been validated, you know it can only have come from that ECU, and that it's not some criminal pushing fake messages in through the headlight.

Nathaniel Nelson
Andrew, what you just mentioned there reminds me of the debate over encrypting messages from PLCs, and why we maybe do or don't do that.

Andrew Ginter
Yeah. In heavy industry, there are people arguing about whether it makes sense to encrypt messages deep into control networks. The usual arguments against encryption are things like: strong, TLS-style encryption takes CPU power, and these CPUs are underpowered and can't do it; or, the CPUs are focused on real-time response, and if you distract them with crypto calculations you're going to impair real-time response. A second criticism is: hey, we need to diagnose problems on these networks, and if we can't see the messages because they're encrypted, we can't figure out what the messages are, so we can't diagnose the problems. For the record, the standard answer there is: don't encrypt the messages so that you can't read them, but do use what's called a cryptographic authentication code. A checksum asks, "Is the message intact? Did I lose any bits on the wire because of electromagnetic noise?" A cryptographic authentication code is like a cryptographic checksum: it's longer than a regular checksum, and it not only detects missing bits caused by noise, it also detects whether someone is trying to forge a message. So you can still see the content of the message for diagnostic purposes, but the authentication code is where the bit of crypto happens. There's still the question of whether the CPU is powerful enough to do modern crypto, but in my estimation the real problem with crypto in PLCs has to do with managing the keys, and...
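The difference between a checksum and a cryptographic authentication code can be sketched like this. This is an illustrative example only, not any specific PLC or automotive protocol, and the key is a stand-in for a per-device secret.

```python
import hmac
import hashlib
import zlib

# Stand-in for a unique per-device secret (never all zeros in practice).
SECRET_KEY = b"\x01" * 16

def frame_with_crc(payload: bytes) -> bytes:
    # A CRC detects accidental bit errors, but anyone can recompute it,
    # so it offers no protection against deliberate forgery.
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def frame_with_mac(payload: bytes) -> bytes:
    # An HMAC tag can only be produced by a holder of SECRET_KEY, so a
    # verifying receiver detects forgery as well as corruption. The
    # payload itself stays in the clear for diagnostics.
    tag = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()[:8]
    return payload + tag

def verify_mac(frame: bytes) -> bool:
    payload, tag = frame[:-8], frame[-8:]
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()[:8]
    return hmac.compare_digest(tag, expected)
```

An attacker who tampers with a MAC-protected frame, or fabricates one without the key, fails verification; with a plain CRC, the attacker simply recomputes the checksum over the forged payload.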

Andrew Ginter
That's actually my next question to Ken, so let's go back and keep listening.

Andrew Ginter
So that's easy to say, and it actually sounds a little bit manageable. I mean, keys can be a real problem. If you're a bank and you've got 12 million customers, how many keys have you got on your website?

You've got one really important key. That's it, because you're authenticating to the customers. In an industrial control system, if every programmable device has its own key, we're managing thousands of keys in, say, a power plant. It's a nightmare.

Here, it sounds like you've got one key per automobile, which sounds manageable, but you've got millions of automobiles driving the roads. If you have a problem with one of these electronic parts in an automobile, you've got to replace it.

You’ve got to sync up the keys. You know what does key management look like? How big a problem is this and how’s it been addressed?

Dr. Ken Tindell
I think that's actually always the problem you've got to fix. There's a saying that cryptography is a machine for turning any problem into a key management problem, and that's really true. Most of the microcontrollers used inside these ECUs have hardware security modules that do secure key storage and secure key programming. There's a master key, and you can program application keys in by proving that you know the master key. And then somewhere in the car maker's infrastructure is a database of all the keys. But you can start to see some of the problems there: who has access to that database? If someone coming in to clean the office can open the drawer and get out the USB stick where the keys are stored, that's obviously a terrible problem. So there's the secure machine room, and who has access to that. If all of the keys to all of the cars in the world leaked and got out, it would be a horrific problem, and you can see these kinds of problems already happening today. And then you've got the other problem, like you said, with spare parts. If you have a brand new spare part from the OEM, it comes through in a cardboard box to the workshop guys, and they've got to program that with the key...

Dr. Ken Tindell
...for the vehicle it's going to be put into. That means they have to have some kind of secure programming system that connects them to the infrastructure of the car manufacturer and to the vehicle, and then, typically over the CANbus, they'll be sending in key reprogramming commands. That's traditionally not how cars have been maintained, not with live connections back to the vehicle manufacturer's own systems. And if you're building a car that can be serviced by anybody, with spare parts put in when you're out in the desert somewhere, you haven't got a live internet connection back to anywhere. That's a big problem. These problems are quite hard to solve. So I think in the end the easy bit is what goes on inside the car for protecting these messages, and the really hard bit is managing those keys in a secure way that doesn't open up enormous risk of compromising all of the vehicles on the road.

Andrew Ginter
Okay. You've mentioned the issue with insiders at the manufacturer, and we've talked about the hardware in the car. What about technicians? That's another class of insider. In the past, I thought you really have to trust your mechanics. In the world of espionage, the mechanic is touching the vehicle, and if you can touch the vehicle, then to me you can do anything to it. You can plant a bomb in it, you can sabotage the brakes. You have to be able to trust your mechanic. Is that another threat vector here?

Dr. Ken Tindell
Yes, sort of. Obviously the mechanic can do all kinds of things: cut your brake cables or brake pipes or things like that. So there's a level of trust that's inherent. But one of the problems, certainly historically, has been that workshop tools are trusted to do things like create new cloned keys when a customer comes in and complains they've lost a key, or reflash the firmware in an ECU. And what we have seen in the past is a spate of crimes where somebody in the workshop has a criminal friend and lends them a laptop, and they go out on the street breaking into cars and cloning keys. So the car manufacturers have over time started to close that loophole. Now these tools have to authenticate themselves to the car manufacturer's own infrastructure, so your laptop will be pre-authorized for a certain number of accesses to a vehicle, and then...

Dr. Ken Tindell
...that authorization will expire, so if the physical laptop has been stolen, eventually it stops working. There's also, because of the way key management is done now, the ability to secure things end to end, from the car manufacturer's infrastructure right through to the tiny piece of silicon in the microcontroller in the ECU, with nothing in between able to snoop on that or fake messages through it. It's a very nicely designed piece of silicon hardware, and it was designed exactly that way so that you can take these workshop tools out of the loop, to a certain extent. If a laptop is stolen, it can be shut off from accessing the infrastructure database. So I think that attack surface, if you like, of the workshop has been, or is being, closed as these tools and this infrastructure are rolled out.

Andrew Ginter
Well, that's good news. But help me out here. These hardware security modules, I know them as trusted platform modules, TPMs, and I thought TPMs were only available in the high-end Intel and AMD and competing CPUs, not in something small enough to fit into a headlight controller. How universally available are they?

Dr. Ken Tindell
Okay, so the automotive industry calls them hardware security modules, HSMs, and it developed a standard for them called Secure Hardware Extensions, SHE. That's available on a lot of the microcontrollers used in automotive: NXP's automotive parts have them, Renesas's parts have them, Infineon's parts have them. Now, they're not available on the very lowest-end, cheapest parts that you might use in some very small application, but for most CPU-intensive ECUs these are available on chip. I'm not sure exactly how the TPM concept is structured, but the way the automotive HSM works is that it has secure key storage, so you can store keys such that the software in the microcontroller can't read them out, and it performs a set of operations on those keys: you can say, please make me an encrypted block; please verify that this authentication code is correct. It also handles things like secure boot, so you can store in there the expected authentication code, run all of the firmware in the system through the HSM, and make it so that no hacked firmware will run. You can only run authorized firmware that matches...

Dr. Ken Tindell
...the numbers that have been programmed into that HSM. It also supports this end-to-end key management, so there are several types of keys inside the hardware security module. There's a master key that should never normally be used for anything other than programming new keys in, so the application keys are all different from the master key, and the master key is used to authenticate messages that say "please change the application keys to this." Now, there is an issue when a microcontroller that doesn't have a hardware security module needs to participate in the protected communication, and so one of the things we have at Canis Labs is a software emulation of a hardware security module, a software HSM. You could use that in something where you don't care quite as much about the security, because the attack surface is going to be very limited. These hardware security modules are so secure that if you took the electronic control units out onto a bench top and put all kinds of debug gear around them, it would still be very, very difficult to extract the keys. Now, no thief by the roadside trying to plug into the headlights is ever going to dig out the ECUs and put them on a benchtop, so for this kind of CAN injection attack that we discovered, you probably don't even need a hardware security module; probably just cryptographically protecting the messages is enough...

...because there's no realistic way that they can break into the unit to defeat the cryptography.
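A toy model of the key-storage behavior Ken describes, with a master key that authorizes application-key updates and application keys that can be used but never read out, might look like the sketch below. This is only an illustration; real SHE/HSM hardware enforces these rules in silicon, and the class name and HMAC construction here are assumptions, not the SHE specification.

```python
import hmac
import hashlib

class ToyHSM:
    """Toy sketch of an automotive-HSM-style key store (illustrative only)."""

    def __init__(self, master_key: bytes):
        self.__master = master_key   # never readable by application code
        self.__app_keys = {}         # slot -> key, also never readable

    def program_app_key(self, slot: int, new_key: bytes, auth_tag: bytes):
        # Only a party that knows the master key can compute auth_tag,
        # so key updates are authenticated with the master key.
        expected = hmac.new(self.__master, bytes([slot]) + new_key,
                            hashlib.sha256).digest()
        if not hmac.compare_digest(auth_tag, expected):
            raise PermissionError("update not authenticated by master key")
        self.__app_keys[slot] = new_key

    def mac(self, slot: int, message: bytes) -> bytes:
        # Applications can *use* a key slot, but can never read the key out.
        return hmac.new(self.__app_keys[slot], message,
                        hashlib.sha256).digest()[:8]
```

The point of the design is visible in the interface: there is no method to read a key back, only methods to install keys (under master-key authority) and to perform operations with them.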

Andrew Ginter
And a clarification there. You've talked about taking the unit out and actually extracting the key.

In your estimation, how robust are these keys? Because the CPUs in the cell phones we're walking around with in our pockets today are more powerful than the supercomputers of ten or twelve years ago. How strong are these keys? Is it possible to just brute force them?

Dr. Ken Tindell
No. With 128-bit keys there's no practical way of brute forcing them. Even if there were some brute force approach that, after so many weeks of server CPU time, could do it, and in the future there might be, that's completely impractical for the kind of theft attacks on cars. So the application keys, I think, are in practice very secure. The weakness, I think, is at the infrastructure end: the protection of that key database being breached, and then all the keys spilling out. We had a recent incident where Intel managed to leak the private key used to sign some of the firmware in their chips. So I think in the end, attacking the algorithm directly is usually not very effective; it's going around the sides, into the weaknesses, that works.
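The back-of-the-envelope arithmetic behind that answer: even granting an attacker a trillion guesses per second, a 128-bit keyspace is hopelessly out of reach. The guess rate is an assumption chosen to be generous.

```python
# Why brute-forcing a 128-bit key is impractical, in rough numbers.
keyspace = 2 ** 128                    # possible 128-bit keys
guesses_per_second = 10 ** 12          # a very generous trillion guesses/sec
seconds_per_year = 60 * 60 * 24 * 365

years_to_search = keyspace / (guesses_per_second * seconds_per_year)
# years_to_search is on the order of 10**19, vastly longer than the
# age of the universe (roughly 1.4 * 10**10 years).
```

Which is why, as Ken says, attackers go after the key database or the surrounding gaps rather than the algorithm.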

Andrew Ginter
Okay. I study control systems in heavy industry, but I occasionally dabble in the automotive space, and I remember five or six years ago a standard came across my desk for over-the-air firmware updates in automobiles. It was a new standard from the industry, and it talked about encryption from one end to the other: encrypt this, encrypt that, here's how you do the encryption, it's got to be this strong, and so on. Not a word about how the automobile vendor is protecting those keys. And I'm thinking: we might trust GM, we might trust the vendor, but should we trust their website? Somebody breaks into GM, signs a dud piece of firmware, and now you push that firmware over the air into millions of vehicles that just stop, because the firmware is all zeros but signed, or something horrible like that. To your example of stealing the keys from the vendor: is anybody talking about how to secure those keys at the vendor?

Dr. Ken Tindell
I don't see a lot of that. And I think this is a general problem in security: we all have visibility of a piece of the problem, but very few people have expertise in every part of it. Unlike a lot of computing, where abstraction is used to simplify problems by hiding complexity behind some black box, in security it doesn't work that way very often, and that tends to be a problem. People have abstracted away from the problem of key management. Canis Labs is focused on the CANbus and protecting that, and somebody else has to worry about another part of the problem. You see this in standards quite a lot, where they just say such-and-such is out of scope, sometimes because it would be too prescriptive to solve it in that standard, so the baton is passed to somebody else to pick up. And taking that kind of whole-system view, at the necessary level of detail, means going below "tick, problem solved" and asking: well, actually, is it really solved? It's those gaps, I think, where a lot of the real vulnerabilities lie. Like I say, attacking an algorithm head on is rarely going to achieve anything, but attacking those gaps: this thing was handed on to that person, it came from over here, and this system picks up...

Dr. Ken Tindell
...something and trusts it, but actually it shouldn't, because one tiny, tiny thing was overlooked. You see this all the time in vulnerabilities: it's that one tiny particular thing. I think we had a WiFi protocol exploit recently where one particular tiny, obscure part of the protocol didn't specify that certain things should have encryption. I think that's the biggest issue. I'm not sure how to solve that, though.

Nathaniel Nelson
Andrew, it feels like we're drifting into the technical here. Is there an example you could give to sort of anchor this conversation?

Andrew Ginter
Yeah, sure. The question I asked was about a standard I saw a handful of years ago for how automobiles communicate in real time over the cell network with manufacturers. The standard had to do with firmware updates, sending new software into some of the various hundred controllers inside the vehicle. The attack scenario that I worried about is this: there's a war in Ukraine, Russia has invaded Ukraine. Let's say the Russians get it into their heads to cripple the transportation infrastructure in the United States because of the United States' support for Ukraine. They're a nation-state, they've got money, they've got talent, and they can launch essentially arbitrarily complex and sophisticated attacks. How would they do it? They could break into one of the car manufacturers, pick your favorite one with a lot of vehicles in the United States. If they're able to steal the keys, if they're able to break into the part of the manufacturer's infrastructure that creates new firmware, they could create a firmware image of all zeros, so that when the CPU reboots, it's dead. They could sign that firmware with the stolen keys...

Andrew Ginter
...they could push that firmware over the cell network into the vehicles and cripple all of the vehicles that have that generation of firmware from that manufacturer. Millions of vehicles: they might be trucks, they might be cars, they might be anything. And do it when the compromised controller senses from the vehicle's GPS that it's in the continental United States. This is the kind of really nasty attack that I worry about. And Ken's answer was: yeah, that's a piece of the puzzle we're not really talking about. He's an expert on what happens inside the vehicle, the CANbus. The standard I mentioned was a standard for communicating between the vehicle and the vendor, and his answer was: that's a different piece of the puzzle. What happens with keys inside the development systems of the vendor is a different part of the problem again. And he's saying there's almost nobody in the world who understands the big picture, and there are probably gaps in there that need to be addressed. So that's the bad news, but we're drifting out of both Ken's sweet spot, expertise-wise, and mine. So with that sort of example to get you worried, maybe we need another expert in another episode. But let's go back to Ken and talk about what's happening inside the vehicle...

Andrew Ginter
So it sounds like there's good news and bad. We understand the problem, and there's technology out there that can solve a lot of the problem. What's the status of this? For those of us who would like to avoid having our vehicles stolen, how high should our hopes be that this problem gets solved, either in new vehicles coming in the future or in retrofits for our existing vehicles?

Dr. Ken Tindell
Yeah, that's probably the key question here. Even if you solve everything in the future, there are many, many vehicles on the road today. And if they can be reprogrammed over the air so that they all roll to a halt at the same time on all the roads...

Dr. Ken Tindell
...that's a kind of neutron-bomb effect, destroying infrastructure. So, today there are some standards that are being deployed. One of them is called Secure Onboard Communication, SecOC. This doesn't do encryption, but it does add authentication: encryption is hiding the payload, and authentication is validating that it came from the right place. So they're doing the important part first; these messages are being validated. That's being rolled out: there are cars on the road that are using this new SecOC standard to protect messages, and most cars in the future, I think, are going to be using something like it, or very similar. So I think that part of it is probably fixed, and as I said, hardware security modules have been in silicon for a while now, and SecOC uses them. So I think on the target end, that's okay. Then there's the infrastructure end, and the problem is I don't know very much about the infrastructure end, because I'm focused on the embedded software and electronics end of things. But we know how to manage keys, to a certain extent, with obviously some very embarrassing exceptions making the news...

Dr. Ken Tindell
...so I find it very difficult to judge just how risky and vulnerable the infrastructure end is going to be. I'm not hopeful, generally, about IT security in this space, because we've seen so many of these key leaks, and those are just the ones we know about. What's different between this and your "login was compromised" type of thing is that this is hardware that physically moves in the real world and has very severe consequences if attacked. Particularly if you can do a mass attack where, as you said, you brick ECUs in millions of cars at the same time because of some tiny, tiny detail that was overlooked at the infrastructure end. That's where I am most worried about all of this. It's less to do with the target end, because thieves stealing your cars is not scalable; you'd have to have a million coordinated thieves to try to break the transport system and the road network. And your other question, about what's going to happen to cars on the road today that are vulnerable to being stolen: that's probably the question at the front of most owners' minds. I've seen suggestions that you should use steering wheel locks, like it's 1999 again, which I don't like very much. We ought to be able to have nice things without them being stolen. So there are these physical kinds of measures, and there are third-party immobilizers. I haven't seen immobilizers that the manufacturer approves of, because if you start jamming things into the electronics of your car you can cause all sorts of problems. And I have seen smarter immobilizers that are connected to the internet through 3G and 4G modems, and then you are relying on the third party's security measures to stop people getting into your vehicle remotely, so you can end up causing a bigger problem than the one you fix. So the real solution is that the OEMs need to take something like our software hardware security module, for cars made before these chips existed, put that in place, and then issue a firmware update. Now, that is not an easy thing either. When you push firmware out into, say, an engine management system, and it's got to have our software in there, for example, everything has to be retested. These are critical pieces of software; you don't just change the code, compile it, and send it out to all the workshops to be burned into all the cars around the world. That's not how it's done. So we shouldn't expect a software update to be very quick, because responsible car makers take a long time to revalidate all their software...


Dr. Ken Tindell
...but in theory it should be possible, and I'm really hoping that this can be retrofitted to existing vehicles.
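The SecOC-style approach Ken describes, authentication plus freshness rather than encryption, might be sketched as below. AUTOSAR SecOC actually specifies AES-CMAC with a truncated MAC and a freshness value; the field sizes and the HMAC-SHA256 stand-in here are illustrative assumptions, not the real profile.

```python
import hmac
import hashlib

CAR_UNIQUE_KEY = bytes(range(16))   # stand-in for a per-vehicle secret
TAG_BYTES = 3                       # SecOC truncates the MAC aggressively

def protect(payload: bytes, freshness: int) -> bytes:
    """Append the low byte of a freshness counter and a truncated MAC.
    The counter defeats replay of an old 'key validated' message."""
    f = freshness.to_bytes(4, "big")
    tag = hmac.new(CAR_UNIQUE_KEY, f + payload, hashlib.sha256).digest()
    return payload + f[-1:] + tag[:TAG_BYTES]

def verify(frame: bytes, expected_freshness: int):
    """Return the payload if the tag and freshness check out, else None."""
    payload, f_lsb, tag = frame[:-4], frame[-4], frame[-3:]
    f = expected_freshness.to_bytes(4, "big")
    if f[-1] != f_lsb:
        return None                 # stale or replayed counter value
    good = hmac.new(CAR_UNIQUE_KEY, f + payload,
                    hashlib.sha256).digest()[:TAG_BYTES]
    return payload if hmac.compare_digest(tag, good) else None
```

Note that the payload stays readable on the bus; only forging or replaying it becomes infeasible without the per-vehicle key, which is exactly the property needed against the headlight injection attack.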

Nathaniel Nelson
You know, I thought this was a fun topic, but the way that Ken puts it, it sounds rather grim.

Andrew Ginter
Yeah, well, I asked Ken a hard question. It's the kind of pivoting attack, bad guys taking over a cloud service and using the compromised cloud service to get into power plants, or into railway switching systems that have industrial internet connections, that I face with my customers in heavy industry all the time, and I thought it was probably relevant to this industry. But Ken's answer basically was: yeah, that sounds worrying. He's an expert on what happens inside the vehicle, and I study what happens in other industries. Neither of us is really qualified to comment on whether this is a realistic attack in this industry, or whether there are mechanisms in place that we're not aware of to deal with these risks. So to me it's an opportunity to get someone from the manufacturers on the show who can maybe speak to that.

Nathaniel Nelson
Yeah, I'm actually surprised that I can't recall, off the top of my head, anybody from the manufacturing side of the automobile industry that we've had on in recent history.

Andrew Ginter
We may have had a guest many episodes ago, but yeah, it's not an industry we've dived deep into, and I would welcome an opportunity to do that. You know, we're past 100 episodes now. Bluntly, when we started this podcast, I had my own little specialization: heavy industry, power industry, rail switching. And I thought, naively, that that was most of what there was to talk about. It's been 100 episodes, I've learned stuff in every episode, and the elephant that is industrial security is bigger than I thought it was.

Andrew Ginter
A word of clarification on the software update: if you push out a software update that does this authentication, you would have to hit every device in the vehicle at the same time, would you not? Or could you do a partial update, hit, you know, 90% of them, and if you miss 10% of the CPUs it'll still work? It might work, but would it be effective?

Dr. Ken Tindell
That's a very good question. For anti-theft, it's a very small set. For example, out of a total of 4 ECUs you would need to update 3: the doors, the radio key receiver, and the engine management system, or possibly instead the gateway that relays the message on to the engine management system. So that would be 3 ECUs. They'd all have to be updated together, because otherwise they'd need to be running on the same versions, and it needs to tap into the key management infrastructure, or else some very lightweight version of key management that would be good enough just to stop thieves. But the car manufacturers, as I said, are already rolling out some of these more advanced things that already have the key management infrastructure as part of that solution. So I think you could probably just connect up to that key management infrastructure, and then make a software update that would go to 3 ECUs in this case. In general, this is of course a problem with software updates: when you're updating a distributed real-time control system, if you put firmware into some of these ECUs and not into some of the others, and then something on the network has changed, to add a message, or to add some content, or change the meaning of content, it's a complete mess. And updating all the firmware so that either it is all updated or none of it is updated is actually a real problem and…

Dr. Ken Tindell
…this is another reason why manufacturers have been reticent about over-the-air updates: there are a lot of ways it can go wrong, horribly wrong. And so they're very, very cautious, because the consequences of it going horribly wrong at the same time everywhere are potentially enough to sink a company. If you think about a piece of firmware that's gone in that has a date-related or mileage-related bug that somehow causes the over-the-air flash programming to get triggered and fail, erasing the flash firmware but not installing new firmware, then you'll find that cars are just rolling to a halt, with broken engine management systems, all over the world, all at the same time. It's a very serious problem. So if you start to do a risk analysis of over-the-air updates, it's not an easy thing to do without risk. I mean, obviously, if we don't care about risk and just want to do things for publicity or whatever, then you just go ahead, do it, and see what happens. But responsible manufacturers really are very concerned about how to do over-the-air updates very carefully. There was a story that went around, and everyone was laughing: I think it was BMW that wouldn't do an over-the-air software update without the car being parked on the flat. If it was parked on an incline, the software update refused to work…

Dr. Ken Tindell
…and everyone thought this was very funny, but actually it's a sign of just how seriously they're taking it. When you're doing a firmware update, the process might go wrong, catastrophically wrong, because of a bug. It might start randomly writing to I/O ports, and one of those I/O ports might be the parking brake release. So either you have to engineer the entire firmware update process to a safety-critical level, or you have to make sure the car is in a safe state before you start that process. And a safe state means not parked on a hill where, if the software went wrong, the car would roll down the hill. So that's just one example, I think, of people that take it very seriously and have done their risk analysis. It's not really anything to be laughed at, although I can see it is amusing.
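The "safe state before flashing" behaviour Ken describes amounts to a precondition check that runs before any flash sector is erased. Here is a minimal sketch of that idea; the field names and thresholds are invented for illustration and are not from any real OEM update client:

```python
# Hypothetical sketch of a pre-flash safe-state gate. All names and
# thresholds here are illustrative assumptions, not a real OEM API.

from dataclasses import dataclass

@dataclass
class VehicleState:
    speed_kph: float      # vehicle must be stationary
    incline_deg: float    # refuse to flash on a slope
    battery_pct: float    # enough charge to survive a long flash
    ignition_on: bool

def safe_to_flash(s: VehicleState,
                  max_incline_deg: float = 3.0,
                  min_battery_pct: float = 50.0) -> bool:
    """Check every precondition *before* erasing any firmware, so an
    interrupted or buggy write can never strand the car in an unsafe
    physical situation (e.g. on a hill with a dead engine ECU)."""
    return (s.speed_kph == 0.0
            and abs(s.incline_deg) <= max_incline_deg
            and s.battery_pct >= min_battery_pct
            and not s.ignition_on)
```

The design choice is the one Ken attributes to the manufacturer: rather than engineering the whole flashing path to a safety-critical level, the client simply refuses to start unless the vehicle is already in a state where any failure is harmless.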

Andrew Ginter
Wow. You know, it's a big problem, and it's good to hear that there's progress. I've learned a lot. Can you sum up for us, though: what should we take away? What's the big picture here?

Dr. Ken Tindell
I think the real thing I wanted to get across is that the car industry isn't stupid, isn't full of dumb people making dumb decisions. All these decisions are made for very good and practical reasons, and if you think a problem is easy, then probably you don't know the constraints. All these things are being put in place with a measured level of risk, knowing what could happen if things go wrong. So I think that's the big takeaway: it's a very hard and difficult problem they're trying to solve.

Dr. Ken Tindell
Yeah, so if people want to understand these constraints more and understand the automotive industry, I write a blog. I recently posted about how over-the-air software updates work and the particular problems of the car industry, so if you want to learn about that, and about how CAN bus works and the constraints it has to meet, which are very, very different to what people are used to with computers and servers and Ethernet switches, have a look at my blog site. You can contact me on LinkedIn very easily if you want, or you could visit the Canis Labs website at canislabs.com and have a look at our encryption software.

Nathaniel Nelson
Andrew, that was your interview with Ken Tindell. Let's take us out here. I've got two questions for you: number one, how much do I have to worry about my car being cyber-stolen? And number two, how much do I have to worry about everybody's cars in general?

Andrew Ginter
Well, I heard sort of good news and bad news on that front. The good news is that Ken is reporting that, in his experience, manufacturers are very cautious about updating firmware in vehicles, because of safety concerns. And in terms of mass malicious firmware updates, hopefully the vendors are just as concerned about controlling access to their keys, so that malicious actors can't use the firmware update mechanism against us. That whole process is so safety-critical that hopefully they've got it under control, but we would need a guest from a manufacturer to explain that part of the world to us. The bad news: it sounds like, in the short term, because it takes so long and it's so difficult to prove the safety of these firmware versions, the manufacturers might be reluctant to issue a short-term…

Andrew Ginter
…software update to try and insert some of the crypto, even at a software level, to deal with this theft problem. It might be that, by the time they get that whole business tested and ready to roll out, it's two years from now, and, well, bluntly, the thieves aren't stealing these cars anymore: the fleet is going to be updated, and the new cars are coming out with the hardware authentication built in. So maybe people with new cars today, worried about theft, need to use the immobilizer for a year or two, and then by then hopefully we've got the problem solved.

Nathaniel Nelson
All right. Well, thanks to Dr. Ken Tindell for speaking with you, Andrew, and Andrew, as always, thank you for speaking with me.

Andrew Ginter
It’s always a pleasure. Thank you Nate.

Nathaniel Nelson
This has been the Industrial Security Podcast from Waterfall. Thanks to everybody out there listening.

The post Hacking the CANbus | Episode 108 appeared first on Waterfall Security Solutions.

]]>
Saving Money and Effort Automating Compliance | Episode 107 https://waterfall-security.com/ot-insights-center/ot-cybersecurity-insights-center/saving-money-and-effort-automating-compliance-episode-107/ Mon, 05 Jun 2023 00:00:00 +0000 https://waterfall-security.com/ot-insights-center/uncategorized/saving-money-and-effort-automating-compliance-episode-107/ The post Saving Money and Effort Automating Compliance | Episode 107 appeared first on Waterfall Security Solutions.

]]>

In this episode, Kathryn Wagner, the Vice President of Industry Solutions, Energy & Utilities at AssurX, joins us to explore the ways we can save time and money by automating compliance processes such as NERC CIP, the TSA Pipeline & Rail Directives, and other regulations.

https://www.youtube.com/watch?v=kv_RJSaMRNU

THE INDUSTRIAL SECURITY PODCAST HOSTED BY ANDREW GINTER AND NATE NELSON AVAILABLE EVERYWHERE YOU LISTEN TO PODCASTS​

About Kathryn Wagner

Kathryn Wagner is responsible for the development, strategic vision, and tactical product roadmap for AssurX's ECOS products, which focus on compliance within the energy and utilities sectors. She also develops and manages partnerships and represents AssurX at industry events and conferences as a subject matter expert.

In addition, she is responsible for guiding the strategic development and expansion of the ECOS product into other regulated markets within the energy sector, such as companies involved in the exploration, management, and production of critical resources such as water, oil, and electricity.


Saving Money and Effort Automating Compliance

Transcript of this podcast episode:

Please note: This transcript was auto-generated and then edited by a person. In the case of any inconsistencies, please refer to the recording as the source.

Nathaniel Nelson: Welcome everyone to the Industrial Security Podcast. I'm Nate Nelson, here with Andrew Ginter, the Vice President of Industrial Security at Waterfall Security Solutions, who's going to introduce the subject and guest of our show today. Andrew, how are you?

Andrew Ginter: I'm very well, thank you, Nate. Our guest today is Kathryn Wagner. She is the VP of Industry Solutions for Energy and Utilities at AssurX, and our topic is compliance. Compliance can be a very expensive process, and we're going to be talking about automation: how to automate some or most of this compliance work so that it doesn't cost us so much.

Nathaniel Nelson: All right then, without further ado, here's your interview with Kathryn.

Andrew Ginter: Hello, Kathryn, and welcome to the podcast. Before we get started, can I ask you to say a few words about yourself and about the good work that you're doing at AssurX?

Kathryn Wagner: Yeah, good morning, Andrew. I'm very happy to be here. I am Kathryn Wagner, the Vice President of Industry Solutions for Energy and Utilities at AssurX. I have a background in engineering, and also in software development and management.

I have nearly thirty years of experience with systems integration and compliance in a bunch of industries. It used to be mostly manufacturing, and now it's mostly energy. For the last eleven years I've been with AssurX, helping our customers implement solutions for NERC and other quality- and compliance-related requirements, while being a product manager for our NERC compliance and related systems that focus on reliability and resilience. I also help guide the strategic vision and seek expansion opportunities into other regulated industries within the energy sector, or even other critical infrastructure sectors. As for AssurX: AssurX has been a leader in quality and compliance management systems for over twenty years. We operate in highly regulated industries such as energy and utilities, which is my part of it, as well as pharma and biotech, medical devices, manufacturing, and food and beverage. Those are things I don't really deal with all the time, but our company does.

Regulation and Compliance

Andrew Ginter: So thanks for that. We're going to be talking about compliance management in just a minute, but I understand that you folks got started years ago in the space of quality management. Are the two fields related?

Kathryn Wagner: Yeah, Andrew, there's a natural evolution from one to the other. Quality management involves things like managing documents, processes, procedures, issues, non-conformances, CAPA (corrective and preventive actions), audits, suppliers, customers, changes, risk, workflows, and approvals: all those things to meet regulatory obligations and optimize quality. Compliance management has similar elements but a different language; that's the way I like to think about it. For example, in the quality space, manufacturers must manage their suppliers. They have supplier risk assessments, contracts, contacts, what parts each supplier supplies, and communications with those suppliers. In the utility world, we must manage vendors, with the requirements defined in CIP-013, supply chain risk management, and that includes things like vendor risk assessments, vendor contacts, vendor contracts, the hardware, software, and services that the vendor supplies, and vendor communications. So it's different terminology, but very much the same.

Andrew Ginter: So that makes sense, and this brings us to our topic, which is compliance management. I was introduced to the idea of compliance and compliance management with NERC CIP back in the day, and of course you're in the electric sector, so you know much more about this than I do. Can you talk a bit about NERC CIP? You read NERC CIP and, on the surface, it looks like any other security standard. What is compliance in the NERC CIP context? How does this work?

Kathryn Wagner: Okay, Andrew, so NERC CIP is all about cybersecurity as it relates to energy. It means making sure that you have controls so that your power facility and your substations and your control centers are all secure, so they're not going to get hacked, so that ultimately the grid stays up. It really involves protecting the people, processes, assets, and data that keep the grid running. NERC CIP is about the cybersecurity, but compliance with NERC CIP is what happens when the auditors show up.

You've got to be able to produce all the data and evidence that those auditors want, as it relates to those CIP standards. That involves this thing called the ERT, the Evidence Request Tool. In recent years the auditors have produced a spreadsheet so that everybody reports in the same way, which makes it easier both for the regulators and for the entities being audited. People spend hours and hours filling out that Evidence Request Tool, and it comes in two parts. The first part is a bunch of lists of data: entities list out their sites, they list out the cyber assets at those sites, they list out all the people interacting with those assets and who has access to them, and then a couple of other parameters: physical security parameters, electronic security parameters, data storage locations, vendors, and other things. All that data is supplied in lists in the first part of the audit, and then the auditors pick sample sets from those lists and request more data.

They have at least 75 different reports that they ask for, which is very detailed data on every single requirement in the NERC CIP standards. Some examples: was this location commissioned or decommissioned during the audit period? They might want to know all of the access authorization records for a set of individuals; all security patches that were released, evaluated, and applied for a set of assets; evidence that the full configuration change process was followed for any kind of installation; evidence that the cybersecurity incident response plan was followed for each incident they ask you about. All this data is very challenging to organize. It's not trivial at all to pull this data together, and I've heard some horror stories from our customers: it takes weeks and weeks of pulling data from different systems, you know, the learning management system, the HR system, the asset management system, and then they have to manually cross-reference it and reformat it so that it fits in the ERT and the auditors are happy.

Nathaniel Nelson: Andrew, I got not only bored but a little bit vexed just listening to that answer: all the steps involved, all the files, all the... you know, it's a lot. Is everything that Kathryn just described really necessary, all the bureaucracy, the documentation? It kind of makes sense that we're now talking about this in the context of maybe automating some of these processes.

Andrew Ginter: As far as I know, it is necessary. I've heard countless complaints about the amount of paperwork involved in NERC CIP, and I struggled just when I heard her describe the spreadsheet, much less the other 75 documents that have to go along with it. I have a little personal experience, very recently, with spreadsheets. For the annual Waterfall threat report, my colleagues and I needed to put together a spreadsheet of really only about 100 security incidents, with a dozen columns, and it took just forever to get that one rotten little spreadsheet organized. And in the NERC CIP spreadsheet, we're not talking a hundred rows, we're talking several thousand. If you have 700 substations, how many computers and network devices have you got in each substation? That's a lot of data to be dealing with in a spreadsheet, much less the other stuff. So yeah, I'm feeling the pain here.

Andrew Ginter: Okay, so there's a lot of data, and when you're dealing with large amounts of data it makes sense to automate the process. But can I ask you about a subtlety here? For the people who are looking at automation for compliance, is the main motivation saving money, reducing the cost of gathering all the data? Or is there something else at work? Would a machine gathering the data do a more thorough job and, I don't know, somehow reduce your compliance risk, the risk of an auditor saying you have missing data?

Kathryn Wagner: Well, absolutely, companies want to save money, and that is a huge motivator, but there are a bunch of different aspects to that. The first aspect is kind of obvious: they want to avoid regulatory penalties, and I think everybody in the industry knows that NERC CIP noncompliance can bring fines of up to $1,000,000 per day per violation. That's a lot of money, and there have been examples in the past; there was one entity that got charged something like $10,000,000 for cybersecurity noncompliance. The second motivation is really the cost of poor cybersecurity: if you're not secure, then the hackers can get in, and those hackers cost money. Whether it's ransomware, or they start controlling your equipment like they did over in Ukraine a few years ago, they can cause damage, which of course costs money to fix. And beyond fixing the problem those hackers caused, it also damages the utility's reputation, and that's a really subtle cost. It's hard to put a number on it, but it's out there. The last thing that affects the cost, and a reason we want better management of compliance, is a desire to reduce workload and improve efficiency. Without a good program, people spend hours and hours preparing for audits and then doing compliance tasks.

I've heard over and over again, over the years, how users hate doing compliance. They don't want to do it; they save it to the last minute, and it's hard for the compliance teams to force them to do that work. If you get a system in place, you make it minimal-impact on those teams, and the compliance team has everything in a central location. I've heard that there have been incredible savings preparing for audits because of having a good program. So there are the three different ways that I feel utilities are saving money with a program: avoiding regulatory penalties, having good cybersecurity, and reducing the workload of their employees.

Andrew Ginter: Okay, so automation makes sense: it saves money, makes the job more thorough, makes us more secure, actually. But it's one thing to wave a magic wand and say "let's automate the whole thing"; it's another thing to actually do it. What does this automation actually look like? How does it feel to use it?

Kathryn Wagner: Well, Andrew, the real goal is to make sure that you stay in compliance year-round, not just waiting until the audit to find out whether you were in compliance or not. You need to be able to prove it at any point in time, on short notice, and that's why people use compliance management software. Any good compliance management software is going to include features for managing the compliance data and protecting it, so that the right people get access to it and the wrong people stay out; tracking responsibility, knowing who's responsible for which tasks for which regulations, and documenting that; managing documentation and evidence; managing risk; issue tracking; incident tracking, and then the mitigation plans or corrective action plans to resolve those issues; task management; and, especially important, notifications, reminders, and escalations. If compliance tasks are not getting done, or not getting logged into the system, people are reminded, people are aware, and there's visibility into those tasks, so that they do get done on time. Audit reporting is the output of the compliance management system. When you're dealing with NERC, there are two pieces of it. There's the CIP Evidence Request Tool that I talked about earlier, for all that CIP data, but then there's also the management of the RSAWs.

The other NERC standards have to do these RSAWs. They're Reliability Standard Audit Worksheets, and they really involve filling in a narrative and listing out the evidence collected to meet each requirement. Those are time-consuming, so software will help pull that data together and help you report on it when it's necessary.

Andrew Ginter: Okay, so there's a lot of stuff that needs automation. But how do you actually do the automation? These records: do you pull them from, I don't know, the brains of the PLCs? What does automation actually do, in terms of gathering and organizing the data for you?

Kathryn Wagner: Ah, well, there are a lot of different ways that automation can help you, and a lot of different forms it can take, so let's look at an example. One of the requirements says that you have to verify, at least once every calendar quarter, that the individuals with active electronic access or unescorted physical access have authorization records. So you're comparing what they have access to, based on access lists, with what they've been authorized to do. That might involve two different systems, or really many different systems, because they have access to many different networks and OT devices, plus the access card system for getting into the different areas of the plant. You can set up automation to help with that in a couple of different ways. One of those is a very manual way: you set up some sort of scheduled task, so that once a quarter somebody is required to go pull the asset list manually from the devices, then pull the authorization information from that other system, and then manually compare the two lists and look for any anomalies. That's awfully manual, but it is automated, in the sense that the task is scheduled every quarter. You could also set up a quarterly task that's initiated automatically…

…but uses integration to automatically pull that data from the various networks and other software out there, or from the devices themselves, to get those asset lists, and to automatically pull the data from whatever is tracking the authorization records. Then either you could have a person do the comparison between the two, now that they've automatically got the information, or maybe you're clever enough to put together some sort of computer program to do the comparison and perform that validation automatically as well.

A last good example I have of setting up automation to help you out is setting up a daily feed, a daily pull of information from those other sources into the compliance management software, so that you always have the ability to report on or see the two different things and make that validation. You could even go further than that and set up controls so that the system can detect a discrepancy between the two and alert on it: send out emails, show it on a dashboard, or even initiate other tasks and workflows to get it resolved. So that's a good example of a couple of different ways you can do automation within the NERC CIP environment. And I have a list of other examples here: things like polling the network for asset lists and open ports, querying assets for baseline information, connecting to an HR system to get up-to-date employee information, connecting to the learning management system to get training information, using patch discovery services to obtain patch information, and then things like scheduling document review and evidence collection tasks. So there are a number of different ways to leverage automation.
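At its core, the quarterly access verification Kathryn walks through is a set comparison between two data pulls: who currently has access, versus who holds an authorization record. A minimal sketch of that comparison follows; the person and asset names are made up, and a real integration would pull these dictionaries from the badge system, the network devices, and the authorization records rather than hard-coding them:

```python
# Illustrative reconciliation of "has access" vs. "is authorized".
# All names are invented; real data would come from system integrations.

def reconcile_access(active: dict[str, set[str]],
                     authorized: dict[str, set[str]]) -> list[str]:
    """Return one finding per unauthorized grant; an empty list means
    every active access right is backed by an authorization record."""
    findings = []
    for person, assets in sorted(active.items()):
        granted = authorized.get(person, set())
        for asset in sorted(assets - granted):
            findings.append(f"{person}: access to {asset} with no authorization record")
    return findings

# Made-up quarterly pull: bob retains access that was never authorized.
active_pull = {"alice": {"substation-7-rtu", "hmi-3"}, "bob": {"hmi-3"}}
auth_records = {"alice": {"substation-7-rtu", "hmi-3"}}
```

Each finding is exactly the kind of discrepancy Kathryn says the system should alert on: it can become an email, a dashboard entry, or an automatically created mitigation task.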

Nathaniel Nelson: So, Andrew, it sounds like, luckily, a lot of this long and arduous process can be automated, but is there anything outside of the scope here? Like, what do you still really need to do by hand?

Andrew Ginter: That's a good question, Nate, and there are a couple of answers to it, in terms of what's possible today and what could be possible in the future. Let's take just a simple rule: there's a requirement to change passwords every, I don't know, twelve months or eighteen months or something like this. That's if a PLC even has a password, but network switches have passwords, firewalls have passwords. A lot of gear nowadays has passwords. They may not be per-user, they may be shared, but still, a password is a password, and if it exists, in the CIP world it has to be changed periodically. It's one thing to ask the question of the device:

Andrew Ginter: Do you have a password who’s got accounts list the accounts on the device that that’s sort of a more common feature of devices that you’re able to figure that out but trying to figure out. When did the password change I mean does the device even keep track of when the password changed the last time is that even something you can ask the device so, some of the data can is some of the data is there Some of the data you just have to. Keep track of manually you got to make a note in your you know compliance tool or something saying I change the password because the device can’t tell you when things happened when was the last patch Applied. You might be able to ask the device which patches are applied but can you ask it when they were applied. so the that’s a long way of saying you know some of it. You can automate some of it. You have to keep track of yourself in your system. You can either keep track of it on a sticky note or you can keep track of it in in a software system but down the road. It seems to me that all of this stuff can be automated in the long run Now you might need the cooperation of the device Vendors. You might need to upgrade the versions in the device vendors. it. It seems to me. There’s sort of nearly infinite opportunity to like innovate and create new software to simplify this process here and it strikes me that over time you’re going to see more and more of that happen.

There's just so much money being spent by the electric utilities on this compliance task, and if they're spending the money doing it manually, they are open to spending less money getting it automated: spending more money on automation, on newer versions of devices that keep track of some of this stuff automatically, and on newer versions of automation tools that can pull the data from devices. So it sounds to me like it's an area that's ripe for innovation.
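Andrew's point, that a device often cannot report when a password was changed or a patch applied, so the timestamp has to live in the compliance system, can be sketched as a tiny dated-evidence log. The class, field names, and the day-based interval below are invented for illustration (NERC CIP expresses such intervals in calendar months, not days):

```python
# Hypothetical sketch of a manual evidence log: records *when* a dated
# task was last performed on each asset, because the asset itself
# can't report it. Names and intervals are illustrative assumptions.

from datetime import date, timedelta

class EvidenceLog:
    def __init__(self) -> None:
        self._last: dict[tuple[str, str], date] = {}

    def record(self, asset: str, task: str, when: date) -> None:
        """Log that `task` was performed on `asset` on date `when`."""
        self._last[(asset, task)] = when

    def overdue(self, task: str, max_age_days: int, today: date) -> list[str]:
        """Assets whose last recorded task is older than the allowed interval."""
        cutoff = today - timedelta(days=max_age_days)
        return sorted(a for (a, t), when in self._last.items()
                      if t == task and when < cutoff)

log = EvidenceLog()
log.record("fw-01", "password-change", date(2023, 1, 10))
log.record("sw-02", "password-change", date(2021, 6, 1))
```

The `overdue` query is what drives the reminders and escalations Kathryn described earlier: the compliance system, not the device, knows which password changes are coming due.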

Andrew Ginter: So that's a lot of stuff that a compliance manager could do, and you folks produce these products: you produce and sell a compliance manager for NERC CIP, among others. Can you talk about not just what your stuff does, but, in a sense, how it does it? If I say, "yes, I'll take three of the AssurX things," what am I buying? Are these seats in the cloud, or agents that snuggle up to the PLCs to gather data? What does your system look like?

Kathryn Wagner: Yeah, so with AssurX software we have cloud options and on-premises options. I will say that most of the NERC entities that use our software have it on-premises, due to the sensitive nature of the data they're trying to manage, and it's probably a little bit easier to secure the integration with those third-party devices and other software if you're all on-premises. So what does it look like? We have a user interface which is browser-based, and behind it there's a database and a server. You can configure those in all sorts of architectures, so that you have load balancing and failover and so on, and we typically have things like a development environment, a testing environment, and a production environment. And we have the AssurX platform, which has all the features to create solutions, any solutions, whether they're energy solutions or life sciences or manufacturing solutions. That platform gives you the ability to create unlimited dashboards and forms; it has the security, it has the database layer, and it has all the code in it. Everything that you do with AssurX is point-and-click, drag-and-drop, easily configurable, et cetera. And then we use those features on our platform to create the whole suite of these NERC compliance management solutions. We call that ECOS, the AssurX Energy Compliance System.


And that is a full suite of solutions that handles both the O&P (Operations and Planning) NERC compliance management and the CIP compliance management, and then it can be extended to do a bunch of other things as well. What our customers do is get the platform installed and then load up our solutions; some of them focus in one area, some in a different area, and we provide all of them. Then the customers configure our system. AssurX is highly configurable, and they adapt the forms and the workflows to meet their needs. You can do all of it without integration, where a human interacts with things and tasks are assigned to humans to go and do, or you can start plugging in that integration to pull the data and interact with all the third-party software. That ECOS solution is focused on NERC compliance and other compliance management aspects, and I do want to say that we're expanding our offerings: not only NERC CIP, but things like the TSA pipeline security directives. A lot of our customers are energy customers, but they also do gas, which makes the gas pipeline very applicable, and those TSA regulations are similar enough to the NERC CIP regulations that our existing solutions can easily be adapted to meet those needs.

Nathaniel Nelson: So in this episode we're talking about Kathryn's specialty: energy. We talk about NERC CIP, but what about other industries, Andrew?

Andrew Ginter: Yeah, NERC CIP is what, 12 or 15 documents by now, with a lot of detail in them. You might imagine that if you have a compliance system set up for NERC CIP, you could use the same system for other industries, because if you've already got 15 standards in the NERC package, is that not everything you might need for everyone? And the answer is no. I mean, the TSA?

Andrew Ginter: You know, like six weeks after the Colonial Pipeline incident, the TSA came out with a new security directive for pipelines, and it was only, I don't know, as long as one or two of the 15 NERC CIP standards put together. So it was only a fraction of the size of NERC CIP, but still it covered different stuff. Concrete example: it talked about dependencies. It said if your OT system depends on your IT system, then you have to get rid of those dependencies, and if you can't get rid of them, you have to document them and report them to the TSA. Because every one of those dependencies means that if you cripple the IT network, and you cripple the systems that OT depends upon, then you've crippled the OT system, because the OT system needs the crippled IT systems to work. None of those words exist in NERC CIP; this is a new concept in the TSA directive, in spite of the NERC CIP documents being much bigger than the TSA document.
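As a concrete illustration of that dependency-documentation requirement, a compliance tool could track OT-on-IT dependencies in a simple inventory and flag the ones still needing paperwork. This is only a sketch; the field names and system names are invented for illustration, not anything from the TSA directive or any product:

```python
# A hypothetical inventory check for the dependency rule Andrew describes:
# OT systems that depend on IT systems must be documented (and reported if
# the dependency cannot be removed). System names here are invented examples.
def undocumented_it_dependencies(deps):
    """Return OT systems with IT dependencies that still lack documentation."""
    return [d["ot_system"] for d in deps
            if d["depends_on_it"] and not d["documented"]]

inventory = [
    {"ot_system": "historian", "depends_on_it": True,  "documented": True},
    {"ot_system": "hmi",       "depends_on_it": False, "documented": False},
    {"ot_system": "mes_link",  "depends_on_it": True,  "documented": False},
]
```

Running the check over this example inventory would flag only the undocumented IT-dependent system, which is the kind of gap an auditor would ask about.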

Nathaniel Nelson: So then that might beg the question: if we have these characteristically different regulatory needs and standards and whatnot, are they equally, or more or less, automatable? You know, talking about power versus water treatment or whatever, would Kathryn's kind of approach work the equivalent way elsewhere?

Andrew Ginter: That's a good question, and in fact I asked Kathryn that question, so let's go back to her and see what she says.

Andrew Ginter: And you did mention the TSA directive. I've been looking at the TSA directives over the last several weeks, and they seem very different from NERC CIP. I mean, they're structured differently. The TSA directive has, for instance, a section in the requirements that says your goal as a pipeline operator is to keep the pipeline running at necessary capacity, even if the IT network is crippled. And they don't define "necessary"; I assume it means necessary to the business or necessary to society, since a lot of these pipelines are critical infrastructure. You don't have to keep it running at full capacity, just necessary capacity, and I'm going, how can you audit against that? But I look at the thing and that's a high-level requirement, and then it's got a bunch of much more specific requirements that seem much more auditable. This seems like a fairly different animal from NERC CIP. Can you talk about what you can track in that space?

Kathryn Wagner: Well Andrew, I want to say that we're not trying to control the OT or the IT network or any of the devices that operate on it. We're really focused on pulling in and gathering the data that we're going to need for compliance purposes, and we're also able to coordinate activities that may result from an interruption to the network, or even just some changes to the network, like firmware updates and security patches and access changes, among other things. The TSA security directive mandates that you must have a cybersecurity incident response plan. Okay, this is very similar to CIP-008, which is cybersecurity incident reporting and response planning. Same idea: you have to have a plan for dealing with things. Both of them require an up-to-date, documented plan for responding to cybersecurity incidents that includes the procedures for what needs to happen, but also the roles and responsibilities of all the people that are going to be dealing with those incidents, and then of course notifications to whoever needs to be notified after the incident.

And then it says within ninety days you must document the lessons learned from the incident and then update the plan accordingly, making sure that each person who has a role in the plan is notified of those updates.
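That ninety-day window is exactly the kind of deadline a compliance automation system tracks. A minimal sketch, assuming nothing more than simple date arithmetic; the function names are invented for this illustration and don't reflect any particular product's workflow engine:

```python
from datetime import date, timedelta

# The TSA directive's ninety-day lessons-learned window, as described above.
LESSONS_LEARNED_WINDOW = timedelta(days=90)

def lessons_learned_due(incident_date: date) -> date:
    """Deadline for documenting lessons learned and updating the response plan."""
    return incident_date + LESSONS_LEARNED_WINDOW

def is_overdue(incident_date: date, today: date) -> bool:
    """True once the update window has passed without the task being closed."""
    return today > lessons_learned_due(incident_date)
```

In a real compliance tool this deadline would feed the same task-assignment and notification machinery Kathryn describes, so the people with roles in the plan get reminded automatically.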

Andrew Ginter: Well, thank you Kathryn, this has been great. Before we let you go, can you sum up for us: what's the most important thing to remember about the world of compliance automation?

Kathryn Wagner: Well, compliance automation, especially with cybersecurity things like NERC CIP, is challenging. There's a ton of data to coordinate, there's a ton of people to coordinate, and it makes sense to automate those tasks and the gathering of that data. Any time you can take the human element out of it, you're improving things. So we do of course have the software to help you with that if you'd like, and we also have experienced people; we've worked in a lot of different industries to help with quality and compliance. So please reach out to us. Our website, of course, is www.assurx.com, and you can always reach out to me on LinkedIn if you want. I'd love to hear from people and talk about how we can help solve your problems. Thank you, Andrew, for having me here today on your podcast. I really enjoyed it. It's been a lot of fun.

Nathaniel Nelson: All right, Andrew, that was your interview with Kathryn. Is there anything that you can take us out with today?

Andrew Ginter: Yeah, I remember seeing the very beginning of the compliance automation space when I was at Industrial Defender over a decade ago, and I have to confess that at the time I really did not recognize the business opportunity that the space represented. I thought the big challenge back then was designing the security system, making things secure, not proving that you're following the policy you've set up. I was dismissive of it, I recall, as a younger man. But this space, to me, is not going to go away. This is a space where you're just going to see more and more demand as regulations increase and as the cyber threat environment gets worse. We're probably going to see more and more governments all over the world issuing more and more regulations, and I'm sorry, they're all going to be a little bit, or a lot, different. But every one of those regulations is going to demand that you prove you've complied with it, and as I said, it's not just a matter of housekeeping, putting some automation in there so that you can get rid of the horrible spreadsheets. There are opportunities to gather the data automatically from a huge variety of IT systems and industrial systems. This to me sounds like a space with a lot of opportunity, because businesses are going to spend money on reducing their need to spend labor and money doing this stuff manually. So yeah, I think this is a piece of the industry that's got a bright future ahead of it.

Nathaniel Nelson: All right, well, with that, thanks to Kathryn Wagner for speaking with you, Andrew. And Andrew, as always, thanks for speaking with me.

Andrew Ginter: It’s my pleasure. Thank you, Nate.

Nathaniel Nelson: This has been the Industrial Security Podcast from Waterfall. Thanks to everybody out there listening.


The post Saving Money and Effort Automating Compliance | Episode 107 appeared first on Waterfall Security Solutions.

How Cyber Fits Into Big-Picture Risk | Episode 106 https://waterfall-security.com/ot-insights-center/ot-cybersecurity-insights-center/how-cyber-fits-into-big-picture-risk-episode-106/ Mon, 22 May 2023 00:00:00 +0000 https://waterfall-security.com/ot-insights-center/uncategorized/how-cyber-fits-into-big-picture-risk-episode-106/ The post How Cyber Fits Into Big-Picture Risk | Episode 106 appeared first on Waterfall Security Solutions.

In this episode, Dr. Janaka Ruwanpura, Vice-Provost, University of Calgary joins us to look at where cyber risks fit into the “Big Picture” of overall risk at industrial operations.

Listen now or Download for later


https://www.youtube.com/watch?v=ChRdWIGy0D8


THE INDUSTRIAL SECURITY PODCAST HOSTED BY ANDREW GINTER AND NATE NELSON AVAILABLE EVERYWHERE YOU LISTEN TO PODCASTS​

About Dr. Janaka Ruwanpura

Dr. Janaka Ruwanpura is currently the Vice-Provost (International) and Associate Vice-President (Research) at the University of Calgary, and also a professor at the Schulich School of Engineering, specializing in project management. You can read more about Dr. Janaka Ruwanpura on his Wikipedia page, as well as his LinkedIn profile.


How Cyber Fits Into Big-Picture Risk

Transcript of this podcast episode:

Please note: This transcript was auto-generated and then edited by a person. In the case of any inconsistencies, please refer to the recording as the source.

Nathaniel Nelson: Welcome everyone to the Industrial Security Podcast. My name is Nate Nelson. I'm here with Andrew Ginter, the Vice President of Industrial Security at Waterfall Security Solutions, who's going to introduce the subject and guest of our show today. Andrew, how's it going?

Andrew Ginter: I'm very well, thank you Nate. Our guest today is Dr. Janaka Ruwanpura. He is a professor at the University of Calgary, a Vice-Provost of the university, and an associate VP of research. He's a professor of engineering and project management, and Janaka does a lot of work with risk very generally, and so we're going to explore today how cyber risk fits into the big picture of risk inside of engineering and construction and other kinds of projects and organizations.

Nathaniel Nelson: Then without further ado, let's get into the interview.

Andrew Ginter: Hello Janaka, and welcome to the podcast. Before we get started, can I ask you to say a few words about yourself and about the good work that you're doing at the University of Calgary?

Janaka Ruwanpura: Thank you, Andrew. My name is Janaka Ruwanpura. I'm currently Vice-Provost International and Associate Vice-President of Research at the University of Calgary, and at the same time I'm also a professor in the Schulich School of Engineering, specializing in project management. If I were to say a bit about my connection with the university, of course I look after the global engagement of the University of Calgary, which includes every aspect of it in terms of academics, research, mobility and industry connections. In terms of the University of Calgary, I think we are very proud that, being a young university, last year we became number five in Canada as a top-five research university. At the same time, the other key element that I want to talk about, and that might be very interesting for your audience, is that for two consecutive years the University of Calgary has been number one in startup companies produced, which is actually a tremendous recognition and reputation for a university like us, whereas when you look at the top five, the remaining four, in terms of scale and size, are much bigger, and they are also older, more than a hundred years old.

Andrew Ginter: Yeah, cool. I am an alumnus of the University of Calgary, and I'm a great fan of the university. But our topic today is risk. You are an expert in risk in the context of engineering project management.

Janaka Ruwanpura: This is the problem.

Andrew Ginter: We're of course on the podcast interested in cyber risk, but cyber risk fits into a bigger picture of overall risk management. You've got the risk of, I don't know, hurricanes and fires and who knows what. So, you're an expert on risk. Can you start us at the top? What is risk? What's the big picture of risk? What are we worried about, and what should we be worried about?


Janaka Ruwanpura: Yeah, I mean, Andrew, the key element of the way that I look at risk is that I always use the word risk together with opportunities. My expertise is mainly on the project risk management side of things, and I think the key is that we look at every possible thing for a project: what are those elements, challenges and uncertainties that could create problems moving ahead with our projects? So that's where we look at it and say, how do we convert some of those negative things, the negative risks, into better opportunities where we will handle them? We identify them up front, and we come up with good solutions to deal with them, so that we can run projects with minimum impact from risks and uncertainty, so that the projects will be successfully planned, designed and implemented. And I think that would apply in your domain, in the cybersecurity area: how do we identify these risks in advance, and then how do we come up with a sustainable, practical solution that would benefit the key stakeholders?

And to ensure success at the end, and say we have done a good job.

Andrew Ginter: That makes sense at a high level, but I thought I heard you say that if you look at them hard, sometimes risks turn into opportunities. Can you give me an example? How does that work?

Janaka Ruwanpura: Yeah, Andrew, I've run a few risk analysis sessions with industry folks, and I can tell you that on one occasion we were doing a project in Fort McMurray, and we went through the complete risk analysis process, which I'm going to explain to you later. We identified a few risks, and then, when we looked at the impact of the risks on the project schedule, we realized that we could not meet the time target in terms of, you know, the number of weeks or months to complete the project.

And at that moment we felt that we needed to look at some alternate designs so that we could cut down the time duration of the project. The team was very committed. They looked at some alternate designs, and as a result, we did the same thing again: we looked at them, and we simulated to find out the new project duration, and then we realized, yup, we could achieve the time target, right? And similarly, I can think of another example, which I can even speak about openly: the Olympic Oval restoration that happened about twelve years ago at the University of Calgary. When we looked at the risks, we had a big challenge in September 2011, because the facility was already committed to other clients for practicing. As you know, this is the place that we call the fastest ice. And when we looked at each of the risks, we were very, very determined, the project team was so committed, and they came up with some creative solutions to reduce the time duration of the project. That's why I'm saying: sometimes you look at risk in a negative way, but if you're committed to coming up with a better solution to deal with the risk, it creates an opportunity to come up with a better design, maybe a more efficient design, maybe a more sustainable design, and maybe a more creative design that...

...helps the project team achieve the outcomes of the project in terms of reducing the duration, reducing the cost, maybe enhancing the efficiency, and things like that. So that's what I mean: don't always look at the negative side of the risks; look at how the risks can create additional opportunities.

Andrew Ginter: So Nate, Janaka was saying there, you know, he gave an example of physical design, physical risk, simplifying designs. In the cybersecurity space, a lot of people face that same problem when it comes to patching. Imagine, I don't know, a power plant with four generating units, and each unit has, I don't know, a hundred PLCs. If your PLCs are on the same network as your control system, which is on the same network as your plant system, which has a firewall going to the IT network, and the IT network in turn has a firewall going out to the internet, that's a very highly connected environment. Any security assessor coming in is going to look at this and say: you really need to patch, really aggressively, everything on your industrial network, because it's so exposed to the IT network and to the internet. And patching is really expensive. I mean, you would have to take the plant down in order to change the firmware on the PLCs, and you don't want to do that; you want to keep producing power. And you need to test these new firmware images extensively. A lot of people look at this and say: let's not do that. The risk that was identified was exposure to attack, and the obvious fix is to patch the vulnerabilities, but what a lot of people do instead is what's called compensating measures.

They will put additional layers of firewalls in, they'll put additional layers of security in, they might throw in a unidirectional gateway, and they might air gap the safety systems so that those simply cannot be compromised. And in this way they reduce the risk by changing the design in such a way that you don't have to do the really expensive patching thing anymore. So yeah, what Janaka is saying here makes a lot of sense.

Andrew Ginter: Okay, so, big-picture risk. When you're looking at a project, how do you get started with risk management?

Janaka Ruwanpura: The main key component, especially in risk management, and I'm talking about large capital projects, is that at the very beginning of the project we come up with a good risk process to identify the risks, and then we quantify the risks so that we can come up with a good risk response plan. And that's where it's important to bring in the key stakeholders who are involved in the project, who have expertise and knowledge about similar projects in the past, so that they can actually provide input for the current project. Before I get into the steps, I think we always emphasize: share the current information that you know about the project, so that the participants involved in the risk analysis can understand it and come up with better risk identification. And when you look at risk identification, you want to ensure that everybody is committed to identifying the unique risks that are relevant to the project, right?


And that's where we ask the question. I mean, for example, when we identify the risks, we have to think right away about how we're going to deal with them. So sometimes we ask about the two important things: how urgent, and how important?

So I can refer to Stephen Covey's time management matrix, where Stephen Covey described a 2-by-2 matrix: one side is urgent, and the other side is important.

So for example, we get what we call the reactive quadrant, which is actually urgent and important: that means there's an important risk, there's an urgent risk, and there is a particular way of dealing with that. Then there's quadrant number two, which is not urgent but important: that means we have time to really identify those risks, and we can actually deal with them proactively, so that the team is aware of what to do. And then comes quadrant three, which is not important but urgent, right? Which is also a bit reactive in nature; at the same time, it is difficult for us to reject it, we have to deal with it, right? How do we deal with it? Because it's so urgent, we need to deal with it. And then comes the fourth quadrant, which is not important and not urgent, so there are no drivers here, right? And then, do we really want to spend time looking at those? So, coming back to step number one: as I said, the identification is really, really key. If the team is not identifying the risks properly, then we won't be able to come up with a robust risk management plan.

Nathaniel Nelson: Yes, what he's referencing there I've heard for years now under the name of the Eisenhower matrix, applied to the habits of successful people. But I guess the point he's making is that it can be applied to risk just as easily, because it's sort of universal.

Andrew Ginter: That's right. He attributed it to Stephen Covey, who I think documented it in one of his books; was it The 7 Habits of Highly Effective People? I'm not sure. I recall being introduced to it as a time management matrix, but it applies to risk as well. In the cyber space, what are we talking about? Something that's both urgent and important: there's ransomware on the OT network, this is an emergency, all hands on deck, fix this problem. That's urgent and important. Not urgent but important: the risk assessment just came back, the security assessment just came back, we're in trouble, we have to fix these problems before ransomware gets into the control network. An example of not important but urgent: we urgently need to change all of the passwords in all of the devices in all of our substations. Why? NERC CIP says you have to. Yeah, but those substations are heavily defended; we've got security nine ways to Sunday, nobody can get in there with a password and mess with the devices. It doesn't matter; if we breach the standard, we risk a million-dollar-per-day non-compliance fine. Fix this problem, fix it now. I don't care if it's not important security-wise, it's urgent compliance-wise. So yeah, this matrix, you know...

This very much applies in cyberspace.
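The quadrant triage Andrew and Janaka describe can be written down directly. A minimal sketch; the quadrant labels and examples are paraphrased from the discussion above, not any formal taxonomy:

```python
# The Covey (Eisenhower) urgent/important matrix applied to risk triage.
def triage(urgent: bool, important: bool) -> str:
    if urgent and important:
        return "reactive: act now"        # e.g. ransomware on the OT network
    if important:
        return "proactive: plan the fix"  # e.g. security assessment findings
    if urgent:
        return "handle: hard to reject"   # e.g. a compliance deadline
    return "low priority: no drivers"
```

The value of writing it down is the same as Janaka's point about identification: two people sorting the same risk into different quadrants is a disagreement worth surfacing early.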

Andrew Ginter: OK so we’ve identified the risks. We have our matrix of what’s urgent vs important. What’s the next step?

Janaka Ruwanpura: The next step is interesting. When I've facilitated risk analysis sessions, people come up with all kinds of risks, right? Then the question is: do you really understand the risk? If somebody asks you about this particular risk, can you justify that it is relevant to this project? So then we ask questions like: do you have background understanding of this particular risk? Have you seen it happening in other projects? Do you think it's relevant for this project? If it happens on this project, would you be able to really analyze the problem? The reason we ask these questions is: if you identify a risk and say, hey, this is a high risk, the impact is going to be quite significant, how do you determine that if you don't understand the risk? So that's what we call the qualification. We go through a step after the identification to qualify the risks: are you really the champion of this risk, so that we can take the identified risks into the quantification stage? Before that, we need to make sure that you understand the risks, and that if someone else on the team asks a question, you would be able to defend whether these risks are relevant for the project. Okay.

And once we pass that stage, then we can go to the risk quantification stage to determine two things: what is the probability of occurrence of this risk, and if the risk occurs, what would be the impact, in various aspects, right? As I said, my background is more in capital projects, and the two key things we always talk about in risk management are how these risks impact the cost of the project and how they impact the time, or duration, of the project. But you can also look at other things; the impact categories could be reputation, safety, performance, right? So you can say, okay, if this risk happens, let's quantify it: what's the probability of occurrence of this risk, and what's the impact if it happens, right? And that's where, especially when we're dealing with risk analysis with stakeholders involved, we want to make sure that everybody really understands the process of quantification, and that's where we always adopt a standard methodology for looking at the probability of occurrence, for example.

If I say, oh, you know what, I have this risk which is very likely to happen, somebody would ask: what do you mean by likely? Can you define likely? For example, Andrew, even between you and me, if I use the word likely as a subjective term, what does that mean to you, and how do I interpret likely? So that's where we always come up with a standard methodology that says, you know what, for this risk, likely means there's a 40% chance of it happening, or a 50% chance of it happening. We come up with a quantitative methodology to define what we mean by a subjective term, and then we convert that subjective meaning into a quantitative meaning.

Andrew Ginter: That makes sense. But we're talking about risk; we're talking about things that might not happen. You know, I might say: you're operating a large consumer goods factory, competing with the same kind of factory in another country, and that country has an active industrial intelligence wing in their government, and I think it's very likely that the large consumer goods factory, you know, laptop factory, is going to be targeted with a nation-state-grade, intelligence-agency-grade cyber attack. You might disagree. How do you resolve these things about events that haven't happened yet?

Janaka Ruwanpura: I mean, this is where, Andrew, there are two things. Sometimes, when we look at risk management and identification, we identify which ones are the strategic risks and which ones are the tactical risks. In the project management domain, we consider tactical risk management to belong at the project level, for the project people to handle, whereas senior management will determine the strategic risks; even the existence of a project depends on how they look at the strategic risks. And the example that you have given is actually more of a geopolitical thing, which is a strategic type of risk, one that would decide whether we want to go ahead with the project or not. But anyway, the challenge that I have faced in the quantification of risk is: do we think the same way? Let me give an example. I sometimes use criteria for likelihood of occurrence where we define it in five different subjective ways. Now, what does "almost certain" mean to you and me? For us to have real consistency, we define it and say: almost certain means anywhere around 90% probability or above; likely means it's a higher risk, between 70 and 90%; possible means 30% to 70%...

...unlikely means 10 to 30%, and rare means 0 to 10 percent. So we come up with a framework where everybody is thinking along the same definitions, so that when we identify risks and when we quantify them, we get consistency from everybody. And I think that is also important when we look at the impact, so I'll give you an example of that as well. If you were to come up with criteria for impact, in the simplest way, maybe on a range of 10, we can say, you know what, a 10-plus means it's a catastrophic impact in terms of time impact or cost impact. And you can say serious means, on a scale of 10, maybe 8 to 10; moderate means anywhere from 4 to 6; and negligible means 0 to 2. So for example, we could come up with criteria that use the words catastrophic, serious, severe, moderate, minor and negligible. But then we can say, what do you mean by catastrophic impact? Catastrophic impact means, depending on the project value, we could say we're talking about a $10 million additional cost to the project...

...and we are also talking about a six-month delay, versus negligible, which means you're talking about maybe up to $10,000 in cost impact with one week of delay, you see. I think we need to come up with the subjective description of the impact and also put a value on it, in terms of cost and time, so that for everybody on the team, when we analyze the risks, there's a consistent mindset about two things: the probability of occurrence and the impact. And I'm sure, Andrew, you can think of many examples in your domain in terms of how you define the probability of occurrence for your risks, and also how you see the impact of the risks.
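Janaka's mapping from subjective likelihood terms to agreed probability bands is straightforward to encode. A sketch using the example ranges from the conversation; a real team would calibrate its own numbers:

```python
# Subjective-to-quantitative likelihood bands, per the examples in the episode.
# Boundaries are illustrative; each project team agrees on its own calibration.
LIKELIHOOD_BANDS = {
    "almost certain": (0.90, 1.00),
    "likely":         (0.70, 0.90),
    "possible":       (0.30, 0.70),
    "unlikely":       (0.10, 0.30),
    "rare":           (0.00, 0.10),
}

def classify_likelihood(p: float) -> str:
    """Return the agreed subjective term for a probability of occurrence p."""
    for term, (low, high) in LIKELIHOOD_BANDS.items():
        if low <= p <= high:
            return term  # a boundary value falls into the higher band first
    raise ValueError(f"probability out of range: {p}")
```

With this in place, "likely" stops being an argument: a probability of 0.8 maps to the same word for everyone on the team, which is exactly the consistency Janaka is after.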

Andrew Ginter: So Nate, the key word I took out of that was "strategic": strategic versus tactical risks. In a large organization, think, I don't know, a power utility with 40,000 employees, lots of different people are involved in lots of different kinds of risk management at lots of different levels. I mean, individual technicians who drive out to a high-voltage substation do not touch anything in the substation unless they know it's been de-energized, ideally having de-energized it themselves, so that they don't get two hundred thousand volts flying through them, killing them on the job. Whereas senior management would tend to deal with risks of, I don't know, an earthquake collapsing the head office and having to relocate the functions of the head office to a backup office on an emergency basis. But at what level of an organization should you be dealing with cyber risk? And the answer that I heard, in terms of general principles, is that the highest levels of the organization have to be dealing with strategic risk, and strategic risk is risk that puts the entire existence or the mandate of the organization at risk. So...

In the example of the computer factory that I gave to Janaka, the interference with the factory by a foreign intelligence agency that's trying to give the factories in its own country a competitive advantage, that interference could be existential. It could drive the computer factory out of business.

For example, if pricing information has been stolen from the IT network in this factory, and this allows the factories in the other country to undercut the price of the products produced by this factory, by ten cents or by a dollar. Or if the intelligence agency has wormed its way into the operations network and has been tampering with the devices, the PLCs controlling production, introducing flaws and defects into the product that have to be repaired at a massive cost. With this kind of interference you could drive the factory out of business, the company out of business. That level of threat is something that needs to be discussed at the board level, in my understanding; that's a strategic threat. Lower-level threats, you know, if we don't comply with the law regarding, I don't know, electromagnetic emissions, or different kinds of compliance risks, might be dealt with lower in the organization. But strategic risk has to be dealt with at the highest levels, and lesser risks are dealt with elsewhere, is what I took away here.

Andrew Ginter: Okay, so we've identified our risks, we've in a sense prioritized them, we understand which ones are strategic, we've quantified them. What's next? How do we deal with these?

Janaka Ruwanpura: So now you could come up with a nice risk matrix, and the risk matrix will tell us, based on the probability of occurrence and the impact, which ones are high risks, which ones are low risks, and which ones are in the middle. And that's where you look at it and say, hey, we have a high risk, where the probability of occurrence is very high and it's a catastrophic risk; do I want that risk to come all the way down to a low level? We want to make sure it becomes a rare occurrence of that particular risk, or that the impact is going to be very negligible, right? Or somebody says, you know what, let's also look at the alternate scenario: that risk could possibly occur, and if it happens, maybe there's a moderate impact because of that risk. So that's where we look at a framework for risk response planning, and that's where the two keywords come back again, the ones I mentioned earlier, proactive versus reactive. In my own work I actually have a kind of decision tree built into both proactive risk management and reactive risk management.
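The probability-impact matrix Janaka describes can be put into a few lines of code. This is a minimal illustration, not anything from the episode; the thresholds, band labels, and example risks are all invented assumptions.

```python
# Minimal sketch of a probability-impact risk matrix.
# Thresholds and example figures are illustrative assumptions only.

def classify_risk(probability: float, impact: float) -> str:
    """Classify a risk by probability of occurrence and impact, both on a 0-1 scale."""
    score = probability * impact
    if score >= 0.5:
        return "high"    # e.g. very likely AND catastrophic: needs a response plan
    if score >= 0.1:
        return "medium"
    return "low"

risks = {
    "ransomware in the plant network":  (0.8, 0.9),
    "cloudy day cuts solar output":     (0.9, 0.2),
    "earthquake collapses head office": (0.05, 0.95),
}

for name, (p, i) in risks.items():
    print(f"{name}: {classify_risk(p, i)}")
```

With these made-up numbers the ransomware scenario lands in the high band (0.8 × 0.9 = 0.72), the cloudy day in the middle, and the earthquake, severe but rare, in the low band, which is exactly the kind of sorting the matrix is meant to support.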

So what are the different options available when you're dealing with proactive risk management? We see that potential risk coming, but we do have time to eliminate the risk, or to mitigate the risk, or to accept the risk, or to transfer the risk, the four things that I can elaborate on. If you're dealing with proactive versus reactive, how do we decide? I'll give you a kind of simple decision tree. We can say, you know what, the current probability of a particular risk is about 80%, but we have three choices. We can eliminate it, meaning there's an 80% chance and we want to make it a 0% chance, so that we will never see this risk. Or we can say, the current probability is 80%, let's try to mitigate it down to about a 20% probability, or a 10% chance of this risk happening; so what can we do proactively to mitigate this risk? Or we can say, you know what, I think this is the kind of risk where, in a project environment, there are various key stakeholders, let's say we have an owner, a consultant, a contractor, and other parties, and for this particular risk,

It may be better for us to transfer the risk to a party that could better handle this risk. So we can think of three options, eliminate, mitigate, or transfer, depending on the nature of the risk. But if you look at the reactive nature of risk, the word eliminate does not exist, because reactive means that something has already happened and you cannot eliminate it now. So your choices are either to mitigate the impact of the risk, which means that through the risk analysis we identified that if this risk occurs it's a $100,000 impact, but I can mitigate this by maybe spending $60,000 so that the impact is cut down. We can even think about how to mitigate the impact so that it will not have the same $100,000 impact. Or, yes, we can see the signs of this risk, but rather than me as a stakeholder handling the risk, I could probably transfer it to another party who has better authority or accountability to handle the risk, and we handle it that way by transferring the risk. Or, you know what, the risk has already happened.

There's nothing much we can do about it, so let's accept it and deal with the problem, right? I've also done some work in the disaster area, particularly natural disasters, with respect to tsunamis and also tornadoes, and that's where sometimes you have to accept the impact: it happened, and now how do we deal with it? So, depending on the nature, as I said, proactive versus reactive, you can come up with a decision tree that will show the different options, and will also show the consequences of those options for the project, so that you can succeed in dealing with the risks as they flow.
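The four proactive responses Janaka lists (eliminate, mitigate, transfer, accept) can also be compared numerically by expected total cost. Here is a rough sketch reusing the 80%-probability example from the conversation; every dollar figure is an invented assumption.

```python
# Compare risk-response options by expected total cost:
# the cost of the response itself plus the remaining expected loss.
# All figures are made up for illustration.

def expected_cost(probability: float, impact: float, response_cost: float) -> float:
    return response_cost + probability * impact

p, impact = 0.8, 100_000  # 80% chance of a $100,000 impact

options = {
    "accept":    expected_cost(p,   impact, 0),       # carry the full risk
    "mitigate":  expected_cost(0.1, impact, 20_000),  # spend to cut probability to 10%
    "transfer":  expected_cost(p,   10_000, 15_000),  # e.g. premium plus a deductible
    "eliminate": expected_cost(0.0, impact, 90_000),  # remove the risk entirely
}

best = min(options, key=options.get)
for name, cost in options.items():
    print(f"{name}: ${cost:,.0f}")
print("cheapest option:", best)
```

With these particular numbers, transferring the risk comes out cheapest; change the assumptions and a different branch of the decision tree wins, which is the point of laying the options out explicitly.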

Andrew Ginter: So now that we've got some of the big picture, here's one of the things that has always puzzled me. I get deeply involved in cyber risk management, but not so much the management of the risks of earthquakes, or fires, or pandemics, who knows what. Let's say you're building, I don't know, a hospital. The systems that you're putting in place have to protect the confidentiality of patient information. The design of the structure has to address the risk of earthquakes in the region, because we can't have the structure collapsing on all the patients. The design of the electric system has to allow for backup power supplies if the main power supply fails, because you've got to keep your patients alive, and electricity is used for that. So you've got different kinds of risks that you're managing. Do you ever have to trade off one against the other and say, this one's more important, I'm going to focus on it, and the other ones I'm just going to accept? Or is something else going on here?

Janaka Ruwanpura: Andrew, I think there are two different things to look at. One is that if we identify exactly the same two risks that you mentioned, if they are important, if they have been identified in our risk matrix through the probability of occurrence and the impact as critical, then we need to handle them, and how we handle them, proactive versus reactive, is one thing. The second is at what stage this could happen. Is it happening in the design stage? It could happen in the construction stage, or in the commissioning stage. So if they're both important and we need to tackle them, we don't trade off, we deal with them through different strategies. One could be proactively trying to eliminate it; maybe the other one will be reactively mitigated, right? Those are two different things. And as time goes on, cybersecurity-related risks are becoming really critical in many engineering and construction projects, because, as in the example you gave, hospitals, research facilities, and universities are becoming really critical now. So we don't trade off; if it's important and if it's high, then we must find solutions to deal with it.

Andrew Ginter: So Nate, the question that sticks in my mind: at Waterfall we work with heavy industry, we work with people who are dealing with powerful, dangerous physical processes. They deal with risk every day. And what I've heard from time to time from different stakeholders in these organizations, depending on the organization, is: Andrew, we're not going to worry about cyber for now, we have bigger fish to fry, and they talk about other risks. In a sense, my goal in bringing Janaka on was to try and understand how cyber fits into the bigger picture. And what I just heard him say was: look, Andrew, if you've got a strategic risk, if the existence of the organization, the mandate of the organization, faces a serious threat, you have to deal with that. The board has to deal with that. The executive has to deal with that. You cannot ignore material risks.

It doesn't matter if you have lots of risks on the table, you have to at least think about every one of these risks. And that's an insight I didn't have before: senior decision makers deal with major risks, due to fire, due to earthquake, due to cyber, sort of independently. But it still begs the question, where can you trade things off? So my next question is a little bit clarifying in terms of when you can trade stuff off, and it turns out it has more to do with different threats that have the same consequence, where in a sense it's the same risk rather than different risks. If you've got different important risks, the lesson here is you have to deal with each of them. Let's listen back in.

Andrew Ginter: Instead of talking about risks with very different outcomes, leaking patient information versus the building collapsing, can we talk about risks that in a sense have the same consequence? A solar farm might have motors to move the solar panels to track the position of the sun, and they might have those motors because, if the motors are working properly, the farm produces twice as much power in a day. If ransomware gets in there and cripples the computers that control the motors, and the panels freeze, you only produce half as much power as you expected for the day. But you also might have mispredicted the weather. The weather is variable, sometimes it's cloudier than you expect, and you only produce half the power that you thought you would. You might have a cloudy day dozens of times in the year; you might have a ransomware incident once every two or three years. When you have, in a sense, the same outcome from different causes of risk, is this a time when you might legitimately say, I'm going to trade off how much money I spend on one versus the other? Is this what makes them comparable?

Janaka Ruwanpura: Yeah, I think that's where I use what I call if-then scenarios. For example, you could look at each of them individually, in isolation, or you can look at them in a combined way: ransomware together with cloudy weather would have a more cumulative impact on the farm, versus looking at the cloudy situation individually. As you said, the weather is very random, maybe we can't predict that one, versus ransomware. That's where the team needs to look at all those possible risk scenarios, and then tools like simulation, or decision trees, or, as I said, the analytic hierarchy process, AHP, let us evaluate each of these scenarios and see what the impact is. And then, as a result of that, you could even come up with a better risk management strategy. That's the beauty of it, but the key is a committed effort to identify these scenarios. When you identify the scenarios, you can actually analyze them and come up with better ways of handling them.

And that will also determine how we practically deal with these things. Maybe we need to invest upfront to deal with them, versus looking at reactive scenarios of managing risks.
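The if-then scenario evaluation Janaka mentions can be illustrated with a toy calculation. Real analyses would use simulation, decision trees, or AHP, as he says; the probabilities and losses below are invented for the sketch, and the two events are assumed independent.

```python
# Toy "if-then" scenario table: individual vs combined scenarios,
# ranked by expected impact. All numbers are invented assumptions.

p_cloudy, p_ransomware = 0.30, 0.02
loss_one, loss_both = 50_000, 100_000  # combined scenario has a cumulative loss

scenarios = {
    "cloudy day only":           (p_cloudy, loss_one),
    "ransomware only":           (p_ransomware, loss_one),
    "cloudy day AND ransomware": (p_cloudy * p_ransomware, loss_both),  # independence assumed
}

for name, (p, loss) in sorted(scenarios.items(), key=lambda kv: -kv[1][0] * kv[1][1]):
    print(f"{name}: expected impact ${p * loss:,.0f}")
```

Even in a toy table like this, ranking the scenarios by expected impact makes it visible which combinations are worth planning for up front and which can reasonably be handled reactively.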

Andrew Ginter: Well, thank you, Janaka, this has been educational for me. Before we let you go, what should our listeners take away from this episode? What's the number one takeaway for you?

Janaka Ruwanpura: The key message that I want to pass along, as an academic as well as somebody who has dealt with and worked with industry on the risk management side of things: I've seen whether people make a commitment to do a proper job with a proper risk management process, and sometimes I see them treating it as a procedural thing or an ad hoc thing. They don't have the commitment; they're simply doing it because they have to. So my message is that it's really, really important, particularly in your domain of cybersecurity, to make sure that we do a proper risk analysis, to ensure that we identify the risks, really understand them, qualify them, quantify them, come up with better risk response options, and look at various if-then scenarios to see what the best way of handling them is. And that's where we can help from the University of Calgary. We have experts here in the cybersecurity area, in our computer science department and in our Schulich School of Engineering, and we have experts in other areas, in the Faculty of Law and the Faculty of Arts,

On the policy side of things, as well as experts in risk management, in project risk management, through the Schulich School of Engineering's Centre for Project Management. So there's a lot we can do to help support the cybersecurity area, and I hope my message comes across properly: make a commitment to a proper, comprehensive risk process and you will be happier at the end of the day.

Nathaniel Nelson: So that was your interview with Janaka Ruwanpura. Andrew, do you have anything to take us out of this episode?

Andrew Ginter: Yeah, I'm very grateful to Dr. Janaka Ruwanpura for joining us. I don't know if I've mentioned it, but I've been writing a book for years now, and one of the big topics in it is cyber risk; I'm hoping to be done by October. Something that had confused me time and again, talking to people doing risk management, is hearing stories like: look, we have bigger fish to fry than cyber. We're not so much worried about cyber taking down one of our high-voltage substations; we worry more about squirrels eating through the insulation, getting electrocuted, frying themselves, short-circuiting everything, and shutting down the substation. I'd always tried to understand how that fits into the big picture, does it really make any sense? What Janaka cleared up for me was: look, strategic risks, important risks, you have to deal with them independently. If they're important, they're important, you have to deal with them. You can't trade off the risk of a fire against the risk of an earthquake; you have to deal with both. Where you can legitimately trade off is when you have multiple threats that have the same outcome. So if the cyber scenario you're looking at is one that would take down one substation, the same way that a squirrel would eat through the insulation and take down one substation,

It's reasonable to ask: how often do squirrels do this? How often does cyber do this? Is this a problem worth solving? If instead your cyber scenario could take down the entire grid, that's a different animal. You can't compare that to squirrels; it's a different consequence. That bit of clarity is something that had confused me for a very long time, and I'm grateful to Janaka for clearing it up for me.
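The squirrel-versus-cyber comparison works because the consequence is held fixed; only the frequency differs. A back-of-the-envelope sketch, with all figures invented for illustration:

```python
# Comparing two threats with the SAME consequence (one substation outage)
# by annualized expected loss. Figures are invented assumptions.

def annual_expected_loss(events_per_year: float, loss_per_event: float) -> float:
    return events_per_year * loss_per_event

outage_cost = 250_000  # assumed cost of one substation outage

squirrel_loss = annual_expected_loss(2.0, outage_cost)    # a couple of squirrels a year
cyber_loss    = annual_expected_loss(1 / 3, outage_cost)  # roughly one incident per 3 years

print(f"squirrels: ${squirrel_loss:,.0f}/year, cyber: ${cyber_loss:,.0f}/year")
```

A grid-wide cyber scenario has a different consequence entirely, so, as Andrew says, it can't be traded off against squirrels this way.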

Nathaniel Nelson: All right then. With that, thanks to Dr. Ruwanpura for speaking with you, Andrew, and Andrew, as always, thanks for speaking with me. This has been the Industrial Security Podcast from Waterfall. Thanks to everybody out there listening.

Andrew Ginter: It’s always a pleasure, Nate. Thank you.


The post How Cyber Fits Into Big-Picture Risk | Episode 106 appeared first on Waterfall Security Solutions.

Six Steps to Integrating IT & OT in Mining | Episode 105 (May 3, 2023)
In this episode, Rob Labbé from Mining and Metals ISAC explains how and why risk assessments are needed and conducted for OT.

OT systems are critical to mining safety. Rob Labbé, the chair of the Metals and Mining ISAC joins us to look at six steps to integrating IT & OT networks and security programs in this very sensitive environment.


THE INDUSTRIAL SECURITY PODCAST HOSTED BY ANDREW GINTER AND NATE NELSON AVAILABLE EVERYWHERE YOU LISTEN TO PODCASTS​

About Rob Labbé

Rob Labbé is a proven cybersecurity and business leader with a focus on proactive security policies, processes, and cybersecurity tools that help enable business outcomes. He's been the founding chair of the Mining and Metals ISAC (Information Sharing and Analysis Centre) for the past 6 years; it started out with just 5 Canadian mining companies, and today boasts 18 global mining and metal companies headquartered in Canada, the US, Europe, South America, and Australia.

Rob has specialized in the development of integrated IT/OT security programs, with a demonstrated ability in supporting and enabling digital transformation through effective security integration across IT and OT environments, even at a global scale.

Rob Labbé portrait - cybersecurity for mining and metal industry

Securing Metals and Mining OT/IT

Here's the transcript of this episode:

Please note: This transcript was auto-generated and then edited by a person. In the case of any inconsistencies, please refer to the recording as the source.

Nathaniel Nelson
Welcome, everyone, to the Industrial Security Podcast. My name is Nate Nelson. I'm here as usual with Andrew Ginter, the Vice President of Industrial Security at Waterfall Security Solutions, who will introduce the subject and guest of our show today. Andrew, how's it going?

Andrew Ginter
I'm very well, thank you, Nate. Our guest today is Rob Labbé. He is the chair of the Mining and Metals ISAC, which is an "information sharing and analysis center," and he's going to be looking at industrial cybersecurity from an IT perspective. He's going to be taking us through six steps to integrating IT and OT: networks, people, processes, everything!

Nathaniel Nelson
All right! Then without further ado, let's get into your interview. So, Rob covered a lot there, but there's something he said early on that's been sitting with me. He was talking about automation, and how automation is increasingly taking over for jobs in mining that were historically done by people. Andrew, you and I talk a lot about safety on this show. How does the industrial security calculus change, though, if in one of the most safety-critical industries in industrial security, namely mining, you start to find fewer people at these actual sites? Maybe then safety becomes a lower rung on the totem pole, because all the jobs are being done by machines.

Andrew Ginter
That's a very good observation, and the short answer is yes. To me, it's a very good thing that jobs that historically put human lives at risk if there was any kind of malfunction are being automated to the point where robots are taking the risk, not people. But all of this increased automation is being coupled with remote operation, and remote means you're communicating through the internet, you've got software protecting you, not hardware. You're increasing the cyber risk, you're increasing the opportunities for attack, by operating everything remotely and automating everything, but you're taking the safety consequence off the table. And in a sense this makes the cybersecurity calculus easier. You still have very consequential potential outcomes, but they tend to be dollar outcomes, large dollar outcomes, rather than human-life outcomes. That does simplify your cybersecurity equations. It's still a very big deal, though. We're talking multibillion-dollar investments in one of these big mines, so you're still talking about potentially very serious consequences, dollar-wise and reliability-wise, if somebody compromises these systems. But in a sense, it's easier to design security mechanisms for very large dollar losses than it is to design them for large human-casualty losses. If you have a whole crew that's underground and at risk because somebody has messed with the ventilation, that's very, very bad. Whereas a couple of these 700-ton trucks colliding and suffering massive damage with no human operators inside them is very bad; it's not very, very bad. So it does help, in my understanding.

Nathaniel Nelson
Sure. Now, when you talk about it being easier to design security around money problems rather than human-life problems, are you talking about just the sheer severity of the consequences, the risks involved, like it's so much more important to protect human lives that it's easier to just be talking about machines? Or is it that the nature of protecting against financial losses to machines is characteristically different, like the kinds of security you might be talking about putting in place are easier to implement?

Andrew Ginter
So Nate, let me chime in here and remind you and our listeners of something I mentioned in the introduction. You heard Rob describe his background: he came from the IT space into cybersecurity in mining, which is a bit of an unusual perspective on the space. Thinking back over the last hundred episodes of the show, most of our guests have been from the engineering side talking about cybersecurity. In the first 15 years of the industrial security discipline, it was mostly engineers who were responsible, and stayed responsible, for the industrial automation, for the physical process, and who were pulled into cybersecurity because they had to be. It was a new risk, they had to deal with it, this is what engineers do, and they took advice from the IT people. What we're seeing in the last five years or so is that increasingly, enterprise security teams are being told: you're now responsible for industrial cybersecurity, go fix that problem. This is something you did not really see in the first ten or fifteen years of the discipline; we're seeing it very recently. So Rob's perspective here really is advice to IT teams, to enterprise security teams, who are going through the same lifecycle he did, which is: you're responsible now, you figure out how to work with the engineers, you figure out how to make this happen, because you're the ones who are going to be held accountable if there's an incident at the mine. Rob's perspective is a little unusual, because he's giving advice to IT people as an IT person who's made the transition, not as an engineer saying here's what I wish you would do. Rob is telling us what actually worked for him.

Nathaniel Nelson
You know, that sounds relatively opposite to how we usually conceive of things, at least on this show. We've spent plenty of conversations talking about how to convince boards of the necessity of cybersecurity, how to shake loose budget, but here he's saying that the initiative comes down from the board to the cybersecurity people.

Andrew Ginter
Yes, and that is unusual, but I have to wonder: is this another example of scale? Because when we're dealing with the oil and gas majors, they've been leaders in cybersecurity from the beginning; it was always understood that they had to do cybersecurity right. When we're talking smaller operators that are much more cash-constrained, you have to worry about shaking budget loose. To me, what I see in the industry is more awareness across the board: at the board level, at the feet-on-the-street level, and everything in between. It sounds like what I've observed in the rail industry. Rail was head-down for a century, safety, safety, safety, and it's like they looked up recently and said, oh shoot, cyber, we have to do that too, because without cybersecurity we don't have safety. They've embraced cybersecurity, there are standards coming out, there's progress. It sounds almost like that might be what's happening in mining, but I don't have enough data points to be confident of that.

Well, I do have one data point in addition to the information Rob's giving us. I was talking recently to the CISO of a large mining operation, and he said: Andrew, the investment in mines is in a sense cyclic. You have a big, massive up-front capital investment, and then, once you've built the mine at a cost of, say, $2 billion, it starts producing. Then there's a cash crunch, and you need to produce extremely efficiently to be competitive in the marketplace. So the easy opportunity to get cybersecurity, or anything, into your mine is during the capital phase of the project, not the stretch-every-dollar operations phase. But (a) if there's a new awareness of risk at the board level, that's a factor, and (b) what he said was: look, Andrew, I've been in the industry a long time. After you've operated a mine for 10, 12, 15 years, there comes a point where you say, we have to modernize, because we can become more efficient as a result, because we can exploit another part of the resource if we modernize. So after a period of running 10 or 12 years, there tends to be another massive capital injection, and again it's an opportunity to get your new automation, new security, new everything. What he said, and what he's observed in the industry, is that from time to time there are episodes where almost every mine on the planet looks around and says, there's new stuff out there, we have to do it, and over a period of three or four years most of the mining operations on the planet upgrade. And in his estimation, one of these episodes is coming up. There's AI, there are cloud-based systems, there are new kinds of automation, new kinds of efficiencies that everybody needs to leverage in their mining operations. So it looks like these stars might be aligning: the boards have become aware of cybersecurity and they want to fix it, and, if the gentleman I was talking to is right, there's an opportunity where almost everybody is going to be investing hundreds of millions of dollars or more into their mining operations to take advantage of modern automation, and we can do the automation and the cybersecurity at once. That's, in a sense, good news we can look forward to.

Nathaniel Nelson
So Andrew, that was your conversation with Rob Labbé. Do you have any final insights to take us out with today?

Andrew Ginter
Sure. I've been learning a lot about the mining industry, and I'm very grateful that Rob was able to join us. He's the CEO of the Mining and Metals ISAC. The ISAC is comparatively new, I think it's about two years old, and they're looking for new members. There's more information about the ISAC at mmisac.org if you want to get involved in the metals and mining industry. Waterfall is getting involved; this is how I know Rob. Rob didn't mention it, but he is also hosting a podcast for the ISAC; the first episode is up at mmisac.org, so I'm going to be listening to his podcast as well. There are other opportunities to get involved in cybersecurity at the ISAC. The one that I'm thinking of, and that I'm looking to become involved with, is a committee that is working out how to interact with cloud-based AI programs. The concrete example was: every shovel of ore that comes out of a mine is a quantum of seven or eight hundred tons of ore, and every shovel is different. So they take the shovel of ore, dump it on the truck, drive it to the primary processing facility, and they're analyzing the stuff in the course of filling the shovel, dumping it on the truck, and driving it away.

Andrew Ginter
Hello Rob, and thank you for joining us on the podcast. Before we get started, can I ask you to say a few words about yourself for our listeners, and a few words about the good work that you're doing at the Metals and Mining ISAC?

Rob Labbé
Sure, absolutely. I'm Rob Labbé. I've been working in cybersecurity for just over 20 years, the last 10 or so of them focused on security in the mining industry, in particular operational technology. Through that process, we started the Mining and Metals ISAC to have a place where mining companies can work together and collaborate, not just on intelligence but also on best practices and processes, to secure operational technology in our plants and the autonomous systems in our mines.

Full-fledged mining operation

Andrew Ginter
Thanks for that. Our topic is cybersecurity in mining; we're talking about six steps to integrating IT and OT in the mining space. Before we dive into those details, though, it's been a long time since we've had anyone on the show from the mining industry, and you're with the Metals and Mining ISAC. I'm not sure we've ever had anyone on from metals; I'm not even sure what that is. So before we dive into security, can you give us sort of a big picture of the physical process? What is metals, what is mining, how are these systems automated, and what are the kinds of cybersecurity concerns that you see in this industry?

Rob Labbé
Sure so metals and mining are 2 separate but very highly integrated and interdependent Industries. So if we think about what this world needs as our economies Change. Um. As we worked forward. Move forward with decarbonization with you know, clean energy. The reality is everything we have on this planet as a building block is either grown like our food or trees or lumber or it’s Mine. So all the material we need to support this transition. Whether it be you know Copper for electric cars or electric infrastructure and yeah cadnium and Lithium for batteries as steel for wind turbines. All of those commodities have to be Mined. It’s the only place we know how to get them. And so when we think about you know the metals industry. That’s the next step in that process as that ore gets ah dug up and mineed from the Earth Then it’s the process of refining that and turning it into. Usable metal. Um, that can be used to build you know your tesla or your your winter turbine or your power Grid. So those those industries as global industries are critical to. To where we need to get to as a society as a planet and so if you think about what a mine might look like um you know at its core to simple process right? It’s you know. Taking big rocks turning them into little rocks and and extracting the metals from those but the process to do that is huge from a scale perspective. Um. When you look at open pit mining which is commonly used in things like you know copper and gold and Zinc and and other other base metal commodities. These are mines you can see from space. They are you know several kilometers you know wide some of them. Several kilometers deep and in those environments you’ve got you know fleets of huge trucks. You know these are trucks with 700 ton capacity of rocks and while traditionally those are maintained and and operated by drivers. 
We're at the point now where, very quickly, those are being handed over to autonomous systems. So those vehicles are becoming autonomous, combined with the processes required to drill into the earth and blast the rock away: again, traditionally done by people, but because we need to de-risk that work and make it safer and more efficient, those tasks are being turned over very rapidly to automated, autonomous systems.

Other mines are underground. These are mines with shafts running, in a lot of cases, dozens of kilometers into the earth, and there we have automated systems again for drilling, blasting, and hauling rock, but also systems providing ventilation and fume control for the people working there. There are a lot of battery electric vehicles underground, so we have charging infrastructure and electrical infrastructure. The scale and complexity vary widely from mine to mine, but in all cases there's a significant safety risk component, and a significant environmental component: ensuring we protect the environment while we do this, and leave the site in a position to reclaim that area and return it to nature when the work is finished. So there are huge challenges, from a safety perspective, obviously from a production perspective, and from an environmental sustainability perspective, that go into mining.

Then on the plant side, we have all the control systems you would see in any other manufacturing or electrical environment: data systems, process control systems, PLCs, motor control units. And remember, these mines are typically not located in urban centers. You don't put a mine in the middle of a major city.
They're located in remote, difficult-to-reach areas, which requires a lot of remote control and remote access to enable remote support. So you take that complexity and you layer on an industry that's rapidly changing. It's an industry that has discovered the power of machine learning and artificial intelligence to optimize, to make their mines and their operations safer and more sustainable, and to allow the mining of deposits that might not have been economically feasible using older methods. So we've got control systems and operational technology systems based, in a lot of cases, on old technology that was designed years or decades ago, and we're layering on top of that modern remote control and modern cloud-based AI and machine learning, on top of systems that were never architected or designed for that. We're taking autonomy and looking for ways to automate equipment that maybe wasn't originally architected for it.

So the challenge is: how do we do that in a safe way? How do we protect an environment that's rapidly commoditizing, where specialist dedicated systems and serial Modbus links have been replaced by PC infrastructure, Windows networks, and commodity Cisco switches, on top of that legacy OT environment? There are a lot of challenges in this space: to keep production running, but more importantly to make sure the teams at the site go home to their families every day, and to be sure the environment is protected, so those areas are there for us and nature to use, live in, and enjoy for the next hundred years. It's a challenging space, which makes it an exciting space going forward.

Andrew Ginter
Wow, there's a lot going on there. It's a complicated space, and we're going to be speaking to six steps to integrating IT and OT, security-wise and otherwise, in the metals and mining industry. "IT/OT integration" is a phrase that was coined, I think around 2005, by an analyst at the Gartner Group. It can mean integrating networks, connecting up networks; it can mean integrating technology stacks; it can mean integrating teams and business processes. And like I said, this all started almost twenty years ago. So what does IT/OT integration mean to you and to the industry, and what's the state of that process twenty years after it was invented?


Rob Labbé
Yeah. When I look at mining, and I think the same is true in a lot of resource-based operational technology spaces, even five or ten years ago at most miners you had your corporate network and corporate systems, which looked very much like any other business, and then at each mine an individual operational technology system: unique technology, unique design, unique architecture, really driven by that site, that plant, that general manager owning it. In a lot of ways each mine, each operation, was its own silo.

What we've seen over the last ten years, rapidly accelerating over the last five, is that as technology changes we have the ability to commoditize a lot of those systems. That can drive down cost, which is wonderful, but it also opens the door to using things like AI and machine learning to help optimize. So what it means to us is: at a site, instead of some one-off authentication solution, it's Active Directory. Instead of unique Linux distributions driving the brains behind the process control system, it's Windows. Instead of specialized, expensive, unique industrial switches, it's Cisco. We're really seeing the techniques used in IT pushed down to those sites.

That's starting to enable centralized management. Does every site need an Active Directory expert? They're hard enough to find in a city; they're really hard to find in the middle of nowhere. Why can't corporate services provide that? So we started to see shared services, and with that we start to see a major change in risks. I think IT/OT integration from a security perspective is really about enabling that commoditization, enabling the use of things like cloud, while adjusting to the unique risk and operational requirements of a safety-sensitive operational technology setting.

Andrew Ginter
That makes sense business-wise. Every industrial automation operation wants to improve their efficiencies; they want to use cheaper stuff, standard stuff, to reduce training costs. But in my books, the more that OT systems look like IT systems, the more you can attack the OT systems the same way you attack the IT systems. There are thousands of ransomware incidents on IT systems every year, and we really can't afford that on the OT side. What do you do about risk? As you're migrating technology, are you not also migrating risk?

Rob Labbé
You are, and you're introducing new risk to the operations that somebody who's been mining for 20 years maybe hasn't had to think about before at the site level. If you open up the news, almost quarterly if not more often, you'll see a mining company needing to shut down operations because of a ransomware incident, or the ransomware incident shutting operations down for them. There was one in Canada in January, for example, and another in Germany in March. Those are becoming exceedingly common.

The other challenge, risk-wise, is the geopolitical situation globally, which, I've got no idea where it's going, but it's certainly not going to get less complicated. Mining and the metals industries sit at the beginning of every single global supply chain. So if we have adversaries that want to disrupt those supply chains, targeting mining and the metals industry is a great opportunity to interfere with national and international supply chains at a macro level, very efficiently, because that's the common place they all start. So the risk is going up, and the accessibility of those systems is going up. Because of that, over the last three years or so, we're starting to see corporate IT teams being asked to step into operational technology, to start to manage that risk and to secure those systems and those plants. And quite frankly, over the last few years those teams have been struggling. So one of the focuses for me and for the ISAC is to come up with a plan, a cycle, a model that those teams can apply in order to secure operational technology from the IT side, and to secure that shared technology.

Andrew Ginter
Okay, and our topic is six steps to integrating IT and OT. This was an article you wrote recently, summarizing your experience in the field. These six steps you're talking about: what are they?

Rob Labbé
It really starts with people. The first stages are learning, building relationships with the people at site, and establishing trust. The next two are about understanding the technology: understanding the assets, then understanding and quantifying the risks. Then we get into deploying tools, TTPs, and practices to that environment, followed by testing and validation of what you've done, to be sure it's actually working. Those are the six steps.

Andrew Ginter
Okay, and it all starts with relationships. Can you take us a little deeper? What does building relationships mean?

Rob Labbé
A lot of IT teams pushing into or deploying into operational technology make the mistake of thinking that the technology is the same: Windows is Windows, Active Directory is Active Directory, a Cisco switch is a Cisco switch. But in the environment of mining, where safety is such an issue and reliability is such an issue, the environment you run in is very different. It's an environment where availability and safety are king, not data integrity, for example. You need to take the time to learn the process, to learn the business, to build relationships with people, to understand what's going on. That phase is really about learning that operation, how it works, and earning your right to be at the table at site, building relationships from the top down. So if you're sitting in a senior mine manager's meeting with the GM, you at least understand all the words being spoken at that meeting; it's no longer a foreign language to you. You know all the people there, and you understand what their priorities are and what keeps them up at night. That's the first step, and I consider it earning your right to be at the table.

Once you've earned that right, it becomes an exercise in establishing trust in your abilities and your team's abilities, not just to secure stuff, but to deliver on and protect what's important to that site. It's in these conversations that a lot of security people make mistakes, because they'll go in and say, "Here's the right way to do something. The right way to patch is to patch monthly, right after Patch Tuesday. We have to get rid of these legacy Windows 98 or Windows XP systems that are kicking around. We just have to do that." But the reality in operational technology is that the "right" thing to do is not always correct. By building trust, and understanding how you can adjust your security posture to what you learned in the previous step, you start to be seen as somebody who has the priorities of that operation as your own priorities, not somebody just trying to push security best practices from the IT side.

Andrew Ginter
Okay, so those are the first two steps: build relationships, establish trust. Those both sound like people-focused goals. You've got four more steps. Do you dive into the technology next? What comes next?

Rob Labbé
Once you've taken the time to build relationships and trust, the next piece is to really understand the technology landscape and the assets of that site. It's a bit different from the IT world, where we often have a CMDB that lists all of our systems, or we can run a scanner to discover all the systems on the network and get started on that process. Those don't work in the operational world; we can't use scanners to find things. Oftentimes assets are listed in maintenance systems, or on spreadsheets, or in SharePoint lists: very nontraditional places. So you've got a process to go through to not only find and identify all the assets, but also get a good sense of what they do. What part of the plant do they drive? Are they safety-sensitive? If something were to fail, what's the worst thing that could happen? Does the process stop, or does somebody get hurt?

That process will take a while, because beyond the maintenance systems there are a lot of things you're not going to find. You may have to walk the site and identify assets yourself. You might have to use passive network monitoring, pulling data and logs off switches, to find assets. And you might have to do a search on Shodan and find assets that might not even be connected to your network, but might have cell SIM cards in them and actually be connected to the public cell network. You can't underestimate the challenge of identifying those assets.

Then the fourth step is actually quantifying the risk. That's where we take the assets we found, the process we've learned, and the actual threat information, which in mining would come from the Metals and Mining ISAC or other sources, and begin to use a model to quantify that risk and communicate it.
I once spent two months shadowing a mine general manager, and everybody going into his office told him how the sky was falling and the world was going to come to an end. Managers become immune very quickly to the Chicken Little "sky is falling" cry. So you have to take the risks and quantify them into dollars with some model, whether you use FAIR or some other quantification model, so you can prioritize where you've got to focus and where you've got to work, based on actual risk numbers.
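As an aside, the asset-inventory step Rob describes, pulling records from maintenance systems, spreadsheets, and passive network captures into one list, can be sketched as a simple merge. This is a hypothetical illustration, not any particular tool; the field names and sample records are invented:

```python
def merge_assets(*sources):
    """Merge asset records from several sources into one inventory.

    Each record is a dict; later sources fill in fields earlier ones lacked.
    """
    inventory = {}
    for source in sources:
        for record in source:
            key = record.get("ip") or record.get("name")  # best identifier available
            merged = inventory.setdefault(key, {})
            for field, value in record.items():
                if value is not None:
                    merged[field] = value
    return inventory

# Invented sample data: a maintenance-system export and a passive network capture.
maintenance_export = [{"name": "crusher-plc-1", "ip": "10.20.1.5"}]
passive_capture = [{"ip": "10.20.1.5", "mac": "00:1b:1b:aa:bb:cc", "safety_critical": True}]

assets = merge_assets(maintenance_export, passive_capture)
print(assets["10.20.1.5"])  # one record combining the name, MAC, and safety flag
```

The real work, as Rob notes, is in the sources themselves: walking the site, pulling switch logs, and checking Shodan for assets that bypass the network entirely.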

Andrew Ginter
Quantifying risk, in my understanding, is hard to do. Can you talk more about that? Are we talking about a qualitative thing, where it's low likelihood or high likelihood? Or are we talking dollars and cents? If you've had two mines shut down from ransomware, say one mine per quarter, do you calculate the dollar impact of that and assume it's going to continue at one per quarter? How do you get numbers?

Rob Labbé
You have to get it to a number. The high-medium-low sort of thing, I call it "word math" risk quantification, where you take medium likelihood times high impact and you get purple, really doesn't resonate in the operational technology space, because so many people are saying the world's going to come to an end, and guess what: it rarely if ever does. Instead, there are a number of risk quantification models. Your goal should be to get it down to dollars, or down to production hours, or down to tons lost per year: a quantifiable number that can be supported. The model I typically use is known as FAIR, F-A-I-R, Factor Analysis of Information Risk. It takes into account the likelihood, the probability, and the business impact if the event happens, using measurable, quantified estimates, and essentially runs a Monte Carlo simulation to get you to a risk range: what's the likelihood of it costing you money, and how much? That resonates well, and it works well once people understand the model. You don't have to use FAIR; there's a half dozen other equally good models. Find and pick one that works for you and resonates with your business.
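As a rough illustration of the Monte Carlo approach Rob describes (a simplified sketch, not the full FAIR ontology; the event probability and loss ranges here are invented numbers, not figures from the episode):

```python
import random
import statistics

def simulate_annual_loss(trials=100_000,
                         p_low=0.06, p_high=0.10,        # chance the event happens this year
                         loss_low=20e6, loss_high=30e6):  # cost range if it does
    """Sample many simulated years and summarize the resulting loss distribution."""
    losses = []
    for _ in range(trials):
        p = random.uniform(p_low, p_high)   # uncertainty about the likelihood itself
        if random.random() < p:             # did the event occur this simulated year?
            losses.append(random.uniform(loss_low, loss_high))
        else:
            losses.append(0.0)
    losses.sort()
    return {
        "expected_annual_loss": statistics.fmean(losses),
        "p95": losses[int(0.95 * trials)],  # loss exceeded in only 5% of simulated years
    }

result = simulate_annual_loss()
print(f"Expected annual loss: ${result['expected_annual_loss']:,.0f}")
print(f"95th percentile year: ${result['p95']:,.0f}")
```

An output like "expected annual loss of roughly two million dollars" is the kind of defensible number that, per Rob, lands far better with a general manager than "medium times high equals purple."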

Andrew Ginter
Okay. Quantifying risk, though: I can see how you do that for, let's call it high frequency or medium frequency events, like ransomware once a quarter in a mine somewhere on the planet. You can run the numbers: medium frequency, medium impact. If you lose six days of production on the site, that's 2% of your annual production; it's a medium impact event. But can you quantify low frequency, high impact events? How often does someone hack into the automated systems of your 700-ton trucks and cause trucks to collide, causing massive damage to your physical infrastructure? To my understanding that's never happened, but it's not impossible. Can you quantify very low frequency, high impact events as well?

Rob Labbé
To the best of my knowledge that's never happened either, and I'm looking for some wood to knock on so I don't jinx it. But you actually can quantify that risk, and you have to, and the model you choose has to accommodate it. For a low frequency event you might end up with an analysis result that says something like: there's a 6 to 10 percent chance of this happening this year, so it most likely will not; however, if it does, it will cost you between 20 and 30 million dollars all in, by the time you count the equipment damage, the lost production, and all the other related costs. Then you have something you can go back and discuss with the business: are you cool with rolling the dice at a 6 to 10 percent chance of it costing 20 to 30 million dollars? That's a business discussion and a risk tolerance discussion. The answer could be "yeah, we're cool with that," and you can make peace with that risk. Or we can say, "We don't like that so much," and start to discuss technical, cybersecurity, and physical controls that could either make it a 2 to 3 percent chance of happening, or, should it occur, bring the impact down from 20 to 30 million to 5 to 10 million, because you've got a spare truck sitting around, or a bigger stockpile of finished commodity so you can absorb the production loss. There are all kinds of ways we can work with the business to mitigate the risk once we identify it. It doesn't have to be a cyber control, but you have to have that risk measured to even start the discussion.
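The trade-off Rob walks through reduces to back-of-the-envelope arithmetic: expected annual loss is likelihood times impact. Using the midpoints of the hypothetical ranges from the conversation:

```python
def expected_annual_loss(probability, impact):
    """Expected annual loss: chance the event happens this year times cost if it does."""
    return probability * impact

# Baseline: 6-10% chance of a $20-30M event (midpoints: 8%, $25M)
baseline = expected_annual_loss(0.08, 25e6)

# Option A: cyber controls cut the likelihood to 2-3% (midpoint 2.5%)
fewer_events = expected_annual_loss(0.025, 25e6)

# Option B: a spare truck or bigger stockpile cuts the impact to $5-10M (midpoint $7.5M)
smaller_impact = expected_annual_loss(0.08, 7.5e6)

print(f"Baseline:          ${baseline:,.0f}/year")
print(f"Reduce likelihood: ${fewer_events:,.0f}/year")
print(f"Reduce impact:     ${smaller_impact:,.0f}/year")
```

Either mitigation takes the expected loss from roughly $2M a year down to the $600-625K range, giving the business two comparable options priced in dollars rather than colors, and, as Rob notes, only one of them is a cyber control.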

Rob Labbé
So the next thing you're going to do, for the risks you've identified as ones you'll help mitigate with cybersecurity controls, is start extending your security measures from IT into the OT space. You'll start by looking at your existing runbooks and playbooks for things like incident response: how do you have to update and modify them to work well at site, in the operational technology space? What about the security technology you have, your endpoint solutions and your network solutions? Can they be safely extended into operational technology, or do you have to procure something different? I'm a big fan of finding an endpoint EDR, or I guess the buzzword these days is XDR, solution that can work across IT and OT. You may be able to do that, you may not. You're also going to look at things like extending logging coverage and log aggregation. So you start extending your base controls so you can start to mitigate that risk, and you start training the team.

Rob Labbé
And that brings us to the last step of the cycle, which is testing and validation. That might look like some automated control testing, but it should also involve an incident tabletop at that site, to be sure your plans are working and that they integrate with the site's emergency procedures and processes. Before you hang the "mission accomplished" banner and have the team barbecue, make sure you take the time to test and validate that what you're doing is actually working.

That's the last step I identified in my article, but it's a cycle, and it's a cycle because as the site introduces new technology, say machine learning or AI, and as the people change, a new general manager gets appointed or a new plant manager comes in, you have to start back at building relationships and building trust. The whole cycle starts again. It's not one and done. You don't get to finish and go, "That was really hard, I'm glad I got that done, I don't have to worry about this again." As soon as you get there, you're starting again, with the new people and the new technology at site.

Andrew Ginter
So Rob, reflecting on what you've said in the episode here: we've had a lot of people on from the engineering perspective in various industries, talking about engineering teams initially not really being aware of cybersecurity, becoming aware of it, and eventually engaging with the IT teams. That's the OT perspective on IT/OT integration, the OT perspective on driving cybersecurity into OT over time. Your perspective seems to be the other way around. You started personally in the IT space, and this whole discussion has been: IT, or at least enterprise security, is coming in and saying, "Guys, we have to do something," and here's advice to the security team to make that process more effective. Is this unique to your experience, or is it more common in the mining industry? Who's leading the charge in terms of driving security into OT? Is it something that tends to be happening from the IT side in mining, more so than not?

Rob Labbé
It's interesting in mining. If you look at a mining company, this is an organization that will derive ninety-plus percent of its revenue, and carry most if not all of its safety and environmental risk, at site. Yet traditionally, up until I would say five years ago, at most mining companies the cybersecurity or information security function only impacted IT. The whole OT side was out of scope, and it's almost comical to think that you're in charge of securing an organization, yet you're out-of-scoping 90% of the revenue generation and 100% of the safety and environmental risk. When you think about it, it seems a little bit nonsensical, because it is.

Now, as we're starting to see increased attacks affecting mining companies, you're starting to get boardrooms and senior leadership teams going to their security leads or CISOs and asking, "Hey, are we good?" And the answer they've been getting back is a very unsatisfactory version of "I don't know," which is typically followed by some kind of direction to go figure it out. That's where this process comes from: now we have these IT security teams being chartered, instructed, however you want to put it, to go take care of this OT thing, "so we don't become like [pick your company] in the news." And they've been getting a lot of pushback and a lot of struggle overall. So really, what this process is designed to do is overcome the pushback and struggle they're hitting at site, with the engineering teams and the mine management team, who at the beginning may not have seen cybersecurity as critical to their world. "I've got bigger issues. Life would be great if the biggest issue I had was cybersecurity. I've got union contracts and communities, I've got supply chain issues on fuel and parts, and all kinds of issues in my world that are potentially going to get in the way of my success." At the site level they don't always see cybersecurity as one of the things that can get in the way of their success, until it is. So the drive is coming from the boardrooms in a lot of cases, and the senior management teams, to the CISOs and the security teams: go figure this thing out.

Andrew Ginter
This has been great, Rob. Thank you for joining us. Before we let you go, can you sum up for us what we should take away? What's the number one thing to remember about the advice you're giving us?

Rob Labbé
I think it's really two things. The first, for me especially as a security professional: the opportunity to reach into operational technology, which I started working on ten years ago, is a golden opportunity you shouldn't pass up. It's a chance to really work on securing what matters: helping protect people's safety, helping protect the environment, helping protect sustainability. It's a great opportunity when you get asked to go do that.

But unlike a lot of other areas, you can only move as fast as you build trust. Operational technology security is the most human of all security disciplines, because you're affecting people's safety and people's sense of well-being, unlike almost any other discipline of security. You have to go slow and take the time to build the trust. Those first two steps could take a year, sometimes more. Take the time they need to get right, and you'll get it done safely and efficiently, and you'll end up with a secure, sustainable, reliable operation going forward.

If this is something you've been charged with, I do encourage you to pop onto the Mining and Metals ISAC website or LinkedIn. You can find the article we referenced here today, and I spent an hour in a webcast digging into this a bit deeper, so you're more than welcome to download and watch that webcast as well.

Andrew Ginter
And they're sending all the analysis into the cloud, and the cloud-based AI figures out how to process that shovel of ore optimally and sends the optimal processing instructions back into the mine. So in a real sense you've got cloud-based AI controlling part of the mining process, part of the primary processing system. How do you do that safely? This is something I'm keenly interested in, and I hope to be working with the committee. So if you're interested in the Metals and Mining ISAC, again, it's comparatively new, a couple of years old, and there are opportunities to get involved, to learn more about it on Rob's podcast, and to contribute to the process at the ISAC's website.

Nathaniel Nelson
All right. Well, with that, thanks to Rob Labbé for speaking with you, Andrew. And Andrew, as always, thanks for speaking with me. This has been the Industrial Security Podcast from Waterfall. Thanks to everybody out there listening.

Andrew Ginter
It’s always a pleasure. Thank you Nate.


The post Six Steps to Integrating IT & OT in Mining | Episode 105 appeared first on Waterfall Security Solutions.

]]>
Jesus Molina on the Biden National Cyber Strategy https://waterfall-security.com/ot-insights-center/ot-cybersecurity-insights-center/jesus-molina-on-the-biden-national-cyber-strategy/ Wed, 26 Apr 2023 00:00:00 +0000 https://waterfall-security.com/ot-insights-center/uncategorized/jesus-molina-on-the-biden-national-cyber-strategy/ The post Jesus Molina on the Biden National Cyber Strategy appeared first on Waterfall Security Solutions.

]]>
The Biden-Harris administration recently released their National Cybersecurity Strategy. This week, Jesus Molina of Waterfall Security joined ICS Pulse Podcast to discuss the new policy’s impact on critical infrastructure and the increasing physical consequences of cyberattacks.

Speakers: Jesus Molina, Director of Industrial IoT, Waterfall Security

Listen now or Download for later

Jesus Molina shares some fascinating aspects of industrial security, including the story of the time he hacked into his hotel room's control system with a simple Python script that let him adjust the room's lights, curtains, and HVAC. He then expanded that capability to do the same for any room in the hotel, as well as the exterior and interior common areas.

The podcast then delves into some of the important details of the Biden-Harris administration's recently released National Cybersecurity Strategy.

The post Jesus Molina on the Biden National Cyber Strategy appeared first on Waterfall Security Solutions.

]]>
Cybersecurity Risk Assessment using IEC 62443 | Episode 104 https://waterfall-security.com/ot-insights-center/transportation/cybersecurity-risk-assessment-using-iec-62443/ Sun, 23 Apr 2023 00:00:00 +0000 https://waterfall-security.com/ot-insights-center/uncategorized/cybersecurity-risk-assessment-using-iec-62443/ The post Cybersecurity Risk Assessment using IEC 62443 | Episode 104 appeared first on Waterfall Security Solutions.

]]>
The post Cybersecurity Risk Assessment using IEC 62443 | Episode 104 appeared first on Waterfall Security Solutions.

]]>
Shining a Light into the Dark for Oil and Gas Cybersecurity | Episode #103 https://waterfall-security.com/ot-insights-center/oil-gas/shining-a-light-into-the-dark-for-oil-and-gas-cybersecurity-episode-103/ Fri, 31 Mar 2023 00:00:00 +0000 https://waterfall-security.com/ot-insights-center/uncategorized/shining-a-light-into-the-dark-for-oil-and-gas-cybersecurity-episode-103/ Getting an industrial site started on the cybersecurity road can be hard. In the Oil & Gas industry, while many major players have deployed cutting edge solutions, and were instrumental in driving cyber standards and methodologies, many small to medium-sized companies are only just starting out on their cybersecurity journey. Matt Malone of Yokogawa joins us to look at strategies to shake loose funding, trigger conditions that can jump-start investments, common stumbling blocks and how to address them.

The post Shining a Light into the Dark for Oil and Gas Cybersecurity | Episode #103 appeared first on Waterfall Security Solutions.

]]>
Getting an industrial site started on the cybersecurity road can be hard. In the Oil & Gas industry, while many major players have deployed cutting edge solutions, and were instrumental in driving cyber standards and methodologies, many small to medium-sized companies are only just starting out on their cybersecurity journey. Matt Malone of Yokogawa joins us to look at strategies to shake loose funding, trigger conditions that can jump-start investments, common stumbling blocks and how to address them.

Listen now or Download for later





THE INDUSTRIAL SECURITY PODCAST HOSTED BY ANDREW GINTER AND NATE NELSON AVAILABLE EVERYWHERE YOU LISTEN TO PODCASTS​


About Matt Malone

Matt Malone has been an industrial cybersecurity consultant with Yokogawa for about three years, and holds a master’s degree in IT project management and a GICSP certification. At Yokogawa, Matt’s focus is industrial cybersecurity: providing consultation and completing upgrade and maintenance projects that reduce the overall cyberattack risk to his clients’ control systems. He is also a US Navy veteran, a proven team leader, and a lifelong learner eager to pass on the knowledge he has gained in cybersecurity and his other areas of expertise.


Matt Malone, GICSP, of Yokogawa

The Oil and Gas Cybersecurity Disparity

The topic for this episode is the oil and gas industry, and specifically oil and gas cybersecurity for small to medium-sized facilities. That many such facilities have very little cybersecurity today surprises some industrial cybersecurity veterans, who may recall that API 1164 was the first industrial cybersecurity standard published, preceding even the first ISA SP99 / IEC 62443 publication. Matt explains that while most major oil and gas companies are on the bleeding edge of cybersecurity adoption and deployed technology, most small to medium-sized companies have only a fledgling program, if one exists at all. However, he adds: “I think we’ve passed the point of inflection in the industry though where it’s a realized concern now. We’re starting to put people and budgets towards this issue.”

Shake Loose Oil and Gas Cybersecurity Funding

During this podcast episode, Matt discusses the challenge small to medium-sized oil and gas companies face in finding the financial means to fund the initial deployment of, upgrades to, and maintenance of cybersecurity systems, and even the basics, such as an initial cybersecurity assessment. In the upstream and downstream oil and gas industries, budgets can be very tight, and contract delivery prices can be set one to two years in advance. Everybody wants to improve their cyber-defenses, but funding for oil and gas cybersecurity must be found first. Matt explains that one way to do so is to borrow from the strategy used by health, safety, and environment (HS&E) professionals: it costs less to implement safety programs that reduce HS&E risks and costs than to pay for HS&E incidents and penalties after they occur. Bluntly, it is in a company’s best long-term interests to find funding for cybersecurity:

It’s a moral decision to protect your folks, but also a very good financial decision. The same can be said for industrial cybersecurity… that a financial argument has been made that protecting our sites is going to be in our long-term interests.

Matt Malone


Pipeline oil batteries are representative of small to mid-size sites in need of OT cybersecurity

Impediments to Progress

Besides shaking the money loose, Matt reports that the term “air gap” is a real impediment to progress at a lot of sites. Rather than “air gap,” Matt would much prefer another term. “Let’s use a term like [network] segmentation,” Matt says. “Air-gapping provides a false sense of security and is thrown around a little too liberally without people understanding the true situation.” The reality, Matt says, is that modern control systems exist in an almost universally-connected world, especially in the era of the IIoT (Industrial Internet of Things).

During this discussion of air-gapped networks, our co-host Andrew Ginter shares a personal anecdote about presenting Waterfall’s Unidirectional Gateways as a cybersecurity solution to a large power generating company, only to be rebuffed with the argument that the solution was not needed, because the generating station’s networks were “air-gapped.” A year and a half later, Andrew was called back in to the customer by a regional partner to give the presentation again. The partner explained that a subsequent security assessment had shown the fleet of generating stations was not air-gapped after all. In the episode, Matt agrees with Andrew:

I wish that term [‘air-gapped’] would just fall out of use. It’s a warm fuzzy blanket that is a lie to hide behind, and at the end of the day you’re just putting off the inevitable.

Matt Malone

Summing Up

We can’t give it all away! In this episode, Matt lays out concrete steps for securing funding and launching a cybersecurity program when a small or medium-sized company does not know where to begin. Matt also lays out team-building strategies with an eye to effecting positive change from the inside out, and digs into new angles on qualitative vs. quantitative risk assessments, again with an eye to shaking loose funding.

So please tune in to this podcast for the conversation with Matt Malone and to learn more about starting cybersecurity programs for small and mid-sized Oil and Gas companies.

The post Shining a Light into the Dark for Oil and Gas Cybersecurity | Episode #103 appeared first on Waterfall Security Solutions.

Stakeholder-Specific Vulnerability Categorization (SSVC) | Episode #102 https://waterfall-security.com/ot-insights-center/ot-cybersecurity-insights-center/stakeholder-specific-vulnerability-categorization-ssvc-episode-102/ Tue, 28 Mar 2023 00:00:00 +0000 https://waterfall-security.com/ot-insights-center/uncategorized/stakeholder-specific-vulnerability-categorization-ssvc-episode-102/ The post Stakeholder-Specific Vulnerability Categorization (SSVC) | Episode #102 appeared first on Waterfall Security Solutions.

Stakeholder-specific vulnerability categorization (SSVC) is a new alternative to the common vulnerability scoring system (CVSS) used in the Common Vulnerabilities and Exposures (CVE) database. In this episode, Thomas Schmidt of the German BSI (Federal Office for Information Security) joins us to look at the new system, how it works, and where to use it. In the end, SSVC looks to be a new kind of tool, like CSAF and Malcolm, that the BSI and others are promoting to help critical infrastructure and other industrial organizations enhance their cybersecurity.

Listen now or Download for later


https://youtu.be/n5tVYjGxFj0


The episode starts by comparing the new SSVC with the older and more widely-used CVSS:

  • CVSS assigns a score to a vulnerability between 0 and 10, with 10 being the most severe kind of vulnerability. CVSS assigns a score based on the characteristics of the vulnerability.
  • SSVC helps individual businesses or sites decide on what action to take on a specific vulnerability, based both on characteristics of the vulnerability and information as to how the vulnerability might affect the business or site.
Thomas Schmidt of BSI

SSVC Actions and Criteria

Thomas explains that with SSVC, actions that a site might take if a vulnerability applies to software deployed at the site can include:

  • Track – take no action right now, but keep watching the vulnerability.
  • Attend – alert decision-makers in the organization to the vulnerability, and look hard at doing something about it (patching or compensating measures) sooner than “normal”.
  • Act – the vulnerability requires immediate attention, analysis and probably remediation.

Unlike CVSS, which looks primarily at the vulnerability itself (the kind of software it is found in, how exposed to attack that software tends to be, and so on), SSVC prioritizes a vulnerability within the context of a specific organization. This means that the SSVC rating for a vulnerability is generally not re-usable across organizations or sites; it tends to be unique to an organization. SSVC looks at criteria including:

  • Exploitation – is there a public PoC, or evidence that the vulnerability is being exploited in the wild? (This criterion can change value over the lifetime of the vulnerability.)
  • Technical Impact – does the vulnerability give the attacker total control over the target software, or something less than that?
  • Automatable – are steps 1-4 of the kill chain (reconnaissance, weaponization, delivery and exploitation) automatable?
  • Mission Impact – if the vulnerable software at your site is compromised, what is the impact on your mission? E.g., in a rail system, is the entire system shut down, or just one section of track? In a refinery, is the entire site shut down, or only one production unit?
  • Public Impact – if you are serving the public, how serious is the impact on the public (e.g., minimal / material / irreversible)?

SSVC also tracks mitigation status – how difficult it is to do anything about the vulnerability – but this is for information purposes only. The difficulty of mitigation does not determine how serious the vulnerability is for a site, or how urgently the site needs to take action, if at all.
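The shape of the decision process described above can be sketched in a few lines of code. To be clear, this is a simplified illustration, not the official SSVC decision tree published by CERT/CC and CISA: the function name, the value sets, and the thresholds below are assumptions chosen only to show how per-site inputs combine into a track / attend / act recommendation.

```python
# Illustrative, simplified SSVC-style decision function.
# The real SSVC decision trees have more decision points and carefully
# defined values; the rules below are assumptions for demonstration only.

def ssvc_decision(exploitation: str, automatable: bool,
                  technical_impact: str, mission_impact: str) -> str:
    """Return a recommended action: 'track', 'attend', or 'act'.

    exploitation:     'none' | 'poc' | 'active'
    technical_impact: 'partial' | 'total'
    mission_impact:   'low' | 'medium' | 'high'
    """
    # Actively exploited vulnerabilities with serious consequences
    # need immediate attention.
    if exploitation == "active" and (technical_impact == "total"
                                     or mission_impact == "high"):
        return "act"
    # A public PoC, an automatable kill chain, or meaningful mission
    # impact warrants attention sooner than the normal patch cycle.
    if exploitation == "poc" or automatable or mission_impact in ("medium", "high"):
        return "attend"
    # Otherwise keep watching: exploitation status can change over time.
    return "track"

# Example: a vulnerability with a public PoC but limited impact at this site.
print(ssvc_decision("poc", automatable=False,
                    technical_impact="partial", mission_impact="low"))  # attend
```

Note that the same vulnerability yields different recommendations for different sites: change only `mission_impact` from "low" to "high" and the recommendation shifts, which is exactly the stakeholder-specific behavior SSVC is designed to capture.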

Stakeholder-Specific Vulnerability Categorization (SSVC) Calculator

Using SSVC

All of these factors are connected into a decision tree. An example calculator that can help you walk through the decision tree is available at:

https://democert.org/ssvc

The calculator asks questions about the vulnerability’s exploitation status, technical impact, public impact and the other criteria above, and results in a recommendation of what to do about the vulnerability: track, attend or act.

On the surface, this looks like a lot of work. Thousands of vulnerabilities are announced every year. CVSS priorities for these vulnerabilities are published in the CVE and other vulnerability databases, but SSVC action recommendations cannot be published, because every recommendation is unique to a business or site. Each business or site has to go through the SSVC decision process itself, for every vulnerability, to decide what to do about it.

In a sense, however, this is no surprise. Most sites have to do this anyway, to one degree or another – they cannot use the CVSS ranking exclusively to decide what to patch, because the circumstances of each site and each vulnerability differ. This is what SSVC recognizes. SSVC standardizes the decision process, in part hoping that vendors will produce asset inventory and other tools that take in information about a specific business or site, and can then use that information, along with CVE and CVSS information, to produce an automatic recommendation for each site about what to do for each of the tens of thousands of vulnerabilities the site has to look at every year.

So tune in for the conversation with Thomas Schmidt and learn more about the SSVC system.


The post Stakeholder-Specific Vulnerability Categorization (SSVC) | Episode #102 appeared first on Waterfall Security Solutions.
