Risk in Context: When to Patch, When to Let It Ride | Episode 109

Waterfall team

In this episode, Rick Kaun, the VP of Solutions at Verve Industrial Protection, takes us deep into the world of patching OT systems, how it drastically differs from IT patching, and what the process looks like in action.

We also have a look at how to quantify risk in OT, as that’s part of the process of deciding what to patch first, and how to prioritize a large workload of patching that is heavily backlogged.


About Rick Kaun

Rick Kaun is the VP of Solutions at Verve Industrial Protection, which is an ICS security provider for both software and services. For over 20 years, Rick has travelled the globe, working with clients of all sizes across many industries in his roles at Matrikon, Honeywell and now Verve. 

Risk in Context – When to Patch, When to Let It Ride

Transcript of this podcast episode:

Please note: This transcript was auto-generated and then edited by a person. In the case of any inconsistencies, please refer to the recording as the source.

Nate Nelson:
Welcome everyone to the Industrial Security Podcast.
My name is Nate Nelson. I’m here as usual with Andrew Ginter, the Vice President of Industrial Security at Waterfall Security Solutions.
He’s gonna introduce the subject and guest of our show today.
Andrew, how are you?

Andrew Ginter:
I’m very well, thank you, Nate. Our guest today is Rick Kaun.
He is the Vice President of Solutions at Verve Industrial Protection, and we’re going to be talking about context for risk. We’re going to be talking about automation to help us figure out when a vulnerability is worth patching and, you know, when we should probably let it ride.

Nate Nelson:
Alright, then, without further ado, here is the interview:

Andrew Ginter:
Hello Rick, and thank you for joining us. Before we get started, could you say a few words about yourself and your background, and about the good work that you’re doing at Verve Industrial?

Rick Kaun:
Yes, thank you Andrew, I appreciate the opportunity to share with you today. It’s always a great time chatting with you, as plugged in as you are; it’s always a great discussion. My name is Rick Kaun, I’m the VP of Solutions for Verve Industrial Protection. We’re a vendor-neutral, multi-function shop that provides both software and services exclusively to the OT space. We’ve been around for 30 years, actually, privately funded and organically grown. I personally have been in the OT business for 22 years. I started at a company called Matrikon, building the ICS or industrial control security consulting space. We then got picked up by a company called Honeywell, where I did global business development for a few years before coming back to the vendor-neutral perspective where I am now. And again, I appreciate the opportunity today.

Andrew Ginter:
Thanks for that. Our topic today is risk in context. Can you tell us, you know, what is that, and maybe give us an example or two?

Rick Kaun:
So yeah, this is the gist of what I want to talk about today, and there’s lots to unpack here. The problem that more and more organizations are seeing, not just me, but we as a company, our industry clients and their peers, is that the traditional IT approach, where you have very cool individual specialty tools and specialty people, is interesting, but it’s a singular point of perspective. I can give you a list of vulnerabilities and you’d think, great, I now know what my risk is, but the reality in OT is that it’s not always directly applicable. That’s a single external indicator. It doesn’t include anything about whether that’s a big deal for you, the owner-operator, on this particular asset or in this particular facility. And so what we’re seeing is that people need more and more context, and this is especially true on a couple of fronts. Number one, we don’t have enough staff to address all the vulnerabilities. I mean, we turned on our vulnerability mapping at a midstream gas company the other day, and literally at their flagship site there were 23,000 vulnerabilities. You’re never going to get that to zero, and you’re not even sure where you should start unless you have some sort of context. So a singular view isn’t good enough, and it’s actually something we’ve been struggling with for many years. I remember, about ten years ago, going into an air separation unit, and the plant manager said, you’re going to do an assessment and you’re going to tell me that we don’t change passwords and we don’t patch; how the heck does that help me?

Rick Kaun:
And so, on the one hand, we need to be able to take those scarce resources and focus them on the things we need to, i.e. the lack of staff and the lack of time. We need the ability to look at what truly is effective for OT, which isn’t always everything, because everything can’t be Windows 11 and patched every Tuesday. But there’s also the nature of OT, in that we can’t always do plan A, i.e. the patching; we need to do plan B and C and D, or compensating controls, or, as some interestingly creative marketing people have started to call it, virtual patching. If we can’t apply the BlueKeep patch, can we at least disable remote desktop or the guest account? Well, I don’t know the answer to that unless I know what that system does and what function it provides to operations. So the gist of this is that any singular indicator of risk, whether that’s an external penetration or denial-of-service attempt, or a list of vulnerabilities, or a patching tool, in and of itself is not enough to provide our audience the detail they need to be able to act appropriately, effectively and efficiently.

Nate Nelson:
Andrew, the way that he’s talking there, it reminds me vaguely of an episode that we did some time ago, with a guest who was talking about site-specific vulnerability, the idea that you can’t, or rather shouldn’t, categorize vulnerabilities only in a broad sense. He was looking into specific sites in specific industries and evaluating how vulnerabilities affected those specific plants.
It was sort of an alternative to the usual CVE way of doing things.
Can you remind me what the name of that guest was?

Andrew Ginter:
Yeah, that was Thomas Schmidt from the BSI, which is the German Federal Office for Information Security, and he was talking about SSVC,
Stakeholder-Specific Vulnerability Categorization. This was, six months ago, a new standard for making decisions about whether to apply patches for vulnerabilities, and you know, Verve Industrial Protection is one of the vendors in this space.
They have automation for SSVC, and Rick is going to say much more about that in just a moment.

Andrew Ginter:
So, I agree with all that in principle. You know, these are great principles. But if I have, say, 30 sites, and each of them has 600 PLCs and who knows what other kinds of industrial internet devices, everything has a CPU nowadays. It’s one thing to say I can make risk-based decisions like that; it’s another thing to say I actually understand all of my tens of thousands of devices and how each of them should be managed. Is there a way to get insight, is there a way to get automation, instead of saying, you know, evaluate every one of my 50,000 assets myself?

Rick Kaun:
Yeah, great question, and you know, you’re right. When I talk about a risk index or an individual measure not being enough, I mean, you look at the NVD score and it’s got a ranking, and it’s got WHY it got to that ranking, there are components in there. But you’ve raised the other point, which is, how do I take that and scale it? Both in volume, but also, as you know, there are nuances within those 30 or 60 sites and the PLCs in them: some of them do critical safety things, some of them do other things. And so, as soon as you start to unpack this a bit, you realize that you need a great level of detail from multiple perspectives. So let me just walk through how we help clients build that perspective, what we call a three-dimensional view of the asset. First and foremost, going to the source absolutely has to start with an inventory. Remember, on our last podcast we talked about building a proper inventory, and it matters now more than ever; it’s the most important thing. What we’re seeing is that people are starting to realize you need to go directly to the asset. The first source of data in building this bigger-picture view, to be able to provide context beyond a single measure, is to pull in multiple measures of the asset. And so the first source is the asset itself: can we figure out what it is? Windows? Linux? Unix? Networking gear? PLCs, relays, controllers?

Rick Kaun:
The manufacturer, whatever we can glean from that endpoint. The more data we can get from it, the better equipped we are to pivot to different scenarios and situations in ascertaining and actually measuring risk for our organization. That’s the first bit of data. The second bit of data comes once you’ve built that information, and by the way, we advocate for a direct approach, going right onto the OS and going directly to the PLC, not going to intermediary databases or relying on what I’m hearing through traffic. I want to go to the source; I want to ask, and get answers to, the questions I need to know, rather than hope that I hear them or hope that I’m listening to all of the devices. So we get all that data and we automate it into a central database, and that central database is where we gather, collect and aggregate all the other data sources that are available to these operators. The very first one is really quite simple: take that database, once we’ve built it, and work with operations to put operational impact, or user-defined tags, on these assets. So a Triconex at a refinery is clearly a safety system. That’s a high-impact asset, and it gets a high-impact label, because when we go to look at risk, whether that’s assessing whether a risk exists and is important to us, or whether we want to turn to remediation tasks like patching or compensating controls, we can then decide which ones we test on (the redundant, not-so-impactful ones rather than the safety ones) and which ones we need to prioritize in terms of budgets and resources, and so on.

Rick Kaun:
So we start with the inventory, automatically, from the endpoint. We then add user-defined tags and information about the asset. We can calculate or infer these from the vendor name, or from the subnet they’re in, or whatever, but we usually like to make sure we loop the plant people in; that tribal knowledge the veterans have on the plant floor is invaluable, very powerful. So that’s two different data sources, starting to build multiple dimensions on the asset.

The third thing we’ll do is go and get third-party indicators, and this is where we bring in the National Vulnerability Database. We’ll bring the list of vulnerabilities in and we can map them one-to-one at the database, offline from the actual operational environment. So we’re very frequently and very thoroughly mapping vulnerabilities against that very detailed inventory. We can also bring in threat feeds and known exploits, so we’re going beyond “by the way, you’ve got a vulnerability” to “and there are exploits out there for it”, right? That may increase or decrease your urgency. And then of course we bring in the first line of defense, the patches that may be available for those devices. So we start with the endpoint data first; the tribal knowledge around the importance or impact or role of those devices second; and third-party indicators of risk, patches, threat feeds and vulnerability data third.

Then the fourth thing we’ll do, in that middle-layer database where we’re compiling all these different dimensions of the asset, is go east and west and connect to the basic building blocks of your security tools, like the status of your backup or your antivirus. In multi-vendor environments, the ability to abstract away whether you’re connecting to Symantec versus McAfee versus, you know, Defender, to see if your antivirus is up to date or your whitelisting is in lockdown, and to aggregate that into a central view, is where the real magic comes in.

Now, how we do this in your scenario of 30 sites is that we then cascade every single site or database up to a single reporting dashboard. The reporting dashboard is a singular view into the entire fleet, and it’s a read-only view. So what we have then is all these different data sources, consistently, from all of our facilities and across all of our assets. We’ve got multiple dimensions of potential risk, and not just the risk, but the asset’s impact and the compensating controls that may mitigate risk or narrow down the concern, and a small, specialized team can see all of this fleet-wide. This allows an organization to build a standard response, a standard threshold if you will, for the various risks and activities they need to act on, and they can then start to remediate with this same technology in a way that’s OT-safe. And we can do this in stages, but let’s circle back to that one. This is how we want to build the context in the first place, aggregate it across multiple sites, and then give all that power to a very specialized team and let them start to see what they’re looking at.
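As a rough illustration of the three-dimensional view Rick describes, endpoint data, operator-assigned impact tags, third-party vulnerability indicators and the status of neighbouring security tools, a per-asset record might look something like the following minimal sketch. The field names and structure here are illustrative assumptions, not Verve’s actual data model.

```python
# Illustrative sketch only; field names are assumptions, not Verve's schema.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Vulnerability:
    cve_id: str            # e.g. "CVE-2019-0708" (BlueKeep)
    severity: str          # "critical", "high", "medium", "low"
    patch_available: bool
    exploit_known: bool

@dataclass
class AssetRecord:
    # 1. Pulled directly from the endpoint (OS, firmware, installed software)
    hostname: str
    asset_type: str        # "windows", "linux", "plc", "relay", ...
    vendor: str
    # 2. User-defined tags / tribal knowledge, set once with plant staff
    operational_impact: str                 # "high" (e.g. safety system) or "low"
    # 3. Third-party indicators, mapped offline against the inventory
    vulnerabilities: List[Vulnerability] = field(default_factory=list)
    # 4. Status pulled from neighbouring security tools
    backup_ok: bool = True
    antivirus_current: bool = True
    whitelisting_locked_down: bool = False
```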

Andrew Ginter:
All right. So that’s, I mean, that’s a lot of data. You know, it’s useful to have that database, it’s useful to capture tribal knowledge, but still, if we have thousands of assets, and even more data about each of the assets in a database, that’s just more information for my poor brain to try and take in and make informed risk decisions with. Is there a way to work with the data once you have it?

Rick Kaun:
Yeah, yeah, yeah.
Yeah, that’s an absolutely spot-on observation. I mean, we talked about 23,000 vulnerabilities, and now you’ve added three other dimensions to those 23,000, so what do we do with it? Great question. You can actually do multiple things with it, but the most powerful thing you can do is calculate what we call a residual risk score. You take those different indicators, and we do this with the clients, and it’s automatically done in the software and reported on the dashboard, but the premise behind it is that we build indicators of risk, and we build scores for things we don’t like. And it’s just like my golf game: it gets really big and huge and ugly real quick. So, for example, if a device is considered high impact, it gets a score of 10; if it’s considered low impact, it gets a 5. Tribal knowledge now has a score. If it has a critical vulnerability, it gets 7 points for every critical vulnerability, and it may have multiples. If the backup has failed, that scores too, right? So all these different data sources can be pulled into a calculation and a score appended to them, and then the true, what we call residual, risk, after we’ve accounted for compensating controls (the backup is good, whitelisting is in lockdown, so that score comes off), gives us an actual risk score that is OT- or end-user-specific. And then those scores can be put into thresholds, which then drive governance around behavior. So something that’s considered a critical risk, maybe as an organization we decide it has to be dealt with within 24 hours, whereas something with a low risk can be dealt with within a week or a month.

And because it’s automatically updating the data and recalculating, you are now always looking at a live score, so you extract from the noise the stuff that is a five-alarm fire that you have to deal with.
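A minimal sketch of the kind of residual-risk arithmetic Rick describes follows. The impact and vulnerability weights echo the examples he gives (10 or 5 for impact, 7 per critical vulnerability); the backup and whitelisting weights and the thresholds are illustrative assumptions that a real deployment would tune to its own governance.

```python
def residual_risk(high_impact: bool,
                  critical_vulns: int,
                  backup_failed: bool,
                  whitelisting_locked_down: bool) -> int:
    """Toy residual-risk score built from the indicators mentioned in the episode."""
    score = 10 if high_impact else 5    # operator-assigned impact tag
    score += 7 * critical_vulns         # 7 points per critical vulnerability
    if backup_failed:
        score += 5                      # assumed weight for a failed backup
    if whitelisting_locked_down:
        score -= 5                      # compensating control takes score back off
    return max(score, 0)

def response_window(score: int) -> str:
    """Thresholds drive governance, e.g. critical within 24 hours, low risk within a month."""
    if score >= 30:
        return "deal with within 24 hours"
    if score >= 15:
        return "deal with within a week"
    return "review within a month"
```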

Nate Nelson:
Andrew, does everybody have their own scoring system?
I mean, we’ve been talking about SSVC, there’s CVE, and now there’s this.
I wonder whether all these different systems get confusing after a certain point.

Andrew Ginter:
I think the short answer is yes. The longer answer is that, in a sense, this is why, in my understanding, SSVC was invented.
The vendors in the space did all have their own systems, and the users, the government especially, said: here are some minimum criteria that you should be using for these risk decisions.
You may be able to get finer-grained decisions by doing more analysis of the risk in its context, but here’s a minimum. And so they put SSVC together.
And why don’t the vendors just do SSVC and be done with it? I mean, I’ve worked for vendors most of my career.
One of the ways vendors distinguish themselves from each other is the feature set in their product.
So every vendor out there wants to say: yes, of course we do SSVC, and we give you even more power, because we do X, Y and Z in addition.
So it’s not surprising that everyone does it differently. This is part of why SSVC exists.

Andrew Ginter:
So, just to clarify: these sorts of characteristics, you know, did the backup work, what’s the impact, are these in a sense preprogrammed, which users then populate, or are they user-defined? Can the users define the characteristics? And if the characteristics are user-defined, I would assume the calculation has to be user-defined too. How, what’s the right word, customized is this to every site?

Rick Kaun:
Yeah, okay, fair. The calculation itself is something that we ship with a basic sort of structure; if an organization wants to change how it’s calculated, or emphasize or de-emphasize certain indicators one way or the other, they can. The precursor to that question, though, was the user-defined properties. User-defined properties are usually set once, i.e. it’s a high-impact asset to operations, or it’s a not-inconsequential asset to operations, and those don’t typically change. What does change, and therefore what we automate, is the presence of vulnerabilities, or the version of the software, or the level of the patching. So where we can automate the gathering of technical data, we do; where we need user-defined data, we usually establish that up front, and it doesn’t typically change. I mean, a Triconex is a Triconex, it’s a safety system, it doesn’t suddenly stop being one. And so we have a combination: set up the automation, set up the tags, and then how you actually want to calculate is absolutely customizable, as are the labels up front. I mean, if we think it’s a high-impact asset and the client disagrees, it’s their database, it’s their threshold, they can customize it.

Andrew Ginter:
So that makes sense. I mean, you’ve got automation, you’ve got a calculation, you can see, in your words, where the five-alarm fires are. What do you do about those fires? Are we talking, you know, patch everything? Are we talking throw some more firewall rules in there? And if you make those changes, what change do you see in your system? Can you give us an example of how this works?

Rick Kaun:
Yeah, absolutely, and that’s the natural follow-on. As you said in the intro, you can’t always patch; you need to be able to be creative. So this is where we sort of double down on the value of that multi-dimensional view. So I’ve got a risk, my risk score is automatically calculated, I’ve got a bunch of really heavy hitters that look like they’re high risk and I need to act upon them. And so now what I do is go into those individual assets and look at the components that have added to the problem. It’s a vulnerability, there’s a patch available for it, it’s not been applied. Well, that’s your first path: we want to look at that patch. Can we deploy that patch? Again, with our architecture, we can do a very safe, staged, OT sort of approach. Through the first pass of potentially patching, we will patch, you know, redundant systems, domain controllers, file servers, and we’ll bake in a certain version, like 2012, make sure it works, and then potentially pass that on to the HMIs and engineering stations. So we can do a very staged, methodical OT approach. If we can’t do the patch, though, we can then look at our systems and say, well, what is the risk of this? One example we had with a client was when BlueKeep came out and they couldn’t patch for BlueKeep on some of their systems, but they didn’t want to just live with the risk. So the next thing they did was ask, well, what’s wrong with BlueKeep? It mostly attacks, you know, remote desktop and the guest account. Well, their 24/7-staffed HMIs probably didn’t need remote desktop enabled, and they certainly didn’t need the guest account. So then we used the technology to start to disable remote desktop and the guest account, again in a very staged approach.

Now, if you build it properly, in the 30-site scenario that you shared with us, you can do this again from a central team that queues this up, tees it up for the endpoint, and can involve the people at the plants so they can go ahead and be involved. So it’s not a big-brother push, but it creates very, very precise decisions, precision in deciding what to do and where to do it, and going beyond patching to compensating controls.

And there are two things in there that we may want to dig deeper into, Andrew, let me know which way you want to go. There are, you know, more examples of things you can do beyond patching, but there are also the efficiencies gained by the way that this is architected.

Nate Nelson:
Andrew, perhaps this is my naivete, but coming from more of an IT background, I’m not quite sure why a lot of the nuance in this discussion ends up factoring into patching generally.
I’ve always thought about patching as something that you should do as soon as possible whenever the option is available, but here we’re talking about criticality and severity, and all of these factors that seem to suggest that you wouldn’t otherwise patch in every given scenario.
So why is that?

Andrew Ginter:
I think it’s easiest to answer with some examples. On your average IT network (let’s leave the specialized IT networks out of it, the banking systems, even the SAP servers that are handling payroll, let’s leave those critical systems out), you’re just looking at your desktop network. You’ve got a bunch of desktop machines. They are reaching out to the Internet every five minutes to fetch email, to send email, to browse the Internet, to download stuff. They’re really very exposed, and in a sense they’re all the same: they all have the same level of exposure. So you use the severity, and if it’s a severity 9 you patch it automatically, and there’s not that much decision-making going on, on the kind of IT network everyone’s most familiar with.

In OT, on the other hand, safety systems are the ones that prevent unsafe conditions from killing people: over-pressure conditions on boilers from blowing up and killing people, over-temperature conditions from causing explosions or fires. So let’s say there’s a vulnerability that lets somebody crash a machine. You send the “magic” message to the machine, and the machine crashes and reboots and is down for 30 seconds or five minutes, depending on how long it takes to reboot. How important is that on the main part of the control system? It’s fairly important; you’re probably going to take the plant down. It’s a reliability threat. How important is that on your safety system? Well, if your safety system is down for five minutes, human life is at risk for those five minutes, and so the safety system is more critical than the rest of the plant systems. Arguably, even though a crash-the-machine vulnerability is not as bad as an execute-arbitrary-code-and-do-nasty-stuff vulnerability, the safety systems are that critical that you should be patching these lower-priority vulnerabilities on them.

On the other hand, if your safety systems are air-gapped, and many of them are, you cannot send them a message from any other machine, and then that vulnerability really isn’t exposed on them, is it? And so you might say: yes, normally I would patch it, but these safety systems, unlike those over there that are exposed, are air-gapped, so I don’t need to patch these ones as urgently as I do those ones over there that are on the network…

Andrew Ginter:
…So this is an example of where context drives the decision: a given vulnerability that you might patch in some circumstances and not in others.
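A toy decision rule in the spirit of Andrew’s example follows; it is not the SSVC standard itself, just an illustration of how exposure and criticality can change the patch decision for the same vulnerability. The severity thresholds are assumptions.

```python
def patch_urgency(severity: int, is_safety_system: bool, network_exposed: bool) -> str:
    """Illustrative only: the same vulnerability gets different decisions depending on context."""
    if not network_exposed:
        # e.g. an air-gapped safety system: the vulnerability is not reachable
        return "defer: not exposed on this system"
    if is_safety_system:
        # safety systems warrant patching even lower-severity issues
        return "patch urgently" if severity >= 7 else "patch at next opportunity"
    return "patch urgently" if severity >= 9 else "schedule with normal maintenance"
```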

Andrew Ginter:
Well, let’s touch on both of them.

But I guess my question is, let’s say, I don’t know, I put a firewall rule in, or I install whitelisting or something. I take some other measure.

What do I see in the system? Does it notice that I’ve done this and recalculate? It’s great that the system tells me, hey look, I’m in trouble, but does it come back and say, you’re not in trouble anymore, good job? How does that work?

Rick Kaun:
Sure, it’s a great question, and you’re right, the basic premise is that in that scenario I’m going to look at something and my score. And you’re right, I overlooked some of the simpler approaches. Potentially the risk score says that I have a high risk, and when I dig in, there’s a patch needed. Maybe I can do a compensating control. Or maybe the risk, or part of the risk score, is because whitelisting isn’t currently in lockdown or isn’t installed on that device, or the backup failed last night, or, you know, there’s this port or service that we really just need to disable because I can’t do anything else with it. So, to your point, let’s install a firewall or an inline network access control type of thing, or we can go rerun the backup, or we can enable and install whitelisting. And to your question: because of the nature of the way we collect the data, we’re continually connecting to the endpoints, we’re continually remapping vulnerabilities, and we’re continually connecting to antivirus and backup databases. You are correct: once I make those changes, rerun the backup, install the whitelisting, configure the registry, that will then show up in my dashboard and I will see a reduction in risk. I will see fewer devices in that higher or critical category. I will see a trend over time that I had a whole bunch of five-alarm fires and then I’m down to a couple, or none, and you can actually see the improvement. And that’s exactly what the dashboard is for: to give you that near real-time view into the current status, which includes improvements as you do these things and the recalculations, but which also includes obstacles or challenges as new vulnerabilities or threats are injected. So it’s a continually moving and evolving line. It’s just like looking at a trend line in an operational facility when you’re making power or oil: you’re doing better and you see the productivity go up and the tank fill up, and then something goes wrong and you see it drop down. The exact same sort of scenario.

Andrew Ginter:
So that makes sense. You know, I was thinking about an example you gave a little while ago, the remote desktop example, and the vulnerability, let’s say, in remote desktop. I’m wondering, is your system clever enough to say: hey, the vulnerability is in remote desktop, and look, they disabled remote desktop on these devices, therefore it automatically reduces the risk? Is that how it works?

Rick Kaun:
Yeah, to an extent. In the scenario you just gave, there are a couple of different indicators that would reduce the risk score. Those which are automatically gathered, like the whitelisting status is now in lockdown or the backup is now a success as opposed to a fail, are automatically gathered and automatically updated, no problem. When you get to the more creative OT needs around compensating controls, like disabling remote desktop, once we had done that we would confirm it was complete on however many assets we did it for, and then we would set a flag in the software to not show those systems as still being vulnerable to that particular vulnerability, because the first pass of mapping to vulnerabilities and patches would say they are, but we know we’ve applied compensating controls. So when I show the resulting dashboard to an OT practitioner, they see their true risk, not, again, a false positive of here are all these vulnerabilities, because we know we’ve got those extra controls. And again, that’s just another reinforcement for why you’d want to have multiple indicators: not just the vulnerability, but the vulnerability plus how the system is configured. Well, now we’ve got a different answer. We’ve got the true answer.
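A small sketch of the compensating-control flag Rick mentions: once remote desktop is verified as disabled, an RDP vulnerability such as BlueKeep stops contributing to the residual view. The mapping and control names are illustrative assumptions, not how Verve’s software actually represents this.

```python
# Maps a CVE to the compensating control that neutralizes it (illustrative).
MITIGATED_BY = {
    "CVE-2019-0708": "rdp_disabled",   # BlueKeep is moot if remote desktop is off
}

def effective_vulnerabilities(cve_ids: list[str], controls_applied: set[str]) -> list[str]:
    """Return only the vulnerabilities not covered by a verified compensating control."""
    return [cve for cve in cve_ids
            if MITIGATED_BY.get(cve) not in controls_applied]

# Example: BlueKeep drops out once "rdp_disabled" has been confirmed on the asset.
print(effective_vulnerabilities(["CVE-2019-0708"], {"rdp_disabled"}))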

Andrew Ginter:
And, you know, I wanted to come back to something: you said there was another topic, I think it was scaling, you know, how you use these numbers?

Rick Kaun:
Yeah, this is the really powerful part, right? So the architecture that we put in, and we touched on this a little earlier, means that you can bring multiple facilities and all of your assets into a single view, which means you can have one small, specialized team look at the fleet on behalf of the whole organization. You don’t have individual sites having to dig in and be security experts and figure out what to do about this patch or that risk. And then, with that technology, you can start to queue up activities. And I wanted to circle back to this especially, because OT always gets really scared when it thinks some central IT guy is going to do something, and I get it. Like I said, I’ve been doing this for 20 years; I’ve seen a wrench put through a firewall in the field, through the physical hardware, through the chassis. So let me just walk you through a quick case study. We have a generation client that wanted to update a particular piece of software. In fact, they wanted to uninstall the software; they were afraid of its origins and the risks associated with it, and I’m not going to name anybody, but we all know there have been a number of companies that have fallen out of many people’s favor. And so there they are on a Saturday afternoon. They’ve got six coal-fired generation sites, very complex environments, and they need to remove this piece of software, and they’re not sure how they’re going to do it. There’s a deadline! So what they do is go to the dashboard in the software, and they can see there are 146 copies of the software spread across these six sites. The central team is able to then initiate a request to uninstall the software, and I say request because…

Rick Kaun:
While, as I said earlier, we can make things like registry changes automatically, we don’t always want to be messing in the OT space; we want to be very respectful, and usually uninstalling software requires a reboot. So we send a request to 146 systems to uninstall the software. We send a detailed list to each site of exactly which physical room and rack the devices are in, and when they get to the console they will see a flashing light saying, would you like to accept this action? They are then able to phone the operator in the control room, they are able to follow their own local change process, they’re able to uninstall the software, bring it back online, and move to the next one. And for this particular client, anyone who’s listening to this knows that this sort of activity, removing roughly 150 copies of software from six locations, would probably take weeks or even a couple of months of elapsed time; not expended time, but just hunting and searching and finding the time. This activity was completed with a support staff of one person at the central site and one individual at each of the six sites, working in and around their day jobs. It was completed in 90 minutes, and because of the update mechanism we could see the software dashboard updating as they went around the fleet. We have a client with 700 facilities and a team of 8 managing it worldwide, around the clock. So this, to me, is the future of where OT security is going, and how Verve is helping its clients to get there. There aren’t enough people, and we need to start doing things, and so if we can combine context, the ability to act, and OT-safe methods…

We start to really move the needle.

Andrew Ginter:
So that’s powerful automation, and automation on the security side is good. You know, I’ve been thinking about this interview, though, and I know there are standards bodies and governments out there, and there’s a lot of debate going on about what the right metrics are, how you measure risk in the OT world. I see people out there saying, well, be careful how you measure risk: if your risk just keeps going up, your board is going to be asking why they’re spending money on security if risk keeps going up. So some people are saying, use measures like how many of the top 20 controls have I implemented, but when you get to 100%, you’re done, and again the question becomes, why am I spending money on this? The whole question of how you measure risk seems hard, yet you have a dashboard; it seems to me you are measuring risk. Can you talk about metrics and the value of metrics, and how your stakeholders, your customers, respond to being able to see a number that moves around in a predictable way reflecting risk?

Rick Kaun:
Yeah, and that’s one of the things we haven’t talked about yet: the value of this insight. It goes back to, if you remember, the example of the plant manager who told me, you’re going to do an assessment, right? I already know we don’t patch and we don’t change passwords; how does that help me? That’s the answer to this: once you have this data and you have the context, you can have empirical discussions about trends. And you’re right, the risk doesn’t always go up; the risk does go down. And so you can measure risk with what we primarily focus on today, which is pure risk in terms of vulnerabilities on assets and the impact of those assets. But once you go up a level, you can start to measure: well, I need to act in this way at this facility, but guess what, this flagship facility over here, that either makes us more money or is more potentially catastrophic if it goes wrong, gets a different level of scrutiny and of support or funding. And it’s all empirically driven now. And you mentioned some of the regulatory standards. Outside of the pure risk score, we are getting more and more calls and more and more engagements to help clients track and monitor compliance: are we doing enough, are we showing the auditors, are we showing the board, are we showing senior management, are we using the money wisely, and are we making trends go in the right direction? Things like the CIS controls: we have a dashboard that shows here’s your hardware inventory, here’s your software, here’s your configuration, here’s your users, and you can look at it at a glance and show the CIO “look, we’re doing the right things,” or “look, we’re going the wrong way, we need more help.”

Rick Kaun:
And the most recent one, one we’re quite proud of, is that the federal government announced CDM, Continuous Diagnostics and Mitigation, which all federal entities shall be reporting under, and we’ve actually been approved as the OT response for that. So we’re not only doing this internally within an organization, or with an organization to help them comply; now we’re also starting to share with external, federal entities that really consume this data. It’s really starting to catch on.

Andrew Ginter:
So Nate, let me dive into this just a little bit. Measuring and visualizing risk is a big debate in the industry. I gave some examples in my question, but I don’t know if my question made sense. Now that I listen to it again, Rick obviously got it, but let me go through it once more.

There are people out there saying risk should reflect what’s happening in the world. Well, there are always new vulnerabilities, thousands and thousands of vulnerabilities being reported every year. If you count them, your risk indicator goes up; your chart keeps going up. Qualitatively, we get word that the sophistication of the adversary is going up: they’re using more and more powerful tools, more and more powerful techniques. Again, the risk line keeps going up. If you show the board a risk line that goes up steadily, they ask the question: why are we spending money on you? You don’t seem to be having any effect. So you can’t show them that risk line.

I’ve had people tell me what you should do is use something like the top 20 controls and say, my goal is to get the top 20 controls implemented on every machine in my network. So I see a mitigation line going up, saying I’m getting better and better. Eventually I hit my 20 and I flatline, and again the board goes: the strength of your security posture seems to have flatlined. Why are we spending money on this? Have you not solved this problem? Can we start spending less money on it? And of course you can’t; that’s the wrong measure as well.

But what Rick has described here is their dashboard. Imagine what the dashboard looks like. New vulnerabilities are discovered in the world, and we do the risk calculation. Some of them are relevant to our most critical systems, and our calculation of risk goes up; you see the trend line going up. And the OT security team springs into action: presses the button, figures out which machines most urgently need to be patched, patches those machines or applies compensating measures to protect them, and the risk line goes down, saying, good job, you’ve reduced your risk. And then of course there are more vulnerabilities over time, and the risk goes up and down. You can see progress. You can see that the money you’re spending is having an effect: in the absence of spending the money, the risk would go up without bound, and because you’re spending it, it keeps coming down. You can see that something good is happening.

This is a step up from waving your hands at the board. It’s still a qualitative metric. In the engineering space you might be used to calculating something like safety risk mathematically: you’ve got a one-in-a-million chance of a death at the facility in the next year. You don’t have that kind of mathematical precision here, but at least you’ve got something, it’s visual, and in a real sense it makes sense. So I think this is a step in the right direction.

Andrew Ginter:
Cool. I mean, that is cool. You know, risk is a dry topic. A lot of people, well, I speak at conferences, and I’ve proposed many presentations with “risk” in the title; most of them don’t get accepted, and if one does get accepted, nobody shows up.

Rick Kaun:
Yep.

Andrew Ginter:
Because people see it as academic, as not actionable. But metrics, people are interested in, and concrete advice as to where the next highest-value investment is, that’s huge. So, you know, thank you for joining us. To me, these are very positive developments in the field and in the technology. Before we let you go, did you want to sum up for us? Are there words of wisdom you have for us?

Rick Kaun:
Ah, things that I think are important, whether they’re wise or not, let the audience judge. But, you know, I’ve been in this business, as I mentioned at the top of the call, for 22 years, and I’ve seen a lot of attempts at, you know, the silver bullet, whether it was whitelisting back in ’09 and its other promises since then, or even some notable public speakers saying, well, we can’t patch, why bother, let’s just give up and hunker down. But the reality is that this can happen; we just need to be prepared to roll our sleeves up and get at it. Why we’ve been avoiding addressing the technical and security debt we continue to amass, while adding more technologies and IoT, and then being surprised at how much risk there is, is a bit baffling to me. You’ve been on the circuit as long as or longer than I have, Andrew; you know that some of the things we say we’ve been saying for 20 years, and people nod their heads and write it down as if it’s sage advice, right? So I think we’re seeing a turn: people are embracing the fact that, yeah, I really do need to get into it, I really do need the data, because I need to make informed decisions. There are only so many people, so many dollars, and there’s so much risk. I need to be creative, I need to be precise, and I need to be effective. So I’m quite excited at what we’re seeing, and I would heartily recommend that you dig into how you actually get to those endpoints, because that’s where the data resides, it’s where the risk resides, and it’s where the solution ultimately resides as well.

Rick Kaun:
And if you do want to dig deeper, you know, we have always collected these case studies, user testimonials and public presentations of how some of these things work, from the end users, not from us, on our resource page. We just recently published a couple of new white papers on exactly these types of topics, but also, again, some of the use cases where actual frontline industry peers of yours speak to the audience here, people that are doing this and realizing the benefits. We have some that are accelerating five-year programs down to two and a half or three years, and I think there are some really valuable insights from people on the front line, not talking heads like me and Andrew. You really should dig into that, and you can probably sign up for a webinar too while you’re there, because we do those every month as well. So I hope this helps, please do dig into the educational content we have up there, and if anybody’s curious, please feel free to reach out.

Nate Nelson:
Andrew, that was your interview with Rick Kaun.
Do you have any final thoughts to take us out with today?

Andrew Ginter:
I guess so. You know, I was really encouraged by the episode.
I mean, this is automation, this is automation for security.
It’s a truism that our enemies’ attacks are getting more sophisticated because they’re automating their attacks, because they’re using more and more sophisticated automation.
Here is automation for our defenses. This kind of automation can take a lot of time off of the analysis part of our defenses, and off of the implementation part.
Automating the defenses sounds extremely useful, and this sounds like a very effective kind of automation.
So I see this as good news, as a very positive development.

Nate Nelson:
OK.
Well then, thanks to Rick for enlightening us today, and thank you, Andrew, for speaking with me.

Andrew Ginter:
It’s always a pleasure.
Thank you, Nate.

Nate Nelson:
This has been the Industrial Security Podcast from Waterfall.
Thanks to everybody out there listening.
