06 Apr 2020 1800 sites: air gaps, Windows XP and evolving due diligence: Phil Neray | Episode #32
Phil Neray, VP Industrial Security of CyberX reviews findings, remediations and C-level responses for security assessments at 1800 industrial sites.
READ HERE THE FULL TRANSCRIPT:
Nate: Welcome everybody to the Industrial Security podcast. My name is Nate Nelson. I’m sitting as usual with Andrew Ginter, the Vice President of Industrial Security at Waterfall Security Solutions. Andrew is going to introduce the subject and the guest of today’s show. Andrew, how are you?
Andrew: I’m very well thank you, Nate. It’s always a pleasure to join you. Our guest today is Phil Neray, the Vice President Industrial Security at CyberX. Phil’s topic, he’s going to be talking about the latest report out of CyberX, their 2020 data-driven analysis of hidden vulnerabilities observed in 1800 production industrial networks. So, that’s a long… that’s a mouthful, but he’s talking about findings in 1800 networks. So, you know, that’ll be interesting.
Andrew: We’re going to be talking about the latest report out of CyberX. Before we get into the report, before we get into your findings, can you talk about where did this data come from? How do you know all this stuff?
Phil: So, CyberX was formed in 2013, specifically to address OT cyber security or ICS cyber security. And we have a purpose-built platform that’s passive, connects to a SPAN port, collects network traffic, and analyzes it using our patented analytics. And as part of what we do when we deploy our platform in customer environments, in addition to identifying assets, we also look at vulnerabilities that we’ve identified in the network, and also vulnerabilities that we identify in those assets based on the network traffic.
So, the data that we use to compile this report is anonymized from many different vulnerability and risk assessments that we’ve done in sites around the world in diverse verticals. And so, we took all that data from the vulnerability assessment reports. The reports typically have an overall security score that we assigned to the environment, and then detailed information about vulnerabilities we found. We took all that data, anonymized it, put it into a database, and analyzed it to come up with the statistics that we show in the report.
Andrew: Okay, and just to clarify, you use the words ‘assessment’, and you talked about passive monitoring as well. Is the data that you’ve got here coming from passive network monitoring exclusively? Or is it coming from a larger pool of risk assessment activity where you might touch hosts and look at installed software as well?
Phil: No, we don’t… we don’t, in these assessments, do anything active or anything that requires touching the assets themselves. We do offer that as an optional capability in our platform, but we don’t use any of that in our vulnerability… in our vulnerability assessments.
Nate: Andrew, I believe I’ve asked this of you in a previous episode, but could you remind me and our listeners, what does he mean by passive?
Andrew: Passive means the technology is watching messages that are being exchanged on a network, but is not putting any messages on the network itself. This is usually done through what’s called a physical tap, which looks at messages on a wire, or more commonly, a mirror port on a managed switch. And the mirror port can be configured to send a copy of some or all of the messages inside the switch, straight out the port for analysis.
And so, when he… when he… you know, the beauty of this, the reason that the industrial folks like passive is because, if there’s no new messages on the network, they, you know, have increased confidence that… that nothing has changed. And, you know, a lot of these environments are, I don’t know what the word is, the best word is, you know, fragile, sensitive, let’s call it, and they don’t want any change to their sensitive environment. And, you know, the value prop here is passive monitoring looks but does not touch. And so, you should not fear that your environment is changed by this security measure that you’ve added.
Nate: That makes sense, Andrew, what’s on the other end of this passive monitoring? What’s doing the monitoring?
Andrew: Well, what’s doing the monitoring is generally a piece of software. So, it typically runs on a dedicated CPU, you know, a server of some sort. You know, sometimes they call it a sensor, it’s a device that is pulling the packets in. And, you know, the kind of analysis that goes on, I’m not familiar with the CyberX gear, but generally speaking, there’s 3 or 4 kinds of analysis that go on. And, you know, these vendors compete with each other and are also, you know, always inventing new kinds.
There’s signature-based, which is, if I have a pattern, I’ve got, you know, thousands or hundreds of thousands of signatures describing attack patterns, if I see this packet, and then that packet that matches the signature, I raise an alert, I’ve got an attack. There is anomaly-based, which does a lot of statistics, it does a lot of learning. Depending on the system, you know, it might report traffic volume anomalies. Suddenly, there’s much more traffic between these 2 hosts than there used to be, or it might report host-connectivity anomalies. “This host just connected to 17 other hosts and it never connected to anybody before, it was always receiving connections before,” which, you know, again is a suspicious condition.
Sometimes we see reports on kinds of messages. So, you know, an example from an episode a little while ago was a device reset, saying, “Stop and start a PLC.” That’s not something that happens in normal operation. It’s an interesting thing, it might be suspicious, raise an alert. So, there’s many kinds of network monitoring possible on these… on these sensors. And, you know, the difference here is that passive monitoring looks at the traffic and does not send anything, whereas active monitoring (which we’re not talking about) puts, you know, questions on the network and gets answers from devices. And, you know, in that sense, is changing the network traffic and is, you know, alarming in that sense to… to some kinds of industrial operators because they… they don’t… they don’t like changing anything.
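To make the host-connectivity anomaly Andrew describes concrete, here is a minimal sketch in Python. It assumes the sensor has already reduced mirrored traffic to (initiator, responder) pairs; the function name, the host names and the threshold of 5 new peers are hypothetical, and this is not a description of how CyberX or any other product works.

```python
from collections import defaultdict

def find_new_initiators(baseline_flows, new_flows, min_new_peers=5):
    """Flag hosts that initiate connections to many peers they never
    contacted during the baseline (learning) period."""
    # Peers each host initiated connections to while the baseline was learned.
    known = defaultdict(set)
    for src, dst in baseline_flows:
        known[src].add(dst)

    # Peers contacted for the first time in the monitored window.
    fresh = defaultdict(set)
    for src, dst in new_flows:
        if dst not in known[src]:
            fresh[src].add(dst)

    return {src: sorted(peers) for src, peers in fresh.items()
            if len(peers) >= min_new_peers}


# Hypothetical data: the historian only ever received connections,
# then suddenly connects out to 17 PLCs.
baseline = [("hmi-1", "plc-1"), ("hmi-1", "plc-2"), ("eng-ws", "historian")]
observed = [("hmi-1", "plc-1")] + [("historian", f"plc-{i}") for i in range(1, 18)]
alerts = find_new_initiators(baseline, observed)
print(alerts)
```

Nothing here sends traffic: the check only compares what was observed against what was learned, which is the "looks but does not touch" property discussed above.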
Nate: I’m curious how far this technology goes. Tell me if this is science fiction, or if this exists. Say a malicious actor steals credentials from an employee at a plant, can a passive monitoring system, just by virtue of not may… it may be just where they’re located, where they’re connecting from, but also the behavior of that user on the network, are passive monitoring systems advanced enough to actually detect anomalous behavior? Even if they’re not doing anything crazy like connecting to 17 different hosts, can a passive monitoring system learn the behavior of users and pick up on anomalous behavior, even if it’s very subtle?
Andrew: I think the answer is yes. I don’t know which systems do this. I don’t know if CyberX does this. But, you know, let me give you some simple examples. You mentioned time of day. If you have people logging in with legitimate credentials at strange times of day, that’s suspicious. Sometimes if you have people logging in simultaneously from 2 devices, that’s suspicious. Certainly, if you have people logging in simultaneously from 2 devices that have IP addresses in different countries, that’s very suspicious. So, yeah, there’s… there is… there certainly are simple examples you can come up with and probably much more sophisticated ones as well. So, with that background, let’s… let’s head back to Phil and, you know, look at the findings in his report.
Andrew: Can you talk about what you found, what… what’s your reporting here?
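The simple examples Andrew lists (odd hours, two devices at once, two countries at once) can be sketched as rules over login events. This is only an illustration under stated assumptions: the event tuples, the working-hours range and the 5-minute window are hypothetical, and the country field is assumed to come from some IP-geolocation lookup that is not shown.

```python
from datetime import datetime, timedelta

def login_alerts(events, work_start=6, work_end=20, window=timedelta(minutes=5)):
    """events: list of (user, timestamp, device, country) tuples."""
    alerts = []
    events = sorted(events, key=lambda e: e[1])
    for i, (user, ts, device, country) in enumerate(events):
        # Rule 1: a login outside the assumed working hours.
        if not (work_start <= ts.hour < work_end):
            alerts.append((user, "odd-hours login", ts))
        # Rules 2 and 3: compare with earlier logins by the same user
        # that fall inside the time window.
        for other_user, other_ts, other_device, other_country in events[:i]:
            if other_user != user or ts - other_ts > window:
                continue
            if other_country != country:
                alerts.append((user, "near-simultaneous logins from two countries", ts))
            elif other_device != device:
                alerts.append((user, "near-simultaneous logins from two devices", ts))
    return alerts


# Hypothetical events.
events = [
    ("alice", datetime(2020, 4, 6, 9, 0), "eng-ws-1", "US"),
    ("alice", datetime(2020, 4, 6, 9, 2), "laptop-7", "RO"),
    ("bob", datetime(2020, 4, 6, 3, 15), "hmi-2", "US"),
]
results = login_alerts(events)
for user, reason, ts in results:
    print(user, reason, ts)
```

The subtler behavioral learning Nate asks about would need per-user baselines rather than fixed rules; the sketch covers only the simple cases named in the answer.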
Phil: Sure. So, we found a number of interesting results that may be surprising to some folks and maybe not to others. You know, one of the myths that has existed in our community for years is the myth of the air gap. And I think there are still some major executives in critical and industrial infrastructure who think there is an air gap. And I’m sure that in some highly regulated environments, the air gap does in fact exist. But because of industrial IoT, because of the requirement to collect more and more real-time intelligence from plant networks, the air gap doesn’t exist anymore, in fact, may never have existed for most organizations.
So, we found that in more than a quarter of all the sites that we analyzed, we found direct internet connections from the ICS network. And what I mean by that is we actually… it wasn’t just, you know, a device reaching out and calling Microsoft for Windows Update, there was actually a response as well from a server in the public Internet. And anecdotally, we found, you know, it could be that example. It could be someone configured their system to get updates from Microsoft or some other site automatically. It could be a third-party contractor or OT vendor that has decided they’re going to come into the network over the… over the internet to do remote management. It could be, you know, something as simple as that a worker decided they wanted to watch… they wanted to watch ESPN at night on the night shift. And so, they figured out a way to connect the internet to their network, maybe through a Google Home device or something like that.
So, we found all of those examples. And I think that that’s one of the more interesting ones that will help raise visibility, at least with management teams, that the air gap, if it ever existed, does not exist any longer.
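Phil’s criterion for a direct internet connection, an ICS device reaching out and a public server actually responding, can be sketched as follows. This is a simplified illustration, not CyberX’s method: it assumes packets have already been reduced to (source IP, destination IP) pairs, and the subnet and addresses are made up.

```python
import ipaddress

def internet_conversations(packets, ics_networks):
    """Return (ics_host, public_host) pairs where traffic was seen in
    BOTH directions, i.e. the public server actually answered."""
    nets = [ipaddress.ip_network(n) for n in ics_networks]

    def in_ics(addr):
        ip = ipaddress.ip_address(addr)
        return any(ip in net for net in nets)

    def is_public(addr):
        # is_global is True for publicly routable addresses.
        return ipaddress.ip_address(addr).is_global

    outbound, inbound = set(), set()
    for src, dst in packets:
        if in_ics(src) and is_public(dst):
            outbound.add((src, dst))
        elif is_public(src) and in_ics(dst):
            inbound.add((dst, src))
    # Keep only the pairs seen in both directions.
    return sorted(outbound & inbound)


# Hypothetical capture: one two-way conversation, one unanswered
# outbound attempt, and one purely internal exchange.
packets = [
    ("10.1.1.5", "8.8.8.8"),
    ("8.8.8.8", "10.1.1.5"),
    ("10.1.1.9", "8.8.4.4"),
    ("10.1.1.5", "10.1.1.9"),
]
found = internet_conversations(packets, ["10.1.1.0/24"])
print(found)
```

Requiring the reply is what separates a live connection from a device that merely tries to call out and is blocked somewhere along the way.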
Nate: That answer from Phil struck me as rather spicy. We’ve talked about air gaps on this show and usually, they sound like one of the best ideas in Industrial Security. So, the fact that Phil’s saying they’re essentially, if they ever were a real thing, not really one any more, seems rather surprising to me.
Andrew: Yeah, air gaps are… are controversial. And I’ve got way too much opinion on air gap, so I’ll try and keep it short. Where to start? Air gaps are still used routinely in some of the most critical systems. I think it was Matt Gibson was mentioning something like that, when he was talking about nuclear reactors. Yeah, you want the… you know, the control rod… control system for the reactor to be air gapped. Thank you. But, you know, I take… I take Phil’s point here that way too many people think they have an air gap and they don’t. Actually, maintaining an air gap is much more difficult than a lot of people think for certain kinds of networks.
What a lot of people don’t realize is that, you know, it’s sort of on the third side of the coin, there’s a lot of air gaps around, you just don’t realize it. I mean, most modern power tools, drills, saws have liquid crystal screens driven by a CPU. Is that CPU networked? Nope. It’s air gapped. You know, some thermostats have a CPU in it, air gapped. So, there’s air-gapped stuff all over. But the concept of an air-gapped industrial network is… is what’s controversial. It’s more difficult, much more difficult than people think. You can’t just wave your hand and say, “There, I’m air gapped.” You actually have to go and track down every possible connection to the outside world and rip it out. And most people don’t bother. They just wave their hands and say, “Hey, I’m air gapped.”
But, you know, one of the things that bugs me is the word ‘myth’. And you know, I don’t want to pick nits with Phil, but you know, I argued in the press at some… length with… with another vendor who was putting out articles about, you know, myths of air gaps and you know, air gaps and unicorns. And, you know, they argued that air gaps are misleading. They give a mis… a false sense of security because attacks can still be carried on USBs. Who was doing this? It was a firewall vendor. They were selling firewalls saying, “Don’t use air gaps, use firewalls.” And I came back and said, “Guys, everything you’ve said about an air gap also applies to a firewall. Firewalls give a false sense of security. Steal the password and you’re in. Carry a USB past it and you’re in.”
And so, there’s a lot of confusion out in the community about air gaps, but I think the… the point that Phil is making, I have to agree with. And that point is, a lot of people in the industry, the… you know, generally the more senior, the more this… this mistaken conception exists. A lot of people in the industry think that their own industrial control systems are air gapped, and they’re just not.
Andrew: Interesting. But your report talks about a lot of vulnerabilities. Can you give me some other examples?
Phil: So, another example would be outdated Windows systems. And again, we know that in OT environments, it’s often difficult to upgrade HMIs and engineering workstations to more modern versions of Windows. We know that, you know, some of these systems have been in place for many, many years, everything works perfectly. And to… to upgrade from an older version of Windows would require a lot of time and effort, often requiring, you know, applications to be rewritten. So, we understand why that’s the case. But again, I think this is something perhaps that management teams are not aware of. We found that more than 2/3 of the sites we analyzed had outdated versions of Windows. 71% had outdated versions of Windows, which means they’re no longer receiving security patches from Microsoft, which makes them much more vulnerable, not just to the really severe vulnerabilities like EternalBlue (for which Microsoft did issue a patch, even for older Windows operating systems), but to a lot of other vulnerabilities that attackers can use to easily compromise those systems.
Another example would be the use of unencrypted passwords. And again, if you think about when many of these systems were deployed, it was a time when the designers of the network assumed, if you had connectivity to a device, therefore you had permission to access that device. So, older protocols like SNMPv1, which did not have encryption, transmit passwords in plain text, we found that 64% of the sites we analyzed had unencrypted passwords. Again, I think for people in the community, these shouldn’t… these are not surprising statistics.
Perhaps for management teams, you know, if you can imagine if there were ever a safety incident or an environmental incident in a plant, and that resulted in a corporate liability trial, and information was presented to the judge or the jury that said, “Well, you know, you know that the plant is still running Windows XP, right?” and that there’s… they have unencrypted passwords running on their network.
And then, you know, another statistic around antivirus, we found that 2/3 of the sites did not have antivirus, or at least weren’t receiving automatic updates for antivirus. Since we’re not actually on the endpoint, we can’t tell if they have antivirus. So, we look for the signal that says, “You know, I’m updating my antivirus signatures.” And those signatures don’t necessarily need to come from the internet, obviously, they can come from a local server. But so, we… we’re stating this statistic in a conservative way by saying 2/3 aren’t receiving automatic updates. But it could also be that close to 2/3 or 2/3 aren’t even running any antivirus at all. In any case, if you’re not getting automatic updates, the antivirus isn’t very useful.
So, again, that would be maybe an eye-opening statistic for folks who haven’t been in the community for a while and don’t understand that, for many years, the OT equipment vendors didn’t allow you to install antivirus on their systems. They were worried about system stability, reliability, sub-millisecond response times being affected by an antivirus scan. So, for many years, it wasn’t allowed. Obviously now, they’ve all approved and certified antivirus programs. And then there’s lots of other endpoint security protections like whitelisting that can be applied nowadays. But again, many of these systems have been in place for a long time. They’re working fine and no one’s gone back and taken a close look to say, “Boy, we really should have some more endpoint security running on this device.”
Andrew: Before we start talking about… about, you know, the implications of the report here, can you dive a little deeper? I mean, I kind of understand how you can tell from passive scanning or passive watching of the network packets if, you know, an antivirus update goes in. Because that’s going to go in over the network. You know, it’s pretty clear to me that, if you see passwords flying by, well, you know, that’s… or that… you know, those versions of protocols flying by that’s… that’s a pretty clear indicator. But how do you know that… that, you know, 60% of Windows boxes are outdated? How do you figure that just by looking at the packets?
Phil: Well, there’s a lot you can get from the network traffic, from the headers. Certain things you can’t see, and that is, I think, one of the reasons why active monitoring has become more acceptable in the last year or so. Like, you can’t tell what service pack is being run. But you can tell by the… the headers and the information in network traffic, what the… what type of Windows system is running on… on those endpoints.
Nate: Andrew, why would an industrial plant in 2020 still be running Windows XP?
Andrew: There’s a lot of reasons. From sort of the mundane, it’s all… it… you know, it’s how we’ve always done it, and we haven’t seen a need to change. Well, security’s a need to change. That’s sort of the obvious one that… that outside observers gravitate to. To the very technical, which is, you know, the… we have a large, complex and very expensive industrial process that really needs to keep running. And if we upgrade to Windows 10, and upgrade the software from XP version to Windows 10 version, and migrate the configurations and test it for long enough to gain confidence that making this change is not going to trip the plant and lose us an hour or a day or a week of production, that process of upgrading and testing is very costly. And, you know, these upgrades just take… take place at very long intervals.
To me though the… the big value that CyberX is providing here is that, you know, they are providing these reports, these vulnerability reports to a very high level in the organization. This is the C levels and the, you know, maybe even the board who… who sees these reports, and becomes aware of these problems. You know, the fact that we cannot upgrade a Windows XP machine does not make the machine any less vulnerable. It’s still vulnerable, we still have to secure it somehow.
And, you know, one of the fundamental tenets of cyber-risk management, of any risk management, you know, what do you do with risk management? You can accept a risk, which says, “If it happens, it happens. We’ll pay the price.” You can mitigate the risk, which is do something about it. You can transfer the risk which is buy insurance for it. One of the fundamental tenets is that you accept only those risks that you understand. You should never be blindsided by a risk. “Bang, here’s 100-million-dollar cost, where did it come from?” And so, it’s important to provide visibility high in the organization into the situation low in the organization where there are significant risks. So, this is a significant value that… that’s being delivered here in terms of organizational-risk understanding and risk management.
Andrew: So, you mentioned the reaction of, you know, more senior people who might not have been aware of all this. Can you talk about you know, how… how are your findings received when you present them to an organization?
Phil: You know, we find that the control engineers, the people in the plants are aware of these issues, these outdated systems and unencrypted passwords. We find that it’s often a surprise to the corporate IT security folks who traditionally were kept away from the OT environment. But what we’ve also seen in the last year or so is that the corporate security organization, the CSO’s organization, realizes that they are in fact accountable for security of the entire enterprise, and that includes the OT environment.
So, we’ve seen a big uptick in initiatives in critical infrastructure and industrial organizations to strengthen their OT security. And they often use these statistics as a way to internally get the resources and the budget they need. And more importantly, to get the buy-in from the OT teams. OT teams, you know, their priorities are not security, their priorities are production, quality, safety. And some of those things overlap with cybersecurity. In fact, all 3 of them overlap with cyber security, but they may not be aware of it.
So, these statistics are a way to get internal buy-in at the management level, at the board level, and at the OT leadership level for changes that need to be made. Because folks running the plant, their number one concern is, “Let’s keep the plants running,” they want to make sure that nothing is going to disrupt that. When they see that cyber can in fact, disrupt production or cause safety or environmental incidents, it gets their attention. And when they see these statistics and realize that, in the last few years, we’ve seen the Norsk Hydro attacks, we’ve seen the (unclear) [26:11] attacks, we’ve just seen an attack on a natural gas pipeline with commodity malware, but it still shut down the HMIs and prevented folks from seeing what was going on in the plant, which could have led to a safety incident. When they start seeing these things combined with that data, they realize that something needs to be done. And like I said, we’ve seen a huge uptick in initiatives in the last year or so around strengthening OT support.
Andrew: So, Nate, you know, I want to echo Phil’s last comment there. You know, I work at Waterfall. We work with a lot of organizations as well. And what we’ve observed, he said 12 months, we’ve observed sort of more like 18, 24 months. But yeah, in the last year or 2, there’s been what we’ve observed as a sea change in the Industrial Security community. You know, before that… I mean, I’ve been in the community for going on 2 decades now, before that, what we observed was that the IT people, the CSOs were sort of outsiders to the OT security process. OT security was, not universally, but largely driven by OT.
In the last, you know, 12 to 24 months, we’ve seen the CSOs, the corporate security organizations, acting for the first time decisively on their responsibility for OT security and taking the lead on OT security. So, yeah, it’s interesting that Phil is confirming, again what… what I have observed as a sea change in the Industrial Security community.
So, Phil, I also heard you mentioned, you know, the… the perspective of senior executives, you know, considering the possibility of lawsuits, you know, liability lawsuits from, I don’t know, injured parties or from shareholders is… you know, is the concept of OT security due diligence evolving here? Is… can you speak to that?
Phil: I think that, you know, boards and management teams are realizing that, not only can cyber affect them financially by bringing down their plants, but it can lead to corporate liability issues. And in fact, Gartner recently put out a report, where they said that the financial impact of attacks on what they call cyber physical systems, resulting in casualties will reach over $50 billion by 2023, which is 10 times higher than 2013 levels of data security breaches. And they’re also predicting that, by 2024, the liability for cyber physical incidents impacting human life or the environment will result in personal liability for CEOs. You know, much in the same way that Sarbanes–Oxley, for example, resulted in personal liability. So, we’re not quite there yet. But the direction is clear.
And… you know, and boards and management teams don’t need regulation to realize that this is something they need to pay attention to, and it’s part of their diligence, it’s part of their oversight, it’s part of governance. And CSOs realize that it’s also part of their governance. And that’s why having a unified approach to security monitoring governance that goes across both IT and OT and IoT is… is going to be something that people realize they need to do, and they’re… and it’s starting to happen.
Nate: So, $50 billion, that’s a… that’s a chunk of change.
Andrew: Indeed. You know, it’s the first time I’ve heard that. I meant to ask Phil, which report that came out of. But, you know, the… the Gartner Group’s been right before. They predicted back in 2005, they predicted the trend towards IT/OT integration. As controversial as that trend has been, they predicted it correctly. And if they’re predicting this now, you know, to me, that has a lot of credibility. So, yeah, this is… this is a number and a trend that I think we all need to be looking into.
Andrew: The report also talks about mitigations. Before we get into that, were there any other findings that you… you wanted to share with us?
Phil: Yeah, there’s one other data point that I think will be interesting, again, not surprising to folks in the community, which is that the number of sites that have remote management protocols, I’m talking here about RDP, SSH, VNC is more than half of the sites have those protocols being used internally. Now, I’m not saying that they’re exposed to the internet, but we know that they make life easier for people administering their systems. Often, if you’re sitting on the IT network and you need to configure a device that’s on the OT network, if you have a properly segmented network, there’s a way to get through the firewall to do that.
Interestingly, we found in the first Ukrainian grid attack that that’s actually how they got into the OT network: they compromised, probably through phishing, an employee’s system on the IT network and then used their credentials through SSH to get to the OT network. And it’s something we’ve seen a lot. We’ve also seen RDP being used as an attack vector for ransomware in the last few years.
And so, all we’re saying here is that once an attacker gets into the network, no matter how they get in, they can use these remote access protocols to get to almost any device in the network, and then exploit the vulnerabilities in those devices to compromise them. So, whether that’s, you know, already knowing the passwords through an unencrypted password, whether it’s, you know, there may not be any credentials on some of the PLCs, whether it’s a zero day… but they don’t really even need zero days to get to these devices. They can really just use the insecurity that… that was essentially built into those devices when they were designed to get into them. But the point is, it’s very easy to navigate through these networks using these remote access protocols like RDP. Once an attacker gets in, it’s fairly easy for them to compromise any other device they choose to find.
Nate: What is RDP?
Andrew: RDP is remote desktop. You know, VNC is an equivalent to remote desktop, lets you see the desktop on another computer and, you know, move the mouse, type on the keyboard. You know, I would summarize you know, Phil’s finding here, you know, very loosely as the modern attackers, when they want to move around these networks, they don’t have to use vulnerabilities, they can use permissions. Because these machines are all set up with permissions to let other machines control them. And bluntly, it’s… it’s way easier, it’s way cheaper for an attacker to use permissions than it is to use vulnerabilities.
Andrew: The report also talks about, what can you do about this?
Phil: Yeah, so in our report, we outline a series of recommendations that are modeled on a process or a methodology that Idaho National Labs has written about called… they call it consequence… Consequence-driven Cyber-informed Engineering, CCE, which is quite a mouthful. We summarize that methodology in 5 steps, which is essentially a way to acknowledge that you can’t fix everything, you can’t upgrade everything, not only because of the huge cost that would be required, but the time and the disruption to production. So, you need to find a way to minimize the risk, fix the vulnerabilities that are the most important, that… the ones that affect your crown jewels, and then put in place compensating controls to protect what you can’t change.
And so, the 5 steps are, number 1, identify your crown jewel processes. These are the processes that your company depends on for its existence. These are the processes that, were they to be compromised, would result in a major safety or environmental incident, or would shut down a key production line that generates revenue for your company. And so, the number 1 thing is to identify, what are those crown jewel processes. And the way to do that is to have conversations with the business. You can’t do it on your own, you have to talk to the business, you have to understand what’s important, you have to understand where the risk lies from a safety or environmental point of view, and then figure out a way to eliminate…
So, the second step would be to map the digital terrain. In other words, identify all the connected assets in the organization, identify all of the digital pathways into the organization, which can include, you know, third-party contractors who come into the network through remote access or VPN connections. It can include connections between your IT and your OT network. It can include even newer industrial IoT devices that may be in place, not to specifically control the physical processes, but to monitor them for predictive maintenance, for example. So, they’re outside the loop, but they’re still on your network somewhere. Look at all those pathways and map… map that digital terrain so you have an understanding of how an adversary would go after your crown jewel processes.
Andrew: So, Nate, this… this concept of mapping the digital terrain, it’s an important one. The only… you know, all cyber-attacks are information. The only way that a control system can change from an uncompromised to a compromised state is for cyber-attack information to enter the system somehow. So, understanding all of the ways in is very important. But when he talks about inventorying all of the paths into the organization, you know, it wasn’t clear to me, is he talking about into the organization, the IT network, the… you know, the cell phones, or is he talking about into just the control system? So… so, I asked for clarification.
Andrew: If I could ask, when you’re talking about mapping the digital terrain, you talked about mapping all of the ways in. Into what, please? Are we talking about all of the ways into the industrial network? And so, you know, it’s sort of the conversation starts at the IT/OT interface and at any sort of, you know, dial-up modems or DSL modems that have been deployed directly into the process, or does the way in start at the… the enterprise internet firewall, and all of the ways that the IT network interacts with the outside world?
Phil: Yeah. No, it’s the first thing. I mean… I mean, conceivably, your corporate IT security team has a good understanding of how the network is configured on… on the corporate IT network side, and what gateways exist in and out of that network. Now, what we want to look at is what pathways exist into the OT network. As you said, either connections from IT to OT, connections from dial-up modems (and we do find those too), connections through VPN, whatever it is, what are the different ways people get in? And then not only how did they get in, but then how would they navigate through the network to compromise your crown jewel processes? So, that’s really step 3, which we call illuminate the most likely attack paths.
If an adversary wants to go after your crown jewel assets and processes, what’s the most likely way they will do that? What are the vulnerabilities they’ll expl…? The easiest, they’re going to look for the easiest way, and so they’re not going to necessarily need zero days to get to your crown jewels. They might exploit weak credentials. They might exploit unauthorized connections between IT and OT that you didn’t know about, subnet connections that don’t go through a firewall or a DMZ. So, you want to look at your understanding from step 2, map the digital terrain, and your understanding of the vulnerabilities in the environment, and combine those 2 together to see, what are the most likely attack paths into the network?
And we have, as part of our platform, an automated threat modeling approach that does that using the information we’ve collected based on the topology and the vulnerabilities we’ve identified, some of which we talked about earlier for the… from the risk report. And… but you’ll also probably need Red Team exercises. So, you can look at other entry points, might be social engineering or physical access to the facilities. But there is an automated way to do this and to actually draw a diagram of what the most likely attack path would be. So, that’s step 3.
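As a rough illustration of the attack-path idea Phil describes, the sketch below combines a network map with per-connection "effort" scores and searches for the easiest route from any entry point to a crown jewel asset. It is not Cyber X's threat-modeling method, which is not described in this episode. It is a plain Dijkstra shortest-path search, and every node name, score and function name in it is a hypothetical assumption made for the example.

```python
import heapq

def easiest_attack_path(graph, entry_points, target):
    """Return (total_effort, path) for the lowest-effort route to target, or None.

    graph maps a node to a list of (neighbor, effort) pairs. The effort values
    are hypothetical non-negative scores: lower means easier for an attacker.
    """
    # Seed the queue with every known way in, at zero accumulated effort.
    queue = [(0, entry, [entry]) for entry in entry_points]
    heapq.heapify(queue)
    visited = set()
    while queue:
        effort, node, path = heapq.heappop(queue)
        if node == target:
            return effort, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, step_effort in graph.get(node, []):
            if neighbor not in visited:
                heapq.heappush(queue, (effort + step_effort, neighbor, path + [neighbor]))
    return None  # no route from any entry point reaches the target

# Hypothetical topology and scores, invented for illustration only.
example_graph = {
    "it_network": [("dmz", 5), ("engineering_ws", 2)],  # second link bypasses the DMZ
    "dialup_modem": [("plc_crown_jewel", 4)],
    "vpn": [("dmz", 3)],
    "dmz": [("historian", 3)],
    "historian": [("engineering_ws", 4)],
    "engineering_ws": [("plc_crown_jewel", 1)],  # weak credentials on the workstation
}

result = easiest_attack_path(
    example_graph, ["it_network", "dialup_modem", "vpn"], "plc_crown_jewel"
)
print(result)
```

In this made-up topology the cheapest route runs through the undocumented IT-to-OT connection rather than the firewall and DMZ, which is the kind of path that would then be prioritized for mitigation.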
Step 4, we called mitigate and protect, which is, find a way to mitigate the risk from these most likely attack paths. So, you can’t patch everything. You can’t upgrade everything. But if you look at the attack path to your crown jewel process, what do you need to do to eliminate those attack paths? And that could include patching. It could include better segmentation. It could include strengthening credentials. And it could include upgrading… upgrading older versions of Windows. But the point is you don’t have to do it for every single device in your facility that may not… may not be relevant to… you know, to an adversary looking for an attack path to your crown jewel processes. So, that’s mitigate and protect.
Now, you can’t patch everything, you can’t upgrade everything. So, the next thing as part of mitigate and protect is to put in place compensating controls. And what we’ve seen is most effective there, and it’s something our platform provides, is agentless-network security monitoring. Monitoring the network so that, if an adversary does get into the network, you can immediately be alerted and you can stop them in their tracks before they shut down or blow up your plant.
And we know that adversaries can be in a network for a long time doing reconnaissance, understanding how the network is configured, looking for the assets that they want to compromise. We know for example, in the Triton cyber-attack on the safety controllers in a petrochemical facility, that they were in the facility for years before being detected. So, most security professionals today realize, you can’t keep a determined attacker from eventually getting into your network. So, the idea is, how do you quickly identify when they’re there and then find a way to shut them down? Quarantine them, you know, block them, whatever it takes, before they can actually do any damage.
Nate: We just did 70 minutes on CCE, you know, a couple episodes ago, but it sounds to me like Phil’s take on it is a little bit different than Andy Bochman’s was.
Andrew: It is. I mean, Andy Bochman had a whole section on physical mitigations on, you know, non-von-Neumann digital mitigations. And Phil hasn’t mentioned that. And I actually asked Phil about this offline afterwards. And, you know, basically, everything Phil describes is in CCE, he’s just skipped that piece of it because it’s not a piece where… where Cyber X adds value. They don’t deal with physical mitigations. They deal with the cyber end of it. And, you know, if you recall what Andy Bochman said was, “Do all that cyber stuff, do it really, really well. And for the risks that you either cannot or have decided not to mitigate, physically, do this attack-path analysis. Take steps to disrupt the attack path so that none of the attacks that lead to your truly unacceptable consequences, you know, have a significant chance of succeeding.”
So, it’s… I think, it’s a difference in focus here. Phil is focused on the parts of CCE that Cyber X adds value to, as opposed to the big picture, which is what Andy Bochman was talking about.
If I recall, though, you said there were 5 steps. What’s step 5?
Phil: To remove silos between OT, IT, IoT, you know, which could include building management systems, CCTV cameras, you know, are those OT? Are those IT? It really depends on the organization. So, this is more of an organizational recommendation, which is that you need a unified approach, you need a holistic approach to the people, the process, the technologies you’re using to secure your facilities. And you can only do that, for example, from a… from a process and technology point of view, the alerts from our platform in all our customers feed into their SIEM. You know, that could be their Splunk system, their QRadar system, their ArcSight, their (unclear) [3:35] witness. Because most organizations have spent years and a lot of money building workflows in their corporate SOCs, training their people on how to handle incidents. And you want to be able to leverage that.
Now, not everything is going to be the same. You know, if there’s an alert that somebody just reprogrammed the PLC with new ladder logic code, if that alert shows up, it could be legitimate, or it could be malicious. You know, in the Triton attack, that’s actually how they installed their backdoor into the safety controller. So, you need to tweak your processes in the SOC so that if someone sees… if an analyst in the SOC sees an alert, “Someone just reprogrammed this PLC,” they know which control engineer to call to validate if it’s malicious or not.
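The workflow tweak Phil describes, where a PLC-reprogramming alert tells the analyst which control engineer to call, can be sketched as a small routing rule. This is only a minimal illustration under stated assumptions: the asset names, event names, owner lookup table and `route_alert` function are all hypothetical and do not represent any SIEM product's API.

```python
from dataclasses import dataclass

@dataclass
class Alert:
    asset: str   # hypothetical asset identifier, e.g. "plc-line-3"
    event: str   # hypothetical event type, e.g. "plc_reprogrammed"

# Hypothetical lookup tables that a SOC might maintain together with the OT team.
ASSET_OWNERS = {
    "plc-line-3": "the line 3 control engineer",
    "sis-unit-1": "the unit 1 safety system engineer",
}
EVENTS_NEEDING_OT_VALIDATION = {"plc_reprogrammed", "firmware_changed"}

def route_alert(alert: Alert) -> str:
    """Return the next step for an analyst handling this alert."""
    if alert.event in EVENTS_NEEDING_OT_VALIDATION:
        # The change may be legitimate maintenance, so ask the person who would know.
        owner = ASSET_OWNERS.get(alert.asset, "the OT on-call engineer")
        return f"Call {owner} to confirm whether the change to {alert.asset} was authorized"
    return "Handle with the standard SOC workflow"

print(route_alert(Alert(asset="plc-line-3", event="plc_reprogrammed")))
```

The point of keeping the owner table separate from the rule is that the OT team can maintain it without anyone touching the SOC's existing workflows.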
But… but the tools and the overall processes and workflows are… you have in place are going to be helpful for… for dealing with OT breaches as well as they do for IT breaches. So, this unified approach is very important because it also gets OT folks and IT folks to speak the same language. They have a common view now of what the network looks like. They have a common view of, you know, “What’s a PLC? What’s an HMI?” And we often do workshops at client sites to educate the IT security folks about how OT is different, and to educate the OT folks about, you know, best practices for security, and to get everyone on the same page and collaborating, because that’s… that’s the only way to stop these compromises.
Andrew: In your description of the methodology, you talk about applying mitigations, which… which makes sense. But, you know, the question that always, always bedevils me is… is how much is enough? And it seems to me, if you’ve got a disconnect between business decision makers in terms of their understanding of what’s happening and the… you know, the engineers on the OT side, that you might have… you know that the business decision makers might have trouble expressing due diligence business level imperatives to the OT folks. And the OT folks might have difficulty explaining the degree of protection that’s already in place, and, you know, trying to figure out how that dovetails with the business due diligence requirements. You know, do you see any of that? Can you… can you speak to that?
Phil: Yeah, certainly. I mean, what we see is that, you know, if… the CSO’s job is to support the business number 1, and number 2 to reduce the risk and to reduce the exposure. And so, when the CSOs begin to realize that they’re accountable for security across the organization, not just on the corporate IT network as they’ve traditionally been, they can help educate the board and the rest of the management team and the business about the risks and… and the business risks, right? Because cyber risk is really business risk. Downtime due to ransomware causes financial losses and loss in shareholder value. Safety and environmental incidents result in, you know, corporate liability lawsuits, compliance violations, brand impact.
So, those are business-level issues that the board and the management team understands, and that it’s the CSO’s job to educate the management team and the board about the risks that are inherent in the OT side of their business, and to get the buy-in top down from the business to implement stronger security on the OT side. At the same time, what we found is most successful in addition to the top-down approach is a bottoms-up approach. So, our successful deployments or the ones that go smoother than others are when the people implementing stronger OT security spend a lot of time in the plant and a lot of time with the people in the plants, explaining to them why this needs to be done, explaining to them that there’s going to be no disruption to production. And in fact, that there’s actually an operational and a productivity benefit to imply… to deploying this type of monitoring, because often, we’ll find operational issues, misconfigured equipment, or malfunctioning equipment that no one really has a way to troubleshoot. And because you’re monitoring that network traffic continuously and you can analyze it, you can very quickly find out what’s going wrong.
And I’ll give you an example. One of our customers has plants around the world, they’re a manufacturer. And one of their plants, the production line, was shut down for one and a half days, and they didn’t know why. And then, in that usual sort of finger-pointing kind of way, the folks in the plant called folks in the IT security department and said, “You know, maybe you’ve mis… you misconfigured our firewall. There must be a firewall that’s blocking and that’s causing this problem.” And the… the IT security folks said, “Well, take a look in the Cyber X console and see what you see.” And they really very quickly found out that it was 1 device in particular, somebody had upgraded their RSLogix engineering workstation. And in that upgrade, it changed its configuration so that it was now flooding the OT network and causing the PLC to time out.
So, there are operational benefits. My point is that there are operational benefits for the OT team as well from a productivity point of view, in addition to the cyber security benefits, which is preventing downtime and safety and environmental incidents.
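To make the troubleshooting example concrete, here is a minimal sketch of how continuously monitored traffic could point at a single flooding device: count packets per source over a time window and flag any source far above the typical level. The source names, packet counts, threshold factor and `find_flooding_sources` function are hypothetical assumptions, and this is not a description of how the Cyber X console works.

```python
from collections import Counter
from statistics import median

def find_flooding_sources(packet_sources, factor=10):
    """Return sources whose packet count exceeds factor times the median count.

    packet_sources is an iterable with one source name per observed packet
    in a single time window. factor is an arbitrary, hypothetical threshold.
    """
    counts = Counter(packet_sources)
    if not counts:
        return []
    typical = median(counts.values())
    return sorted(src for src, n in counts.items() if n > factor * typical)

# Hypothetical one-minute window: the engineering workstation dominates the traffic.
window = (
    ["engineering_ws"] * 5000
    + ["hmi"] * 40
    + ["plc1"] * 50
    + ["plc2"] * 60
)
flooders = find_flooding_sources(window)
print(flooders)
```

Comparing against the median rather than the mean keeps one very noisy device from raising the baseline it is measured against.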
Nate: Sounds to me that Phil here is echoing one of the themes of our show most recently stated by Matt Gibson, I believe, that you have to know your systems better than your attacker does. Not just where everything is and what it does, but what’s connected to what and what data is going where.
Andrew: Absolutely, absolutely. And you know, this is the kind of insight that this class of system, passive network monitoring, can… can provide. But Phil’s comments are also reminding us of a question you asked earlier in the podcast, which is, you know, why are there so many Windows XP machines running around? Here’s an example of the consequences of a botched upgrade. They didn’t upgrade the operating system. All they did was upgrade the firmware on… on a device, and it caused at least a day and a half of downtime. You know, once they… I did not ask, once they figured out what the problem was, how long did it take to get the plant back into production? Because these, you know, large installations do not come back into production instantaneously once you fix something. It takes time to ramp up back to full production. So, you know, this is an example of a very expensive mistake in an upgrade. This is a reason that people are reluctant to make any kind of upgrade because the tiniest mistake leads to these kinds of consequences.
This has been great, Phil, thank you so much. We like to leave our guests with the last word. Is there a thought you would like to leave with our listeners?
Phil: So, just to say a few words about who Cyber X is. We were founded in 2013. Our founders were blue team network defenders working at the Nation-State level. They realized that the number of unmanaged devices, especially in industrial environments, was going to keep increasing, and they decided to build a platform specifically to address… to address industrial cybersecurity with a deep understanding of the protocols and the devices that are used in those environments. And we’ve since expanded that platform to include IoT devices like building management systems, as well as some of the unmanaged IoT devices that you would find on the IT side like printers and routers that also are being compromised by adversaries.
When you look at the 1800 networks that are analyzed in this report, they’re taken from many different verticals across the world, manufacturing, chemicals, pharmaceuticals, energy and water utilities. And some of our clients include 3 of the top 10 pharmaceutical companies in the world, 3 of the top 10 energy utilities, and 1 of the top 5 cloud data center providers in the world. And our mission is really to help you, help our clients keep their environment safe, keep up their uptime at the highest level possible. And it’s not just their technology, we have, over the years, gained expertise in this area so that we can also help with best practices. We can transfer those best practices to your teams. We can help you address some of the organizational change issues that we’ve been discussing in this podcast.
So, I invite you to download the full risk report. It’s called the Global IoT/ICS Risk Report. You can download it from our website, cyberx.io. You can also just search for it and it’ll show up at the top of the search list. And if you have any feedback on the report or any of the comments I’ve made in this podcast, please feel free to send me an email. I’m firstname.lastname@example.org, or you can also find me on LinkedIn. And I look forward to hearing from you. Thanks.
Nate: Alright, well that was pretty self-explanatory. Let’s wrap things up here. Thanks again to Phil Neray for speaking with you, Andrew. And thank you, Andrew, for speaking with me.
Andrew: My pleasure, Nate.
Nate: This has been the Industrial Security podcast from Waterfall. Thank you for listening.