Upstream / Midstream / Downstream Cyber Attacks – Dependency Analysis
Andrew Ginter
The Waterfall / ICS Strive 2024 Threat Report lists a handful of serious cyber attacks that impaired oil & gas infrastructure over the last several years, including the Colonial Pipeline shutdown and halted shipments at three ports / oil terminals. Most of these incidents were due to ransomware, and most of that ransomware impacted the IT network. It turns out that there are really only three ways that ransomware can shut down OT networks and physical operations: “abundance of caution” shutdowns, OT dependencies on IT systems and services, and ransomware impacting OT networks and systems directly.
In today’s article we look at dependencies. In short, there is little benefit in having the world’s strongest OT security program if we must shut down our operation every time ransomware compromises the IT network, simply because that operation depends on IT services. For example:
- Upstream production might depend on a functioning IT-based royalty reporting system,
- Midstream operations might depend on a functioning IT custody transfer system, and
- Downstream refining might depend on a functioning IT-based emissions reporting system.
These kinds of dependencies are called out explicitly in the US TSA Security Directive 2021-02D for pipeline operators. The directive establishes requirements for the nation’s most important pipelines. For critical OT systems, owners and operators must:
- Implement segmentation designed to prevent operational disruption to OT systems if IT systems are compromised,
- In support of that goal, identify all OT dependencies on IT services, and
- Design OT networks so that they can be isolated from IT networks during incident response procedures.
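As a minimal sketch of the “identify all OT dependencies on IT services” step, assume each dependency is recorded in a simple inventory listing the OT system, the service it relies on, and the zone that hosts that service. The `Dependency` record, the zone labels and the system names below are hypothetical illustrations, not part of the TSA directive or of any particular tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dependency:
    """One OT system's reliance on a service (hypothetical inventory record)."""
    ot_system: str
    service: str
    service_zone: str  # "OT" or "IT": where the service is hosted


def ot_dependencies_on_it(inventory: list[Dependency]) -> dict[str, list[str]]:
    """Group, per OT system, every service it relies on that is hosted in IT."""
    result: dict[str, list[str]] = {}
    for dep in inventory:
        if dep.service_zone == "IT":
            result.setdefault(dep.ot_system, []).append(dep.service)
    return result


# Hypothetical inventory loosely mirroring the three examples in the article.
inventory = [
    Dependency("pipeline SCADA", "custody transfer billing", "IT"),
    Dependency("pipeline SCADA", "OT historian", "OT"),
    Dependency("field control", "royalty reporting", "IT"),
    Dependency("refinery DCS", "emissions reporting", "IT"),
    Dependency("refinery DCS", "OT domain controller", "OT"),
]

for system, services in ot_dependencies_on_it(inventory).items():
    print(f"{system}: depends on IT services {services}")
```

Each IT-hosted service flagged this way is a dependency that must either be eliminated or be managed as reliability-critical.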
Although the directive does not say so explicitly, the ability to isolate OT networks from IT networks in an emergency can let OT systems keep operating through an IT emergency, but only if neither OT dependencies on IT services nor OT trust relationships with crippled IT domains undermine that very desirable ability to operate independently.
If we wish to operate our OT systems through an IT security incident, we cannot simply ignore the OT dependencies on IT systems that remain, even though eliminating all of them can be very difficult. Instead, we must recognize that IT systems essential to continued physical operations are in fact reliability-critical components. These reliability-critical systems may be hosted on what we think of as the IT network rather than the OT network, but they must be managed and secured as if they were OT systems. For example:
- If a pipeline depends on a custody transfer and billing system in IT, we could modify our customer contracts so that if we must declare force majeure, custody transfer billing enters an “approximation” mode. The OT system continues operating the pipeline, caching all billing-relevant data in a historian or other repository until the billing system recovers and can reconcile accounts.
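The “approximation” mode in the custody transfer example relies on store-and-forward: the OT side keeps capturing billing-relevant data while the IT billing system is down, then replays it once billing recovers. Here is a minimal sketch, assuming a hypothetical `MeterReading` record and using in-memory lists as stand-ins for the historian and for delivery to the billing system; a real deployment would persist the cache durably.

```python
from dataclasses import dataclass, field


@dataclass
class MeterReading:
    """Billing-relevant custody transfer data captured on the OT side (hypothetical)."""
    timestamp: str
    meter_id: str
    volume_m3: float


@dataclass
class BillingForwarder:
    """Caches readings while the IT billing system is down, then replays them."""
    billing_available: bool = True
    cache: list[MeterReading] = field(default_factory=list)      # stand-in for the historian
    delivered: list[MeterReading] = field(default_factory=list)  # stand-in for IT billing

    def record(self, reading: MeterReading) -> None:
        # The pipeline keeps operating either way; only delivery to billing is deferred.
        if self.billing_available:
            self.delivered.append(reading)
        else:
            self.cache.append(reading)

    def billing_failed(self) -> None:
        """Enter approximation mode: stop delivering, start caching."""
        self.billing_available = False

    def billing_recovered(self) -> int:
        """Replay cached readings in order so IT can reconcile accounts."""
        self.billing_available = True
        replayed = len(self.cache)
        self.delivered.extend(self.cache)
        self.cache.clear()
        return replayed


forwarder = BillingForwarder()
forwarder.record(MeterReading("08:00", "M-101", 1200.0))
forwarder.billing_failed()  # IT billing crippled by ransomware
forwarder.record(MeterReading("09:00", "M-101", 1150.0))
forwarder.record(MeterReading("10:00", "M-101", 1175.0))
print(forwarder.billing_recovered())  # 2
print(len(forwarder.delivered))       # 3
```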
- If an upstream producer depends on a royalty reporting system in IT, we could (hopefully, beforehand) negotiate with the royalty administrator so that, again, if we must declare force majeure, royalty payments could enter an approximation mode, with manual payments authorized every day or two based on approximate data. The OT systems again cache all royalty-relevant data in a historian until the payment system recovers.
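The royalty approximation mode can be pictured as interim payments based on estimated volumes, followed by a true-up once the royalty system recovers and the cached metered data is replayed. This sketch assumes a single flat price and royalty rate, both hypothetical; actual terms are whatever was negotiated with the royalty administrator. `Decimal` is used because the values are money.

```python
from decimal import Decimal


def approximate_payment(estimated_bbl: Decimal, price: Decimal, royalty_rate: Decimal) -> Decimal:
    """Manual interim payment based on approximate production data."""
    return estimated_bbl * price * royalty_rate


def true_up(interim: list[Decimal], metered_bbl: list[Decimal],
            price: Decimal, royalty_rate: Decimal) -> Decimal:
    """Amount still owed (positive) or overpaid (negative) once metered data is available."""
    owed = sum(metered_bbl, Decimal(0)) * price * royalty_rate
    return owed - sum(interim, Decimal(0))


# Hypothetical terms: flat price per barrel and a one-eighth royalty.
price = Decimal("70.00")
rate = Decimal("0.125")

# Two days of interim payments on an estimate of 1000 bbl/day ...
interim = [approximate_payment(Decimal("1000"), price, rate) for _ in range(2)]
# ... reconciled against the metered volumes cached on the OT side.
metered = [Decimal("980"), Decimal("1030")]
print(true_up(interim, metered, price, rate))
```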
- For refinery emissions data we do the same, except that there are no payments to track, only emissions data to track through a force majeure condition.
In all three cases, what we are seeing is not two kinds of network criticality, a safety-critical OT network and a business-critical IT network, but three. The third is a reliability-critical network whose assets are often mixed in with other IT assets. In the examples above, we might be able to redesign our systems so that custody transfer, royalty payments and emissions reporting can be treated as non-critical in an emergency. In general, though, such redesign may not be possible. In that case, we need to recognize that we are dealing with three network criticalities and start applying some of the TSA approach to managing the reliability-critical components in the IT network.
For example, consider the upstream royalty payment system. To manage the royalty system effectively as reliability-critical, we need to put it in its own network/DMZ and apply the TSA approach to that network as well: be wary of allowing the royalty network to rely on IT resources that may be compromised, be wary of sharing trusts between the reliability-critical DMZ and the IT network, and so on. It does no good to restore the reliability-critical systems to an uncompromised state if they still depend on Active Directory or other IT services that remain crippled by the ransomware attack.
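The Active Directory pitfall is a transitive-dependency problem: the royalty DMZ may look self-contained while one of its components still reaches back into IT. A minimal sketch, assuming a hypothetical dependency graph and zone map, walks every dependency path breadth-first from an OT system and reports the paths that end at an IT-zone service.

```python
from collections import deque

# Hypothetical dependency graph: each node lists the services it relies on.
DEPENDS_ON: dict[str, list[str]] = {
    "field control": ["royalty reporting"],
    "royalty reporting": ["royalty database", "royalty DMZ DNS"],
    "royalty database": ["IT Active Directory"],  # a trust that leaks out of the DMZ
    "royalty DMZ DNS": [],
    "IT Active Directory": [],
}

# Hypothetical zone assignment for each node.
ZONE: dict[str, str] = {
    "field control": "OT",
    "royalty reporting": "DMZ",
    "royalty database": "DMZ",
    "royalty DMZ DNS": "DMZ",
    "IT Active Directory": "IT",
}


def it_exposure(start: str) -> list[list[str]]:
    """Return every dependency path from `start` that reaches an IT-zone service."""
    paths: list[list[str]] = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if ZONE[node] == "IT":
            paths.append(path)  # stop here: this path already depends on IT
            continue
        for dep in DEPENDS_ON.get(node, []):
            if dep not in path:  # skip dependency cycles
                queue.append(path + [dep])
    return paths


for path in it_exposure("field control"):
    print(" -> ".join(path))
```

In this hypothetical graph, the restored royalty DMZ would still fail because the royalty database depends on IT Active Directory. Enumerating whole paths, rather than just reachable nodes, shows which link to break; for a large graph a plain reachability search would scale better.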
The word “resilience” is often used when discussing these dependencies between safety-critical and reliability-critical networks. In the royalty example, we might deploy unidirectional gateways at the IT/OT interfaces of the offshore platforms or oil fields to prevent any online attack from migrating from a compromised IT network into the safety-critical OT networks. If the IT network is compromised, though, we must still shut down hydrocarbon production when the royalty system fails. But if we can bring the royalty reporting system back within hours of the failure, and bring the field back to full production an hour or two after that, the result might be regarded as an acceptable worst-case outage of only a few hours.
This kind of network engineering is an example of enabling resilience: production “springs back” into operation after a brief outage, even while the bulk of the IT network is still compromised. Be aware, though, that while this kind of reliability-critical dependency analysis can improve resilience, it is not always a “silver bullet.” A petrochemical refinery, for example, can take days or longer to go from an emergency stop back to 100% of capacity. Any IT dependency that triggers even a five-minute complete shutdown of such a facility incurs that full start-up cost: days or more of lost production. Applying network engineering principles to reliability-critical IT sub-networks can save a lot of downtime in some cases, but we must still consider the realities of the physical process.
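The resilience trade-off in the last two paragraphs reduces to simple arithmetic: lost production lasts as long as the IT dependency is down, plus the time the physical process needs to return to full capacity. A minimal sketch with hypothetical figures: two hours to restart an oil field, 72 hours standing in for the “days or longer” a refinery may need, and a three-hour royalty system recovery. It treats the whole restart period as lost production, which overstates losses if output ramps up gradually.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Facility:
    name: str
    restart_hours: float  # emergency stop back to 100% capacity (hypothetical figure)


def worst_case_outage_hours(facility: Facility, dependency_outage_hours: float) -> float:
    # Production stops as soon as the IT dependency fails, and full capacity returns
    # only after the dependency is restored AND the process has restarted.
    return dependency_outage_hours + facility.restart_hours


oil_field = Facility("oil field", restart_hours=2.0)
refinery = Facility("refinery", restart_hours=72.0)

print(f"{worst_case_outage_hours(oil_field, 3.0):.1f} h")    # 5.0 h: royalty system back in 3 h
print(f"{worst_case_outage_hours(refinery, 5 / 60):.1f} h")  # 72.1 h: five-minute IT outage
```

The restart term dominates for the refinery, which is why shortening the IT outage alone does little there.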
Further reading:
This example is a small part of Chapter 5 of the author’s new book Engineering-Grade OT Security – A manager’s guide. If you found value in this article, you can request a free copy of the book, courtesy of Waterfall Security Solutions.