Cutting Through the Noise: Leveraging Operator Knowledge in Cybersecurity

You are off to a great start: Your organization has a network security monitoring solution deployed and configured. Traffic to and from your systems and critical assets is on your radar. Events and alert notifications are functioning properly and streaming in. All the necessary data about these events, alerts, and your overall environment is now at your fingertips.

So now what?

While it’s true that gaining visibility and insight into your environment is a crucial first step, it also produces a flood of events and data that can overwhelm even large, mature teams.

Network security and performance monitoring is traditionally prioritized for good reason: it’s a solid foundation on which to build a secure overall environment. However, this approach can also overlook your organization’s most knowledgeable experts, the members of your OT team: the Operators, OT Engineers, and OT Architects themselves.

It’s also worth noting that relying solely or primarily on security monitoring solutions may not account for operational metrics such as safety, resiliency, and efficiency. Many OT network security monitoring solutions offer limited customization, while others are complicated, IT-focused, or both. As a result, they may generate event notifications for nearly every anomaly regardless of its actual impact on operations, even when that event is a normal part of operations at a specific site.

This quickly results in noisy dashboards and alerts that make it difficult to see what really needs to be addressed in your environment. It can also lead to alert fatigue.

So what’s the key to better, more useful monitoring and alert analysis? Utilizing your most knowledgeable experts and codifying their knowledge into your environment’s solutions.

OT Monitoring and Visibility Isn’t One-Size-Fits-All

Every OT environment is unique, purpose-built to meet specific needs and requirements. Because of this, there is no “one configuration to rule them all.”

That means that for a visibility and security monitoring solution to be effective, it must be customized to meet the needs of that specific industrial environment. Threat intelligence can be strategically valuable, but relying on it for detection doesn’t always add value for the IT and OT teams maintaining day-to-day operations.

Because industrial control system (ICS) environments are highly specific, what monitoring software detects and flags as an indicator of compromise (IOC) may not actually be relevant to that environment. This is why operator intelligence is key: it plays an invaluable role in making monitoring effective and useful.

Operators know their systems and environments better than anyone. They know how the processes work, which devices are critical to operations, and which sites raise specific concerns. They have deep knowledge of safety systems, are familiar with preventive maintenance schedules, and much more.

These nuances of an industrial environment, intimately understood by operators themselves, are a critical part of configuring monitoring and detection software so that the resulting alerts are meaningful and actionable.

Bringing Operator Knowledge and IT Solutions Together

Effective monitoring and alerting begin with a strong foundation of communication across IT and OT teams. Improved cross-team communication also ensures everyone is on the same page and using the same language when discussing assets, environments, and systems.

The best place to start merging operator knowledge and IT solutions is a digital asset inventory. This can be as simple as a spreadsheet with the asset’s operational name and/or hostname, location (geographic, sitename, building, rack/cabinet), IP address, and MAC address.
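As a minimal sketch of how that spreadsheet can feed tooling, the Python below assumes the inventory is exported to CSV with hypothetical column names (`operational_name`, `hostname`, `site`, `building`, `rack`, `ip`, `mac`) and indexes it by IP address, so an alert that carries only an address can be shown with the operational name operators actually use. The sample row is invented for illustration.

```python
from __future__ import annotations

import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    operational_name: str
    hostname: str
    site: str
    building: str
    rack: str
    ip: str
    mac: str


def load_inventory(csv_text: str) -> dict[str, Asset]:
    """Parse a CSV export of the asset spreadsheet into Asset records keyed by IP."""
    inventory: dict[str, Asset] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        asset = Asset(
            operational_name=row["operational_name"].strip(),
            hostname=row["hostname"].strip(),
            site=row["site"].strip(),
            building=row["building"].strip(),
            rack=row["rack"].strip(),
            ip=row["ip"].strip(),
            mac=row["mac"].strip().lower(),  # normalize MAC casing for matching
        )
        inventory[asset.ip] = asset
    return inventory


def describe(ip: str, inventory: dict[str, Asset]) -> str:
    """Render an alert's IP address in the terms operators use."""
    asset = inventory.get(ip)
    if asset is None:
        return f"{ip} (not in inventory)"
    return f"{asset.operational_name} @ {asset.site}/{asset.building}/{asset.rack} ({ip})"


# Invented sample row (hypothetical column names and values).
SAMPLE = """operational_name,hostname,site,building,rack,ip,mac
Pump Station 3 PLC,plc-ps3,North Plant,B2,R4,10.0.4.21,00:1D:9C:AA:BB:01
"""

inventory = load_inventory(SAMPLE)
print(describe("10.0.4.21", inventory))  # prints: Pump Station 3 PLC @ North Plant/B2/R4 (10.0.4.21)
```

Keying by IP is a simplifying assumption; environments with DHCP, NAT, or multi-homed devices may need to match on MAC address or hostname instead.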

From there, it’s a matter of getting operator knowledge down on paper. IT teams and analysts can work with operators and OT architects to understand what counts as expected or normal behavior in an environment, even when that behavior generates an event or alert. This has the added benefit of reducing the risk of alert fatigue, which can be dangerous.

Operators can also explain any existing policies around known issues that can’t or won’t be addressed until the next maintenance window or hardware refresh.

For example, an industrial device might implement a protocol in a way that deviates from the specification, producing what appears to be a “malformed” packet that would typically generate an alert. But that event is actually expected behavior resulting from deliberate configuration, not a problem that requires attention. Operators can share this information with IT team members and help them monitor the environment more effectively.
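One way to codify this kind of operator knowledge is an explicit exception list. The sketch below is an assumption, not a feature of any particular product: it pairs a device with a specific alert type (both hypothetical values) and records the operator-supplied reason, so the alert is still logged but marked as expected.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    src_ip: str
    alert_type: str  # hypothetical label, e.g. "malformed_packet"


# Hypothetical operator-documented exceptions: (device IP, alert type) -> why it is expected.
KNOWN_EXPECTED: dict[tuple[str, str], str] = {
    ("10.0.4.21", "malformed_packet"): "Device deviates from protocol spec by deliberate configuration",
}


def triage(alert: Alert, log: list[Alert]) -> str:
    """Record every alert, then mark operator-documented cases as expected."""
    log.append(alert)  # nothing is dropped; expected alerts remain auditable
    reason = KNOWN_EXPECTED.get((alert.src_ip, alert.alert_type))
    if reason is not None:
        return f"expected: {reason}"
    return "review"


log: list[Alert] = []
print(triage(Alert("10.0.4.21", "malformed_packet"), log))
print(triage(Alert("10.0.4.22", "malformed_packet"), log))  # prints: review
```

Scoping each exception to one device and one alert type, rather than suppressing an alert type globally, means the same anomaly from any other device still surfaces for review.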

Here are some questions to ask when creating operator-informed rules for your monitoring and detection platform:

  • How do operators identify assets and their locations?
  • How does the Operational Technology (OT) team define criticality?
    • When does an event need to be addressed?
  • What does normal behavior look like during regular operations?
  • What does normal look like during a maintenance period?
  • When are regular maintenance windows? How often do they occur, and how long do they last?
  • How do operators respond to alerts from specific devices?
  • Are there any long-standing known issues that are currently being addressed, or will be addressed, by policy or planned maintenance?
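The answers to these questions can be captured as data that detection rules consult. As a rough sketch, the Python below encodes two of them, asset criticality and recurring maintenance windows, using hypothetical IP addresses, criticality labels, and schedules; it assumes each window is shorter than 24 hours and that timestamps are in the site's local time.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class MaintenanceWindow:
    weekday: int         # day the window opens: Monday=0 ... Sunday=6, as in datetime.weekday()
    start_hour: int      # local hour the window opens
    duration: timedelta  # assumed to be shorter than 24 hours

    def contains(self, ts: datetime) -> bool:
        # Check the window opening on ts's own date and on the previous date,
        # so a window that runs past midnight is still matched.
        for days_back in (0, 1):
            day = ts - timedelta(days=days_back)
            if day.weekday() != self.weekday:
                continue
            start = day.replace(hour=self.start_hour, minute=0, second=0, microsecond=0)
            if start <= ts < start + self.duration:
                return True
        return False


# Hypothetical operator answers: criticality per asset and recurring maintenance windows.
CRITICALITY = {"10.0.4.21": "high", "10.0.7.5": "low"}
WINDOWS = {"10.0.7.5": [MaintenanceWindow(weekday=6, start_hour=22, duration=timedelta(hours=4))]}


def needs_attention(ip: str, ts: datetime) -> bool:
    """Flag an alert unless it is maintenance-window activity on a non-critical asset."""
    in_window = any(w.contains(ts) for w in WINDOWS.get(ip, []))
    if in_window and CRITICALITY.get(ip) != "high":
        return False  # treated as expected maintenance activity
    return True


print(needs_attention("10.0.7.5", datetime(2024, 6, 9, 23, 0)))  # Sunday 23:00, in window; prints: False
print(needs_attention("10.0.7.5", datetime(2024, 6, 10, 3, 0)))  # Monday 03:00, window closed; prints: True
```

Here alerts from a non-critical asset during its maintenance window are not flagged, while high-criticality assets are always flagged. That policy is an illustrative assumption; each site's operators would define their own.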

Monitoring & Detection Technology + Operator Knowledge = Actionable insights that will help to secure and improve the overall safety and efficiency of an operation.

Alarm Management Style Monitoring for OT Environments

Monitoring and detection can and should be tailored to the specific needs of each OT environment to reduce alert fatigue.

Normalized, curated, and classified events on a dashboard allow smaller teams to prioritize high-impact, actionable alerts rather than trying to dig through and decipher what is or isn’t relevant to their operations.

In practice, every alert is still collected and logged, but it isn’t flagged for attention unless it’s actively impacting operations or is deemed critical to other systems.
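A minimal sketch of this two-tier approach, assuming events already carry a hypothetical `impacts_operations` flag derived from operator-defined rules, and approximating "critical to other systems" with a set of critical assets that operators have named. Every event goes to the log; only qualifying events reach the attention list a dashboard would display.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    asset: str
    kind: str
    impacts_operations: bool  # hypothetical flag set by operator-defined rules


@dataclass
class AlarmBoard:
    critical_assets: set[str]
    log: list[Event] = field(default_factory=list)        # every event, for audit and correlation
    attention: list[Event] = field(default_factory=list)  # what a dashboard would surface

    def ingest(self, event: Event) -> None:
        self.log.append(event)
        if event.impacts_operations or event.asset in self.critical_assets:
            self.attention.append(event)


board = AlarmBoard(critical_assets={"plc-ps3"})
board.ingest(Event("hmi-2", "new_connection", impacts_operations=False))  # logged only
board.ingest(Event("plc-ps3", "firmware_read", impacts_operations=False))  # critical asset: flagged
board.ingest(Event("rtu-9", "comm_loss", impacts_operations=True))  # impacting operations: flagged
print(len(board.log), len(board.attention))  # prints: 3 2
```

Keeping the full log, rather than discarding unflagged events, is what makes the correlation described next possible.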

Operator knowledge helps correlate related events and alerts and identify when issues need to be escalated. For example, three discrete alerts may not warrant action individually, but the presence of all three together may call for investigating and fixing the underlying cause.

For an operational example, suppose one alarm indicates a pressure overage and another indicates that the pressure dipped, but the system pressure eventually normalizes. In this case, the events are logged but not immediately flagged for attention, since the system is still operating within acceptable parameters.

But if another alarm indicates that a valve isn’t working properly, and that valve malfunction may have caused the pressure fluctuations, the related alarms now need to be escalated and attended to.
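The pressure and valve scenario can be expressed as a simple correlation rule. The sketch below uses hypothetical alarm tags and a hypothetical 15-minute correlation window: pressure swings that have since normalized are not escalated on their own, but a valve fault close in time to them, or pressure that has not returned to normal, triggers escalation. It models only this one rule; a valve fault by itself would be handled by whatever rule already covers it.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Alarm:
    tag: str        # hypothetical alarm tag, e.g. "PRESSURE_HIGH"
    time: datetime


PRESSURE_TAGS = {"PRESSURE_HIGH", "PRESSURE_LOW"}
VALVE_TAGS = {"VALVE_FAULT"}


def should_escalate(
    alarms: list[Alarm],
    pressure_normal: bool,
    window: timedelta = timedelta(minutes=15),
) -> bool:
    """Escalate pressure alarms that have not cleared, or that co-occur with a valve fault."""
    pressure = [a for a in alarms if a.tag in PRESSURE_TAGS]
    valve = [a for a in alarms if a.tag in VALVE_TAGS]
    if pressure and not pressure_normal:
        return True  # still outside acceptable parameters
    # A valve fault close in time to the pressure swings suggests a shared root cause.
    return any(abs(v.time - p.time) <= window for v in valve for p in pressure)


t0 = datetime(2024, 1, 1, 8, 0)
swings = [Alarm("PRESSURE_HIGH", t0), Alarm("PRESSURE_LOW", t0 + timedelta(minutes=3))]
print(should_escalate(swings, pressure_normal=True))  # prints: False; the swings are only logged
print(should_escalate(swings + [Alarm("VALVE_FAULT", t0 + timedelta(minutes=5))], pressure_normal=True))  # prints: True
```

The window length is the kind of parameter operators are best placed to set, since they know how quickly a valve problem shows up in process pressure.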

Similarly, a cybersecurity example would be a device whose latency has been slowly increasing relative to the network’s average latency. Since that alone doesn’t affect operations, the issue wouldn’t need to be addressed immediately.

But if that same device suddenly starts generating unexpected or entirely new traffic, this could indicate a misconfiguration or even malicious activity. Now the alert is an actionable issue that needs to be prioritized.
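A rough sketch of that escalation logic, assuming latency samples in milliseconds and a traffic baseline expressed as hypothetical (peer IP, port) pairs. A simple ratio against the network average (1.5 here, an arbitrary choice) stands in for real trend analysis. The post does not say how new traffic from a device without latency drift should be treated, so returning "review" in that case is an assumption.

```python
from __future__ import annotations

from statistics import mean


def latency_drifting(samples_ms: list[float], network_avg_ms: float, ratio: float = 1.5) -> bool:
    """Crude drift check: recent device latency well above the network average."""
    return mean(samples_ms) > network_avg_ms * ratio


def classify_device(
    samples_ms: list[float],
    network_avg_ms: float,
    baseline_flows: set[tuple[str, int]],
    current_flows: set[tuple[str, int]],
) -> str:
    drifting = latency_drifting(samples_ms, network_avg_ms)
    new_flows = current_flows - baseline_flows  # traffic never seen during baselining
    if drifting and new_flows:
        return "act"     # drift plus unexpected traffic: prioritize now
    if drifting:
        return "watch"   # log and trend; no immediate impact on operations
    if new_flows:
        return "review"  # assumption: the post does not cover this case
    return "normal"


baseline = {("10.0.4.1", 502)}  # hypothetical: device normally talks Modbus/TCP to one peer
print(classify_device([12.0, 14.0, 16.0], 8.0, baseline, baseline))  # prints: watch
print(classify_device([12.0, 14.0, 16.0], 8.0, baseline, baseline | {("203.0.113.7", 443)}))  # prints: act
```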

Actionable Alarms Start with Visibility 

Ultimately, creating an effective alarm management-style monitoring workflow still starts with visibility across your environment. After all, you can’t address alerts for devices and sites you can’t see in the first place.

Need true visibility without the clutter? EmberOT runs as lightweight software on what you already have, reaches the edge, and streams curated events to wherever you work. Our discrete flow-based detection filters noise at the source and delivers only meaningful context to your SIEM, SOAR, or data lake. Then our Now Next Never prioritization turns alert floods into clear next steps so your team moves faster.

Contact us to learn more or schedule a demo.