Using AI for real-time attack simulation at scale (part 3)

Photo by W W (https://www.pexels.com/@w-w-299285/)

This is the final article in this three-part series, describing the process we follow at Snode Technologies to go from idea (part 1), through design (part 2), to prototype (part 3).

First demo of our prototype (BSides Joburg: https://pretalx.com/bsides-joburg-2024/talk/CDAUVR/)

Quick (optional) recap

A quick summary of our collective journey to this point:

Part 1 - Ideation: the journey from problem statement to initial idea generation.

Part 2 - Design: an approach to research, experimentation and solution design.

Part 3 - Prototype: using hypothesis testing to develop and evolve a prototype.

In the first part, we decided to use AI-based attack simulation modelling to help us prioritise threat and vulnerability response. Attack simulation modelling, using real-time threat and vulnerability data, would identify the risks most likely to be realised.

High level process overview (Part 1)

In the second installment, we walked you through the design process. We designed high-level entity-relationship, context, process and data-flow diagrams to help us unravel the overall approach for prototype development. We also mapped key data sources.

Data Flow Diagram example (Part 2)
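
To give a flavour of those artefacts, here is a minimal sketch of such a data-flow diagram in Graphviz dot notation (one of the tools we used, as quoted below). The node names are hypothetical simplifications of the flow from part 2:

    # dfd.dot - a hypothetical, simplified data-flow diagram for the prototype
    cat > dfd.dot <<'EOF'
    digraph dfd {
      rankdir=LR;
      "Nmap / Nessus scans"      -> "Asset classifier (Bayes)";
      "Asset classifier (Bayes)" -> "Attack simulation (AI)";
      "Threat intelligence"      -> "Attack simulation (AI)";
      "Attack simulation (AI)"   -> "Prioritised attack graph";
    }
    EOF
    dot -Tsvg dfd.dot -o dfd.svg   # render the diagram with Graphviz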

Part 3 - Prototype

The first prototype, as mentioned in part 1, is always an early form of hypothesis test:

  1. "the first output that Snode's AI-ASM generated (we used Nessus, Nmap, Graphviz and MLpack)".
  2. "we used Nmap outputs to classify assets (with Bayes) and identify weaknesses (with custom NSE scripts)".

We often test our core assumptions in this way. We focus our work on the simplest forms of (1) data inputs, (2) processing and (3) tooling needed to produce an output. This is our version* of an MVP (minimum viable product), but we never release it. We (Snode) only release what you might call an MQP (minimum quality product). Our "MQP" approach ensures we reflect our values of innovation and excellence in our client beta releases.

*I'm not saying Eric Ries implied "low-quality product" when he originally used the term MVP:

Eric Ries' book "The Lean Startup" (https://theleanstartup.com/)
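
Back to the prototype: to make the quoted pipeline (Nmap, Bayes, Graphviz) concrete, here is a minimal sketch of that simplest form, assuming hypothetical targets and filenames (and remember that scanning requires authorisation):

    # (1) data input: service and weakness discovery with Nmap and its vuln NSE scripts
    nmap -sV --script vuln -oN scan.txt 192.0.2.0/24
    # (2) processing: extract the open services for classification
    grep " open " scan.txt > services.txt
    # (3) tooling: render the resulting attack graph with Graphviz
    # (attack-graph.dot is assumed to come from the classification step)
    dot -Tpng attack-graph.dot -o attack-graph.png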

MVP hypothesis testing

Let's use a simple example of news classification to identify vendor-specific issues.

First, we need training data. We always look for accurate data sources that are well structured for classification. For example, CISA's Known Exploited Vulnerabilities (KEV) dataset, available as a CSV export, would be ideal. Let's use it for model training in this experiment.

Known Exploited Vulnerabilities CSV export on CISA's website
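
A minimal sketch of pulling the dataset, assuming CISA's public CSV export URL (current at the time of writing):

    # download the Known Exploited Vulnerabilities catalogue as CSV
    curl -sO https://www.cisa.gov/sites/default/files/csv/known_exploited_vulnerabilities.csv
    # sanity check: print the column headers
    head -1 known_exploited_vulnerabilities.csv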

Reviewing the (CSV) data, you can easily identify the opportunity to classify by vendor.

CSV data specifies the vendor in each data record
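
A quick way to eyeball the vendors, assuming the vendor sits in the second column (vendorProject); this is a naive comma split, so heavily quoted records can skew the counts:

    # rough count of KEV records per vendor (naive CSV split, fine for a first look)
    cut -d, -f2 known_exploited_vulnerabilities.csv | sort | uniq -c | sort -rn | head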

With many attackers targeting internet-facing VPN (virtual private network) services, this seemed like a good test case. We just need a Bayes classification tool, like dbacl.

Installing dbacl using brew.
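
On macOS this is a one-liner (assuming Homebrew is already installed):

    # install the dbacl Bayesian text classifier
    brew install dbacl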

Now let's get some relevant headlines for two VPN vendors to do some model testing.

First news headline I got from Google (searching "palo alto vulnerability").
First news headline I got from Google (searching "fortinet vulnerability").
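
For the sketches that follow, each headline goes into its own test file. The text below is a hypothetical stand-in - substitute whatever your own search returns:

    # hypothetical stand-ins for the headlines found via Google
    echo "Palo Alto Networks warns of critical PAN-OS VPN vulnerability" > paloalto_headline.txt
    echo "Fortinet patches actively exploited FortiOS SSL-VPN flaw" > fortinet_headline.txt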

Let's create our training data sets, using the CISA Known Exploited Vulnerabilities CSV.

Using CISA (CSV) data to build training data sets for our classification model.
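
A minimal sketch, assuming a simple case-insensitive match on the vendor name is good enough for a first pass (a production pipeline would filter on the vendorProject field with a proper CSV parser):

    # build one training corpus per vendor from the KEV records
    grep -i "fortinet"  known_exploited_vulnerabilities.csv > fortinet_training.txt
    grep -i "palo alto" known_exploited_vulnerabilities.csv > paloalto_training.txt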

Now we train classifier models to identify headlines related to Fortinet and Palo Alto.

Training classification models using dbacl.
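
With dbacl, -l learns a category from the given text, so we train one category per vendor:

    # learn a Bayesian category for each vendor from its training corpus
    dbacl -l fortinet fortinet_training.txt
    dbacl -l paloalto paloalto_training.txt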

Lastly, we test whether the news headlines are correctly classified by vendor (model testing).

Testing the model classification using the news headlines.
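
Classification compares each headline against both trained categories; with -v, dbacl prints the name of the best-matching category:

    # classify each headline against the two trained categories
    dbacl -c fortinet -c paloalto -v fortinet_headline.txt   # expected output: fortinet
    dbacl -c fortinet -c paloalto -v paloalto_headline.txt   # expected output: paloalto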

Now we just need to do the same for every software and network vendor ever! Which means we have solved the initial problem - we simply need to SCALE our solution. ;)

I'm only half joking... having a solution that doesn't SCALE (yet) is a great position to be in!

We just moved from a "known-unknown" to a "known-known". So, in summary:

  1. We always start with the "known-unknowns" and solve the problem statement.
  2. We build "known-knowns", like interfaces, last - because we know how to do it.
  3. We start with the simplest form of (1) input, (2) process, (3) code; and SCALE it.

MQP: releasing the beta

The beta is always a combination of solving the problem and delivering a great user experience.

In the case of the attack simulation, we went from a flat monochrome diagram to a fully interactive, real-time network visualisation. This allowed the user to digest the information visually and glean insights faster, without the cognitive overload of multiple bar graphs, pie charts and traffic-light risk ratings (as shown below):

Moving to a real-time "Digital Twin" interactive visualisation.

Conclusion

The platform described here made it into production. At the time of writing, several clients are running this new technology. I hope this helps and inspires your next innovation.

I'll produce a video and article showcasing the new AI platform - subscribe to get it.

References

I've seen you get hacked! (AI Real-Time Attack Simulation), BSides Joburg 2024 (https://pretalx.com/bsides-joburg-2024/talk/CDAUVR/)
Imagine running multiple threat models, attack trees and graphs - simultaneously - on real-time asset cartography, vulnerability data and threat intelligence. Leveraging AI for predictive analytics, you could proactively defend regardless of the dynamics and turbulence presented in the emerging technology, attacker or vulnerability landscape. This is how we did it - and what we learnt.