Research Paper Dissection - Forecasting Cyber Threats

Picture by Rafael Mendoza (https://www.pexels.com/)

This post is based on the research paper written by:

Zaid Almahmoud, Paul D. Yoo, Ernesto Damiani, Kim-Kwang Raymond Choo, Chan Yeob Yeun

What makes this paper topical in a Snode Technologies context:

It talks about using real-time data to preempt cyber attacks.
It argues that expert analysis suffers from human bias and inaccuracy.

If you haven't already, check out our solution for AI-based Attack Simulation:

Short-term cyber threat prediction for proactive defence.

I cover this problem in detail in that post; however, our solution looks at short-term prediction for proactive incident response. The paper acknowledges that this is a valuable application but focuses on long-term prediction instead. It notes that long-term prediction, often overlooked by researchers, is key to a proactive defence strategy.

I agree. Also, I would love to know how the paper's forecasts differ from the vendor predictions for 2025.

In this post we will cover the following:

Most notable - concepts;
Most notable - findings; &
All important - so what?

Key concepts

Protection Motivation Theory (PMT)

I love it when psychology and cyber collide! PMT originates in psychology and has since been applied in other domains, but it has not been used extensively in the cyber domain. PMT assesses a threat by its associated severity and vulnerability.

Additionally, it assesses coping (mitigation) strategies. The paper defines tactical (compensating) security controls as Pertinent Alleviation Technologies (PATs) and tracks how they trend over time.

Bayesian variation of MTGNN (B-MTGNN)

Simply put, it's a combination of a Multivariate Time-series Graph Neural Network (MTGNN) with Bayesian inference (my Swiss Army knife). At a high level, it combines historical trend analysis with current (real-time) observations to produce a prediction (as shown below).

Researchers' analytics framework with their novel B-MTGNN approach (taken from: https://eprints.bbk.ac.uk/id/eprint/54443/18/54443.pdf)
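The idea of blending a historical trend with a fresh observation can be illustrated with a much simpler Bayesian update than the paper's model. The sketch below is not B-MTGNN; it is a minimal conjugate normal-normal update, and the `Gaussian` class, the `bayes_update` function and every number in it are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Gaussian:
    """A belief about next period's incident count: a mean and a variance."""
    mean: float
    var: float


def bayes_update(prior: Gaussian, observation: float, obs_var: float) -> Gaussian:
    """Conjugate normal-normal update.

    Each source is weighted by its precision (1 / variance), so the less
    uncertain source pulls the posterior mean harder.
    """
    prior_precision = 1.0 / prior.var
    obs_precision = 1.0 / obs_var
    post_var = 1.0 / (prior_precision + obs_precision)
    post_mean = post_var * (prior_precision * prior.mean + obs_precision * observation)
    return Gaussian(post_mean, post_var)


# Hypothetical numbers, invented for illustration only.
historical = Gaussian(mean=100.0, var=400.0)  # belief from the long-run trend
posterior = bayes_update(historical, observation=140.0, obs_var=100.0)
print(posterior)
```

With these made-up inputs the posterior mean lands between the historical belief and the new observation, closer to whichever is less uncertain, and the posterior variance is smaller than either input.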

If you find it easier to read the source (code) and unpack the raw data (files):

Note the similarities to the approach we used at Snode Technologies:

Additionally, note the similar Machine Learning approach used in our patent:

US Patent Application for REAL-TIME THREAT DETECTION FOR ENCRYPTED COMMUNICATIONS (Application #20230156034 issued May 18, 2023) - Justia Patents Search

A system and method for real-time threat detection for encrypted communications are provided. A method includes monitoring a data stream in a network, such as an M2M network, including encrypted message data and non-encrypted metadata associated with the encrypted message data being transmitted between endpoints on the network. The method includes extracting data stream metadata from the data stream including data points extracted from the non-encrypted metadata. The method includes enriching the data stream metadata with contextual data relating to one or more of threat, vulnerability and reputation data points and being obtained from one or more signal sources to output enriched data. The enriched data is analysed and a risk probability score associated therewith is calculated. An action is initiated in accordance with the risk probability score so as to mitigate a threat present on the network.


Findings

Expert (a.k.a. vendor) vs. AI (data-driven) predictions

So, straight to the point: is there a difference? Yes. For example (taken from the paper), the trend for malware and ransomware is escalating, but the PATs that mitigate them are trending at a lower priority. Keep in mind that this is not an opinion but a data-driven finding. Let's look at a few examples that were noted:

Malware is peaking - but, FIM (file integrity monitoring) is low priority; and
Ransomware is increasing - but, AW (application whitelisting) is low priority.

There is a clear disparity between potential attacks and relevant security measures.
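A disparity like this can be checked mechanically once threat and PAT trends sit side by side. The sketch below is one possible way to do it, not the paper's method: it fits a least-squares slope to each series and flags pairs where the threat is rising while its control is flat or falling. The malware/FIM and ransomware/AW pairings come from the examples above; the third pair, the `slope` and `find_gaps` helpers and all index values are hypothetical.

```python
def slope(series):
    """Least-squares slope of a series against its index (0, 1, 2, ...)."""
    n = len(series)
    x_mean = (n - 1) / 2
    y_mean = sum(series) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(series))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


def find_gaps(threats, pats, pairs):
    """Return (threat, pat) pairs where the threat rises but the control does not."""
    return [
        (threat, pat)
        for threat, pat in pairs
        if slope(threats[threat]) > 0 and slope(pats[pat]) <= 0
    ]


# Hypothetical yearly index values, invented for illustration only.
threats = {
    "malware": [10, 14, 19, 25],
    "ransomware": [5, 8, 13, 20],
    "other threat": [9, 9, 8, 8],  # hypothetical pair that should not be flagged
}
pats = {
    "FIM": [6, 6, 5, 5],
    "AW": [4, 4, 4, 3],
    "other control": [3, 4, 5, 6],
}
pairs = [("malware", "FIM"), ("ransomware", "AW"), ("other threat", "other control")]

gaps = find_gaps(threats, pats, pairs)
print(gaps)
```

A slope is a crude stand-in for a forecast, but it is enough to show the shape of the check: the comparison is between the direction of the threat and the direction of the control meant to contain it.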

Call Bayes,... and do beta

I'm a firm believer in the Bayesian approach, except if it's the name of a boat. The research supports this view. The Bayesian variation of MTGNN was found to perform better: it was more reliable and remained robust even with uncertain data.
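What a Bayesian-style forecaster buys you is a spread around the prediction, not just a single number. One common way to approximate this is to run a model many times with random perturbation and report the mean and standard deviation of the results. This post does not say whether the paper does exactly that, so treat the following as a toy stand-in: the "model" is just an average over randomly dropped inputs, and `noisy_forecast`, `predict_with_uncertainty`, `drop_prob` and the sample history are all hypothetical.

```python
import random
import statistics


def noisy_forecast(history, rng, drop_prob=0.2):
    """One stochastic pass: average the history after randomly dropping inputs."""
    kept = [v for v in history if rng.random() >= drop_prob]
    if not kept:  # keep at least one input so the average is defined
        kept = [history[-1]]
    return sum(kept) / len(kept)


def predict_with_uncertainty(history, passes=500, seed=0):
    """Repeat the stochastic pass and summarise the samples as (mean, spread)."""
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    samples = [noisy_forecast(history, rng) for _ in range(passes)]
    return statistics.mean(samples), statistics.stdev(samples)


# Hypothetical incident counts, invented for illustration only.
history = [10, 12, 15, 19, 24]
mean, spread = predict_with_uncertainty(history)
print(f"forecast: {mean:.1f} +/- {spread:.1f}")
```

A wide spread is a signal that the prediction leans heavily on a few inputs and should be trusted less; a narrow one means the estimate barely moves when inputs go missing.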

So, what?

Firstly, we should supplement (potentially biased and inaccurate) human judgement with quantitative (data-driven) analysis for a more effective cyber defence strategy.

Secondly, consider using (automated, AI-based) long-term prediction to drive your cyber defence strategies, policies, standards and technology architecture evolution.

Finally, we should be aware of our own bias towards novel techniques (shiny new things) and not neglect the old, tried and trusted methods, which are still efficient and effective.

If you would like more information on AI-based cyber defence technologies contact: Snode Technologies.