Honeypot v2 progress: Sebek deployed

Following the honeypot range expansion, I have deployed Sebek on the high-interaction host that sits behind one of the Honeyd personas.

This is a short writeup of the deployment and of what Sebek captures that the previous logging did not.

What Sebek does

Sebek is a kernel module, patched into the Linux kernel, that captures every read syscall made by every process and sends the captured data in covert UDP packets to a configurable destination.

The key property: the capture happens at the kernel level, below the userland tools that an attacker who has compromised the host typically relies on. The packets that carry the captured data are constructed so that the standard tools on the source host (tcpdump, netstat) cannot see them. From the attacker's perspective, the host appears to be generating no unusual network traffic; from the sensor's perspective, every read syscall is captured.

For a honeypot, this is exactly the right capability. An attacker installing a kernel rootkit can hide their files and processes from on-host inspection. Sebek's capture is already off-host before the rootkit's hiding takes effect.

Deployment

The procedure took an evening:

  1. Apply the Sebek patch to the kernel source. The patch is small and well-targeted.
  2. Configure the destination address (the off-host sensor that will receive the captured data).
  3. Build and install the patched kernel.
  4. Reboot.
  5. Configure the Sebek module to load at boot (it cannot be loaded later; loading at boot time is part of the stealth).
  6. Verify on the sensor that captured data is arriving.

The sensor side runs sbk_extract, a small utility that decodes the covert UDP packets into readable form. The output is essentially a stream of "PID X read N bytes from FD Y, content was [...]".
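
To make the shape of that output concrete, here is a minimal Python sketch of decoding one record and printing it in that form. The record layout (three 32-bit fields, a 12-byte command name, a length, then the payload) is an assumption invented for the example. It is not Sebek's actual wire format, and sbk_extract is what handles the real one.

```python
import struct
from dataclasses import dataclass

# ASSUMED layout, for illustration only (not Sebek's real packet format):
# pid, uid, fd as unsigned 32-bit big-endian integers, a 12-byte
# NUL-padded command name, a 32-bit payload length, then the payload.
HEADER = struct.Struct("!III12sI")


@dataclass
class ReadRecord:
    pid: int
    uid: int
    fd: int
    command: str
    data: bytes


def decode_record(packet: bytes) -> ReadRecord:
    """Decode one record in the assumed layout, rejecting short input."""
    if len(packet) < HEADER.size:
        raise ValueError("packet shorter than header")
    pid, uid, fd, raw_command, length = HEADER.unpack_from(packet)
    data = packet[HEADER.size:HEADER.size + length]
    if len(data) != length:
        raise ValueError("truncated payload")
    command = raw_command.rstrip(b"\x00").decode("ascii", "replace")
    return ReadRecord(pid, uid, fd, command, data)


def describe(record: ReadRecord) -> str:
    """Render a record in the 'PID X read N bytes from FD Y' style."""
    return (
        f"PID {record.pid} ({record.command}) read {len(record.data)} "
        f"bytes from FD {record.fd}, content was {record.data!r}"
    )
```

Feeding it a packet built with HEADER.pack(1234, 0, 0, b"bash", 2) + b"ls" gives a record for PID 1234 reading two bytes from FD 0.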

What is captured

The captured data includes:

  • Every shell command typed by an attacker (since the shell reads its input from stdin).
  • Every file read (including files the attacker views with cat, less, or any other read-based tool).
  • Network reads, partially (depending on protocol details).
  • Inter-process communication via pipes.

The capture covers all read activity, including legitimate system reads, so the volume is large: about 50 MB per day on a quiet honeypot. The interesting parts are extracted from the raw capture downstream.
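
As a sketch of what that downstream filtering can look like, the Python below keeps only reads on FD 0 by shell processes and reassembles the keystrokes into command lines. The Read record type, the list of shell names, and the assumption that an interactive shell's input arrives in small fragments are all choices made for this example, not a description of Sebek's own tooling.

```python
from collections import defaultdict
from typing import Iterable, List, NamedTuple, Tuple


class Read(NamedTuple):
    """Hypothetical decoded record: one read syscall by one process."""
    pid: int
    fd: int
    command: str
    data: bytes


# Assumed set of process names to treat as interactive shells.
SHELLS = {"sh", "bash", "ksh", "csh", "tcsh", "zsh"}


def shell_commands(reads: Iterable[Read]) -> List[Tuple[int, str]]:
    """Return (pid, command line) pairs rebuilt from stdin reads by shells."""
    buffers = defaultdict(bytearray)  # partial line per shell PID
    commands = []
    for read in reads:
        # Drop everything that is not a shell reading from stdin.
        if read.fd != 0 or read.command not in SHELLS:
            continue
        buf = buffers[read.pid]
        for byte in read.data:
            if byte in (0x0A, 0x0D):  # newline or carriage return ends a line
                line = buf.decode("utf-8", "replace").strip()
                if line:
                    commands.append((read.pid, line))
                buf.clear()
            elif byte in (0x08, 0x7F):  # backspace or delete removes one byte
                del buf[-1:]
            else:
                buf.append(byte)
    return commands
```

Everything else in the capture (file reads, pipe traffic, system noise) is skipped by this filter and would need its own pass.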

What I have observed in two weeks

The honeypot has had three compromise events since Sebek was deployed. All three produced clean captures.

For the first, a relatively unsophisticated attacker who installed an off-the-shelf script-kiddie toolkit, the Sebek capture was redundant with the file-system observations. The attacker did nothing to hide their files; standard inspection was enough.

For the second, a more careful attacker who installed a rootkit, the Sebek capture was the only source of information about what they actually did. The on-host file system showed no recent changes (the rootkit was hiding them); standard logs showed no relevant entries (the rootkit was modifying them); Sebek captured the rootkit installation in detail, including the source code of the rootkit's primary module.

The difference is operationally substantial. Without Sebek, this compromise would have been mostly invisible.

What this teaches

Off-host observation is the only reliable observation. Anything the attacker can reach can be tampered with. Evidence that has already left the host is, by construction, beyond the reach of anything the attacker does on that host.

Sebek is the right tool for high-interaction honeypots. The visibility it produces is essentially complete for the categories of activity it covers.

The same principle applies to production hosts. A production host with off-host observation is dramatically more investigatable than one without. The Sebek-style capture is overkill for production; the principle — keep the evidence off the compromised host — is exactly right.

For my own infrastructure: I am not deploying Sebek on production hosts (the operational complexity is not justified for the value), but the off-host logging principle continues to be applied broadly.
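
One low-cost way to apply the principle is to ship application logs to a remote collector as they are written, so a later compromise cannot rewrite them. The sketch below uses only Python's standard library. The function name, the log format, and the collector address are placeholders for the example, not a description of my actual setup.

```python
import logging
import logging.handlers


def make_offhost_logger(name: str, collector_host: str,
                        collector_port: int = 514) -> logging.Logger:
    """Return a logger that sends each record to a remote syslog collector.

    SysLogHandler sends over UDP by default, so records leave the host
    immediately instead of sitting in a local file an attacker could edit.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.handlers.SysLogHandler(
        address=(collector_host, collector_port)
    )
    handler.setFormatter(
        logging.Formatter("%(name)s: %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger
```

A caller would do something like make_offhost_logger("webapp", "loghost.example.net") once at startup and log through the returned object. UDP delivery is not guaranteed, so this trades reliability for simplicity; a TCP or TLS transport is the usual next step.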

More as captures accumulate.

