Honeypot v2: design notes · Peter Bassill

I have been running my small honeypot for nearly a year. It has been useful — about 4,000 logged interactions, a few interesting captures — but it is a low-interaction honeypot. It emulates services well enough to be probed, not well enough to be compromised. The data it generates is breadth, not depth.

After a year of following the Honeynet conversation and a lot of reading, I am rebuilding. The new design is high-interaction: a real machine, with real services, that I am willing to let attackers actually compromise. The hard problems are containment, observation, and not breaking the law accidentally.

This post is the design notes. The actual build is the next two months of evenings.

What the new honeypot is supposed to do

The goal is to capture post-compromise attacker behaviour. Not the scan-and-probe traffic that hits anything on the public internet, but the activity of an attacker who has actually got a shell and is doing something with it.

This kind of data is rare. It is also enormously valuable: it shows what real attackers do once inside. The defensive implications are large — if you know the typical post-compromise sequence, you know what indicators to look for in your own logs.

Low-interaction honeypots cannot generate this data. The attacker never gets in.

The architectural choices

The key decisions, with the trade-offs I have considered.

Real OS, real services, real shell. The honeypot host runs a real Linux installation with a real shell. An attacker who exploits it gets actual root, runs actual commands, sees actual file system contents. The realism is what produces the data.

This means the honeypot is, by design, susceptible to compromise. That is the whole point. The trade-off is that the honeypot cannot be trusted to contain itself: all containment has to come from outside the host operating system.

Network-level containment. The honeypot sits behind a hardened firewall — a separate machine running OpenBSD — that mediates all traffic to and from the honeypot.

The firewall's policy is restrictive in the outbound direction. The attacker, having compromised the honeypot, cannot use it as a launching pad for attacks against other targets. Specifically:

No outbound connections to arbitrary destinations. Only the small set needed for plausible operation (a DNS resolver, possibly NTP, a single decoy mail relay).
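No spoofed or out-of-state packets out. The firewall cannot tell whether a packet came from a raw socket, but it can drop anything that does not carry the honeypot's own source address or belong to a tracked connection, which blunts flood-style attacks.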
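Strict rate limiting on outbound traffic, in total and per destination (the honeypot is the only source, so limiting per source IP alone would say little).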
Logging at the firewall of every packet to and from the honeypot.
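The outbound rules above can be modelled as a small decision function. This is a minimal Python sketch of the policy, not firewall configuration: the addresses (taken from the documentation ranges), ports, and rate figures are invented for illustration, and keying the limiter per destination is an assumption. The real enforcement would live in the OpenBSD firewall's own ruleset.

```python
# Hypothetical allowlist of (destination IP, protocol, port).
# All values are placeholders, not the real deployment.
ALLOWED = {
    ("192.0.2.53", "udp", 53),    # DNS resolver
    ("192.0.2.123", "udp", 123),  # NTP
    ("192.0.2.25", "tcp", 25),    # decoy mail relay
}


class TokenBucket:
    """Permit `rate` packets per second, with bursts of up to `burst`."""

    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.tokens = burst
        self.last = None

    def allow(self, now):
        # Refill in proportion to the time elapsed since the last packet.
        if self.last is not None:
            refill = (now - self.last) * self.rate
            self.tokens = min(self.burst, self.tokens + refill)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def decide(dst_ip, proto, port, now, buckets, rate=5.0, burst=10.0):
    """Return 'drop', 'throttle', or 'pass' for one outbound packet."""
    if (dst_ip, proto, port) not in ALLOWED:
        return "drop"  # default deny: not on the allowlist
    bucket = buckets.setdefault(dst_ip, TokenBucket(rate, burst))
    return "pass" if bucket.allow(now) else "throttle"
```

The shape matters more than the numbers: deny by default, allow a short explicit list, and rate-limit even the traffic that is allowed. Whatever the decision, the packet would still be logged.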

The honeypot can pretend it has internet access. Specific sites it might want to fetch from will be transparently redirected to a controlled fake server, or refused with a plausible error.

This is the part that takes the most engineering. Getting it right is the difference between a useful research tool and an unwitting attack platform.

Observation at multiple layers. The honeypot is observed at several layers, and every record is kept outside it, never only on it.

At the firewall, every packet is captured (tcpdump writing to a circular buffer of the last 24 hours). This gives me the network view of everything the attacker does.
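One way to approximate a 24-hour circular buffer is to have the capture split into files and run a small job that deletes anything older than the retention window. The sketch below shows only the pruning half, and it rests on assumptions: that captures land as `*.pcap` files in a single directory, and that a file's modification time is a fair proxy for the age of its newest packet. The function name is illustrative.

```python
import time
from pathlib import Path

RETENTION_SECONDS = 24 * 60 * 60  # keep the last 24 hours


def prune_captures(capture_dir, now=None, retention=RETENTION_SECONDS):
    """Delete *.pcap files last modified more than `retention` seconds ago.

    Returns the names of the deleted files, sorted.
    """
    now = time.time() if now is None else now
    removed = []
    for path in Path(capture_dir).glob("*.pcap"):
        if now - path.stat().st_mtime > retention:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```

The file currently being written always has a recent modification time, so it is never pruned. Anything worth keeping beyond 24 hours has to be copied out before the window closes.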

At the OS level, the kernel is patched to log certain events (process creation, file access, network operations) to a remote syslog server. The log destination is a separate machine outside the honeypot that the attacker has no access to.

At the hardware level, and this is the most paranoid part, there is a serial console connection that records everything that goes to the system console, including kernel messages and any console-level activity, to a host the attacker cannot reach.

The principle: any record that only exists on the honeypot is potentially modifiable by the attacker. Real evidence has to live elsewhere.
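A possible refinement, which goes beyond the design described here, is to make the off-host record tamper-evident as well as out of reach. In the sketch below the collector stamps each record with its own receive time (the honeypot's clock being under the attacker's control) and chains records together with SHA-256, so a later edit to any earlier record shows up on verification. The record layout and function names are illustrative assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous digest" for the first record
FIELDS = ("received_at", "source", "message", "prev")


def _digest(body):
    # Serialise with sorted keys so the same record always hashes the same.
    encoded = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_record(chain, received_at, source, message):
    """Append a record whose digest also covers the previous digest."""
    prev = chain[-1]["digest"] if chain else GENESIS
    body = {
        "received_at": received_at,  # collector's clock, not the honeypot's
        "source": source,
        "message": message,
        "prev": prev,
    }
    record = dict(body, digest=_digest(body))
    chain.append(record)
    return record


def verify(chain):
    """Return True only if no record has been altered, dropped, or reordered."""
    prev = GENESIS
    for record in chain:
        body = {key: record[key] for key in FIELDS}
        if record["prev"] != prev or record["digest"] != _digest(body):
            return False
        prev = record["digest"]
    return True
```

This does not stop someone with access to the collector from rewriting the whole chain, so it complements keeping the collector unreachable rather than replacing it.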

Decoy services. The services running on the honeypot need to look plausibly used, not freshly installed. This means:

User accounts with plausible shell histories. The histories are constructed to look like legitimate prior use: checking mail, editing config files, and so on. They do not contain anything I would not show the world.
Mail spool files with old, mundane mail — not real mail of anyone I know, but generated correspondence that looks ordinary.
A /var/log directory with weeks of plausible log data. The data is generated from a script and rotated periodically.
A web server with a small site, including a few pages that have been edited at different times.
Cron jobs that produce some background activity.

The goal is for an attacker running w, last, ls -la, crontab -l, mail or lastlog to see a plausibly used machine. A freshly installed machine is the giveaway that this is not real.
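As a rough illustration of the scripted log data mentioned in the list above, here is a minimal generator that produces syslog-style lines at irregular intervals. Everything specific in it is made up: the hostname, the account names, the message templates, and the exponential spacing between events. A convincing version would need far more variety and some daily rhythm.

```python
import random
from datetime import datetime, timedelta

HOSTNAME = "mail"         # hypothetical hostname
USERS = ["alice", "bob"]  # hypothetical decoy accounts
TEMPLATES = [
    ("sshd", "Accepted password for {user} from {ip} port {port} ssh2"),
    ("CRON", "({user}) CMD (/usr/local/bin/nightly-report)"),
    ("sendmail", "{qid}: from=<{user}@example.com>, size={size}, nrcpts=1"),
]


def generate_log(start, days, seed=0, mean_gap_minutes=20.0):
    """Return syslog-style lines covering `days` days from `start`."""
    rng = random.Random(seed)  # seeded, so a rebuild reproduces the same log
    end = start + timedelta(days=days)
    lines = []
    t = start
    while True:
        # Exponential gaps give irregular but steadily advancing timestamps.
        t += timedelta(minutes=rng.expovariate(1.0 / mean_gap_minutes))
        if t >= end:
            break
        program, template = rng.choice(TEMPLATES)
        message = template.format(
            user=rng.choice(USERS),
            ip=f"198.51.100.{rng.randint(1, 254)}",  # documentation range
            port=rng.randint(1024, 65535),
            qid=f"{rng.getrandbits(40):010X}",
            size=rng.randint(400, 20000),
        )
        stamp = f"{t:%b} {t.day:2d} {t:%H:%M:%S}"
        pid = rng.randint(100, 30000)
        lines.append(f"{stamp} {HOSTNAME} {program}[{pid}]: {message}")
    return lines
```

Calling `generate_log(datetime(2001, 1, 1), days=14)` returns two weeks of lines that could be written out and rotated. The start date is a placeholder.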

The honeypot's identity. The honeypot needs a story. Mine is going to be: a small consultancy's mail and web server, run by someone who is competent but not paranoid. The hostname will reflect that. The DNS will be set up consistently. Whois records (insofar as they apply) will tell the same story.

What I am deliberately not doing

A few things that the more aggressive honeypot designs do, which I am avoiding for legal and ethical reasons.

No bait services whose only purpose is to be abused. I am not, for instance, running an open SMTP relay in the expectation that it will be used for spam. An attacker who finds my honeypot might try that anyway, but I am not actively offering the capability.

No tools that the attacker could use against third parties. No working outbound mail beyond the decoy relay, no scanning tools pre-installed, no botnet kits in /root. The attacker has to bring their own tools, and the firewall makes those harder to use.

No automatic clean-up. Once the honeypot is compromised, I do not have it self-restore. I observe. Once enough data is collected, or once the attacker starts doing something I am not willing to observe further, I take it offline manually and rebuild.

The legal questions, briefly

I am not a lawyer. The questions I am thinking about, and the rough answers I have settled on:

Is logging the attacker's activities legitimate? On my own machine, broadly yes. There are some jurisdictions where wiretap laws apply to network communications even on your own equipment; I have read the relevant UK guidance and concluded that the activity is on the legal side of the line provided I am not also spying on legitimate users. The honeypot, by design, has no legitimate users.

Am I responsible for what the attacker does from my honeypot? Potentially. The firewall design is the answer here — the attacker cannot do most things from the honeypot because the firewall does not allow it.

Should I notify the operators of compromised hosts that the attacker uses to come at me? In principle yes. I will do this for any case where I have a clear contact. In practice the chain is often opaque.

I have read the Honeynet Project's emerging guidance and consulted a friend who is a lawyer. The conclusion is: be careful, document everything, and stop short of doing anything that would look bad in a court submission.

What I expect to learn

The specific things I am hoping the honeypot produces:

Attacker tools and techniques after the initial compromise.
Realistic dwell times — how long does an attacker stay, what do they look at first, how often do they come back.
The pattern of pivoting attempts and how they manifest at the firewall.
Specific exploit payloads I have not seen before.
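The dwell-time question in particular reduces to simple arithmetic once sessions have been extracted from the logs. A minimal sketch, assuming each session has already been reduced to a (source, login time, logout time) tuple; the extraction itself is the hard part and is not shown, and the sample data is invented.

```python
from datetime import datetime, timedelta


def summarise_sessions(sessions):
    """Summarise (source, login, logout) tuples per source.

    Returns {source: {"sessions": int, "dwell": timedelta,
                      "return_gaps": [timedelta, ...]}}.
    """
    summary = {}
    last_logout = {}
    for source, login, logout in sorted(sessions, key=lambda s: s[1]):
        entry = summary.setdefault(
            source,
            {"sessions": 0, "dwell": timedelta(0), "return_gaps": []},
        )
        if source in last_logout:
            # Time between the end of one visit and the start of the next.
            entry["return_gaps"].append(login - last_logout[source])
        entry["sessions"] += 1
        entry["dwell"] += logout - login
        last_logout[source] = logout
    return summary


# Invented sample data, using documentation-range addresses.
t0 = datetime(2001, 1, 1, 12, 0)
example = [
    ("198.51.100.7", t0, t0 + timedelta(minutes=10)),
    ("198.51.100.7", t0 + timedelta(hours=2), t0 + timedelta(hours=2, minutes=30)),
    ("203.0.113.4", t0 + timedelta(hours=1), t0 + timedelta(hours=1, minutes=5)),
]
report = summarise_sessions(example)
```

Grouping by source address is a simplification: one attacker may arrive from several hosts, and one host may carry several attackers.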

The project is going to take months to set up properly. I will write about it as I go. Some of what I write will be about specific captures, sanitised, with identifying details removed. Some will be about the engineering. It is by some distance the most ambitious thing I have attempted in this discipline.

The honest assessment is: I might not finish it. The complexity is high. The operational discipline is demanding. But the design exercise alone has clarified my thinking about what defenders should be looking for, which is half the value already.