Six months of running honeypot v2 has produced about 200 logged compromise events. I committed in the midyear reflection to writing a structured analysis. This is that analysis — patterns of attacker behaviour, ranked by frequency, with the defensive implication for each.
This is a longer post than my usual. Bear with it; the cumulative picture is more useful than the individual captures I have been writing about.
The dataset
200 compromise events, observed between January and July 2000. Each event is one or more sessions where an attacker had a shell on the honeypot for non-zero time. Source addresses are diverse — at least 90% of source IPs appear only once in the dataset.
The events are categorised by what the attacker did during their session(s). Several attackers had multiple session types and are counted in each.
Pattern 1: enumeration and exit (frequency: ~60%)
The most common session shape. Attacker gains shell, runs a standard sequence of read-only commands (w, last, ps, netstat, cat /etc/passwd, ls /home, cat /etc/issue), satisfies themselves about something, and logs out. Total session time: typically under 5 minutes.
These are almost certainly triage events — the attacker is qualifying the host as worth further effort. Most of them never come back; the host has been added to a list and either rejected or saved for later.
The content of the enumeration is informative about what the attacker cares about. The most common command sequences imply:
- Confirming root: id, whoami.
- Checking who else is logged in: w, last.
- Surveying users: cat /etc/passwd, ls /home.
- Looking for valuable accounts: specific patterns like cat /home/admin/*.
- Checking system age: uname -a, cat /proc/version.
- Looking for monitoring: cat /etc/syslog.conf, ls /etc/init.d/.
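Put together, the typical triage session looks something like the following composite. The command set is taken from the list above; the ordering is the most common one in my captures, and the error guards are mine so the sketch also runs on minimal systems.

```shell
#!/bin/sh
# Composite triage session, reconstructed from the most common captures.
# The '2>/dev/null || true' guards are my addition, not the attacker's.
triage() {
    id || true; whoami 2>/dev/null || true   # confirming root
    w 2>/dev/null || true                    # checking who else is logged in
    last 2>/dev/null | head || true
    cat /etc/passwd                          # surveying users
    ls /home 2>/dev/null || true
    uname -a                                 # checking system age
    ls /etc/init.d/ 2>/dev/null || true      # looking for monitoring
}
triage
```

Total runtime is seconds; the attacker's version differs mainly in pausing to read the output.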
A triage profile of a useful host emerges: long-running, well-populated with users, has admin accounts, no obvious monitoring infrastructure visible. Hosts that look freshly installed are typically triaged as not-useful.
Defensive implication: a host that looks fresh (no users, recent install date, minimal log activity) is, by attacker triage logic, a less attractive target. Hardening by making your hosts look more boring, limiting the visible signs of operational activity, is a small but real defence against the triage filter.
Pattern 2: tool deployment attempt (frequency: ~25%)
Attacker enumerates briefly, then attempts to fetch tools from a remote server. The fetch is blocked by my outbound filtering. The attacker tries several alternative fetch methods (wget, curl, ftp, nc, bash /dev/tcp/); all are blocked. After a few minutes, the attacker logs out.
Several observations from this pattern:
- The first fetch attempt is almost always wget. A surprising fraction (~70%) of attackers reach for wget reflexively.
- The second attempt is typically curl, then ftp.
- Few attackers (~5%) attempt non-HTTP fetch via custom protocols.
- Almost none (~1%) attempt to write code in-place, typing or pasting the tool source rather than fetching it.
This tells me that outbound HTTP is the choke point that most attackers depend on. Restricting outbound HTTP from compromised hosts breaks ~70% of post-compromise attack flows.
Defensive implication: outbound traffic restrictions on a compromised internal host are dramatically more effective than inbound restrictions. Almost every attacker assumes they will be able to fetch their tools from the open internet. Removing that assumption disrupts the standard playbook.
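A sketch of what I mean by outbound restriction, written in iptables syntax as a convenience on my part (on the older kernels common in this dataset the ipchains equivalents apply):

```shell
# Default-deny egress sketch. Allow only what the host demonstrably needs.
iptables -P OUTPUT DROP
iptables -A OUTPUT -o lo -j ACCEPT
iptables -A OUTPUT -m state --state ESTABLISHED,RELATED -j ACCEPT  # reply traffic only
# Deliberately no general outbound HTTP rule: this alone removes the
# wget/curl fetch step that most tool deployments depend on.
```

The point is the default-deny posture, not the specific tool; any filter that makes "fetch from the open internet" fail has the same effect.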
Pattern 3: spam-relay attempt (frequency: ~7%)
Attacker installs SMTP-relay infrastructure on the host with the intent of using it for outbound mail. The pattern matches what I described in capture C of the second-month writeup.
Specific tools observed: most often a custom small SMTP daemon written in Perl or C, sometimes packaged variants of sendmail configured for open relaying. Always followed by an attempt to send a test mail through the relay.
The outbound mail is blocked at my firewall. The attacker spends some time troubleshooting before logging out.
Defensive implication: spam-relay setups are economically motivated and indicate the rise of cybercrime as commercial activity. The defensive measure that matters is outbound port-25 filtering on internal hosts. ISPs that filter outbound port 25 from customer lines (other than to the ISP's own mail server) substantially reduce the spam-relay pool.
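For concreteness, the port-25 filter amounts to two rules, again sketched in iptables syntax (192.0.2.25 is a placeholder for the ISP's own relay, not a real address):

```shell
# Outbound port-25 filtering: permit only the ISP's designated relay.
iptables -A OUTPUT -p tcp --dport 25 -d 192.0.2.25 -j ACCEPT   # ISP smarthost only
iptables -A OUTPUT -p tcp --dport 25 -j DROP                   # every other relay path
```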
Pattern 4: persistent backdoor installation (frequency: ~5%)
Attacker enumerates carefully, deploys a persistence mechanism, and exits. The persistence mechanism is typically:
- A modified service binary with a backdoor.
- A new cron entry that re-establishes access on a schedule.
- A modification to existing scripts to invoke the attacker's code.
- Occasionally a kernel rootkit, as in capture A from May.
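As an entirely invented illustration of the cron variant: a single root crontab line is enough to re-establish access on a schedule. The path and timing here are hypothetical, chosen the way these attackers choose them, to blend into the host's existing entries.

```shell
# Hypothetical cron re-entry line (crontab format); nothing here is from a
# real capture. Every 30 minutes, quietly run the attacker's callback.
# m   h  dom mon dow  command
*/30  *   *   *   *   /usr/lib/libsys/update >/dev/null 2>&1
```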
The deployment is usually quick (a few minutes) but the preparation is not. The careful attackers in this category have done more enumeration than the average and have specifically chosen modifications that fit the host's existing patterns.
Defensive implication: persistence mechanisms are detectable from outside the host (file integrity monitoring, log analysis at the off-host level, network traffic analysis of unusual outbound patterns). They are not detectable from the host itself once the attacker has root, especially if a kernel rootkit is involved. The off-host monitoring discipline is the only reliable way to find them.
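A minimal sketch of the file-integrity half of that discipline. The function names and the choice of md5sum are mine; the essential property is that the baseline is recorded once and stored off-host, where a root-level compromise cannot rewrite it.

```shell
#!/bin/sh
# Minimal off-host file-integrity sketch.
# record_baseline is run once from a trusted machine; check_integrity
# diffs the current state against it. Any difference is a finding.
record_baseline() {            # $1 = directory, $2 = baseline file
    find "$1" -type f -exec md5sum {} + | sort > "$2"
}
check_integrity() {            # returns non-zero when anything changed
    find "$1" -type f -exec md5sum {} + | sort | diff -q "$2" - >/dev/null
}
```

Run the check from the separate machine (for example over ssh with read-only access to the target), never from the monitored host itself.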
Pattern 5: data exfiltration attempt (frequency: ~3%)
Attacker enumerates broadly, collects accessible data into an archive, attempts to exfiltrate the archive. Outbound is blocked; the attempt fails.
The data targeted is typically: /etc/passwd and /etc/shadow; user home directories; mail spools; any database files on the host; specific configuration files (Apache, sendmail) that might leak credentials.
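The collection step itself is usually a single command. A reconstruction follows; the dot-file archive name in /tmp is typical of the captures, and missing paths are simply skipped.

```shell
#!/bin/sh
# Reconstructed collection step: sweep the valuable files into one archive.
# Paths are the targets listed above; errors are discarded, as the
# attackers' own invocations do.
tar czf /tmp/.x.tgz \
    /etc/passwd /etc/shadow \
    /home /var/spool/mail \
    /etc/httpd /etc/sendmail.cf \
    2>/dev/null || true
```

The exfiltration attempt that follows is where my outbound filtering stops them; the collection itself is unremarkable and fast.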
Defensive implication: data on compromised hosts is what attackers want. Reducing the data on each host (segmenting databases, encrypting at rest, minimising on-host caches) is a long-running discipline. The combination of access restriction (so attackers cannot read the data) and outbound restriction (so even if they read it, they cannot exfiltrate) is the right architecture.
Patterns I have not observed
A few things I expected to see and have not, which is itself informative.
Wormable propagation attempts. I have not seen any attacker try to use the compromised host to scan for and exploit further hosts on the network. This may be because my honeypot is on an isolated network with no obvious lateral targets visible. Or it may be that worm-style propagation is fully automated (no shell access, just exploit-and-replicate), so the captures I see, which are by definition shell sessions, would not include it.
Sophisticated kernel exploitation. I have not observed attackers attempting kernel-level privilege escalation. This may be because they generally arrive with root already, having exploited a privileged service. Or it may be that they did not need additional kernel access to do what they wanted.
Defacement or destruction. No attacker has tried to deface a public-facing service or destroy data. This is consistent with the commercial motivation of most attackers — they want quietly-controlled hosts, not loudly-broken ones.
Observations about timing
A short list of timing patterns:
- Most compromises happen during European and American business hours, specifically 14:00-22:00 UTC. The honeypot sees noticeably less activity outside that window, whatever time zone you measure "night" from.
- The interval between scan and compromise is short. Median: 4 hours from first scan to first compromise attempt. This suggests automation: the scan results are being rapidly fed into exploitation pipelines.
- Returning attackers come back at consistent times. The attackers I have seen return for second sessions tend to do so at similar times of day to their first session. This might be a person with a regular schedule, or automation running on a schedule.
- Sessions are typically short. Median session length: 3 minutes. The interactive attacker who spends an hour on a host is rare. Most sessions are surgical: get in, do one specific thing, get out.
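The medians above come from nothing fancier than sorting the per-session numbers. A sketch of the aggregation, assuming my own intermediate log format of one duration in seconds per line (not the raw capture format):

```shell
#!/bin/sh
# median: read one numeric value per line on stdin, print the median.
# Averages the two middle values when the count is even.
median() {
    sort -n | awk '{ a[NR] = $1 }
        END {
            if (NR % 2) print a[(NR + 1) / 2]
            else        print (a[NR / 2] + a[NR / 2 + 1]) / 2
        }'
}
```

For example, piping the three durations 300, 60 and 180 through median prints 180, the three-minute figure quoted above.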
What I am taking from this
Three summary observations.
The threat actor population is dominated by automation and triage. The vast majority of compromise events are filtered through automated tools that do basic enumeration and discard hosts that do not pass triage. The careful, persistent, individually-skilled attacker is rare in this population.
Outbound network restrictions are the highest-leverage defensive measure. Almost every attack pattern I have observed depends on outbound network access to either fetch tools or exfiltrate data. Restricting outbound traffic disrupts most attacks at their second step.
Off-host monitoring is the only reliable defence against the careful attacker. The 5% of attackers who deploy persistence mechanisms can hide from any on-host inspection. The defence is to monitor from outside the host: firewall logs, structured logs forwarded off-host, periodic file integrity checks from a separate machine.
These are not new observations. The cumulative data confirms what individual captures had suggested. Confirmation matters; my calibrated humility discipline requires that I state when the evidence is consistent rather than always promoting new findings.
This writeup will be the basis for my first contribution to the Honeynet Project's emerging research output. The sanitisation and review work for that submission will take a few weeks. Expect a follow-up post once it is published in their channels.
More as the second half of the year develops.