Q3 2001 was, by some distance, the busiest quarter the honeypot range has ever seen. A summary of the patterns is worth writing while the data is fresh.
The volume
In rough numbers, comparing Q3 2001 with Q2 2001:
- Total inbound connection attempts: up roughly 8x (from ~50,000/day to ~400,000/day at peak).
- Distinct source IPs per day: up roughly 5x (from ~3,000 to ~15,000 at peak).
- Compromise attempts (full exploit-and-payload): up roughly 20x.
- Sebek captures of human-attacker sessions: up roughly 3x (from a few per quarter to a few per week).
The scale is dominated by Code Red and Code Red II automation. Even after the worms reached saturation, the residual scanning continues at substantially elevated levels.
The mix of attack types
The distribution of activity across categories has shifted:
- HTTP-targeted exploits: 65% of compromise attempts (up from 30% in Q2). Mostly Code Red and Nimda variants.
- Outlook MIME exploitation: 8% (up from <1% in Q2). Mostly Klez and earlier mass-mailers.
- NetBIOS/SMB scanning: 12% (similar to Q2). The pre-existing pattern continues.
- SSH brute-force: 6% (similar to Q2).
- Other (DNS, FTP, RPC, miscellaneous): 9% (down from 18% in Q2 in proportional terms; absolute volume is similar).
The HTTP shift is dramatic. Code Red and its successors have made HTTP probing the dominant attack pattern.
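The category split above comes from bucketing connection records by destination port. A minimal sketch of that idea, not my actual pipeline — the record format, field name, and the port-to-category map are all illustrative assumptions:

```python
# Hypothetical sketch: bucket connection records into the categories
# above by destination port. The map and field names are illustrative.
CATEGORY_BY_PORT = {
    80: "http",
    25: "smtp/mime",
    137: "netbios/smb", 139: "netbios/smb", 445: "netbios/smb",
    22: "ssh",
}

def categorise(dst_port):
    # Anything not in the map falls into the "other" bucket.
    return CATEGORY_BY_PORT.get(dst_port, "other")

def mix(records):
    # Return each category's share of the total, as a percentage.
    counts = {}
    for r in records:
        cat = categorise(r["dst_port"])
        counts[cat] = counts.get(cat, 0) + 1
    total = sum(counts.values())
    return {c: round(100.0 * n / total, 1) for c, n in counts.items()}
```

The real analysis also has to split port-80 traffic by request signature, since "HTTP" covers both worm probes and hand-driven exploitation, but the port-level bucketing is the first pass.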
The Sebek captures
The high-interaction honeypot caught 23 sessions of human-attacker activity in Q3. Of these:
- 14 were enumerate-and-leave — attackers who confirmed shell access, did some enumeration, then logged out without doing harm.
- 5 attempted persistent-backdoor installation. Most were script-driven; one (described in the careful-attacker post) was hand-crafted.
- 3 attempted to use the host as a stepping stone for further attacks. All were defeated by the outbound filtering.
- 1 was an unusual capture I am still analysing — an attacker who spent two hours reading documentation files on the host before doing anything else. The behaviour is unfamiliar to me.
The overall pattern continues: a small fraction of attackers are skilled and patient; the majority are running scripts; outbound filtering disrupts most attacks regardless of attacker skill.
Specific observations from the Code Red period
Three things stand out in the data.
The peak scan rate was dominated by a small number of source ranges. About 60% of the Code Red scan traffic against my range came from approximately 5,000 distinct sources. That Pareto-like concentration suggests these are the most effectively compromised hosts on the internet: those with the most bandwidth and the longest uninterrupted infections.
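The concentration figure is simply the share of total hits contributed by the busiest sources. A sketch of the calculation, assuming a per-source hit count has already been tallied:

```python
def top_source_share(hits_by_source, top_n):
    """Fraction of total hits produced by the top_n busiest sources.

    hits_by_source maps a source identifier (IP or range) to its hit
    count; the input format is an assumption for illustration.
    """
    counts = sorted(hits_by_source.values(), reverse=True)
    total = sum(counts)
    if total == 0:
        return 0.0
    return sum(counts[:top_n]) / float(total)
```

Sorting every source is wasteful at this scale, but for a quarterly batch analysis it is simpler than maintaining a running top-N structure.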
Manual investigation of compromise was rare during the worm peak. Almost all activity was automated. The careful human attackers who produce my Sebek captures were largely absent during late July and August. I suspect they were busy with other work; the worm noise made careful operations harder to conduct.
Code Red and Code Red II co-existed in the data. Both worms were active simultaneously for over a month. Distinguishing them at the wire level requires careful attention to specific request patterns. My structured-log analysis supports the distinction now; earlier in the quarter it did not.
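For readers who want to reproduce the distinction: both worms hit the IIS .ida overflow with a GET request for /default.ida, but Code Red pads the overflow with a long run of 'N' characters where Code Red II uses 'X'. A sketch of a classifier along those lines — the run-length threshold here is an illustrative choice, not a worm constant:

```python
def classify_ida_request(request_line):
    """Classify a raw HTTP request line hitting the IIS .ida overflow.

    Code Red pads the overflow with 'N'; Code Red II pads it with 'X'.
    The 20-character run is an arbitrary threshold for illustration.
    """
    if "/default.ida?" not in request_line:
        return None  # not an .ida probe at all
    payload = request_line.split("/default.ida?", 1)[1]
    if payload.startswith("N" * 20):
        return "code-red"
    if payload.startswith("X" * 20):
        return "code-red-ii"
    return "unknown-ida"
```

The full shellcode differs between the two as well, but the filler character alone is enough to separate them in web logs, which is what made the structured-log analysis tractable.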
What this teaches
Three generalisations.
The threat landscape is now strongly modal. A few specific attack categories dominate the volume; everything else is rare. Defenders who tune their detection to the dominant categories catch most of the activity.
The careful-attacker population is small but enduring. They were quieter during the worm peak but did not disappear. The defensive disciplines that catch them (off-host logging, behavioural pattern detection) remain important even when the volume of automated traffic is high.
Outbound filtering remains the highest-leverage defensive measure. Across all 23 attempted human-attacker sessions, outbound filtering was the choke point. Without exception, the attackers who found their tools blocked gave up and left.
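For concreteness, the kind of egress policy meant here can be expressed in a few netfilter rules. This is a simplified sketch, not my actual configuration — the interface names are placeholders, and a real honeynet gateway needs rate limits and finer-grained rules:

```
# Default-deny on forwarded traffic through the gateway.
iptables -P FORWARD DROP

# Allow inbound connections to the honeypot segment (eth0 = outside,
# eth1 = honeypot side -- placeholder interface names).
iptables -A FORWARD -i eth0 -o eth1 -j ACCEPT

# Allow replies belonging to those inbound connections back out.
iptables -A FORWARD -i eth1 -o eth0 -m state --state ESTABLISHED,RELATED -j ACCEPT

# Log anything the honeypot tries to initiate outbound; the default
# DROP policy then discards it.
iptables -A FORWARD -i eth1 -o eth0 -m state --state NEW -j LOG --log-prefix "honeypot-egress: "
```

The logging rule matters as much as the drop: a blocked outbound attempt is itself a high-signal detection event.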
What I am doing with the data
For my own writing: the Q3 captures provide material for several future posts. Specific captures (sanitised) will appear over the next few months.
For the Honeynet Project: the cumulative data set is now substantial. I am preparing a contribution to the cross-operator analysis paper currently being assembled. The sanitisation work for that contribution is the bulk of the remaining work.
For my calibration discipline: the Q3 data confirms several earlier predictions and updates a few probabilities. I will incorporate these into the year-end review.
More as the year wraps up.