Spam, before it becomes the year's headline

Spam volume on the mail relay I run has roughly doubled in the past quarter. By the end of January, spam was approximately 35% of inbound mail volume, against 15% a year ago. The trajectory is clearly steepening.

This is not a new observation. What is new is the pace — spam is becoming the dominant operational mail problem at small relays, not just at the large ones. Worth writing about before it becomes the year's mail-security headline.

What is producing the increase

Three shifts I have observed.

Compromised hosts as relays. A growing fraction of spam is being relayed through compromised hosts. The Lion-style worms and Sub7-style trojans are producing a substrate of compromised machines that can be used as spam relays. The volume each compromised host can send is modest; the aggregate across millions of compromised hosts is enormous.

Open relays still exist. Despite years of public attention, a measurable fraction of mail relays on the internet still allow open relaying. Spammers keep finding them and keep using them.

Direct-from-attacker delivery. A growing fraction of spam is sent directly from short-lived senders that bypass the relay infrastructure entirely. Source IPs rotate; recipients are individually selected; the spammers' mail-server software speaks SMTP directly to the recipient's server. DNS blocklists built to catch open relays are less effective against this pattern, because a rotating source address is often abandoned before any list picks it up.

What is being done

The defensive community has been working on this. A few specific responses:

Expanded blocklists. The original RBL covered open relays; newer lists cover compromised hosts (XBL, the CBL) and broader ranges of suspicious sources. The aggregate blocking is more aggressive than it was a year ago.
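For readers who have not looked at how a DNS-based blocklist is queried, here is a minimal sketch of the usual convention: reverse the octets of the connecting IPv4 address, append the list's zone, and treat any answer as "listed". The zone name dnsbl.example and both function names are placeholders of mine, not a real list or a real library API, and the resolver is passed in so the logic can be tried without touching the network.

```python
import ipaddress
import socket
from typing import Callable


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up for an IPv4 address in a blocklist zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(
    ip: str,
    zone: str,
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> bool:
    """Return True if the blocklist zone has an entry for this address.

    Any successful answer counts as a listing. A failed lookup (no such
    name, timeout, resolver error) counts as "not listed", so this sketch
    fails open: mail is not blocked because a list is unreachable.
    """
    try:
        resolve(dnsbl_query_name(ip, zone))
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


# Hypothetical zone; 192.0.2.0/24 is a documentation-only address range.
print(dnsbl_query_name("192.0.2.99", "dnsbl.example"))
```

Checking several lists is then a loop over zones. Whether one listing is enough to reject, or only adds to a score, is a policy decision the sketch leaves open.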

Bayesian content classifiers. Several research groups are building statistical content classifiers that learn from a corpus of known-spam and known-good mail. The technique is promising; deployment is starting in the open-source mail world (SpamAssassin has been picking up momentum).
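To make the idea concrete, here is a toy naive Bayes filter. It is a sketch of the general technique only, not of SpamAssassin or of any particular research group's classifier; the tokenizer, the add-one smoothing, and the class and method names are all my own assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into crude word-like tokens."""
    return re.findall(r"[a-z0-9$'-]+", text.lower())


class NaiveBayesFilter:
    """Toy two-class naive Bayes over word counts (illustrative only)."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Add one message with a known label ('spam' or 'ham') to the corpus."""
        self.word_counts[label].update(tokenize(text))
        self.message_counts[label] += 1

    def spam_probability(self, text):
        """Estimate P(spam | text) from the training corpus seen so far."""
        vocab_size = len(set(self.word_counts["spam"]) | set(self.word_counts["ham"]))
        # Prior log-odds from message counts, with add-one smoothing.
        log_odds = math.log(
            (self.message_counts["spam"] + 1) / (self.message_counts["ham"] + 1)
        )
        tokens = tokenize(text)
        for label, sign in (("spam", 1), ("ham", -1)):
            total = sum(self.word_counts[label].values())
            for tok in tokens:
                # Add-one smoothing keeps unseen words from zeroing the product.
                p = (self.word_counts[label][tok] + 1) / (total + vocab_size + 1)
                log_odds += sign * math.log(p)
        # Clamp before converting to a probability to avoid overflow in exp().
        log_odds = max(-50.0, min(50.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))
```

A real filter needs a far larger corpus and more careful tokenization (headers, URLs, HTML), but the core is this: count words in known-spam and known-good mail, then compare the likelihoods.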

Sender authentication proposals. A couple of proposals — variants of "the sending domain advertises which IPs can legitimately send for it" — are being discussed in the IETF. None has reached deployment-ready state; the conversation is forming.
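The core check in these proposals is simple enough to sketch. The following assumes a domain has somehow published a list of address ranges that may send mail for it; the ADVERTISED_SENDERS table, its contents, and the function name are hypothetical stand-ins, since none of the proposals has settled on a record format or lookup mechanism.

```python
import ipaddress

# Hypothetical published data. In a real scheme this would be fetched from
# the sending domain (most likely via DNS), not hard-coded. The addresses
# are documentation-only ranges.
ADVERTISED_SENDERS = {
    "example.org": ["192.0.2.0/24", "198.51.100.25/32"],
}


def sender_is_authorised(domain, client_ip, advertised=ADVERTISED_SENDERS):
    """Check a connecting IP against the ranges a domain has advertised.

    Returns True or False when the domain publishes ranges, and None when
    it publishes nothing, so the caller can treat "no opinion" separately.
    """
    networks = advertised.get(domain.lower())
    if networks is None:
        return None
    addr = ipaddress.ip_address(client_ip)
    return any(addr in ipaddress.ip_network(net) for net in networks)


print(sender_is_authorised("example.org", "192.0.2.17"))
print(sender_is_authorised("example.org", "203.0.113.5"))
print(sender_is_authorised("unlisted.example", "203.0.113.5"))
```

The three-way result matters during any transition: most domains will publish nothing for a long time, and treating silence as failure would reject most legitimate mail.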

Industry coordination. Several major mail providers are starting to coordinate on spam handling. Some kind of multi-vendor agreement on filtering thresholds and reporting protocols is forming, slowly.

What I have done on my own infrastructure

A few specific changes over the past month:

Tightened blocklist subscriptions. I now subscribe to three blocklists rather than one. The increase in false positives is small; the increase in true positives is substantial.

Implemented a reputation-based score. Each incoming message accumulates a score from various indicators (source IP reputation, sender domain age, content patterns, header anomalies). High scores are rejected; medium scores are tagged for the recipient; low scores pass through. The system is rough, but it is catching roughly 60% of the spam that gets past the blocklists.
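The shape of such a scorer can be sketched in a few lines. Everything numeric here is invented for illustration: the weights, the 30-day domain-age cutoff, and the two thresholds are assumptions, not the values I run in production, and the field names are placeholders.

```python
from dataclasses import dataclass


@dataclass
class Indicators:
    """Per-message signals; field names are illustrative."""
    blocklist_hits: int          # how many blocklists list the source IP
    sender_domain_age_days: int  # age of the sender's domain
    content_pattern_hits: int    # count of matched content patterns
    header_anomalies: int        # count of malformed or suspicious headers


# Hypothetical thresholds and weights, chosen only to make the example work.
REJECT_AT = 10.0
TAG_AT = 5.0


def score(ind: Indicators) -> float:
    """Accumulate a spam score from the individual indicators."""
    total = 4.0 * ind.blocklist_hits
    if ind.sender_domain_age_days < 30:
        total += 3.0  # very new domains are treated as more suspicious
    total += 1.5 * ind.content_pattern_hits
    total += 2.0 * ind.header_anomalies
    return total


def disposition(ind: Indicators) -> str:
    """Map a score to one of the three outcomes: reject, tag, or pass."""
    s = score(ind)
    if s >= REJECT_AT:
        return "reject"
    if s >= TAG_AT:
        return "tag"
    return "pass"


print(disposition(Indicators(2, 5, 1, 1)))    # score 14.5 -> reject
print(disposition(Indicators(0, 400, 2, 1)))  # score 5.0  -> tag
print(disposition(Indicators(0, 400, 1, 0)))  # score 1.5  -> pass
```

An additive score has one practical advantage over a chain of hard rules: no single noisy indicator decides the outcome alone, and the thresholds can be moved as feedback comes in.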

Started a personal feedback loop. Friends are forwarding me false negatives (spam that got through) and false positives (legitimate mail that was rejected). The volume is modest; each report improves the system slightly.

Watching for SpamAssassin maturation. I am running an experimental SpamAssassin instance against a sample of inbound mail to evaluate its quality before committing to it. Early results are promising; full deployment probably mid-year.
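Evaluating a filter against a sample comes down to counting four outcomes. Here is a minimal sketch, assuming each sampled message has been hand-labelled as spam or not and that the filter's verdict has been recorded alongside; the function name and the input format are my own choices, not anything SpamAssassin produces.

```python
def evaluate(results):
    """Summarise filter quality from (is_spam, flagged) boolean pairs.

    is_spam is the hand-assigned label; flagged is the filter's verdict.
    Rates are reported as 0.0 when their denominator is empty.
    """
    tp = fp = tn = fn = 0
    for is_spam, flagged in results:
        if is_spam and flagged:
            tp += 1  # spam correctly caught
        elif is_spam and not flagged:
            fn += 1  # spam that got through
        elif not is_spam and flagged:
            fp += 1  # legitimate mail wrongly flagged
        else:
            tn += 1  # legitimate mail correctly passed
    return {
        "catch_rate": tp / (tp + fn) if tp + fn else 0.0,
        "false_negative_rate": fn / (tp + fn) if tp + fn else 0.0,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
    }
```

For a relay, the false-positive rate is the number to watch most closely: a missed spam costs the recipient a few seconds, while a rejected legitimate message may never be noticed at all.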

What I expect over the year

Several predictions, each with a stated probability and a deadline so that it can be scored afterwards:

Spam volume continues to grow. By year-end I expect spam to be 50-60% of inbound mail volume at a typical small relay. Probability: 80%. Deadline: 31 December 2001.

SpamAssassin reaches operational maturity. Becomes the open-source default for spam filtering at the relay level. Probability: 75%. Deadline: 31 December 2001.

Sender-authentication proposals make slow progress. No widely-deployed standard by year-end; meaningful operational discussion. Probability: 70%. Deadline: 31 December 2001.

The first major commercial mail provider implements aggressive spam filtering. Hotmail, Yahoo, or AOL ships substantial filtering improvements that visibly affect their users' inbox experience. Probability: 80%. Deadline: 31 December 2001.

Spam-themed legislative activity in at least one major jurisdiction. UK, EU, US, Australia — at least one will pass or seriously consider legislation specifically addressing unsolicited commercial email. Probability: 60%. Deadline: 31 December 2001.
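One common way to score probabilistic predictions like these after the deadline is the Brier score: the mean squared difference between each stated probability and what happened (1 if the prediction came true, 0 if not). Lower is better; 0 is perfect, and always answering 50% earns 0.25. The sketch below is an assumption about how the scoring could be done, not a record of any outcome; only the five probabilities come from the list above.

```python
def brier_score(forecasts):
    """Mean squared error between stated probabilities and outcomes.

    forecasts is a list of (probability, happened) pairs, where happened
    is True if the prediction came true by its deadline.
    """
    if not forecasts:
        raise ValueError("need at least one forecast to score")
    squared_errors = [
        (p - (1.0 if happened else 0.0)) ** 2 for p, happened in forecasts
    ]
    return sum(squared_errors) / len(squared_errors)


# The probabilities stated for the five predictions above, in order.
STATED_PROBABILITIES = [0.80, 0.75, 0.70, 0.80, 0.60]
```

Once the deadline has passed, pairing STATED_PROBABILITIES with a list of True/False outcomes and passing the pairs to brier_score gives a single number to compare against the 0.25 baseline.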

A small reflection on the category

Spam is the kind of problem that resists easy solutions because the cost-asymmetry favours the spammer. Sending a million emails costs a few dollars; processing a million emails costs operators substantially more. The economics produce the volume.

The long-term answer is structural — sender authentication, reputation systems, possibly an economic model that imposes small costs on bulk senders. None of these is close to deployment. In the meantime, operators continue to absorb the cost.

For my own work: spam is becoming a larger portion of my mail-server attention. I will be writing about it more this year. The category is now operational, not academic.