Mail filtering at the relay, revisited

Two weeks on from ILOVEYOU. The cleanup is finished. The new operational normal is clear: every relay should be doing aggressive filtering of executable attachments, regardless of which platform the recipients run.

I have been refining my own filtering setup over the past fortnight. This post is the architecture I now consider correct, with worked examples from my actual configuration. The general shape will be useful to anyone running a mail relay.

The architecture

The relay acts as a content gateway between the internet's SMTP world and the recipient mailboxes. It does several distinct things:

Standard SMTP receipt — accepts mail from authenticated or trusted sources.
Anti-relay — refuses mail not destined for local users from non-trusted sources.
Spam filtering — rejects or marks bulk unsolicited mail.
Antivirus scanning — examines attachments for known malware.
Extension-based stripping — removes attachments matching dangerous patterns.
Header sanitising — strips or rewrites headers that leak internal information.
Local delivery — drops mail into mailboxes or forwards onwards.

Each is a discrete layer. The discipline is to keep them separate so each can be audited and modified independently.
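To make the separation concrete, here is a toy Python model of a layered pipeline. It is only an illustration: on the real relay each layer is a separate program (Sendmail, procmail, AMaViS), not a function in one script. The local domain, the trusted network prefix and the two-extension strip rule are hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

LOCAL_DOMAIN = "example.org"   # hypothetical local mail domain
TRUSTED_PREFIX = "192.0.2."    # hypothetical trusted internal network


@dataclass
class Message:
    client_ip: str
    recipient: str
    attachments: List[str] = field(default_factory=list)


# A layer may modify the message. It returns None to hand the message on to
# the next layer, or a short reason string if the message should be rejected.
Layer = Callable[[Message], Optional[str]]


def anti_relay(msg: Message) -> Optional[str]:
    is_local = msg.recipient.lower().endswith("@" + LOCAL_DOMAIN)
    is_trusted = msg.client_ip.startswith(TRUSTED_PREFIX)
    return None if (is_local or is_trusted) else "relaying denied"


def strip_executables(msg: Message) -> Optional[str]:
    # Modifies rather than rejects: dangerous attachments are simply dropped.
    msg.attachments = [a for a in msg.attachments
                       if not a.lower().endswith((".exe", ".vbs"))]
    return None


def run_layers(msg: Message, layers: List[Tuple[str, Layer]]) -> str:
    # Layers run in order; the first rejection stops the pipeline.
    for name, layer in layers:
        reason = layer(msg)
        if reason is not None:
            return f"rejected by {name}: {reason}"
    return "delivered"


PIPELINE = [("anti-relay", anti_relay), ("strip", strip_executables)]
```

Because every layer has the same shape, any one of them can be tested, replaced or reordered without touching the others, which is the audit property described above.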

Layer 1: SMTP receipt with sane defaults

My SMTP daemon is Sendmail 8.10 on Slackware 4.0, configured with the m4-based configuration I wrote about previously. The relevant security-related features:

FEATURE(`access_db', `hash /etc/mail/access')dnl
FEATURE(`relay_hosts_only')dnl
FEATURE(`blacklist_recipients')dnl
FEATURE(`accept_unresolvable_domains')dnl

The first two enforce relay control as I described before. The third lets me explicitly blacklist recipients (useful for honey-trap addresses that nobody legitimate would email). The fourth is deliberately permissive: it accepts mail from senders whose domain does not resolve, because there are legitimate cases where DNS is briefly unavailable. Reject such mail if you prefer; I stay permissive at this layer because the later layers will catch problems.

Layer 2: anti-relay

The /etc/mail/access file:

localhost                RELAY
localhost.localdomain    RELAY
127.0.0.1                RELAY
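192.0.2                  RELAY

Nothing else: no relaying for any external source. Note that the access map matches IP addresses by octet prefix, not CIDR notation, so the bare 192.0.2 covers the whole 192.0.2.0/24 network, while a key written as 192.0.2.0/24 would never match. This is unchanged from my previous post, and worth restating because it is the foundation of every other layer.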
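A short Python model of the lookup shows why a network is written as a bare prefix. It assumes sendmail's documented behaviour for IP keys in the access map: try the full address, then drop trailing octets one at a time. Only IP keys are modelled; hostname keys such as localhost are matched differently and are left out. The function names are hypothetical.

```python
from typing import Dict, Optional

# IP-keyed entries from /etc/mail/access; "192.0.2" stands for 192.0.2.0/24.
ACCESS: Dict[str, str] = {
    "127.0.0.1": "RELAY",
    "192.0.2": "RELAY",
}


def lookup_ip(ip: str, access: Dict[str, str] = ACCESS) -> Optional[str]:
    """Try 192.0.2.45, then 192.0.2, 192.0 and 192: longest prefix wins."""
    octets = ip.split(".")
    for n in range(len(octets), 0, -1):
        key = ".".join(octets[:n])
        if key in access:
            return access[key]
    return None


def may_relay(client_ip: str, recipient_is_local: bool) -> bool:
    # In this model, mail for local users is accepted from anywhere;
    # mail for anyone else needs a RELAY entry for the client address.
    return recipient_is_local or lookup_ip(client_ip) == "RELAY"
```

Since the generated lookup keys never contain a slash, a CIDR-style key such as 192.0.2.0/24 can never be hit, which is exactly the failure mode to avoid in the real file.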

Layer 3: spam filtering

For my modest mail volume, the spam filtering is currently:

DNS blacklists: rejecting connections from sources on known spam-source lists. The major lists are MAPS RBL and ORBS. My Sendmail configuration consults both.
Subject-line filtering: a small procmail ruleset that catches common spam subject patterns. About fifteen patterns, most of which I have not changed in months.
Recipient-address checks: refusing mail to addresses that are clearly harvested rather than legitimate (the well-known mistake of harvesting <> from web pages).
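For anyone who has not looked inside a DNS blacklist: the list is published as a DNS zone, and a client address is checked by reversing its octets and looking the result up under that zone. A Python sketch follows. The zone names are placeholders, not the real MAPS or ORBS zones, and treating a failed lookup as "not listed" (failing open) is an assumption, chosen to match the permissive defaults above.

```python
import socket
from typing import Callable, Iterable, List


def dnsbl_query_name(ip: str, zone: str) -> str:
    # 192.0.2.45 checked against zone Z becomes 45.2.0.192.Z
    return ".".join(reversed(ip.split("."))) + "." + zone


def system_resolve(name: str) -> bool:
    """True if the name has an A record, i.e. the address is listed."""
    try:
        socket.gethostbyname(name)
        return True
    except OSError:
        # NXDOMAIN and resolver failures both land here; both are treated
        # as "not listed" so a DNS outage does not block all incoming mail.
        return False


def listed_on(ip: str, zones: Iterable[str],
              resolve: Callable[[str], bool] = system_resolve) -> List[str]:
    """Return the zones that list this client address."""
    return [z for z in zones if resolve(dnsbl_query_name(ip, z))]
```

A non-empty result from listed_on(client_ip, zones) would be the signal to refuse the connection; the resolver is a parameter so the logic can be exercised without network access.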

This is light-touch spam filtering by modern standards. My volume is low, and at that volume the cost of the false positives that aggressive filtering would produce exceeds the benefit of the extra spam it would catch. A larger relay would invest more here.

Layer 4: antivirus scanning

This is where the relay does the work that has become non-optional. I run AMaViS, a Perl wrapper that passes every message through one or more virus scanners before delivery. It is currently configured with F-Secure's and McAfee's command-line scanners, used in parallel for redundancy.

The scanner integration is in /etc/procmailrc, the system-wide procmail configuration:

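# Filter (f) each message through amavis and wait (w) for it to exit;
# if amavis fails, procmail carries on with the unfiltered message.
# The local lockfile amavis.lock lets only one amavis run at a time.
:0fw: amavis.lock
| /usr/local/sbin/amavis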

Every mail passes through amavis, which extracts attachments, scans them, and either passes the mail along or replaces dangerous content with a notification.

The operational discipline: signature updates are pulled from the vendors at least every four hours. The bandwidth cost is small; the cost of a stale signature when a new worm appears is large.
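The update job itself depends on each vendor's tools, which are not shown here. What can be sketched generically is a watchdog that cron runs alongside it, flagging signature files that have not been refreshed within the four-hour window. The file path is hypothetical; it should point at whatever file the scanner's update actually rewrites.

```python
import os
import time

MAX_AGE_SECONDS = 4 * 60 * 60                       # the four-hour window
SIGNATURE_FILE = "/var/lib/scanner/signatures.dat"  # hypothetical path


def is_stale(mtime: float, now: float, max_age: float = MAX_AGE_SECONDS) -> bool:
    """True if a file last modified at mtime is older than max_age at now."""
    return now - mtime > max_age


def signatures_stale(path: str = SIGNATURE_FILE) -> bool:
    try:
        mtime = os.path.getmtime(path)
    except OSError:
        return True  # a missing signature file is as bad as a stale one
    return is_stale(mtime, time.time())
```

A cron job could call signatures_stale() and mail the administrator whenever it returns True, so a silently failing update does not go unnoticed.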

Layer 5: extension-based stripping

This is the big change post-ILOVEYOU. AMaViS will catch known malware via signatures. It will not catch zero-day malware that has not yet been signed. Extension-based stripping fills the gap.

My current strip list, which I am willing to defend:

.exe, .com, .bat, .pif, .scr — direct executables.
.vbs, .vbe, .js, .jse — script files.
.wsh, .wsc, .sct, .hta — Windows Scripting Host related.
.cpl, .shs, .shb, .lnk — Windows-specific dangerous formats.
.reg — registry edits.

Also, importantly: any file with a double extension whose final extension is in the list above. LOVE-LETTER-FOR-YOU.TXT.vbs triggers on the .vbs; report.doc.exe triggers on the .exe.
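Because Windows decides what to do with a file from its last extension, the double-extension rule reduces to a case-insensitive check of the final extension. A Python sketch of that check, using the strip list above. Removing trailing dots and spaces first is an added assumption, not part of the list: Windows ignores them in file names, so evil.exe. would still run as an executable.

```python
STRIP_EXTENSIONS = frozenset({
    "exe", "com", "bat", "pif", "scr",   # direct executables
    "vbs", "vbe", "js", "jse",           # script files
    "wsh", "wsc", "sct", "hta",          # Windows Scripting Host related
    "cpl", "shs", "shb", "lnk",          # other dangerous Windows formats
    "reg",                               # registry edits
})


def should_strip(filename: str) -> bool:
    """True if the attachment's final extension is on the strip list."""
    # Assumption: Windows ignores trailing dots/spaces, so remove them first.
    base = filename.rstrip(". ")
    if "." not in base:
        return False
    # Only the last extension counts, which is what makes both
    # "LOVE-LETTER-FOR-YOU.TXT.vbs" and "report.doc.exe" match.
    final_ext = base.rsplit(".", 1)[1].lower()
    return final_ext in STRIP_EXTENSIONS
```

A name such as archive.exe.txt passes, since its final extension is harmless and Windows would open it as text.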

The stripping is implemented as a small Perl script that walks the MIME structure of incoming mail and replaces any flagged attachment with a text notice:

[ Attachment removed by mail filter: filename=LOVE-LETTER-FOR-YOU.TXT.vbs,
  type=executable, action=removed for safety. If this attachment was
  expected, please request a different delivery method. ]

The notice tells the recipient enough to handle legitimate cases. The originating sender does not know their attachment was removed; they have no way to bypass the filter except by changing the format.
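The actual script is Perl and is not reproduced here. As a rough equivalent, this Python sketch uses the standard email package to walk the MIME tree and swap each flagged attachment for the notice text. The extension check is repeated so the block stands alone, and the function names are hypothetical.

```python
import email
from email.message import Message

STRIP_EXTENSIONS = {
    "exe", "com", "bat", "pif", "scr", "vbs", "vbe", "js", "jse",
    "wsh", "wsc", "sct", "hta", "cpl", "shs", "shb", "lnk", "reg",
}

NOTICE = (
    "[ Attachment removed by mail filter: filename={name},\n"
    "  type=executable, action=removed for safety. If this attachment was\n"
    "  expected, please request a different delivery method. ]\n"
)


def should_strip(filename: str) -> bool:
    base = filename.rstrip(". ")
    return "." in base and base.rsplit(".", 1)[1].lower() in STRIP_EXTENSIONS


def replace_with_notice(part: Message, filename: str) -> None:
    # Drop the headers describing the old content, then make the part text.
    for header in ("Content-Type", "Content-Transfer-Encoding",
                   "Content-Disposition"):
        del part[header]  # removes every occurrence; no error if absent
    part["Content-Type"] = 'text/plain; charset="us-ascii"'
    part["Content-Disposition"] = "inline"
    # Keep the notice 7-bit even if the attachment name was not ASCII.
    safe_name = filename.encode("ascii", "replace").decode("ascii")
    part.set_payload(NOTICE.format(name=safe_name))


def strip_attachments(raw: bytes) -> bytes:
    msg = email.message_from_bytes(raw)
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no filename of their own
        filename = part.get_filename()
        if filename and should_strip(filename):
            replace_with_notice(part, filename)
    return msg.as_bytes()
```

The rest of the message, including the other parts and the multipart boundaries, passes through unchanged; only the flagged leaf parts are rewritten.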

Layer 6: header sanitising

A few headers leak internal information that should not escape the perimeter:

Received: lines from internal hosts revealing internal IP structure.
X-Originating-IP: showing where a message was composed.
X-Mailer: showing the sender's mail client version (and thus its vulnerabilities).
Message-ID: containing internal hostnames.

For outgoing mail, my relay rewrites or removes these. The replacement is generic — Received: from [internal] rather than the actual hostname; no X-Originating-IP; a generic X-Mailer: filtered.

This is small but worth doing. Each leaked detail is a piece of information for an attacker; removing them at the perimeter is cheap and unobtrusive.
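A Python sketch of the outgoing rewrite, working on the header list as (name, value) pairs so that header order is preserved; with the standard email package, msg.items() yields such pairs. The internal network pattern is a hypothetical placeholder, keeping the timestamp after the semicolon of a rewritten Received: line is an assumption, and Message-ID is left untouched here.

```python
import re
from typing import List, Tuple

Header = Tuple[str, str]

# Hypothetical internal network; matches addresses such as 192.0.2.17.
INTERNAL_ADDR = re.compile(r"\b192\.0\.2\.\d{1,3}\b")


def sanitise_headers(headers: List[Header]) -> List[Header]:
    out: List[Header] = []
    for name, value in headers:
        key = name.lower()
        if key == "x-originating-ip":
            continue                     # dropped entirely
        if key == "x-mailer":
            value = "filtered"           # generic value, no client version
        elif key == "received" and INTERNAL_ADDR.search(value):
            # Hide the internal host but keep the trailing "; date" part.
            _, sep, date = value.rpartition(";")
            value = "from [internal]" + (sep + date if sep else "")
        out.append((name, value))
    return out
```

Received: lines added by external hosts contain no internal address and pass through untouched.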

Layer 7: local delivery

Standard procmail with each user's preferences. Not security-relevant per se; just where the mail ends up.

What this looks like to the user

From a user's perspective:

Most mail arrives as it always did.
Mail with malicious attachments arrives with the attachment replaced by a notice.
Mail with executable attachments — even legitimate ones — arrives the same way, with a notice. They have to ask the sender for an alternative format.
Mail from spam sources mostly does not arrive at all; rejected at the SMTP level.

The small inconvenience to legitimate use cases (the rare time an executable attachment was actually wanted) is the price of the substantially reduced exposure to mass-mailing worms. It is a price I am happy to pay on my friends' behalf, and they have not complained.

What I am not yet doing

A few things on my list but not yet implemented:

Bayesian content classification. Several promising approaches to spam filtering use statistical classification of the mail body. The technique requires a training corpus; my volume is too low to train effectively without effort I have not yet put in. I will revisit when I have more spam to learn from.
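For the curious, the core of one such approach, a multinomial naive Bayes classifier with add-one smoothing, fits in a few lines of Python. It is a generic textbook sketch, not something running on this relay, and it shows why the training corpus matters: every probability comes straight from word counts in previously labelled mail. The tokeniser is deliberately crude.

```python
import math
import re
from collections import Counter
from typing import List


def tokens(text: str) -> List[str]:
    # Crude tokeniser: lower-cased runs of letters, digits, '$' and '!'.
    return re.findall(r"[a-z0-9$!]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.words = {"spam": Counter(), "ham": Counter()}
        self.docs = Counter()

    def train(self, text: str, label: str) -> None:
        self.words[label].update(tokens(text))
        self.docs[label] += 1

    def _log_posterior(self, toks: List[str], label: str, vocab: int) -> float:
        counts = self.words[label]
        total = sum(counts.values())
        score = math.log(self.docs[label] / sum(self.docs.values()))
        for t in toks:
            score += math.log((counts[t] + 1) / (total + vocab))
        return score

    def classify(self, text: str) -> str:
        # Needs at least one training message of each class.
        vocab = len(set(self.words["spam"]) | set(self.words["ham"]))
        toks = tokens(text)
        return max(("spam", "ham"),
                   key=lambda label: self._log_posterior(toks, label, vocab))
```

With only a handful of training messages the word counts are dominated by noise, which is the practical problem described above.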

Header-checking against known spam templates. Specific header sequences are characteristic of bulk mail. Filtering on these is more precise than blacklists but requires more maintenance. Probably worth doing eventually.

Outbound filtering. My relay currently scans inbound traffic only; outbound traffic from internal users is not scanned. If a friend's machine were compromised and started sending malicious mail, it would be caught, if at all, only at the recipient's relay. The right architecture is to scan in both directions. I have not done this yet.

A small honest reflection

The filtering setup is not particularly sophisticated. Each layer is an off-the-shelf component with sensible configuration. The discipline is in having the layers, not in any one layer being clever.

This is, I think, the right pattern for most operators. The exotic techniques are interesting; the boring layered architecture is what catches the actual threats. I would rather run an unsophisticated layered system that catches 95% of incoming bad traffic than a clever single-component system that catches 99% but only when its specific assumptions hold.

For any friend who is running a mail relay and asks me where to start: AMaViS plus a strip-list of dangerous extensions is the minimum. Everything else is incremental improvement on top.