Blaster: RPC DCOM worm

Blaster appeared on 11 August 2003 and reached operational saturation within days. The worm exploits a buffer overflow in Windows RPC DCOM (TCP port 135), patched by Microsoft Security Bulletin MS03-026 nearly a month earlier. Yet again the gap between patch availability and worm exploitation is shorter than the patching window of a typical organisation.

This is going to be a substantial post. Blaster is the first major general-purpose Windows worm of 2003 (Slammer in January attacked SQL Server specifically rather than Windows at large), and the structural lessons it surfaces are worth careful treatment. The worm itself is straightforward in design; the operational consequences are large.

What Blaster does

The technical mechanism is well-understood by now. Blaster exploits a buffer overflow in the RPC DCOM interface that Windows uses for inter-process communication, including across the network on TCP port 135. The vulnerability allows a remote attacker to execute arbitrary code by sending a malformed RPC request. There is no authentication required; any host that can reach port 135 on a vulnerable machine can compromise it.
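The exposure described above is easy to audit from the network side. A minimal reachability check, sketched here in Python: note that a reachable port 135 proves exposure, not vulnerability, since patched hosts still listen on it.

```python
import socket

def rpc_port_reachable(host, port=135, timeout=2.0):
    """Return True if a TCP connection to the given port succeeds.

    Reachability only demonstrates that the RPC attack surface is
    exposed from this vantage point; it says nothing about patch state.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Sweeping your own address space with a check like this, before the worm performs the equivalent sweep for you, is a cheap way to find hosts that perimeter or segment filtering should already be covering.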

Once a host is compromised, Blaster:

Installs itself. The worm copies itself to the Windows directory, modifies the registry to launch on boot, and starts running in the background. Cleanup requires more than a reboot.

Begins scanning. Each compromised host scans random IP addresses, looking for other vulnerable hosts on port 135. The scan rate is moderate by Slammer standards — perhaps 50 attempts per second per host — but the cumulative volume across the compromised population is substantial.

Schedules a DDoS attack. Blaster includes a DDoS payload aimed at windowsupdate.com, scheduled to begin on 16 August. The intent is presumably to disrupt Microsoft's ability to distribute the patch.

Carries a message. The worm includes embedded text that reads, roughly: "I just want to say LOVE YOU SAN!! billy gates why do you make this possible? Stop making money and fix your software!!" The message is never displayed to the user; it sits in the binary, visible only to anyone who inspects the executable.

The combination is operationally unpleasant. Compromised hosts experience repeated reboots (the exploit frequently crashes the RPC service, and on Windows XP an RPC failure triggers a 60-second system shutdown) and produce substantial scan traffic that disrupts the local network.
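The scanning step can be sketched in a few lines. The following is a simplified reconstruction from public analyses rather than the worm's actual code; the published write-ups describe a roughly 40% bias toward the local address range, with the sweep proceeding sequentially from its starting point.

```python
import ipaddress
import random

def scan_targets(local_ip, count=20):
    """Simplified Blaster-style target selection (illustrative).

    Roughly 40% of the time the sweep starts in the local /16;
    otherwise it starts at a random address. Either way it walks
    addresses sequentially from the starting point.
    """
    if random.random() < 0.4:
        start = int(ipaddress.ip_address(local_ip)) & 0xFFFF0000
    else:
        start = random.randrange(0, 2**32 - count)
    return [str(ipaddress.ip_address(start + i)) for i in range(count)]
```

The sequential walk and the local bias are why a single infected host inside a flat network finds its neighbours so quickly.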

The MS03-026 advisory

The RPC DCOM vulnerability was disclosed by Microsoft on 16 July 2003 in security bulletin MS03-026. The disclosure included the patch.

The advisory was clear about the severity. The bulletin explicitly noted that the vulnerability was remotely exploitable, did not require authentication, affected essentially every supported Windows version, and could result in system compromise. The recommended action was to apply the patch immediately.

The gap between disclosure and worm appearance was 26 days. For comparison: Code Red appeared 25 days after its patch in 2001; Slammer appeared 6 months after its patch but was an unusual case. On the available data, a window of under a month is becoming the typical timeline from major vulnerability to worm.
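The window arithmetic is worth checking directly from the published dates:

```python
from datetime import date

ms03_026_published = date(2003, 7, 16)   # MS03-026 bulletin release
blaster_first_seen = date(2003, 8, 11)   # first public Blaster reports

print((blaster_first_seen - ms03_026_published).days)  # 26
```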

Why operators were unpatched

The pattern of operator response has been informative. Several specific factors combined to produce the unpatched population.

The patch is large. MS03-026 is approximately 1.5MB. For dial-up users this is a substantial download. For organisations distributing it across thousands of desktops, the bandwidth and rollout time are non-trivial.

Reboot required. The patch requires a system reboot to take effect. For desktop machines this is a small inconvenience; for servers it is a maintenance window. Many servers had been queued for the next maintenance window rather than being rebooted immediately.

Competing patches. Microsoft's July patch cycle included multiple security bulletins. Operators triaging across all of them sometimes deployed others first; MS03-026 was queued behind other patches that were perceived as more urgent.

Internal-network exposure. Many operators reasoned that port 135 was only exposed internally, not to the internet. This is partly true — many networks do filter port 135 at the perimeter. The internal exposure is, however, exactly what allowed Blaster's lateral spread once any single host on the internal network was compromised.

The default configuration. Windows desktops ship with port 135 listening by default. Blaster's vulnerable population includes essentially every unpatched Windows installation left in its default configuration. The default configuration is the dominant configuration; the dominant configuration is vulnerable.

The combination produced a vulnerable population large enough to support sustained worm propagation. Estimates suggest several hundred thousand hosts compromised within the first week.

The DDoS phase

On 16 August, the scheduled DDoS against windowsupdate.com began. By that point Microsoft had retired the DNS record for windowsupdate.com (the name was only ever a redirect to the real update site at windowsupdate.microsoft.com), so the worm's target no longer resolved. Compromised hosts kept attempting the flood, but with nothing to aim at, Microsoft's actual update infrastructure was largely unaffected.

This is a competent operational response. The worm cannot adapt: the target name is fixed in the binary, and the real update infrastructure answers at a different name. The mitigation defeated the worm's DDoS phase entirely.

The lesson is that hardcoded targets are operationally fragile, whether the target is an IP address or a disposable hostname. A worm author who wishes to defeat this kind of mitigation must aim at something the defender cannot cheaply rename or renumber. Code Red I had the same issue (it hardcoded the IP address of www.whitehouse.gov, and the site simply moved); the response to Code Red taught the same lesson; subsequent worm authors continued to make the same mistake.
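The fragility lives in the name-to-address mapping, which the defender controls. A sketch of what a flood loop faces at attack time once the target's DNS record has been retired (the hostname below is illustrative):

```python
import socket

def resolve_flood_target(hostname):
    """Resolve the DDoS target at attack time.

    If the defender retires the DNS record, the lookup fails and the
    flood has nowhere to go; a hardcoded IP fails the same way once
    the service is renumbered.
    """
    try:
        return socket.gethostbyname(hostname)
    except socket.gaierror:
        return None
```

Either way, the worm's targeting decision is frozen at release time while the defender's mapping stays mutable; that asymmetry is the whole mitigation.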

It is a small structural advantage to defenders that worm authors do not always optimise their tools effectively. The advantage will erode over time as the discipline of worm authorship matures.

Lateral spread inside organisations

The most operationally severe aspect of Blaster, in my conversations with operators, has been the lateral spread inside organisations.

A typical pattern: a single laptop carries the infection through the firewall. The laptop user connects to the corporate network from home (or from a hotel, or from any other untrusted environment). At some point the laptop is compromised by Blaster. The laptop is then connected to the corporate network. Inside the network, port 135 is accessible to many hosts; Blaster spreads aggressively.

Within hours, the entire corporate desktop estate can be compromised. The worm cycle of compromise-and-reboot disrupts the desktops; the reboots cascade through the network as RPC services crash.

Several organisations I have spoken with experienced multi-day disruption to their desktop estates. The cleanup is substantial — every compromised desktop needs to be patched and verified clean — and the cleanup period is extended by the worm's continued attempts to reinfect during the cleanup.

The defensive lesson is structural: internal network segmentation matters as much as perimeter filtering. Hosts on a flat internal network are exposed to whatever any single internal host is exposed to. The compromise of one host becomes the compromise of all.
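The structural point can be made concrete with a toy model. This is an illustration of the segmentation arithmetic, not a model of Blaster's actual scan behaviour:

```python
import random

def simulate_spread(hosts=1000, segments=1, rounds=20, seed=7):
    """Toy worm-spread model.

    One host starts infected. Each round, every infected host attacks
    one random host, but port 135 is only reachable inside the
    attacker's own segment.
    """
    rng = random.Random(seed)
    seg_size = hosts // segments
    infected = {0}
    for _ in range(rounds):
        for h in list(infected):
            seg_start = (h // seg_size) * seg_size
            infected.add(rng.randrange(seg_start, seg_start + seg_size))
    return len(infected)
```

On a flat network the single seed reaches most of the estate within a handful of rounds; with ten segments the damage is capped at one segment no matter how long the worm runs.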

For my own infrastructure: I have been making this argument for years. Blaster is the strongest single piece of evidence I have ever had to support it.

What about Welchia

A week after Blaster, on 18 August, Welchia (also called Nachi) appeared. It exploits the same RPC DCOM vulnerability as Blaster but with a different intent — it removes Blaster from infected hosts and applies the MS03-026 patch.

The author appears to have intended Welchia as benevolent. It is not. The worm has produced substantial collateral damage:

Bandwidth consumption. Welchia scans aggressively, sweeping candidate hosts with ICMP echo requests before attacking; on many networks the ICMP sweeps alone have caused congestion.

Patch-installation conflicts. Welchia attempts to install MS03-026 automatically. On many systems the automatic installation conflicts with operator-managed patching processes; the result has been corrupted patch state on substantial numbers of hosts.

Incidental damage. Some organisations have reported that Welchia's installation process produced unintended consequences — disabled antivirus, broken applications, registry corruption — on a fraction of compromised hosts.

Cleanup is the same as for Blaster. Welchia's intent may have been benevolent, but removing it is no easier: both worms need to be fully removed and the underlying vulnerability patched.

The lesson is general: good worms are still worms. The unintended consequences of automated mass intervention are real. An operator's network is a complex system; introducing automated changes from outside, even with good intent, produces unintended cascades.

For my own writing: I have noted this lesson before in different contexts; Welchia is the largest-scale demonstration to date.

Sobig.F arrives

On 19 August, the day after Welchia, Sobig.F appeared. This is a mass-mailing worm, the sixth variant in the Sobig family that began with Sobig.A in January, and it broke the volume records held by previous mass-mailers.

At peak, Sobig.F was generating an estimated one in seventeen email messages globally. The volume was sustained for several days. Mail relays across the internet experienced substantial pressure; many were briefly unable to keep up with inbound volume.

The combination of Blaster, Welchia, and Sobig.F in two weeks has produced the busiest worm period since the Code Red/Nimda window in 2001. The cumulative operational cost has been substantial.

Three worms, two weeks

The three-worm sequence in two weeks is itself structurally significant. Each worm individually demanded an operator response; the cumulative effect of three has been to consume operator attention to the point where some organisations have not been able to respond effectively.

The pattern: defenders are now being targeted by waves of incidents rather than individual incidents. The defensive infrastructure that handled isolated incidents in 2001 is being stress-tested by the cluster pattern in 2003.

For operators with mature, automated defensive infrastructure, the cluster has been manageable. For operators relying on manual response, the cluster has overwhelmed available capacity.

This is a capacity problem, not a capability problem. Each organisation has the technical knowledge to respond to each individual worm. What they lack is the capacity to respond to three worms in two weeks while continuing normal operations. The structural answer is to invest in automation that handles routine response without consuming human attention.

What operators should do

For anyone running Windows, the immediate response is clear:

Apply MS03-026 immediately. The patch is widely available; the cost of applying it is small; the cost of not applying it is severe.

Apply current antivirus signatures. Multiple AV vendors shipped Blaster, Welchia, and Sobig.F signatures within hours of each appearance. Signature-update mechanisms should pull aggressively.

Block port 135 at network perimeters. No legitimate service uses this port across an internet boundary. Filtering eliminates the external exposure entirely.

Block port 135 on internal segments where possible. The lateral-spread problem requires internal segmentation. A compromised laptop should not be able to reach port 135 on the file server.

Apply current Outlook security updates and standard mail filtering for Sobig.F. The advice for mass-mailing worms is unchanged.

For cleanup of compromised hosts:

Remove Blaster, then patch, then verify. The standard sequence: kill the running worm process, remove the registry persistence, apply the patch, reboot, verify the host is no longer susceptible.

Inspect for Welchia residue. Hosts where Welchia ran may have inconsistent patch state, disabled antivirus, or other artefacts. The inspection is more involved than Blaster cleanup alone.

Reinstall when in doubt. For hosts that have been compromised by both Blaster and other malware, the residual state is hard to verify. Reinstalling from clean media is the safer default.
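The verification step lends itself to scripting. Below is a sketch that checks inventory data already collected from a host against the published indicators for the original variant (the msblast.exe image name and a Run-key entry pointing at it; later variants use different names, so treat this indicator list as a starting point):

```python
def blaster_indicators(process_names, run_key_values):
    """Return the Blaster indicators present in host inventory data.

    process_names: image names of running processes on the host
    run_key_values: value data from the CurrentVersion Run key
    """
    findings = []
    if any(p.lower() == "msblast.exe" for p in process_names):
        findings.append("worm process running")
    if any("msblast.exe" in v.lower() for v in run_key_values):
        findings.append("registry persistence present")
    return findings
```

A host that still shows either finding after cleanup has not completed the sequence and will reinfect, or be reinfected, in short order.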

What this teaches

Four generalisations from the past two weeks.

The patch-to-worm window is now reliably under a month. Operators must patch within days of advisory release; the alternative is to be exposed during the worm window. The patching cadence at most organisations is too slow.

Internal lateral spread is a major attack surface. Blaster's spread inside organisations was its largest operational impact. Internal segmentation is now operationally non-optional for any organisation of size.

Cluster events are the new pattern. Worms come in waves rather than singly. The defensive capacity must be sized for sustained pressure, not for isolated incidents.

Good worms are still worms. Welchia's intent was benevolent; Welchia's effect was disruptive. Automated mass intervention from outside an operator's control is, regardless of intent, an attack pattern.

What I have done

For my own infrastructure, the impact has been minimal. I do not run Windows desktops on internet-facing IPs; my home network has port 135 filtered at the firewall; the friends' Windows desktops I help maintain were patched in the days immediately following MS03-026, before Blaster appeared.

For friends and small organisations who called this past fortnight, the work has been substantial. Three different organisations needed help with cleanup; one of them had over 100 compromised desktops; the cleanup took most of a week of evenings.

For my Snort sensor, the alert volume during the past fortnight has been the highest I have ever observed. The peak was during the Sobig.F window when mass-mailing pattern matches were firing roughly 100 per minute.

For my structured-log analysis, the dataset is now substantial. The post-event analysis will continue for weeks; the patterns will inform my prediction calibration and my future writing.

A reflection on the pattern

I am writing this on a Tuesday evening, with the Sobig.F volume finally subsiding and operators across the field still working through cleanup. The fortnight is, by some distance, the most operationally significant period since Code Red and Nimda in 2001.

The specific worms will be defeated by the standard combination of patches, signatures, and cleanup. The structural lessons — the patch-cadence problem, the internal-spread problem, the cluster-incident problem — will be with us for years.

For my own writing: my next several posts will deal mostly with the structural responses. The Blaster/Welchia/Sobig.F pattern has provided enough material that the writing will be substantive for some time.

The field will absorb this pattern over time. The defensive infrastructure will improve. Future similar patterns will, on the available trajectory, be handled more cleanly than this one. The investment in structural defence is the long-term answer; the short-term reality is that operators are exhausted and the next pattern will probably arrive before this one is fully cleaned up.

More as the situation develops.

A final operational note

For anyone reading this in the immediate aftermath: take care of yourselves. The past fortnight has been hard work for many people; the next several weeks will be harder still as the cleanup continues. The work matters; the people doing the work matter more. The notebook will continue; the patches will continue; the worms will continue. None of this is more important than your sustained capacity to do the work over years.

If you are a small-organisation operator who is over your head with this incident: ask for help. The community is, in my experience, generous with technical assistance during incidents. Reaching out is better than struggling alone.

If you are a senior operator helping smaller organisations: the help is appreciated more than you may realise. The cumulative effect of experienced operators helping less-experienced ones during incidents is large.

More in time.

A note on what comes after Blaster

The Blaster/Welchia/Sobig.F sequence has revealed something important about the defensive infrastructure across the operator population. The differentiation between organisations with mature, automated response capabilities and those without has never been starker.

Mature organisations — those that had invested in structured logging, defence in depth, and rapid patching processes — handled the cluster with bounded operational pain. The disciplines were in place; the disciplines worked.

Less mature organisations are still in cleanup three weeks later. Their patching processes were not designed for the sustained pressure; their signature distribution was not aggressive enough; their incident-response procedures had not anticipated cluster events.

The structural shift implied by this differentiation is going to play out over the next several years. Some organisations will mature; some will not; the ones that do not will increasingly find themselves uninsurable, unattractive to customers, and economically pressured. The market will, eventually, reward defensive maturity in ways it has not historically. The mechanism is slow but the trajectory is clear.

For my own writing, this is an arc I expect to return to repeatedly through the rest of 2003 and into 2004. The defensive maturity question is becoming the central operational question of the field; the answers, when they arrive, will reshape how I think about the work.

The next post will probably be about the Welchia analysis specifically, which deserves its own treatment beyond what I have written here.
