SQL Slammer

Yesterday — Saturday 25 January 2003 — a worm called SQL Slammer reached global saturation in approximately 10 minutes. The fastest-spreading worm in internet history, by a substantial margin. The worm-arithmetic models I have been writing about for the past two years have been validated in extreme form, and the operational consequences are still unfolding as I write this on Sunday evening.

This is going to be a longer-than-usual post. The incident is significant enough that the careful walk-through is justified. I want to cover what the worm does, what it teaches at the technical level, what it teaches at the operational level, and what it implies for the next several years of defensive practice.

What Slammer is, mechanically

Slammer exploits a buffer overflow in Microsoft SQL Server 2000 — specifically, in the SQL Resolution Service that listens on UDP port 1434. The vulnerability was disclosed and patched in Microsoft Security Bulletin MS02-039 in July 2002. The patch has been available for six months. The worm exploits the fact that most operators have not applied it.

The entire worm is 376 bytes long. To put that in context: it is shorter than this paragraph of this notebook post. The whole self-propagating mechanism — the exploit, the code that generates random target addresses, and the loop that sends copies of the packet to them — fits inside a single UDP packet. There is no TCP handshake required to deliver it. The worm transmits in one packet and begins propagating instantly.

This matters more than it sounds. The TCP-based worms I have been writing about — Code Red, Code Red II, Nimda — required a three-way handshake (SYN, SYN-ACK, ACK) before they could deliver their payload. The handshake costs a full round trip of latency, which on the internet is typically 50-200 milliseconds depending on geography. Slammer skips this entirely. A packet leaves the attacker's machine; the same packet arrives at the victim's machine; the victim's UDP processing parses the packet and is exploited; the worm is now running. Sub-millisecond from arrival to compromise on the victim side.
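
To make the handshake cost concrete, here is a minimal sketch in Python. The host is a placeholder — substitute something you are entitled to poke at — and the point is only that connect() blocks for a full round trip while sendto() returns as soon as the datagram has been handed to the network stack.

    import socket
    import time

    # Placeholder target -- substitute a host you are entitled to probe.
    HOST = socket.gethostbyname("example.com")   # resolve once, time only the network
    TCP_PORT = 80
    UDP_PORT = 1434

    # TCP: connect() does not return until the SYN / SYN-ACK / ACK exchange
    # completes, so the elapsed time includes at least one full round trip.
    start = time.time()
    t = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    t.settimeout(5.0)
    t.connect((HOST, TCP_PORT))
    t.close()
    print("TCP connect (one handshake round trip): %.1f ms" % ((time.time() - start) * 1000))

    # UDP: sendto() hands the datagram to the stack and returns immediately.
    # There is no handshake and no round trip before the payload is on the wire.
    start = time.time()
    u = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    u.sendto(b"probe", (HOST, UDP_PORT))
    u.close()
    print("UDP sendto (no handshake):              %.3f ms" % ((time.time() - start) * 1000))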

The scanning is correspondingly aggressive. Each compromised host generates random IP addresses and sends the worm packet to each of them, on UDP port 1434, as fast as the host's network interface and CPU will allow. For a host on a 100 Mbit/s connection, this works out to roughly 26,000 packets per second; the limit is the network link rather than the CPU. On faster connections, the rate is higher still.

The propagation arithmetic, in detail

Let me apply the model I have been refining for the worm-propagation maths, with this specific worm's parameters.

The target population is approximately 75,000 internet-reachable instances of SQL Server 2000. This is a small fraction of the overall SQL Server install base — most installations are behind firewalls or otherwise inaccessible from the internet. The 75,000 figure is the exposed population.

Each compromised host scans at 26,000 packets per second. The probability that any given probe hits a vulnerable host is approximately 75,000 / 4 billion ≈ 0.000019. So each compromised host generates roughly 26,000 × 0.000019 ≈ 0.5 successful infections per second.

With β ≈ 0.5 per second per host, the doubling time is approximately ln(2) / 0.5 ≈ 1.4 seconds. From a single seed, ten doublings produce roughly a thousand compromises; twenty produce a million; in this case, saturation of 75,000 hosts is reached after roughly seventeen doublings, which is about 24 seconds of pure exponential growth.
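
For the record, here is the idealised arithmetic as a short Python sketch. The inputs are exactly the figures quoted above; nothing else is assumed.

    import math

    vulnerable_hosts = 75000       # internet-reachable SQL Server 2000 instances
    scan_rate        = 26000       # probes per second per compromised host
    address_space    = 4.0e9       # round figure for the IPv4 address space

    # Probability that any single random probe lands on a vulnerable host.
    hit_probability = vulnerable_hosts / address_space        # ~0.000019

    # New infections per second produced by one compromised host (beta).
    beta = scan_rate * hit_probability                        # ~0.5 per second

    # Doubling time during unconstrained exponential growth.
    doubling_time = math.log(2) / beta                        # ~1.4 seconds

    # Whole doublings needed to go from one seed to the full population,
    # and the idealised time that takes.
    doublings = math.ceil(math.log(vulnerable_hosts, 2))      # 17
    time_to_saturation = doublings * doubling_time            # ~24 seconds

    print("beta:                 %.2f infections/s per host" % beta)
    print("doubling time:        %.1f s" % doubling_time)
    print("doublings needed:     %d" % doublings)
    print("idealised saturation: ~%.0f seconds" % time_to_saturation)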

The actual saturation took approximately 10 minutes, not 24 seconds. The reason is that the maths above ignores two real-world constraints.

First, the network does not have infinite capacity. Each compromised host generates 26,000 packets per second of outbound traffic. With each packet at roughly 400 bytes (headers plus the 376-byte payload plus framing overhead), that is about 10 megabytes — roughly 80 megabits — per second of outbound traffic per host, enough to fill a standard 100 Mbit/s link on its own. Multiplied by even a few thousand active scanners, the aggregate traffic congests the network. The backbone carriers were observed to drop substantial fractions of UDP traffic during the peak; the dropped packets reduced the effective propagation rate.

Second, the random IP generation in Slammer is genuinely random — no biasing, no avoidance of obviously-non-existent addresses. A substantial fraction of the scan packets go to addresses that are not allocated, that are reserved, or that are routed through black holes. These do not produce infections. The effective scan rate per host is lower than the nominal 26,000 packets per second.

With these corrections applied, the model predicts saturation in roughly 10-15 minutes. The actual data fits this prediction.
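
To show how those corrections stretch a 24-second curve out to ten minutes, here is a toy discrete-time version of the model. The single delivery_fraction parameter lumps together everything that wastes or drops a probe — congested links, unallocated addresses, black holes — and the values tried below are illustrative assumptions, not measurements.

    # Toy logistic simulation of worm growth.  beta_ideal is the idealised
    # figure from the arithmetic above; delivery_fraction scales it down to
    # account for dropped and wasted probes.

    N          = 75000      # vulnerable population
    beta_ideal = 0.49       # idealised infections per second per host

    def minutes_to_saturation(delivery_fraction, target=0.9, dt=0.1):
        """Return minutes until `target` fraction of the population is infected."""
        beta = beta_ideal * delivery_fraction
        infected, t = 1.0, 0.0
        while infected < target * N:
            # Each infected host produces beta new infections per second, but
            # only probes landing on still-vulnerable hosts count.
            infected += beta * infected * (1.0 - infected / N) * dt
            t += dt
        return t / 60.0

    for fraction in (1.0, 0.25, 0.10, 0.05):
        print("delivery fraction %3.0f%% -> 90%% saturation in %4.1f minutes"
              % (fraction * 100, minutes_to_saturation(fraction)))

A delivery fraction somewhere around five per cent reproduces the observed ten-minute figure. Whether that is the right way to carve up the correction I will not know until the traffic captures have been analysed.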

The collateral damage

The saturation of the vulnerable population is one part of the story. The collateral damage to the broader internet is a different, larger part.

The scan traffic from compromised hosts saturated network links across multiple geographic regions. The worm itself generates the equivalent of a distributed denial of service attack, targeted at all of the internet simultaneously. At peak, the bandwidth consumed by Slammer's scanning was comparable to the normal traffic load of many of the backbone links it crossed.

Specific consequences observed:

Bank ATM networks. Several US banks reported that their ATM networks were unable to process transactions during the peak. The ATMs themselves were not compromised; the networks they used were carrying so much Slammer traffic that legitimate ATM transactions could not get through.

Airline reservation systems. Continental Airlines reported that their check-in systems were unable to process passengers for several hours; flights were delayed because of the resulting backlog. The reservation systems were not vulnerable; the network connectivity to them was overwhelmed.

911 emergency services. Reports from Seattle and elsewhere indicated that 911 call routing was disrupted by the scan traffic. The emergency dispatch systems were not vulnerable; the network they used was.

Microsoft itself. Microsoft's own internal SQL Server deployments were extensively compromised. The irony was widely noted; the operational consequence was that Microsoft's update servers were partially unavailable during the very window when operators most needed to download patches.

South Korean internet. The internet in South Korea was, on the available reporting, essentially non-functional for several hours due to the worm-traffic load. South Korea has a particularly dense fibre infrastructure; the density meant the load was concentrated.

This is the category change that Slammer introduces. Earlier worms produced disruption proportional to the size of their compromised population. Slammer produces disruption proportional to the size of its compromised population multiplied by the bandwidth each compromised host consumes while scanning. The collateral effect is much larger than the direct effect.

The patch gap, examined

The SQL Server patch was released on 24 July 2002. The worm appeared on 25 January 2003: six months between patch availability and worm exploitation. By the standards of recent worms, this is actually a long gap — Code Red appeared 25 days after its patch. Slammer's authors waited.

Why did so many SQL Server installations remain unpatched after six months?

A few specific reasons emerge from operator conversations.

First, forgotten installations. SQL Server — often in its cut-down MSDE (Desktop Engine) form — is sometimes installed as part of other Microsoft products: Visio, the developer tools, certain server applications. Operators install the parent product without realising that SQL Server is also installed and listening. The listening instance sits on UDP port 1434, exposed to the network, vulnerable to Slammer. The operator does not know they are running SQL Server, so they do not know to patch it.

Second, home-grown applications with embedded SQL Server. Many small businesses run line-of-business applications that include SQL Server as part of their installation. The application vendor controls the SQL Server version; the customer cannot easily update it without coordinating with the vendor. Many vendors had not certified the patch against their applications; many customers had not been notified that the patch existed.

Third, the patching procedure itself. Microsoft's patch for MS02-039 required a SQL Server instance restart. For production database servers, restarting requires a maintenance window. Many operators had not scheduled the maintenance window because the vulnerability had not been associated with active exploitation. The patch sat in the queue.

Fourth, the routine cost of patching. Each patch requires testing, scheduling, deployment, validation. The cumulative cost is substantial. Operators triage patches; the SQL Server patch was triaged as moderate-priority because no active exploitation was visible; the patch was scheduled but not urgently.

The combination of these factors produced a vulnerable population large enough to support a self-propagating worm. The lesson is structural: the patching system is the bottleneck. Improving any single operator's patching cadence helps a little; improving the patching system across the whole operator population is what would change the outcome.

Why ten minutes matters

The ten-minute saturation is the most important single fact about Slammer. It is the parameter that breaks the existing defensive playbook.

Consider the standard incident-response procedure that I and most operators have been working with: detection, escalation, response, containment. Even a well-organised team takes minutes to escalate from "something is wrong" to "this is what is wrong and here is what we will do about it". A reactive defence based on this cycle has a minimum response time of perhaps 5-10 minutes for the most prepared organisations, and 30-60 minutes for typical organisations.

Slammer reaches saturation in 10 minutes. By the time the most prepared response team has identified the situation and decided what to do, the worm has already finished saturating its vulnerable population. The reactive defensive model fails.

The defensive responses that do work against Slammer-class worms are entirely pre-positioned: patches applied before the worm appears, perimeter filtering in place before the worm appears, alerting and monitoring configured before the worm appears. Anything that requires human intervention during the incident is too slow.

This is a structural change. The defensive disciplines that have evolved over the past five years assumed hours-to-days timescales. Slammer compresses everything to minutes. The disciplines need to be re-thought.

For my own infrastructure, the implications:

  • Patches for known-vulnerable, network-reachable services need to be applied within days of release — not within a month.
  • Perimeter filtering needs to be deny-by-default, with explicit allow rules only for services that are actively used.
  • Internal segmentation needs to assume that worms can saturate any vulnerable population in minutes, so internal blast radius depends on segmentation rather than on response time.
  • Out-of-band communication channels need to exist for the case where in-band communication is congested by worm traffic.

For the operators I help, the implications are similar, with the added difficulty of bringing their posture up to that level. The patch cadence at most small organisations is measured in months; reducing it to days is a substantial cultural change.

What this teaches about the future

The worm's authors made specific design choices. Each was effective. The choices will be reused.

The single-packet UDP design is general. Any vulnerability that can be triggered by a single UDP packet is a candidate for the same approach. SNMP, NTP, DNS, and a long tail of less-prominent UDP services all have potential. I expect future worms in this style.

The random-scanning approach is unsophisticated by current standards. Future worms will use smarter scanning — local-subnet bias, pre-computed hit lists, permutation scanning — to be more efficient and harder to detect. The combination of single-packet propagation with smart scanning could produce saturation in seconds.

The 376-byte size limit is artificial. Slammer's authors chose it to fit in a single UDP packet. A multi-packet worm could be larger; a worm that fragments across multiple UDP packets could be substantially more sophisticated. The size constraint is not fundamental.

The destructive payload dimension is, fortunately, not present in Slammer. The worm scans aggressively but does not damage the hosts it compromises beyond using their bandwidth. A worm of similar architecture with a destructive payload — wiping disks, encrypting data, disrupting services — would be substantially worse. The author chose not to include such a payload; future authors may choose differently.

The coordinated attack dimension is also not present. Slammer's compromised hosts do not coordinate after compromise. A worm that, having saturated its vulnerable population, then issued coordinated commands to all infected hosts (a DDoS, a coordinated data-exfiltration, a coordinated lateral-movement attack) would be a much larger event. Slammer's authors did not include this; they could have.

The trajectory points toward worms that combine these features. Rapid propagation plus smart scanning plus destructive payload plus post-saturation coordination would produce events that the current defensive infrastructure cannot meaningfully respond to.

What operators should do

For anyone running SQL Server, the immediate steps are:

Apply the MS02-039 patch. If you are reading this and have not patched, you are very probably already compromised. The worm lives only in memory, so the SQL Server restart required to apply the patch also removes it — but reinfection is likely within seconds unless other defences are in place.

Filter UDP port 1434 at the perimeter. Most SQL Server installations do not need to be reachable on this port from the internet. Filtering at the perimeter eliminates the exposure entirely.

Disable the SQL Resolution Service if it is not needed. The Resolution Service is what is exploited; disabling it eliminates the vulnerability without requiring the patch.

Audit for forgotten SQL Server installations. Many of the compromised hosts belong to operators who did not realise they were running SQL Server at all. A network scan for UDP 1434 listeners reveals what is exposed; the discoveries are often surprising.
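
Here is a minimal sketch of what that sweep can look like, in Python, assuming an address range you are authorised to scan. It relies on the Resolution Service answering a single 0x02 "ping" byte with its instance details — the behaviour the existing SQL discovery tools use — so treat the exact protocol byte as my recollection rather than gospel.

    import socket

    def probe_1434(address, timeout=0.5):
        """Send the Resolution Service ping to one host; return the raw reply,
        or None if nothing answers within the timeout."""
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.settimeout(timeout)
        try:
            s.sendto(b"\x02", (address, 1434))
            reply, _ = s.recvfrom(4096)
            return reply
        except (socket.timeout, socket.error):
            return None
        finally:
            s.close()

    # Sweep one /24 -- substitute a range you own.  Anything that answers is
    # running a listening SQL Server, whether or not anyone remembers installing it.
    prefix = "192.168.1."
    for last_octet in range(1, 255):
        address = prefix + str(last_octet)
        reply = probe_1434(address)
        if reply is not None:
            print("%s answered on UDP 1434 (%d bytes)" % (address, len(reply)))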

Monitor for Slammer-pattern traffic. Even after cleanup, the residual scanners in the wild continue to probe. Snort signatures for the worm pattern are available; deploy them.

For anyone not running SQL Server, the steps are about absorbing the bandwidth impact rather than about a direct vulnerability:

Verify that your perimeter blocks UDP 1434. Even if you do not run SQL Server, the scanning traffic consumes bandwidth.

Watch for unusual UDP traffic patterns from inside your network. If you have Windows desktops with Office or developer tools, you may have SQL Server instances you do not know about.

Consider whether your network can survive a similar event in the future. Slammer was UDP 1434; the next one may be a different protocol. The capacity-and-segmentation question is general.

What I am doing personally

For my own infrastructure, the response was straightforward. I do not run SQL Server. The collateral effect of the worm scanning was real — my upstream link saw substantial traffic increases — but my own services were unaffected.

For the friends I support, the situation was more involved. Two friends had SQL Server instances they had forgotten about, embedded in installations of other Microsoft products. Both were patched within hours of the worm becoming public. Neither was compromised, as far as we can tell, because the patching beat the local exposure window.

For my structured-log analysis, the Slammer event has produced the largest single dataset I have ever processed. The scan attempts against my range during the peak were so dense that the database backend struggled briefly. The post-event analysis will take weeks to complete; the patterns will inform the calibration of my predictions for years.

For my honeypot range, Slammer probes were captured, though not at the density seen across the broader internet. The honeypot is not running SQL Server; the probes simply arrived and were rejected. At the height of the outbreak the probe rate reached several hundred per second.

A reflection on speed

In 2000 I wrote about the worm-propagation arithmetic, modelling worms with hours-to-days saturation times. In 2001 Code Red saturated in hours. In 2003 Slammer saturates in minutes. Each step in this trajectory has been roughly an order of magnitude faster than the previous.

If the trajectory continues, the next iteration would be saturation in seconds. Whether this is achievable in practice depends on factors I have not yet had time to model carefully — bandwidth availability, the size of the vulnerable population, the per-packet processing latency at compromised hosts. I do not see a structural reason why it could not be reached.

What this means for the defensive picture: the response window continues to shrink. The window for human-in-the-loop response was already too small for Slammer. The window for any response that depends on patches being applied during the incident has been gone since Code Red. The window for response that depends on signatures being distributed during the incident is shrinking toward zero.

The defensive disciplines that survive this trajectory are entirely structural: pre-positioned patches, pre-positioned filters, pre-positioned segmentation, pre-positioned monitoring. The reactive disciplines fade in importance.

This is not a comfortable conclusion. The reactive disciplines are what most security teams are organised around. Re-organising security teams around structural pre-positioning is a substantial cultural change. It will, on the available evidence, take years.

Closing thoughts

I am writing this on Sunday evening, with the worm traffic still elevated above baseline and operators still working through cleanup. The week ahead is going to involve substantial work for many people; the structural lessons will take much longer to absorb.

For my own writing: the rest of the year is going to involve more discussion of structural defences. The reactive disciplines I have been writing about for five years have not become wrong; they have become necessary but insufficient. The structural ones become the differentiator between organisations that survive future Slammer-class events and those that do not.

I will write more as the analysis develops. The next several weeks of operator discussions will produce data that supplements the technical writeup; the structural conversations will benefit from being held in the immediate aftermath rather than in retrospect.

For anyone running infrastructure that includes Windows servers: this week is the right time to audit. The cost of the audit is small; the benefit, if Slammer-style events become routine, is substantial. I would rather be the operator who audited and found nothing than the operator who did not audit and was eventually surprised.

More as the situation develops.

