Worm propagation modelling, with arithmetic

Two years of mass-mailing worms have produced enough data to fit a simple epidemic model. I have spent a few evenings doing the arithmetic. The model is straightforward; the predictions it makes are sobering enough that I want to write them down.

This post is unusually mathematical for me. Bear with it; the conclusions are operationally useful.

The simplest model: susceptible-infected

The canonical model for an epidemic without recovery (a fair description of a worm, since an infected host stays infected until someone cleans it up manually) is:

dI/dt = β I (N - I) / N

Where:

I is the number of currently infected hosts at time t
N is the total susceptible population
β is the per-host effective reproduction rate (how many new hosts each infected host successfully infects per unit time)

The solution to this differential equation is the well-known logistic growth curve. Initially exponential. Slows as the susceptible population is depleted. Asymptotes to total infection of the whole susceptible population.
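For anyone who wants to check the curve numerically: with I(0) = I0, the closed form is I(t) = N / (1 + ((N - I0) / I0) e^(-βt)), and the early-phase doubling time is ln 2 / β. A minimal Python sketch follows; the function names are mine and purely illustrative.

```python
import math

def infected(t, beta, n, i0=1.0):
    """Closed-form solution of dI/dt = beta * I * (N - I) / N with I(0) = i0."""
    return n / (1.0 + ((n - i0) / i0) * math.exp(-beta * t))

def doubling_time(beta):
    """Early-phase doubling time, valid only while I is much smaller than N."""
    return math.log(2) / beta

def time_to_half(beta, n, i0=1.0):
    """Time at which half the susceptible population is infected."""
    return math.log((n - i0) / i0) / beta
```

The midpoint of the curve falls at ln((N - I0) / I0) / β, which is why the size of N matters only logarithmically for timing.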

For a worm, the relevant parameters:

N is the global population of vulnerable hosts. For ILOVEYOU, this is roughly the worldwide Outlook install base — call it 100 million.
β is the effective spread rate — depends on how many addresses each infected machine sends to per hour, multiplied by the fraction of recipients who open the attachment, multiplied by the fraction whose configuration allows the attachment to execute.

The model is simple enough to fit by hand to observed propagation curves. The fit is reasonable.
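One way to do the fit, sketched here under my own assumptions: the logistic curve becomes a straight line if you plot ln(I / (N - I)) against time, and the slope of that line is β. The snippet below estimates the slope by ordinary least squares. The data in it is synthetic, generated from the model itself to check that the method recovers a known β; it is not an observed propagation curve, and `fit_beta` is a hypothetical helper name.

```python
import math

def fit_beta(times, counts, n):
    """Least-squares slope of ln(I / (N - I)) against t; the slope estimates beta."""
    ys = [math.log(i / (n - i)) for i in counts]
    t_mean = sum(times) / len(times)
    y_mean = sum(ys) / len(ys)
    num = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, ys))
    den = sum((t - t_mean) ** 2 for t in times)
    return num / den

# Synthetic check only: counts come from the model, not from real reports.
N = 1e8
TRUE_BETA = 0.7                      # per hour, chosen arbitrarily for the check
times = [4, 8, 12, 16, 20, 24]       # hours since the seed infection
counts = [N / (1 + (N - 1) * math.exp(-TRUE_BETA * t)) for t in times]
estimate = fit_beta(times, counts, N)
```

With real counts the points will scatter around the line, and the estimate is only as good as the guess for N.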

The arithmetic for ILOVEYOU

Let me try the numbers. ILOVEYOU's mechanism: each infected machine sends to its full Outlook contact list. Average contact list size, on the available reporting, is around 50-100 addresses. Let us say 50 to be conservative.

Assume 20% of recipients open the attachment (within the range of published estimates, though it varies by population). Of those, perhaps 80% have a configuration that allows execution.

So each infected machine produces, in roughly the time it takes to process the contact list (minutes to hours), about 50 × 0.20 × 0.80 = 8 new infections. Each new infection produces 8 more.

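Eight new infections per generation is not the same thing as β = 8 per hour: the generation time matters, and for a mail-borne worm it is dominated by how long recipients take to open their mail. Assume a generation takes about three hours. An eightfold increase every three hours is a doubling roughly every hour, which corresponds to β = ln 2 ≈ 0.7 per hour in the equation above. With that rate and N = 100 million susceptible hosts, a single seed infection becomes about a thousand infections by hour 10 and about a million by hour 20. By hour 30 it would be about a billion, except that the susceptible population is exhausted before then.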

In practice, ILOVEYOU saturated the susceptible population in approximately 24-48 hours. The model predicts 30-40 hours. Close enough that the model is plausibly capturing the dynamics.
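The arithmetic in this section can be reproduced in a few lines. This is a sketch under the same assumptions as the text (50 contacts, 20% open rate, 80% executable, N = 100 million, a single seed). The conversion of a one-hour doubling time to β = ln 2 per hour is my own step.

```python
import math

N = 100_000_000                          # susceptible hosts, as in the text
new_per_generation = 50 * 0.20 * 0.80    # contacts x open rate x execution rate
beta = math.log(2) / 1.0                 # per hour, from a one-hour doubling time

def infected(t):
    """Infected hosts t hours after a single seed infection."""
    return N / (1 + (N - 1) * math.exp(-beta * t))

def hours_to_fraction(f):
    """Hours from a single seed until a fraction f of N is infected."""
    return (math.log(N - 1) + math.log(f / (1 - f))) / beta

for hour in (10, 20, 30):
    print(hour, round(infected(hour)))
print(round(hours_to_fraction(0.5), 1), round(hours_to_fraction(0.99), 1))
```

On these assumptions the halfway point falls at roughly 27 hours and 99% saturation at roughly 33 hours, which sits inside the observed 24-48 hour window.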

What the model says about the future

The model has free parameters. The interesting question is what happens when those parameters change.

If β doubles — which would happen if a future worm has both email and SMB propagation, like ExploreZip — the doubling time halves. A worm that doubles every 30 minutes saturates the same population in roughly half the time. From single seed to global infection in under 24 hours.

If N doubles, which it will as Windows installations grow and home connectivity expands, the saturation point grows but the growth rate is unchanged. Doubling the population costs the worm only one extra doubling time, so the time to peak is almost the same, with more total infections at peak.
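These two sensitivities are easy to check against the closed-form solution. A minimal sketch, taking as an assumed baseline a one-hour doubling time and 100 million hosts, and defining "saturation" (my choice) as 99% of N:

```python
import math

def hours_to_saturation(beta, n, fraction=0.99, i0=1):
    """Hours for the SI model to grow from i0 infections to `fraction` of n."""
    return (math.log((n - i0) / i0) + math.log(fraction / (1 - fraction))) / beta

base_beta = math.log(2)      # per hour: one doubling per hour (assumed baseline)
base_n = 100_000_000

baseline = hours_to_saturation(base_beta, base_n)
double_beta = hours_to_saturation(2 * base_beta, base_n)   # exactly half the time
double_n = hours_to_saturation(base_beta, 2 * base_n)      # about one hour longer
```

Time to saturation scales as 1/β but only as ln N, which is the whole asymmetry between the two cases.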

If a worm uses active propagation (scanning the internet for vulnerable hosts and exploiting them directly, rather than waiting for users to open attachments), β can be much higher. A worm that scans 100 hosts per second per infected machine, with a 1% successful exploitation rate, has β of one infection per second — three orders of magnitude higher than a mail-borne worm.

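This last case is the genuinely concerning one. Taken literally, β of one per second is a doubling time of ln 2 / β ≈ 0.7 seconds. Even if the achieved rate is forty times lower than that, a worm of this shape doubles every 30 seconds. Saturation of the global vulnerable population in minutes, not hours.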
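To put "minutes, not hours" into numbers, here is a small sketch comparing a 30-second doubling time with a one-hour doubling time over the same 100 million hosts. The 99% threshold for saturation and the function name are my own choices.

```python
import math

N = 100_000_000

def seconds_to_fraction(doubling_seconds, fraction, n=N, i0=1):
    """Seconds from i0 infections until `fraction` of n is infected."""
    beta = math.log(2) / doubling_seconds   # per second
    return (math.log((n - i0) / i0) + math.log(fraction / (1 - fraction))) / beta

scanning_minutes = seconds_to_fraction(30, 0.99) / 60      # active scanning worm
mail_hours = seconds_to_fraction(3600, 0.99) / 3600        # mail-borne worm
```

Both worms need the same number of doublings, about 33 to reach 99% from a single seed. Only the length of each doubling differs, so the scanning worm finishes in about 17 minutes where the mail-borne one takes about 33 hours.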

This is exactly the Code Red / Slammer shape that has been theoretically discussed but not yet operationally demonstrated. The math says it is technically possible. I would not be surprised to see a proof of concept within 18 months.

What this implies for defenders

A few uncomfortable observations.

Reactive defence has a horizon. Antivirus signature updates take hours; a worm that saturates in minutes is past peak before signatures arrive. The signature-based defensive architecture, in this scenario, fails. The window during which signatures could matter is too short.
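A rough way to see the horizon is to ask what fraction of the population is already infected when signatures arrive. The sketch below assumes, hypothetically, that the outbreak is first noticed at 1,000 infected hosts and that signatures ship four hours later. Both numbers are placeholders of mine, not measurements.

```python
import math

def fraction_infected_at_signature(doubling_hours, lag_hours,
                                   detected_at=1_000, n=100_000_000):
    """Fraction of n infected `lag_hours` after the worm is first noticed."""
    beta = math.log(2) / doubling_hours
    # The SI model is linear in log-odds: ln(I / (N - I)) grows at rate beta.
    log_odds = math.log(detected_at / (n - detected_at)) + beta * lag_hours
    return 1 / (1 + math.exp(-log_odds))

mail_borne = fraction_infected_at_signature(doubling_hours=1.0, lag_hours=4)
scanning = fraction_infected_at_signature(doubling_hours=30 / 3600, lag_hours=4)
```

With a one-hour doubling time the signature lands while the infected fraction is still a small fraction of a percent. With a 30-second doubling time the population is effectively saturated before the signature exists.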

The defence must be in place before the outbreak. Anything that requires response — patching, signature updates, manual investigation — is too slow for the active-propagation case. The structural defences must be in place at deployment time, not added in response.

Patching cadence becomes critical. A worm that exploits a vulnerability with no patch available is unstoppable in the active-propagation case. A worm that exploits a vulnerability with a recent patch is stoppable only by hosts that applied the patch promptly. The window between patch availability and worm release determines what fraction of the population is vulnerable.

The vulnerable-host pool is the variable that matters. Anything that reduces the pool — better patching, default-restricted configurations, network segmentation, the structural changes Microsoft is unlikely to ship — reduces the saturation point of the model. Even modest reductions can change the dynamics significantly.
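In the simple model as written, β is fixed and shrinking the pool only lowers the ceiling. For a scanning worm there is a second effect, sketched below under an assumption that goes beyond the model above: if the worm picks targets uniformly at random across the IPv4 address space, its hit rate is proportional to the density of vulnerable hosts, so β shrinks with the pool. The pool sizes are hypothetical round numbers.

```python
import math

ADDRESS_SPACE = 2 ** 32   # assumption: uniform random scanning of IPv4

def scanning_beta(scans_per_second, vulnerable_hosts):
    """New infections per second per infected host, early in the outbreak."""
    return scans_per_second * vulnerable_hosts / ADDRESS_SPACE

def doubling_seconds(beta):
    """Early-phase doubling time in seconds."""
    return math.log(2) / beta

# Hypothetical pools: halving the vulnerable population doubles the doubling time.
full_pool = doubling_seconds(scanning_beta(100, 40_000_000))
half_pool = doubling_seconds(scanning_beta(100, 20_000_000))
```

Under that assumption, every host removed from the pool both lowers the ceiling and slows the spread, which is one way in which a modest reduction changes the dynamics and not just the totals.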

What I am taking from the modelling

Three practical implications:

Real-time detection is the only useful detection. Anything that depends on retrospective signature updates is too slow. Real-time network monitoring of the kind Snort does, pointed at anomalous traffic rather than at known payloads, is the right architecture for this. Even imperfect anomaly detection is faster than perfect signature-based detection.

Network segmentation is a core structural defence. Once a worm is in your network, it propagates internally faster than externally (no upstream rate-limits, well-known internal hosts, predictable architectures). Segmentation that limits internal lateral movement is one of the few defences that scales to the active-propagation case.

Default restrictions are essential. A host that is fully patched but has every service enabled is more vulnerable than a host that is partially patched but runs only what it needs. Reducing the attack surface — disabling unneeded services, restricting filesystem permissions, applying the least-privilege discipline — is the structural defence that compounds.

A small observation about prediction

I have written this whole post with reasonable confidence. I should note explicitly that the model is a model, and not the only one. Real worm propagation has effects the simple SI model does not capture: spatial correlation (some networks are more connected than others), temporal correlation (worms propagate faster during business hours), defender response (some hosts get patched mid-outbreak), and various other refinements.

More sophisticated models exist. They are more accurate. They also have more parameters that have to be estimated, which introduces its own uncertainty.

The simple model has the virtue of being easy to reason about. It captures the essential dynamics. Its predictions are wrong in the details but right in the orders of magnitude. For the operator-level question — how fast should I expect the next worm to spread? — the simple model gives a usable answer.

The answer is: faster than the response time of any defence that requires human intervention. The implication is to not rely on human-intervention defences for this category. The operational discipline follows from there.