Predictions, scored

I have been making predictions in this notebook for two years. I committed in June 1999 to scoring them honestly. This post is the scoring exercise — going through every prediction I have written down, marking it right, wrong, or unresolved, and reflecting on what the calibration tells me.

The discipline of reviewing predictions, I am increasingly convinced, is more useful than the discipline of making them. Predictions are easy to make and easy to forget. Reviewing them honestly forces engagement with the gap between my expectations and what actually happened.

The 1999 predictions

From my January 1999 predictions post, five specific predictions:

1. "The community ruleset for Snort matures." Right. By end-1999 the ruleset was several hundred rules; by mid-2000 over a thousand. Direction and magnitude both correct.

2. "Honeypots become a category." Mostly right. The Honeynet Project formalisation in mid-2000 was the visible event. The category clearly exists in 2000 in a way it did not in 1999. The timing was a few months off (I had said "this year" meaning 1999); the substance is correct.

3. "Distributed denial-of-service becomes a thing people have heard of." Right. Mafiaboy made it household-news. Under-predicted in scale.

4. "Y2K teaches us something other than what people are saying — security regressions from rushed remediation." Mostly right. The expected wave of remediation-induced advisories is materialising in late 2000 and into 2001 as I had said. Slightly later than I had hoped.

5. "The conversation about disclosure intensifies." Mostly right. The conversation has matured; norms are forming; the consensus is incomplete. Right direction; the speed was slightly slower than I implied.

Net score: 5 of 5 right directionally; 3 of 5 right on timing.

The late-1998 predictions

From the looking-ahead-to-1999 post in late 1998:

"The Snort wave" — community ruleset matures, performance issues with large rulesets emerge, packet-rate becomes a topic. Right. The 1.7 release I wrote about addressed exactly the performance scaling.

**"Honeypots becoming a category" — already covered above.

**"Y2K" — already covered above.

No new predictions to score from this earlier post.

The 2000 predictions

Mid-year 2000 I wrote some predictions:

1. "DDoS response will be more coordinated than I expected." Right. The major US carriers have visibly tightened peering practices and started enforcing source-address validation. The trajectory is clearly steeper than I had said in February.

2. "Platform change is glacial." Right, depressingly. Microsoft has visibly intended to ship structural fixes; the actual shipped fixes have been minimal. Default Outlook is still vulnerable to similar attacks.

3. "Honeypot data is more interesting than I had hoped." Right. The cumulative analysis confirmed the patterns at a level I had not expected.

From October's IIS post:

"Mass scanning starts within 24 hours." Right.

"Public exploit code by tomorrow." Right (within hours, in fact).

"Mass exploitation within the week." Right.

"A subsequent worm within months." Unresolved at time of writing. Probably right; the worm has not appeared yet but the conditions are clearly present.

From November's BIND 9 post:

"BIND 9 will be less buggy per line of code than BIND 8 but absolute advisory rate may not decrease much in 1-2 years." Unresolved. Will need a year or more to evaluate.

From various posts:

"Microsoft will ship default attachment blocking by 2001." Unresolved at time of writing. No major Outlook update yet.

"WEP attacks reach practical tooling within 12-18 months." Unresolved. The published research is converging; tools are not yet public.

"Several more IIS advisories will follow this pattern." On track; advisory cadence has continued.

What the calibration tells me

Several observations.

My direction calls are reasonably good. Of the predictions where I have something to score against, I have got the direction right in essentially all cases. The categories I identified as emerging (DDoS, mass-mailing worms, honeypots, wireless) have all emerged.

My timing is consistently optimistic. I tend to predict things faster than they actually happen. The Honeynet Project formalisation took 18 months instead of the 12 I had implied; structural fixes from Microsoft are taking 2+ years rather than the 12-18 months I had suggested; wireless attacks are emerging more slowly than I had expected.

My magnitude predictions are consistently too conservative. The Mafiaboy attacks were larger than I had predicted DDoS would be in 2000. The ILOVEYOU damage was larger than I had predicted for the next mass-mailing worm. The attack patterns have been more aggressive than I expected, not less.

The structural-improvement predictions are right but slow. Things like "BCP 38 deployment will accelerate" or "open-source security tools will mature" — directionally correct but happening more gradually than my writing implied.

What this teaches me about my own forecasting

Three things.

Lengthen the implicit timescales in my predictions. When I write "by year-end" or "in the next few months", I should mean something more like "in the next year or two". The internal time-sense I am applying is too compressed.

Be more deliberate about magnitude. The threat side has been underestimated; the defence side has been overestimated. I should not assume I can extrapolate linearly from current conditions to future conditions; the threat side has acceleration that my models do not capture.

Continue scoring honestly. The scoring discipline is the thing that produces calibration. Without it, the predictions are decorative.

A small experimental discipline for 2001

For the year ahead, I am going to try a slightly more rigorous prediction discipline.

When I write a prediction, I am going to:

  • State it explicitly rather than embedding it in prose.
  • Give it a probability — not a binary yes/no, but a rough sense of how confident I am.
  • Give it a deadline.
  • Score it explicitly at the deadline.

This is more work but should produce better calibration over time. The probability quantification is the key piece — saying "I think there is a 70% chance of X by date Y" is meaningfully different from saying "I expect X by Y". The accountability is sharper.
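
A minimal sketch of what the ledger might look like, in Python. The claims, probabilities, and deadlines below are made-up illustrations, not my actual 2001 predictions, and the Brier score is just one reasonable way to turn probability-tagged predictions into a single calibration number once they resolve.

    from datetime import date

    # One record per prediction: the claim, my stated probability that it
    # comes true, the deadline, and the outcome once scored
    # (True, False, or None while still unresolved).
    predictions = [
        {"claim": "example: default attachment blocking ships",
         "probability": 0.6, "deadline": date(2001, 12, 31), "outcome": None},
        {"claim": "example: a worm follows the exploit pattern",
         "probability": 0.8, "deadline": date(2001, 6, 30), "outcome": True},
    ]

    def brier_score(entries):
        """Mean squared gap between stated probability and what happened.
        0 is perfect; 0.25 is what always saying 50% would earn."""
        resolved = [e for e in entries if e["outcome"] is not None]
        if not resolved:
            return None
        return sum((e["probability"] - (1.0 if e["outcome"] else 0.0)) ** 2
                   for e in resolved) / len(resolved)

    print(brier_score(predictions))  # 0.04 for the one resolved example above

The number itself matters less than the trend: if the score drifts down over successive years, the calibration is improving.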

I will report on this experiment a year from now. If it produces better calibration, I will continue. If it produces only marginal improvement, the cost is not worth the benefit.

A closing reflection on calibration

The broader lesson, which I keep returning to, is that being right is less important than being honestly wrong. The predictions that mattered most for my own learning were the ones I got wrong — they forced me to update my mental model. The predictions I got right were satisfying but, in some real sense, not as informative.

For a writer who is trying to communicate honestly with readers, publishing predictions and scoring them is a small accountability discipline. It is not flashy. It will not produce excitement. It does, slowly, build trust — readers know the writer is trying to calibrate rather than just to be impressive.

For my own work going forward, I am going to keep doing this. The discipline is sustainable; the cumulative effect over years is a writer who is more trustworthy than they would otherwise be.

More as the year wraps up.
