Predictions and year-end 2005

A combined post this year. The end-of-year period is compressed, and I want to handle the scoring of last year's predictions and the predictions for the year ahead together.

This is going to be a longer post than recent end-of-year ones. The cumulative discipline of writing predictions and scoring them deserves careful treatment.

2005 predictions scoring

From various posts through 2004 and 2005, the explicit predictions I made for 2005:

1. Continue weekly cadence. 95% probability. Resolved AFFIRMATIVE.

2. At least four conferences. 75% probability. Resolved AFFIRMATIVE (attended five).

3. Speak at one conference. 70% probability. Resolved AFFIRMATIVE.

4. More non-technical writing. 65% probability. Resolved PARTIAL.

5. Honeypot expansion to /27. 60% probability. Resolved AFFIRMATIVE.

6. Continued mass-mailing. 95%. Resolved AFFIRMATIVE.

7. Mobile-platform attacks at meaningful scale. 45%. Resolved PARTIAL (incidents observed but not at meaningful scale).

8. P2P-architecture worms beyond Slapper. 50%. Resolved PARTIAL (some development but not major incident).

9. DDoS-for-hire becomes mainstream concern. 55%. Resolved AFFIRMATIVE.

10. Continued Microsoft progress. 85%. Resolved AFFIRMATIVE.

11. Major UK data breach. 60%. Resolved AFFIRMATIVE (multiple incidents).

12. Phishing reaches operational maturity. 85%. Resolved AFFIRMATIVE.

Calibration assessment

My 2005 predictions were largely correct. Direction calls were uniformly right; magnitude calls were mostly right; timing was approximately right.

The mid-probability predictions (mobile attacks at 45%, P2P worms at 50%) were the ones that resolved partial. Predictions in the 50% range that resolve partial-affirmative are essentially well calibrated.

Over-confidence was limited. The non-technical writing prediction at 65% resolved partial, which is close to the right calibration rather than a serious miss.

The affirmative-resolved predictions clustered at 70%+ probability, which is the right pattern.

For my own calibration discipline: the 2005 score is good. The cumulative pattern across multiple years suggests I am improving.
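For concreteness, the twelve resolutions above can be reduced to a single Brier score (mean squared error between stated probability and outcome). A minimal sketch, with the assumption that a PARTIAL resolution scores as 0.5:

```python
# (probability, outcome) pairs for the twelve 2005 predictions above.
# AFFIRMATIVE scores 1.0; scoring PARTIAL as 0.5 is an assumption,
# not a standard rule.
results = [
    (0.95, 1.0), (0.75, 1.0), (0.70, 1.0), (0.65, 0.5),
    (0.60, 1.0), (0.95, 1.0), (0.45, 0.5), (0.50, 0.5),
    (0.55, 1.0), (0.85, 1.0), (0.60, 1.0), (0.85, 1.0),
]

# Brier score: 0.0 is perfect; flat 50% guessing on every item scores 0.25.
brier = sum((p - o) ** 2 for p, o in results) / len(results)
print(round(brier, 4))  # 0.0625
```

A score of 0.0625 against the 0.25 of coin-flip guessing supports the "largely correct" reading, though twelve predictions is a small sample.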

2006 predictions

For the year ahead, with explicit probabilities and deadlines:

Threat-side

1. At least one major worm event of comparable scale to Sasser or larger. Probability: 65%. Deadline: 31 December 2006. The trajectory has been quieter recently; a major event is overdue.

2. Mass-mailing worms continue at sustained volume. Probability: 95%. Deadline: 31 December 2006. No structural change visible that would reduce volume.

3. A specific public phishing incident with substantial UK retail-banking impact. Probability: 75%. Deadline: 31 December 2006. The trajectory points toward this; specific incidents continue to grow.

4. DDoS-for-hire used in a high-profile public extortion case. Probability: 70%. Deadline: 31 December 2006. The category exists; specific extortion cases will become public.

5. A meaningful mobile-platform malware incident. Probability: 60%. Deadline: 31 December 2006. The category has been forming for two years; an operational incident is overdue.

6. A worm specifically targeting routers or other embedded devices. Probability: 40%. Deadline: 31 December 2006. The category is emerging in research; operational deployment is uncertain.

Defensive-side

7. Two-factor authentication ships at a major UK retail bank. Probability: 75%. Deadline: 31 December 2006. The pilots are mature; mainstream deployment is overdue.

8. Continued Microsoft Trustworthy Computing progress. Probability: 90%. Deadline: 31 December 2006. The trajectory is established.

9. The Honeynet Project produces another major paper. Probability: 80%. Deadline: 31 December 2006. The cadence is established.

10. Linux 2.6 patch cadence stabilises. Probability: 80%. Deadline: 31 December 2006. The development pace will probably slow as the kernel matures.

Structural

11. The Sony BMG aftermath produces specific regulatory or legal precedent. Probability: 70%. Deadline: 31 December 2006. The legal cases are progressing; specific outcomes will emerge.

12. Spam volume continues growing. Probability: 95%. Deadline: 31 December 2006. No structural change visible.

13. Phishing scale continues growing. Probability: 90%. Deadline: 31 December 2006. No structural change visible.

14. A specific incident produces public conversation about the commercial-software-as-malware boundary. Probability: 65%. Deadline: 31 December 2006. The Sony precedent will produce follow-on incidents.

Personal

15. Continue the consulting engagement at the Royal Botanic Garden. Probability: 85%. Deadline: 30 June 2006.

16. Possibly take on additional consulting engagements. Probability: 60%. Deadline: 31 December 2006.

17. Continue weekly cadence on the notebook. Probability: 95%. Deadline: 31 December 2006.

18. Attend at least four conferences. Probability: 80%. Deadline: 31 December 2006.

19. Speak at at least one conference. Probability: 70%. Deadline: 31 December 2006.

20. Write more genuinely difficult pieces. Probability: 55%. Deadline: 31 December 2006. I keep promising this; I keep not delivering at the level I had hoped.

A meta-prediction

21. By end of 2006, I will have five full years of prediction-scoring data. This will be enough to do meaningful meta-analysis on my own forecasting accuracy. Probability: 95% (this resolves trivially if I keep the discipline).

The meta-analysis will be the year-end-2006 post: a structured assessment of which kinds of predictions I am good at, which I am bad at, and what the patterns suggest about how to improve.

Year-end reflection

Eight years now. The cumulative archive is substantial. The discipline is firmly established. The community continues to be valuable.

The career transition to consulting represents a meaningful shift in how I engage with the field; the notebook will document the consequences as they emerge.

Thank you for reading. The conversations and corrections through 2005 have, again, been the most rewarding aspect of the work.

More in 2006. See everyone in the new year.

A small note on what this notebook has become

When I started in 1998 I described the notebook as a discipline for forcing myself to finish thoughts. Eight years later, that purpose continues to be served. The discipline is now habit.

The additional purposes the notebook has acquired — building community, contributing to public discussion, providing reference material for my own thinking — are emergent. None was planned; all are valuable.

For anyone considering starting a similar discipline: the value compounds in ways that are not obvious at the start. The first year produces some immediate benefit; the cumulative value over many years is substantial in ways that are hard to predict in advance.

The specific cadence matters less than the consistency. Weekly works for me; other operators might find biweekly or monthly more sustainable. The discipline is the thing.

Closing thoughts

The field continues to be operationally interesting. The threat side continues to mature; the defensive side continues to improve; the structural conversations continue to develop. The work matters.

For anyone in the field reading this: take care of yourselves through the year-end period. The work will continue in 2006; the structural improvements will continue; the cumulative trajectory remains positive even when individual incidents are difficult.

The notebook will continue. The work will continue. The community will continue.

New year, new notebook, on the standard cadence. Happy 2006 to everyone reading.

A more careful prediction process for 2006

Let me extend this combined predictions-and-year-end post with deeper treatment of the prediction process and its trajectory.

What the cumulative prediction record shows

I have been making explicit predictions with probabilities since the 2002 list. Four full years of prediction-and-scoring data now. The cumulative pattern shows several specific things about my forecasting:

Direction calls are reliably good. Across all years, my direction calls (this category will grow / shrink / stay flat) have been right roughly 85% of the time. The rate has been stable across years.

Magnitude calls are reasonable. When I predict that something will happen at a specific scale, I am right within an order of magnitude roughly 70% of the time. This is acceptable; better calibration would require finer-grained prediction tracking.

Timing calls are systematically optimistic. I consistently predict things will happen sooner than they do. Across multiple years, my central-estimate timing has been roughly 6-12 months too early on average.

Threat-side predictions are slightly over-confident. I have been compensating for this in recent years; the compensation is producing better calibration.

Defensive-side predictions are approximately calibrated. No systematic bias visible.

Personal predictions are slightly over-confident. I tend to commit to more than I deliver, particularly on "genuinely difficult writing" predictions.

What this calibration data is for

The cumulative scoring is not just for entertainment. It serves specific purposes:

Self-knowledge. Knowing where my own forecasting is reliable and where it is not informs how confidently I should hold specific predictions. The calibration data is useful in real decisions.

Reader trust. Publishing predictions and scoring them honestly produces, over time, a level of reader trust that confident-but-uncalibrated writing does not. The cumulative trust is more valuable than any individual correct prediction.

Field calibration. The security field generally suffers from over-confident predictions. Operators who explicitly track their forecasts and update on the data produce better outcomes than operators who rely on intuition. The discipline is undervalued; the data justifies the discipline.

How I will refine the discipline for 2006

Three refinements I am committing to:

More explicit tracking of timing predictions. I will track not just whether a prediction was right but how the actual timing compared to my central estimate. The systematic optimism should show up clearly in this tracking.

Explicit dependent predictions. Some of my predictions depend on others; the dependency structure should be made explicit. "If X happens, Y is 80% likely; if X does not happen, Y is 30% likely" is more informative than a single probability for Y.

Resolution criteria. Each prediction should have explicit criteria for what counts as resolved-affirmative or resolved-negative. The criteria reduce ambiguity in scoring.

The refinements will probably make the predictions more cumbersome to write; they will also make them more usable.
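The dependent-prediction refinement is mechanical enough to sketch. A minimal example of the law of total probability behind the "if X then Y is 80% likely" format; the 50% figure for X below is illustrative, not one of this year's predictions:

```python
def marginal_probability(p_x: float, p_y_given_x: float,
                         p_y_given_not_x: float) -> float:
    """Collapse a conditional prediction for Y into a single
    unconditional probability via the law of total probability."""
    return p_x * p_y_given_x + (1 - p_x) * p_y_given_not_x

# The worked example from the text: Y is 80% likely if X happens,
# 30% likely if it does not. With X itself judged 50% likely,
# the single-number equivalent for Y is roughly 0.55.
print(round(marginal_probability(0.5, 0.8, 0.3), 2))  # 0.55
```

The conditional form carries strictly more information: two readers who disagree about X can still agree about Y given X, which a single number for Y would hide.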

The 2006 prediction set in detail

The predictions in the earlier sections of this post are:

  • 6 threat-side predictions
  • 4 defensive-side predictions
  • 4 structural predictions
  • 6 personal predictions
  • 1 meta-prediction

Total: 21 predictions. The number is similar to recent years; the structure is similar; the calibration is informed by the cumulative data.

For anyone interested in tracking these alongside their own predictions: the resolution dates are explicit; the probabilities are explicit; the criteria for affirmative resolution are mostly explicit. The transparency is the point.
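For readers who do track these, a hypothetical record format for one prediction; the field names and the resolution-criteria wording below are my illustrative gloss, not a published schema:

```python
# Hypothetical tracking record for prediction 7 above; the
# resolution_criteria text is an illustrative paraphrase, not a quote.
prediction = {
    "id": 7,
    "claim": "Two-factor authentication ships at a major UK retail bank",
    "probability": 0.75,
    "deadline": "2006-12-31",
    "resolution_criteria": (
        "A major UK retail bank offers two-factor authentication "
        "to ordinary retail customers before the deadline"
    ),
    "resolved": None,  # set to True, False, or "partial" at year end
}

print(prediction["claim"], prediction["probability"])
```

Keeping the criteria in the record, rather than deciding them at resolution time, is what makes the year-end scoring auditable.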

A meta-observation about the discipline

The most valuable single thing the prediction discipline has produced for me, on reflection, is epistemic humility about my own confident assessments. When I look back at confident predictions that turned out wrong, the lesson is not just "that specific prediction was wrong". The lesson is "I should hold confident predictions less tightly".

This humility extends beyond predictions. It informs how I write about contested topics; how I respond to disagreements; how I update my views over time. The cumulative effect is that I am, by my own assessment, a slightly better thinker than I would have been without the discipline.

For anyone considering starting a similar discipline: the first year's predictions will probably be embarrassing in retrospect. That is the value. The embarrassment produces calibration; the calibration produces better thinking.

A note on the broader field

Let me close this combined predictions-and-year-end post with brief reflection on the broader field's trajectory.

The security field has matured substantively over the past eight years. The conversations are more sophisticated; the disciplines are more rigorous; the community is larger and more connected.

The cumulative trajectory points toward continued maturation. The coming years will probably produce further structural shifts; the specific shifts are unpredictable; the discipline of attentive practice continues to be valuable.

For my own writing: the notebook will continue at standard cadence into 2006. The themes will track the year's events; the calibration discipline will continue; the community will remain at the centre of the work.

For anyone reading this: thank you for the year. The conversations and corrections have been valuable. The discipline continues because of the readers.

New year, new notebook page open, kettle on. Happy 2006.

A more substantive closing

Let me extend this combined post with deeper reflection on what the predictions discipline has produced.

The cumulative archive

Four years of explicit predictions with explicit probabilities. Roughly 80 predictions across the years; roughly 60 resolved (rest still in progress or partial); my overall calibration is reasonable.

The specific patterns are the ones set out above: direction calls reliable at roughly 85% accuracy, magnitude calls reasonable at roughly 70% within an order of magnitude, timing systematically optimistic by 6-12 months, threat-side predictions slightly over-confident (compensated for in recent years), and defensive-side predictions approximately calibrated.

What this teaches me

Three things that I am acting on:

My instinct toward optimistic timing should be tempered. Specifically, when I have a central estimate for when something will happen, I should add 6-12 months to account for the systematic bias.

My threat-side confidence should be moderated. When I want to commit to a 75% probability on a threat-side prediction, the calibration data suggests 65% is closer to right.

My defensive-side calibration is reliable enough to trust. Predictions in this category can be made with normal confidence.
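The timing correction in the first point is simple arithmetic. A minimal sketch that shifts a gut-feel central estimate later by roughly nine months, the midpoint of the observed 6-12 month bias; the June 2006 estimate is a hypothetical example:

```python
from datetime import date, timedelta

def debias_timing(central_estimate: date, bias_days: int = 270) -> date:
    """Shift an optimistic central estimate later by the observed
    systematic bias (roughly nine months, approximated as 270 days)."""
    return central_estimate + timedelta(days=bias_days)

# A hypothetical gut-feel estimate of 1 June 2006 becomes late
# February 2007 after the correction.
print(debias_timing(date(2006, 6, 1)))  # 2007-02-26
```

The point is not the precise offset; it is that a known directional bias should be corrected mechanically rather than by feel.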

What the discipline does beyond the predictions

The practice has been valuable in ways beyond the specific predictions. The cumulative effect on my thinking has been substantial; I am, on my own assessment, slightly more rigorous in my views than I would be without it.

For anyone considering a similar discipline: the first year is mostly for learning the format. The cumulative value emerges from year two onwards.

A small commitment for 2006

I will continue the discipline. The 2006 predictions list will follow the same format; the year-end scoring will follow the same rigour; the cumulative archive will grow.

The meta-analysis I committed to for end-of-2006 will look at five years of cumulative data. The patterns should be substantively informative.

More in 2006. Happy new year again.

A small extension on resolution criteria

For anyone who follows the prediction discipline closely, a brief note on resolution criteria.

Where the criteria for affirmative resolution of a 2006 prediction are not already explicit above, I will make them explicit at resolution time. For predictions that resolve in clear ways (specific products ship, specific worms appear, specific incidents occur), the criteria are usually obvious.

For predictions that resolve in less clear ways (specific trends grow, specific structural shifts happen, specific patterns continue), the criteria require judgement. I will document the judgements at resolution time.

The transparency is the point. Readers should be able to assess my calibration based on the cumulative archive; the documentation makes that assessment possible.

For any reader who finds specific resolution judgements unconvincing: the disagreement is itself useful. Calibration is a public discipline; readers' pushback informs the calibration.

A closing reflection on prediction

The four-year cumulative archive of explicit predictions is, by my assessment, the most valuable single contribution the notebook has made to its own integrity. The cumulative scoring forces honesty; the public record sustains the discipline; the meta-analysis produces self-knowledge.

For anyone considering similar discipline: the cost is bounded; the benefit compounds; the cumulative effect is meaningful.


