What pen testing now actually buys you

AI-assisted offensive tooling, cloud-native estates, supply-chain-shaped scope: what pen testing in 2025 actually looks like, and what boards are still mis-reading in the deliverable.

Penetration testing is one of those services where "what we deliver" and "what the customer pays for" have drifted apart over the years, and 2025 is the year that drift became visibly uncomfortable. Boards are, in many cases, still buying the 2015 product. The firms doing the work are running the 2025 version. The deliverable in the middle has not always caught up. This post is for the board director or audit committee chair who reads pen test reports and is starting to suspect that they are not quite getting what they used to.

What changed in 2025

Three things, all converging.

Tooling has moved. AI-assisted offensive tooling has gone from interesting demo to routine practitioner workflow in roughly eighteen months. The good firms are not asking AI to do the test for them — they are using it to compress reconnaissance, to enumerate vulnerabilities across larger surfaces, and to produce candidate exploit chains that a human practitioner then evaluates. The work is faster and broader. It is not yet noticeably better at depth. That is a different problem and the firms claiming otherwise are overselling.

Estates have moved. The customer estate that was a network of servers and a couple of cloud accounts in 2018 is now, for most customers, a sprawl of SaaS tenancies, identity-provider configurations, container registries, CI/CD pipelines, third-party API integrations, and the rump of legacy on-premises infrastructure. The scope of a pen test in 2025 is not the perimeter the customer thinks they have. It is the perimeter the customer actually has, which is several layers larger than the IT inventory.

Threat models have moved. The threat actor who in 2018 might have phished a user and pivoted into the network is now likelier to compromise a developer's npm package, exfiltrate via a CI/CD token, or live off the land in an identity-provider configuration. The tests that catch the 2018 actor will miss the 2025 one. The tests that catch the 2025 actor look quite different on paper, and several customers find them harder to recognise as pen testing at all.

The deliverable problem

Here is the deliverable that boards have been used to reading. A scoped technical engagement, two to three weeks of testing, a report with an executive summary, a list of findings rated by CVSS, and a remediation table. Most pen test reports still look broadly like that. Most pen test reports still need to. The trouble is that the most consequential findings of a 2025 engagement are not the ones that fit neatly into the CVSS-rated finding list.

The most consequential findings often look like this: the firm has no inventory of its third-party identity-provider integrations. Or: the firm's container build pipeline has a secret rotation period that exceeds the lifetime of the project for which the secret is in use. Or: the firm cannot tell which of its public-facing assets are owned by it and which are owned by an acquired subsidiary that has not been fully integrated.
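For readers who want to see how mechanical the second of those checks can be, here is a minimal sketch. It assumes someone has already exported the pipeline's secrets with a rotation period and project dates; the class name, field names, and sample secrets are all hypothetical and stand in for whatever the secrets manager or CI/CD platform actually provides.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class PipelineSecret:
    # Hypothetical record shape; a real inventory would be exported from
    # the secrets manager or CI/CD platform in use.
    name: str
    rotation_period: timedelta
    project_start: date
    project_end: date


def outlives_project(secret: PipelineSecret) -> bool:
    """Return True when the rotation period is longer than the project lifetime,
    meaning the secret would never be rotated while it is in use."""
    project_lifetime = secret.project_end - secret.project_start
    return secret.rotation_period > project_lifetime


# Illustrative data only, not taken from any real engagement.
secrets = [
    PipelineSecret("registry-push-token", timedelta(days=365),
                   date(2025, 1, 6), date(2025, 6, 30)),
    PipelineSecret("deploy-key", timedelta(days=30),
                   date(2025, 1, 6), date(2025, 6, 30)),
]

flagged = [s.name for s in secrets if outlives_project(s)]
print(flagged)
```

The check itself is trivial. The hard part, and the real finding, is whether the firm can produce the inventory that feeds it.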

These are not vulnerabilities in the traditional sense. There is no CVE. There is no patch. They are structural findings that, if exploited, would explain the loss of an entire system that, on paper, was healthy. Board readers who have been trained to look at the CVSS heatmap will skim past them. Practitioners writing the reports know this and have a quiet vested interest in writing in a way that the board reader will not skim past.

The deliverable problem in 2025 is this: the parts of the report that matter most are the parts least optimised for the audience reading it.

What a board reader should look for in 2025

Three things, beyond the heatmap.

An attack narrative, in plain English, that a non-technical reader can follow. The good firms now write a one- to two-page attack walk-through at the front of every report, describing how a realistic attacker would have moved through the estate in scope, what they would have reached, and what would have stopped them. If your report does not have this, ask for it. If the firm cannot produce it, that is a meaningful signal.

Structural findings clearly distinguished from technical findings. A finding that says the firm lacks an inventory of public-facing SaaS tenancies is qualitatively different from a finding that says a particular web application has a missing security header. The good firms now report them in separate sections. The point of doing so is to make sure the structural findings do not get triaged in the same week as the technical ones and quietly closed because they are harder.
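As a rough illustration of that separation, the sketch below splits a findings list into two report sections. Everything in it is an assumption made for the example: the `Finding` shape, the `Kind` labels, and the sample CVSS score are hypothetical, not any firm's actual report schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Kind(Enum):
    STRUCTURAL = "structural"
    TECHNICAL = "technical"


@dataclass(frozen=True)
class Finding:
    # Hypothetical record shape for illustration.
    title: str
    kind: Kind
    cvss: Optional[float] = None  # structural findings often have no score


def split_report(findings: list[Finding]) -> dict[Kind, list[Finding]]:
    """Group findings into separate sections so the two kinds are never
    triaged from one mixed list."""
    sections: dict[Kind, list[Finding]] = {kind: [] for kind in Kind}
    for finding in findings:
        sections[finding.kind].append(finding)
    # Only the technical section is ordered by CVSS, highest first.
    sections[Kind.TECHNICAL].sort(key=lambda f: f.cvss or 0.0, reverse=True)
    return sections


# Illustrative findings; the score is made up for the example.
findings = [
    Finding("Web application missing a security header", Kind.TECHNICAL, cvss=3.1),
    Finding("No inventory of public-facing SaaS tenancies", Kind.STRUCTURAL),
]

sections = split_report(findings)
for kind in (Kind.STRUCTURAL, Kind.TECHNICAL):
    print(kind.value, [f.title for f in sections[kind]])
```

Leaving the structural section unscored is deliberate in this sketch: forcing a CVSS number onto a missing inventory would put it back on the heatmap, where it gets skimmed.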

Evidence that the test was, in fact, performed by humans. This is now a real question, in a way that it was not eighteen months ago. The good firms will tell you which parts of the engagement were AI-assisted, what the practitioner judgement was on each AI-generated finding, and which findings were the result of human exploration that the AI tooling did not surface. A report that cannot tell you this is, increasingly, a report you cannot fully trust.

What boards are mis-reading

Three patterns I see consistently.

One: the heatmap fallacy. A heatmap with mostly green and a few yellows is being read as evidence of a healthy posture. In 2025 it is more often evidence that the scope was too narrow. A report with no high-severity findings against a large modern estate is a report whose scoping conversation is worth re-reading.

Two: the unfound-equals-secure fallacy. "We had a pen test and nothing was found" is not the same as "we are secure". Pen tests are time-boxed and scope-limited. They tell you what was findable by a competent team in two to three weeks against an agreed scope. They do not tell you what would be findable by a determined attacker over six months with broader scope. The good firms will say this on page one. The cheap firms will not.

Three: the once-a-year fallacy. An annual pen test made sense when the estate changed slowly and the threat model changed more slowly still. Modern estates change weekly. Modern threats change daily. The good firms now sell some combination of focused annual deep-dive testing, continuous discovery and validation, and event-triggered testing around significant change. Customers who are still buying one big test a year are buying a product that has aged badly.

What I would like to see in 2026

Three changes I would push for.

Report formats that put structural findings on page one. The heatmap should be on page three, after the attack narrative and the structural risks. The reordering would, by itself, change a great deal of subsequent conversation.

Customer maturity tracking. The good firms have always known which of their customers were getting better year-on-year. They have not always said so to the customer. They should. A pen test report that, for a returning customer, opens with "here is what changed since last year, and here is what did not" would be more useful than most.

Methodology evidence. I would like the report to include, as an appendix, an honest statement of methodology and tooling — including any AI-assisted components — so that the customer can ask informed questions. The CREST methodology framework already supports this. Most firms do not yet take advantage of it. They should.

One sentence

A pen test in 2025 is no longer a verdict on your security; it is a structured conversation about what your security is currently optimised for, and what it has stopped seeing. Read it accordingly.