There is a vendor pitch I keep seeing on mailing lists, in trade press, in a few sales conversations I have had — that "vulnerability assessment" and "penetration testing" are essentially the same activity, varying only in price. They are not. The conflation is, in my view, doing real harm to how organisations think about security testing.
I want to write down the difference, because I keep having to explain it.
What vulnerability assessment actually is
Vulnerability assessment is discovery. The aim is to enumerate, as completely as possible, the weaknesses in a system or network. The methodology is automated, broad, and shallow.
A typical vulnerability assessment runs a tool — ISS Internet Scanner is the commercial standard right now, and the open source Nessus is rapidly catching up — against the target. The tool checks every host for every known vulnerability in its database. It produces a report listing what was found. Done.
The value is the breadth. A scanner with 1000 checks tests for all 1000 in minutes; a human tester would take weeks. The cost is depth. Each check is shallow: "is the BIND version below 8.2.2-P5?" The tool does not exploit the vulnerability. It does not assess whether it is reachable. It does not consider how separate vulnerabilities might chain.
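To make "shallow" concrete: a single scanner check is often little more than a banner grab and a string comparison. A toy sketch of one such check (the parsing is simplified and the banner is invented; real BIND version strings, and real scanners, are messier):

```python
# Toy version of a single scanner check: compare a version banner
# against a known-fixed threshold. Banner values are invented for
# illustration; real checks are more careful than this.

def parse_bind_version(banner):
    """Turn a string like '8.2.2-P5' into a comparable tuple (8, 2, 2, 5)."""
    version, _, patch = banner.partition("-P")
    parts = [int(p) for p in version.split(".")]
    parts.append(int(patch) if patch else 0)
    return tuple(parts)

def check_bind(banner, fixed_in="8.2.2-P5"):
    """Report vulnerable if the banner is older than the fixed version.

    Note everything this does NOT do: it exploits nothing, checks no
    reachability, and sees no surrounding mitigations.
    """
    return parse_bind_version(banner) < parse_bind_version(fixed_in)

print(check_bind("8.2.1"))      # older than the fix → True
print(check_bind("8.2.2-P5"))   # at the fixed version → False
```

Multiply that by a thousand checks and you have a scanner: enormous breadth, and each individual test about this deep.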
What penetration testing actually is
Penetration testing is demonstration. The aim is to prove, with a small number of concrete examples, that the target can be compromised. The methodology is human, narrow, and deep.
A typical penetration test starts with a vulnerability assessment as input, but then a human goes much further. They take a small number of identified weaknesses, develop or adapt exploits for them, and use those to gain access. From the initial foothold, they look for ways to escalate, to pivot, to reach the actual targets the engagement is about.
The value is depth. A pen test produces a small number of specific findings: "by exploiting bug X in service Y on host Z, I gained shell access; from there I obtained the contents of file W". Each finding is a complete, demonstrable attack chain. The cost is breadth — the human cannot test everything.
The gap, and why it matters
A vulnerability scan that finds 200 weaknesses is, in some technical sense, more comprehensive than a pen test that finds three. The scan covered everything; the pen test covered three.
But three demonstrated attack chains tell the operator something the 200 weaknesses do not: which weaknesses combine into actual compromise. The 200-weakness list includes some critical ones, some moderate, and some that are technically vulnerable but not exploitable in this environment. The list does not say which is which.
The gap is not something the scanner can close. The scanner reports facts; the pen tester provides interpretation. The interpretation is the value. Without it, the scan is a list of work to do, not a description of risk.
A scanner cannot, for instance:
- Know which vulnerabilities are exploitable in this network. A bug requires certain pre-conditions (network reachability, specific authentication state, particular OS configuration) that the scanner does not always check. A pen tester checks.
- Know which vulnerabilities chain. Bug A on host X, combined with bug B on host Y, is a complete attack. Either bug alone is a moderate finding. The combination is severe. The scanner reports them separately.
- Know what a successful exploit lets the attacker do. "Buffer overflow allows arbitrary code execution" sounds bad in the abstract. Whether it allows root or unprivileged shell, whether it gives access to interesting data, whether the host is a stepping stone to other valuable hosts — these are the questions that determine actual risk. The scanner does not know.
- Know what a defender already mitigated. If chroot is in use and the daemon runs as a non-root user, a buffer overflow that the scanner reports as critical may, in this environment, be moderate. The scanner sees the code is vulnerable; it does not see the surrounding mitigations.
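Another way to put it: severity is a function of both the bug and its environment, and the environment half is what the human supplies. A toy sketch of that adjustment (the levels and rules are invented for illustration, not a real scoring system):

```python
# Toy model of pen-tester judgement: start from the scanner's raw
# severity and adjust for the environment. Rules invented for
# illustration only.

def adjusted_severity(raw, runs_as_root=True, chrooted=False, reachable=True):
    levels = ["low", "moderate", "high", "critical"]
    score = levels.index(raw)
    if not reachable:
        return "low"        # an unreachable bug is a paper finding
    if chrooted:
        score -= 1          # containment limits what an exploit gains
    if not runs_as_root:
        score -= 1          # unprivileged shell, not root
    return levels[max(score, 0)]

# The scanner's "critical" buffer overflow, in a hardened environment:
print(adjusted_severity("critical", runs_as_root=False, chrooted=True))
# → moderate
```

The scanner emits the `raw` value and stops. Everything after the first line of that function is the interpretation.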
When each is appropriate
A few rough heuristics for which to use when.
Vulnerability assessment is right when:
- You need to understand the baseline of your weaknesses across many hosts.
- You want to track improvement over time — "are we better than last month?"
- You want to identify obvious problems quickly and cheaply.
- You want to feed a vulnerability management process that prioritises by severity and remediates in batches.
- The cost of a thorough investigation is too high relative to the budget.
Penetration testing is right when:
- You need to know whether a specific attacker — say, an external actor with no inside knowledge — can compromise a specific asset.
- You want a demonstration of risk for a sceptical audience (executives, auditors).
- You suspect there are issues a scanner cannot find — logic bugs in custom applications, subtle privilege chains.
- You need to test your detection at the same time as your defences. A pen test tells you whether you would have noticed.
- You are stress-testing a specific defensive change.
Both are needed when:
- You are running a serious security programme. They are complements, not alternatives. The scan tells you the volume; the pen test tells you the depth.
The third thing: continuous monitoring
There is a third activity that is sometimes lumped in with these and is, I think, importantly different. Continuous monitoring — running intrusion detection, log analysis, traffic graphing — is what tells you whether actual attacks are happening, distinct from whether you are vulnerable to them.
A host can be vulnerable but unattacked. A host can be attacked despite being patched. A host can be compromised without anyone exploiting any of the vulnerabilities a scan would find — the compromise might come from a successful phishing attempt or a credential reuse incident, neither of which a scan would detect.
The security programme that has all three — scanning, pen testing, monitoring — is materially more robust than the one that has any one or two. The programmes I have seen fail most quickly are the ones that do scanning only, because they conflate "vulnerable" with "insecure" and miss everything that is actually happening.
What this looks like for a small operator
For my own scale, what I do is approximately:
- Monthly self-assessment with nmap and a small Nessus run. Cheap, fast, identifies obvious problems.
- Quarterly self-pen test where I sit down with a fresh head and try to break my own infrastructure. I have no formal training in this. I am clearly missing things a professional would find. The exercise is still valuable.
- Continuous monitoring with Snort, the log scanner, and MRTG.
- Annual external pen test by someone I trust, when I can afford it. The output of this is the most valuable single document I produce in a year.
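The log-scanning piece of that routine can be a few lines of script. A toy sketch, counting repeated authentication failures per source (the log format here is invented; a real scanner parses whatever your syslog actually emits):

```python
# Toy log scanner: flag sources with repeated auth failures.
# Sample lines are invented for illustration.
from collections import Counter

SAMPLE_LOG = """\
sshd[811]: Failed password for root from 10.0.0.7
sshd[811]: Failed password for root from 10.0.0.7
sshd[812]: Accepted password for alice from 192.168.1.4
sshd[813]: Failed password for admin from 10.0.0.7
sshd[814]: Failed password for bob from 192.168.1.9
"""

def suspicious_sources(log_text, threshold=3):
    failures = Counter()
    for line in log_text.splitlines():
        if "Failed password" in line:
            source = line.rsplit("from ", 1)[1]
            failures[source] += 1
    return [src for src, count in failures.items() if count >= threshold]

print(suspicious_sources(SAMPLE_LOG))   # → ['10.0.0.7']
```

Note that this catches the password-guessing attacker regardless of whether any scan would have flagged the host as vulnerable — which is exactly the point of monitoring as a separate activity.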
For anyone running a real organisation, the model would scale up — quarterly external testing, daily scanning, continuous monitoring run by a real team. The principles are the same.
The vendor pitch that conflates the three categories is, in my view, either ignorance or malice. The categories are different and any organisation that thinks of them as the same is going to make worse decisions about which to invest in. Worth pushing back on, when you hear it.