Hardened Linux: the boring middleware nobody writes about

Most of my career has involved Linux estates that started life hardened and then quietly accumulated entropy. Every estate is hardened on day one, because someone bothered. By month six it is mostly hardened. By year two, it is hardened-ish, and somebody is writing a remediation programme to bring it back. The pattern is depressingly consistent, and it is not because anyone was lazy — it is because the hardening was treated as configuration rather than as code.

Hardening is a property of the pipeline, not the host

The single most important shift in how I think about Linux hardening is this: hardening is not a thing you do to a host. It is a thing that emerges from a pipeline. The pipeline builds images and applies configuration in such a way that a host cannot leave the hardened state without that fact being visible. Hosts are cattle. The hardening rides on the cattle, and a cow that turns up without it is sent back.

Concretely, that means image build pipelines (Packer, KIWI, Image Builder, whatever you prefer) that bake the hardening in, and configuration management (Ansible, Puppet, Chef, Salt) that maintains it. The hardening recipes are version-controlled. Drift detection runs continuously. A host that has drifted is either re-imaged or has its drift explained, in writing, in the same repo.
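
A minimal sketch of that drift rule, assuming the canonical state can be flattened into key/value pairs and that documented exceptions live alongside it in the repo. The setting names and the functions find_drift and unexplained_drift are hypothetical, chosen for illustration rather than taken from any particular tool.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical canonical state; in practice this would be generated from
# the version-controlled hardening recipes.
BASELINE: Dict[str, str] = {
    "sshd.PasswordAuthentication": "no",
    "sysctl.kernel.kptr_restrict": "2",
    "mac.mode": "enforcing",
}

Drift = Dict[str, Tuple[str, Optional[str]]]


def find_drift(baseline: Dict[str, str], observed: Dict[str, str]) -> Drift:
    """Return {setting: (expected, actual)} for every setting that differs.

    A setting that is missing on the host is reported with actual=None.
    """
    drift: Drift = {}
    for key, expected in baseline.items():
        actual = observed.get(key)
        if actual != expected:
            drift[key] = (expected, actual)
    return drift


def unexplained_drift(drift: Drift, documented: Set[str]) -> Drift:
    """Drop drift that has a written explanation in the repo.

    Whatever remains is the set of findings that should trigger a re-image.
    """
    return {k: v for k, v in drift.items() if k not in documented}
```

The useful property is that the second function takes its exceptions as data: an explanation that is not in the repo does not exist as far as the check is concerned.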

If you cannot say with confidence, on any given Tuesday, what the canonical hardened state of your estate looks like — you do not have a hardened estate. You have a hardening intent.

What to actually harden

Reasonable people disagree on the exact recipe, and the recipe should match the threat profile of the estate. The categories I always cover, in roughly this order:

Boot and kernel. Verified boot where the platform allows it. Sensible sysctl baseline (network hardening, ASLR, kptr_restrict, ptrace_scope, BPF restrictions). Minimal kernel modules; module signature verification on. Where the workload permits it, lock down loading further by setting kernel.modules_disabled=1 once the modules needed at boot have loaded; the setting cannot be reverted without a reboot.
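
As an illustration of checking a sysctl baseline, here is a small sketch that compares expected values against /proc/sys. The sysctl names are real kernel settings, but the values are assumptions to be tuned per estate, and check_sysctls and read_sysctl are hypothetical helper names.

```python
from pathlib import Path
from typing import Callable, Dict, List, Optional

# Illustrative baseline. The values are assumptions, not a recommendation
# for every workload.
SYSCTL_BASELINE: Dict[str, str] = {
    "kernel.randomize_va_space": "2",         # full ASLR
    "kernel.kptr_restrict": "2",              # hide kernel pointers
    "kernel.yama.ptrace_scope": "1",          # restrict ptrace to descendants
    "kernel.unprivileged_bpf_disabled": "1",  # no BPF for unprivileged users
    "net.ipv4.conf.all.rp_filter": "1",       # strict reverse-path filtering
}


def read_sysctl(name: str, root: str = "/proc/sys") -> Optional[str]:
    """Read one sysctl from procfs; return None if it is absent or unreadable.

    The simple dot-to-slash mapping does not handle interface names that
    themselves contain dots.
    """
    path = Path(root) / name.replace(".", "/")
    try:
        return path.read_text().strip()
    except OSError:
        return None


def check_sysctls(
    baseline: Dict[str, str],
    reader: Callable[[str], Optional[str]] = read_sysctl,
) -> List[str]:
    """Return one human-readable failure per sysctl that is off baseline."""
    failures: List[str] = []
    for name, expected in baseline.items():
        actual = reader(name)
        if actual != expected:
            failures.append(f"{name}: expected {expected}, found {actual}")
    return failures
```

Passing the reader in as a parameter keeps the comparison testable in the image build without touching the build host's own kernel settings.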

Mandatory access control. AppArmor or SELinux on, in enforcing mode, with profiles that are tight enough to be meaningful. Permissive everywhere is not hardening. The work of getting profiles right is non-trivial; the value of being able to point to enforcing profiles in incident response is enormous.
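
One way to make "enforcing, not permissive" checkable, sketched for AppArmor. It assumes the profile listing has the "name (mode)" line format exposed under /sys/kernel/security/apparmor/profiles; the function names are hypothetical, and treating every mode other than "enforce" as a finding is a deliberate simplification.

```python
from typing import Dict, List


def parse_apparmor_profiles(text: str) -> Dict[str, str]:
    """Parse lines such as '/usr/sbin/sshd (enforce)' into {profile: mode}."""
    profiles: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.endswith(")") or " (" not in line:
            continue  # skip blank or unexpected lines
        name, _, mode = line.rpartition(" (")
        profiles[name] = mode[:-1]  # drop the trailing ')'
    return profiles


def non_enforcing(profiles: Dict[str, str]) -> List[str]:
    """Return the profiles that are loaded but not in enforce mode."""
    return sorted(name for name, mode in profiles.items() if mode != "enforce")
```

On an SELinux host the equivalent check is simpler: /sys/fs/selinux/enforce reads 1 when the system is enforcing.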

Userland minimisation. No package present that is not actually used. SSH config narrowed to key-based authentication, with a sensible cipher and KEX policy, and ideally with bastion-only access from a known administrative range. No interactive shells for service accounts.
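
The SSH part of this lends itself to an automated check. The sketch below assumes its input is text in the "keyword value" form that sshd -T prints for the effective configuration; the policy dictionary is an illustrative assumption (the permitrootlogin entry in particular is my addition), and the function names are hypothetical.

```python
from typing import Dict, List

# Illustrative policy. Keys are lower-case, as in `sshd -T` output.
SSH_POLICY: Dict[str, str] = {
    "passwordauthentication": "no",
    "pubkeyauthentication": "yes",
    "permitrootlogin": "no",  # assumption, not taken from the text above
}


def parse_effective_config(text: str) -> Dict[str, str]:
    """Parse 'keyword value' lines; the first occurrence of a keyword wins."""
    settings: Dict[str, str] = {}
    for line in text.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) == 2:
            settings.setdefault(parts[0].lower(), parts[1].strip())
    return settings


def ssh_violations(
    settings: Dict[str, str], policy: Dict[str, str] = SSH_POLICY
) -> List[str]:
    """Return one message per policy keyword that is missing or wrong."""
    violations: List[str] = []
    for key, expected in policy.items():
        actual = settings.get(key)
        if actual != expected:
            violations.append(f"{key}: expected {expected}, found {actual}")
    return violations
```

Checking the effective configuration rather than the file on disk matters, because include directories and Match blocks can quietly override what sshd_config appears to say.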

File systems. Read-only root where possible. nodev, nosuid, noexec on /tmp, /var/tmp, /home where the workload allows. Audit rules that capture the events forensics will actually want — process execution, privileged commands, file integrity for critical paths.
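
The mount-option requirement can be verified from /proc/mounts. This is a sketch under the assumption that each of the three paths is its own mount point; the REQUIRED table and the function names are hypothetical, and a workload that needs exec on /home would simply carry a different table.

```python
from typing import Dict, Set

# Illustrative requirements; relax per mount point where the workload demands.
REQUIRED: Dict[str, Set[str]] = {
    "/tmp": {"nodev", "nosuid", "noexec"},
    "/var/tmp": {"nodev", "nosuid", "noexec"},
    "/home": {"nodev", "nosuid", "noexec"},
}


def parse_mounts(text: str) -> Dict[str, Set[str]]:
    """Parse /proc/mounts-style text into {mount_point: set_of_options}.

    If a mount point appears more than once, the later entry wins.
    """
    mounts: Dict[str, Set[str]] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 4:
            mounts[fields[1]] = set(fields[3].split(","))
    return mounts


def missing_options(
    mounts: Dict[str, Set[str]], required: Dict[str, Set[str]] = REQUIRED
) -> Dict[str, Set[str]]:
    """Return {mount_point: missing_options}.

    A path that is not a separate mount is reported with every option missing.
    """
    problems: Dict[str, Set[str]] = {}
    for mount_point, needed in required.items():
        missing = needed - mounts.get(mount_point, set())
        if missing:
            problems[mount_point] = missing
    return problems
```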

Secrets and identity. No secrets baked into images. A secrets manager. Short-lived workload identities. Rotation that is automated and tested.

Network policy. Egress filtering, not just ingress. Most modern compromise scenarios depend on outbound connectivity to a controller, and outbound filtering is one of the cheapest, highest-leverage interventions you can make on a hardened estate. It is also one of the most often skipped, because it is fiddly to operate without breaking workloads.
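
Much of the fiddliness comes from not knowing what a workload legitimately talks to. A sketch of evaluating observed outbound connections against a candidate allowlist before enforcing it follows; the networks are documentation ranges, and EGRESS_ALLOW and egress_allowed are hypothetical names.

```python
import ipaddress
from typing import List, Tuple, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]

# Hypothetical allowlist of (destination network, destination port) pairs.
EGRESS_ALLOW: List[Tuple[Network, int]] = [
    (ipaddress.ip_network("192.0.2.0/24"), 443),      # e.g. package mirror
    (ipaddress.ip_network("198.51.100.10/32"), 514),  # e.g. log collector
]


def egress_allowed(
    dest_ip: str, dest_port: int, allow: List[Tuple[Network, int]] = EGRESS_ALLOW
) -> bool:
    """Return True if the destination matches any allowlist entry.

    Anything that matches no entry is denied by default.
    """
    addr = ipaddress.ip_address(dest_ip)
    return any(addr in net and dest_port == port for net, port in allow)
```

Running a function like this over a week of flow logs gives a list of connections that the policy would have dropped, which is a far cheaper way to find the breakage than enforcing first.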

Logging and audit. auditd configured with a rule set that survives review. Logs shipped off-host, ideally to two destinations, with the host having no rights to delete them. Log retention that meets the regulatory and forensic horizons of the organisation.

The unglamorous reality of operating hardened estates

Operating a hardened estate is significantly more work than operating an unhardened one. Workloads break in odd ways. Developers complain. Engineers spend afternoons debugging seccomp profiles. The temptation, in the face of that friction, is to relax the hardening — and once relaxed, it tends not to come back.

The cultural intervention that works, in my experience, is to make the enforcing state the default and permissive the exception, with permissive states logged, time-bounded, and reviewed in the same way that production access exceptions are reviewed. "Yes, you can run permissive on this profile for a week while you debug, and the exception will expire automatically and we will revisit." That is materially different from "yes, we will turn enforcing off and circle back later", which means never.
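
The automatic expiry is the part worth encoding. A minimal sketch, assuming exceptions are recorded as data and that configuration management asks for the desired mode on each run; PermissiveException and desired_mode are hypothetical names, and the seven-day default mirrors the "week" in the example above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable


@dataclass(frozen=True)
class PermissiveException:
    """A logged, time-bounded permission to run one profile in permissive mode."""

    host: str
    profile: str
    reason: str
    granted_at: datetime
    ttl: timedelta = timedelta(days=7)

    @property
    def expires_at(self) -> datetime:
        return self.granted_at + self.ttl

    def is_active(self, now: datetime) -> bool:
        return now < self.expires_at


def desired_mode(
    host: str,
    profile: str,
    exceptions: Iterable[PermissiveException],
    now: datetime,
) -> str:
    """Enforcing is the default; permissive only while an exception is live."""
    for exc in exceptions:
        if exc.host == host and exc.profile == profile and exc.is_active(now):
            return "permissive"
    return "enforcing"
```

Nobody has to remember to turn enforcing back on: once the exception lapses, the next configuration run gets "enforcing" again.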

Drift is the long-run enemy

I have come to think of drift as the dominant failure mode in long-lived estates. It is not catastrophic in any single moment; it is slow, plausible, and eventually fatal. The defences against drift are unsexy: continuous configuration enforcement, drift dashboards that someone actually looks at, image rebuild cadences that are short enough that hosts age out before they decay, and audit-driven validations that fail the build when the configuration has slipped.

If you only get one of those four right, get the rebuild cadence. A hardened estate that recycles its hosts every thirty days through a known pipeline is harder to detune than one that does not, because the temporary relaxations decay back to baseline automatically. Pets versus cattle turns out to be a hardening principle as much as a scaling one.
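
Enforcing the cadence can be as simple as a scheduled job that lists hosts past their age limit. A sketch, assuming an inventory that records when each host was provisioned; the inventory shape and the name hosts_due_for_rebuild are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List

# The thirty-day cadence from the text; shorten it if the estate can bear it.
MAX_HOST_AGE = timedelta(days=30)


def hosts_due_for_rebuild(
    provisioned_at: Dict[str, datetime],
    now: datetime,
    max_age: timedelta = MAX_HOST_AGE,
) -> List[str]:
    """Return the hosts whose age has reached max_age, sorted by name."""
    return sorted(
        host
        for host, created in provisioned_at.items()
        if now - created >= max_age
    )
```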

The defender's view

The reason this all matters, ultimately, is the defender's view of the estate. When an incident occurs, the defender needs to know, with confidence: what was the canonical state of this host before the incident? Without that knowledge, every odd thing on the host is suspicious, and the cost of triage explodes. With it, anything not on the canonical baseline is a strong signal, and the work of containment becomes tractable.