goto fail · Peter Bassill

Apple shipped iOS 7.0.6 on Friday with the fix for CVE-2014-1266, which has been called "goto fail" in the half-dozen technical analyses that have appeared over the weekend. The fix is one line: a duplicated goto fail; statement that, in C-language semantics, makes the second goto unconditionally jump to the cleanup branch and bypass the actual signature verification in the TLS handshake. The bug means that, for the past eighteen months, iOS and OS X TLS connections have not been properly validating server certificates. Anyone on the same network as an iOS or Mac user could have presented an arbitrary certificate for any domain and had it accepted. The implications for any iOS or Mac user who has used coffee-shop Wi-Fi, hotel Wi-Fi, or any network they did not control over the past eighteen months are direct. The exact phrasing of the implication is left as an exercise for the reader.

Adam Langley's analysis at ImperialViolet on Saturday is the canonical technical writeup. The bug is, as Langley says, "as embarrassing as it gets". The duplicated goto fail; is on lines 631-632 of sslKeyExchange.c in the open-source libsecurity_ssl codebase. The first goto is conditioned on a check that may or may not fail; the second goto is unconditional and skips the actual signature-verification step. A proper fix would have been the equivalent of the first goto without the second; what shipped in production was both gotos with the second one effectively neutralising the security check that follows. Whether this was a copy-paste error, a merge artefact, an editor accident, or — and the conspiracy-minded reading is hard to dismiss in the post-BULLRUN environment — something more deliberate, is not something the public source code can answer. What is clear is that the bug has been in shipping iOS for eighteen months and that nobody at Apple's TLS-validation testing caught it.

The latter part is the part that bothers me most. iOS is a flagship product running on hundreds of millions of devices. The TLS implementation is one of the most security-critical components of the operating system. The "have we fuzz-tested certificate validation" question is one that I would have assumed Apple's QA infrastructure addressed routinely and rigorously. Eighteen months of shipping a TLS implementation that does not actually validate certificates is, on the most charitable interpretation, a substantial QA failure. The less charitable interpretations are worse. I do not have the ability to determine, from outside, which interpretation is correct. Bruce Schneier's piece on the bug is the right summary of the structural argument that goto fail represents.

For the engagement work, the post-goto-fail conversation has been operationally familiar from the post-Heartbleed-shaped vulnerability response patterns of earlier years. The clients with substantial iOS deployment — most of them, given that BYOD-with-iOS has been the norm at the secondment clients for some time — need to push the iOS 7.0.6 update through their MDM infrastructure on an aggressive timeline. The clients with substantial Mac deployment need to push 10.9.2 (which shipped this morning) through their patch-management infrastructure. The clients with users who travel — which is most of them — should assume that any TLS-protected operation those users have performed over networks they did not control may have been intercepted, and should consider whether credential-rotation is appropriate. The honest answer for most clients is that the practical risk of someone having actively exploited goto fail against them specifically is low, but the structural risk of having shipped iOS-and-OS X for eighteen months in this state is high enough that the credential-rotation conversation is worth having.

For the Hedgehog SOC, the detection-content question for goto fail is operationally bounded — we cannot, from the network observation point we have, detect retrospectively whether someone-on-the-network was exploiting the certificate-validation bypass against any particular client iOS user. What we can detect is the post-exploitation patterns where a successful MITM attack would produce subsequent anomalous behaviour at the application layer; those are already in the detection content from the APT-shape detection work last year. Heartbleed-style detection at the network level is not applicable to this class of vulnerability.

The wider lesson — which I have been writing in various forms since BULLRUN — is that the trust-the-platform default in 2014 is the position you have to actively justify, not the position you can take by default. iOS, OS X, Windows, Linux, OpenSSL — all have shipped substantial vulnerabilities that fundamentally undermine the security guarantees the platforms have been claiming. The defensive engineering response is the same response: assume the trust chain is contested at every layer, design for compromise of any single layer, and verify the layers above. Most of my clients are not yet operating on this basis at full depth; goto fail is the latest of the "most of my clients" lessons, and it is going into the engagement-team material this week.

The next post is probably the Optic Nerve material that the Guardian have been trailing for the past week — there is, on the public reporting, a substantial GCHQ-Yahoo-webcam disclosure coming soon — or the Mt. Gox bankruptcy filing that is rumoured for next week. The pace of significant disclosures in early 2014 has not been slower than 2013.