Cloudbleed · Peter Bassill

Cloudflare published the post-mortem yesterday on what Tavis Ormandy has been calling Cloudbleed (cloudflare.com blog post by John Graham-Cumming, February 23, and Ormandy's Project Zero entry). A bug in the company's HTML-parsing code, used to rewrite outbound responses for several Cloudflare features (email obfuscation, server-side excludes, and automatic HTTPS rewrites), produced a buffer overrun that occasionally caused the response sent to the requesting client to include uninitialised memory from the reverse-proxy worker process. That memory could contain content from prior requests through the same worker, including, on the disclosed analysis, request bodies, response headers, cookies, authentication tokens, and arbitrary fragments of content from other Cloudflare customers' sites.
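
For readers who want the mechanism rather than the summary, here is a toy model of this class of bug. The post-mortem attributes the overrun to an end-of-buffer test that used equality, so a cursor that stepped over the end was never stopped. The sketch below is not Cloudflare's parser: the function name, the two-byte step on `<`, and the memory layout are all invented for illustration.

```python
def copy_until_end(memory: bytes, start: int, end: int, safe_check: bool) -> bytes:
    """Copy bytes from memory[start:end], advancing a cursor as a parser would.

    Hypothetical rule: a '<' byte is consumed as a two-byte token, so the
    cursor can jump from end - 1 straight to end + 1.
    """
    out = bytearray()
    p = start
    while p < len(memory):  # stand-in for the edge of the process's memory
        # The buggy variant only stops when the cursor lands exactly on `end`.
        at_end = (p >= end) if safe_check else (p == end)
        if at_end:
            break
        out.append(memory[p])
        p += 2 if memory[p] == ord("<") else 1
    return bytes(out)


# One customer's page sits directly before another request's data in memory.
page = b"hello<"
neighbour = b"SECRET-COOKIE"
memory = page + neighbour

print(copy_until_end(memory, 0, len(page), safe_check=True))
print(copy_until_end(memory, 0, len(page), safe_check=False))
```

With the greater-or-equal test the copy stops at the end of the page. With the equality test the cursor skips past the boundary and the output carries on into the neighbouring bytes, which is the shape of the disclosure described above.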

The bug's introduction date and the disclosure timeline make this incident materially more concerning than a typical TLS or web-application disclosure. The buggy code path was active in production from approximately the 22nd of September 2016. The leak rate was very low: Ormandy's discovery was based on observing leaked memory in Google search results from cached Cloudflare-fronted pages, and leaked content appeared in cached pages rarely enough that nobody noticed for five months. The peak leak window was 13 to 18 February 2017, with around 1 in every 3.3 million HTTP requests through Cloudflare producing some amount of memory disclosure. The total number of leaking requests is, on Cloudflare's analysis, in the millions; the volume of exposed data is larger than that count suggests, because each affected request typically returned several kilobytes of memory contents.
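
To put the rate in terms a customer can reason about, a back-of-envelope calculation helps. The sketch below assumes, crudely, that every request leaks independently at the peak-window rate of 1 in 3.3 million; the function names and the example traffic figure are mine, not Cloudflare's. Note also that what a leaking response contained was other traffic through the same worker, so this estimates how often leaks occurred at a given request volume, not how much of any one site's data was exposed.

```python
# Peak-window rate quoted in the post-mortem: about 1 leak per 3.3 million requests.
LEAK_RATE = 1 / 3_300_000


def expected_leaks(requests: int, rate: float = LEAK_RATE) -> float:
    """Expected number of leaking responses among `requests` requests."""
    return requests * rate


def probability_of_any_leak(requests: int, rate: float = LEAK_RATE) -> float:
    """Chance of at least one leak, assuming independent requests (an assumption)."""
    return 1.0 - (1.0 - rate) ** requests


# Hypothetical volume: 3.3 million requests during the peak window.
volume = 3_300_000
print(f"expected leaks: {expected_leaks(volume):.2f}")
print(f"chance of at least one: {probability_of_any_leak(volume):.2f}")
```

At that hypothetical volume the expectation is about one leaking response, and the chance of at least one is a little under two in three.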

The cleanup posture is the part of this that has me reorganising customer briefings this morning. Search engines have been crawling and caching pages from Cloudflare-fronted sites throughout the entire window. Google, Bing, Yandex, and the smaller crawlers hold, in their caches and snapshots, fragments of memory from Cloudflare workers that include content the original requesters intended to be private. Cloudflare and the affected search engines have been working over the past week to expunge cached content that contains leaked material; Ormandy's bug report describes this work and lists specific search queries that surface examples. The work is ongoing and will take more time. The data has, in many cases, already left the search-engine caches and is in the hands of third parties who have been independently pulling the same searches over the past week as the analysis has gone semi-public.

For customer organisations whose web properties are Cloudflare-fronted (and a substantial number of vCISO clients use Cloudflare directly or via vendor-managed services that use it), the immediate operational concern is the credentials and tokens that may have been disclosed. Session cookies, authentication tokens, API keys, OAuth bearer tokens, and any other credential transmitted in HTTP headers or bodies during the affected period may have been included in leaked memory. The action this week is to enumerate the credential surface for each customer's Cloudflare-fronted properties, rotate the credentials that passed through Cloudflare during that period (which is most of them, because Cloudflare terminates TLS and therefore sees them in plaintext), and decide whether to invalidate application sessions and force re-authentication. The Browne Jacobson estate is small in this respect and is being addressed today. Towry's trading-platform fronting was migrated off Cloudflare last year for unrelated reasons and is unaffected. Northcott's properties are mostly internal and only one external service is in scope. The manufacturer's posture is still under audit; the new financial-services client uses Cloudflare for several customer-facing endpoints and is the largest piece of work on this for the week.
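
The enumeration step can be as simple as a spreadsheet, but for anyone who prefers to script it, here is a minimal sketch of the triage logic. Everything in it is illustrative: the `Credential` record, the field names, and the example hosts are hypothetical, and treating 18 February 2017 (the end of the peak window above) as the end of the exposure window is my assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Exposure window: bug introduced around 22 Sep 2016; end date is an assumption.
WINDOW_START = date(2016, 9, 22)
WINDOW_END = date(2017, 2, 18)


@dataclass
class Credential:
    name: str
    host: str                      # property the credential is sent to
    behind_cloudflare: bool        # was that property Cloudflare-fronted?
    issued: date
    last_rotated: Optional[date] = None


def needs_rotation(cred: Credential) -> bool:
    """True if the current secret could have crossed Cloudflare in the window."""
    if not cred.behind_cloudflare:
        return False
    secret_since = cred.last_rotated or cred.issued
    # A secret first issued after the window closed cannot have leaked.
    # Anything older is assumed to have been in use during the window.
    return secret_since <= WINDOW_END


def rotation_candidates(creds: List[Credential]) -> List[str]:
    return [c.name for c in creds if needs_rotation(c)]


inventory = [
    Credential("session-signing-key", "shop.example.com", True, date(2016, 1, 10)),
    Credential("partner-api-key", "api.example.com", True, date(2016, 6, 1),
               last_rotated=date(2017, 2, 24)),
    Credential("intranet-token", "intranet.example.com", False, date(2015, 3, 1)),
]
print(rotation_candidates(inventory))
```

The rule is deliberately conservative: it does not try to estimate whether a given credential actually leaked, only whether it could have been in memory at Cloudflare's edge while the bug was live.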

The structural concern is the trust posture for CDN-class TLS terminators. A CDN that terminates TLS sees, by design, the plaintext of every request and response that traverses it. The trust the customer places in the CDN is therefore very high — comparable in scope to the trust placed in the customer's own infrastructure. A bug in the CDN of this nature exposes content from many customers simultaneously, in proportion to the CDN's market share. Cloudflare's market share in 2017 is large; the implication is that Cloudbleed is a single-incident exposure of content from a substantial fraction of the internet's TLS-fronted services. There is no defensive posture available to the customer organisation that addresses this; the customer can choose not to use a CDN, but the trade-off in DDoS resilience and in performance is severe enough that few will make that choice. The CDN's posture matters. Cloudflare's incident response on Cloudbleed has, in the public reporting, been thorough — fast root-cause identification once the bug was reported, careful disclosure timeline coordinated with search-engine cache cleanup, comprehensive public post-mortem — and the operational competence of the response is, in itself, a useful signal for trust calibration. But the structural fact remains that the CDN is in the trust path and the customer's risk surface includes the CDN's bug surface.

The wider thought is on Project Zero's role. Tavis Ormandy's identification of the bug, his coordination with Cloudflare, and the structured disclosure that resulted are an example of the value of research-team-driven third-party scrutiny of infrastructure-grade software. The bug was introduced in September 2016 and went unnoticed by Cloudflare's internal review and by ordinary users for five months; Ormandy noticed it in data visible from Google's side. The structural argument for funding and protecting research teams of this nature, both inside major vendors and in third-party organisations, is reinforced every time one of these disclosures works.

I am going to be writing more this week as the cleanup progresses. The credential-rotation conversation with customers is uncomfortable because the rotation cost is real and the demonstrated harm is unclear, but the rotation is the correct conservative posture. We will do it.