The Observer published Carole Cadwalladr's piece yesterday morning (theguardian.com/news/2018/mar/17/cambridge-analytica-facebook-influence-us-election) and the New York Times published parallel reporting by Matthew Rosenberg, Nicholas Confessore, and Carole Cadwalladr the same day. Christopher Wylie's whistleblower account, the documentary trail showing Cambridge Analytica's harvesting of Facebook user data through Aleksandr Kogan's "thisisyourdigitallife" personality-quiz application, the 87-million-record figure (revised upward from the initial 50-million estimate), and the firm's claimed use of that data for political-campaign targeting in the 2016 US presidential election and the 2016 UK Brexit referendum are now in public discussion at the highest political level.
The technical mechanism is the part of the story that needs writing about, because it is not what most readers will assume. The harvest happened through the Facebook Graph API as it existed in 2014, which permitted third-party applications to access not just the data of users who installed the application but also a substantial subset of those users' Facebook friends' data, without the friends' explicit consent — only the original installer's consent was required. Kogan's application, presented as an academic personality study, was installed by approximately 270,000 users, each of whom provided consent for themselves and (under the API model) for the data of their friends. The 87-million figure is what that 270,000 produces under the friend-graph multiplier of the 2014 Graph API. Facebook restricted friend-data access with Graph API v2.0, announced in April 2014 (existing applications were given roughly a year to migrate), but the data Kogan had already collected was not retroactively removed and was passed, via Cambridge Analytica's parent entity SCL Group, to Cambridge Analytica.
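The friend-graph multiplier is straightforward arithmetic on the two reported figures. A back-of-envelope sketch (the per-installer figure is derived here, not reported anywhere):

```python
installers = 270_000    # approximate number of quiz-app installs (reported)
affected  = 87_000_000  # Facebook's revised affected-profile figure (reported)

# Implied average number of *unique* friend profiles contributed per
# installer. Because real friend lists overlap heavily, the raw average
# friend count needed to reach 87M unique profiles would be higher still.
multiplier = affected / installers
print(round(multiplier))  # prints 322
```

Roughly 320 unique profiles per consenting installer: that ratio is the whole story of the 2014 API model in one number.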
The choice that produced the harvest, in other words, was the platform's API design — specifically, the consent model that allowed one user's installation to expose friend-graph data without friend-side consent. That design choice was controversial in the privacy and platform-design communities in 2014 (see Aleksandr Kogan's defence of his role, Channel 4 interview, 18 March), and the controversy was, on the public record, noted internally at Facebook. The decision to allow it was a product-and-business decision that prioritised the attractiveness of the developer platform over the rigour of the consent model. The retrospective on that decision is now being conducted in extremely public form.
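The design choice reduces to a pair of access predicates. A deliberately minimal sketch — the class and function names are mine, not Facebook's, and the real platform logic was far more granular, with per-field `friends_*` permission scopes:

```python
from dataclasses import dataclass, field

@dataclass
class User:
    name: str
    friends: set = field(default_factory=set)         # names of this user's friends
    installed_apps: set = field(default_factory=set)  # apps this user consented to

def may_access_pre_v2(app: str, subject: User, installer: User) -> bool:
    # 2014 model (simplified): the installer's consent alone exposes
    # the installer's friends' data to the application.
    return subject is installer or subject.name in installer.friends

def may_access_v2(app: str, subject: User, installer: User) -> bool:
    # v2.0 model (simplified): friend data requires the friend's own
    # installation — that is, the data subject's own consent.
    return app in subject.installed_apps
```

Under the first predicate, one installer with 300 friends exposes 301 profiles; under the second, exactly one. The entire regulatory argument about consent sits in the difference between those two return statements.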
For the privacy-regulation conversation, Cambridge Analytica is the worked example that GDPR's drafters anticipated. The combination — large-scale data collection without effective consent, third-party transfer of the collected data without the subjects' knowledge, use of the data for purposes the subjects would not have agreed to, and absence of effective regulatory enforcement against the platform — is exactly the configuration GDPR's structural design (consent, data minimisation, purpose limitation, lawful basis, controller-and-processor accountability) is intended to prevent. GDPR coming into force on 25 May would not, by itself, have prevented Cambridge Analytica — the underlying conduct predates GDPR — but it does change the structural environment for any future repetition. The Information Commissioner's Office investigation under the Data Protection Act 1998 is active and aggressive (see the ICO statement and the warrant action against Cambridge Analytica), and the post-25-May legal posture would be substantially worse.
For the customer-organisation work, the Cambridge Analytica disclosure has produced an immediate uptick in board-level interest in the data-protection programmes. Several customers' boards have, in the past 48 hours, asked specific questions about whether the customer organisation has any relationship with Cambridge Analytica's parent or subsidiaries, whether their own third-party data-processor relationships could produce a Cambridge Analytica-shaped exposure, and what their consent posture looks like under GDPR. The conversations are useful and overdue. The vCISO programmes are getting board attention that the ordinary GDPR-readiness work has been struggling to mobilise.
The wider strategic point — and this is going into the longer-form essay file — is that platforms with very large user populations and extractive data models are now being held publicly accountable in a way that they have not been before. Facebook is at the centre of this story, but the broader pattern includes Google, the various advertising-technology vendors, and the data brokers (Acxiom; Experian, part of the same credit-bureau industry that produced Equifax). The political conversation about platform regulation in the US, the EU, and the UK is going to advance substantially over the next 18 months, and the regulatory output is going to reshape the economic environment for data-intensive consumer-internet businesses. The customer organisations that have, until now, treated the platform-economy data flows as someone else's problem are going to be drawn into the conversation through their own marketing, advertising, and customer-analytics relationships.
For the personal-ethics question that the Cambridge Analytica reporting raises — the role of academics, the boundaries of consultancy work in politically sensitive domains, the personal responsibility of individuals working in data-intensive consultancies — there is a longer essay to write. The Wylie testimony, and the broader pattern of insider disclosure into which it fits, sit in the same tradition as Snowden, Manning, the Hacking Team leak source, the Panama Papers' John Doe, and the various Vault 7 sources. The disclosure-ethics conversation continues to develop. I will write more.
I will be at the Infosec Europe panel on platform data handling in June, discussing this. The customer briefings this week have largely been responding rather than initiating. The signal is strong: boards that have been slow to engage with GDPR are engaging now.