The Black Hat USA 2021 paper and presentation by the lead engineer (with me in a supporting role) were delivered yesterday at the Mandalay Bay Convention Center in Las Vegas. The presentation was hybrid: in-person attendance was reduced from the pre-COVID norm but still substantial, and virtual attendance through the Black Hat platform pushed the total audience well past the in-person count. By the operational measure the session went well, with substantive question-and-answer engagement across both audience modes.
The paper's substantive content covered three years of threat-intelligence-integration data from the EmilyAI deployment across the customer fleet: the architectural design of the threat-intelligence ingestion and feature-engineering layer; the per-customer customisation patterns we have documented across the deployment; the precision-recall improvements the integration produces on the incident-grade classification problem; and several worked case studies (anonymised on the customer side per agreement) of specific incident detections that the threat-intelligence layer enabled and that the rule-and-baseline-only detection layer would not have produced. The paper is in the public Black Hat materials archive as of yesterday, and the team's blog post on the work is linked from the company site.
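For readers less familiar with the precision-recall framing used in the paper, a small sketch makes it concrete. The counts below are illustrative placeholders, not the deployment's actual figures:

```python
# Precision and recall for a binary incident-grade detector.
# All counts here are hypothetical, for illustration only.

def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision = TP/(TP+FP); recall = TP/(TP+FN)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

# Rule-and-baseline-only detection vs. with threat-intel features
# (made-up numbers shaped like the kind of comparison the paper reports):
baseline = precision_recall(tp=80, fp=40, fn=60)    # ≈ (0.667, 0.571)
with_ti = precision_recall(tp=110, fp=25, fn=30)    # ≈ (0.815, 0.786)
print(baseline, with_ti)
```

The point of reporting both numbers is that a threat-intelligence feature layer can improve precision (fewer false alerts) and recall (fewer missed incidents) at the same time, rather than trading one for the other.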
The lead engineer delivered the presentation well. Nine years from postgraduate-intern hire to a Black Hat USA presenter slot is the kind of professional-development arc I do not think I could have predicted at any point along the way, and the team's mood at the post-presentation dinner reflected the recognition that the work is a substantive contribution to the security-research community.
The networking afterwards was less concentrated than the pre-COVID norm, as the COVID-modified Black Hat experience goes, but still substantive: three substantive customer-prospect conversations, several more with partners and vendors, and two academic conversations on the research direction that are continuing over email.
Three thoughts for the file from the Q&A session.
First, the cross-customer data-sharing question is now being asked in earnest. The ML-for-security community is increasingly aware that the per-customer data organisations like ours hold (analyst decisions, incident-and-disposition records, threat-intelligence correlations) is, in aggregate across the customer base, a significant defensive resource that the broader security community cannot access. How to share that aggregate value while preserving customer-organisation confidentiality is operationally challenging and structurally important. The federated-learning and differential-privacy-on-cyber-data research literature is the relevant academic frame; the operational engineering of those techniques against actual SOC-decision data is the work that will produce the substantive answer over the next several years.
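One concrete shape the differential-privacy approach could take, a hedged sketch rather than anything resembling our production design, is releasing Laplace-noised aggregate counts over per-customer contributions, so that no single customer's presence in the aggregate is identifiable:

```python
import random

def dp_count(true_count: int, epsilon: float, sensitivity: float = 1.0) -> float:
    """Release a count with Laplace noise calibrated to (epsilon, sensitivity).

    If each customer contributes at most `sensitivity` to the count, adding
    Laplace(sensitivity/epsilon) noise gives epsilon-differential privacy
    for the released aggregate. Illustrative sketch, not production code.
    """
    scale = sensitivity / epsilon
    # Laplace(0, scale) sampled as the difference of two exponentials.
    noise = random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)
    return true_count + noise

# Hypothetical aggregate: how many customer fleets saw a given
# threat-intel indicator fire this quarter.
released = dp_count(true_count=42, epsilon=1.0)
print(released)
```

The engineering difficulty the Q&A touched on is exactly the part this sketch elides: choosing a sensible epsilon budget across many queries, and bounding each customer's contribution to richer statistics than simple counts.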
Second, the adversarial-machine-learning question is more substantive than it was at the BSides Manchester 2019 talk. The post-2019 academic literature on adversarial examples in cyber-detection contexts has produced several specific demonstrations of model-evasion techniques against classes of detection model structurally similar to ours. The operational evidence from production deployments has not, through 2020 and into 2021, shown active model-aware adversarial behaviour, but the structural risk is sharper than the 2019 conversation suggested, and adversarial robustness is a substantive theme of the engineering team's 2021 product roadmap.
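A minimal illustration of the evasion risk, assuming a linear detector over attacker-controllable features (the feature names and weights below are hypothetical, not our model): an attacker who can nudge features in the direction opposite the model's weights can push a malicious sample below the alert threshold.

```python
# Toy linear detector: score = w . x + b; alert when score > 0.
# Weights and features are illustrative placeholders only.
weights = {"beacon_regularity": 2.0, "rare_domain": 1.5, "bytes_out": 0.8}
bias = -2.5

def score(x: dict) -> float:
    """Dot product of feature vector with weights, plus bias."""
    return sum(weights[k] * x[k] for k in weights) + bias

malicious = {"beacon_regularity": 0.9, "rare_domain": 0.8, "bytes_out": 0.7}
print(score(malicious))  # positive: the sample is detected

# Evasion: the attacker jitters beacon timing, lowering one feature
# the model weights heavily, without changing the attack's effect.
evasive = dict(malicious, beacon_regularity=0.3)
print(score(evasive))  # now negative: the sample slips under the threshold
```

This is the structural shape of the post-2019 demonstrations: models whose decision surface leans on a few attacker-controllable features are evadable with small, cheap perturbations, which is why the robustness work has to reason about which features an adversary can actually move.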
Third, the regulation-and-machine-learning intersection. The post-GDPR environment for automated decision-making (Article 22 GDPR specifically) and the emerging EU AI Act framework progressing through the legislative process will both apply to security-decision automation, and the customer-organisation programme work needs to anticipate the regulatory requirements rather than retrofit them. The Black Hat audience has not, historically, engaged much with the regulatory dimension of ML-in-security work, and the Q&A engagement on this thread was modest. But the conversation needs to develop, and the engineering teams that build security-ML capabilities need to understand the regulatory environment as substantively as they understand the technical one.
The travel back to the UK is on Friday. The customer-portfolio operational work will be waiting. The blog continues.