Why Cybersecurity is Critical in MLOps

June 20, 2023

If your business relies on machine learning (ML) to drive strategic decision-making, you’re in good company. A recent report by ClearML shows the technology clearly entering the mainstream, with 60% of organizations’ ML leaders planning to increase ML investments by more than a quarter in 2023. The same study revealed that 99% of respondents either already have dedicated budgets for ML operations (MLOps) or plan to implement them this year.

But as MLOps practices mature, they also carry more risk. According to a recent study by NCC Group, organizations are deploying ML models in more applications without considering security requirements. In a separate survey by Deloitte, nearly two-thirds of AI and ML users describe cybersecurity risks as a significant or extreme threat, yet only 39% feel prepared to combat those risks.

MLOps model-creation pipelines can be attacked in three distinct ways: by malicious insiders, through software supply chain manipulation, and via compromised systems. If the SolarWinds supply chain attack taught the industry anything, it’s that continuous build processes are both a target for sophisticated adversaries and a blind spot for in-house security operations teams.

In 2023, continuous build processes will continue to be a target for threat actors. As these attacks start to impact enterprises’ bottom lines, they will have to start paying more attention to the cybersecurity side of MLOps.

Here are some ways to make MLOps projects safer and more secure.

Secure the Whole Pipeline

Part of the challenge of securing MLOps is the sheer length and depth of typical machine learning pipelines. They include a half dozen or more phases – data collection and preparation, along with the creation, evaluation, optimization, deployment and usage of an ML model. Vulnerabilities can crop up at any point in the process.

Early on, in data collection, threat actors can taint the data, manipulate the annotation or conduct adversarial attacks on the metadata stores. In later phases, open-source models and frameworks can introduce hidden vulnerabilities, and potential bias and degraded system performance need to be addressed. As models are deployed and used, new data is often introduced, expanding the attack surface and opening an organization up to all kinds of threats, including evasion attacks, model theft, code injections and privacy attacks.
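One practical control at the data-collection stage is verifying dataset integrity before training begins. The following is a minimal sketch, assuming a manifest of known-good SHA-256 digests recorded when the dataset was originally curated; the file names and digest values are placeholders, not part of any specific product.

```python
import hashlib
from pathlib import Path

# Hypothetical manifest of known-good SHA-256 digests, recorded when the
# dataset was curated and signed off. The values here are placeholders.
TRUSTED_DIGESTS = {
    "train_images.tar": "9f2c...",
    "labels.csv": "a41b...",
}

def sha256_of(path: Path) -> str:
    """Stream the file through SHA-256 so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_dataset(data_dir: Path) -> None:
    """Refuse to start training if any artifact's digest differs from the manifest."""
    for name, expected in TRUSTED_DIGESTS.items():
        actual = sha256_of(data_dir / name)
        if actual != expected:
            raise RuntimeError(f"Integrity check failed for {name}: {actual}")
```

Running a check like this as the first step of every training job means poisoned or swapped files fail loudly before they can influence a model.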

At the tail end of the process, there is a great deal of intellectual property embedded in the ML model itself. Decades of transactional data and hard-won learnings from financial models can be distilled into a model only tens of kilobytes in size. It’s easier to steal that model than to steal the actual source data.

These models tend to be exposed. Attackers have become skillful at querying models and reproducing them elsewhere. This requires a new way of thinking about the value of the model: tooling and alerting around not only the theft of data but also the manipulation of the models are important to an overall MLOps security strategy.
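As one illustration, a query-volume alert can flag possible model-extraction attempts against an exposed prediction endpoint. This is a minimal sketch under assumed thresholds; the window size, limit and client_id scheme are hypothetical and would need tuning against real traffic.

```python
import time
from collections import defaultdict, deque

# Illustrative threshold: flag clients issuing unusually many prediction
# queries, a common precursor to model-extraction attacks.
WINDOW_SECONDS = 3600
MAX_QUERIES_PER_WINDOW = 10_000

_query_log: dict[str, deque] = defaultdict(deque)

def record_query(client_id: str) -> bool:
    """Return True if the client is within limits, False if it should be flagged."""
    now = time.time()
    log = _query_log[client_id]
    log.append(now)
    # Evict timestamps that have fallen out of the sliding window.
    while log and now - log[0] > WINDOW_SECONDS:
        log.popleft()
    return len(log) <= MAX_QUERIES_PER_WINDOW
```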

Invest in Tooling to Scale Across SOCs

It’s no secret that security is no longer siloed in a single department. It cuts across all functions, and organizations are creating Security Operations Centers (SOCs) to improve the visibility, manageability and auditing of their overall security posture. To extend the SOC’s capability to MLOps, organizations need tooling that scales to the far larger data volumes and workloads ML pipelines generate.

Meeting MLOps’ data needs forces SOCs to adapt in two ways. Existing SOC teams become accountable for ML workloads, which means building additional tooling and reporting to support MLOps teams from a security perspective. And MLOps teams specialized in data science curation can leverage larger toolsets, including logging analytics platforms that provide higher levels of threat detection.

Double Down on Security Best Practices

Some of the best defense tactics for MLOps are practices organizations already deploy regularly across the rest of their operations. A zero trust security policy requires the authentication and authorization of anybody trying to access applications or data used in the development of ML models, and it tracks their activity. Applying the principle of least privilege (PLoP) limits users’ access to the exact data sets and models they are authorized to touch. This reduces the attack surface by preventing attackers who have compromised one data trove from moving freely throughout the system.
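In practice, least privilege comes down to deny-by-default authorization checks. Below is a minimal sketch; the policy table, principal names and resource labels are hypothetical, and a real deployment would back this with an identity provider and an audited policy store.

```python
# Hypothetical policy table mapping each principal to the exact datasets and
# models they are authorized to touch (principle of least privilege).
ACCESS_POLICY = {
    "analyst-alice": {"datasets": {"sales-2023"}, "models": set()},
    "mlops-bob": {"datasets": {"sales-2023", "churn-raw"}, "models": {"churn-v2"}},
}

def authorize(principal: str, resource_type: str, resource: str) -> bool:
    """Deny by default: access is granted only if the policy explicitly allows it."""
    policy = ACCESS_POLICY.get(principal)
    if policy is None:
        return False
    return resource in policy.get(resource_type, set())

# Example: a compromised analyst account cannot pivot to model artifacts.
assert not authorize("analyst-alice", "models", "churn-v2")
```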

Use Analytics to Observe and Log ML Tasks

An important step in protecting an ML system is understanding the system’s behavior in both healthy and unhealthy states. To do this, organizations need to set up alerts that trigger action before an incident occurs. This is called “observability.” A vulnerability introduced early in the training data will affect the model’s performance down the line, so tracking performance data and logging the metrics of ML tasks gives organizations insight into security issues that could affect the model.
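As a simple example of that kind of alerting, a pipeline can log each run’s evaluation metrics and notify the SOC when performance degrades past a tolerance. The baseline and threshold below are illustrative assumptions; in practice they would come from the model’s evaluation report.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("ml-observability")

# Illustrative baseline and tolerance; real values would come from the
# model's evaluation report, not hard-coded constants.
BASELINE_ACCURACY = 0.92
ALERT_THRESHOLD = 0.05

def log_and_check(run_id: str, accuracy: float) -> None:
    """Log the metric for audit, and alert if performance degrades past tolerance,
    which can indicate poisoned training data or adversarial drift."""
    logger.info("run=%s accuracy=%.4f", run_id, accuracy)
    if BASELINE_ACCURACY - accuracy > ALERT_THRESHOLD:
        logger.warning("run=%s accuracy dropped %.4f below baseline; notifying SOC",
                       run_id, BASELINE_ACCURACY - accuracy)
```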

Continuously Monitor the Model Development Lifecycle

The continuous lifecycle of MLOps necessitates continuous monitoring of a deployed model’s response to adversarial manipulation and corruption. In the future, expect to see larger and more sophisticated businesses lean into building out their in-house data science teams and capabilities, detecting threats with security analytics, and pruning and filtering data from unknown sources to improve the next generation of ML and AI.
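One lightweight form of that monitoring is checking live inputs for distribution drift against training-time statistics. The sketch below tracks a single numeric feature with an assumed alert threshold; both the statistics and the threshold are illustrative.

```python
import math

# Illustrative training-time statistics for one numeric feature.
TRAIN_MEAN, TRAIN_STD = 42.0, 5.0
Z_ALERT = 4.0  # assumption: flag batches whose mean drifts > 4 standard errors

def batch_drifted(values: list[float]) -> bool:
    """Crude drift check: does this batch's mean sit far from the training mean?
    Sudden shifts can signal adversarial manipulation or corrupted inputs."""
    n = len(values)
    if n == 0:
        return False
    batch_mean = sum(values) / n
    standard_error = TRAIN_STD / math.sqrt(n)
    return abs(batch_mean - TRAIN_MEAN) / standard_error > Z_ALERT
```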

About the author: Gunter Ollmann is the CTO of Devo Technology, a cybersecurity company that provides cloud-native logging and security analytics for organizations.