The vulnerabilities of PyTorch-nightly

The supply chain plays a curious role in the modern enterprise. We know it’s important, and we understand that, in a global economy with components coming from all kinds of places, it’s incredibly complicated. A simple hiccup in a small town somewhere can reverberate in a major operation halfway around the world. And yet the topic seldom gets much attention until there’s a problem.

The software supply chain is even less visible than the classical one involving physical goods. When it comes to ICT, much attention is given to the hardware, the network, the operating system and the applications running on them, but often not so much to the elements that make up those applications: a myriad of small building blocks. But do we really need to know the details?

Well, people responsible for IT security surely do, and have at least since 2017, when a security vulnerability in the Apache Struts web application framework hit Equifax, a well-known U.S. credit bureau. Back then, the private records of roughly 147 million Americans were compromised. Yet another reason to care is the recent attack on PyTorch-nightly, the unstable development version of the well-known PyTorch deep learning library. Luckily, the problem is nowhere near as grave as what happened in 2017, mainly because only the nightly build was affected, not the stable release. Still, over the Christmas break, its authors urged users to uninstall and replace versions downloaded during that time frame.

A Complex Supply Chain

Before your eyes glaze over, here’s a little context.

The software supply chain is extraordinarily complex. Most consumers assume that dedicated developers, the employees of the software vendor, do all the development work for a given piece of software. But the truth is that a large share, sometimes up to 80%, of the code stems from open-source software (OSS), which is created and constantly refined in a collaborative fashion, by voluntary contributors from all around the world. And even if we know that there’s lots of OSS in many applications, it is more difficult than you would guess to understand which exact versions are in use, who contributed what and which portion of that open source code actually gets executed.

More specifically, given the number of open-source projects used and their size, it is tedious and time-consuming, indeed virtually impossible, for developers to know and understand in full detail what code is being pulled into their software development projects, where that code originates, how much it has changed since the last version and, vitally, just how safe the version they use really is.

Technologists rightly make the point that OSS is invaluable because it saves developers from constantly reinventing the wheel. But when they have to spend too much time managing open-source components and making sure they’re secure, the benefits of reuse are undermined.

This is the software dependency lifecycle, and the components constituting applications are commonly categorized as either “direct” dependencies (known and consciously selected by developers) or “indirect” or “transitive” dependencies (which are dependencies of dependencies and so forth, all automatically pulled into the development project). Endor Labs’ own research has shown that the majority of vulnerabilities floating around right now can be found in transitive dependencies.
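The distinction between direct and transitive dependencies can be illustrated with a minimal sketch. The dependency graph and all package names below are hypothetical and exist only for illustration; real tooling would read this information from package metadata.

```python
from collections import deque

# Hypothetical dependency graph: package -> packages it declares directly.
DEPENDENCY_GRAPH = {
    "my-app": ["web-framework", "http-client"],
    "web-framework": ["template-engine", "http-client"],
    "http-client": ["tls-lib"],
    "template-engine": [],
    "tls-lib": [],
}


def classify_dependencies(graph, root):
    """Split every package reachable from `root` into direct and transitive sets."""
    direct = set(graph.get(root, []))
    seen = set(direct)
    queue = deque(direct)
    # Breadth-first walk over dependencies of dependencies.
    while queue:
        package = queue.popleft()
        for dep in graph.get(package, []):
            if dep not in seen and dep != root:
                seen.add(dep)
                queue.append(dep)
    # Anything reachable but not declared by the root itself is transitive.
    transitive = seen - direct
    return direct, transitive


direct, transitive = classify_dependencies(DEPENDENCY_GRAPH, "my-app")
print("direct:", sorted(direct))
print("transitive:", sorted(transitive))
```

Even in this toy graph, half of the packages that end up in the application were never chosen by its developers, which is why transitive dependencies are easy to overlook.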

Caught in a Trap

This has led to significant problems. Not long ago, the Log4j vulnerability caused one federal department alone to spend 33,000 hours trying to find and remediate the errant code. The people involved struggled to understand which software depended on Log4j, where this software ran and whether the vulnerability was exploitable, that is, whether attackers could take advantage of it or not.

Coming back to the recent episode with PyTorch-nightly, this was a classic attack on the software supply chain: Cybercriminals created a malicious open-source package with the same name as a legitimate one that PyTorch-nightly depends on (torchtriton), but on a different download server (a.k.a. package repository), one that developer tools prefer over PyTorch’s own. As a result, 2,000-plus developers downloaded the malicious package along with PyTorch-nightly. It was laced with a payload that allowed the bad guys to steal passwords, SSH keys and other files on victims’ computers.

This technique, sometimes called dependency confusion, exploits developer setups where multiple package repositories are used for downloading project dependencies. Depending on the resolution algorithm of the package manager—e.g., the order in which repositories are contacted—an attacker can make the package manager download the malicious package rather than the legitimate one.
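To make the mechanism concrete, here is a simplified model of a resolver that treats all configured repositories as equally trustworthy and picks the highest version it finds. It is a sketch under stated assumptions, not the algorithm of any particular package manager; the `Candidate` class, the index contents and the package name `internal-utils` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    version: tuple  # e.g. (1, 4, 0)
    index: str      # label of the repository offering this candidate


def resolve_highest_version(name, indexes):
    """Naive resolution: merge candidates from every index, take the highest version."""
    candidates = [c for index in indexes for c in index if c.name == name]
    if not candidates:
        raise LookupError(f"no candidate found for {name!r}")
    return max(candidates, key=lambda c: c.version)


# Hypothetical setup: the legitimate package lives on a private index,
# while an attacker publishes the same name with a higher version publicly.
private_index = [Candidate("internal-utils", (1, 4, 0), "private")]
public_index = [Candidate("internal-utils", (99, 0, 0), "public")]

chosen = resolve_highest_version("internal-utils", [private_index, public_index])
print(chosen.index, chosen.version)
```

Under these assumptions the attacker only needs to claim the name on the public repository and publish a higher version number; nothing on the victim’s side has to be broken into.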

This is sadly common: Over the last few years, we’ve been seeing an increase in such next-gen attacks, where cybergangs or individuals aim at injecting malicious code into open-source projects such that it gets downloaded and executed by application developers or end-users. Unfortunately, some of those attack vectors can’t be detected by traditional vulnerability scanning.

The severity of such attacks for affected developers and organizations depends on the assets compromised, such as the identities and secrets exfiltrated, the systems encrypted by ransomware, the computing resources used by crypto miners (and paid by victims) or the intellectual property leaked by the malware.

In some sense, such attacks can be compared to spam emails: Just as we all got used to not opening every email and attachment, developers should become more careful when pulling and installing open-source dependencies, because attackers have discovered this as a viable means of distributing malware.

One can say that the first decades of OSS were all about productivity, which caused open source to become a crucial element of all kinds of software, including mission-critical systems. But today, we need to find ways to maintain speed and productivity without compromising security. For that, we must examine the process of selecting OSS dependencies, and better understand how to define more sustainable processes that reduce long-term risk.

How to Protect Open Source

There are defenses, and the industry as well as open-source foundations like the OpenSSF are making significant investments in securing open-source ecosystems and the software supply chain.

Dependency confusion, for example, can be addressed with private repositories that both host internal packages and mirror external packages, e.g., using devpi (yet another open-source solution for the Python ecosystem). Typically, such solutions allow more control over dependency resolution and package download processes. However, their setup and operation also require some technical skills and effort.
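The effect of routing all downloads through a single trusted repository can be sketched as follows. This is an illustrative model only, not how devpi or any package manager is implemented; the index names, package names and the `resolve_from_trusted_index` helper are hypothetical.

```python
# Hypothetical indexes: index name -> {package name: available version}.
INDEXES = {
    "private-mirror": {"internal-utils": (1, 4, 0), "public-lib": (2, 3, 1)},
    "public": {"internal-utils": (99, 0, 0), "public-lib": (2, 3, 1)},
}


def resolve_from_trusted_index(name, indexes, trusted_index):
    """Resolve a package from one trusted index only, ignoring all others."""
    packages = indexes[trusted_index]
    if name not in packages:
        # Fail loudly instead of silently falling back to another index.
        raise LookupError(f"{name!r} is not available on {trusted_index!r}")
    return trusted_index, packages[name]


source, version = resolve_from_trusted_index(
    "internal-utils", INDEXES, "private-mirror"
)
print(source, version)
```

Because the resolver never consults the public index directly, a same-named package with a higher version number published there has no influence on the result. The trade-off is the operational effort of keeping the mirror complete and up to date.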

The big picture perspective is that today, applications depend on dozens, even hundreds, of open-source components. Some of those are developed, built and distributed on potentially insecure systems. To better understand this complex problem space, Endor Labs has already documented more than 100 unique attack vectors at the disposal of attackers to compromise such components, and we expect that more will be discovered.

What the market needs are better ways to simplify dependency selection, to monitor and prioritize security risks once dependencies are integrated, and to gain an in-depth understanding of program behavior. Options to do this are becoming available.

Henrik Plate, Security Researcher, Endor Labs

About the author: Henrik Plate, CISSP, is Security Researcher at Endor Labs, the startup dedicated to securing open-source software reuse in application development. He’s an experienced software developer, architect and researcher with a focus on software security and a demonstrated history of authoring scientific papers and patents as well as developing commercial and open-source software solutions. He previously spent nearly two decades as a researcher and software developer at SAP. His current research focuses on the security of software supply chains, including the detection, assessment and mitigation of dependencies with known vulnerabilities as well as malicious open-source components. He also led and contributed to publicly-funded research projects and presents at industrial and academic security conferences. Plate holds a Master’s degree in Computer Science and Business Administration from the University of Mannheim.
