Recently, at the "Ask the Architect" session of the Devoxx UK 2018 conference, Oracle's chief architect, Mark Reinhold, shared his thoughts about Java’s serialization mechanism, which he called a “horrible mistake” and a virtually endless source of security vulnerabilities. The numbers bear this out: nearly half of the vulnerabilities patched in the JDK in the last two years are related to serialization. Serialization security issues have also plagued almost every software vendor, including Apache, Oracle, Pivotal, Cisco, McAfee, HP, Adobe, VMware, Samsung, and others.
More importantly, Reinhold announced Oracle’s decision to improve Java’s security by changing the way Java handles object serialization. Specifically, he said that Oracle’s long-term goal is to remove native object serialization and to create a new plugin mechanism that will allow developers to choose their preferred serialization format. Supported formats will include XML, JSON, YAML and even the existing, problematic native serialization. Additionally, a new safe serialization format will be created, based on a new language feature called Data Classes, which is part of Project Amber.
This is a huge decision considering that object serialization in Java is two decades old. It was first introduced to the Java platform in version 1.1 and is tightly coupled to hundreds of components and important functionality of the JVM. Additionally, countless other libraries, frameworks and enterprise servers depend directly or indirectly on native Java object serialization.
Because of this tight coupling and these strong dependencies, removing Java’s serialization mechanism is a major engineering challenge that requires careful planning and a thorough design. It is no surprise, then, that Reinhold described this as a long-term goal and said that Oracle cannot commit to a release schedule for replacing serialization.
There is little doubt that serialization issues plague Java and that addressing the underlying causes will benefit the Java community. But how long will it take to bring a new approach to market, and will simply replacing the old serialization mechanism with a new one end the issue?
To answer this question, we first need to understand a bit about Java’s object serialization. Object serialization is the process of converting an in-memory object (graph) into a stream of bytes for transport and storage. Deserialization is the reverse process.
This process can be fully automated by the JVM and can be transparent to any application component that needs such functionality. For a class (component) to use Java’s object serialization, the class needs to implement the Serializable interface. The whole process of serialization and deserialization is based on a very detailed specification. One of the fundamental goals of the serialization mechanism is to support bidirectional object transfer between different versions of a class operating in different virtual machines. In other words, Java’s object serialization must be backwards compatible while allowing code to evolve. To achieve this backwards compatibility goal, the specification defines a series of strict requirements for what constitutes a compatible change. According to the specification, removing the Serializable interface from a class is an incompatible change.
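To make the mechanics concrete, here is a minimal sketch of a serialization round trip using the standard java.io streams. The Customer class, its fields and the helper method names are hypothetical and exist only for illustration; the explicit serialVersionUID is the identifier the specification uses to decide whether a stored stream and the current class definition are compatible.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SerializationRoundTrip {

    // Hypothetical example class: implementing Serializable opts it in
    // to native serialization.
    static class Customer implements Serializable {
        // Explicit version identifier, compared against the value stored
        // in the stream when the object is read back.
        private static final long serialVersionUID = 1L;

        final String name;
        final int loyaltyPoints;

        Customer(String name, int loyaltyPoints) {
            this.name = name;
            this.loyaltyPoints = loyaltyPoints;
        }
    }

    // Converts an in-memory object graph into a byte array.
    static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    // Rebuilds the object graph from the bytes; the Customer constructor
    // is not invoked during this step.
    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] stored = serialize(new Customer("Ada", 42));
        Customer copy = (Customer) deserialize(stored);
        System.out.println(copy.name + " " + copy.loyaltyPoints);
    }
}
```

If the Serializable marker were removed from Customer, writing an instance would fail with a NotSerializableException and previously stored bytes could no longer be read back, which is why the specification treats that change as incompatible.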
Therefore, dropping serialization support from Java cannot be achieved by simply removing the Serializable interface, because this would clearly have a significant compatibility impact. Maintaining backwards compatibility is a fundamental requirement in many enterprise systems. Serialized object graphs may be stored in databases for an arbitrary period with the expectation that, whenever they are deserialized, the deserialization will work as expected, even if the system has been upgraded and the classes have evolved. Removing Serializable would instantly invalidate every single stored serialized graph. To avoid such failures, organizations would need to carefully plan and prepare a detailed migration strategy not only for their applications and infrastructure but also for every persisted serialized graph, which would need to be re-serialized using a new mechanism and re-persisted.
This migration would not be an easy task either, especially in the Java EE world. Numerous enterprise middleware products, servers and JEE protocols, such as RMI, JMX, and JMS, are heavily dependent on native Java serialization and, as such, are very difficult to change. It is highly probable that the Java EE Expert Group will raise objections to such a change and that the approach will be revised significantly. JEE vendors would also need significant time and effort - possibly years - to switch to any alternative technology while maintaining backward compatibility.
The information that has surfaced to date is scarce, and we do not know exactly how Oracle plans to introduce this incompatible change. It is clear that Oracle will need to roll out this plan in phases over several years. In such a scenario, it is very likely that the first step would be to deprecate the Serializable interface and the related java.io classes in a future Java release. So far, no JDK Enhancement Proposal (JEP) has been proposed publicly to deprecate the serialization mechanism in Java 11, which is expected to be released later this year. Therefore, it seems that the announcement was made early, and there are no public discussions or proposals for such a change in the immediate future.
Even after the deprecation period expires, removing the existing serialization mechanism will cause major disruptions and hinder the adoption of new Java releases. This is a highly undesirable scenario for Oracle, especially now that the Java release train is moving faster than ever.
To avoid this problem and to ease migration while maintaining backward compatibility, Oracle will most likely keep native Java serialization as an option in the new plugin system. To achieve this, the serialization-related classes will likely be moved out of the java.base module, which provides the fundamental APIs for the Java platform.
However, it is important to understand that applications using Java object serialization are not automatically vulnerable. The vulnerability occurs only if the application deserializes data from untrusted sources, in other words, only if the user controls the object that is to be deserialized. Therefore, if an application depends on deserializing user data, simply switching to another serialization technology will not automatically make the application safe. Most other serialization technologies, such as XML and JSON, also suffer from similar critical vulnerabilities. For example, in recent months, attackers have managed to exploit such vulnerabilities (for instance CVE-2017-9805) to infect their targets with crypto-mining malware. This demonstrates that the underlying serialization mechanism is not the primary problem. To avoid deserialization vulnerabilities, the application must avoid deserializing untrusted data rather than switch to another serialization technology. This requires a significant engineering effort, as the application may have to be redesigned.
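The following sketch illustrates why untrusted input is the core of the problem. It is a harmless, hypothetical stand-in for a real attack: the Gadget class and the sideEffectRan flag are invented for this example, and the "payload" only flips a boolean. The point it shows is that code inside the incoming object's readObject method runs during deserialization, before the application has any chance to check that it received the type it expected.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class UntrustedInputSketch {

    // Records whether the stand-in payload executed.
    static boolean sideEffectRan = false;

    // Hypothetical, harmless stand-in for a class an attacker might
    // choose to send because of what its readObject method does.
    static class Gadget implements Serializable {
        private static final long serialVersionUID = 1L;

        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            sideEffectRan = true; // runs inside ObjectInputStream.readObject()
        }
    }

    static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    // Application code that believes it is reading a String. The cast is
    // evaluated only after readObject() has fully rebuilt the object.
    static String readExpectedString(byte[] data)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (String) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] attackerControlled = serialize(new Gadget());
        try {
            readExpectedString(attackerControlled);
        } catch (ClassCastException e) {
            // The type check failed, but the payload has already executed.
            System.out.println("cast failed, side effect ran: " + sideEffectRan);
        }
    }
}
```

The same pattern applies to any format whose parser instantiates types named in the input, which is why changing the wire format alone does not remove the risk.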
It is reasonable then to ask why Oracle has fixed so many serialization vulnerabilities in the JVM if the vulnerability manifests only when an application exposes a deserialization endpoint to its users. The truth is that almost none of these vulnerabilities are exploitable in the JVM. They were fixed in order to harden the JVM against attacks in case the application exposes deserialization endpoints to unsafe user inputs. For this reason, in the latest Java release, most Serializable components have become immune to attacks in case of exposed deserialization endpoints by applications.
Despite the improvements that such a change will bring to Java, legacy servers and applications that cannot be refactored or redeployed on newer releases of the JVM will remain vulnerable. It has become difficult for most organizations to keep pace with Java updates. Oracle’s Co-CEO Mark Hurd recently acknowledged that Java users are typically months to years behind in their patching schedule. Upgrading versions or rewriting applications takes even longer, if it is possible at all. Recent releases of Java offer a serialization filtering mechanism that could help mitigate some attacks, but this mechanism requires a deep technical understanding of the problem and of the application’s internals in order to be properly utilized and configured. Alternatively, applications can be protected against such attacks using a virtualization-based RASP technology that requires no configuration, profiling, tuning or source code changes.
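As a rough illustration of the filtering mechanism, the sketch below assumes Java 9 or later, where the filter API is available as java.io.ObjectInputFilter. The allow-list pattern is purely an example: it accepts java.util.Date, caps graph depth and stream size, and rejects every other class. A real application would have to derive its own list from the classes it legitimately exchanges, which is exactly the configuration burden described above. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInputFilter;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.Date;

class FilteredDeserialization {

    // Example allow-list: permit java.util.Date, limit depth and size,
    // and reject any other class ("!*").
    static final ObjectInputFilter ALLOW_LIST =
            ObjectInputFilter.Config.createFilter(
                    "java.util.Date;maxdepth=5;maxbytes=1024;!*");

    static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    // Installs the filter on this stream before anything is read, so each
    // class in the stream is checked before it is instantiated.
    static Object readFiltered(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            in.setObjectInputFilter(ALLOW_LIST);
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // Allowed class: deserializes normally.
        System.out.println(readFiltered(serialize(new Date(0L))));

        // Class not on the allow-list: the stream is rejected.
        try {
            readFiltered(serialize(new ArrayList<String>()));
        } catch (InvalidClassException rejected) {
            System.out.println("rejected: " + rejected.getMessage());
        }
    }
}
```

A filter can also be set process-wide through the jdk.serialFilter system property, which helps when the code that creates the ObjectInputStream cannot be modified.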
Finally, even if serialization support is dropped in a future release of Java, organizations may still have cause for concern as deserialization vulnerabilities are not unique to the JVM. Python, Ruby, PHP, and .NET are also affected by deserialization vulnerabilities.
Java remains, by a wide margin, the most popular platform language in the world today. Oracle’s plan to improve the JVM’s serialization facility is definitely a positive one. However, this alone does not suffice to completely eliminate the scourge of deserialization vulnerabilities.
About the Author: Apostolos Giannakidis is a Security Architect at Waratek, a leading application security provider. Apostolos drives the research and the design of the security features of Waratek’s RASP container. Before joining Waratek in 2014, Apostolos worked at Oracle for 2 years, focusing on Destructive Testing of the whole Oracle technology stack and on Security Testing of the Solaris operating system. Apostolos has more than 10 years of experience in the software industry and holds an MSc in Computer Science from the University of Birmingham. Apostolos is acknowledged by Oracle for submitting two Java Deserialization vulnerabilities that were fixed in the Oracle January 2018 CPU and is featured on Google’s Vulnerability Reward Program Hall of Fame.