Researchers hack AI video analytics with color printout

May 10, 2019
Simple experiment demonstrates potential vulnerabilities in machine learning solutions

The evolution of Deep Learning algorithms in recent years has led to a proverbial boom in the development of machine learning-based video analytics. This new generation of Artificial Intelligence (AI) analytics has been hailed by many as the breakthrough innovation that will drive the industry forward from a technology perspective.

Unlike the previous generation of rules-based video analytics that largely overpromised and under-delivered, this latest crop of analytics engines – with their ability to “learn” and distinguish between people, vehicles and objects with a high degree of accuracy – has once again captured the imagination of the industry.

However, there are clearly still many hurdles the technology must overcome to gain widespread acceptance by security integrators and end-users; in fact, researchers at the Belgian university KU Leuven recently demonstrated that they could fool the human detection capabilities of AI video analytics with something as simple as a printed color patch.

Looking to build on previous research that showed how a patch placed on a stop sign could throw off the object detectors used in applications such as autonomous vehicles, the researchers wanted to see if they could extend this premise to person detection and security applications. Even though people introduce much more variation into the equation than a stop sign does (different clothing, skin color, poses and so on), Wiebe Van Ranst, who conducted the research along with Simen Thys and Toon Goedemé, says they were still able to effectively fool the analytics engine in a number of different scenarios (Download the full research paper here).

“Object detectors these days are used in all kinds of applications, including applications where someone might perform a targeted attack (e.g. security systems). What we wanted to know is how easy it is to circumvent these systems,” Van Ranst says. “Our intentions here are, of course, not malicious. What we demonstrated in this work is that it is indeed possible to circumvent person detectors using adversarial attacks. One should not blindly trust a detection pipeline in security-critical applications.”

Implications for Security

Although person and object detection technologies have made great strides in recent years, Van Ranst says their research shows they are not infallible and can indeed be compromised. “What we touch on in this work is that high detection accuracy is not all that is important for a security system; we showed that it has vulnerabilities,” Van Ranst says. “While the current state of our research still has limitations, what it highlights is that there is a possibility of targeted attacks on person detectors in security systems. This should be kept in mind when rolling out such a system and, if necessary, defended against.”

Brent Boekestein, CEO of AI video analytics developer Vintra, says that research like this is good for the industry. “Research like this, into AI and computer vision models, is good for the industry overall and something we love to see because it represents work being done to advance the trustworthiness of these models,” he says. “Research sparks important conversations and provides an excellent platform for education.”

Additionally, Boekestein says the research demonstrates the importance of using video analytics solutions built on proprietary models rather than those that are open source and publicly available, like YOLOv2.

“These proprietary models serve as a barrier, as they limit any ‘bad actor’s’ ability to easily obtain and understand how to defeat them,” Boekestein adds. “Moving forward, we’re excited by the ways firms like ours and the research community are collectively working on new ways to protect the integrity of these models so that their trustworthiness and the faith we place in these systems may continue to improve.”

For those companies involved in developing video analytics, Van Ranst recommends keeping an algorithm’s weights or training datasets private to make them harder to attack. “If that is not good enough, using methods to actively resist these patches might also help,” he says. “You could, for instance, retrain your person detector while including images of people holding these kinds of patches in the training dataset. While adversarial attacks are currently an active area of research, and more and more research interest is going into covering these holes, it would always be hard to say for sure that no attacks are possible.”
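For illustration only, the defensive retraining Van Ranst describes might look something like the Python sketch below, which pastes a known adversarial patch onto images of people before they are fed back into a detector’s normal training loop. The file names, patch placement and sizes are placeholder assumptions, not the researchers’ actual pipeline.

```python
# Sketch of the defensive idea Van Ranst describes: augment the training set
# with images of people "wearing" known adversarial patches so the detector
# learns to keep detecting them. Paths, sizes and placement are illustrative
# placeholders, not the researchers' actual pipeline.
import random
from PIL import Image

def paste_patch(image: Image.Image, person_box: tuple, patch: Image.Image,
                rel_size: float = 0.3) -> Image.Image:
    """Paste a resized adversarial patch roughly on the torso of a person box."""
    x1, y1, x2, y2 = person_box
    box_w, box_h = x2 - x1, y2 - y1
    side = int(min(box_w, box_h) * rel_size)      # patch edge relative to person size
    patch = patch.resize((side, side))
    # Centre the patch horizontally and place it around chest height,
    # with a little jitter so the detector does not overfit to one position.
    px = x1 + (box_w - side) // 2 + random.randint(-side // 4, side // 4)
    py = y1 + int(0.35 * box_h) + random.randint(-side // 4, side // 4)
    augmented = image.copy()
    augmented.paste(patch, (px, py))
    return augmented

# Usage with placeholder file names: the augmented image would then be fed to
# the detector's normal training loop with the original "person" label intact.
img = Image.open("train_image.jpg")
patch = Image.open("adversarial_patch.png")
aug = paste_patch(img, person_box=(120, 60, 320, 460), patch=patch)
aug.save("train_image_patched.jpg")
```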

A Closer Look at the Experiment

According to Van Ranst, one part of the experiment involved leveraging a machine learning approach to generate patches that were then digitally rendered onto an image to try to fool the analytic algorithm – in this case, the YOLOv2 architecture trained on the MS COCO dataset, an open-source object detection architecture and a publicly available training dataset, respectively.
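In broad strokes, that kind of patch generation can be sketched as an optimization loop in which the patch pixels are the only trainable parameters. The code below is an illustrative outline only: the detector and the attack loss are left as stand-ins rather than the actual YOLOv2 setup used in the research.

```python
# Illustrative sketch: the patch is the only trainable tensor, it is pasted
# ("digitally rendered") onto images of people, and the detector's output
# drives a gradient step on the patch pixels. `detector` and `attack_loss`
# are assumed stand-ins, not the researchers' actual YOLOv2 pipeline.
import torch

patch = torch.rand(3, 60, 60, requires_grad=True)   # the printable patch, optimised directly
optimizer = torch.optim.Adam([patch], lr=0.03)

def render_patch(images: torch.Tensor, patch: torch.Tensor,
                 top: int = 100, left: int = 100) -> torch.Tensor:
    """Paste the patch onto a batch of images at a fixed location (differentiably)."""
    patched = images.clone()
    _, ph, pw = patch.shape
    patched[:, :, top:top + ph, left:left + pw] = patch.clamp(0, 1)
    return patched

def step(detector, attack_loss, images: torch.Tensor) -> float:
    """One optimisation step; gradients flow back into the patch pixels only."""
    optimizer.zero_grad()
    outputs = detector(render_patch(images, patch))
    loss = attack_loss(outputs)   # one of the objectives described below
    loss.backward()
    optimizer.step()
    return loss.item()
```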

“We tried a few different configurations,” Van Ranst explains. “One was to make the detector think that it saw something different from a person. We tried to minimize the class ‘person,’ which resulted in a patch being generated that was recognized as a different class in the model (e.g. teddy bear instead of person). Another approach, which we found worked best, was to make the detector think that no object was present at all. For this we minimized the object score which tells us how likely it is that there is an object in the scene.”
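Expressed as code, the two objectives Van Ranst describes might look roughly like the sketch below, assuming a simplified YOLO-style output of per-box object scores and class probabilities; the researchers’ full objective includes details not shown here.

```python
# Two attack objectives on a simplified YOLO-style output, following the
# strategies Van Ranst describes. Shapes and the class index are assumptions:
# objectness has one score per predicted box, class_probs has one probability
# per box and class.
import torch

PERSON_CLASS = 0  # assumed index of "person" in an MS COCO-style class ordering

def class_score_loss(objectness: torch.Tensor, class_probs: torch.Tensor) -> torch.Tensor:
    """Strategy 1: minimise the 'person' score so boxes flip to another class."""
    person_score = objectness * class_probs[..., PERSON_CLASS]
    return person_score.max()        # push down the strongest "person" detection

def objectness_loss(objectness: torch.Tensor) -> torch.Tensor:
    """Strategy 2 (the one that worked best): minimise the object score so no box fires."""
    return objectness.max()
```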

In most cases using these computer-generated patches, the researchers were able to successfully hide the person from the detector (see image below).

Image courtesy Simen Thys, Wiebe Van Ranst and Toon Goedemé         

Going beyond this, however, the researchers wanted to test whether a printed version of a patch would work in a real-world application, which they were also able to demonstrate successfully in an accompanying video.

Admittedly, there are a few limitations to the research carried out by Van Ranst and his colleagues. First, the patch only works when it is facing the camera, and Van Ranst says it has only limited “robustness to rotation and scale,” so too much of either can render it ineffective. In addition, because the experiment was carried out on a specific analytics configuration, Van Ranst explains that a new patch would have to be generated for each configuration encountered in the real world.

“To hypothetically do a successful attack on a surveillance system using our approach, one would have to know which detector and model is used,” he says. “Moreover, our present patch pattern optimization procedure needs open source access to the network architecture and weights; it does not work for black box software.”

However, Van Ranst says they are currently looking into how they could make patches more general and thus more suited to fool multiple detectors simultaneously as well as how they could compromise proprietary analytic solutions. “A possible next step would also be to generalize the patch to something that works when it is transformed much more: rotation, scale and perspective, and also generating a pattern that can cover more of the body so that it can be used for cameras looking from different angles (think a T-shirt covered with a complete pattern),” he adds.  
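One common way to build in that kind of robustness, sketched below purely for illustration, is to apply a random rotation and scale to the patch at every optimization step, so that only patterns that still fool the detector under those transformations survive training. The transformation ranges here are arbitrary illustrative choices, not values from the research.

```python
# Illustrative sketch of transformation augmentation for a patch: randomly
# rotate and rescale the patch tensor each optimisation step so the learned
# pattern does not depend on one fixed orientation or size. Ranges are
# arbitrary placeholder values.
import math
import random
import torch
import torch.nn.functional as F

def random_rotate_scale(patch: torch.Tensor) -> torch.Tensor:
    """Apply a random rotation and scale to a (C, H, W) patch via an affine grid."""
    angle = math.radians(random.uniform(-20.0, 20.0))   # in-plane rotation
    scale = random.uniform(0.8, 1.2)                     # mild zoom in/out
    cos, sin = math.cos(angle) / scale, math.sin(angle) / scale
    theta = torch.tensor([[cos, -sin, 0.0],
                          [sin,  cos, 0.0]]).unsqueeze(0)
    grid = F.affine_grid(theta, [1, *patch.shape], align_corners=False)
    return F.grid_sample(patch.unsqueeze(0), grid, align_corners=False).squeeze(0)
```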

About the Author: 

Joel Griffin is the Editor of SecurityInfoWatch.com and a veteran security journalist. You can reach him at [email protected].