Video analytics, once a pariah technology across the security industry due to the inability of some vendors to deliver on the promised hype, have experienced a renaissance over the past several years, driven by an evolution in artificial intelligence (AI)-powered algorithms.
While some question whether these advances constitute a “true” AI solution, since computers have not yet achieved the capability to examine a scene and make determinations about potential security threats without first being trained on what to look for, it’s undeniable that machine learning has progressed. Analytics can now make an impact by helping humans wade quickly and efficiently through the mountains of video data generated daily. Not only are today’s intelligent video analysis offerings helping organizations reduce false alarms and proactively identify potential threats, but they are also being leveraged to improve business operations and the bottom line.
Despite these technological gains, however, many misconceptions remain about the technology throughout the market.
SIW: How have video analytics evolved in the security industry and where do you see the technology heading?
Ploix: That's a big question, and I'm going to try to break it down, if possible. Historically, you would buy cameras and then you would need to put some analytics on those cameras, and more and more rules have been created in those cameras over time. So, you have rules for motion detection, line crossing, and all those different things. And when those analytics were created about 30 years ago, the alarms were going directly to the police, which was just two guys in a room getting millions and millions of alarms. At some point, they were like, “No, we can't do this. This is just too much.” So, third-party monitoring companies then became the filter between the cameras and the police or security, and they would review what was going on and only the real threats would be reported. That was the paradigm that was created about 20 years ago.
But with the advent of AI, you needed bigger machines. The required processing power would not fit in one camera, so we went from just having cameras that were standalone to having big servers connected to those cameras or on the side, where you would fit more advanced rules. You would still have the same motion detections, the same line crossings, but you would have a lot more, maybe human detection and those sorts of things, in order to serve the same purpose: to send that to security monitoring centers, which would filter those alarms. What we realized about five or six years ago was that you could do some of this in the cloud, which offered several advantages. One, instead of being limited by a couple of servers, you could have your existing infrastructure but then also have those feeds uploaded to the cloud in order to do the last layer of analytics. And because the cloud is elastic, you can replicate those machines at the click of a button. You could also be a lot more flexible in terms of what you want to analyze than if you had all of your setups onsite.
Now, what's going to happen in the next 20 years? Instead of creating some rules, perhaps we will try to train some of the models in order to understand what an anomaly is in the first place. I don't think we're ready yet. I don't think the state of machine learning is such that it can understand that and create that rule, so to speak. But in 20 or 30 years, this is really where I see the industry going. Instead of creating a set of algorithms in the computer, you could just pre-program it, install it somewhere and hope that it will filter out the noise and detect the critical anomalies. Again, this belongs more to the realm of science fiction at the moment than reality, but I think this is where it could go, and this is what we are leaning towards.
SIW: Speaking of training, what is the role of machine learning in creating reliable analytics today, and what are the time and cost required to develop a platform or to add new objects for recognition?
Ploix: AI and machine learning have been around for a long, long time. Alan Turing was already writing about machines that could learn around the middle of the last century, and he proposed what became known as the Turing Test in 1950. Theoretically, we could invent a machine that would learn from data instead of something that would have some of the rules pre-programmed. When you look at the theory and when you look at the practice, even though machine learning has been around for 50 years, it's only been very recently that it got a lot of the spotlight. For machine learning to be really effective, it needs two critical components: a lot of data and a lot of processing power. Some of the algorithms haven’t really changed; they have been the same the entire time. The big revolution came in 2012 when AlexNet was introduced, which would recognize that there is a human or there is a cow in an image, the kind of standard object recognition that you can see everywhere now.
When that was first invented, the algorithm was very similar to the things that were done in production five or 10 years earlier. The only thing that changed was that you had more and more GPUs, specialized hardware that could process a lot more data. And you also had access to a lot more data. So, every year you have more data available, and you have more processing power available. At the moment, we have machines that are composed of hundreds of GPUs that are able to create models. Five years ago, if you had something with 10 GPUs, it was already something huge, but now we are always going for bigger and bigger, and this is great because this is increasing the accuracy.
But in order to achieve what I just mentioned, creating the rules in the first place, I think something needs to fundamentally change in the AI world. We need to be able to accumulate all that context just like a human would do. How do you get there? I don't know. I think you need to make one or more breakthroughs to get there. It's not a function of time. It's really a function of breakthroughs.
SIW: What would you say are the biggest misconceptions in the market today about video analytics?
Ploix: It's all about expectations because we, as humans, are really, really good at video analytics. We understand whether there is a person there or not, we understand that this person has a gun and probably has some bad intentions, and we can understand whether somebody is trying to steal something just by looking at his or her behavior. For a computer, it's a lot more complicated because a computer does not understand what a person is. For the computer, it is just a collection of pixels. Based on the shape, form, color, texture, and resolution, this set of pixels looks very similar to another set of pixels, in a previous image the computer has seen, that was a human. Therefore, for that computer, there is a human in the scene.
If we see a person in the air, for us that would be weird because we know a person is not able to fly, but a computer would not necessarily understand this. All that context that we have accumulated as people makes us really good at understanding what the anomalies are, and because we are really good, we think it is extremely easy to replicate. And because we think it's extremely easy to replicate, we have extremely high expectations of what the accuracy of video analytics should be. Because we have those expectations, we are not very forgiving when a computer confuses a tree with a person. When you want to put a model out, you don’t promise the world. You say what you are going to do, you do what you said you would do, and you make sure that people know the limitations and accuracy of the system.
SIW: Many people see the business-enabling benefits of analytics as being more important than the security aspects, especially when it comes to obtaining funding for the technology in their organization. Do you agree with this sentiment or do you think it is off the mark?
Ploix: Sometimes it could help with funding because innovation is always good and should always be nurtured. I think the best way to look at it is to go to the customer first and realize that what the customer wants is not really those new technologies; rather, they want to be secure within their own premises. The reason they put those cameras up is not to track people, it's not to do anything to spy on anyone, but it's to make sure that they have secure premises, both from theft and intrusions but also, as is the case with construction sites, for safety. You don't want somebody to walk around certain areas of the construction site because it could be dangerous, they could fall, they could hurt themselves and, ultimately, the construction site manager will be responsible. Once you understand this, you should be able to deliver technology in order to serve that purpose.
Joel Griffin is the Editor of SecurityInfoWatch.com and a veteran security journalist.