Ray Coulombe is Founder and Managing Director of SecuritySpecifiers.com, enabling interaction with specifiers in the physical security and ITS markets; and Principal Consultant for Gilwell Technology Services. He can be reached at ray@SecuritySpecifiers.com or through LinkedIn or followed on Twitter at @RayCoulombe.
Video analytics algorithms and performance have gotten better, marginal companies have fallen out, and prices have become more reasonable. Beyond that, I had not seen any drastic changes lately, until I chaired a panel at the recent ISC West show entitled “Control Center of the Future.” The panel included a compelling presentation on behavior-based analytics by Wes Cobb, Chief Science Officer of BRS Labs. What caught my attention was the concept of identifying aberrant or unusual behavior not against pre-set rules, but against a “learned” normalcy that the system builds automatically.
Most video analytic technology has evolved from the machine vision technology used in manufacturing automation. Along the way, technologies such as artificial intelligence have helped create varying degrees of product differentiation. The ultimate objective of any analytic is to identify anomalous situations or behavior, to provide an alert, or to yield some statistical analysis of a scene; however, almost all of them seem to rely on pre-defined criteria or rules to define what is “normal” and what is not.
Recently, I wrote in this column about video synopsis technology, which packs events into a compressed time frame so that the human brain can more easily fix on situations of interest. So, if all of these technologies are meant to make it easier for the human brain to react, why not simulate the human brain more closely and do even more pre-processing before a human has to intervene?
Following another line of thought, TSA is apparently going down the behavior path through a “chat down,” a form of profiling that involves a brief interview with security officials trained to spot suspicious behavior. According to one NPR report, TSA has a list of about 35 things its officers listen and look for, including facial expressions. As TSA officers gain more real-world experience with these techniques, it’s logical to expect that they will become better at flagging anomalous reactions or unusual behavior.
Automating a system to learn what is “normal” requires analyzing the overall dynamics of a scene and flagging instances that don’t normally occur: not because they have violated a rule, but because that behavior is not normally seen. No rule may ever have been written to detect that particular behavior. This is akin to the way the human brain stores memories and builds up long-term impressions of what it has seen before, so that it recognizes when something is not right.
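To make the distinction concrete, here is a toy Python sketch of the “learned normalcy” idea: instead of checking events against a written rule, it counts how often each kind of event has been seen and flags the ones that are rare or new. The class name, the zone-transition events and the threshold values are hypothetical; this is a simple frequency count chosen for illustration, not the method any vendor uses.

```python
from collections import Counter


class FrequencyNormalcy:
    """Toy 'learned normalcy' model: an event is unusual if it has rarely been seen."""

    def __init__(self, rarity_threshold=0.01, min_observations=100):
        self.counts = Counter()
        self.total = 0
        # Hypothetical tuning values: flag events seen in under 1% of observations,
        # and make no judgment until at least 100 events have been learned.
        self.rarity_threshold = rarity_threshold
        self.min_observations = min_observations

    def observe(self, event):
        """Add one event (e.g., a zone-to-zone transition) to the learned history."""
        self.counts[event] += 1
        self.total += 1

    def is_unusual(self, event):
        """True if the event's learned frequency is below the rarity threshold."""
        if self.total < self.min_observations:
            return False  # too little history for meaningful learning
        # Counter returns 0 for events never seen, so brand-new events are flagged.
        return self.counts[event] / self.total < self.rarity_threshold


# Example: learn typical traffic through a lobby, with no rules written at all.
model = FrequencyNormalcy()
for _ in range(500):
    model.observe(("lobby", "elevator"))
for _ in range(300):
    model.observe(("lobby", "exit"))
for _ in range(2):
    model.observe(("lobby", "stairwell"))

print(model.is_unusual(("lobby", "elevator")))      # False: seen constantly
print(model.is_unusual(("lobby", "stairwell")))     # True: seen, but rarely
print(model.is_unusual(("loading_dock", "lobby")))  # True: never seen before
```

No rule mentions the stairwell or the loading dock; they are flagged only because the model has rarely or never seen them. The `min_observations` guard mirrors the point that a system with little history has nothing meaningful to compare against.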
The underlying science is based on “neural networks,” described by Wikipedia as “composed of interconnecting artificial neurons (programming constructs that mimic the properties of biological neurons).”
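For readers new to the term, a single artificial neuron is commonly modeled as a weighted sum of its inputs plus a bias, passed through an activation function. The Python sketch below uses the widely used sigmoid activation; the weights are arbitrary illustrations (in a real network they are learned from data), and nothing in this column says which neuron model BRS Labs uses.

```python
import math


def artificial_neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of inputs plus a bias, through a sigmoid."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    activation = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Numerically stable sigmoid: the output always lies between 0 and 1.
    if activation >= 0:
        return 1.0 / (1.0 + math.exp(-activation))
    z = math.exp(activation)
    return z / (1.0 + z)


# Arbitrary example weights; in a trained network these would be learned.
print(artificial_neuron([0.0, 0.0], [0.8, -0.4], 0.0))  # 0.5: no net input
```

A neural network chains many of these units together, feeding the outputs of one layer into the next.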
The system from BRS Labs, for one, involves a two-part process at a high level. The first part is scene analysis. Using computer vision techniques, the system plays its own version of “20 questions” to examine different visual properties of the scene, and special features can be added. It detects, classifies and tracks objects in the scene and reports their positions, speeds, accelerations, sizes and shapes. This description of the scene is expressed as metadata and streamed continuously to the “brain” for learning. Scene analysis accounts for only about 20 percent of the processing involved.
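The scene-analysis step can be illustrated with a small Python sketch that turns one tracked object’s positions and bounding boxes into per-frame metadata records carrying speed, acceleration and size. The record fields, the (t, x, y, width, height) input format and the finite-difference estimates are assumptions made for illustration; detection and classification are taken as already done, and object shape is left out.

```python
import math
from dataclasses import dataclass


@dataclass
class TrackMetadata:
    """One metadata record per tracked object per frame (field names are hypothetical)."""
    object_id: int
    t: float             # timestamp in seconds
    x: float             # position, e.g., in pixels or meters
    y: float
    speed: float         # distance units per second
    acceleration: float  # change in speed per second
    area: float          # bounding-box area as a crude size measure


def track_to_metadata(object_id, samples):
    """Convert (t, x, y, width, height) samples for one track into metadata records.

    Speed and acceleration are estimated with simple finite differences; they are
    reported as 0 where there is not yet enough history to estimate them.
    """
    records = []
    for i, (t, x, y, w, h) in enumerate(samples):
        speed = accel = 0.0
        if i >= 1:
            pt, px, py = samples[i - 1][:3]
            dt = t - pt
            if dt <= 0:
                raise ValueError("timestamps must be strictly increasing")
            speed = math.hypot(x - px, y - py) / dt
            if i >= 2:
                accel = (speed - records[-1].speed) / dt
        records.append(TrackMetadata(object_id, t, x, y, speed, accel, w * h))
    return records


samples = [(0.0, 0.0, 0.0, 2.0, 3.0), (1.0, 3.0, 4.0, 2.0, 3.0), (2.0, 9.0, 12.0, 2.0, 3.0)]
for rec in track_to_metadata(7, samples):
    print(rec.t, rec.speed, rec.acceleration, rec.area)
```

Streaming records like these, rather than raw video, is what lets the learning stage work on compact descriptions of the scene.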
The other 80 percent is learning. Complex algorithms that attempt to mimic the long-term memory of the human brain build an impression of what’s normal and flag the exceptions. It is almost like the subconscious of a child, which knows that climbing onto the kitchen counter is not right; or that of the child’s mother, who spies the stepstool next to the counter or the chocolate smeared on the child’s lips.
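The learning algorithms themselves are not described in detail here, so the following is only a statistical stand-in: it is neither a neural network nor the BRS Labs algorithm. It keeps running statistics (Welford’s online mean and variance) of object speed for each cell of a grid laid over the scene, and flags a speed that sits far outside what that cell has learned. The grid-cell keys, the z-score threshold and the minimum-history parameter are hypothetical choices.

```python
import math
from collections import defaultdict


class RunningStats:
    """Welford's online algorithm: mean and variance without storing past values."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value):
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (value - self.mean)

    @property
    def std(self):
        """Sample standard deviation (0.0 until there are two values)."""
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


class NormalcyModel:
    """Learns typical speed per grid cell and flags outliers (hypothetical design)."""

    def __init__(self, z_threshold=4.0, min_samples=30):
        self.cells = defaultdict(RunningStats)
        self.z_threshold = z_threshold
        self.min_samples = min_samples

    def score(self, cell, speed):
        """z-score of this speed for this cell, or None if the cell lacks history."""
        stats = self.cells.get(cell)  # .get() does not create an empty entry
        if stats is None or stats.n < self.min_samples or stats.std == 0.0:
            return None
        return abs(speed - stats.mean) / stats.std

    def observe(self, cell, speed):
        """Score the observation against what was learned, then learn from it."""
        z = self.score(cell, speed)
        self.cells[cell].update(speed)
        return z is not None and z > self.z_threshold


model = NormalcyModel()
for i in range(100):
    model.observe((0, 0), 1.0 if i % 2 else 2.0)  # learn: walking pace in this cell
print(model.observe((0, 0), 1.6))   # False: typical speed for this cell
print(model.observe((0, 0), 10.0))  # True: far faster than anything learned
```

This sketch keeps learning from everything it sees, including the exceptions, so a behavior that keeps recurring gradually becomes “normal,” much as it would for a human observer.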
The technology works best on a stable scene, meaning one produced by a fixed camera or by a pan-tilt camera with established presets. The technology can be applied to megapixel cameras, but the sheer volume of data increases the required computational power unless the image is down-sampled to a lower resolution. Also, one-off or never-before-seen events may be a challenge when the scene has not been observed long enough for any meaningful learning to take place.
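To put rough numbers on the megapixel point: a 1920 x 1080 frame has 2,073,600 pixels, while the same frame down-sampled by a factor of 3 in each direction (640 x 360) has 230,400, nine times fewer. Assuming processing cost scales roughly with pixel count, that is a large saving. The Python sketch below shows simple block-average down-sampling of a grayscale frame stored as a list of rows; the function names are hypothetical, and a production system would more likely use an optimized image library.

```python
def downsample(frame, factor):
    """Block-average a grayscale frame (list of equal-length rows) by an integer factor.

    Trailing rows or columns that do not fill a complete block are dropped.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    rows = len(frame) // factor
    cols = len(frame[0]) // factor if frame else 0
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            block = [frame[r * factor + i][c * factor + j]
                     for i in range(factor) for j in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def pixel_reduction(width, height, factor):
    """How many times fewer pixels the analytics must process after down-sampling."""
    return (width * height) / ((width // factor) * (height // factor))


print(pixel_reduction(1920, 1080, 3))  # 9.0
```

The trade-off is that small or distant objects may shrink to a few pixels, which is one reason down-sampling has to be weighed against the detail the scene requires.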
Learning “normalcy” can potentially provide valuable information about trends, such as time-of-day activity, flow density, and the dynamics associated with particular events. Also, the data analyzed and learned does not need to come from video; it can come from almost any sensor whose output can be analyzed and turned into metadata.
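As one example of such trend data, a time-of-day activity profile can be built from nothing more than event timestamps, which could come from video detections or from any other sensor. This Python sketch averages detections per hour of day over the observation period; the function name and the sample timestamps are hypothetical illustrations.

```python
from collections import Counter
from datetime import datetime


def hourly_activity(timestamps, days_observed):
    """Average detections per hour of day (0-23) over the observation period.

    Timestamps may come from video detections or any other sensor's events.
    """
    if days_observed <= 0:
        raise ValueError("days_observed must be positive")
    counts = Counter(ts.hour for ts in timestamps)
    return {hour: counts[hour] / days_observed for hour in range(24)}


# Illustrative events over two days: a morning rush plus one late-night event.
events = [
    datetime(2024, 1, 1, 8, 15),
    datetime(2024, 1, 1, 8, 40),
    datetime(2024, 1, 2, 8, 5),
    datetime(2024, 1, 2, 22, 0),
]
profile = hourly_activity(events, days_observed=2)
print(profile[8], profile[22], profile[3])  # 1.5 0.5 0.0
```

A profile like this doubles as a learned baseline: activity well above the usual level for a given hour is itself a candidate exception.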
We are not likely to see the human assessment step eliminated anytime soon, but the more the human brain can shift from the mundane to the exceptional, the more productive it can be.