Eye on Video: Specialized intelligent video applications

In the previous two months, I devoted columns to two broad categories of intelligent video (IV) applications: pixel-based video intelligence and object-based video intelligence. (See the articles Applying Pixel-Based Intelligent Video and Applying Object-based Intelligent Video). Pixel-based IV applications generally apply to detecting motion and camera tampering. Object-based IV applications are generally used to classify and track objects and people. A third category of IV applications combines techniques from both pixel- and object-based analytics to extract information from a video for specialized applications. These specialized IV tasks cover the range from automatic recognition of license plate numbers and individual faces to smoke and fire detection.

License plate recognition (LPR)
Having the intelligence to automatically recognize a license plate number can boost a facility's perimeter and interior security. It can also provide a wealth of information for astute retailers to leverage for target marketing.

• Access control. LPR could be used to restrict entry to vehicles with particular plate numbers, an appropriate application for high-security locations like embassies.
• Criminal investigation. LPR could help law enforcement locate a vehicle suspected of being involved in a crime by automatically looking for vehicles with a particular plate number.
• Intelligent parking. Using LPR to automatically track vehicles entering a parking lot would be less expensive than a parking ticket kiosk. The technology could also automatically monitor how long a particular vehicle stays in a parking lot, which could help attendants determine whether a vehicle has been abandoned or a shopper has monopolized a reserved space beyond the legal time limit.
• Retail marketing. LPR could help retailers identify cars frequently parked in front of certain stores. This information could be used to analyze shopper demographics or design marketing programs to reach consumers in the right geographical areas.
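The intelligent parking scenario above can be sketched in a few lines. This is a hypothetical illustration, not a vendor API: it assumes the LPR engine delivers (plate number, timestamp) events at the lot entrance, and simply checks elapsed time against a limit.

```python
from datetime import datetime, timedelta

# Hypothetical log of LPR reads at the lot entrance: plate -> first time seen.
# In a real system these events would come from the LPR engine.
entry_log = {}

def record_plate(plate, seen_at):
    """Store the first time a plate is seen entering the lot."""
    entry_log.setdefault(plate, seen_at)

def overstayed(plate, now, limit=timedelta(hours=2)):
    """Return True if the vehicle has been in the lot longer than the limit."""
    entered = entry_log.get(plate)
    return entered is not None and (now - entered) > limit

record_plate("ABC123", datetime(2008, 6, 1, 9, 0))
print(overstayed("ABC123", datetime(2008, 6, 1, 12, 0)))  # True: 3 hours exceeds the 2-hour limit
```

The same event log could feed the retail-marketing use case, by counting how often each plate reappears over a period of weeks.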

How it works
License plate recognition is a multi-step process:
1. Find the car in an image - either through blob recognition or video motion detection.
2. Isolate the actual license plate - through object recognition and classification.
3. Extract the letters and numbers from the license plate - through image analysis.
4. Transform the collection of pixels into a stream of letters and numbers - using optical character recognition.
5. Store the resulting string in a database or compare it with existing entries - through data processing.
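The five steps above can be sketched as a simple pipeline. Each stage here is a stub standing in for the real computer-vision component named in the list (blob detection, classification, OCR, and so on), and the "frame" is toy data rather than real pixels; the point is only to show how the stages chain together.

```python
# Minimal sketch of the five-step LPR pipeline; every function is a
# placeholder for the real analytic it is named after.

def find_vehicle(frame):             # step 1: motion/blob detection
    return frame.get("vehicle")

def isolate_plate(vehicle):          # step 2: object recognition and classification
    return vehicle.get("plate_region")

def extract_glyphs(plate_region):    # step 3: image analysis
    return plate_region.get("glyphs")

def recognize(glyphs):               # step 4: optical character recognition
    return "".join(glyphs)

def process_frame(frame, database):  # step 5: store or compare the string
    vehicle = find_vehicle(frame)
    if vehicle is None:
        return None
    plate = recognize(extract_glyphs(isolate_plate(vehicle)))
    return plate, plate in database

# Toy frame standing in for real pixel data:
frame = {"vehicle": {"plate_region": {"glyphs": ["A", "B", "C", "1", "2", "3"]}}}
print(process_frame(frame, {"ABC123"}))  # ('ABC123', True)
```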

Challenges in deployment
Bad weather, glare from headlights, and dirty or bent license plates can all affect the accuracy of the analysis. Also, license plates have different physical attributes in different parts of the United States and around the world. So the IV application needs to be customized to local conditions and fine-tuned to specific implementations. Many integrators use specialty cameras in order to provide their clients with the best images possible.

Facial recognition
The profusion of crime scene investigation shows on television has made facial recognition one of the more high-profile applications of intelligent video. Police use the technology to receive alerts when certain people of interest are seen in public places or in sensitive areas. Companies use the technology to enhance access control, allowing only certain individuals to enter specific areas. Forensic investigators use the technology to search for individuals in stored video recordings. Casino owners use facial recognition to catch blacklisted players. And border agents improve checkpoint control by augmenting manual passport checking with automatic searches for individuals of interest.

How it works
The process for facial recognition is similar to automatic license number recognition, except that you cannot pre-define what the system should look for, such as a string of numbers and letters in a specific order. Instead, the facial recognition system draws on a database of "wanted" faces, such as a passport photo database or a police register. Getting that data is often the biggest challenge in deploying the system. Once that database is accessible, facial recognition undergoes a multi-step process to match the video image to a stored picture:
1. Find the person in the image.
2. Isolate the face from the rest of the body.
3. Locate and identify the various features - eyes, nose, mouth, chin, skin color, hair color, etc.
4. Construct a unique pattern of an individual face.
5. Match the extracted face with signature information from a database to identify the individual.
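Step 5 above, the database match, can be illustrated with a nearest-neighbor comparison. This is a hedged sketch: the three-number "signatures" are toy feature vectors, whereas real systems compare much richer patterns built in step 4, and the names and threshold are invented for the example.

```python
import math

# Sketch of step 5: matching an extracted face signature against a database
# of known signatures. Toy vectors stand in for real facial patterns.

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match_face(signature, database, threshold=1.0):
    """Return the closest identity within the threshold, or None if no
    stored face is similar enough."""
    best_name, best_dist = None, threshold
    for name, stored in database.items():
        d = distance(signature, stored)
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name

db = {"alice": (0.1, 0.9, 0.3), "bob": (0.8, 0.2, 0.5)}
print(match_face((0.15, 0.85, 0.35), db))  # alice
```

The threshold is the practical dial: set it too loose and the system raises false alarms; too tight and known faces slip through, which is one reason accuracy figures for these systems vary so widely.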

In some applications, simply finding a face is sufficient. For example, in an airport, the IV application can be used to measure the queue time between entering and exiting a check-in point. In this case, the actual identity of the individual is not important, just the ability to separate individuals from each other.
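The airport queue-time example can be sketched without any identity information at all. This hypothetical snippet assumes the people-separation step emits anonymous track IDs with "enter" and "exit" events and timestamps; pairing them yields the wait times.

```python
# Sketch of queue-time measurement: identity is irrelevant, only the
# ability to pair one anonymous person's entry and exit detections.

def queue_times(events):
    """events: list of (track_id, 'enter' | 'exit', seconds).
    Track IDs are anonymous numbers from the people-separation step."""
    entered, waits = {}, []
    for tid, kind, t in events:
        if kind == "enter":
            entered[tid] = t
        elif kind == "exit" and tid in entered:
            waits.append(t - entered.pop(tid))
    return waits

events = [(1, "enter", 0), (2, "enter", 30), (1, "exit", 240), (2, "exit", 330)]
print(queue_times(events))  # [240, 300] - both travelers waited 4-5 minutes
```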

Challenges in deployment
Facial recognition applications are particularly challenging to deploy. Even under perfect lighting conditions, people generally move around and block each other. Appearances change over time-or can be easily modified with glasses or hair dye, a wig or facial hair. Also, people who freely move about rarely look straight into a camera. Three-dimensional recognition compensates for this problem by extracting 3D information from video streams and matching it with a database.

Fire and smoke detection
Fire and smoke detection systems search for visual cues of fire and/or smoke in the video stream. They react as soon as flames are visible in a room-or when reflected firelight is detected from an obstructed view-without having to wait until a certain level of ambient smoke appears or a pluming smoke cloud reaches the ceiling. This gives facilities managers a much earlier warning of problems than a simple smoke detector. But liability issues stemming from archaic fire codes and the labor intensity of sending engineers on-site to customize the algorithm's parameters to the specific installation have limited widespread adoption of the technology.

How it works
The IV system processes video images and reacts when it detects the combination of color, light and movement that typically indicates the presence of fire and/or smoke. This could be the flickering frequency of certain pixels, the spatial dimensions of certain blob contours, or the existence of turbulent phenomena. Once detected, the IV system sends an alarm and live video images to the guard station or alarm center. Besides supplying vital situational information, the recorded video also provides forensic evidence for future fire investigation.
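The flicker cue mentioned above can be illustrated very simply. This sketch assumes we track one pixel's brightness over a short window of frames; fire makes brightness fluctuate strongly, so high variance is one crude indicator. The brightness values and threshold are invented for the example, and a real detector would combine this with the color and blob-contour cues.

```python
# Sketch of the flicker cue: fire pixels fluctuate in brightness over
# time, so high variance across recent frames is one simple indicator.

def flicker_score(samples):
    """Variance of a pixel's brightness over a short window of frames."""
    mean = sum(samples) / len(samples)
    return sum((s - mean) ** 2 for s in samples) / len(samples)

def looks_like_fire(samples, threshold=500.0):
    """Flag pixels whose brightness variance exceeds the threshold."""
    return flicker_score(samples) > threshold

steady_wall = [120, 121, 119, 120, 120, 121]   # nearly constant brightness
flame_pixel = [90, 200, 110, 230, 95, 210]     # strong frame-to-frame flicker
print(looks_like_fire(steady_wall), looks_like_fire(flame_pixel))  # False True
```

As the deployment section below notes, choosing that threshold for real scenes is exactly where the difficulty lies.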

Challenges in deployment
The technology still faces a number of challenges. Foremost is matching images to an endless variety of smoke patterns. For instance, the visual pattern of smoke in a windless, open space differs from that in a noticeably windy environment. Upward smoke puffs look different from smoke spreading horizontally or downward. Because of the difficulty in adapting to light conditions, density and background scenes, setting thresholds for smoke detection can be very subjective.

Addressing privacy issues
Some view IV applications such as facial recognition and people counting as invading people's privacy. One way to counter this concern is to regularly purge the actual pictures or videos of the faces. In some ways, IV applications can even enhance privacy. A people tracking application, for example, may be able to find and mask out all the people in a video surveillance recording of a public area and then allow only law enforcement personnel to "unlock" these images when they are required in an investigation.

Keep in mind, however, that some countries place restrictions on audio and video surveillance. Before designing and installing any IV application, be sure to check with local authorities first to determine what is permissible.

Pointers for successful deployment
As with other intelligent video applications, whether you decide to deploy this video intelligence at the server or distribute it to your surveillance system endpoints depends on the equipment you use and the demands of the environment in which it is being operated. (For more information on intelligent video architecture, see the SecurityInfoWatch.com article Intelligent Video Architecture: Deciding whether to centralize or distribute your surveillance analytics.)

Beyond that decision, there are several technical factors to consider that will help you improve the accuracy of your facial recognition application, such as where you place the camera, the camera's resolution and lens quality, and the lighting conditions in the surveillance environment.

As was mentioned in the two previous months' articles on pixel-based intelligent video and object-based intelligent video, no IV application is infallible. When deploying this technology, you need to set realistic expectations. Reaching 95 percent accuracy is very challenging; achieving 99 percent or beyond can be extremely difficult and costly in a real-world environment. But with careful adherence to some best practices, installations can achieve between 90 and 95 percent accuracy.

About the author: Fredrik Nilsson is general manager of Axis Communications, a provider of IP-based network video solutions that include network cameras and video encoders for remote monitoring and security surveillance. He can be reached via email at fredrik.nilsson@axis.com.