Eye on Video: Pixel-based intelligent video

Understanding how intelligent video works -- at the pixel-by-pixel level


There are a number of intelligent video (IV) applications available today to assist security personnel in analyzing surveillance video. These video analytics extract and process information from a video frame in different ways and apply varying decision-making rules for subsequent action. Before deciding which IV algorithm is best suited for a particular installation, it is important to understand the technical capabilities of each. Basically, video analytics fall into three broad categories:

- Pixel-based intelligent video applications analyze the individual pixels in the surveillance images and are typically used for video motion detection and camera tampering applications.

- Object-based intelligent video applications recognize and categorize objects in an image and are commonly used for classifying and tracking objects in the camera's field of view.

- Specialized intelligent video applications combine pixel- and object-based intelligence to process video for specific applications such as number/license plate recognition, facial recognition, and fire and smoke detection.

This month's article focuses on pixel-based intelligent video applications. Upcoming articles will discuss the remaining two categories.

Fundamentals of intelligent video
At a basic level, intelligent video analysis software examines every pixel in every frame of video, characterizes those pixels and then makes decisions based on those characteristics. In pixel-based motion detection, for instance, the system triggers an alert when a specified number of pixels change in color or brightness.
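
As a rough illustration of the pixel-counting rule described above, the sketch below compares two grayscale frames and raises an alert when enough pixels change. The frames are represented as plain lists of brightness values from 0 to 255. The function names and both thresholds are assumptions made for this example, not settings from any particular product.

```python
def count_changed_pixels(prev_frame, curr_frame, pixel_threshold=25):
    """Count pixels whose brightness differs by more than pixel_threshold."""
    changed = 0
    for row_prev, row_curr in zip(prev_frame, curr_frame):
        for before, after in zip(row_prev, row_curr):
            if abs(after - before) > pixel_threshold:
                changed += 1
    return changed


def motion_alert(prev_frame, curr_frame, pixel_threshold=25, min_changed=3):
    """Return True when at least min_changed pixels changed noticeably."""
    changed = count_changed_pixels(prev_frame, curr_frame, pixel_threshold)
    return changed >= min_changed
```

Raising `min_changed` makes the detector less sensitive to small disturbances, while lowering `pixel_threshold` makes it react to subtler brightness changes.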

Blob recognition takes pixel change detection to the next level of intelligence. A blob is essentially a collection of contiguous pixels that share particular characteristics, with boundaries that delineate it from other parts of a video frame. A blob can be identified and characterized as a particular object, such as a person or a car, by analyzing its shape, size, speed or other parameters. Object recognition applications employ the most sophisticated algorithms to classify and track the movement of specific objects (or individuals) within a video frame.
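
To make the idea of a blob concrete, here is a minimal sketch that groups contiguous changed pixels into blobs and reports each blob's size and bounding box. It assumes the input is a grid of true/false values marking which pixels changed, and it treats pixels as contiguous only when they touch horizontally or vertically. The function name and the returned fields are hypothetical. A real classifier would build its shape and size rules on top of measurements like these.

```python
from collections import deque


def find_blobs(mask):
    """Group contiguous truthy cells of mask into blobs (4-connectivity)."""
    if not mask:
        return []
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited changed pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            ys = [p[0] for p in pixels]
            xs = [p[1] for p in pixels]
            blobs.append({
                "area": len(pixels),
                # Bounding box as (top, left, bottom, right), inclusive.
                "box": (min(ys), min(xs), max(ys), max(xs)),
            })
    return blobs
```

Blobs are returned in the order their first pixel is met when scanning the grid row by row, so results are repeatable for the same input.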

There are a number of factors to weigh in deciding whether to centralize this video intelligence at a server or distribute the intelligence processing to the surveillance system endpoints, such as the network cameras and video encoders. Each has its own pros and cons. (For more information on intelligent video architecture, see the previous article in this series: "Intelligent Video Architecture: Deciding whether to centralize or distribute your surveillance analytics.")

Motion detection
Video motion detection is the most prevalent pixel-based IV application in video surveillance because it reduces the amount of video stored by flagging video that contains changes and ignoring unchanged frames. This selectivity gives security personnel the option to store key video for longer periods on a given storage capacity. The technique is also used to flag events, such as someone entering a restricted area, and send alerts to operators for immediate action.
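
The storage saving can be pictured with a small sketch that walks through a sequence of grayscale frames and keeps only those that differ noticeably from the frame before them. This is an illustrative assumption about how such filtering might be done, with a hypothetical function name and made-up thresholds. It is not a description of how any specific recorder selects video.

```python
def frames_to_store(frames, pixel_threshold=25, min_changed=2):
    """Return indexes of frames that changed enough from the previous frame."""
    keep = []
    for i in range(1, len(frames)):
        prev_frame, curr_frame = frames[i - 1], frames[i]
        # Count pixels whose brightness moved by more than pixel_threshold.
        changed = sum(
            1
            for row_prev, row_curr in zip(prev_frame, curr_frame)
            for before, after in zip(row_prev, row_curr)
            if abs(after - before) > pixel_threshold
        )
        if changed >= min_changed:
            keep.append(i)
    return keep
```

Dividing `len(frames_to_store(frames))` by `len(frames)` gives the fraction of frames retained, which is a rough proxy for how much storage this kind of filtering would use on a given scene.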

How it works: In video motion detection applications, software algorithms continually compare images from a video stream to detect changes in an image. Early motion detection applications simply detected pixel changes from one frame of video to the next. While this approach certainly reduced video storage requirements, it was not very useful for real-time applications because it generated too many false alarms. Minor light changes, slight camera motion or even a tree swaying would raise an alarm.

Today's more advanced detection systems have the intelligence to exclude pixel-based changes from known sources such as naturally changing light conditions based on the time of day, or other known and unthreatening repetitive changes in the camera's field of view. With the exclusion of these regularly occurring phenomena, the number of false alarms has dropped dramatically. These advanced systems also have the intelligence to group pixels together to constitute a larger object, such as a person or car, which further decreases the number of errors in motion detection.
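
One simple way to picture this kind of exclusion is sketched below. It is an assumption for illustration only, since the article does not say which techniques commercial systems use. A slowly updated background model absorbs gradual lighting changes, and an ignore mask skips regions with known harmless motion, such as a swaying tree. The names `update_background`, `changed_pixels` and `ignore_mask`, and the values of `alpha` and `pixel_threshold`, are all hypothetical.

```python
def update_background(background, frame, alpha=0.1):
    """Blend the new frame into the background (exponential running average)."""
    return [
        [(1 - alpha) * b + alpha * f for b, f in zip(row_b, row_f)]
        for row_b, row_f in zip(background, frame)
    ]


def changed_pixels(background, frame, ignore_mask, pixel_threshold=25):
    """Count pixels that differ from the background, skipping masked pixels."""
    count = 0
    for row_b, row_f, row_m in zip(background, frame, ignore_mask):
        for b, f, ignored in zip(row_b, row_f, row_m):
            if not ignored and abs(f - b) > pixel_threshold:
                count += 1
    return count
```

In use, each incoming frame would first be checked with `changed_pixels` and then folded into the model with `update_background`. A smaller `alpha` makes the background adapt more slowly, which tolerates only very gradual lighting drift but is less likely to absorb a slow-moving intruder into the background.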
