Eye on Video: Intelligent video architecture

July 1, 2008
What to consider when deciding upon centralized or distributed surveillance analytics

Studies have shown that as the scope of video surveillance installations continues to grow, the sheer volume of video being recorded increases the likelihood that operators and security guards will miss incidents or fail to notice suspicious behavior in time to prevent a crime from happening. In fact, a 2002 study published in Security Oz Magazine concluded that, "After 12 minutes of continuous video monitoring, an operator will often miss up to 45 percent of screen activity. After 22 minutes of viewing, up to 95 percent is overlooked."

Other studies report that finding incidents in stored video is so time-consuming that many businesses actually view one percent or less of the video they record. So despite increasing their surveillance coverage, companies are actually starting to experience greater security risks to their people and facilities.

To help facilities improve their surveillance effectiveness, vendors are developing a host of intelligent video (IV) applications that automatically analyze video data and glean useful information for security personnel. IV systems use complex mathematical algorithms to extract moving objects or other recognizable forms from the recorded video, while filtering out irrelevant images or movement. Intelligent decision-making rules govern the data search to determine if the events recorded in the video are normal, or if they should be flagged as alerts to security staff or police.

Over the next six installments of this column (on SecurityInfoWatch.com and in Security Technology & Design magazine), we'll cover an array of topics associated with intelligent video applications that enhance the value of surveillance systems. We'll begin this series by first focusing on architectural options.

Where to locate video intelligence

There are two broad choices for IV system architecture: you can centralize the intelligence or distribute it to the endpoints. In a centralized architecture, cameras and sensors collect and transmit video and other information to a centralized server for analysis. In a distributed architecture, network cameras, video encoders and network components such as switches have the intelligence to process the video and extract relevant information.

Centralized architecture options

In centralized architectures, the cameras transmit all the video back to the recording device, which runs the intelligent video algorithms. In infrastructures with analog cameras, this recording device is typically a multi-function DVR. In network video systems, it is usually a PC server with video management software.

DVR-based processing. A built-in encoder card converts the video from analog to digital format and then performs intelligent analysis -- anything from people counting to vehicle license plate recognition. The IV-enabled DVR compresses the video, records it, and distributes resulting alarms and video output to authorized operators. This architecture would be most effective for smaller systems with four to eight cameras.

There are several drawbacks to this DVR architecture:

1. Not scalable or flexible - Because DVRs are built with a fixed number of inputs, adding even one additional camera means adding another DVR, so incremental expansion becomes costly.
2. Non-supportive of essential network utilities - As proprietary, embedded devices, DVRs cannot be easily networked and do not support risk-mitigating tools such as firewalls and virus protection.
3. Limited computing power - DVRs were traditionally designed to store and display video from a limited number of cameras; when running newer IV applications that require a lot of processing power, they can support only a fraction of the cameras for which they were originally designed.

PC server-based processing. Commercial off-the-shelf PC servers overcome the scalability and flexibility limitations of DVRs by pushing digitization and compression out to the network cameras and encoders. Network camera video goes directly to the server over the network. Video encoders digitize analog camera video before transmitting it over the network to the server. This architecture would work best for medium-sized systems with eight to 16 cameras.

There are several drawbacks to this PC server architecture:

1. Consumes processing power - The server handles many of the processor-intensive tasks such as transcoding the video, managing the storage and processing the video for analysis.
2. Supports relatively few cameras - Since the processing tasks require considerable power, each server can only process video from a relatively small number of cameras.
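The second drawback follows from simple arithmetic. A rough back-of-envelope sketch, using purely illustrative per-stream costs (not vendor figures), shows why a centralized server tops out at a handful of cameras while an edge-processing design can scale far higher:

```python
# Rough capacity estimate for a centralized analytics server.
# All numbers below are illustrative assumptions, not measurements.

cpu_budget = 100.0      # percent of one server's CPU available
decode_cost = 4.0       # percent CPU to decompress one video stream
analytics_cost = 8.0    # percent CPU to run IV analysis on that stream

per_stream = decode_cost + analytics_cost          # 12% per camera
max_streams = int(cpu_budget // per_stream)        # 8 cameras fit

# If cameras run the analytics at the edge and send only metadata,
# the server skips both decode and analysis costs and mainly stores
# and forwards results, which is how a single server can scale to
# many more camera streams.
print(max_streams)
```

With these assumed costs, one server handles only eight streams, which is consistent with the medium-sized systems described above.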

Distributed architecture options

Instead of overloading a central point such as a DVR or PC server, distributed architectures spread the processing to different elements in the network. This reduces bandwidth consumption and improves system scalability.

Network-centric processing. In typical network video systems, switches and routers send video to appropriate components in the system. As video streams through these gateways, the data can be analyzed and only the essential content or even just the metadata about an image (such as the number of people passing through an area) can be extracted and streamed to the security system operator. This prescreening saves the network from the bandwidth overload that would occur if every frame of recorded video was streamed over the network for analysis at some central point. This architecture would work well in most systems, regardless of size.

There are two main drawbacks to network-centric architecture:

1. High cost - Switches and routers with the requisite additional processing power cost more.
2. Greater complexity - Additional components add design complexity to the network.
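The prescreening idea behind network-centric processing can be sketched as a filter that inspects each frame and forwards only a small metadata record, such as a people count, instead of the frame itself. This is a minimal illustration with hypothetical names; a frame here is just a list of labeled objects standing in for real pixel data and a real detector:

```python
# Minimal sketch of in-network prescreening: rather than forwarding
# every frame to the operator, extract a small piece of metadata
# (here, a stubbed people count) and forward only that.

def count_people(frame):
    # Stand-in for a real detection algorithm: a "frame" is a list
    # of labeled objects, and we count the ones tagged "person".
    return sum(1 for obj in frame if obj == "person")

def prescreen(frames):
    """Yield a per-frame metadata record instead of the raw frame."""
    for i, frame in enumerate(frames):
        yield {"frame": i, "people": count_people(frame)}

# Three "frames" of labeled objects instead of pixel data:
frames = [["person", "car"], ["person", "person"], []]
metadata = list(prescreen(frames))
# metadata == [{"frame": 0, "people": 1},
#              {"frame": 1, "people": 2},
#              {"frame": 2, "people": 0}]
```

Each record is a few bytes, so streaming the metadata rather than the video is what spares the network from the bandwidth overload described above.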

Intelligence at the edge. The most scalable, cost-effective and flexible architecture is based on processing as much of the video as possible inside the network cameras or video encoders. Analog cameras, however, lack this 'intelligence at the edge' ability to analyze video. This architecture is a good match for surveillance systems of any size, from a single camera to thousands.

There are numerous advantages to this distributed architecture:

1. Minimized bandwidth usage - Cameras and encoders can be programmed to only transmit video when they detect motion in a defined area of a scene. This dramatically reduces bandwidth consumption and the number of operators needed to review transmissions. They can extract license plate information or headcount from a frame and send just the essential data with a few snapshots instead of consuming bandwidth with several hours of unfiltered video.
2. Reduced server costs - Servers typically process four to 16 video streams in a centralized solution. When cameras do the processing, servers can handle more than 100 video streams. For people counting or license plate recognition applications, the resulting data (rather than the video stream) can be sent directly into a database, further reducing the load on servers.
3. Improved surveillance analysis - When network cameras process raw video data before it is tainted by lossy compression formats such as MPEG-4, the quality of analysis greatly increases. Server processing power is no longer consumed decompressing or transcoding the video packets prior to processing, which would otherwise dramatically increase the number of servers required to process transmissions.
4. Lower operating costs - With fewer servers needed, power consumption and maintenance costs drop. This also removes the burden from environments without server rooms to build special facilities to support their surveillance networks.
5. Lower equipment investment costs - Reducing network bandwidth usage by streaming only essential information (metadata and snapshots) gives users the option to deploy more moderately priced network components that can easily support reduced data rates.
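The motion-gating idea in the first advantage can be sketched in a few lines: the camera compares each frame with the previous one inside a defined region and transmits only when the change exceeds a threshold. Frames here are plain 2-D lists of pixel intensities; the function names and the threshold are hypothetical, and a real camera would use a more robust background-subtraction method:

```python
# Illustrative sketch of edge-side motion gating: transmit a frame
# only when pixel change inside a defined region exceeds a threshold.

def region_changed(prev, curr, region, threshold=10):
    """Return True if the summed pixel difference in region exceeds threshold."""
    top, left, bottom, right = region
    diff = sum(
        abs(curr[r][c] - prev[r][c])
        for r in range(top, bottom)
        for c in range(left, right)
    )
    return diff > threshold

def frames_to_transmit(frames, region, threshold=10):
    """Yield only the frames whose region shows motion versus the prior frame."""
    for prev, curr in zip(frames, frames[1:]):
        if region_changed(prev, curr, region, threshold):
            yield curr
```

Static scenes produce no output at all, which is the source of the bandwidth savings claimed above.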

There are two main drawbacks with this architecture:

1. High processing power required - Not all network cameras and video encoders have enough processing power to run the intelligent video algorithms at the edge. This is something that is being addressed in newer-generation products.
2. Multiple camera inputs needed - Some complex intelligent video algorithms, such as multi-camera tracking, require information from several cameras to work properly, which makes them feasible only in a centralized server configuration.

Things to consider when deploying intelligent video

Architecture plays a key role in successful deployment of intelligent video. To ensure the system meets both immediate and long-term needs, the installation should be scalable and based on open standards. The architecture should be one that minimizes the risk of system failure and downtime. It should scale effortlessly from a few to many cameras and intelligently distribute processing to different system components. For maximum flexibility and cost-effectiveness, both the IV and video management applications should be interoperable with system components from different vendors. And the IV system should be as accurate as possible to minimize false positives that put undue stress on security personnel.

Fredrik Nilsson is general manager of Axis Communications, a provider of IP-based network video solutions that include network cameras and servers for surveillance. This story is part of Mr. Nilsson’s “Eye on Video” series appearing in ST&D and on SecurityInfoWatch.com and IPSecurityWatch.com.