Key considerations when selecting a video compression algorithm: Part 1

A comparison between H.264, MJPEG and other common compression schemes for surveillance video


With the emergence of H.264 and other types of compression, and the growing abundance of sales and marketing claims, it is increasingly difficult to determine what type of compression is right for your application. To make some sense of all the competing claims, you first need to understand the basics of how compression works.

There are two basic types of compression: frame-by-frame compression and temporal compression. Frame-by-frame compression takes a full picture for each frame, compresses it, and sends the pictures one after another in a stream. The most popular frame-by-frame compression is Motion JPEG (MJPEG). It is widely used because it produces the highest quality video, is very simple to decode, and is non-proprietary. MJPEG's Achilles' heel, however, is that to achieve high image quality on every video frame, it produces relatively large files, which means it requires more bandwidth to transmit and more storage to record.
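To make the bandwidth and storage cost of frame-by-frame compression concrete, here is a rough back-of-the-envelope sketch. The function name and the 30 KB average frame size are assumptions chosen purely for illustration; real MJPEG frame sizes depend on resolution, scene content and quality settings.

```python
def mjpeg_requirements(frame_kilobytes, fps, seconds):
    """Estimate MJPEG bandwidth and storage when every frame is a full picture.

    Returns (bitrate in kilobits per second, storage in megabytes).
    """
    bitrate_kbps = frame_kilobytes * 8 * fps               # every frame is sent in full
    storage_mb = frame_kilobytes * fps * seconds / 1000    # 1 MB taken as 1000 KB
    return bitrate_kbps, storage_mb


# Hypothetical camera: 30 KB per frame (assumed), 10 fps, one hour of recording.
bitrate, storage = mjpeg_requirements(frame_kilobytes=30, fps=10, seconds=3600)
print(f"{bitrate} kbit/s, {storage} MB per hour")
```

Because every frame is a full picture, both figures scale linearly with frame rate and frame size, and they do not depend on how much motion is in the scene.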

Temporal compression algorithms were developed to stream video using less bandwidth. In many typical surveillance applications, they do so by trading off image quality as well as the consistency of bandwidth and storage requirements. In simple terms, temporal compression works like this: whereas MJPEG will take 50 "full" pictures to make 5 seconds of 10 frame-per-second (fps) video (5 seconds × 10 fps = 50), temporal compression algorithms will take 10 or fewer "full" pictures, usually referred to as "key frames", to reproduce the same 5 seconds of 10 fps video. This is done by using mathematical calculations to predict what may change between the key frames, and then sending only what is changing instead of sending the "full" picture. If there is little or no motion between frames, temporal compression will send little more than the key frames, which in the example above is a 10:50 ratio, or 80 percent fewer full pictures than MJPEG.

The efficiency of temporal compression comes with a caveat: bandwidth and storage requirements can be highly variable depending on the scene recorded. To illustrate this, think about a camera that is panning, tilting, or zooming. The entire scene is changing with each frame, forcing the temporal compression to work overtime to interpret what is happening and then calculate what to send. The result is a large increase in the bandwidth and storage required, with only marginal video quality to show for it.
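The arithmetic in the example above, and the reason a moving camera erodes the savings, can be sketched in a few lines. This is a toy model under stated assumptions, not how H.264 or any real codec works: frames are short lists of pixel values, a key frame is assumed every 5 frames (which yields the 10 key frames in the example), and the function names are hypothetical.

```python
import math


def full_picture_counts(seconds, fps, key_frame_interval):
    """Return (full pictures sent by MJPEG, key frames sent by a temporal codec)."""
    total_frames = seconds * fps
    key_frames = math.ceil(total_frames / key_frame_interval)
    return total_frames, key_frames


def changed_pixels(previous, current):
    """Return (index, new_value) pairs for pixels that differ between two frames."""
    return [(i, new) for i, (old, new) in enumerate(zip(previous, current)) if old != new]


# 5 seconds of 10 fps video, with an assumed key frame every 5 frames.
mjpeg_full, temporal_full = full_picture_counts(seconds=5, fps=10, key_frame_interval=5)
reduction = 1 - temporal_full / mjpeg_full
print(f"MJPEG: {mjpeg_full} full pictures, temporal: {temporal_full} key frames")
print(f"Fewer full pictures: {reduction:.0%}")

# Toy 6-pixel frames: only the differences from the previous frame are sent.
key_frame = [10, 10, 10, 10, 10, 10]
static_scene = [10, 10, 10, 10, 10, 10]   # nothing moved
small_motion = [10, 10, 99, 10, 10, 10]   # one pixel changed
panning = [11, 12, 13, 14, 15, 16]        # camera moved, every pixel changed

for name, frame in [("static", static_scene), ("small motion", small_motion), ("panning", panning)]:
    print(f"{name}: {len(changed_pixels(key_frame, frame))} of {len(frame)} pixels to send")
```

In the static case nothing needs to be sent between key frames, while in the panning case every pixel changes, so the "difference" is as large as a full picture. That is the source of the variable bandwidth and storage described above.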

So is temporal compression better than frame-by-frame compression for your application? That will depend on a variety of considerations, but before we dive into them, let's look at the most promising of the temporal encoding algorithms, H.264.

What is H.264? Volumes have been written about it, so we won't go into much detail, except to say it is the most sophisticated temporal encoding algorithm on the market today. In some cases, it can deliver images that approach MJPEG quality. It does this by using more sophisticated mathematical algorithms than its temporal predecessors, such as MPEG-2, MPEG-4 and H.263. These algorithms are better able to predict motion changes between key frames, so the video is more likely to be accurate, with less blurriness and "blockiness" than the older temporal compression techniques produce.

 

H.264 Profiles

There are 8 types of H.264, referred to as profiles:

[Figure: H.264 profiles for video compression]

Each manufacturer customizes its H.264 profile to suit its needs, which means no two H.264 profiles are created equal. The decision on which to go with is driven primarily by three variables:

  1. Desired image quality
  2. Available bandwidth/storage
  3. Available processing power

Make a note of those variables, as they will ultimately affect your decision on which compression methodology is best for your application. Most H.264 camera manufacturers use either Baseline, Constrained Baseline or, in some cases, a much lower performance profile with many quality features simply "turned off" because the camera does not have the processing power to support them.
