Our Man in the Field: The Process of the IP Solution, Part X

Dec. 13, 2005
Charlie takes an in-depth look at the common IP-video compression schemes

In my last column I opened the door to compression engines but didn't really go into much detail about the differences between the ones that we hear about every day. So, let's dive in and see what we can find.

Repeating myself somewhat, we have several different engines that are available to us in the CCTV market. Each one of these compression formats is a standard in the computer world. However, as is true with every good recipe, you’re not really cooking until you put your own variation to it. Therefore, if a manufacturer says that they’re using MPEG-4, because it is the only really good compression scheme out there, what they mean is that they have taken the MPEG-4 standard and modified it to fit their interpretations of your needs. Are their interpretations accurate? Who knows?

Let's start with JPEG. This is an acronym that stands for Joint Photographic Experts Group. Pretty cool, huh? This compression engine records every frame of video, or every picture, as a complete image. It compresses by discarding the fine details of the image. To do this, it divides the picture into small blocks of pixels (8 by 8 in the baseline standard) and, within each block, keeps the broad strokes of color and brightness while rounding off the fine variations. Think of a block of four pixels, each a slightly different color, coming out the other side as four pixels that are all a rough average of the originals. That is to say, if you can see the hairs on a man’s head, the resulting compression in JPEG would take away the individual hairs and give you a guy with dark hair ... sort of. The amount of compression in JPEG is typically set as a percentage, from 1 percent to 99 percent. The higher the setting, the more fine detail is glazed over or combined, though the number is a relative scale that varies from product to product rather than a literal measure of the detail lost. Compression does not change the size of the picture on the screen; it shrinks the file. Viewed at normal size, it appears that nothing has been lost.

But then the time comes to get down to brass tacks and enlarge the image for closer scrutiny. What you will notice, more and more with every enlargement, is what is referred to as the "Jug head" effect. That is to say, the edges of various objects in your image will become blocks or squares. At heavy compression there is no room for a curve, slow radius or circle in JPEG. Motion JPEG, or M-JPEG, is a variation of the JPEG standard in which every frame of video is stored as its own JPEG picture. The other formats that have the JPEG mantra in them are variations as well.
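To see why block-based compression turns edges into squares, here is a rough sketch in Python. It uses plain block averaging as a stand-in for what a real JPEG engine does inside each block, so treat it as an illustration of the blocking effect, not as the JPEG algorithm itself. The function name block_average, the 4-pixel block size and the 8-by-8 test picture are all made up for the demo.

```python
def block_average(image, block):
    """Replace every block x block tile with its average brightness.

    This is a crude stand-in for per-block compression: each tile is
    processed on its own, with no knowledge of its neighbors.
    """
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for top in range(0, height, block):
        for left in range(0, width, block):
            rows = range(top, min(top + block, height))
            cols = range(left, min(left + block, width))
            tile = [image[r][c] for r in rows for c in cols]
            mean = round(sum(tile) / len(tile))
            for r in rows:
                for c in cols:
                    out[r][c] = mean
    return out


# Test picture: a clean diagonal edge, bright (255) above it, dark (0) below.
size = 8
edge = [[255 if c > r else 0 for c in range(size)] for r in range(size)]

blocky = block_average(edge, 4)
for row in blocky:
    print(" ".join(f"{value:3d}" for value in row))
```

The diagonal line in the input comes out as four flat squares. Enlarge a heavily compressed block-based image and that is roughly what you are looking at.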

Wavelet compression is the technique behind the somewhat shaky JPEG 2000 standard, and it has a nearly infinite possibility of variations. This is because anyone can write a wavelet algorithm, and everyone has. The key difference between wavelet compression and its cousin JPEG is that wavelet does not have to tile. That is to say, it does not chop the picture into small blocks and compress each one on its own. It looks at the frequencies in the whole frame or image. High frequencies carry the fine detail, sharp edges and noise; low frequencies carry the broad areas of light and dark. The engine keeps the low frequencies and throws away more and more of the high ones as the compression goes up. The net result is that as you magnify an image that has been compressed with a wavelet format, the image appears to go out of focus as opposed to becoming blocks.

The bottom line is that wavelet engines are capable of doing massive compression, but you still lose the fine details in the image. As for what wavelet stands for in the world of acronyms, I'm not really sure. I think it was thought up by some surfer dude while shooting a tube. [Editor’s note: It actually refers to a mathematical principle, the decay of an oscillating waveform, though the surfer reference is easier to remember, kind of like a crashing wave.]
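For the curious, the idea of keeping the broad strokes and throwing away the small wiggles can be sketched in a few lines of Python. This uses a single level of the Haar wavelet on one row of pixels, chosen only because it is the simplest wavelet there is; commercial engines use smoother wavelets, several levels and two dimensions. The pixel values and the threshold are invented for illustration.

```python
def haar_forward(signal):
    """One level of the Haar transform: pairwise averages and differences."""
    pairs = list(zip(signal[0::2], signal[1::2]))
    averages = [(a + b) / 2 for a, b in pairs]   # the low-frequency part
    details = [(a - b) / 2 for a, b in pairs]    # the high-frequency part
    return averages, details


def haar_inverse(averages, details):
    """Rebuild the signal from averages and (possibly zeroed) details."""
    signal = []
    for avg, det in zip(averages, details):
        signal.extend([avg + det, avg - det])
    return signal


def compress(signal, threshold):
    """Drop every detail value smaller than the threshold, then rebuild."""
    averages, details = haar_forward(signal)
    kept = [d if abs(d) >= threshold else 0.0 for d in details]
    return haar_inverse(averages, kept), kept


# One row of made-up pixel values: gentle texture plus one hard edge (20 -> 180).
row = [100, 102, 101, 99, 20, 180, 50, 52]
restored, kept = compress(row, threshold=5)

print("details kept:", kept)
print("restored row:", restored)
```

The hard edge survives because its detail value is large, while the gentle texture on either side is smoothed flat. Zeroed values cost almost nothing to store, which is where the compression comes from.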

MPEG, which stands for Moving Picture Experts Group, is one of the more prominent forms of video compression in this day and age. We have MPEG 1, MPEG 2, MPEG 4, MPEG 7 and MPEG 21. (MPEG 3 was folded into MPEG 2 and never released as a standard of its own.) For the most part, video uses MPEG 1, 2 and 4.

MPEG works on the principle of using three basic structure pieces. We have the "Intra Frames," which are referred to as the “I” frames. These are complete images of the whole scene, compressed on their own in much the same way as a JPEG picture. Next up are the "Bi-directional" or "B" frames. These record only those objects in the scene that change ... that is to say, what moves ... and they do it by comparing against reference frames both before and after them. Lastly, we have the "Predicted" or "P" frames. These record the changes since the preceding I-frame or P-frame, and they in turn serve as reference points for the B-frames around them.

OK, so let's try and put this complicated little puppy into action. The amount of compression is determined largely by the number of B-frames and P-frames that are interjected between each I-frame. A P-frame is injected at the end of each B-frame string; it looks back to the previous I-frame (or P-frame) and records where the objects in the scene have moved to, and the B-frames in between are filled in from the reference frames on either side of them. Despite the complicated sound of the process, MPEG can be an extremely efficient form of compression. File sizes will range from 1.4 to 12 kb.
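A common arrangement in MPEG encoders is a repeating group of pictures such as I B B P B B P and so on. The Python sketch below builds such a group and shows how the mix of frame types drives the average frame size. The per-frame sizes are hypothetical round numbers picked to make the arithmetic easy; they are not measurements from any product, and the function names are made up for the example.

```python
# Hypothetical sizes in kilobytes for each frame type (illustrative only).
FRAME_SIZE_KB = {"I": 12.0, "P": 4.0, "B": 1.5}


def build_gop(gop_length, b_between):
    """Return the frame types, in display order, for one group of pictures.

    The group starts with an I-frame, followed by runs of B-frames that
    each end in a P-frame, cut off at gop_length frames.
    """
    frames = ["I"]
    while len(frames) < gop_length:
        frames.extend(["B"] * b_between + ["P"])
    return frames[:gop_length]


def average_size_kb(frames):
    """Average frame size for the group, using the hypothetical sizes above."""
    return sum(FRAME_SIZE_KB[kind] for kind in frames) / len(frames)


light = build_gop(gop_length=1, b_between=0)    # every frame is an I-frame
heavy = build_gop(gop_length=12, b_between=2)   # one I-frame in twelve

print("".join(light), average_size_kb(light))
print("".join(heavy), average_size_kb(heavy))
```

With these made-up numbers, stretching the group from one frame to twelve cuts the average frame from 12 kb to 3 kb. The price is that eleven out of every twelve pictures now depend on a neighbor.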

However, here are the potential problems with MPEG video. First, there is no time, order or date stamp applied to the B-frames. Therefore, it is conceivable that I could take an image of you walking along the sidewalk and plant you into a video stream. No one would be able to prove it one way or the other. You would not appear in the I-frames, but what the heck. The bottom line is that in some court cases, only the I-frames would be allowed for viewing by a jury, as they are the only pieces of information that can be proved to be whole and untampered. The second problem is that if one of the in-between frames becomes corrupt and you are working with heavy compression (more frames between I-frames), the corruption can remain and worsen until the next I-frame erases it. So we end up with the famous car that rolls down the street and breaks in half, with the rear portion of the car remaining on the screen while other cars drive through it. This is caused by image memory, also known as bleed over. So, the more compression, the more room for corruption ... it sounds like city politics, eh?
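The way corruption hangs around until the next I-frame can be shown with a toy model in Python. It assumes, for simplicity, that every frame other than an I-frame builds on the picture before it, the way a chain of P-frames does; real decoders are more involved. The function name and the frame strings are invented for the example.

```python
def damaged_frames(frame_types, corrupted_index):
    """Return one flag per frame: True if the displayed picture is damaged.

    Simplified model: an I-frame is a complete picture and wipes out any
    earlier error; every other frame inherits the state of the frame
    before it.
    """
    damaged = False
    flags = []
    for index, kind in enumerate(frame_types):
        if kind == "I":
            damaged = False      # fresh, complete picture
        if index == corrupted_index:
            damaged = True       # the error enters the stream here
        flags.append(damaged)
    return flags


short_run = list("IPPIPPIPPIPP")   # an I-frame every 3 frames
long_run = list("IPPPPPPPPPPP")    # an I-frame every 12 frames

# Corrupt the second frame (index 1) in both streams.
print(sum(damaged_frames(short_run, 1)), "bad frames with light compression")
print(sum(damaged_frames(long_run, 1)), "bad frames with heavy compression")
```

In this toy model the same single error spoils 2 frames when I-frames come often and 11 frames when they are rare.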

H.263, H.263+ and H.263++ are all variations of the original standard for video telephony or video conferencing via the internet. It starts by processing images in much the same way as JPEG, but then it adds the twist of "Motion Compensation Prediction." This means that it sees an object moving, works out how far and in which direction it has moved since the previous frame, and uses that motion to predict the next frame instead of sending the object all over again. This makes H.263 very fast. It also makes it very susceptible to problems with sudden movements. Granted, a car is fairly easy to predict in most cases. However, a human waving her arms around in a frantic flurry may cause the engine to fail in its ability to predict the position of the hands and such, and so drop them from the recorded, compressed image completely. In effect, the background shows through the fluttering hands, so they appear to disappear. File sizes range from 0.54 to 8.77 kb.
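Motion compensation in its usual block-matching form can be sketched in Python as a search: take a block from the current frame and look for the spot in the previous frame that matches it best. The sketch works on a single row of pixels to keep it short, and it is a generic illustration, not the search used by H.263 or any particular product. The function name, pixel values and search range are made up.

```python
def best_shift(previous, current, start, size, search):
    """Find where current[start:start + size] came from in the previous frame.

    Tries every shift from -search to +search and returns the pair
    (shift, cost), where cost is the sum of absolute differences.
    A cost of 0 means a perfect match.
    """
    target = current[start:start + size]
    best, best_cost = 0, None
    for shift in range(-search, search + 1):
        origin = start + shift
        if origin < 0 or origin + size > len(previous):
            continue                      # candidate falls off the picture
        candidate = previous[origin:origin + size]
        cost = sum(abs(a - b) for a, b in zip(target, candidate))
        if best_cost is None or cost < best_cost:
            best, best_cost = shift, cost
    return best, best_cost


# A bright object (9s) on a dark background moves two pixels to the right.
previous = [0, 0, 9, 9, 9, 0, 0, 0, 0, 0]
current = [0, 0, 0, 0, 9, 9, 9, 0, 0, 0]

shift, cost = best_shift(previous, current, start=4, size=3, search=3)
print("block came from", shift, "pixels away, mismatch", cost)
```

Here the search reports a shift of -2 with no mismatch, so the encoder only has to send "same block, two pixels over." When the motion is erratic, no candidate matches well and the prediction breaks down.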

The latest and greatest compression schemes are of a truly digital format. The designers are literally converting the visual information into a form of text ... pixel by pixel. I know this seems odd, but think about it. I literally write a picture. Pixel one is red, pixel two is green, pixel three is blue, et cetera. I can store hundreds or thousands of pages of text in the same space that it takes me to store fifty images. The net result is that the creators of these compression schemes are claiming 2,400:1 compression factors with up to 2CIF playback. I realize that I have oversimplified the process, but not by much. I have basically told you that you can cut your two or three terabytes of storage to one hundred gigs or less. The problem is that these systems are extremely proprietary, so you may have to supply some translation when your surveillance video appears in a courtroom.
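To put a 2,400:1 claim in perspective, here is some back-of-the-envelope arithmetic in Python. Only the ratio comes from the vendors' claims; the picture size (2CIF taken as 704 by 240 pixels, the NTSC figure), the 3 bytes per pixel and the 30 frames per second are assumptions picked for the example, and the comparison is against completely uncompressed video.

```python
# Assumptions for illustration (not from any vendor's spec sheet):
WIDTH, HEIGHT = 704, 240        # 2CIF, NTSC
BYTES_PER_PIXEL = 3             # 24-bit color, uncompressed
FRAMES_PER_SECOND = 30
SECONDS_PER_DAY = 24 * 60 * 60

RATIO = 2400                    # the claimed 2,400:1 compression factor

raw_frame_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL
raw_day_bytes = raw_frame_bytes * FRAMES_PER_SECOND * SECONDS_PER_DAY
compressed_day_bytes = raw_day_bytes // RATIO

print(f"raw frame:          {raw_frame_bytes:,} bytes")
print(f"raw, one day:       {raw_day_bytes / 1e12:.2f} TB per camera")
print(f"at 2,400:1, a day:  {compressed_day_bytes / 1e6:.0f} MB per camera")
```

Under these assumptions, one camera goes from well over a terabyte a day uncompressed to a little over half a gigabyte.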

OK, I think that I have proven my case that all compression schemes are crap (as was my tenet in the last column). Any time you take something that demands detail and you compress it, you must give up something. So what do you do? You can't afford to work without some form of compression. You work with the compression scheme that best suits your application. You determine which one of the compression schemes is best for you through on-site demonstration and testing. You cannot go to a show and pick the one that is best for you unless you have a fairly standard application with few or no surprises. Buyer beware.

See you next time when we cover more of “the IP Solution”.

About the Author: Richard R. "Charlie" Pierce has been an active member of the security industry since 1974. He is the founder and past president of LRC Electronics Company, a full-service warranty/non-warranty repair center for CCTV equipment. In 1985, Charlie founded LeapFrog Training & Consulting (formerly LTC Training Center), a full-service training center specializing in live seminars, video-format certification training programs, plain-language technical manuals and educational support on CCTV. He has also recently become the director of integrated security technologies for IPC International, Corp., a firm that provides major retail (mall) surveillance solutions. He is an active member of ASIS, ALAS, CANASA, NBFAA, NAAA and SIA. He is the recipient of numerous security industry awards and is a regular contributor to Security Technology & Design magazine. Look for his columns to also appear regularly via SecurityInfoWatch.com and this website's e-newsletters. He can be contacted via email at [email protected].