The long road to IP video standards

ONVIF and PSIA are giving end-users what they want: Usable specifications and the end of proprietary video systems

ONVIF functionality can be grouped into the following areas:

• Device discovery — so that a VMS can find ONVIF devices across the network and ignore all others;
• Device management — to set up the device, including its IP address and system time, and to categorize it as fixed or PTZ;
• Imaging / Media configuration — to read the current video quality settings and then change them, e.g., 30 frames per second at 4CIF, with a bandwidth limit of 2 Mbps;
• Real-time streaming — to stream video, audio and metadata from the IP camera to the VMS;
• Event handling — to receive alarms and other events; and
• PTZ control.
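
To make the first item more concrete: ONVIF discovery is built on WS-Discovery, in which a client multicasts a small SOAP "Probe" message over UDP and compatible devices answer. The Python sketch below builds such a probe, filtered to the ONVIF NetworkVideoTransmitter type so that other devices stay silent. The function names build_probe and discover are illustrative only, not part of any ONVIF SDK, and the code is a minimal sketch rather than a complete discovery client.

```python
import socket
import uuid

# Well-known WS-Discovery IPv4 multicast address and port.
WS_DISCOVERY_ADDR = ("239.255.255.250", 3702)

PROBE_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<e:Envelope xmlns:e="http://www.w3.org/2003/05/soap-envelope"
            xmlns:w="http://schemas.xmlsoap.org/ws/2004/08/addressing"
            xmlns:d="http://schemas.xmlsoap.org/ws/2005/04/discovery"
            xmlns:dn="http://www.onvif.org/ver10/network/wsdl">
  <e:Header>
    <w:MessageID>uuid:{message_id}</w:MessageID>
    <w:To>urn:schemas-xmlsoap-org:ws:2005:04:discovery</w:To>
    <w:Action>http://schemas.xmlsoap.org/ws/2005/04/discovery/Probe</w:Action>
  </e:Header>
  <e:Body>
    <d:Probe>
      <d:Types>dn:NetworkVideoTransmitter</d:Types>
    </d:Probe>
  </e:Body>
</e:Envelope>"""


def build_probe() -> bytes:
    """Return a WS-Discovery Probe that only ONVIF video devices should answer."""
    # Every probe carries a fresh message ID so replies can be matched to it.
    return PROBE_TEMPLATE.format(message_id=uuid.uuid4()).encode("utf-8")


def discover(timeout: float = 2.0) -> list:
    """Multicast one probe and collect the IP addresses that reply (hypothetical helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    hosts = []
    try:
        sock.sendto(build_probe(), WS_DISCOVERY_ADDR)
        while True:
            _data, (host, _port) = sock.recvfrom(65535)
            hosts.append(host)
    except socket.timeout:
        pass  # no reply within the timeout: stop listening
    finally:
        sock.close()
    return hosts
```

Calling discover() on a network with ONVIF cameras would be expected to return their addresses; a real VMS would also parse each reply to extract the device service address it contains.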

The ONVIF specification will eventually lead to the creation of SDKs that anyone can use to gain the benefits of ONVIF without implementing it from scratch. Different manufacturers will create these SDKs, and some will be more programmer-friendly than others, but all of them should provide the same ONVIF functionality.

An ONVIF SDK is not mandatory, however, because ONVIF is built on the open standard known as “web services,” and the interface itself is already described in the Web Services Description Language (WSDL) files that ONVIF publishes. Additionally, ONVIF offers software test tools that enable developers to test for ONVIF compliance.
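
As a rough illustration of why an SDK is optional, the Python sketch below builds a SOAP request for the device management operation GetSystemDateAndTime using only the standard library. The helper names build_request and call_device are hypothetical, and the service URL in the usage note is a placeholder; a real device reports its own service address during discovery.

```python
import urllib.request
import xml.etree.ElementTree as ET

SOAP_NS = "http://www.w3.org/2003/05/soap-envelope"    # SOAP 1.2
DEVICE_NS = "http://www.onvif.org/ver10/device/wsdl"   # ONVIF device management

ET.register_namespace("s", SOAP_NS)
ET.register_namespace("tds", DEVICE_NS)


def build_request(operation: str) -> bytes:
    """Wrap an argument-less device management operation in a SOAP envelope."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    ET.SubElement(body, f"{{{DEVICE_NS}}}{operation}")
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


def call_device(service_url: str, operation: str, timeout: float = 5.0) -> bytes:
    """POST the request to the device and return the raw XML reply (hypothetical helper)."""
    request = urllib.request.Request(
        service_url,
        data=build_request(operation),
        headers={"Content-Type": "application/soap+xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

For example, call_device("http://192.0.2.10/onvif/device_service", "GetSystemDateAndTime") would be expected to return the camera's clock as XML. Most other operations also require authentication headers, which this sketch leaves out.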

Streaming Compatible Video

People often mistakenly think that ONVIF defines how to stream video and how that video should be compressed. Actually, ONVIF simply re-uses existing industry standards for both. Streaming is done in the ONVIF specification via the well-established Real Time Streaming Protocol (RTSP). All ONVIF has to do is configure the devices to use the right settings, and RTSP does the rest. When the video reaches the destination (in our case, the VMS), it needs to be “decoded” before it can be displayed. There are numerous software decoders available off the shelf that you can use to view the streaming video, including Apple’s QuickTime and VideoLAN’s VLC player.
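
To show how little of the streaming step is specific to ONVIF, the Python sketch below formats the first message an RTSP client typically sends: a DESCRIBE request, which asks the camera for a description of the stream it offers. RTSP is a text protocol much like HTTP, with 554 as its default port. The stream address used in the usage note is a made-up example, and the function names are illustrative.

```python
from urllib.parse import urlsplit

RTSP_DEFAULT_PORT = 554


def rtsp_endpoint(stream_uri: str) -> tuple:
    """Return the (host, port) a client should open a TCP connection to."""
    parts = urlsplit(stream_uri)
    if parts.scheme != "rtsp":
        raise ValueError(f"not an RTSP URI: {stream_uri}")
    return parts.hostname, parts.port or RTSP_DEFAULT_PORT


def build_describe(stream_uri: str, cseq: int = 1) -> bytes:
    """Format an RTSP DESCRIBE request asking for an SDP stream description."""
    lines = [
        f"DESCRIBE {stream_uri} RTSP/1.0",
        f"CSeq: {cseq}",               # sequence number echoed back in the reply
        "Accept: application/sdp",
        "",                            # blank line ends the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")
```

With a hypothetical address such as rtsp://192.0.2.10/stream1, a client would connect to the host and port returned by rtsp_endpoint and send the bytes from build_describe. A full player then continues with SETUP and PLAY, which is exactly the work that off-the-shelf decoders already do.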

Here’s how it works: the VMS requests the current streaming settings, asks the device for the Uniform Resource Identifier (URI) of the video stream, and then connects to that URI. The VMS then plays the video via its integrated decoder or an RTSP-compatible decoder like QuickTime. The same streaming URI can be used to record the video on a network video recorder or an ONVIF-compliant hybrid recorder.

Because of historical issues with incompatible video compression formats, referred to as CODECs, people often assume that ONVIF must specify its own standard for compression. Instead, the ONVIF specification says you can use H.264, MPEG-4 (Part 2) or JPEG. The compression algorithm used in a real deployment will depend on what the camera can deliver and what the VMS can handle, but the wonderful thing is that JPEG is the bare minimum: a camera or VMS cannot declare ONVIF compatibility without supporting at least JPEG.
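
The choice of compression format described above can be sketched as a simple negotiation: take what the camera offers, intersect it with what the VMS can decode, and pick the most preferred match. In the Python sketch below, the preference order (H.264 first, JPEG last) is an assumption made for illustration, not something the specification mandates, and choose_codec is a hypothetical helper.

```python
# Assumed preference order, most efficient compression first.
PREFERENCE = ("H264", "MPEG4", "JPEG")


def choose_codec(camera_codecs, vms_codecs) -> str:
    """Pick the most preferred codec supported by both the camera and the VMS."""
    common = {c.upper() for c in camera_codecs} & {c.upper() for c in vms_codecs}
    for codec in PREFERENCE:
        if codec in common:
            return codec
    # Two compliant products always share JPEG, so reaching this point
    # suggests that at least one side is not really ONVIF compatible.
    raise ValueError("no common codec; is each side ONVIF compliant?")
```

For instance, choose_codec(["H264", "JPEG"], ["MPEG4", "JPEG"]) falls back to "JPEG", which is the situation the mandatory JPEG baseline is meant to cover.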

An important note about video quality, especially with MPEG-4 and H.264: both CODECs give manufacturers a set of tools for compressing video without losing too much detail while minimizing bandwidth and storage. Manufacturers differentiate themselves from one another by using these tools in different ways, which means that it is common for two IP cameras to deliver completely different image quality and to need different bitrates in a given scenario. Fortunately, this does not require a manufacturer-specific decoder at the other end, since all decoders can handle these streams. However, just as IP camera manufacturers vary in the quality of the compressed image, software decoders vary in how well they decompress it.

It is shocking to many that H.264 video from the same IP camera can look completely different depending on the software used to decode it, even when the decoders are running on the same PC. All H.264 decoders will work, but some look better than others. This is why it is so important for a VMS to use the best possible decoder, and this is a major point of differentiation between VMSs that cannot be articulated in datasheets, PowerPoint slides or advertising. Seeing is believing.