Video Surveillance: The Sound of Security Success

Nov. 2, 2013
How to integrate audio into a video surveillance system

While a picture is worth a thousand words, the spoken word can be worth much more in the physical security world. Video only tells half the story, regardless of how many megapixels are captured. Audio cues separate perception from reality.

Having audio as an integrated part of a video surveillance system can be an invaluable addition. A cry for help, the sound of breaking glass, a gunshot or an explosion in the vicinity of a camera — yet outside the camera’s field of view — could escape notice without audio.

Even so, integrating audio into a video surveillance system is not yet widespread. And frankly, excluding new compression methods, better-quality microphones and improved analytics, not much has changed since I wrote a similar article five years ago. Nevertheless, the audio technology available today is more than enough to improve surveillance intelligence. It’s all about implementation and knowing when to add audio.

When to Add Audio

Outside of legal ramifications, audio uses in video surveillance are limitless. That’s not to say that audio capabilities should be suggested as part of every RFP, but if there’s an idea for how audio can improve a system, then there’s a solution.

First and foremost, audio detection complements video motion detection by reacting to events outside the camera’s field of view, or in low-light situations when a highly light-sensitive camera is not used. Video recording can be initiated when the audio level crosses a set decibel threshold, and the same trigger can be used to turn on lights or other systems.
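As a rough illustration of threshold-based audio detection, the sketch below measures the level of short frames of 16-bit PCM audio and reports which frames are loud enough to act as a trigger. It is a minimal sketch and not any vendor's implementation: the function names and the -20 dB default are assumptions, and levels are expressed in dB relative to digital full scale (dBFS), since a true sound pressure reading would require a calibrated microphone.

```python
import math
from typing import Sequence

FULL_SCALE = 32768.0  # magnitude of the largest 16-bit PCM sample


def level_dbfs(samples: Sequence[int]) -> float:
    """Return the RMS level of one frame of 16-bit PCM samples in dBFS."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # digital silence
    return 10.0 * math.log10(mean_square / (FULL_SCALE * FULL_SCALE))


def frames_over_threshold(frames: Sequence[Sequence[int]],
                          threshold_dbfs: float = -20.0) -> list:
    """Return the indexes of frames whose level meets the (assumed) threshold."""
    return [i for i, frame in enumerate(frames)
            if level_dbfs(frame) >= threshold_dbfs]


if __name__ == "__main__":
    quiet = [100] * 160      # background noise
    loud = [16384] * 160     # half of full scale, about -6 dBFS
    silent = [0] * 160
    # Each returned index marks a frame that could start recording or switch on lights.
    print(frames_over_threshold([quiet, loud, silent]))
```

In practice the threshold would be tuned on site against the normal background noise, which is one reason to test the system before deployment.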

Visitor management or intruder communication could also be an important audio need for the customer. Instead of investing in an entire intercom system, the customer could use the surveillance system to communicate with visitors or ward off intruders.

Additionally, audio can be integrated into the surveillance system for remote monitoring of a restricted area or in a remote helpdesk capacity at, for instance, parking garages or bus stations.

IP-based access control systems provide another medium for leveraging audio. In a perfect visitor management system, all three — video, access and audio — combine for a true integrated solution with video and audio verification.

Then of course there are non-security uses of video surveillance with audio that can open doors to new business opportunities: telemedicine, distance learning, teaching tool packages, video conferencing, concert streaming and so on. Plenty of traditional security integrators have harnessed the power of video and audio to create completely separate and successful business units.

And as with any conversation today about surveillance technology, the focus often shifts to intelligence. Audio intelligence can tell a camera to record on the sound of a voice, broken glass or a gunshot. It can trigger an alarm or alert a business owner to a potential situation. It can even instruct a PTZ camera to focus on a specific area of a scene and track movement. But as with all promises of intelligent analytics, expectations must be properly set and the system must be tested before deployment.

Audio Tech Tips

The first thing to understand is that integrating audio is much easier and less costly with network video than with analog CCTV. With network video, video and audio can be sent over the same cable, which reduces cabling cost and installation effort while helping to keep the two streams synchronized. In an analog system, by contrast, if the distance between the microphone and the recording station is too great, balanced audio equipment will be needed, further increasing system cost and installation complexity.

When an IP camera or video encoder has support for audio, it typically includes a built-in microphone but rarely a speaker. This is important if you require two-way communications (more on this later). If the device does not have a built-in microphone, then often it will have a line-in jack for an external one. The built-in microphone may be appropriate for some applications, but an external microphone may be a better solution for applications that require higher-quality sound or need the microphone placed at a specific location. There are three main types of microphones: dynamic, condenser and electret.

Dynamic microphones are rarely used in surveillance because of their poor audio sensitivity and limited ability to reproduce important low frequencies. If a dynamic microphone is required, it will typically use an XLR connector or, if the device does not support that, an adapter.

The condenser microphone is known for the highest audio sensitivity and quality. While some back-electret microphones can provide similar quality, the condenser is often used in professional recording studios. Surveillance manufacturers and third-party accessory providers have recently launched new condenser microphones designed specifically for the security world with discreet and flexible installation options.

The third option, the electret condenser microphone, is the type typically found in headsets and computer microphones, so it is not surprising that it is also the type often built into network cameras. This type of microphone normally needs 1 to 10 volts to work, which the camera itself can supply.

Lastly, another sometimes overlooked audio device for network video applications is the audio module. These devices enable audio support and I/O ports that can be located far away from a network camera. Audio modules, for example, are great for a city surveillance application where a PTZ is located high up on a pole, but the sound should be captured at street level.

How to Select Audio Equipment

When selecting a camera or encoder for video and audio capabilities, you must determine the type of communication that is important for the application. If the end user requires two-way communication, then a microphone and a speaker (either internal or external) are required. From there, determine which of the three communication options will satisfy the needs: simplex, half-duplex or full-duplex.

In simplex mode, audio is sent in one direction only. In half-duplex mode, audio is sent in both directions, but only one party at a time can be heard (think walkie-talkie). In full-duplex mode, audio is sent to and from the operator simultaneously.

When audio is used merely to listen in on a scene or for video analytics, simplex will do the job. If you are working with an operator team and communication with people in the video scene is intermittent, then half-duplex will be acceptable. However, if the audio is for visitor management, emergency response or video conferencing, full-duplex is the choice. Remember, though, that while full-duplex has the advantage of simultaneous audio, it also increases the bandwidth required.
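The selection guidance above can be written down as a simple lookup. The following is a minimal sketch: the use-case labels, function names and the 64 kbit/s figure are illustrative assumptions, and the bandwidth estimate assumes full-duplex carries two audio streams at once while simplex and half-duplex carry one at a time.

```python
from enum import Enum


class AudioMode(Enum):
    SIMPLEX = "simplex"
    HALF_DUPLEX = "half-duplex"
    FULL_DUPLEX = "full-duplex"


# Use-case labels are hypothetical; the mapping follows the guidance in the text.
_MODE_BY_USE_CASE = {
    "listen-in": AudioMode.SIMPLEX,
    "analytics": AudioMode.SIMPLEX,
    "intermittent operator talk": AudioMode.HALF_DUPLEX,
    "visitor management": AudioMode.FULL_DUPLEX,
    "emergency response": AudioMode.FULL_DUPLEX,
    "video conferencing": AudioMode.FULL_DUPLEX,
}


def choose_mode(use_case: str) -> AudioMode:
    """Return the communication mode suggested for a known use case."""
    try:
        return _MODE_BY_USE_CASE[use_case]
    except KeyError:
        raise ValueError(f"unknown use case: {use_case!r}") from None


def peak_audio_kbps(mode: AudioMode, per_stream_kbps: float) -> float:
    """Estimate the worst-case audio bit rate on the link for a given mode."""
    # Only full-duplex sends and receives at the same moment.
    streams = 2 if mode is AudioMode.FULL_DUPLEX else 1
    return streams * per_stream_kbps


if __name__ == "__main__":
    mode = choose_mode("visitor management")
    # 64 kbit/s per stream is an assumed example figure.
    print(mode.value, peak_audio_kbps(mode, 64))
```

A table like this is easy to review with the end user before any equipment is specified.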

Five Best Practices for Installation

Once the appropriate IP camera, encoder and/or audio accessories are selected, there are a few other crucial installation and configuration best practices:

1. Audio equipment placement: Although an audio signal can be amplified later, appropriate placement of equipment will reduce noise. In full-duplex mode, the microphone should face away from the speaker at a reasonable distance to reduce feedback.

2. Amplify the signal as early as possible: This minimizes noise in the signal chain. In addition, make sure the signal levels are close to, but not over, the clipping level where audio becomes distorted. Proper gain selection is especially important in surveillance installations. Wherever possible, apply the gain early in the signal path, preferably in the microphone itself.

3. Apply appropriate signal processing technologies to improve audio quality: Audio quality can be improved by adjusting the input gain and using features such as echo cancellation and speech filtering.

4. Select the right codec and bit rate: The type of codec and bit rate selected will affect audio quality. In general, the higher the bit rate, the better the audio quality.

5. Understand legal implications: States and countries have different restrictions on the use of audio and video surveillance. Check with the local authorities about the legality of the system – or what’s required to make it legal – before product procurement.
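Two of the practices above, gain setting and bit rate selection, come down to simple arithmetic. The sketch below is illustrative only: it estimates how much gain a 16-bit test recording can take before its peaks reach a chosen level just under clipping, and how much storage a continuous audio stream needs per day. The function names, the -3 dBFS target and the 64 kbit/s example are assumptions, and 1 kbit/s is taken as 1,000 bits per second.

```python
import math
from typing import Sequence

FULL_SCALE = 32768.0  # clipping level for 16-bit PCM samples


def peak_dbfs(samples: Sequence[int]) -> float:
    """Return the highest sample peak in dB relative to the clipping level."""
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return float("-inf")
    return 20.0 * math.log10(peak / FULL_SCALE)


def suggested_gain_db(samples: Sequence[int],
                      target_peak_dbfs: float = -3.0) -> float:
    """Return the gain (dB) that moves the peak to the assumed target level."""
    peak = peak_dbfs(samples)
    if peak == float("-inf"):
        raise ValueError("recording is silent; no gain can be suggested")
    return target_peak_dbfs - peak


def audio_storage_mb_per_day(bit_rate_kbps: float) -> float:
    """Return megabytes per day for continuous recording at a fixed bit rate."""
    bytes_per_second = bit_rate_kbps * 1000 / 8
    return bytes_per_second * 86_400 / 1_000_000


if __name__ == "__main__":
    test_recording = [0, 4096, -8192, 2048]  # peak at one quarter of full scale
    print(round(suggested_gain_db(test_recording), 1))
    print(audio_storage_mb_per_day(64))
```

A negative result from `suggested_gain_db` means the recording already peaks above the target and the gain should be reduced.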

The Future?

Most articles end with a look toward the future of technology and what it’ll bring. No need here. The technological marriage of audio, video and access control technology is here and it’s a strong one. It’s up to the expertise of the integrator to determine if audio is needed and then how to apply it.

Fredrik Nilsson is the general manager for Axis Communications in North America. He has more than 15 years of experience with IP video systems and is the author of “Intelligent Network Video: Understanding Modern Video Surveillance Systems,” published by CRC Press. He also writes the popular “Eye on Video” series.

About the Author

Fredrik Nilsson

Fredrik Nilsson is VP Americas for Axis Communications and is the author of “Intelligent Network Video: Understanding Modern Video Surveillance Systems,” published by CRC Press and now in its second edition.