Snapshots for Our Times: Face Recognition in 2005

Manufacturers and advocates of face recognition products have grown accustomed to fielding questions—and sometimes fending off attacks—about the technology. Isn’t it a violation of personal privacy? Hasn’t it been shown unreliable? Is it really necessary for security? These cultural and technological concerns have kept face recognition from realizing its full potential, but they may not hold it back much longer.

Privacy Concerns
The American Civil Liberties Union has mounted the most serious opposition to face recognition technology within the United States. Its main argument has been that widespread surveillance is likely to become increasingly invasive and abusive over time. The ACLU has frequently pointed to several highly publicized airport installation fiascos from a few years back, as well as the infamous 2001 Super Bowl incident in Tampa, FL, buttressing its criticism by noting that since face recognition “does not even work,” it doesn’t really increase safety and security.

This question of accuracy has been the biggest sticking point. Face recognition has worked fairly well in controlled environments, but has had serious problems in more realistic, uncontrolled settings. Orientation and lighting variations, and to some degree changes in expression, can produce markedly different facial images for the same person. Changes in hairstyle, facial hair and body weight, the effects of aging, and deliberate disguise can also hamper performance.

Given these problems, why choose face recognition? The truth is, face recognition is still the best passive and non-intrusive biometric available. It’s also easy to use: An operator needs neither special hardware nor expert skills to interpret the results.

In recent years, attacks on face recognition seem to have subsided. In part, this is due to the public’s growing understanding and acceptance of video surveillance in general, and to familiarity with extensive surveillance efforts outside the U.S., particularly in cities such as London.

Additionally, after well over a decade of gradual improvements, the technology has begun to show signs of maturation as it quietly gains ground by becoming more reliable.

New, Improved, Accurate
Face recognition advocates have actively pursued major developments that are now leading to increased accuracy.

3-D Representation. Faces are part of a 3-D world, so representing them with 3-D data makes intuitive sense. There are two main ways to capture such data. In the first, a structured pattern of near-infrared light is projected onto the face, and a 3-D representation is constructed from the distortions of that pattern on the facial surface. This technique is used by A4Vision.

The second method employs two or more cameras and builds a 3-D shape through correspondence analysis and triangulation. This is Geometrix’s approach.
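For a rectified stereo pair (cameras aligned so that a facial point appears on the same image row in both views), triangulation reduces to similar triangles. The following is a minimal sketch that recovers one 3-D point from a single left/right correspondence. The camera parameters and function name are hypothetical, and the code shows only textbook stereo geometry, not Geometrix’s actual pipeline, which is not described here.

```python
from typing import Tuple


def triangulate_rectified(x_left: float, x_right: float, y: float,
                          focal_px: float, baseline_mm: float,
                          cx: float, cy: float) -> Tuple[float, float, float]:
    """Recover (X, Y, Z) in millimetres for one matched point in a rectified pair.

    Hypothetical setup: both cameras share focal length focal_px and principal
    point (cx, cy) and are separated horizontally by baseline_mm.
    """
    disparity = x_left - x_right            # horizontal shift of the point between views
    if disparity <= 0:
        raise ValueError("a point in front of the cameras needs positive disparity")
    z = focal_px * baseline_mm / disparity  # depth from similar triangles: Z = f * B / d
    x = (x_left - cx) * z / focal_px        # back-project through the left camera
    y_3d = (y - cy) * z / focal_px
    return x, y_3d, z


if __name__ == "__main__":
    # A point seen at column 420 on the left and 220 on the right (200 px disparity).
    print(triangulate_rectified(420.0, 220.0, 240.0, 1000.0, 100.0, 320.0, 240.0))
    # prints (50.0, 0.0, 500.0)
```

Repeating this for thousands of matched points yields the point cloud from which a face surface can be built; finding those correspondences reliably is the hard part.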

The vendors of both systems claim sub-millimeter accuracy. Because the systems need high resolution to achieve good results, they work best when the person is a couple of feet away from the capture device(s). The most attractive feature of 3-D technologies is that they virtually eliminate the effects of orientation and illumination changes, the two major difficulties associated with traditional 2-D recognition techniques. On the other hand, 3-D systems are typically more expensive than their 2-D counterparts, and their computational load is considerably higher as well.
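Both approaches recover depth by triangulation (for structured light, the baseline is the distance between projector and camera). A standard first-order error estimate for triangulation, a general relation rather than a figure from A4Vision or Geometrix, helps explain why accuracy depends on keeping the subject close. With focal length f, baseline B, disparity d, and disparity measurement error Δd:

```latex
Z = \frac{fB}{d}
\quad\Rightarrow\quad
\left|\frac{\partial Z}{\partial d}\right| = \frac{fB}{d^{2}} = \frac{Z^{2}}{fB}
\quad\Rightarrow\quad
\Delta Z \approx \frac{Z^{2}}{fB}\,\Delta d
```

For the same hardware, doubling the subject’s distance roughly quadruples the depth error.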

High-Resolution 2-D. Thanks to advances in sensor technology, today’s cameras can acquire increasingly high-resolution face images, and a system that captures more facial detail can be expected to produce better recognition results. Although skin texture analysis is designed to work with mid-range cameras, higher resolution improves the analysis of skin patterns on the face.

The addition of skin pattern data can improve face recognition performance. That promise was most likely behind Identix’s acquisition of DeLean Vision in 2004. Indeed, Identix claims that face recognition performance improves by 20 to 25% when skin pattern data is included.
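How Identix combines skin data with face scores is not stated here, so the following is only a generic sketch of weighted-sum score-level fusion, one common way to merge the outputs of two matchers. The function names, score ranges, and the skin_weight value are hypothetical; in practice they would be tuned on test data.

```python
def min_max_normalize(score: float, lo: float, hi: float) -> float:
    """Map a raw matcher score onto [0, 1] using that matcher's observed range (hypothetical lo/hi)."""
    if hi <= lo:
        raise ValueError("hi must exceed lo")
    return min(1.0, max(0.0, (score - lo) / (hi - lo)))


def fuse_scores(face_score: float, skin_score: float, skin_weight: float = 0.3) -> float:
    """Weighted sum of two normalized scores; skin_weight is a hypothetical tuning knob."""
    if not 0.0 <= skin_weight <= 1.0:
        raise ValueError("skin_weight must be in [0, 1]")
    return (1.0 - skin_weight) * face_score + skin_weight * skin_score


if __name__ == "__main__":
    face = min_max_normalize(75.0, 50.0, 100.0)  # raw face score on a 50..100 scale
    skin = min_max_normalize(0.6, 0.0, 1.0)      # raw skin score already on 0..1
    print(fuse_scores(face, skin))
```

Normalizing first matters because two matchers rarely report scores on the same scale; without it, one matcher would dominate the sum.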

Preprocessing. Recent advances in computer graphics and computer vision have produced much better algorithms for automatic lighting and pose correction of 2-D face images. Such preprocessing methods detect and correct changes in illumination and pose before the face image is passed to the recognition system. By removing incidental variations caused by the capture conditions, these corrections let the system focus on meaningful, identity-based differences. As a result, the recognition algorithm has a significantly better chance of producing the correct result.
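The lighting and pose correction algorithms described above are far more sophisticated, but global histogram equalization, a classic and much simpler illumination normalization step, gives the flavor of preprocessing: it remaps gray levels so that the image spans the full intensity range before matching. This is a swapped-in textbook technique, not any vendor’s method, and it does nothing for pose. The image representation (a list of rows of 8-bit values) is an assumption.

```python
from typing import List


def equalize_histogram(image: List[List[int]], levels: int = 256) -> List[List[int]]:
    """Global histogram equalization of a grayscale image given as rows of ints in [0, levels)."""
    pixels = [p for row in image for p in row]
    n = len(pixels)
    if n == 0:
        return []
    # Histogram of gray levels.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Cumulative distribution: running pixel count up to each level.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)  # count at the darkest level present
    if cdf_min == n:                          # single gray level: nothing to stretch
        return [row[:] for row in image]
    # Look-up table spreading the occupied levels over [0, levels - 1].
    lut = [round((c - cdf_min) / (n - cdf_min) * (levels - 1)) if c > 0 else 0
           for c in cdf]
    return [[lut[p] for p in row] for row in image]
```

On a dark, low-contrast patch such as [[50, 50], [60, 70]], the result is [[0, 0], [128, 255]]: the same face photographed under dimmer light would be stretched toward a similar intensity distribution.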

Multiple Images. Another way to improve performance with more data is to store multiple images of each person in the database. The rationale is that a probe image is more likely to match at least one of several enrolled images captured under different conditions.
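A minimal sketch of matching against several enrolled images per person, assuming each face has already been reduced to a numeric feature vector. The vectors, person names, threshold, and function names are hypothetical. Each person is scored by their best-matching template, so a single enrolled image taken under conditions similar to the probe’s is enough.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def identify(probe: Sequence[float],
             gallery: Dict[str, List[Sequence[float]]],
             threshold: float) -> Optional[Tuple[str, float]]:
    """Return (person_id, score) for the best match at or above threshold, else None.

    A person's score is the similarity of their best-matching enrolled template.
    """
    best_id, best_score = None, -math.inf
    for person_id, templates in gallery.items():
        score = max(cosine_similarity(probe, t) for t in templates)
        if score > best_score:
            best_id, best_score = person_id, score
    if best_id is not None and best_score >= threshold:
        return best_id, best_score
    return None
```

With only the template [1.0, 0.0] enrolled for "alice", the probe [0.6, 0.8] scores 0.6 against her and misses a 0.9 threshold; enrolling a second image, [0.7, 0.7], raises her score to about 0.99.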

Some systems synthesize multiple images from a single shot. XID Technologies’ predictive face synthesis technique simulates the effects of changing light and orientation, as well as the addition of glasses, facial hair and other variable features, automatically producing several images from one 2-D picture.

An alternative way of using multiple images is to work with video data. Although the most recent government test of face recognition systems, FRVT 2002, showed no advantage of video over still images, one might argue that, if processed appropriately, the additional data in video should eventually yield better performance for a face recognition system.
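One simple way to process video "appropriately", offered only as an illustration and not as a method evaluated in FRVT 2002, is to score every frame of a tracked face against an enrolled template and average only the best few, so that frames spoiled by motion blur, poor lighting, or a turned head are ignored. The function name and the top_k default are hypothetical.

```python
from typing import Sequence


def fuse_frame_scores(frame_scores: Sequence[float], top_k: int = 3) -> float:
    """Combine per-frame match scores for one tracked face by averaging the best top_k."""
    if not frame_scores:
        raise ValueError("need at least one frame score")
    if top_k <= 0:
        raise ValueError("top_k must be positive")
    best = sorted(frame_scores, reverse=True)[:top_k]  # keep the most confident frames
    return sum(best) / len(best)
```

Averaging several good frames rather than taking the single best one is meant to damp the occasional spurious high score that a lone frame can produce.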

Standards Development
Shortly after the passage of the USA Patriot Act and the Enhanced Border Security and Visa Entry Reform Act of 2002, the U.S. government turned to the International Organization for Standardization (ISO) for international standards for biometric information. The American National Standards Institute (ANSI) was approached with similar requests at the domestic level.

ANSI’s International Committee for Information Technology Standards (INCITS) developed its face recognition format for data interchange last year, and ISO is expected to follow this year. Because the field is growing quickly, INCITS has already decided to amend its standard to address 3-D face images. Both organizations have also made progress in standardizing biometric vocabulary: INCITS is adopting the terms False Accept Rate (FAR) and False Reject Rate (FRR) in its Biometric Performance Testing and Reporting Standard, currently under development, and ISO will likely do the same.

How to Evaluate Systems
Because standardized testing regimes for biometric systems are still being developed, objectively evaluating and comparing face recognition products can be quite a challenge. Manufacturers and vendors are typically exuberant about the performance of their own systems, but their claims do not always amount to accurate assessments.

Government Testing. The U.S. Department of Defense has been evaluating face recognition systems since 1993, and its tests have become the most reliable indicators of system viability.

Both research and commercial organizations around the world have been invited to take part in the Face Recognition Grand Challenge (FRGC), administered from May 2004 through July 2005. The FRGC will be followed by the Face Recognition Vendor Test 2005 (FRVT 2005), scheduled for August/September. Both evaluations test all participants on the same databases in order to achieve fair and objective comparisons. FRVT 2005 could become the most important industry-shaping event for years to come. Companies wanting to integrate face recognition solutions into their products pay close attention to FRVT results and use them as guidelines in deciding which vendor to partner with.

ROC Curves and Other Metrics. Probably the most important graph describing the performance of a face recognition system is its receiver operating characteristic (ROC) curve. The curve plots the system’s FAR against its identification rate (IR) at various threshold levels; essentially, it shows how the system trades off FAR against FRR. Note that for 1-to-1 matching (verification) we would speak of the verification rate (VR), which equals 1 − FRR, instead of IR.
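A minimal sketch of how ROC points are computed in the verification (1-to-1) case from labeled test scores: genuine scores come from comparing two images of the same person, impostor scores from comparing images of different people. It assumes higher scores mean more similar faces and that a comparison is accepted when its score meets the threshold; the function names and toy scores are illustrative.

```python
from typing import List, Sequence, Tuple


def far_frr(genuine: Sequence[float], impostor: Sequence[float],
            threshold: float) -> Tuple[float, float]:
    """FAR and FRR at one threshold; a comparison is accepted when score >= threshold."""
    far = sum(s >= threshold for s in impostor) / len(impostor)  # impostors wrongly accepted
    frr = sum(s < threshold for s in genuine) / len(genuine)     # genuine users wrongly rejected
    return far, frr


def roc_points(genuine: Sequence[float], impostor: Sequence[float],
               thresholds: Sequence[float]) -> List[Tuple[float, float]]:
    """One (FAR, verification rate) point per threshold, where VR = 1 - FRR."""
    points = []
    for t in thresholds:
        far, frr = far_frr(genuine, impostor, t)
        points.append((far, 1.0 - frr))
    return points


if __name__ == "__main__":
    genuine = [0.9, 0.8, 0.7, 0.4]
    impostor = [0.1, 0.3, 0.5, 0.6]
    print(roc_points(genuine, impostor, [0.0, 0.5, 1.0]))
    # prints [(1.0, 1.0), (0.5, 0.75), (0.0, 0.0)]
```

Sweeping many thresholds and plotting the resulting points traces the ROC curve; identification (1-to-many) adds a ranking step against the whole gallery that this sketch omits.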

Because the region of interest usually involves very small false accept rates, the FAR is often plotted on a logarithmic scale on the x-axis.

The ROC is an accepted standard for demonstrating the performance of a recognition system, but plotting FAR and FRR directly against the threshold can reveal their complementary nature more readily. At low thresholds there is virtually no false reject error (nearly every genuine user is matched), but the false accept error is high (many people are matched who should not be).

At high thresholds the pattern reverses: the false accept rate drops sharply, but the false reject rate rises, and it becomes harder and harder to match even legitimate enrollees.

The green curve, total error rate (TER), shows the sum of the two error types (FAR + FRR) as the threshold increases. If both types of error carry the same cost, the system’s minimum overall error is found at the lowest point of the green curve. For a low-security setting such as a public cafeteria, however, one might set the threshold somewhat lower to minimize false rejections, whereas for a nuclear facility one might set it higher to eliminate false accepts as much as possible.
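A minimal sketch of choosing an operating threshold from labeled test scores, under the same assumptions as a basic ROC computation (higher score means a better match; accept when score ≥ threshold). The function names, toy scores, and the cost weights cost_fa and cost_fr are hypothetical. With equal costs the quantity minimized is the TER; raising cost_fr, as a cafeteria operator might, pushes the chosen threshold down.

```python
from typing import Sequence, Tuple


def error_rates(genuine: Sequence[float], impostor: Sequence[float],
                t: float) -> Tuple[float, float]:
    """FAR and FRR at threshold t (accept when score >= t)."""
    far = sum(s >= t for s in impostor) / len(impostor)
    frr = sum(s < t for s in genuine) / len(genuine)
    return far, frr


def best_threshold(genuine: Sequence[float], impostor: Sequence[float],
                   thresholds: Sequence[float],
                   cost_fa: float = 1.0, cost_fr: float = 1.0) -> Tuple[float, float]:
    """Return (threshold, cost) minimizing cost_fa * FAR + cost_fr * FRR.

    With cost_fa == cost_fr == 1.0 the cost is exactly the TER.
    """
    best = None
    for t in thresholds:
        far, frr = error_rates(genuine, impostor, t)
        cost = cost_fa * far + cost_fr * frr
        if best is None or cost < best[1]:
            best = (t, cost)
    return best


if __name__ == "__main__":
    genuine = [0.9, 0.8, 0.7, 0.4]
    impostor = [0.1, 0.3, 0.5, 0.6]
    grid = [i / 10 for i in range(11)]
    print(best_threshold(genuine, impostor, grid))               # equal costs: (0.7, 0.25)
    print(best_threshold(genuine, impostor, grid, cost_fr=10.0)) # rejections costly: (0.4, 0.5)
```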

Another good indicator of system performance is the Equal Error Rate (EER), the error rate at the threshold where the FAR equals the FRR or, graphically, where the FAR and FRR curves cross. Because the TER at that crossing point is exactly 2 × EER, the minimum total error rate, min(TER), can be no larger than 2 × EER, although the two numbers are usually close to each other.
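A minimal sketch of estimating the EER from labeled scores, assuming higher scores mean better matches and acceptance when score ≥ threshold. With finite test data, FAR and FRR rarely cross exactly, so this version tries each observed score as a threshold, keeps the one where FAR and FRR are closest, and reports their average. That approximation is one common convention, not the only one; the function name and toy data are illustrative.

```python
from typing import Sequence, Tuple


def equal_error_rate(genuine: Sequence[float],
                     impostor: Sequence[float]) -> Tuple[float, float]:
    """Return (threshold, approximate EER) where |FAR - FRR| is smallest."""
    best_t, best_gap, eer = None, float("inf"), None
    for t in sorted(set(genuine) | set(impostor)):  # every observed score is a candidate
        far = sum(s >= t for s in impostor) / len(impostor)
        frr = sum(s < t for s in genuine) / len(genuine)
        gap = abs(far - frr)
        if gap < best_gap:
            best_t, best_gap, eer = t, gap, (far + frr) / 2
    return best_t, eer


if __name__ == "__main__":
    print(equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.1, 0.3, 0.5, 0.6]))
    # prints (0.6, 0.25)
```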

Practically any system can show perfect or near-perfect performance on FAR or FRR alone, but these numbers are meaningful only when presented together. In other words, ask either for the system’s minimum total error, min(TER), or for FAR and FRR measured at the same threshold. Alternatively, the EER is a good threshold-independent summary of overall performance. Equally important are the quality and size of the database used to test the system: the FRVT 2002 results show that face recognition performance drops linearly with the logarithm of the database size.
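One way to write that relationship, with α and β as constants that would have to be fitted to a particular system’s test data (they are not values reported by FRVT 2002), with identification rate IR and database size N:

```latex
\mathrm{IR}(N) \approx \alpha - \beta \log_{10} N
\qquad\Rightarrow\qquad
\mathrm{IR}(10N) - \mathrm{IR}(N) \approx -\beta
```

Each tenfold increase in database size therefore costs roughly the same β of identification rate, which is why a result quoted without its database size is hard to interpret.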

Poor-quality or uncontrolled imagery can also dramatically reduce performance, which is one of the reasons so much effort is being invested in standardizing picture taking and image quality.

A host of technological advances is accelerating the improvement in face recognition accuracy. Just as important, public attitudes are becoming less negative; the specter of DNA identification, for example, makes face recognition look less intimidating. Policy efforts such as the development of rigorous standards also underscore the likelihood that face recognition has gained the traction needed to become an enduring security tool. As deployments increase, and as face recognition increasingly becomes a component of multi-modality solutions, revenues and investor confidence will grow, providing resources for further research and development. The resulting innovations will in turn improve performance, and the cycle will repeat. In other words, industry dynamics should keep pushing this cycle upward for some time to come.

Peter Kalocsai, PhD. (pkalocsai@pelco.com), is a senior research scientist in Pelco’s R&D department, focusing on face and object recognition, intelligent video applications and statistical analysis.

Nicholas Imparato, PhD. (imparato@usfca.edu), is a professor at the School of Business, University of San Francisco, a Research Fellow at the Hoover Institution, Stanford University and an industry advisor.
