Attribute and Simile Classifiers for Face Verification

Keywords: face verification, attributes, similes, attribute classifiers
Spring 2009 - Summer 2011

Description

In this work, we advance the state-of-the-art for face verification ("are these two images of the same person?") in uncontrolled settings with non-cooperative subjects. To this end, we present two novel and complementary methods for face verification. Common to both methods is the idea of extracting and comparing "high-level" visual features, or traits, of a face image that are insensitive to pose, illumination, expression, and other imaging conditions. Our first method -- based on attribute classifiers -- uses binary classifiers trained to recognize the presence, absence, or degree of describable visual attributes (gender, race, age, hair color, etc.). Our second method -- based on simile classifiers -- removes the manual labeling required to train attribute classifiers. The simile classifiers are binary classifiers trained to recognize the similarity of faces, or regions of faces, to specific reference people. The idea is to automatically learn similes that distinguish a person from the general population. An unseen face might be described as having a mouth that looks like Barack Obama's and a nose that looks like Owen Wilson's.
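The attribute-classifier idea can be sketched in miniature. The following is a hypothetical stand-in, not the published pipeline: the actual system learns from low-level image features extracted from face regions, whereas here a simple logistic-regression learner on synthetic features (all names and data are illustrative) produces a signed trait score for a single attribute:

```python
import numpy as np

def train_attribute_classifier(X_pos, X_neg, lr=0.1, epochs=200):
    """Train a binary linear classifier (logistic regression) for one
    attribute, e.g. "male" or "blond hair", from labeled face features.
    Returns weights; the raw score w.x + b serves as the trait value."""
    X = np.vstack([X_pos, X_neg])
    y = np.concatenate([np.ones(len(X_pos)), np.zeros(len(X_neg))])
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # sigmoid probabilities
        w -= lr * (X.T @ (p - y)) / len(y)       # gradient step on weights
        b -= lr * np.mean(p - y)                 # gradient step on bias
    return w, b

def trait_score(w, b, x):
    # Signed score: positive means "attribute present", and its magnitude
    # can express the degree of the attribute.
    return float(x @ w + b)

# Synthetic "face features": positives cluster near +1, negatives near -1.
rng = np.random.default_rng(0)
X_pos = rng.normal(+1.0, 0.3, size=(50, 8))
X_neg = rng.normal(-1.0, 0.3, size=(50, 8))
w, b = train_attribute_classifier(X_pos, X_neg)
print(trait_score(w, b, np.full(8, 1.0)))   # a clearly "positive" face
print(trait_score(w, b, np.full(8, -1.0)))  # a clearly "negative" face
```

A simile classifier has the same form; only the training data changes, with positives drawn from one reference person's face region and negatives from other people.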

Comparing two faces is simply a matter of comparing trait vectors (i.e., from the attribute and/or simile classifiers). We present experimental evaluation results on the challenging Labeled Faces in the Wild (LFW) data set. This data set is remarkable in its variability, exhibiting all of the differences mentioned above. Both the attribute and simile classifiers achieve state-of-the-art results on the LFW "restricted images" benchmark, and a hybrid of the two results in a 31.68% drop in error rates compared to the previous best. To our knowledge, this is the first time that a list of such visual traits has been used for face verification. For testing beyond the LFW data set, we introduce PubFig -- a new data set of real-world images of public figures (celebrities and politicians) acquired from the internet. The PubFig data set is both larger (60,000 images) and deeper (on average 300 images per individual) than existing data sets, and allows us to present verification results broken out by pose, illumination, and expression. Finally, we measure human performance on LFW, showing that humans do very well on it -- given image pairs, verification of identity can be performed almost without error.
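The comparison step can be sketched as follows. The published system trains a second-stage verification classifier on pairs of trait vectors, so the fixed cosine-similarity threshold below is only a minimal, hypothetical stand-in for that learned decision:

```python
import numpy as np

def verify(traits_a, traits_b, threshold=0.5):
    """Decide "same person?" from two trait vectors, i.e. the concatenated
    outputs of all attribute and/or simile classifiers for each face image.
    A fixed cosine-similarity threshold stands in for the learned
    pair classifier used in the actual system."""
    a = np.asarray(traits_a, dtype=float)
    b = np.asarray(traits_b, dtype=float)
    cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return bool(cos >= threshold)

# Two images of one person should yield similar trait vectors...
same = verify([0.9, -0.7, 0.2, 0.8], [0.8, -0.6, 0.3, 0.9])
# ...while different people disagree on many traits.
diff = verify([0.9, -0.7, 0.2, 0.8], [-0.8, 0.6, 0.3, -0.9])
print(same, diff)  # True False
```

Because the trait values are designed to be insensitive to pose, illumination, and expression, even this crude comparison captures the core of the approach: identity is carried by the traits, not by the raw pixels.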

This research was funded in part by NSF award IIS-03-25867 and ONR award N00014-08-1-0638. We are grateful to Omron Technologies for providing us the OKAO face detection system.

Publications

  • "Describable Visual Attributes for Face Images," (PhD Thesis)
    Technical Report CUCS-035-11, Department of Computer Science, Columbia University,
    August 2011.
  • "Describable Visual Attributes for Face Verification and Image Search,"
    IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI),
    vol. 33, no. 10, pp. 1962--1977, October 2011.
  • "Attribute and Simile Classifiers for Face Verification," (oral presentation)
    Proceedings of the 12th IEEE International Conference on Computer Vision (ICCV),
    October 2009.

Databases

PubFig: Public Figures Face Database:

60,000 face images with several face verification benchmarks, and 65 automatically computed attribute labels for 42,000 of the images

Videos

Describable Visual Attributes for Face Search and Recognition:

Talk given at UNC Chapel Hill, May 25, 2011

Images

Training images for attribute classifiers:

Each row shows training examples of face images that match the given attribute label (positive examples) and those that don't (negative examples). We have over a thousand training images for each of our 65 attributes. Accuracies for each attribute classifier are shown in the next image.
Accuracies of attribute classifiers:

We present accuracies of the 65 attribute classifiers trained for our system. Example training images for the attributes shown in bold appear in the previous image.
Amazon Mechanical Turk job for labeling attributes:

We use Amazon Mechanical Turk to label images with attributes. This online service allows us to easily and inexpensively label images using large numbers of human workers. This image shows an example of our attribute labeling jobs. We were able to collect over 125,000 human labels in one month, at a total cost of $5,000.
Attribute classifier outputs:

An attribute classifier can be trained to recognize the presence or absence of a describable aspect of visual appearance. The responses for several such attribute classifiers are shown for a pair of images of Halle Berry. Note that the "flash" and "shiny skin" attributes produce very different responses, while the responses for the remaining attributes are in strong agreement despite the changes in pose, illumination, expression, and image quality.
Training images for simile classifiers:

Each simile classifier is trained using several images of a specific reference person, limited to a small face region such as the eyes, nose, or mouth. We show here three positive and three negative examples for four regions on two of the reference people used to train these classifiers.
Simile classifier outputs:

We use a large number of "simile" classifiers trained to recognize the similarities of parts of faces to specific reference people. The responses for several such simile classifiers are shown for a pair of images of Harrison Ford. R_j denotes reference person j, so the first bar on the left displays the similarity to the eyes of reference person 1. Note that the responses are, for the most part, in agreement despite the changes in pose, illumination, and expression.
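Assembling the full simile trait vector can be sketched as below. The "classifiers" here are hypothetical stand-ins (a cosine-similarity score against a stored reference template per region); the real system trains a discriminative classifier for each (reference person, region) pair:

```python
import numpy as np

# A simile trait vector concatenates, for each (reference person, face
# region) pair, the score of a classifier answering "does this region
# look like that reference person's?". Names below are illustrative.
REGIONS = ["eyes", "nose", "mouth"]

def simile_vector(face_regions, reference_templates):
    """face_regions: {region: feature vector} for one face image.
    reference_templates: {person: {region: template vector}}."""
    scores = []
    for person, templates in sorted(reference_templates.items()):
        for region in REGIONS:
            x, t = face_regions[region], templates[region]
            # Cosine similarity as a toy "simile classifier" score.
            scores.append(x @ t / (np.linalg.norm(x) * np.linalg.norm(t)))
    return np.array(scores)

# Toy references R1 and R2 with opposite region templates.
refs = {"R1": {r: np.ones(4) for r in REGIONS},
        "R2": {r: -np.ones(4) for r in REGIONS}}
face = {r: np.ones(4) for r in REGIONS}
v = simile_vector(face, refs)
print(v)  # high similarity to R1 in every region, low to R2
```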
Face Verification Results on LFW:

Performance of our attribute classifiers, simile classifiers, and a hybrid of the two are shown in solid red, blue, and green, respectively. All three of our methods outperform all previous methods (dashed lines). Our highest accuracy is 85.29%, which corresponds to a 31.68% lower error rate than the previous state-of-the-art.
Amazon Mechanical Turk job for human verification:

We asked human users on Amazon Mechanical Turk to perform the face verification task on the LFW data set. This image shows an example of what these jobs looked like. Using a total of 240,000 user responses, we were able to plot human performance on LFW.
Human Face Verification Results on LFW:

Human performance on LFW is almost perfect (99.20%) when people are shown the original images (red line). Showing a more tightly cropped version of the images (blue line) drops accuracy to 97.53%, due to the reduced context available. The green line shows that even with an inverse crop, i.e., when only the context is shown, humans still perform remarkably well, at 94.27%. This highlights the strong context cues available in the LFW data set. All of our methods mask out the background to avoid using this information.
The PubFig Data Set:

We show example images for the 140 people used for verification tests on the PubFig benchmark. Below each image is the total number of face images for that person in the entire data set.
Face Verification Results on PubFig:

Our performance on the entire benchmark set of 20,000 pairs using attribute classifiers is shown in black. Performance on the pose, illumination, and expression subsets of the benchmark are shown in red, blue, and green, respectively. For each subset, the solid lines show results for the "easy" case (frontal pose/lighting or neutral expression), and dashed lines show results for the "difficult" case (non-frontal pose/lighting, or non-neutral expression).