Attribute and Simile Classifiers for Face Verification
Spring 2009 - Summer 2011
Description
In this work, we advance the state-of-the-art for face verification ("are these two images of the same person?") in uncontrolled settings with non-cooperative subjects. To this end, we present two novel and complementary methods for face verification. Common to both methods is the idea of extracting and comparing "high-level" visual features, or traits, of a face image that are insensitive to pose, illumination, expression, and other imaging conditions. Our first method -- based on attribute classifiers -- uses binary classifiers trained to recognize the presence, absence, or degree of describable visual attributes (gender, race, age, hair color, etc.). Our second method -- based on simile classifiers -- removes the manual labeling required to train attribute classifiers. The simile classifiers are binary classifiers trained to recognize the similarity of faces, or regions of faces, to specific reference people. The idea is to automatically learn similes that distinguish a person from the general population. An unseen face might be described as having a mouth that looks like Barack Obama's and a nose that looks like Owen Wilson's.
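To make this concrete, here is a minimal sketch of how one such binary classifier could be trained, assuming aligned, same-sized face crops. The `extract_features` helper is a hypothetical placeholder for the low-level features our system selects, and the RBF SVM is one reasonable classifier choice for illustration, not a full description of our pipeline.

    # Minimal sketch of training one attribute classifier (e.g., "male").
    # Assumes aligned, same-sized face crops; `extract_features` is a
    # hypothetical stand-in for the paper's low-level features.
    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    def extract_features(face_image):
        """Hypothetical placeholder: flatten an aligned face crop into a
        fixed-length descriptor; any standard descriptor would do here."""
        return np.asarray(face_image, dtype=np.float64).ravel()

    def train_attribute_classifier(positive_faces, negative_faces):
        """Fit a binary classifier for the presence/absence of one attribute."""
        X = np.array([extract_features(f) for f in positive_faces + negative_faces])
        y = np.array([1] * len(positive_faces) + [0] * len(negative_faces))
        clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
        clf.fit(X, y)
        return clf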
Comparing two faces is then simply a matter of comparing their trait vectors (i.e., the outputs of the attribute and/or simile classifiers). We present experimental results on the challenging Labeled Faces in the Wild (LFW) data set, which is remarkable in its variability, exhibiting all of the differences mentioned above. Both the attribute and simile classifiers achieve state-of-the-art results on the LFW "restricted images" benchmark, and a hybrid of the two yields a 31.68% drop in error rate compared to the previous best. To our knowledge, this is the first time that such a list of visual traits has been used for face verification. For testing beyond LFW, we introduce PubFig -- a new data set of real-world images of public figures (celebrities and politicians) acquired from the internet. PubFig is both larger (60,000 images) and deeper (on average 300 images per individual) than existing data sets, allowing us to present verification results broken out by pose, illumination, and expression. Finally, we measure human performance on LFW, showing that humans do very well: given image pairs, verification of identity can be performed almost without error.
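A minimal sketch of this comparison step is below, reusing the hypothetical `extract_features` helper from the previous sketch. The element-wise difference/product comparison of trait vectors is an illustrative assumption, not necessarily our exact formulation.

    # Minimal sketch of verification from trait vectors. The element-wise
    # difference/product pair features are an illustrative assumption.
    import numpy as np
    from sklearn.svm import SVC

    def trait_vector(face_image, classifiers):
        """Stack the real-valued outputs of all attribute/simile classifiers."""
        feats = extract_features(face_image)  # from the sketch above
        return np.array([c.decision_function([feats])[0] for c in classifiers])

    def pair_features(t1, t2):
        """Compare two trait vectors element-wise."""
        return np.concatenate([np.abs(t1 - t2), t1 * t2])

    def train_verifier(pairs, labels, classifiers):
        """Fit the final same/different classifier on labeled face pairs."""
        X = np.array([pair_features(trait_vector(a, classifiers),
                                    trait_vector(b, classifiers))
                      for a, b in pairs])
        return SVC(kernel="rbf").fit(X, np.asarray(labels))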
This research was funded in part by NSF award IIS-03-25867 and ONR award N00014-08-1-0638. We are grateful to Omron Technologies for providing us the OKAO face detection system.
Publications
- Neeraj Kumar,
  "Describable Visual Attributes for Face Images" (PhD thesis),
  Technical Report CUCS-035-11, Department of Computer Science, Columbia University,
  August 2011.
  [pdf] [bibtex] [slides (ppt)]

  @TechReport{nkthesis,
    author      = {Neeraj Kumar},
    title       = {Describable Visual Attributes for Face Images},
    institution = {Department of Computer Science, Columbia University},
    number      = {CUCS-035-11},
    month       = {August},
    year        = {2011}
  }

- Neeraj Kumar, Alexander C. Berg, Peter N. Belhumeur, and Shree K. Nayar,
  "Describable Visual Attributes for Face Verification and Image Search,"
  IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI),
  vol. 33, no. 10, pp. 1962--1977, October 2011.
  [pdf] [bibtex] [slides (ppt)] [video]

  @Article{facever_pami2011,
    author  = {Neeraj Kumar and Alexander C. Berg and Peter N. Belhumeur and Shree K. Nayar},
    title   = {Describable Visual Attributes for Face Verification and Image Search},
    journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI)},
    volume  = {33},
    number  = {10},
    pages   = {1962--1977},
    month   = {October},
    year    = {2011}
  }

- Neeraj Kumar, Alexander C. Berg, Peter N. Belhumeur, and Shree K. Nayar,
  "Attribute and Simile Classifiers for Face Verification" (oral presentation),
  Proceedings of the 12th IEEE International Conference on Computer Vision (ICCV),
  October 2009.
  [pdf] [bibtex] [slides (ppt)]

  @InProceedings{facever_iccv2009,
    author    = {Neeraj Kumar and Alexander C. Berg and Peter N. Belhumeur and Shree K. Nayar},
    title     = {Attribute and Simile Classifiers for Face Verification},
    booktitle = {The 12th IEEE International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2009}
  }
Databases
PubFig: Public Figures Face Database: 60,000 face images with several face verification benchmarks & 65 automatically computed attribute labels for 42,000 images.
Videos
Describable Visual Attributes for Face Search and Recognition: Talk given at UNC Chapel Hill, May 25, 2011.
Images
Training images for attribute classifiers: Each row shows training examples of face images that match the given attribute label (positive examples) and those that don't (negative examples). We have over a thousand training images for each of our 65 attributes. Accuracies for each attribute classifier are shown in the next image.
Accuracies of attribute classifiers: We present accuracies of the 65 attribute classifiers trained for our system. Example training images for the attributes in bold are shown in the previous image.
Amazon Mechanical Turk job for labeling attributes: We use Amazon Mechanical Turk to label images with attributes. This online service allows us to easily and inexpensively label images using large numbers of human workers. This image shows an example of our attribute labeling jobs. We were able to collect over 125,000 human labels in a month, for $5,000.
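As an illustration of how such raw labels could be turned into training data, here is a small sketch that aggregates multiple workers' votes per (image, attribute) pair by majority vote. The data layout and tie-handling are assumptions for illustration, not a description of our actual pipeline.

    # Minimal sketch: aggregate workers' binary votes into one training
    # label per (image, attribute) pair. The (image_id, attribute, vote)
    # record layout is an assumed format, not our actual one.
    from collections import defaultdict

    def aggregate_labels(responses):
        """responses: iterable of (image_id, attribute, vote), vote in {0, 1}.
        Returns {(image_id, attribute): majority_label}, skipping exact ties."""
        votes = defaultdict(list)
        for image_id, attribute, vote in responses:
            votes[(image_id, attribute)].append(vote)
        labels = {}
        for key, vs in votes.items():
            ones = sum(vs)
            if ones * 2 != len(vs):  # drop ties rather than guess
                labels[key] = int(ones * 2 > len(vs))
        return labels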
Attribute classifier outputs: An attribute classifier can be trained to recognize the presence or absence of a describable aspect of visual appearance. The responses of several such attribute classifiers are shown for a pair of images of Halle Berry. Note that the "flash" and "shiny skin" attributes produce very different responses, while the responses for the remaining attributes are in strong agreement despite the changes in pose, illumination, expression, and image quality.
Training images for simile classifiers: Each simile classifier is trained using several images of a specific reference person, limited to a small face region such as the eyes, nose, or mouth. We show here three positive and three negative examples for four regions on two of the reference people used to train these classifiers.
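For concreteness, here is a minimal sketch of training one such region-restricted classifier. The crop coordinates and the RBF SVM are illustrative assumptions on aligned, same-sized faces, not our exact regions or settings.

    # Minimal sketch of one simile classifier: positives are a single face
    # region cropped from images of one reference person; negatives are the
    # same region from other people. Crop boxes below are hypothetical.
    import numpy as np
    from sklearn.svm import SVC

    REGIONS = {"eyes":  (slice(20, 45), slice(10, 90)),
               "nose":  (slice(40, 70), slice(30, 70)),
               "mouth": (slice(70, 95), slice(25, 75))}

    def crop_region(face_image, region):
        """Cut a fixed region out of an aligned face crop."""
        rows, cols = REGIONS[region]
        return np.asarray(face_image, dtype=np.float64)[rows, cols]

    def train_simile_classifier(reference_faces, other_faces, region):
        """Binary classifier: does this region resemble the reference person's?"""
        X = np.array([crop_region(f, region).ravel()
                      for f in reference_faces + other_faces])
        y = np.array([1] * len(reference_faces) + [0] * len(other_faces))
        return SVC(kernel="rbf").fit(X, y)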
Simile classifier outputs: We use a large number of "simile" classifiers trained to recognize the similarities of parts of faces to specific reference people. The responses for several such simile classifiers are shown for a pair of images of Harrison Ford. R_j denotes reference person j, so the first bar on the left displays the similarity to the eyes of reference person 1. Note that the responses are, for the most part, in agreement despite the changes in pose, illumination, and expression.
Face Verification Results on LFW: Performance of our attribute classifiers, simile classifiers, and a hybrid of the two is shown in solid red, blue, and green, respectively. All three of our methods outperform all previous methods (dashed lines). Our highest accuracy is 85.29%, which corresponds to a 31.68% lower error rate than the previous state-of-the-art.
Amazon Mechanical Turk job for human verification: We asked human users on Amazon Mechanical Turk to perform the face verification task on the LFW data set. This image shows an example of what these jobs looked like. Using a total of 240,000 user responses, we were able to plot human performance on LFW.
Human Face Verification Results on LFW: Human performance on LFW is almost perfect (99.20%) when people are shown the original images (red line). Showing a more tightly cropped version of the images (blue line) drops their accuracy to 97.53%, due to the lack of available context. The green line shows that even with an inverse crop, i.e., when only the context is shown, humans still perform amazingly well, at 94.27%. This highlights the strong context cues available in the LFW data set. All of our methods mask out the background to avoid using this information.
The PubFig Data Set: We show example images for the 140 people used for verification tests on the PubFig benchmark. Below each image is the total number of face images for that person in the entire data set.
Face Verification Results on PubFig: Our performance on the entire benchmark set of 20,000 pairs using attribute classifiers is shown in black. Performance on the pose, illumination, and expression subsets of the benchmark is shown in red, blue, and green, respectively. For each subset, the solid lines show results for the "easy" case (frontal pose/lighting or neutral expression), and the dashed lines show results for the "difficult" case (non-frontal pose/lighting or non-neutral expression).