Photo Recall: Using the Internet to Label Your Photos

Keywords: photo organization, image search, web mining, content-based image retrieval, GPS, visual classifiers, natural language search, layered graph inference, events
Fall 2012 - Ongoing

Description

We describe a system for searching your personal photos using an extremely wide range of text queries, including dates and holidays (Halloween), named and categorical places (Empire State Building or park), events and occasions (Radiohead concert or wedding), activities (skiing), object categories (whales), attributes (outdoors), object instances (Mona Lisa), and any combination of these -- all with no manual labeling required. We accomplish this by correlating information in your photos -- the timestamps, GPS locations, and image pixels -- with information mined from the Internet. This includes matching dates to holidays listed on Wikipedia and GPS coordinates to places listed on Wikimapia, searching Google with places and dates to find named events, recognizing visual categories using classifiers either pre-trained on ImageNet or trained on-the-fly using results from Google Image Search, and finding object instances using interest point-based matching, again using results from Google Images. We tie all of these disparate sources of information together in a unified way, allowing for fast and accurate searches using whatever information you remember about a photo. We represent all information in our system in a layered graph, which prevents duplication of effort and data storage while allowing for fast searches, generating meaningful descriptions of search results, and even suggesting query completions to the user as she types, via auto-complete. We quantitatively evaluate several aspects of our system and show excellent performance in all respects.

This work was supported by funding from National Science Foundation grant IIS-1250793, Google, Adobe, Microsoft, Pixar, and the UW Animation Research Labs.

Publications

  • "Photo Recall: Using the Internet to Label Your Photos,"
    2nd Workshop on Web-scale Vision and Social Media (VSM) at CVPR 2014,
    June 2014.
  • "Photo Recall: Using the Internet to Label Your Photos," (extended abstract)
    Proceedings of the 23rd International Conference on World Wide Web companion,
    April 2014.

Videos

Photo Recall Introduction

Video describing Photo Recall and showcasing several results

Images

Methods

An overview of the different kinds of queries our system supports.
Indexing

Our system associates images with labels by matching different types of image data to various online sources, either in an initial indexing step or on-the-fly when the user issues a query. Here we show the different matching methods we use and the results returned for sample queries. (a) We match the datestamps stored in photos to a list of holidays from Wikipedia, allowing for queries like Saint Patrick's Day. (b) We look up GPS coordinates from photo metadata on Wikimapia to get place names and categories, allowing for searches like FAO Schwarz or toy shop. (c) We issue searches on Google for pairs of dates and place names to find which event took place there. We parse the results and apply some simple NLP to get event tags, like boston celtics. (d) We pretrain thousands of binary visual classifiers using categories from ImageNet, such as wedding or whale. (e) For things not covered in ImageNet, we issue queries on Google Images and train a binary visual classifier on-the-fly, such as green dress. (f) For finding object instances rather than categories, we can also match SIFT descriptors on-the-fly from Google Image search results, such as for a photo of the Liegende painting. Despite several sources of noise in the data and matching processes, we are able to return accurate results.
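Steps (a) and (b) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the HOLIDAYS and PLACES tables are tiny hand-written stand-ins for data mined from Wikipedia and Wikimapia, the coordinates are approximate, and places are treated as points with a fixed matching radius (max_dist_m), which may differ from how the actual system matches locations.

```python
from datetime import date
from math import asin, cos, radians, sin, sqrt

# Hypothetical stand-ins for holiday and place data mined from the web.
HOLIDAYS = {(3, 17): "Saint Patrick's Day", (10, 31): "Halloween"}
PLACES = [
    {"name": "FAO Schwarz", "category": "toy shop",
     "lat": 40.7638, "lon": -73.9730},   # approximate, for illustration only
    {"name": "Empire State Building", "category": "skyscraper",
     "lat": 40.7484, "lon": -73.9857},   # approximate, for illustration only
]


def holiday_tags(taken_on: date) -> list:
    """Return holiday names whose (month, day) matches the photo's datestamp."""
    name = HOLIDAYS.get((taken_on.month, taken_on.day))
    return [name] if name else []


def haversine_m(lat1, lon1, lat2, lon2) -> float:
    """Great-circle distance in meters between two latitude/longitude points."""
    earth_radius_m = 6371000.0
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * earth_radius_m * asin(sqrt(a))


def place_tags(lat, lon, max_dist_m=150.0) -> list:
    """Return the name and category of every known place within max_dist_m."""
    tags = []
    for place in PLACES:
        if haversine_m(lat, lon, place["lat"], place["lon"]) <= max_dist_m:
            tags += [place["name"], place["category"]]
    return tags
```

With these tables, a photo dated March 17 picks up the tag Saint Patrick's Day, and a photo taken a few meters from the FAO Schwarz entry picks up both the place name and its category, so either FAO Schwarz or toy shop would retrieve it.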
Layered Graph Representation

We represent all information in our system as nodes in a layered graph. Each colored box contains many nodes -- individual bits of information -- of a particular type (denoted by the name in the box). Lines between boxes indicate weighted connections between nodes of the two layers. Images are connected to their sensor values (timestamp and GPS) and to low-level visual features. These are mapped into semantic concepts (i.e., the things that people care about) through the use of Internet data sources, shown in parentheses. Finally, semantic nodes are exposed to search queries through the language layer, which contains text tags. By unifying all sources of information in this graph, we can easily incorporate new types of data to support novel types of queries, and perform fast and accurate search using any combination of terms.
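One way to picture this structure is the minimal Python sketch below. It is not the system's implementation: the LayeredGraph class, the four layer names, the node identifiers, and the restriction that edges only join directly adjacent layers are all assumptions chosen to keep the example small.

```python
from collections import defaultdict

# Layer order from bottom to top; the names are assumptions based on the caption.
LAYERS = ("image", "sensor", "semantic", "language")


class LayeredGraph:
    """Nodes grouped into ordered layers, with weighted edges between layers."""

    def __init__(self):
        self.layer_of = {}              # node -> layer name
        self.up = defaultdict(dict)     # node -> {node one layer above: weight}
        self.down = defaultdict(dict)   # node -> {node one layer below: weight}

    def add_node(self, node, layer):
        if layer not in LAYERS:
            raise ValueError(f"unknown layer: {layer}")
        self.layer_of[node] = layer

    def connect(self, lower, upper, weight=1.0):
        # Simplifying assumption: an edge joins a layer to the one directly above.
        gap = LAYERS.index(self.layer_of[upper]) - LAYERS.index(self.layer_of[lower])
        if gap != 1:
            raise ValueError("edges must join a layer to the one directly above it")
        self.up[lower][upper] = weight
        self.down[upper][lower] = weight


def build_example():
    """Two photos taken on the same date share one sensor node and its labels."""
    g = LayeredGraph()
    for node, layer in [
        ("img1", "image"), ("img2", "image"),
        ("date:03-17", "sensor"),
        ("holiday:saint-patricks-day", "semantic"),
        ("saint patrick's day", "language"),
    ]:
        g.add_node(node, layer)
    g.connect("img1", "date:03-17")
    g.connect("img2", "date:03-17")
    g.connect("date:03-17", "holiday:saint-patricks-day")
    g.connect("holiday:saint-patricks-day", "saint patrick's day")
    return g
```

In build_example, both photos point at a single date node, so the holiday lookup and its text tag are stored once and reused. This is the kind of sharing the caption credits with avoiding duplicated effort and storage.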
Flow computation

Given the graph shown in (a) with specified edge weights, we examine what happens when the user does a search for wedding in Israel. First, the search flow $F_{\mathrm{search}}$, shown in the 3rd column of (b), is computed by assigning scores at the language layer (top) based on string similarity, and then propagating scores down. Since Image 2 gets the higher score, it will be displayed first in the results. Then, to generate the description for this image, we first compute its flow $F_{I_2}$ by propagating up the layers (4th column), and then add this to $\lambda F_{\mathrm{search}}$ (with $\lambda = 10$) to obtain the final description flow $F^{\mathrm{desc}}_{I_2} = F_{I_2} + \lambda F_{\mathrm{search}}$ (5th column). We pick the highest ranked language nodes of each type (what, place, city, country, etc.) to fill out the description template.
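The two flows can be sketched in a few lines of Python. This is a toy under stated assumptions, not the system's implementation. The graph, its weights, and the node names are hypothetical; the sensor layer is omitted for brevity; exact word matching stands in for string similarity; and scores are assumed to propagate as weighted sums, since the caption does not spell out the propagation rule. Only the value 10 for the search-flow weight comes from the caption.

```python
LAMBDA = 10.0  # weight on the search flow, as stated in the caption

# Hypothetical toy graph, top to bottom: language -> semantic -> image.
LANG_TO_SEM = {
    "wedding": {"sem:wedding": 1.0},
    "israel": {"sem:israel": 1.0},
    "beach": {"sem:beach": 1.0},
}
SEM_TO_IMG = {
    "sem:wedding": {"img1": 0.4, "img2": 0.9},
    "sem:israel": {"img2": 0.8},
    "sem:beach": {"img2": 0.3},
}
LAYERS_DOWN = [LANG_TO_SEM, SEM_TO_IMG]


def push(scores, edges):
    """Propagate scores across one set of edges as a weighted sum (assumed rule)."""
    out = {}
    for node, score in scores.items():
        for nbr, weight in edges.get(node, {}).items():
            out[nbr] = out.get(nbr, 0.0) + score * weight
    return out


def reverse(edges):
    """Flip edge direction so scores can be pushed upward."""
    rev = {}
    for node, nbrs in edges.items():
        for nbr, weight in nbrs.items():
            rev.setdefault(nbr, {})[node] = weight
    return rev


def search_flow(query, layers_down):
    """Score language tags against the query, then propagate down to the images."""
    words = set(query.lower().split())
    flow = [{tag: 1.0 for tag in layers_down[0] if tag in words}]
    for edges in layers_down:
        flow.append(push(flow[-1], edges))
    return flow  # flow[0] holds language scores, flow[-1] holds image scores


def image_flow(image, layers_down):
    """Start from one image and propagate up to the language layer."""
    flow = [{image: 1.0}]
    for edges in reversed(layers_down):
        flow.append(push(flow[-1], reverse(edges)))
    return flow[::-1]  # same top-to-bottom order as search_flow


def description_flow(image, query, layers_down, lam=LAMBDA):
    """Per layer, add the image flow to lam times the search flow."""
    f_search = search_flow(query, layers_down)
    f_image = image_flow(image, layers_down)
    return [
        {n: fi.get(n, 0.0) + lam * fs.get(n, 0.0) for n in set(fi) | set(fs)}
        for fi, fs in zip(f_image, f_search)
    ]
```

On this toy graph, the query wedding in Israel scores img2 above img1, mirroring the caption. In the description flow for img2, the tags wedding and israel outrank beach because the search flow term boosts whatever the user actually asked for.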
Results for the query burning man

Results for the query christmas

Results for the query Deerhoof

Results for the query fireworks

Results for the query flight of the conchords

Results for the query flowers

Results for the query grand canyon

Results for the query independence day

Results for the query Italian Grand Prix

Results for the query La Vie by Chagall

Results for the query Metallica

Results for the query paintball

Results for the query PLUG awards

Results for the query portrait

Results for the query railway station

Results for the query skiing

Results for the query sunset at the bay

Results for the query wedding in israel

Results for the query wedding, new jersey

Results for the query whale