Photo Recall: Using the Internet to Label Your Photos
Fall 2012 - Ongoing
Description
We describe a system for searching your personal photos using an extremely wide range of text queries, including dates and holidays (Halloween), named and categorical places (Empire State Building or park), events and occasions (Radiohead concert or wedding), activities (skiing), object categories (whales), attributes (outdoors), object instances (Mona Lisa), and any combination of these -- all with no manual labeling required. We accomplish this by correlating information in your photos -- the timestamps, GPS locations, and image pixels -- with information mined from the Internet. This includes matching dates to holidays listed on Wikipedia; matching GPS coordinates to places listed on Wikimapia; querying Google with place-date pairs to find named events; recognizing visual categories with classifiers either pre-trained on ImageNet or trained on-the-fly from Google Image Search results; and matching object instances with interest points, again using results from Google Images. We tie all of these disparate sources of information together in a unified way, allowing for fast and accurate searches using whatever information you remember about a photo. We represent all information in our system in a layered graph, which avoids duplicated effort and data storage while simultaneously allowing for fast searches, generating meaningful descriptions of search results, and even suggesting query completions to the user as she types, via auto-complete. We quantitatively evaluate several aspects of our system and show excellent performance in all respects.
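As a concrete illustration of the auto-complete behavior mentioned above, here is a minimal sketch of prefix matching over the text tags in the graph's language layer. The tag vocabulary and the result limit are placeholders for illustration, not the system's actual index.

```python
import bisect

# Hypothetical tag vocabulary harvested from the language layer of the graph.
TAGS = sorted([
    "empire state building", "halloween", "mona lisa", "outdoors",
    "park", "radiohead concert", "skiing", "wedding", "whale",
])

def complete(prefix, limit=5):
    """Suggest tags starting with `prefix`, via binary search on the sorted list."""
    prefix = prefix.lower()
    start = bisect.bisect_left(TAGS, prefix)
    suggestions = []
    for tag in TAGS[start:]:
        if not tag.startswith(prefix):
            break  # sorted order: no later tag can match
        suggestions.append(tag)
        if len(suggestions) == limit:
            break
    return suggestions

print(complete("w"))  # ['wedding', 'whale']
```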
This work was supported by funding from National Science Foundation grant IIS-1250793, Google, Adobe, Microsoft, Pixar, and the UW Animation Research Labs.
Publications
- "Photo Recall: Using the Internet to Label Your Photos," 2nd Workshop on Web-scale Vision and Social Media (VSM) at CVPR 2014, June 2014.
- "Photo Recall: Using the Internet to Label Your Photos" (extended abstract), Proceedings of the 23rd International Conference on World Wide Web Companion, April 2014.
Videos
Photo Recall Introduction: Video describing Photo Recall and showcasing several results.
Images
Methods: An overview of the different kinds of queries our system supports.
Indexing: Our system associates images with labels by matching different types of image data to various online sources, either in an initial indexing step or on-the-fly when the user issues a query. Here we show the different methods we support and results returned from sample queries. (a) We match the datestamps stored in photos to a list of holidays from Wikipedia, allowing for queries like Saint Patrick's Day. (b) We look up GPS coordinates from photo metadata on Wikimapia to get place names and categories, allowing for searches like FAO Schwarz or toy shop. (c) We issue Google searches for pairs of dates and place names to find what event took place there, parsing the results with some simple NLP to get event tags like boston celtics. (d) We pre-train thousands of binary visual classifiers using categories from ImageNet, such as wedding or whale. (e) For things not covered in ImageNet, we issue queries on Google Images and train a binary visual classifier on-the-fly, such as for green dress. (f) For finding object instances rather than categories, we can also match SIFT descriptors on-the-fly from Google Image search results, such as for a photo of the Liegende painting. Despite several sources of noise in the data and matching processes, we are able to return accurate results.
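The instance matching in step (f) can be sketched with standard interest-point tools. The following uses OpenCV's SIFT implementation with Lowe's ratio test; the 0.75 ratio threshold and the minimum of 10 good matches are illustrative choices, not the system's tuned parameters.

```python
import cv2

def is_instance_match(query_path, photo_path, min_good=10):
    """Return True if the photo likely contains the queried object instance."""
    query = cv2.imread(query_path, cv2.IMREAD_GRAYSCALE)  # e.g. a Google Images result
    photo = cv2.imread(photo_path, cv2.IMREAD_GRAYSCALE)  # a photo from the user's collection

    sift = cv2.SIFT_create()
    _, des_q = sift.detectAndCompute(query, None)
    _, des_p = sift.detectAndCompute(photo, None)
    if des_q is None or des_p is None:
        return False  # no interest points detected in one of the images

    # For each query descriptor, find its two nearest neighbors in the photo
    # and keep only unambiguous matches (Lowe's ratio test).
    matcher = cv2.BFMatcher()
    matches = matcher.knnMatch(des_q, des_p, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]
    return len(good) >= min_good
```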
Layered Graph Representation: We represent all information in our system as nodes in a layered graph. Each colored box contains many nodes -- individual bits of information -- of a particular type (denoted by the name in the box). Lines between boxes indicate weighted connections between nodes of the two layers. Images are connected to their sensor values -- timestamp and GPS -- and to low-level visual features. These are mapped into semantic concepts (i.e., the things people care about) through the use of Internet data sources, shown in parentheses. Finally, semantic nodes are exposed to search queries through the language layer, which contains text tags. By unifying all sources of information in this graph, we can easily incorporate new types of data to support novel types of queries, and perform fast and accurate search using any combination of terms.
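A minimal sketch of such a layered graph, assuming just four layers (image, sensor, semantic, language) and dictionary-based adjacency; the node names and edge weights below are made up for illustration.

```python
from collections import defaultdict

class LayeredGraph:
    """Nodes live in named layers; weighted edges connect nodes of adjacent layers."""

    def __init__(self, layers):
        self.layers = layers                        # ordered layer names
        self.nodes = {name: set() for name in layers}
        self.edges = defaultdict(dict)              # node -> {neighbor: weight}

    def add_node(self, layer, node):
        self.nodes[layer].add(node)

    def add_edge(self, u, v, weight):
        self.edges[u][v] = weight
        self.edges[v][u] = weight                   # connections are bidirectional

# Illustrative contents, mirroring the figure.
g = LayeredGraph(["image", "sensor", "semantic", "language"])
g.add_node("image", "IMG_0042.jpg")
g.add_node("sensor", ("gps", 31.78, 35.21))
g.add_node("semantic", "wedding venue")             # mapped via Wikimapia
g.add_node("language", "wedding")
g.add_edge("IMG_0042.jpg", ("gps", 31.78, 35.21), 1.0)
g.add_edge(("gps", 31.78, 35.21), "wedding venue", 0.9)
g.add_edge("wedding venue", "wedding", 0.8)
```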
Flow computation: Given the graph shown in (a) with specified edge weights, we examine what happens when the user searches for wedding in Israel. First, the search flow F_search, shown in the 3rd column of (b), is computed by assigning scores at the language layer (top) based on string similarity, and then propagating those scores down through the graph toward the image layer.
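A minimal sketch of this downward propagation, assuming each node's score is the maximum over its upper-layer neighbors of (neighbor score × edge weight); the combination rule, similarity threshold, and toy graph are assumptions for illustration, not the paper's exact formulation.

```python
import difflib

# Toy graph: edges[u] maps an upper-layer node u to {lower-layer neighbor: weight}.
edges = {
    "wedding": {"wedding venue": 0.8},
    "israel":  {"gps cluster: israel": 0.9},
    "wedding venue":       {"IMG_0042.jpg": 1.0},
    "gps cluster: israel": {"IMG_0042.jpg": 1.0, "IMG_0101.jpg": 1.0},
}
language_tags = ["wedding", "israel", "whale"]

def search_flow(query_terms):
    """Seed scores at the language layer by string similarity, then push them down."""
    scores = {}
    for term in query_terms:
        for tag in language_tags:
            sim = difflib.SequenceMatcher(None, term, tag).ratio()
            if sim > 0.6:                           # illustrative threshold
                scores[tag] = max(scores.get(tag, 0.0), sim)
    frontier = list(scores)
    while frontier:                                 # propagate toward the image layer
        nxt = []
        for u in frontier:
            for v, w in edges.get(u, {}).items():
                s = scores[u] * w
                if s > scores.get(v, 0.0):
                    scores[v] = s
                    nxt.append(v)
        frontier = nxt
    return scores

print(search_flow(["wedding", "israel"]))  # images reached by both terms score highest
```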
Results for sample queries (one image grid per query):
- burning man
- christmas
- Deerhoof
- fireworks
- flight of the conchords
- flowers
- grand canyon
- independence day
- Italian Grand Prix
- La Vie by Chagall
- Metallica
- paintball
- PLUG awards
- portrait
- railway station
- skiing
- sunset at the bay
- wedding in israel
- wedding, new jersey
- whale