|LANGREITER.COM plain, simple|
Google (finally!) launched a web-scale (and very well-implemented) image similarity search service, and there's some reason to believe that one of the algorithms at the core is FMRIQ, which some of my trusty and ever-attentive readers might recognize as the one used by retrievr as well.
Google's billions-of-images implementation clearly doesn't do a live similarity computation (as retrievr does) but seems to rely on pre-computed clusters, probably along the lines of the scheme described in Chuck Rosenberg et al.'s 2007 paper "Clustering Billions of Images with Large Scale Nearest Neighbor Search".
According to this set of slides summarizing said paper (assembled by Dafna Bitton as part of coursework with Charles Elkan), the authors started by computing an FMRIQ-style image signature (0/1-quantized top-k Haar wavelet coefficients), reduced dimensionality by Random Projections (to 100) and finally added back color averages and aspect ratio information to arrive at rather compact 104-dimensional descriptors. To understand the clustering procedure, I'd have to read up on spill trees, which I'm not familiar with.
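To make the descriptor recipe a bit more concrete, here is a minimal sketch of the last two steps as the slides describe them: project a 0/1 signature down to 100 dimensions with a random matrix, then append color averages and aspect ratio to get 104 values. Everything specific is an assumption on my part and not taken from the paper: the names `random_projection_matrix`, `project` and `build_descriptor`, the Gaussian projection matrix, the signature length of 1024, and the split of the four extra dimensions into three color averages plus one aspect ratio. The wavelet step itself is left out; the signature is simply handed in.

```python
import random

def random_projection_matrix(in_dim, out_dim, seed=0):
    """Gaussian random matrix, scaled so squared lengths are preserved in expectation."""
    rng = random.Random(seed)
    scale = 1.0 / out_dim ** 0.5
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(in_dim)]
            for _ in range(out_dim)]

def project(matrix, signature):
    """Multiply the matrix by a 0/1 signature; only the set positions contribute."""
    active = [i for i, bit in enumerate(signature) if bit]
    return [sum(row[i] for i in active) for row in matrix]

def build_descriptor(signature, color_means, aspect_ratio, matrix):
    """Projected signature, followed by color averages and aspect ratio."""
    return project(matrix, signature) + list(color_means) + [aspect_ratio]

# Hypothetical input: a 1024-position signature with 60 coefficients set.
SIG_DIM = 1024  # assumed length, chosen only for this demo
rng = random.Random(1)
signature = [0] * SIG_DIM
for i in rng.sample(range(SIG_DIM), 60):
    signature[i] = 1

matrix = random_projection_matrix(SIG_DIM, 100)
descriptor = build_descriptor(signature, (0.5, 0.4, 0.3), 4 / 3, matrix)
print(len(descriptor))  # 104
```

The same matrix has to be reused for every image, of course; that is why it is built once from a fixed seed and passed in.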
It's pretty exciting to know that the extremely simple method of dimensionality reduction by Random Projections seems to work so well in that particular context; this means that I (or you!) can skip straight ahead to experiments with Semantic Hashing/Spectral Hashing. Using those methods, descriptors have been compressed to mere (machine) words (i.e. 64 bits or fewer); should that turn out to preserve enough information (at least for short-listing), then a system like that could be served from a single machine.
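To illustrate what short-listing over word-sized codes could look like, here is a rough sketch. Note that the encoder is neither Semantic Hashing nor Spectral Hashing: it is a plain random-hyperplane sign hash, swapped in only because it fits in a few lines. The point is the lookup side, where each descriptor is one 64-bit integer and candidates are ranked by Hamming distance. All names (`make_hyperplanes`, `encode`, `hamming`, `shortlist`) and the random stand-in data are hypothetical.

```python
import random

CODE_BITS = 64  # one machine word per image

def make_hyperplanes(dim, bits=CODE_BITS, seed=0):
    """One random Gaussian hyperplane per output bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]

def encode(descriptor, hyperplanes):
    """Pack the signs of the hyperplane dot products into a single integer."""
    code = 0
    for plane in hyperplanes:
        dot = sum(p * x for p, x in zip(plane, descriptor))
        code = (code << 1) | (1 if dot >= 0.0 else 0)
    return code

def hamming(a, b):
    """Number of differing bits between two codes."""
    return bin(a ^ b).count("1")

def shortlist(query_code, codes, k):
    """Indices of the k codes closest to the query in Hamming distance."""
    ranked = sorted(range(len(codes)), key=lambda i: hamming(query_code, codes[i]))
    return ranked[:k]

# Stand-in data: 500 random 104-dimensional descriptors (not real image data).
rng = random.Random(42)
database = [[rng.gauss(0.0, 1.0) for _ in range(104)] for _ in range(500)]
planes = make_hyperplanes(104)
codes = [encode(d, planes) for d in database]

# Query with a slightly perturbed copy of item 7 and fetch ten candidates.
query = [x + rng.gauss(0.0, 0.05) for x in database[7]]
candidates = shortlist(encode(query, planes), codes, k=10)
print(candidates)
```

The short list would then be re-ranked using the full descriptors; the codes only have to be good enough to keep the true neighbours among the candidates.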
Disclaimer: I have no inside information as to whether the procedure described in the paper matches the technique used by the live implementation, but team overlap and some characteristics of the results certainly suggest as much.
Should I find out more, I'll certainly let you know.
Disclosure pursuant to §25 MedienG:
Christian Langreiter, Langkampfen