KD-Trees and missing values (vector comparison)

Posted by labratmatt on Stack Overflow See other posts from Stack Overflow or by labratmatt
Published on 2009-12-22T17:12:52Z Indexed on 2010/03/26 4:03 UTC
Read the original article Hit count: 323

Filed under:
|
|
|
|

I have a system that stores vectors and allows a user to find the n most similar vectors to the user's query vector. That is, a user submits a vector (I call it a query vector) and my system spits out "here are the n most similar vectors." I generate the similar vectors using a KD-Tree and everything works well, but I want to do more. I want to present a list of the n most similar vectors even if the user doesn't submit a complete vector (a vector with missing values). That is, if a user submits a vector with three dimensions, I still want to find the n nearest vectors (stored vectors are of 11 dimensions) I have stored.

I have a couple of obvious solutions, but I'm not sure either one seem very good:

  1. Create multiple KD-Trees each built using the most popular subset of dimensions a user will search for. That is, if a user submits a query vector of thee dimensions, x, y, z, I match that query to my already built KD-Tree which only contains vectors of three dimensions, x, y, z.

  2. Ignore KD-Trees when a user submits a query vector with missing values and compare the query vector to the vectors (stored in a table in a DB) one by one using something like a dot product.

This has to be a common problem, any suggestions? Thanks for the help.

© Stack Overflow or respective owner

Related posts about algorithm

Related posts about vector