What algorithms are suitable for this simple machine learning problem?

Posted by user213060 on Stack Overflow
Published on 2010-03-25T22:54:07Z Indexed on 2010/03/25 23:23 UTC

I have what I think is a simple machine learning question.

Here is the basic problem: I am repeatedly given a new object and a list of descriptions of that object. For example: new_object: 'bob', new_object_descriptions: ['tall','old','funny']. I then have to use some kind of machine learning to find previously handled objects that had similar descriptions, for example, past_similar_objects: ['frank','steve','joe']. Next, I have an algorithm that can directly measure whether these objects are indeed similar to bob, giving, for example, correct_objects: ['steve','joe']. These confirmed matches are then fed back to the classifier as training feedback, and the loop repeats with a new object. Here's the pseudo-code:

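Classifier=new_classifier()

while True:
    # Receive the next object and its list of descriptions
    new_object,new_object_descriptions = get_new_object_and_descriptions()
    # Ask the classifier for previously handled objects with similar descriptions
    past_similar_objects = Classifier.classify(new_object,new_object_descriptions)
    # Directly measure which of the candidates really are similar
    correct_objects = calc_successful_matches(new_object,past_similar_objects)
    # Feed the confirmed matches back to the classifier as training data
    Classifier.train_successful_matches(new_object,correct_objects)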
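To make the loop concrete, here is a minimal runnable sketch of it. Everything beyond the names in the pseudo-code is an assumption made for illustration: the toy OverlapClassifier (an inverted index from descriptions to objects), the 1.0 and 0.5 weights, the finite example stream standing in for `while True`, and the `truth` table standing in for the real similarity measurement. It shows only the shape of the classify/measure/train cycle, not a recommended algorithm.

```python
from collections import defaultdict


class OverlapClassifier:
    """Toy stand-in for the classifier: an inverted index over descriptions."""

    def __init__(self):
        self.index = defaultdict(dict)  # description -> {object: weight}
        self.descriptions = {}          # object -> set of its descriptions

    def classify(self, new_object, new_object_descriptions, top_k=3):
        # Remember the descriptions so training can use them later.
        self.descriptions[new_object] = set(new_object_descriptions)
        # Score past objects by the summed weight of shared descriptions.
        scores = defaultdict(float)
        for d in self.descriptions[new_object]:
            for obj, weight in self.index.get(d, {}).items():
                if obj != new_object:
                    scores[obj] += weight
        # Highest score first; ties broken alphabetically.
        return sorted(scores, key=lambda o: (-scores[o], o))[:top_k]

    def train_successful_matches(self, new_object, correct_objects):
        for d in self.descriptions[new_object]:
            postings = self.index[d]
            # Index the new object under each of its descriptions.
            postings[new_object] = postings.get(new_object, 0.0) + 1.0
            # Reinforce confirmed matches that share this description
            # (the 0.5 boost is an arbitrary illustrative value).
            for obj in correct_objects:
                if obj in postings:
                    postings[obj] += 0.5


# Hypothetical ground truth used in place of the real measurement:
# pretend only steve and joe really resemble bob.
truth = {"bob": {"steve", "joe"}}


def calc_successful_matches(new_object, past_similar_objects):
    confirmed = truth.get(new_object, set())
    return [obj for obj in past_similar_objects if obj in confirmed]


def run(stream):
    classifier = OverlapClassifier()
    history = []
    for new_object, new_object_descriptions in stream:
        past_similar_objects = classifier.classify(new_object, new_object_descriptions)
        correct_objects = calc_successful_matches(new_object, past_similar_objects)
        classifier.train_successful_matches(new_object, correct_objects)
        history.append((new_object, past_similar_objects, correct_objects))
    return classifier, history


stream = [
    ("frank", ["tall", "young"]),
    ("steve", ["tall", "old"]),
    ("joe", ["old", "funny"]),
    ("bob", ["tall", "old", "funny"]),
]
classifier, history = run(stream)
print(history[-1])
```

In this sketch, classify stores the new object's descriptions so that train_successful_matches can keep the two-argument signature used in the pseudo-code.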

However, there are some constraints that may limit which classifiers can be used:

  • There will be millions of objects put into this classifier, so classification and training need to scale well to millions of object types and still be fast. I believe this disqualifies something like a spam classifier that is optimal for just two types: spam or not spam. (Update: I could probably narrow this to thousands of objects instead of millions, if that is a problem.)

  • Again, I prefer speed over accuracy when millions of objects are being classified.
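To illustrate what trading accuracy for speed could look like here, the sketch below uses MinHash signatures. This is one technique sometimes used for approximate set similarity, named here only as an example and not something the post itself proposes. Each description list is reduced to a short fixed-length signature, and the fraction of matching signature slots estimates the Jaccard similarity of two description sets. The function names, the choice of 64 hashes, and the use of BLAKE2 as the hash are all assumptions, and the description list is assumed to be non-empty.

```python
import hashlib


def minhash_signature(descriptions, num_hashes=64):
    """Return a fixed-length MinHash signature for a non-empty set of descriptions."""
    signature = []
    for seed in range(num_hashes):
        # Simulate one hash function per seed by salting the input with it.
        smallest = min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{d}".encode(), digest_size=8).digest(),
                "big",
            )
            for d in descriptions
        )
        signature.append(smallest)
    return signature


def estimated_similarity(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of matching signature slots."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


bob = minhash_signature(["tall", "old", "funny"])
steve = minhash_signature(["tall", "old"])
print(estimated_similarity(bob, steve))  # an estimate; exact Jaccard would be 2/3
```

Comparing two signatures costs the same regardless of how many descriptions each object has, and the estimate gets noisier as num_hashes shrinks, which is the speed-versus-accuracy knob in this sketch.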

What are decent, fast machine learning algorithms for this purpose?

© Stack Overflow or respective owner
