When to choose which machine learning classifier?

Posted by LM on Stack Overflow See other posts from Stack Overflow or by LM
Published on 2010-04-07T19:10:24Z Indexed on 2010/04/07 19:13 UTC
Read the original article Hit count: 342

Filed under:

machine-learning

Suppose I'm working on some classification problem. (Fraud detection and comment spam are two problems I'm working on right now, but I'm curious about any classification task in general.)

How do I know which classifier I should use? (Decision tree, SVM, Bayesian, logistic regression, etc.) In which cases is one of them the "natural" first choice, and what are the principles for choosing that one?

Examples of the type of answers I'm looking for (from Manning et al.'s "Introduction to Information Retrieval book": http://nlp.stanford.edu/IR-book/html/htmledition/choosing-what-kind-of-classifier-to-use-1.html):

a. If your data is labeled, but you only have a limited amount, you should use a classifier with high bias (for example, Naive Bayes). [I'm guessing this is because a higher-bias classifier will have lower variance, which is good because of the small amount of data.]

b. If you have a ton of data, then the classifier doesn't really matter so much, so you should probably just choose a classifier with good scalability.

What are other guidelines? Even answers like "if you'll have to explain your model to some upper management person, then maybe you should use a decision tree, since the decision rules are fairly transparent" are good. I care less about implementation/library issues, though.
Also, for a somewhat separate question, besides standard Bayesian classifiers, are there 'standard state-of-the-art' methods for comment spam detection (as opposed to email spam)?

[Not sure if stackoverflow is the best place to ask this question, since it's more machine learning than actual programming -- if not, any suggestions for where else?]

Related posts about machine-learning

Machine learning challenge: diagnosing program in java/groovy (datamining, machine learning)

as seen on Stack Overflow - Search for 'Stack Overflow'
Hi All! I'm planning to develop program in Java which will provide diagnosis. The data set is divided into two parts one for training and the other for testing. My program should learn to classify from the training data (BTW which contain answer for 30 questions each in new column, each record in… >>> More
Is it possible to predict future using machine learning and/or AI?

as seen on Programmers - Search for 'Programmers'
Recently I have started reading about machine learning. From 3000 feet view, machine learning seems really great thing but as if now I have found that machine learning is limited to only 3 types of algorithms namely classification, clustering and recommendations. I would like to know if my assumption… >>> More
Design for a machine learning artificial intelligence framework

as seen on Stack Overflow - Search for 'Stack Overflow'
This is a community wiki which aims to provide a good design for a machine learning/artificial intelligence framework (ML/AI framework). Please contribute to the design of a language-agnostic framework which would allow multiple ML/AI algorithms to be plugged into a single framework which: runs… >>> More
A good machine learning technique to weed out good URLs from bad

as seen on Stack Overflow - Search for 'Stack Overflow'
Hi, I have an application that needs to discriminate between good HTTP GET requests and bad. For example: http://somesite.com?passes=dodgy+parameter # BAD http://anothersite.com?passes=a+good+parameter # GOOD My system can make a binary decision about whether or not a… >>> More
Design for a machine learning artificial intelligence framework (community wiki)

as seen on Stack Overflow - Search for 'Stack Overflow'
This is a community wiki which aims to provide a good design for a machine learning/artificial intelligence framework (ML/AI framework). Please contribute to the design of a language-agnostic framework which would allow multiple ML/AI algorithms to be plugged into a single framework which: runs… >>> More

Developer IT