Naive Bayes for topic detection using a "bag of words" approach

Posted by AlgoMan on Stack Overflow
Published on 2010-05-06T14:18:17Z Indexed on 2010/05/06 19:58 UTC

Filed under: bayesian | machine-learning | data-mining | nlp | natural-language

I am trying to implement a naive Bayesian approach to find the topic of a given document or stream of words. Is there a naive Bayesian approach that I might be able to look up for this?
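To make the question concrete, here is a minimal sketch of the kind of classifier I mean. It assumes a multinomial naive Bayes model over word counts with add-one (Laplace) smoothing; the class name, the method names and the tiny training set are all made up for illustration and do not come from any library.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesTopics:
    """Multinomial naive Bayes over bag-of-words counts (illustrative only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # topic -> word -> count
        self.doc_counts = Counter()              # topic -> number of training docs
        self.vocab = set()                       # every word seen in training

    def train(self, words, topic):
        """Record one labelled document, given as a list of words."""
        self.doc_counts[topic] += 1
        self.word_counts[topic].update(words)
        self.vocab.update(words)

    def log_scores(self, words):
        """Return log P(topic) + sum of log P(word | topic) for each topic."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for topic, n_docs in self.doc_counts.items():
            counts = self.word_counts[topic]
            total_words = sum(counts.values())
            score = math.log(n_docs / total_docs)  # log prior
            for w in words:
                # Add-one smoothing keeps unseen words from zeroing the product.
                score += math.log((counts[w] + 1) / (total_words + vocab_size))
            scores[topic] = score
        return scores

    def classify(self, words):
        """Pick the topic with the highest log score."""
        scores = self.log_scores(words)
        return max(scores, key=scores.get)


# Hypothetical toy training data.
nb = NaiveBayesTopics()
nb.train("the team won the match".split(), "sports")
nb.train("the player scored a goal".split(), "sports")
nb.train("the market fell as stocks dropped".split(), "finance")
nb.train("investors sold stocks and bonds".split(), "finance")

print(nb.classify("stocks fell".split()))
```

Working in log space avoids underflow when a document has many words. Word order is ignored entirely, which is the "bag of words" assumption.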

Also, I am trying to improve my dictionary as I go along. Initially, I have a bunch of hard-coded words that map to topics. When a document contains words other than the ones already mapped, I want to add those words to the mappings based on how often they occur, so that the system learns new words that map to a topic. I also want the probabilities of the existing words to change as more documents are seen.
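Roughly, the update loop I have in mind could look like the sketch below. It assumes each seed word starts with a count of 1 for its topic, that topics are equally likely a priori, and that a document only updates the dictionary when the best topic beats the runner-up by some margin. The seed words, the function names and the min_margin threshold are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical hard-coded seed dictionary: words per topic.
SEEDS = {
    "sports": ["match", "goal", "team"],
    "finance": ["stocks", "market", "bonds"],
}

# topic -> word -> count; every seed word starts with a count of 1.
counts = {topic: Counter(words) for topic, words in SEEDS.items()}


def topic_log_probs(words):
    """Smoothed log-likelihood of the words under each topic (uniform prior)."""
    vocab = set().union(*counts.values())
    vocab_size = len(vocab)
    scores = {}
    for topic, c in counts.items():
        total = sum(c.values())
        scores[topic] = sum(
            math.log((c[w] + 1) / (total + vocab_size)) for w in words
        )
    return scores


def learn_from(words, min_margin=1.0):
    """Classify a document and, if the decision is clear enough, add all of
    its words (mapped or not) to the winning topic's counts.

    Returns the chosen topic, or None when the document was too ambiguous
    to learn from.
    """
    scores = topic_log_probs(words)
    ranked = sorted(scores, key=scores.get, reverse=True)
    best, runner_up = ranked[0], ranked[1]
    if scores[best] - scores[runner_up] < min_margin:
        return None
    counts[best].update(words)
    return best


# "scored" and "penalty" are not in the seed dictionary; after this call
# they are counted under whichever topic the document was assigned to.
print(learn_from("team scored goal penalty".split()))
```

One concern with this kind of self-training is that a wrong early classification gets reinforced by later updates, which is why the sketch skips documents where the margin is small.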

How should I go about doing this? Is my approach the right one?

Which programming language would be best suited for the implementation?

© Stack Overflow or respective owner
