Search Results

Search found 442 results on 18 pages for 'the naive'.

Page 1/18 | 1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >

  • naive bayesian spam filter question

    - by Microkernel
    Hi guys, I am planning to implement a spam filter using the Naive Bayesian classification model. Online I see a lot of information on Naive Bayesian classification, but the problem is that it's mostly mathematics rather than a clear statement of how it's done. I am more of a programmer than a mathematician (yes, I learnt probability and Bayes' theorem back in school, but I have been out of touch for a long, long time, and I don't have the luxury of learning it now: I have nearly 3 weeks to come up with a working prototype). So if someone can explain it, or point me to a place where it's explained for programmers rather than mathematicians, it would be a great help. PS: By the way, I have to implement it in C, if you want to know. :( Regards, Microkernel
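
    For a programmer, the classifier mostly comes down to counting. The sketch below is a minimal multinomial Naive Bayes in Python rather than C, and it is not taken from any library. It assumes messages are already split into word lists and labelled "spam" or "ham". The names `train` and `classify` and the toy messages are invented for illustration. The sketch adds log-probabilities instead of multiplying raw probabilities, which avoids floating-point underflow. It also uses add-one (Laplace) smoothing, so a word never seen in one class does not zero out that class's score.

```python
import math
from collections import Counter


def train(docs):
    """docs: list of (tokens, label) pairs. Returns the counts the classifier needs."""
    docs_per_class = Counter(label for _, label in docs)
    word_counts = {label: Counter() for label in docs_per_class}
    for tokens, label in docs:
        word_counts[label].update(tokens)
    vocab = set().union(*word_counts.values())
    priors = {label: n / len(docs) for label, n in docs_per_class.items()}
    return priors, word_counts, vocab


def classify(tokens, priors, word_counts, vocab):
    """Return the label maximising log P(c) + sum of log P(w|c)."""
    best_label, best_score = None, -math.inf
    for label, prior in priors.items():
        total = sum(word_counts[label].values())
        score = math.log(prior)
        for word in tokens:
            if word not in vocab:
                continue  # words never seen in training carry no evidence
            # add-one smoothing: (count + 1) / (words in this class + vocabulary size)
            score += math.log((word_counts[label][word] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


training = [
    ("win money now".split(), "spam"),
    ("cheap money offer".split(), "spam"),
    ("meeting at noon".split(), "ham"),
    ("lunch meeting tomorrow".split(), "ham"),
]
model = train(training)
print(classify("money offer now".split(), *model))   # spam
print(classify("meeting tomorrow".split(), *model))  # ham
```

    A C port would presumably keep the same per-class word counts in a hash table. The arithmetic stays the same.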

  • Naive Bayesian for Topic detection using "Bag of Words" approach

    - by AlgoMan
    I am trying to implement a naive Bayesian approach to find the topic of a given document or stream of words. Is there a Naive Bayesian approach that I might be able to look up for this? Also, I am trying to improve my dictionary as I go along. Initially, I have a bunch of words that map to topics (hard-coded). Depending on the occurrences of words other than the ones that are already mapped, I want to add them to the mappings, hence learning new words that map to a topic, and also changing the probabilities of words. How should I go about doing this? Is my approach the right one? Which programming language would be best suited for the implementation?

  • Choosing the best class when 2 classes have the same P(c|d) in naive Bayes

    - by ryandi
    Hello, I have a question about the naive Bayes classifier. In my project I have to classify a text into one of 4 available classes. In naive Bayes we have the formula c_map = argmax_c P(d|c)·P(c). I have standardized the number of training documents in each class, so I get the same P(c) value for each class (0.25). Here are my questions. What if a test document doesn't contain any token that belongs to any of those 4 classes (in the training documents)? Then all of the classes have the same value of P(d|c)·P(c). Which class should I pick? And what if the tokens exist, but 2 or more classes have the same value of P(d|c)·P(c)? What should I do? Thank you.
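
    Working in log space makes both cases visible. The Python sketch below uses classes A to D and word probabilities invented for illustration. Tokens outside the training vocabulary are skipped, so a document made only of unseen tokens scores log P(c) for every class, and all four classes tie. A tie can also occur when the known tokens happen to give two classes identical likelihoods. Skipping unseen tokens is an assumption of this sketch. If unseen tokens are instead scored with add-one smoothing, classes with different amounts of training text receive different scores, so whether they tie depends on that choice.

```python
import math


def nb_scores(tokens, priors, word_probs):
    """Log-space scores log P(c) + sum(log P(w|c)).

    Assumes word_probs[c] holds an already-smoothed P(w|c) for every word in the
    shared training vocabulary. Tokens outside that vocabulary are skipped, so a
    document made only of unseen tokens scores exactly log P(c) for every class.
    """
    vocab = set().union(*word_probs.values())
    scores = {}
    for c, prior in priors.items():
        score = math.log(prior)
        for t in tokens:
            if t in vocab:
                score += math.log(word_probs[c][t])
        scores[c] = score
    return scores


def top_classes(scores, abs_tol=1e-12):
    """All classes whose score ties with the maximum (within a float tolerance)."""
    best = max(scores.values())
    return sorted(c for c, s in scores.items() if math.isclose(s, best, abs_tol=abs_tol))


priors = {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25}
word_probs = {  # hypothetical smoothed P(w|c) values
    "A": {"goal": 0.6, "vote": 0.4},
    "B": {"goal": 0.4, "vote": 0.6},
    "C": {"goal": 0.5, "vote": 0.5},
    "D": {"goal": 0.5, "vote": 0.5},
}
print(top_classes(nb_scores(["unseen"], priors, word_probs)))        # ['A', 'B', 'C', 'D']
print(top_classes(nb_scores(["goal"], priors, word_probs)))          # ['A']
print(top_classes(nb_scores(["goal", "vote"], priors, word_probs)))  # ['C', 'D']
```

    Bayes' rule does not settle what to do on a tie; that is a policy decision. Common choices include returning an "unknown" label when nothing in the document matched the vocabulary, or breaking ties with a fixed secondary rule. The sorted order used here is one such arbitrary rule.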

  • Implementing Naïve Bayes algorithm in Java - Need some guidance

    - by techventure
    Hello Stack Overflow people. As a school assignment I'm required to implement the Naïve Bayes algorithm, which I intend to do in Java. To understand how it's done, I've read the book "Data Mining - Practical Machine Learning Tools and Techniques". It has a section on this topic, but I am still unsure about some basic points that are blocking my progress. Since I'm seeking guidance rather than a solution, I'll describe what I think the correct approach is and ask for corrections, which will be very much appreciated. Please note that I am an absolute beginner in Naïve Bayes, data mining and programming in general, so you might see some silly comments or calculations below.

    The training data set I'm given has 4 numeric attributes/features, normalized to the range [0, 1] using Weka (no missing values), and one nominal class (yes/no).

    1) The data comes from a CSV file and is numeric, hence:
       * Since the attributes are numeric, I use the PDF (probability density function) formula.
         + To calculate the PDF in Java, I first separate the attributes by class (yes or no) and hold them in different arrays (array class yes and array class no).
         + Then I calculate the mean (sum of an attribute's values / number of values) and the standard deviation for each of the 4 attributes (columns) of each class.
         + Then, to find the PDF of a given value n, I compute (n-mean)^2/(2*SD^2).
         + Then, to find P(yes | E) and P(no | E), I multiply the PDF values of all 4 given attributes and compare them. The larger one indicates the class the instance belongs to.

    In terms of Java, I'm using an ArrayList of ArrayList of Double to store the attribute values.

    Lastly, I'm unsure how to get new data. Should I ask for an input file (like a CSV), or use the command prompt and ask for 4 values?

    I'll stop here for now (I do have more questions), but I'm worried this won't get any responses given how long it is. I will really appreciate it if you take the time to read my problems and comment.
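
    The density in the PDF step needs more than (n-mean)^2/(2*SD^2), which is only the magnitude of the exponent. The normal PDF is exp(-(x - mean)^2 / (2*SD^2)) / (SD*sqrt(2*pi)). The comparison should also include the class prior: compare P(yes) times the product of the yes-densities against P(no) times the product of the no-densities. The question uses Java, but below is a minimal Python sketch under those assumptions. `fit`, `predict` and the toy rows are invented for illustration. The sketch sums logs instead of multiplying densities and uses the sample (n - 1) variance.

```python
import math


def log_gaussian_pdf(x, mean, sd):
    """log of the normal density N(x; mean, sd^2); working in logs avoids underflow."""
    return -((x - mean) ** 2) / (2 * sd ** 2) - math.log(math.sqrt(2 * math.pi) * sd)


def fit(rows, labels, min_sd=1e-6):
    """Per class: prior P(c) and a (mean, sd) pair for each numeric attribute."""
    model = {}
    for c in set(labels):
        class_rows = [r for r, y in zip(rows, labels) if y == c]
        n = len(class_rows)
        stats = []
        for column in zip(*class_rows):  # one column = one attribute
            mean = sum(column) / n
            var = sum((v - mean) ** 2 for v in column) / max(n - 1, 1)
            stats.append((mean, max(math.sqrt(var), min_sd)))  # guard against sd == 0
        model[c] = (n / len(rows), stats)
    return model


def predict(model, x):
    """argmax over c of log P(c) + sum_i log pdf(x_i | c)."""
    def score(c):
        prior, stats = model[c]
        return math.log(prior) + sum(
            log_gaussian_pdf(v, m, s) for v, (m, s) in zip(x, stats))
    return max(model, key=score)


rows = [
    [0.9, 0.8, 0.7, 0.9], [0.8, 0.9, 0.8, 0.7], [0.7, 0.8, 0.9, 0.8],
    [0.1, 0.2, 0.3, 0.2], [0.2, 0.1, 0.2, 0.3], [0.3, 0.2, 0.1, 0.1],
]
labels = ["yes", "yes", "yes", "no", "no", "no"]
model = fit(rows, labels)
print(predict(model, [0.75, 0.8, 0.9, 0.7]))  # yes
print(predict(model, [0.1, 0.2, 0.15, 0.3]))  # no
```

    With balanced classes, as in this toy data, the prior does not change the winner. It matters as soon as one class is more common in the training data.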

  • Any Naive Bayesian classifier in Python?

    - by asldkncvas
    Dear everyone, I have tried the Orange framework for Naive Bayesian classification. The methods are extremely unintuitive, and the documentation is extremely unorganized. Does anyone here have another framework to recommend? I mostly use Naive Bayes for now. I was thinking of using NLTK's NaiveBayesClassifier, but I don't think it can handle continuous variables. What are my options?

  • A simple explanation of Naive Bayes Classification

    - by Jaggerjack
    I am finding it hard to understand the process of Naive Bayes, and I was wondering if someone could explain it as a simple step-by-step process in plain English. I understand it uses the number of times something occurred as a probability, but I have no idea how the training data is related to the actual dataset. Please explain what role the training set plays. Here is a very simple example with fruits, such as banana:

    Training set:
        round-red
        round-orange
        oblong-yellow
        round-red

    Dataset:
        round-red
        round-orange
        round-red
        round-orange
        oblong-yellow
        round-red
        round-orange
        oblong-yellow
        oblong-yellow
        round-red

  • Can SQLAlchemy DateTime Objects Only Be Naive?

    - by Sean M
    I am working with SQLAlchemy, and I'm not yet sure which database I'll use under it, so I want to remain as DB-agnostic as possible. How can I store a timezone-aware datetime object in the DB without tying myself to a specific database? Right now, I make sure that times are in UTC before I store them in the DB and convert them to local time at display time, but that feels inelegant and brittle. Is there a DB-agnostic way to get timezone-aware datetime objects out of SQLAlchemy instead of getting naive datetime objects out of the DB?

  • Naive question about implementing RSS

    - by interstar
    I have a naive question about RSS feeds. I have a series of timed events which appear on my site and that I make available as an RSS feed for other applications to import. Who is typically responsible for truncating this feed? Over the next year, I can see my feed having thousands of items. Should the URL mysite.com/rss always return all items? And leave it to the readers to just show the most recent? Or is it more customary that I only return, say, the top 50? Expecting the readers to cache older items? (And, if so, is there a convention for readers to ask the server for the "next page")? What is the typical behaviour of something like FriendFeed when it pulls in an RSS stream?

  • Naive Bayes in MATLAB: row classification

    - by Jungle Boogie
    How do you classify a row of separate cells in MATLAB? At the moment I can classify single columns like so:

        training = [1;0;-1;-2;4;0;1];  % this is the sample data
        % target_class holds the target class of each training sample;
        % here 'posi', 'zero' and 'negi' are the classes for the given training data
        target_class = ['posi';'zero';'negi';'negi';'posi';'zero';'posi'];

        % Training and testing the classifier
        test = 10*randn(25, 1);  % test data: 25 random numbers
        % classify the test data depending on the given training data, using a Naive Bayes classifier
        class = classify(test, training, target_class, 'diaglinear')

    Unlike the above, I want to classify rows:

                 A   B   C
        Row A    1   1   1   = a house
        Row B    1   2   1   = a garden

    Can anyone help? Here is a code example from MATLAB's site:

        nb = NaiveBayes.fit(training, class)
        nb = NaiveBayes.fit(..., 'param1',val1, 'param2',val2, ...)

    I don't understand what param1 is or what val1 etc. should be.

  • Name for a "naive" timekeeping system?

    - by Robert L
    I am thinking of a "naive" timekeeping system of the sort I believe would be likely to be implemented by non-specialists. A day is exactly 24 hours. An hour is exactly 60 minutes. A minute is exactly 60 seconds. No exceptions (i.e. no Daylight Saving or leap seconds). A leap year occurs exactly once every four years: if the year modulo 4 equals 0, it is a leap year. The month lengths are the normal 31 days for January, 28 or 29 days for February, etc., that you would expect to find on a wall calendar. Days of the week, if they are used, are what you would get by taking your contemporary (late 1900's / early 2000's) wall calendar and, using the above rules for leap years and month lengths, extrapolating in both directions: if the calendar goes far back enough, February 29, 1900 exists and is a Wednesday; and if the calendar goes far forward enough, February 29, 2100 exists and is a Monday. What name, if any, is used to describe precisely this system?
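
    The rules are simple enough to check mechanically. The Python sketch below counts days under exactly those rules, with function names invented for illustration. It anchors at 2000-01-01, which was a Saturday and on which the real and naive calendars agree. It reproduces the two weekdays stated above. The sketch only illustrates the system and does not answer the naming question.

```python
MONTH_DAYS = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def is_leap(year):
    """The question's rule: every year divisible by 4 is a leap year, no exceptions."""
    return year % 4 == 0


def day_number(year, month, day):
    """Days elapsed since a fixed origin under the naive rules (year >= 0)."""
    length = MONTH_DAYS[month - 1] + (1 if month == 2 and is_leap(year) else 0)
    if not 1 <= day <= length:
        raise ValueError(f"{year}-{month:02d}-{day:02d} does not exist")
    leap_years_before = (year + 3) // 4  # years 0, 4, 8, ... below `year`
    days = 365 * year + leap_years_before + sum(MONTH_DAYS[: month - 1]) + day
    if month > 2 and is_leap(year):
        days += 1  # this year's February 29
    return days


# Anchor: 2000-01-01 was a Saturday, and the naive and real calendars agree there.
ANCHOR = day_number(2000, 1, 1)


def weekday(year, month, day):
    return WEEKDAYS[(day_number(year, month, day) - ANCHOR + 5) % 7]


print(weekday(1900, 2, 29))  # Wednesday
print(weekday(2100, 2, 29))  # Monday
```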

  • Mahout Naive Bayes Classifier for Items

    - by Nimesh Parikh
    Team, I am working on a project where I need to classify items into categories. I have a single input file, which contains the target variable and space-separated features. My training data looks like this:

        Category Name [Tab] DataString
        Plumbing [Tab] Pipe Tap Plastic Pipe PVC Pipe Cold Water Line Hot Water Line Tee outlet up Elbow turned up Elbow turned down Gate valve Globe valve
        Paint [Tab] Ivory Black Burnt Umber Caput Mortuum Violet Earth Red Yellow Ochre Titanium White Cadmium Yellow Light Cadmium Yellow Deep
        Cloths [Tab] Shirt T-Shirt Pent Jeans Tee Cargo

    Well, I have a really big set of categories. I have a couple of questions: Am I using the correct data for training? If not, what should I use? Once I train and test my model, what is the next step? How can I use the output? Please help me with this. Thanks, Nimesh

  • Naive question about memory references in operating systems

    - by darkie15
    Hi all, I am learning about memory references in operating systems and can't seem to get to the crux of it. For example, I am not able to visualize this scenario properly: "A 36-bit address employs both paging and segmentation. Both PTE and STE are 4 bytes each." How are they related? I guess this question might be too simple for many, but any help understanding the above basic concept would be appreciated. Regards, darkie15
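
    In the usual segmentation-with-paging design, a virtual address is split into a segment number, a page number and an offset. The segment number indexes the segment table, whose entry (STE) locates that segment's page table. The page number indexes that page table, whose entry (PTE) gives the physical frame. The offset is used unchanged within the frame. The question does not give the field widths. The Python sketch below therefore assumes a hypothetical 8/16/12-bit split (4 KiB pages), only to show the arithmetic with 4-byte entries.

```python
# Hypothetical layout: the question gives only the 36-bit total and 4-byte entries.
SEG_BITS, PAGE_BITS, OFFSET_BITS = 8, 16, 12  # 2^12 bytes = 4 KiB pages (assumed)
ENTRY_BYTES = 4                               # STE and PTE size from the question
assert SEG_BITS + PAGE_BITS + OFFSET_BITS == 36


def split(address):
    """Split a 36-bit virtual address into (segment, page, offset)."""
    offset = address & ((1 << OFFSET_BITS) - 1)
    page = (address >> OFFSET_BITS) & ((1 << PAGE_BITS) - 1)
    segment = address >> (OFFSET_BITS + PAGE_BITS)
    return segment, page, offset


segment_table_bytes = (1 << SEG_BITS) * ENTRY_BYTES  # one STE per segment
page_table_bytes = (1 << PAGE_BITS) * ENTRY_BYTES    # one PTE per page, one table per segment

print([hex(f) for f in split(0xABCDE1234)])  # ['0xab', '0xcde1', '0x234']
print(segment_table_bytes, page_table_bytes)  # 1024 262144
```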

  • Please help me interpret the naive Bayes result in Weka

    - by resmi
    Can anybody help me interpret the following result, generated in Weka for classification using naive Bayes? Please explain clearly what Normal Distribution, Mean, StandardDev, WeightSum and Precision mean. I am new to Weka.

        Naive Bayes Classifier

        Class Normal: Prior probability = 0.5

        1374195_at:  Normal Distribution. Mean = 218.06  StandardDev = 6.0572  WeightSum = 3  Precision = 36.34333334
        1373315_at:  Normal Distribution. Mean = 1142.58  StandardDev = 21.1589  WeightSum = 3  Precision = 126.95333339999999

  • Why should the "prime-based" hashcode implementation be used instead of the "naive" one?

    - by Wilhelm
    I have seen a prime-number implementation of the GetHashCode function being recommended, for example here. However, using the following code (in VB, sorry), it seems as if that implementation gives the same hash density as a "naive" XOR implementation. If the density is the same, I would suppose there is the same probability of collision in both implementations. Am I missing anything about why the prime approach is preferred? I am supposing that if the hash code is a byte, I do not lose generality for the integer case.

        Sub Main()
            Dim XorHashes(255) As Integer
            Dim PrimeHashes(255) As Integer

            For i = 0 To 255
                For j = 0 To 255
                    For k = 0 To 255
                        XorHashes(GetXorHash(i, j, k)) += 1
                        PrimeHashes(GetPrimeHash(i, j, k)) += 1
                    Next
                Next
            Next

            For i = 0 To 255
                Console.WriteLine("{0}: {1}, {2}", i, XorHashes(i), PrimeHashes(i))
            Next

            Console.ReadKey()
        End Sub

        Public Function GetXorHash(ByVal valueOne As Integer, ByVal valueTwo As Integer, ByVal valueThree As Integer) As Byte
            Return CByte((valueOne Xor valueTwo Xor valueThree) Mod 256)
        End Function

        Public Function GetPrimeHash(ByVal valueOne As Integer, ByVal valueTwo As Integer, ByVal valueThree As Integer) As Byte
            Dim TempHash = 17
            TempHash = 31 * TempHash + valueOne
            TempHash = 31 * TempHash + valueTwo
            TempHash = 31 * TempHash + valueThree
            Return CByte(TempHash Mod 256)
        End Function
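
    The experiment enumerates every triple uniformly. Under that input, both functions fill the 256 buckets evenly: for fixed first two values, each function maps the third value one-to-one onto the buckets. The difference shows up on structured keys. The Python sketch below ports the two functions without the final `Mod 256`; the function names and the 0..15 input range are our choices. It counts distinct hash values over all triples with fields in 0..15. XOR ignores field order and cancels equal values (a xor a = 0), so it can only produce 16 values. The 17/31 scheme keeps the fields distinct, because every field is smaller than 31.

```python
from itertools import product


def xor_hash(a, b, c):
    """The "naive" combination: order-insensitive, and equal values cancel."""
    return a ^ b ^ c


def prime_hash(a, b, c):
    """The 17/31 scheme from the question, without the final Mod 256."""
    h = 17
    for value in (a, b, c):
        h = 31 * h + value
    return h


domain = range(16)  # structured inputs: every triple with fields in 0..15
xor_values = {xor_hash(*t) for t in product(domain, repeat=3)}
prime_values = {prime_hash(*t) for t in product(domain, repeat=3)}

print(len(xor_values), len(prime_values))          # 16 4096
print(xor_hash(1, 2, 3) == xor_hash(3, 2, 1))      # True: field order is lost
print(prime_hash(1, 2, 3) == prime_hash(3, 2, 1))  # False
```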

  • Naive Bayesian classification (spam filtering) - Doubt in one calculation? Which one is right? Plz c

    - by Microkernel
    Hi guys, I am implementing a Naive Bayesian classifier for spam filtering, and I have a doubt about one calculation. Please clarify what I should do. Here is my question. In this method, you have to calculate:

        P(S|W) - probability that a message is spam given that word W occurs in it.
        P(W|S) - probability that word W occurs in a spam message.
        P(W|H) - probability that word W occurs in a ham message.

    So to calculate P(W|S), should I do

        (1) (number of times W occurs in spam) / (total number of times W occurs in all the messages), OR
        (2) (number of times W occurs in spam) / (total number of words in the spam messages)?

    (I thought it was (2), but I am not sure, so please clarify.) I am referring to http://en.wikipedia.org/wiki/Bayesian_spam_filtering for the information, by the way. I have to complete the implementation by this weekend :( Thanks and regards, MicroKernel :)

    @sth: Hmm... Shouldn't repeated occurrences of word W increase a message's spam score? With your approach they wouldn't, right? Let's take a scenario and discuss. Say we have 100 training messages, of which 50 are spam and 50 are ham, and each message has 100 words. Also say that word W occurs 5 times in each spam message and once in each ham message. Then W occurs 5*50 = 250 times across all the spam messages and 1*50 = 50 times across all the ham messages, so it occurs 250 + 50 = 300 times in all the training messages. In this scenario, how do you calculate P(W|S) and P(W|H)? Naturally we should expect P(W|S) > P(W|H), right? Please share your thoughts...
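
    Plugging the scenario into the candidate estimates makes the difference concrete. The Python sketch below uses variable names of our own choosing. Under a word-count (multinomial) model, option (2) is the estimate of P(W|S). It gives P(W|S) = 0.05 > P(W|H) = 0.01, so repeated occurrences do count. Option (1) is not a likelihood: it measures how W's occurrences are split between spam and ham, which is closer to P(S|W). Here the spam and ham corpora are the same size, so it equals the posterior that Bayes' rule gives from the option (2) likelihoods. A per-message estimate (the fraction of messages that contain W) gives 1.0 for both classes in this scenario, which is the effect the follow-up describes.

```python
# Scenario from the question: 50 spam + 50 ham messages, 100 words each,
# W appears 5 times in every spam message and once in every ham message.
n_spam = n_ham = 50
words_per_message = 100
w_per_spam_msg, w_per_ham_msg = 5, 1

w_in_spam = n_spam * w_per_spam_msg      # 250 occurrences of W in spam
w_in_ham = n_ham * w_per_ham_msg         # 50 occurrences of W in ham
spam_words = n_spam * words_per_message  # 5000 spam words in total
ham_words = n_ham * words_per_message    # 5000 ham words in total

# Option (2): fraction of all spam words that are W (word-count estimate of P(W|S))
p_w_given_s = w_in_spam / spam_words  # 0.05
p_w_given_h = w_in_ham / ham_words    # 0.01

# Option (1): fraction of all occurrences of W that fall in spam
option_1 = w_in_spam / (w_in_spam + w_in_ham)  # 250/300, about 0.833

# Per-message estimate: fraction of messages in each class containing W at least once
p_contains_s = (n_spam if w_per_spam_msg > 0 else 0) / n_spam  # 1.0
p_contains_h = (n_ham if w_per_ham_msg > 0 else 0) / n_ham     # 1.0

# Bayes' rule with the option (2) likelihoods and equal priors
prior_s = n_spam / (n_spam + n_ham)
p_s_given_w = (p_w_given_s * prior_s) / (
    p_w_given_s * prior_s + p_w_given_h * (1 - prior_s))
print(p_w_given_s, p_w_given_h, option_1, p_contains_s, p_contains_h, p_s_given_w)
```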

  • Switching to a career in Machine Learning

    - by Naive Machine Learner
    My day job is plain old software development. I am also doing my Master's in CS (part time, course based). I took a course on AI and found machine learning quite fascinating, but like most courses it only offered a basic intro. I intend to learn more about machine learning and, if possible, get a job in that field. When I look at job postings in this field, it is clear that most of them require a PhD in machine learning (or prior experience in the field with considerable expertise). I'm looking for advice on self-learning to gain experience that will be useful in industry, or at least enough experience to get my foot in the door. I will do the obvious things like reading textbooks, papers, etc. Are there perhaps open source efforts that I can participate in, or something I could do on my own? Apologies if I'm being vague here, but I hope there are at least a few of you who have made a similar switch and can advise. Thanks!

  • How do I tell my boss he made the wrong choice? [migrated]

    - by SomeKittens
    Recently, our biggest product failed majorly because we'd only used outsourced labor to do it, and they never tested anything, etc. Finally, our CEO decided that the US team should learn the code and fix it up. (Not a total rewrite, but lots of formatting/style changes, refactoring, etc). However, he knows next to nothing about programming (thankfully, he admits it). He had been grooming me to take on the project manager position, but I had to go back to college. Now he gave it to another programmer who is naive and inexperienced. I don't feel the naive programmer will do nearly as well. The CEO's reasoning is that the naive programmer can work full time and I can only do part time, so the less senior programmer could put more work into it. How can I convince him that 15 hours of my time is worth more than the other guy's 40?

  • CanCan polymorphic resource access problem

    - by Call 'naive' True
    Hi everybody, I don't quite understand how to restrict access to links with CanCan in this particular case. The "Edit" link is always displayed, so I believe the problem is my incorrect use of CanCan's methods (load_resource and authorize_resource). My CommentsController looks like this:

        class CommentsController < ApplicationController
          before_filter :authenticate_user!
          load_resource :instance_name => :commentable
          authorize_resource :article

          def index
            @commentable = find_commentable # loading our generic object
          end
          ......

          private

          def find_commentable
            params.each { |name, value|
              if name =~ /(.+)_id$/
                return $1.classify.constantize.includes(:comments => :karma).find(value)
              end
            }
          end
        end

    In comments/index.html.erb I have the following code, which renders a file from another controller:

        <%= render :file => "#{get_commentable_partial_name(@commentable)}/show.html.erb", :collection => @commentable %>

    You can think of "#{get_commentable_partial_name(@commentable)}" as just "articles" in this case. Content of "articles/show.html.erb":

        <% if can? :update, @commentable %>
          <%= link_to 'Edit', edit_article_path(@commentable) %> |
        <% end %>

    My ability.rb:

        class Ability
          include CanCan::Ability

          def initialize(user)
            user ||= User.new # guest user
            if user.role? :admin
              can :manage, :all
            elsif user.role? :author
              can :read, [Article, Comment, Profile]
              can :update, Article, :user_id => user.id
            end
          end
        end

    The model relations are:

        class Comment < ActiveRecord::Base
          belongs_to :commentable, :polymorphic => true, :dependent => :destroy
          ...
        end

        class Article < ActiveRecord::Base
          has_many :comments, :as => :commentable, :dependent => :destroy
          ...
        end

    I have tried to debug this issue like this:

        user = User.first
        article = Article.first
        ability = Ability.new(user)
        ability.can?(:update, article)

    and the ability check always returns true. Note: user.role == author and article.user_id != user.id. If you need more information, please ask. Thanks for your time, and sorry for my English.

  • Shortest Common Superstring: find shortest string that contains all given string fragments

    - by occulus
    Given some string fragments, I would like to find the shortest possible single string ("output string") that contains all the fragments. Fragments can overlap each other in the output string.

    Example: for the string fragments

        BCDA
        AGF
        ABC

    the following output string contains all fragments, and was made by naive appending:

        BCDAAGFABC

    However, this output string is better (shorter), as it employs overlaps:

        ABCDAGF
        ABC
         BCDA
            AGF

    I'm looking for algorithms for this problem. It's not absolutely important to find the strictly shortest output string, but the shorter the better. I'm looking for an algorithm better than the obvious naive one, which would try appending all permutations of the input fragments and removing overlaps (the problem itself appears to be NP-complete). I've started work on a solution and it's proving quite interesting; I'd like to see what other people might come up with. I'll add my work-in-progress to this question in a while.
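
    A common heuristic for this problem is greedy merging. First drop every fragment that is contained in another, then repeatedly merge the two fragments with the longest suffix/prefix overlap. It runs in polynomial time but is not guaranteed to find the shortest superstring. Below is a minimal Python sketch, with function names of our own choosing, checked against the example above.

```python
def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(fragments):
    """Greedy merge heuristic: short result, but not guaranteed to be the shortest."""
    frags = list(dict.fromkeys(fragments))  # drop duplicates, keep order
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while len(frags) > 1:
        best = (-1, 0, 1)  # (overlap length, index of left part, index of right part)
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        merged = frags[i] + frags[j][k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (i, j)] + [merged]
    return frags[0] if frags else ""


print(greedy_superstring(["BCDA", "AGF", "ABC"]))  # ABCDAGF
```

    With many fragments, precomputing all pairwise overlaps once avoids recomputing them on every pass.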

  • Unit testing a text index

    - by jplot
    Consider a text index such as a suffix tree or a suffix array supporting Count queries (the number of occurrences of a pattern) and Locate queries (the positions of all occurrences of a pattern) over a given text. How would you go about unit testing such a class? What I have in mind is to generate a big random string, extract a random substring from it, and compare the results of both queries with naive implementations (such as string::find). Another idea is to find the most frequent substrings of length l in the original string (perhaps using a naive method) and use them to test the index. This isn't the best way, so what would be a good design for the unit tests of a text index? In case it matters, this is in C++ using Google Test.
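
    The first idea, comparing against a brute-force reference on random text, is a form of differential testing. The question uses C++ with Google Test; the Python sketch below only shows the shape of such a test, and its class and function names are hypothetical. A deliberately simple suffix-array index is checked against repeated `str.find` calls, which also find overlapping occurrences. A two-letter alphabet makes repeated and overlapping matches frequent, so those cases get exercised.

```python
import random


def naive_locate(text, pattern):
    """Reference answer: every start position, including overlapping matches."""
    positions, i = [], text.find(pattern)
    while i != -1:
        positions.append(i)
        i = text.find(pattern, i + 1)
    return positions


class SuffixArrayIndex:
    """Index under test: a simple O(n^2 log n) suffix array, fine at test sizes."""

    def __init__(self, text):
        self.text = text
        self.sa = sorted(range(len(text)), key=lambda i: text[i:])

    def _bound(self, pattern, strict):
        # First suffix whose length-m prefix is >= pattern (> pattern if strict).
        lo, hi, m = 0, len(self.sa), len(pattern)
        while lo < hi:
            mid = (lo + hi) // 2
            prefix = self.text[self.sa[mid]:self.sa[mid] + m]
            if prefix < pattern or (strict and prefix == pattern):
                lo = mid + 1
            else:
                hi = mid
        return lo

    def locate(self, pattern):
        lo, hi = self._bound(pattern, False), self._bound(pattern, True)
        return sorted(self.sa[lo:hi])

    def count(self, pattern):
        return self._bound(pattern, True) - self._bound(pattern, False)


def differential_test(seed, n=300, alphabet="ab", trials=200):
    rng = random.Random(seed)
    text = "".join(rng.choice(alphabet) for _ in range(n))
    index = SuffixArrayIndex(text)
    for _ in range(trials):
        i = rng.randrange(n)
        pattern = text[i:rng.randrange(i + 1, min(n, i + 8) + 1)]  # non-empty substring
        expected = naive_locate(text, pattern)
        assert index.locate(pattern) == expected, pattern
        assert index.count(pattern) == len(expected), pattern
    absent = "#"  # a character outside the alphabet
    assert index.count(absent) == 0 and naive_locate(text, absent) == []


for seed in range(5):
    differential_test(seed)
print("all checks passed")
```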

  • Where in the stack is Software Restriction Policies implemented?

    - by Knox
    I am a big fan of Software Restriction Policies for Microsoft Windows and was recently updating our settings for them. I became curious about where in the stack Microsoft implemented this technology. I can imagine a very naive implementation in Windows Explorer: when you double-click an exe or other blocked file type, Explorer checks it against the policy. I call this naive because it obviously wouldn't protect against someone typing something in a CMD window, or worse, Adobe Reader running an external application. On the other hand, I can imagine that Software Restriction Policies could be implemented deep in the stack, almost at the metal. In this case, the low-level loader would load the questionable file into memory but mark that memory in the memory manager as non-executable data. I'm pretty sure that Microsoft did not do the most naive implementation, because if I block Java using a path block, Internet Explorer crashes when it attempts to load Java, which is what I want. But I'm not sure how deep in the stack it's implemented, and any insight would be appreciated.

  • Finding the largest subtree in a BST

    - by rakeshr
    Given a binary tree, I want to find the largest subtree in it that is a BST. Naive approach: I visit every node of the tree and pass that node to an isBST function, keeping track of the number of nodes in each subtree that is a BST. Is there a better approach than this?
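
    The naive approach calls isBST on every node. Each call costs O(n), so the total is O(n^2) in the worst case. A standard improvement is a single post-order pass: each node receives from its children whether they are BSTs, their sizes and their min/max keys, which gives O(n) overall. Below is a Python sketch that assumes distinct keys and treats a "subtree" as a node together with all of its descendants. `Node` and `largest_bst_size` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def largest_bst_size(root):
    """Size of the largest subtree (a node plus all its descendants) that is a BST.

    One post-order pass, O(n): each call returns (is_bst, size, min_key, max_key)
    for the subtree it visited. Assumes distinct keys.
    """
    best = 0

    def visit(node):
        nonlocal best
        if node is None:
            return True, 0, float("inf"), float("-inf")
        l_ok, l_size, l_min, l_max = visit(node.left)
        r_ok, r_size, r_min, r_max = visit(node.right)
        if l_ok and r_ok and l_max < node.key < r_min:
            size = l_size + r_size + 1
            best = max(best, size)
            return True, size, min(l_min, node.key), max(r_max, node.key)
        return False, 0, 0, 0  # min/max are never read once a subtree is not a BST

    visit(root)
    return best


#        10
#       /  \
#      5    15
#     / \     \
#    1   8     7     <- 7 < 15, so 15's subtree is not a BST
root = Node(10, Node(5, Node(1), Node(8)), Node(15, None, Node(7)))
print(largest_bst_size(root))  # 3 (the subtree rooted at 5)
```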

  • How can I force a ListView with a custom panel to re-measure when the ListView width goes below the

    - by Scott Whitlock
    Sorry for the long-winded question (I'm including background here). If you just want the question, go to the end.

    I have a ListView with a custom Panel implementation that I'm using to implement something similar to a WrapPanel, but not quite. I'm overriding the MeasureOverride and ArrangeOverride methods in the custom panel. If I do the naive implementation of a WrapPanel in the MeasureOverride method, it doesn't work when the ListView is resized. Let's say the custom panel does a measure, the constraint is a width of 100, and I have 3 items that are 40 wide each. The naive approach is to return a size of 80,80. But when I resize the window that contains the ListView down to, say, 75, it just turns on the horizontal scrollbar and never calls measure or arrange again (it does keep measuring and arranging while the width is greater than 80).

    To get around this, I hard-coded the measurement to have only the width of the widest item. Then in the arrange, it gives me more space than I asked for, and I use as much horizontal space as I can before wrapping. If I resize the window smaller than the smallest item in the ListView, it turns on the scrollbar, which is great. Unfortunately, this causes a big problem when I have one of these ListViews with a custom panel nested inside another one. The outer one works OK, but I can't get the inner one to "take as much as it needs". It always sizes to the smallest item, and the only way around it is to set the MinWidth to something greater than zero.

    Anyway, stepping back for a second, I think the real way to fix this is to go back to the naive implementation of the WrapPanel but force it to re-measure when the ListView width goes below the size I previously returned as a measurement. That should solve my problem with the nested one.

    So, that's my question:
      * I have a ListView with a custom panel.
      * If I return a measurement width on the panel and the ListView is resized to less than that width, it stops calling MeasureOverride.
      * How can I get it to keep calling MeasureOverride?
