Search Results

Search found 5079 results on 204 pages for 'bankers algorithm'.

Page 81/204 | < Previous Page | 77 78 79 80 81 82 83 84 85 86 87 88  | Next Page >

  • Ideas for designing an automated content tagging system needed

    - by Benjamin Smith
    I am currently designing a website that amongst other is required to display and organise small amounts of text content (mainly quotes, article stubs, etc.). I currently have a database with 250,000+ items and need to come up with a method of tagging each item with relevant tags which will eventually allow for easy searching/browsing of the content for users. A very simplistic idea I have (and one that I believe is employed by some sites that I have been looking to for inspiration (http://www.brainyquote.com/quotes/topics.html)), is to simply search the database for certain words or phrases and use these words as tags for the content. This can easily be extended so that if for example a user wanted to show all items with a theme of love then I would just return a list of items with words and phrases relating to this theme. This would not be hard to implement but does not provide very good results. For example if I were to search for the month 'May' in the database with the aim of then classifying the items returned as realting to the topic of Spring then I would get back all occurrences of the word May, regardless of the semantic meaning. Another shortcoming of this method is that I believe it would be quite hard to automate the process to any large scale. What I really require is a library that can take an item, break it down and analyse the semantic meaning and also return a list of tags that would correctly classify the item. I know this is a lot to ask and I have a feeling I will end up reverting to the aforementioned method but I just thought I should ask if anyone knew of any pre-existing solution. I think that as the items in the database are short then it is probably quite a hard task to analyse any meaning from them however I may be mistaken. Another path to possibly go down would be to use something like amazon turk to outsource the task which may produce good results but would be expensive. Eventually I would like users to be able to (and want to!) tag content and to vote for the most relevant tags, possibly using a gameification mechanic as motivation however this is some way down the line. A temporary fix may be the best thing if this were the route I decided to go down as I could use the rough results I got as the starting point for a more in depth solution. If you've read this far, thanks for sticking with me, I know I'm spitballing but any input would be really helpful. Thanks.

    Read the article

  • Converting to a column oriented array in Java

    - by halfwarp
    Although I have Java in the title, this could be for any OO language. I'd like to know a few new ideas to improve the performance of something I'm trying to do. I have a method that is constantly receiving an Object[] array. I need to split the Objects in this array through multiple arrays (List or something), so that I have an independent list for each column of all arrays the method receives. Example: List<List<Object>> column-oriented = new ArrayList<ArrayList<Object>>(); public void newObject(Object[] obj) { for(int i = 0; i < obj.length; i++) { column-oriented.get(i).add(obj[i]); } } Note: For simplicity I've omitted the initialization of objects and stuff. The code I've shown above is slow of course. I've already tried a few other things, but would like to hear some new ideas. How would you do this knowing it's very performance sensitive?

    Read the article

  • List of Big-O for PHP functions?

    - by Kendall Hopkins
    After using PHP for a while now, I've noticed that not all PHP built in functions as fast as expected. Consider the below two possible implementations of a function that finds if a number is prime using a cached array of primes. //very slow for large $prime_array $prime_array = array( 2, 3, 5, 7, 11, 13, .... 104729, ... ); $result_array = array(); foreach( $array_of_number => $number ) { $result_array[$number] = in_array( $number, $large_prime_array ); } //still decent performance for large $prime_array $prime_array => array( 2 => NULL, 3 => NULL, 5 => NULL, 7 => NULL, 11 => NULL, 13 => NULL, .... 104729 => NULL, ... ); foreach( $array_of_number => $number ) { $result_array[$number] = array_key_exists( $number, $large_prime_array ); } This is because in_array is implemented with a linear search O(n) which will linearly slow down as $prime_array grows. Where the array_key_exists function is implemented with a hash lookup O(1) which will not slow down unless the hash table gets extremely populated (in which case it's only O(logn)). So far I've had to discover the big-O's via trial and error, and occasionally looking at the source code. Now for the question... I was wondering if there was a list of the theoretical (or practical) big O times for all* the PHP built in functions. *or at least the interesting ones For example find it very hard to predict what the big O of functions listed because the possible implementation depends on unknown core data structures of PHP: array_merge, array_merge_recursive, array_reverse, array_intersect, array_combine, str_replace (with array inputs), etc.

    Read the article

  • Naive Bayesian classification (spam filtering) - Doubt in one calculation? Which one is right? Plz c

    - by Microkernel
    Hi guys, I am implementing Naive Bayesian classifier for spam filtering. I have doubt on some calculation. Please clarify me what to do. Here is my question. In this method, you have to calculate P(S|W) - Probability that Message is spam given word W occurs in it. P(W|S) - Probability that word W occurs in a spam message. P(W|H) - Probability that word W occurs in a Ham message. So to calculate P(W|S), should I do (1) (Number of times W occuring in spam)/(total number of times W occurs in all the messages) OR (2) (Number of times word W occurs in Spam)/(Total number of words in the spam message) So, to calculate P(W|S), should I do (1) or (2)? (I thought it to be (2), but I am not sure, so plz clarify me) I am refering http://en.wikipedia.org/wiki/Bayesian_spam_filtering for the info by the way. I got to complete the implementation by this weekend :( Thanks and regards, MicroKernel :) @sth: Hmm... Shouldn't repeated occurrence of word 'W' increase a message's spam score? In the your approach it wouldn't, right?. Lets take a scenario and discuss... Lets say, we have 100 training messages, out of which 50 are spam and 50 are Ham. and say word_count of each message = 100. And lets say, in spam messages word W occurs 5 times in each message and word W occurs 1 time in Ham message. So total number of times W occuring in all the spam message = 5*50 = 250 times. And total number of times W occuring in all Ham messages = 1*50 = 50 times. Total occurance of W in all of the training messages = (250+50) = 300 times. So, in this scenario, how do u calculate P(W|S) and P(W|H) ? Naturally we should expect, P(W|S) P(W|H)??? right. Please share your thought...

    Read the article

  • Base X string encoding

    - by Paul Stone
    I'm looking for a routine that will encode a string (stream of bytes) into an arbitrary base/alphabet (like base64 encoding but I get to choose the alphabet). I've seen a few routines that do base X encoding for a number, but not for a string.

    Read the article

  • Why does Java's hashCode() in String use 31 as a multiplier?

    - by jacobko
    In Java, the hash code for a String object is computed as s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] using int arithmetic, where s[i] is the ith character of the string, n is the length of the string, and ^ indicates exponentiation. Why is 31 used as a multiplier? I understand that the multiplier should be a relatively large prime number. So why not 29, or 37, or even 97?

    Read the article

  • How does Amazon's Statistically Improbable Phrases work?

    - by ??iu
    How does something like Statistically Improbable Phrases work? According to amazon: Amazon.com's Statistically Improbable Phrases, or "SIPs", are the most distinctive phrases in the text of books in the Search Inside!™ program. To identify SIPs, our computers scan the text of all books in the Search Inside! program. If they find a phrase that occurs a large number of times in a particular book relative to all Search Inside! books, that phrase is a SIP in that book. SIPs are not necessarily improbable within a particular book, but they are improbable relative to all books in Search Inside!. For example, most SIPs for a book on taxes are tax related. But because we display SIPs in order of their improbability score, the first SIPs will be on tax topics that this book mentions more often than other tax books. For works of fiction, SIPs tend to be distinctive word combinations that often hint at important plot elements. For instance, for Joel's first book, the SIPs are: leaky abstractions, antialiased text, own dog food, bug count, daily builds, bug database, software schedules One interesting complication is that these are phrases of either 2 or 3 words. This makes things a little more interesting because these phrases can overlap with or contain each other.

    Read the article

  • Genetic programming in c++, library suggestions?

    - by shuttle87
    I'm looking to add some genetic algorithms to an Operations research project I have been involved in. Currently we have a program that aids in optimizing some scheduling and we want to add in some heuristics in the form of genetic algorithms. Are there any good libraries for generic genetic programming/algorithms in c++? Or would you recommend I just code my own? I should add that while I am not new to c++ I am fairly new to doing this sort of mathematical optimization work in c++ as the group I worked with previously had tended to use a proprietary optimization package. We have a fitness function that is fairly computationally intensive to evaluate and we have a cluster to run this on so parallelized code is highly desirable. So is c++ a good language for this? If not please recommend some other ones as I am willing to learn another language if it makes life easier. thanks!

    Read the article

  • question about api functions

    - by davit-datuashvili
    i have question we have API functions in java can user create it's own function and add to his java IDE? for example i am using netbeans can i create my own function add to netbean IDE?let say create binary function or something else thanks

    Read the article

  • question about qsort in c++

    - by davit-datuashvili
    i have following code in c++ #include <iostream> using namespace std; void qsort5(int a[],int n){ int i; int j; if (n<=1) return; for (i=1;i<n;i++) j=0; if (a[i]<a[0]) swap(++j,i,a); swap(0,j,a); qsort5(a,j); qsort(a+j+1,n-j-1); } int main() { return 0; } void swap(int i,int j,int a[]) { int t=a[i]; a[i]=a[j]; a[j]=t; } i have problem 1>c:\users\dato\documents\visual studio 2008\projects\qsort5\qsort5\qsort5.cpp(13) : error C2780: 'void std::swap(std::basic_string<_Elem,_Traits,_Alloc> &,std::basic_string<_Elem,_Traits,_Alloc> &)' : expects 2 arguments - 3 provided 1> c:\program files\microsoft visual studio 9.0\vc\include\xstring(2203) : see declaration of 'std::swap' 1>c:\users\dato\documents\visual studio 2008\projects\qsort5\qsort5\qsort5.cpp(13) : error C2780: 'void std::swap(std::pair<_Ty1,_Ty2> &,std::pair<_Ty1,_Ty2> &)' : expects 2 arguments - 3 provided 1> c:\program files\microsoft visual studio 9.0\vc\include\utility(76) : see declaration of 'std::swap' 1>c:\users\dato\documents\visual studio 2008\projects\qsort5\qsort5\qsort5.cpp(13) : error C2780: 'void std::swap(_Ty &,_Ty &)' : expects 2 arguments - 3 provided 1> c:\program files\microsoft visual studio 9.0\vc\include\utility(16) : see declaration of 'std::swap' 1>c:\users\dato\documents\visual studio 2008\projects\qsort5\qsort5\qsort5.cpp(14) : error C2780: 'void std::swap(std::basic_string<_Elem,_Traits,_Alloc> &,std::basic_string<_Elem,_Traits,_Alloc> &)' : expects 2 arguments - 3 provided 1> c:\program files\microsoft visual studio 9.0\vc\include\xstring(2203) : see declaration of 'std::swap' 1>c:\users\dato\documents\visual studio 2008\projects\qsort5\qsort5\qsort5.cpp(14) : error C2780: 'void std::swap(std::pair<_Ty1,_Ty2> &,std::pair<_Ty1,_Ty2> &)' : expects 2 arguments - 3 provided 1> c:\program files\microsoft visual studio 9.0\vc\include\utility(76) : see declaration of 'std::swap' 1>c:\users\dato\documents\visual studio 2008\projects\qsort5\qsort5\qsort5.cpp(14) : error C2780: 'void std::swap(_Ty &,_Ty &)' : expects 2 arguments - 3 provided 1> c:\program files\microsoft visual studio 9.0\vc\include\utility(16) : see declaration of 'std::swap' 1>c:\users\dato\documents\visual studio 2008\projects\qsort5\qsort5\qsort5.cpp(16) : error C2661: 'qsort' : no overloaded function takes 2 arguments 1>Build log was saved at "file://c:\Users\dato\Documents\Visual Studio 2008\Projects\qsort5\qsort5\Debug\BuildLog.htm" please help

    Read the article

  • Convert arbitrary size of byte[] to BigInteger[] and then safely convert back to exactly the same by

    - by PatlaDJ
    I believe conversion exactly to BigInteger[] would be optimal in my case. Anyone had done or found this written in Java and willing to share? So imagine I have arbitrary size byte[] = {0xff,0x3e,0x12,0x45,0x1d,0x11,0x2a,0x80,0x81,0x45,0x1d,0x11,0x2a,0x80,0x81} How do I convert it to array of BigInteger's and then be able to recover it back the original byte array safely? ty in advance.

    Read the article

  • Position elements without overlap

    - by eWolf
    I have a number of rectangular elements that I want to position in a 2D space. I calculate an ideal position for each element. Now my problem is that many elements overlap as very often the ideal positions are concentrated in one region. I want to avoid overlap as much as possible (doesn't have to be perfect, though). How can I do this? I've heard physics simulations are suitable for this - is that correct? And can anyone provide an example/tutorial? By the way: I'm using XNA, if you know any .NET library that does exactly this job - tell me!

    Read the article

  • How do I find all paths through a set of given nodes in a DAG?

    - by Hanno Fietz
    I have a list of items (blue nodes below) which are categorized by the users of my application. The categories themselves can be grouped and categorized themselves. The resulting structure can be represented as a Directed Acyclic Graph (DAG) where the items are sinks at the bottom of the graph's topology and the top categories are sources. Note that while some of the categories might be well defined, a lot is going to be user defined and might be very messy. Example: On that structure, I want to perform the following operations: find all items (sinks) below a particular node (all items in Europe) find all paths (if any) that pass through all of a set of n nodes (all items sent via SMTP from example.com) find all nodes that lie below all of a set of nodes (intersection: goyish brown foods) The first seems quite straightforward: start at the node, follow all possible paths to the bottom and collect the items there. However, is there a faster approach? Remembering the nodes I already passed through probably helps avoiding unnecessary repetition, but are there more optimizations? How do I go about the second one? It seems that the first step would be to determine the height of each node in the set, as to determine at which one(s) to start and then find all paths below that which include the rest of the set. But is this the best (or even a good) approach? The graph traversal algorithms listed at Wikipedia all seem to be concerned with either finding a particular node or the shortest or otherwise most effective route between two nodes. I think both is not what I want, or did I just fail to see how this applies to my problem? Where else should I read?

    Read the article

  • Is there a name for the technique of using base-2 numbers to encode a list of unique options?

    - by Lunatik
    Apologies for the rather vague nature of this question, I've never been taught programming and Google is rather useless to a self-help guy like me in this case as the key words are pretty ambiguous. I am writing a couple of functions that encode and decode a list of options into a Long so they can easily be passed around the application, you know this kind of thing: 1 - Apple 2 - Orange 4 - Banana 8 - Plum etc. In this case the number 11 would represent Apple, Orange & Plum. I've got it working but I see this used all the time so assume there is a common name for the technique, and no doubt all sorts of best practice and clever algorithms that are at the moment just out of my reach.

    Read the article

  • Compact data structure for storing a large set of integral values

    - by Odrade
    I'm working on an application that needs to pass around large sets of Int32 values. The sets are expected to contain ~1,000,000-50,000,000 items, where each item is a database key in the range 0-50,000,000. I expect distribution of ids in any given set to be effectively random over this range. The operations I need on the set are dirt simple: Add a new value Iterate over all of the values. There is a serious concern about the memory usage of these sets, so I'm looking for a data structure that can store the ids more efficiently than a simple List<int>or HashSet<int>. I've looked at BitArray, but that can be wasteful depending on how sparse the ids are. I've also considered a bitwise trie, but I'm unsure how to calculate the space efficiency of that solution for the expected data. A Bloom Filter would be great, if only I could tolerate the false negatives. I would appreciate any suggestions of data structures suitable for this purpose. I'm interested in both out-of-the-box and custom solutions. EDIT: To answer your questions: No, the items don't need to be sorted By "pass around" I mean both pass between methods and serialize and send over the wire. I clearly should have mentioned this. There could be a decent number of these sets in memory at once (~100).

    Read the article

  • What is it about Fibonacci numbers?

    - by Ian Bishop
    Fibonacci numbers have become a popular introduction to recursion for Computer Science students and there's a strong argument that they persist within nature. For these reasons, many of us are familiar with them. They also exist within Computer Science elsewhere too; in surprisingly efficient data structures and algorithms based upon the sequence. There are two main examples that come to mind: Fibonacci heaps which have better amortized running time than binomial heaps. Fibonacci search which shares O(log N) running time with binary search on an ordered array. Is there some special property of these numbers that gives them an advantage over other numerical sequences? Is it a density quality? What other possible applications could they have? It seems strange to me as there are many natural number sequences that occur in other recursive problems, but I've never seen a Catalan heap.

    Read the article

  • incremental way of counting quantiles for large set of data

    - by Gacek
    I need to count the quantiles for a large set of data. Let's assume we can get the data only through some portions (i.e. one row of a large matrix). To count the Q3 quantile one need to get all the portions of the data and store it somewhere, then sort it and count the quantile: List<double> allData = new List<double>(); foreach(var row in matrix) // this is only example. In fact the portions of data are not rows of some matrix { allData.AddRange(row); } allData.Sort(); double p = 0.75*allData.Count; int idQ3 = (int)Math.Ceiling(p) - 1; double Q3 = allData[idQ3]; Now, I would like to find a way of counting this without storing the data in some separate variable. The best solution would be to count some parameters od mid-results for first row and then adjust it step by step for next rows. Note: These datasets are really big (ca 5000 elements in each row) The Q3 can be estimated, it doesn't have to be an exact value. I call the portions of data "rows", but they can have different leghts! Usually it varies not so much (+/- few hundred samples) but it varies! This question is similar to this one: http://stackoverflow.com/questions/1058813/on-line-iterator-algorithms-for-estimating-statistical-median-mode-skewness But I need to count quantiles. ALso there are few articles in this topic, i.e.: http://web.cs.wpi.edu/~hofri/medsel.pdf http://portal.acm.org/citation.cfm?id=347195&dl But before I would try to implement these, I wanted to ask you if there are maybe any other, qucker ways of counting the 0.25/0.75 quantiles?

    Read the article

  • algorithm q: Fuzzy matching of structured data

    - by user86432
    I have a fairly small corpus of structured records sitting in a database. Given a tiny fraction of the information contained in a single record, submitted via a web form (so structured in the same way as the table schema), (let us call it the test record) I need to quickly draw up a list of the records that are the most likely matches for the test record, as well as provide a confidence estimate of how closely the search terms match a record. The primary purpose of this search is to discover whether someone is attempting to input a record that is duplicate to one in the corpus. There is a reasonable chance that the test record will be a dupe, and a reasonable chance the test record will not be a dupe. The records are about 12000 bytes wide and the total count of records is about 150,000. There are 110 columns in the table schema and 95% of searches will be on the top 5% most commonly searched columns. The data is stuff like names, addresses, telephone numbers, and other industry specific numbers. In both the corpus and the test record it is entered by hand and is semistructured within an individual field. You might at first blush say "weight the columns by hand and match word tokens within them", but it's not so easy. I thought so too: if I get a telephone number I thought that would indicate a perfect match. The problem is that there isn't a single field in the form whose token frequency does not vary by orders of magnitude. A telephone number might appear 100 times in the corpus or 1 time in the corpus. The same goes for any other field. This makes weighting at the field level impractical. I need a more fine-grained approach to get decent matching. My initial plan was to create a hash of hashes, top level being the fieldname. Then I would select all of the information from the corpus for a given field, attempt to clean up the data contained in it, and tokenize the sanitized data, hashing the tokens at the second level, with the tokens as keys and frequency as value. I would use the frequency count as a weight: the higher the frequency of a token in the reference corpus, the less weight I attach to that token if it is found in the test record. My first question is for the statisticians in the room: how would I use the frequency as a weight? Is there a precise mathematical relationship between n, the number of records, f(t), the frequency with which a token t appeared in the corpus, the probability o that a record is an original and not a duplicate, and the probability p that the test record is really a record x given the test and x contain the same t in the same field? How about the relationship for multiple token matches across multiple fields? Since I sincerely doubt that there is, is there anything that gets me close but is better than a completely arbitrary hack full of magic factors? Barring that, has anyone got a way to do this? I'm especially keen on other suggestions that do not involve maintaining another table in the database, such as a token frequency lookup table :). This is my first post on StackOverflow, thanks in advance for any replies you may see fit to give.

    Read the article

  • R: Forecast package: Automatic algorithm for composite model involving ETS and AR

    - by phanikishan
    Hey, I would like to write a code involving automatic selection of a best composite model using ETS as well as autoregressive models. What is the criteria I should base my selection on? Also if I'm using the auto.arima function for deducing number of AR terms and corresponding coefficients from the forecast package in R, does my input series necessarily have to be stationary? or the value for d would be automatically selected thus returning a non-stationary model? Thanks, Phani

    Read the article

  • Clamping a vector to a minimum and maximum?

    - by user146780
    I came accross this: t = Clamp(t/d, 0, 1) but I'm not sure how to perform this operation on a vector. What are the steps to clamp a vector if one was writing their own vector implementation? Thanks clamp clamping a vector to a minimum and a maximum ex: pc = # the point you are coloring now p0 = # start point p1 = # end point v = p1 - p0 d = Length(v) v = Normalize(v) # or Scale(v, 1/d) v0 = pc - p0 t = Dot(v0, v) t = Clamp(t/d, 0, 1) color = (start_color * t) + (end_color * (1 - t))

    Read the article

  • Whats the difference between Paxos and W+R>=N in Cassandra?

    - by user1128016
    Dynamo-like databases (e.g. Cassandra) provide ability to enforce consistency by means of quorum, i.e. a number of synchronously written replicas (W) and a number of replicas to read (R) should be chosen in such a way that W+RN where N is a replication factor. On the other hand, PAXOS-based systems like Zookeeper are also used as a consistent fault-tolerant storage. What is the difference between these two approaches? Does PAXOS provide guarantees that are not provided by W+RN schema?

    Read the article

  • Echo mysql results in a loop?

    - by Roy D. Porter
    I am using turn.js to make a book. Every div within the 'deathnote' div becomes a new page. <div id="deathnote"> //starts book <div style="background-image:url(images/coverpage.jpg);"></div> //creates new page <div style="background-image:url(images/paper.jpg);"></div> //creates new page <div style="background-image:url(images/paper.jpg);"></div> //creates new page </div> //ends book What I am doing is trying to get 3 'content' (content being a name and cause of death) divs onto 1 page, and then generate a new page. So here is what i want: <div id="deathnote"> //starts book <div style="background-image:url(images/coverpage.jpg);"></div> //creates new page <div style="background-image:url(images/paper.jpg);"></div> //creates new page <div style="background-image:url(images/paper.jpg);"> //creates new page but leaves it open <div> CONTENT </div> <div> CONTENT </div> <div> CONTENT </div> </div> //ends the page </div> //ends book Seems simple enough, however the content is data from a MySQL DB, so i have to echo it in using PHP. Here is what i have so far <div id="deathnote"> <div style="background-image:url(images/coverpage.jpg);"></div> <div style="background-image:url(images/paper.jpg);"></div> <div style="background-image:url(images/paper.jpg);"></div> <div style="background-image:url(images/paper.jpg);"></div> <div style="background-image:url(images/paper.jpg);"></div> <div style="background-image:url(images/paper.jpg);"></div> <?php $pagecount = 0; $db = new mysqli('localhost', 'username', 'passw', 'DB'); if($db->connect_errno > 0){ die('Unable to connect to database [' . $db->connect_error . ']'); } $sql = <<<SQL SELECT * FROM `TABLE` SQL; if(!$result = $db->query($sql)){ die('There was an error running the query [' . $db->error . ']'); } //IGNORE ALL OF THE GARBAGE ABOVE. IT IS SIMPLE CONNECTING SCRIPT THAT I KNOW WORKS //THE METHOD I AM HAVING TROUBLE WITH IS BELOW $pagecount = 0; while($row = $result->fetch_assoc()){ //GETS THE VALUE (and makes sure it isn't nothing echo '<div style="background-image:url(images/paper.jpg);">'; //THIS OPENS A NEW PAGE while ($pagecount !== 3) { //KEEPS COUNT OF HOW MUCH CONTENT DIVS IS ON THE PAGE while($row = $result->fetch_assoc()){ //START A CONTENT DIV echo '<div class="content"><div class="name">' . $row['victim'] . '</div><div class="cod">' . $row['cod'] . '</div></div>'; //END A CONTENT DIV $pagecount++; //UP THE PAGE COUNT } } $pagecount=0; //PUT IT BACK TO 0 echo '</div>'; //END PAGE } $db->close(); ?> <div style="background-image:url(images/backpage.jpg);"></div> //BACK PAGE </div> At the moment i seem to be causing and infinite loop so the page won't load. The problem resides within the while loops. Any help is greatly appreciated. Thanks in advance guys. :)

    Read the article

< Previous Page | 77 78 79 80 81 82 83 84 85 86 87 88  | Next Page >