Search Results

Search found 5655 results on 227 pages for 'stl algorithm'.

Page 98/227 | < Previous Page | 94 95 96 97 98 99 100 101 102 103 104 105  | Next Page >

  • algorithm q: Fuzzy matching of structured data

    - by user86432
    I have a fairly small corpus of structured records sitting in a database. Given a tiny fraction of the information contained in a single record, submitted via a web form (so structured in the same way as the table schema), (let us call it the test record) I need to quickly draw up a list of the records that are the most likely matches for the test record, as well as provide a confidence estimate of how closely the search terms match a record. The primary purpose of this search is to discover whether someone is attempting to input a record that is a duplicate of one in the corpus. There is a reasonable chance that the test record will be a dupe, and a reasonable chance it will not be.

    The records are about 12000 bytes wide and the total count of records is about 150,000. There are 110 columns in the table schema and 95% of searches will be on the top 5% most commonly searched columns. The data is stuff like names, addresses, telephone numbers, and other industry-specific numbers. In both the corpus and the test record it is entered by hand and is semi-structured within an individual field.

    You might at first blush say "weight the columns by hand and match word tokens within them", but it's not so easy. I thought so too: if I get a telephone number I thought that would indicate a perfect match. The problem is that there isn't a single field in the form whose token frequency does not vary by orders of magnitude. A telephone number might appear 100 times in the corpus or 1 time in the corpus. The same goes for any other field. This makes weighting at the field level impractical. I need a more fine-grained approach to get decent matching.

    My initial plan was to create a hash of hashes, the top level being the fieldname. Then I would select all of the information from the corpus for a given field, attempt to clean up the data contained in it, and tokenize the sanitized data, hashing the tokens at the second level, with the tokens as keys and frequency as value. I would use the frequency count as a weight: the higher the frequency of a token in the reference corpus, the less weight I attach to that token if it is found in the test record.

    My first question is for the statisticians in the room: how would I use the frequency as a weight? Is there a precise mathematical relationship between n, the number of records, f(t), the frequency with which a token t appeared in the corpus, the probability o that a record is an original and not a duplicate, and the probability p that the test record is really a record x given that the test and x contain the same t in the same field? How about the relationship for multiple token matches across multiple fields? Since I sincerely doubt that there is, is there anything that gets me close but is better than a completely arbitrary hack full of magic factors? Barring that, has anyone got a way to do this? I'm especially keen on other suggestions that do not involve maintaining another table in the database, such as a token frequency lookup table :). This is my first post on StackOverflow, thanks in advance for any replies you may see fit to give.
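
    A common starting point for the weighting question is an inverse-document-frequency style score, much like the tf-idf weighting used in text retrieval: rare tokens contribute far more evidence than common ones. The sketch below is plain Java, with a hypothetical tokenFrequency map standing in for the per-field frequency hash described above; it scores a candidate by summing log(N / f(t)) over shared tokens, as an illustration of the idea rather than an exact answer to the probability question.

        import java.util.*;

        public class FuzzyScorer {
            // tokenFrequency: for one field, token -> number of corpus records containing it
            // totalRecords: N, the number of records in the corpus
            static double score(Set<String> testTokens, Set<String> candidateTokens,
                                Map<String, Integer> tokenFrequency, int totalRecords) {
                double s = 0.0;
                for (String t : testTokens) {
                    if (!candidateTokens.contains(t)) continue;
                    int f = tokenFrequency.getOrDefault(t, 1);
                    // rare tokens (small f) contribute a large weight, common ones almost nothing
                    s += Math.log((double) totalRecords / f);
                }
                return s;
            }
        }

    Summing these scores across the searched fields and ranking candidates by the total gives a reasonable first cut; the threshold for "likely duplicate" still has to be calibrated against known dupes.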

    Read the article

  • Position elements without overlap

    - by eWolf
    I have a number of rectangular elements that I want to position in a 2D space. I calculate an ideal position for each element. Now my problem is that many elements overlap as very often the ideal positions are concentrated in one region. I want to avoid overlap as much as possible (doesn't have to be perfect, though). How can I do this? I've heard physics simulations are suitable for this - is that correct? And can anyone provide an example/tutorial? By the way: I'm using XNA, if you know any .NET library that does exactly this job - tell me!
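
    One cheap alternative to a full physics simulation is iterative separation: repeatedly find overlapping pairs and push each pair apart along the axis of least penetration until things settle. The sketch below is written in Java for illustration (the question is about XNA/.NET), and the Rect class and pass count are placeholder assumptions.

        import java.util.List;

        class Rect { double x, y, w, h; }   // x, y = top-left corner

        public class Separator {
            // Nudges overlapping rectangles apart; a few dozen passes usually settle things.
            static void separate(List<Rect> rects, int passes) {
                for (int pass = 0; pass < passes; pass++) {
                    for (int i = 0; i < rects.size(); i++) {
                        for (int j = i + 1; j < rects.size(); j++) {
                            Rect a = rects.get(i), b = rects.get(j);
                            double dx = (a.x + a.w / 2) - (b.x + b.w / 2);
                            double dy = (a.y + a.h / 2) - (b.y + b.h / 2);
                            double overlapX = (a.w + b.w) / 2 - Math.abs(dx);
                            double overlapY = (a.h + b.h) / 2 - Math.abs(dy);
                            if (overlapX <= 0 || overlapY <= 0) continue;   // no overlap
                            if (overlapX < overlapY) {                      // push along the shallower axis
                                double shift = overlapX / 2 * Math.signum(dx == 0 ? 1 : dx);
                                a.x += shift; b.x -= shift;
                            } else {
                                double shift = overlapY / 2 * Math.signum(dy == 0 ? 1 : dy);
                                a.y += shift; b.y -= shift;
                            }
                        }
                    }
                }
            }
        }

    This is essentially a stripped-down force-based layout, which is why physics simulations come up as the standard answer; the full versions just add springs pulling each element back toward its ideal position.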

    Read the article

  • Non-Linear color interpolation?

    - by user146780
    If I have a straight line that measures from 0 to 1, with colorA(255,0,0) at 0 on the line, colorB(20,160,0) at 0.3, and colorC(0,0,0) at 1, how could I find the color at 0.7? Thanks
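
    What is described is piecewise linear interpolation between colour stops: 0.7 lies between the stops at 0.3 and 1.0, so each channel is blended between colorB and colorC in proportion to how far 0.7 sits between those two stops. A small Java sketch of the idea, using the stop positions and colours from the question:

        public class Gradient {
            // stops[i] is the position of colors[i] on the 0..1 line, in ascending order
            static int[] colorAt(double t, double[] stops, int[][] colors) {
                if (t <= stops[0]) return colors[0];
                for (int i = 1; i < stops.length; i++) {
                    if (t <= stops[i]) {
                        double f = (t - stops[i - 1]) / (stops[i] - stops[i - 1]); // 0..1 between the two stops
                        int[] a = colors[i - 1], b = colors[i];
                        return new int[] {
                            (int) Math.round(a[0] + f * (b[0] - a[0])),
                            (int) Math.round(a[1] + f * (b[1] - a[1])),
                            (int) Math.round(a[2] + f * (b[2] - a[2]))
                        };
                    }
                }
                return colors[colors.length - 1];
            }

            public static void main(String[] args) {
                double[] stops = {0.0, 0.3, 1.0};
                int[][] colors = {{255, 0, 0}, {20, 160, 0}, {0, 0, 0}};
                // at t = 0.7, f = (0.7 - 0.3) / 0.7 ≈ 0.571, giving roughly (9, 69, 0)
                System.out.println(java.util.Arrays.toString(colorAt(0.7, stops, colors)));
            }
        }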

    Read the article

  • How can I test if a point lies within a 3d shape with its surface defined by a point cloud?

    - by Ben
    Hi, I have a collection of points which describe the surface of a shape that should be roughly spherical, and I need a method with which to determine if any other given point lies within this shape. I've previously been approximating the shape as an exact sphere, but this has proven too inaccurate and I need a more accurate method. Simplicity and speed are favourable over complete accuracy; a good approximation will suffice. I've come across techniques for converting a point cloud to a 3d mesh, but most things I have found have been very complicated, and I am looking for something as simple as possible. Any ideas? Many thanks, Ben.
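
    For a roughly spherical cloud, one simple approximation that avoids building a mesh at all is: take the centroid of the surface points, estimate the local surface radius in the direction of the query point from its few nearest surface samples, and call the point "inside" if it is closer to the centroid than that local radius. The Java sketch below is a heuristic under that assumption, not an exact test, and all names are placeholders.

        import java.util.*;

        public class InsideCloud {
            static double dist(double[] a, double[] b) {
                double s = 0;
                for (int i = 0; i < 3; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
                return Math.sqrt(s);
            }

            // surface: points describing the (roughly spherical) surface; p: point to test
            static boolean isInside(List<double[]> surface, double[] p, int k) {
                double[] c = new double[3];
                for (double[] s : surface) for (int i = 0; i < 3; i++) c[i] += s[i] / surface.size();

                // local radius = average centroid-distance of the k surface points nearest to p
                List<double[]> nearest = new ArrayList<>(surface);
                nearest.sort(Comparator.comparingDouble(s -> dist(s, p)));
                double radius = 0;
                for (int i = 0; i < k; i++) radius += dist(nearest.get(i), c) / k;

                return dist(p, c) < radius;
            }
        }

    Sorting the whole cloud per query is O(n log n); if that matters, a k-d tree for the nearest-neighbour lookup is the standard speed-up.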

    Read the article

  • simple plot algorithm with autoscale

    - by adrin
    I need to implement a simple plotting component in C# (WPF, to be more precise). What I have is a collection of data samples containing time (X axis) and a value (both of type double). I have a drawing canvas of a fixed size (Width x Height) and a DrawLine method/function that can draw on it. The problem I am facing now is: how do I draw the plot so that it is autoscaled? In other words, how do I map the samples I have to actual pixels on my Width x Height canvas?
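
    The core of autoscaling is a linear map from data space to pixel space: find the min and max of the samples on each axis, then map a value v to (v - min) / (max - min) * size, flipping the Y axis because screen Y grows downward. The question is about WPF, but the mapping itself is language-neutral; here it is sketched in Java for illustration.

        public class PlotScaler {
            // Maps a data value in [min, max] onto [0, pixels).
            static double toPixel(double value, double min, double max, double pixels) {
                if (max == min) return pixels / 2;               // degenerate range: centre it
                return (value - min) / (max - min) * pixels;
            }

            // Example: one sample (t, v) on a canvas of width x height, where
            // tMin/tMax and vMin/vMax were computed over the whole sample collection.
            static double[] toCanvas(double t, double v,
                                     double tMin, double tMax, double vMin, double vMax,
                                     double width, double height) {
                double px = toPixel(t, tMin, tMax, width);
                double py = height - toPixel(v, vMin, vMax, height); // flip Y for screen coordinates
                return new double[] { px, py };
            }
        }

    Converting each consecutive pair of samples this way and handing the two points to DrawLine gives the autoscaled polyline.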

    Read the article

  • k-combinations of a set of integers in ascending size order

    - by Adamski
    Programming challenge: Given a set of integers [1, 2, 3, 4, 5] I would like to generate all possible k-combinations in ascending size order in Java; e.g. [1], [2], [3], [4], [5], [1, 2], [1, 3] ... [1, 2, 3, 4, 5] It is fairly easy to produce a recursive solution that generates all combinations and then sort them afterwards but I imagine there's a more efficient way that removes the need for the additional sort.
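
    One way to avoid the post-hoc sort is simply to generate combinations size by size: an outer loop over k = 1..n, and for each k a standard recursive combination generator that emits subsets in the order they are found. A Java sketch:

        import java.util.*;

        public class Combinations {
            static List<List<Integer>> allBySize(int[] set) {
                List<List<Integer>> out = new ArrayList<>();
                for (int k = 1; k <= set.length; k++)            // size 1 first, then 2, ...
                    build(set, k, 0, new ArrayDeque<>(), out);
                return out;
            }

            static void build(int[] set, int k, int start, Deque<Integer> current,
                              List<List<Integer>> out) {
                if (current.size() == k) {
                    out.add(new ArrayList<>(current));
                    return;
                }
                for (int i = start; i < set.length; i++) {
                    current.addLast(set[i]);
                    build(set, k, i + 1, current, out);
                    current.removeLast();
                }
            }

            public static void main(String[] args) {
                // [[1], [2], [3], [4], [5], [1, 2], [1, 3], ... , [1, 2, 3, 4, 5]]
                System.out.println(allBySize(new int[] {1, 2, 3, 4, 5}));
            }
        }

    The total work is still proportional to the number of combinations produced, so nothing is lost versus generate-then-sort, and the extra sort disappears.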

    Read the article

  • Fast dot product for a very special case

    - by psihodelia
    Given a vector X of size L, where every scalar element of X is from the binary set {0,1}, the task is to find the dot product z=dot(X,Y), where the vector Y of size L consists of integer-valued elements. I suspect there must be a very fast way to do this. Let's say we have L=4; X[L]={1, 0, 0, 1}; Y[L]={-4, 2, 1, 0} and we have to find z=X[0]*Y[0] + X[1]*Y[1] + X[2]*Y[2] + X[3]*Y[3] (which in this case gives us -4). It is obvious that X can be represented using binary digits, e.g. an integer type int32 for L=32. Then all we have to do is find the dot product of this integer with an array of 32 integers. Do you have any ideas or suggestions on how to do this very fast?
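
    Once X is packed into the bits of an integer, the dot product reduces to summing the Y entries at the positions of the set bits; iterating only over set bits keeps the work proportional to the number of ones rather than to L. A Java sketch for L up to 64:

        public class MaskedDot {
            // x holds the 0/1 vector in its low L bits (bit i = X[i]); y holds the integer vector.
            static long dot(long x, int[] y) {
                long sum = 0;
                while (x != 0) {
                    int i = Long.numberOfTrailingZeros(x); // index of the lowest set bit
                    sum += y[i];
                    x &= x - 1;                            // clear that bit
                }
                return sum;
            }

            public static void main(String[] args) {
                // X = {1,0,0,1}, Y = {-4,2,1,0}  ->  bits 0 and 3 set  ->  -4 + 0 = -4
                long x = 0b1001;
                System.out.println(dot(x, new int[] {-4, 2, 1, 0}));
            }
        }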

    Read the article

  • Algorithm for sentence analysis and tokenization

    - by Andrea Nagar
    I need to analyze a document and compile statistics on how many times each sequence of words is used (so the analysis is not of single words but of batches of recurring words). I read that compression algorithms do something similar to what I want - creating dictionaries of blocks of text together with a piece of information reporting their frequency. It should be something similar to http://www.codeproject.com/KB/recipes/Patterns.aspx Do you have anything written in C#?
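
    The usual name for this is n-gram counting: slide a window of n words over the text and tally each phrase in a dictionary. The asker wants C#, but the idea is the same in any language; here is a hedged Java sketch of the counting step.

        import java.util.*;

        public class PhraseCounter {
            // Counts how often every n-word sequence (n = minLen..maxLen) occurs in the text.
            static Map<String, Integer> phraseFrequencies(String text, int minLen, int maxLen) {
                String[] words = text.toLowerCase().split("\\W+");
                Map<String, Integer> freq = new HashMap<>();
                for (int n = minLen; n <= maxLen; n++) {
                    for (int i = 0; i + n <= words.length; i++) {
                        String phrase = String.join(" ", Arrays.copyOfRange(words, i, i + n));
                        freq.merge(phrase, 1, Integer::sum);
                    }
                }
                return freq;
            }
        }

    Sorting the resulting map entries by count, and dropping phrases that occur only once, gives the kind of recurring-phrase report the linked article produces.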

    Read the article

  • Finding the largest subtree in a BST

    - by rakeshr
    Given a binary tree, I want to find the largest subtree in it which is a BST. Naive approach: I have a naive approach in mind where I visit every node of the tree and pass this node to an isBST function. I will also keep track of the number of nodes in a sub-tree if it is a BST. Is there a better approach than this?
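
    The node-by-node isBST check is O(n²) in the worst case; the usual improvement is a single bottom-up pass that returns, for every subtree, its min, max, size and whether it is a BST, so each node is visited once. A hedged Java sketch (the Node shape is assumed, not taken from the question):

        public class LargestBst {
            static class Node { int val; Node left, right; }

            static int best = 0;   // size of the largest BST subtree seen so far

            // returns {isBst (1/0), size, min, max} for the subtree rooted at n
            static int[] scan(Node n) {
                if (n == null) return new int[] {1, 0, Integer.MAX_VALUE, Integer.MIN_VALUE};
                int[] l = scan(n.left), r = scan(n.right);
                if (l[0] == 1 && r[0] == 1 && l[3] < n.val && n.val < r[2]) {
                    int size = l[1] + r[1] + 1;
                    best = Math.max(best, size);
                    return new int[] {1, size, Math.min(l[2], n.val), Math.max(r[3], n.val)};
                }
                return new int[] {0, 0, 0, 0};   // not a BST; min/max no longer matter
            }
        }

    After scan(root), best holds the node count of the largest BST subtree; keeping a reference to its root alongside the size is a one-line extension.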

    Read the article

  • how to elegantly duplicate a graph (neural network)

    - by macias
    I have a graph (network) which consists of layers, which contain nodes (neurons). I would like to write a procedure that duplicates the entire graph in the most elegant way possible -- i.e. with minimal or no overhead added to the structure of the node or layer. Or, in other words: the procedure could be complex, but the complexity should not "leak" into the structures. They should not become more complex just because they are copyable. I wrote the code in C#, and so far it looks like this:

    - a neuron has an additional field, copy_of, which is a pointer to the neuron it was copied from; this is my additional overhead
    - a neuron has a parameterless method Clone()
    - a neuron has a method Reconnect(), which exchanges a connection from the "source" neuron (parameter) to the "target" neuron (parameter)
    - a layer has a parameterless method Clone(), which simply calls Clone() for all its neurons
    - the network has a parameterless method Clone(), which calls Clone() for every layer, then iterates over all neurons, creates the neuron = copy_of mappings, and calls Reconnect to exchange all the "wiring"

    I hope my approach is clear. The question is: is there a more elegant method? I particularly don't like keeping an extra pointer in the neuron class just in case it gets copied! I would like to gather that data in one point (the network's Clone) and then dispose of it completely (the Clone method cannot have an argument, though).
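
    One way to drop the copy_of field entirely is to keep the original-to-copy mapping outside the nodes, in a dictionary owned by the network's Clone method: first clone every neuron and record the mapping, then walk the clones and replace each connection with its mapped counterpart. The sketch below is in Java rather than C# (an IdentityHashMap playing the role of a Dictionary keyed on reference identity), the Neuron shape is a guess, and layers are omitted for brevity, so treat it as an outline only.

        import java.util.*;

        public class NetworkCloneSketch {
            static class Neuron {
                List<Neuron> inputs = new ArrayList<>();
                double bias;                  // stands in for whatever per-neuron state exists
            }

            static List<Neuron> cloneNetwork(List<Neuron> neurons) {
                // 1. Shallow-clone every neuron; the original->copy map lives only in this method.
                Map<Neuron, Neuron> copyOf = new IdentityHashMap<>();
                for (Neuron n : neurons) {
                    Neuron c = new Neuron();
                    c.bias = n.bias;
                    copyOf.put(n, c);
                }
                // 2. Rewire: each connection in a copy points at the copy of the original target.
                List<Neuron> clones = new ArrayList<>();
                for (Neuron n : neurons) {
                    Neuron c = copyOf.get(n);
                    for (Neuron in : n.inputs) c.inputs.add(copyOf.get(in));
                    clones.add(c);
                }
                return clones;   // the map is discarded here; no copy_of field survives in the nodes
            }
        }

    The neurons stay oblivious to being copyable; all the bookkeeping lives and dies inside the one method, which is exactly the "gather in one point, then dispose" goal stated above.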

    Read the article

  • question about LSD radix sort

    - by davit-datuashvili
    Hello, I have the following code:

        public class LSD {
            public static int R = 1 << 8;
            public static int bytesword = 4;

            public static void radixLSD(int a[], int l, int r) {
                int aux[] = new int[a.length];
                for (int d = bytesword - 1; d >= 0; d--) {
                    int i, j;
                    int count[] = new int[R + 1];
                    for (j = 0; j < R; j++) count[j] = 0;
                    for (i = l; i <= r; i++) count[digit(a[i], d) + 1]++;
                    for (j = 1; j < R; j++) count[j] += count[j - 1];
                    for (i = l; i <= r; i++) aux[count[digit(a[i], d)]++] = a[i];
                    for (i = l; i <= r; i++) a[i] = aux[i - 1];
                }
            }

            public static void main(String[] args) {
                int a[] = new int[]{3, 6, 5, 7, 4, 8, 9};
                radixLSD(a, 0, a.length - 1);
                for (int i = 0; i < a.length; i++) {
                    System.out.println(a[i]);
                }
            }

            public static int digit(int n, int d) {
                return (n >> d) & 1;
            }
        }

    but it shows me this error:

        java.lang.ArrayIndexOutOfBoundsException: -1
            at LSD.radixLSD(LSD.java:19)
            at LSD.main(LSD.java:29)

    Please help me.

    Read the article

  • What's the difference between Paxos and W+R>=N in Cassandra?

    - by user1128016
    Dynamo-like databases (e.g. Cassandra) provide the ability to enforce consistency by means of a quorum, i.e. the number of synchronously written replicas (W) and the number of replicas to read (R) should be chosen in such a way that W+R > N, where N is the replication factor. On the other hand, Paxos-based systems like Zookeeper are also used as consistent fault-tolerant storage. What is the difference between these two approaches? Does Paxos provide guarantees that are not provided by the W+R > N scheme?

    Read the article

  • question about mergesort

    - by davit-datuashvili
    I have written code for mergesort; here is the code:

        public class mergesort {
            public static int a[];

            public static void merges(int work[], int low, int high) {
                if (low == high) return;
                else {
                    int mid = (low + high) / 2;
                    merges(work, low, mid);
                    merges(work, mid + 1, high);
                    merge(work, low, mid + 1, high);
                }
            }

            public static void main(String[] args) {
                int a[] = new int[]{64, 21, 33, 70, 12, 85, 44, 99, 36, 108};
                merges(a, 0, a.length - 1);
                for (int i = 0; i < a.length; i++) {
                    System.out.println(a[i]);
                }
            }

            public static void merge(int work[], int low, int high, int upper) {
                int j = 0;
                int l = low;
                int mid = high - 1;
                int n = upper - l + 1;
                while (low <= mid && high <= upper)
                    if (a[low] < a[high]) work[j++] = a[low++];
                    else work[j++] = a[high++];
                while (low <= mid) work[j++] = a[low++];
                while (high <= upper) work[j++] = a[high++];
                for (j = 0; j < n; j++) a[l + j] = work[j];
            }
        }

    but it does not work; when I run it I get this error:

        java.lang.NullPointerException
            at mergesort.merge(mergesort.java:45)
            at mergesort.merges(mergesort.java:12)
            at mergesort.merges(mergesort.java:10)
            at mergesort.merges(mergesort.java:10)
            at mergesort.merges(mergesort.java:10)
            at mergesort.main(mergesort.java:27)

    Read the article

  • Calculating holidays

    - by Ralph Shillington
    A number of holidays move around from year to year. For example, in Canada Victoria day (aka the May two-four weekend) is the Monday before May 25th, or Thanksgiving is the 2nd Monday of October (in Canada). I've been using variations on this Linq query to get the date of a holiday for a given year: var year = 2011; var month = 10; var dow = DayOfWeek.Monday; var instance = 2; var day = (from d in Enumerable.Range(1,DateTime.DaysInMonth(year,month)) let sample = new DateTime(year,month,d) where sample.DayOfWeek == dow select sample).Skip(instance-1).Take(1); While this works, and is easy enough to understand, I can imagine there is a more elegant way of making this calculation versus this brute force approach. Of course this doesn't touch on holidays such as Easter and the many other lunar based dates.
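
    The question uses C#/LINQ; for illustration, here is the same "n-th weekday of a month" idea in Java's java.time, whose TemporalAdjusters can jump straight to the date without enumerating the month. Treat it as a sketch of the approach rather than a drop-in replacement; lunar-based holidays such as Easter still need their own formula.

        import java.time.*;
        import java.time.temporal.TemporalAdjusters;

        public class Holidays {
            // Canadian Thanksgiving: second Monday of October
            static LocalDate thanksgivingCanada(int year) {
                return LocalDate.of(year, Month.OCTOBER, 1)
                        .with(TemporalAdjusters.dayOfWeekInMonth(2, DayOfWeek.MONDAY));
            }

            // Victoria Day: the Monday on or before May 24 (i.e. the Monday preceding May 25)
            static LocalDate victoriaDay(int year) {
                return LocalDate.of(year, Month.MAY, 24)
                        .with(TemporalAdjusters.previousOrSame(DayOfWeek.MONDAY));
            }

            public static void main(String[] args) {
                System.out.println(thanksgivingCanada(2011)); // 2011-10-10
                System.out.println(victoriaDay(2011));        // 2011-05-23
            }
        }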

    Read the article

  • Pointer-based binary heap implementation

    - by Derek Chiang
    Is it even possible to implement a binary heap using pointers rather than an array? I have searched around the internet (including SO) and no answer can be found. The main problem here is that, how do you keep track of the last pointer? When you insert X into the heap, you place X at the last pointer and then bubble it up. Now, where does the last pointer point to? And also, what happens when you want to remove the root? You exchange the root with the last element, and then bubble the new root down. Now, how do you know what's the new "last element" that you need when you remove root again?
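
    A pointer-based heap is workable if you keep a node count: in a complete binary tree the path from the root to the n-th slot is spelled out by the binary representation of n after its leading 1 bit (0 = go left, 1 = go right). That locates both the insertion point for slot count+1 and the current last node for removal. A Java sketch of just the navigation step; the usual bubble-up and bubble-down are omitted.

        public class PointerHeap {
            static class Node { int key; Node left, right, parent; }

            Node root;
            int size;   // number of nodes currently in the heap

            // Slots are numbered 1 (root), 2, 3, ... level by level. The bits of `index`
            // below its leading 1 spell the root-to-slot path (0 = left, 1 = right).
            // This walks all but the last bit and returns the PARENT of that slot.
            Node parentOfSlot(int index) {
                Node n = root;
                for (int bit = Integer.highestOneBit(index) >> 1; bit > 1; bit >>= 1) {
                    n = ((index & bit) == 0) ? n.left : n.right;
                }
                return n;
            }

            // Insert: parentOfSlot(size + 1) is where the new node hangs; the lowest bit of
            //         (size + 1) says whether it becomes the left (0) or right (1) child,
            //         after which it is bubbled up.
            // Remove-root: parentOfSlot(size) plus one left/right step (lowest bit of size)
            //         finds the last node to swap into the root before bubbling down.
        }

    So the "last pointer" never needs to be stored at all; it is recomputed in O(log n) from the size, which the heap has to track anyway.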

    Read the article

  • Finding the maximum weight subsequence of an array of positive integers?

    - by BeeBand
    I'm trying to find the maximum weight subsequence of an array of positive integers - the catch is that no adjacent members are allowed in the final subsequence. The exact same question was asked here, and a recursive solution was given by MarkusQ. He provides an explanation, but can anyone help me understand how he has expanded the function? How does this solution take into consideration non-adjacent members?
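
    The non-recursive way to see it: walk the array keeping two running values, the best total if the current element is skipped and the best total if it is taken; taking an element forces the previous one to have been skipped, which is exactly the no-adjacent rule. A short Java sketch of that dynamic-programming view:

        public class MaxNonAdjacentSum {
            static int maxWeight(int[] a) {
                int skip = 0, take = 0;           // best sums so far without / with the current element
                for (int x : a) {
                    int newTake = skip + x;       // taking x means the previous element was skipped
                    skip = Math.max(skip, take);  // skipping x keeps the better of the two
                    take = newTake;
                }
                return Math.max(skip, take);
            }

            public static void main(String[] args) {
                System.out.println(maxWeight(new int[] {1, 5, 3, 9, 4})); // 14 = 5 + 9
            }
        }

    The recursive solution referred to above computes the same two quantities, just by branching on "take this element or not" instead of carrying them along in a loop.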

    Read the article

  • Revisions: algorithm and data structure

    - by SODA
    Hi, I need ideas for structuring and processing data with revisions. For example, I have a database of objects (e.g. cars). Each object has a number of properties, which can be arbitrary, so there's no set schema to describe these objects. These objects are probably saved as key-value pairs. Now I need to change a property of an object. I don't want to completely rewrite it - I want to be able to go back and see the history of changes to these properties, which is why I want to add a new property and keep the old one (so I guess a timestamp would do the job of telling which property is the latest). At the same time I want to be able to get info about any object in a snap, with only the latest version of each of the properties. Any ideas what would be the best approach? At least please point me in the right direction. Thanks!

    Read the article

  • How can I generate an "unlimited" world?

    - by snowlord
    I would like to create a game with an endless (in reality an extremely large) world in which the player can move about. Whether or not I will ever get around to implementing the game is one matter, but I find the idea interesting and would like some input on how to do it. The point is to have a world where all data is generated randomly on-demand, but in a deterministic way. Currently I focus on a large 2D map from which it should be possible to display any part without knowledge about the surrounding parts. I have implemented a prototype by writing a function that gives a random-looking, but deterministic, integer given the x and y of a pixel on the map (see my recent question about this function). Using this function I populate the map with "random" values, and then I smooth the map using a simple filter based on the surrounding pixels. This makes the map dependent on a few pixels outside its edge, but that's not a big problem. The final result is something that at least looks like a map (especially with a good altitude color map). Given this, one could maybe first generate a coarser map which is used to generate bigger differences in altitude to create mountain ranges and seas. Anyway, that was my idea, but I am sure that there exist ways to do this already and I also believe that given the specification, many of you can come up with better ideas. EDIT: Forgot the link to my question.
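
    The key ingredient the question already identifies is a function that turns (x, y, seed) into a repeatable pseudo-random value with no stored state; smoothing and coarser octave maps are then built on top of it, which is essentially what value/Perlin noise libraries do. A minimal Java sketch of such a coordinate hash (the constants are ordinary bit-mixing values, nothing canonical):

        public class WorldNoise {
            // Deterministic "random" value in [0, 1) for a given cell and seed.
            static double valueAt(long x, long y, long seed) {
                long h = seed;
                h ^= x * 0x9E3779B97F4A7C15L;                 // mix in the coordinates
                h ^= y * 0xC2B2AE3D27D4EB4FL;
                h ^= (h >>> 33); h *= 0xFF51AFD7ED558CCDL;    // finalizer-style avalanche
                h ^= (h >>> 33);
                return (h >>> 11) / (double) (1L << 53);      // top 53 bits -> [0, 1)
            }

            public static void main(String[] args) {
                // The same (x, y, seed) always yields the same value, with no stored map.
                System.out.println(valueAt(1000, -42, 12345L));
                System.out.println(valueAt(1000, -42, 12345L));
            }
        }

    Because the value depends only on the coordinates and the seed, any region of the "unlimited" world can be regenerated on demand without storing it.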

    Read the article

  • Aligning music notes using String matching algorithms or Dynamic Programming

    - by Dolphin
    Hi, I need to compare two sets of musical pieces: a playing (taken in MIDI format, with note details extracted and saved in a database table) against sheet music (taken in XML format). When evaluating the playing against the sheet music (i.e. note details - pitch, duration, rhythm), note alignment needs to be done, to identify missed/extra/incorrect/swapped notes relative to the reference (sheet music) notes. I have about 1800-2500 notes in one piece (it can be even more with polyphonic music; right now I'm doing monophonic). So will I have to read all of these into an array? Will that cause memory overload or stack overflow? There are string matching algorithms like KMP and Boyer-Moore, but note alignment can also be done through dynamic programming. How can I use dynamic programming to approach this? What are the available algorithms? Is it about approximate string matching? Which approach is more productive: string matching algorithms like Boyer-Moore, or dynamic programming? How can I assess which is more effective? I greatly appreciate any insight or suggestions. Thanks in advance
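
    Alignment against a reference with missed/extra/incorrect notes is exactly the approximate-matching (edit distance) setting, not exact matching, so the dynamic-programming route fits better than KMP or Boyer-Moore. At 2500 notes a 2500 x 2500 table of ints is roughly 25 MB, so memory is not a concern. A hedged Java sketch over pitch sequences only (duration and rhythm would just change the per-note cost):

        public class NoteAlignment {
            // Classic edit-distance DP: dp[i][j] is the minimum number of extra, missed or
            // incorrect notes needed to align the first i played notes with the first j
            // reference notes.
            static int alignmentCost(int[] played, int[] reference) {
                int m = played.length, n = reference.length;
                int[][] dp = new int[m + 1][n + 1];
                for (int i = 0; i <= m; i++) dp[i][0] = i;   // i extra played notes
                for (int j = 0; j <= n; j++) dp[0][j] = j;   // j missed reference notes
                for (int i = 1; i <= m; i++) {
                    for (int j = 1; j <= n; j++) {
                        int match = (played[i - 1] == reference[j - 1]) ? 0 : 1; // incorrect note
                        dp[i][j] = Math.min(dp[i - 1][j - 1] + match,
                                   Math.min(dp[i - 1][j] + 1,      // extra played note
                                            dp[i][j - 1] + 1));    // missed reference note
                    }
                }
                return dp[m][n];   // trace back through dp to label each note missed/extra/incorrect
            }
        }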

    Read the article

  • How does Amazon's Statistically Improbable Phrases work?

    - by ??iu
    How does something like Statistically Improbable Phrases work? According to amazon: Amazon.com's Statistically Improbable Phrases, or "SIPs", are the most distinctive phrases in the text of books in the Search Inside!™ program. To identify SIPs, our computers scan the text of all books in the Search Inside! program. If they find a phrase that occurs a large number of times in a particular book relative to all Search Inside! books, that phrase is a SIP in that book. SIPs are not necessarily improbable within a particular book, but they are improbable relative to all books in Search Inside!. For example, most SIPs for a book on taxes are tax related. But because we display SIPs in order of their improbability score, the first SIPs will be on tax topics that this book mentions more often than other tax books. For works of fiction, SIPs tend to be distinctive word combinations that often hint at important plot elements. For instance, for Joel's first book, the SIPs are: leaky abstractions, antialiased text, own dog food, bug count, daily builds, bug database, software schedules One interesting complication is that these are phrases of either 2 or 3 words. This makes things a little more interesting because these phrases can overlap with or contain each other.

    Read the article

  • Interview question: How do I detect a loop in this linked list?

    - by jjujuma
    Say you have a linked list structure in Java. It's made up of Nodes:

        class Node {
            Node next;
            // some user data
        }

    and each Node points to the next node, except for the last Node, which has null for next. Say there is a possibility that the list can contain a loop - i.e. the final Node, instead of having a null, has a reference to one of the nodes in the list which came before it. What's the best way of writing boolean hasLoop(Node first) which would return true if the given Node is the first of a list with a loop, and false otherwise? How could you write it so that it takes a constant amount of space and a reasonable amount of time? Here's a picture of what a list with a loop looks like:

        Node->Node->Node->Node->Node->Node--\
                     \                      |
                      ----------------------
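
    The standard constant-space answer is Floyd's cycle detection ("tortoise and hare"): advance one pointer by one node and another by two; if there is a loop the fast pointer eventually meets the slow one, otherwise it runs off the end. A Java sketch, using a minimal Node like the one in the question:

        public class LoopCheck {
            static class Node { Node next; }   // as in the question, minus the user data

            static boolean hasLoop(Node first) {
                Node slow = first, fast = first;
                while (fast != null && fast.next != null) {
                    slow = slow.next;                // one step
                    fast = fast.next.next;           // two steps
                    if (slow == fast) return true;   // they can only meet inside a cycle
                }
                return false;                        // fast fell off the end: no loop
            }
        }

    It uses two references regardless of list length and terminates in O(n) steps, which satisfies the constant-space, reasonable-time requirement.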

    Read the article

  • Find a repeated number out of 3 boxes

    - by james1
    I have 3 boxes, each box containing 10 pieces of numbered paper (1 - 10), and there is one number that is the same in all 3 boxes, e.g. box1 has number 4, box2 has number 4 and box3 also has number 4. How do I find that repeated number in Java in the most efficient/fastest way possible?
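
    With only ten numbers per box, a plain set intersection is already effectively optimal: mark what the first box contains, keep only what the second box also contains, then report whatever survives the third. A Java sketch with the boxes modelled as int arrays:

        import java.util.*;

        public class CommonNumber {
            static OptionalInt commonNumber(int[] box1, int[] box2, int[] box3) {
                Set<Integer> common = new HashSet<>();
                for (int n : box1) common.add(n);

                Set<Integer> inSecond = new HashSet<>();
                for (int n : box2) inSecond.add(n);
                common.retainAll(inSecond);                 // keep only numbers in boxes 1 and 2

                for (int n : box3)
                    if (common.contains(n)) return OptionalInt.of(n);   // also in box 3
                return OptionalInt.empty();
            }

            public static void main(String[] args) {
                int[] b1 = {1, 4, 7, 9}, b2 = {2, 4, 8, 10}, b3 = {3, 4, 5, 6};
                System.out.println(commonNumber(b1, b2, b3)); // OptionalInt[4]
            }
        }

    This is linear in the total number of papers; for such tiny boxes, three nested loops would honestly be just as fast in practice.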

    Read the article

  • How would you write a program to find the shortest pangram in a list of words?

    - by jonathanasdf
    Given a list of words which contains the letters a-z at least once, how would you write a program to find the shortest pangram counted by number of characters (not counting spaces) as a combination of the words? Since I am not sure whether short answers exist, this is not code golf, but rather just a discussion of how you would approach this. However, if you think you can manage to write a short program that would do this, then go ahead, and this might turn into code golf :)

    Read the article

  • Is there a name for the technique of using base-2 numbers to encode a list of unique options?

    - by Lunatik
    Apologies for the rather vague nature of this question; I've never been taught programming and Google is rather useless to a self-help guy like me in this case, as the key words are pretty ambiguous. I am writing a couple of functions that encode and decode a list of options into a Long so they can easily be passed around the application - you know the kind of thing: 1 - Apple, 2 - Orange, 4 - Banana, 8 - Plum, etc. In this case the number 11 would represent Apple, Orange & Plum. I've got it working, but I see this used all the time, so I assume there is a common name for the technique, and no doubt all sorts of best practice and clever algorithms that are at the moment just out of my reach.
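
    This is usually called a bit field, bit flags or a bitmask: each option gets a distinct power of two, so bitwise OR combines options and bitwise AND tests for them (Java also packages the idea as EnumSet). A small sketch of the raw-bitmask version, reusing the question's fruit example:

        public class FruitFlags {
            static final long APPLE  = 1L << 0;   // 1
            static final long ORANGE = 1L << 1;   // 2
            static final long BANANA = 1L << 2;   // 4
            static final long PLUM   = 1L << 3;   // 8

            static long encode(long... options) {
                long flags = 0;
                for (long o : options) flags |= o;    // OR sets each option's bit
                return flags;
            }

            static boolean has(long flags, long option) {
                return (flags & option) != 0;         // AND tests a single bit
            }

            public static void main(String[] args) {
                long flags = encode(APPLE, ORANGE, PLUM);
                System.out.println(flags);              // 11, as in the question
                System.out.println(has(flags, BANANA)); // false
            }
        }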

    Read the article
