Search Results

Search found 5655 results on 227 pages for 'stl algorithm'.

Page 95/227 | < Previous Page | 91 92 93 94 95 96 97 98 99 100 101 102  | Next Page >

  • Algorithm detect repeating/similiar strings in a corpus of data -- say email subjects, in Python

    - by RizwanK
    I'm downloading a long list of my email subject lines , with the intent of finding email lists that I was a member of years ago, and would want to purge them from my Gmail account (which is getting pretty slow.) I'm specifically thinking of newsletters that often come from the same address, and repeat the product/service/group's name in the subject. I'm aware that I could search/sort by the common occurrence of items from a particular email address (and I intend to), but I'd like to correlate that data with repeating subject lines.... Now, many subject lines would fail a string match, but "Google Friends : Our latest news" "Google Friends : What we're doing today" are more similar to each other than a random subject line, as is: "Virgin Airlines has a great sale today" "Take a flight with Virgin Airlines" So -- how can I start to automagically extract trends/examples of strings that may be more similar. Approaches I've considered and discarded ('because there must be some better way'): Extracting all the possible substrings and ordering them by how often they show up, and manually selecting relevant ones Stripping off the first word or two and then count the occurrence of each sub string Comparing Levenshtein distance between entries Some sort of string similarity index ... Most of these were rejected for massive inefficiency or likelyhood of a vast amount of manual intervention required. I guess I need some sort of fuzzy string matching..? In the end, I can think of kludgy ways of doing this, but I'm looking for something more generic so I've added to my set of tools rather than special casing for this data set. After this, I'd be matching the occurring of particular subject strings with 'From' addresses - I'm not sure if there's a good way of building a data structure that represents how likely/not two messages are part of the 'same email list' or by filtering all my email subjects/from addresses into pools of likely 'related' emails and not -- but that's a problem to solve after this one. Any guidance would be appreciated.

    Read the article

  • Better way to summarize data about stop times?

    - by Vimvq1987
    This question is close to this: http://stackoverflow.com/questions/2947963/find-the-period-of-over-speed Here's my table: Longtitude Latitude Velocity Time 102 401 40 2010-06-01 10:22:34.000 103 403 50 2010-06-01 10:40:00.000 104 405 0 2010-06-01 11:00:03.000 104 405 0 2010-06-01 11:10:05.000 105 406 35 2010-06-01 11:15:30.000 106 403 60 2010-06-01 11:20:00.000 108 404 70 2010-06-01 11:30:05.000 109 405 0 2010-06-01 11:35:00.000 109 405 0 2010-06-01 11:40:00.000 105 407 40 2010-06-01 11:50:00.000 104 406 30 2010-06-01 12:00:00.000 101 409 50 2010-06-01 12:05:30.000 104 405 0 2010-06-01 11:05:30.000 I want to summarize times when vehicle had stopped (velocity = 0), include: it had stopped since "when" to "when" in how much minutes, how many times it stopped and how much time it stopped. I wrote this query to do it: select longtitude, latitude, MIN(time), MAX(time), DATEDIFF(minute, MIN(Time), MAX(time)) as Timespan from table_1 where velocity = 0 group by longtitude,latitude select DATEDIFF(minute, MIN(Time), MAX(time)) as minute into #temp3 from table_1 where velocity = 0 group by longtitude,latitude select COUNT(*) as [number]from #temp select SUM(minute) as [totaltime] from #temp3 drop table #temp This query return: longtitude latitude (No column name) (No column name) Timespan 104 405 2010-06-01 11:00:03.000 2010-06-01 11:10:05.000 10 109 405 2010-06-01 11:35:00.000 2010-06-01 11:40:00.000 5 number 2 totaltime 15 You can see, it works fine, but I really don't like the #temp table. Is there anyway to query this without use a temp table? Thank you.

    Read the article

  • Byte-Pairing for data compression

    - by user1669533
    Question about Byte-Pairing for data compression. If byte pairing converts two byte values to a single byte value, splitting the file in half, then taking a gig file and recusing it 16 times shrinks it to 62,500,000. My question is, is byte-pairing really efficient? Is the creation of a 5,000,000 iteration loop, to be conservative, efficient? I would like some feed back on and some incisive opinions please. Dave, what I read was: "The US patent office no longer grants patents on perpetual motion machines, but has recently granted at least two patents on a mathematically impossible process: compression of truly random data." I was not inferring the Patent Office was actually considering what I am inquiring about. I was merely commenting on the notion of a "mathematically impossible process." If someone has, in some way created a method of having a "single" data byte as a placeholder of 8 individual bytes of data, that would be a consideration for a patent. Now, about the mathematically impossibility of an 8 to 1 compression method, it is not so much a mathematically impossibility, but a series of rules and conditions that can be created. As long as there is the rule of 8 or 16 bit representation of storing data on a medium, there are ways to manipulate data that mirrors current methods, or creation by a new way of thinking.

    Read the article

  • CMYK + CMYK = ? CMYK / 2 = ?

    - by Pete
    Suppose there are two colors defined in CMYK: color1 = 30, 40, 50, 60 color2 = 50, 60, 70, 80 If they were to be printed what values would the resulting color have? color_new = min(cyan1 + cyan2, 100), min(magenta1 + magenta2, 100), min(yellow1 + yellow2, 100), min(black1 + black2, 100)? Suppose there is a color defined in CMYK: color = 40, 30, 30, 100 It is possible to print a color at partial intensity, i.e. as a tint. What values would have a 50% tint of that color? color_new = cyan / 2, magenta / 2, yellow / 2, black / 2? I'm asking this to better understand the "tintTransform" function in PDF Reference 1.7, 4.5.5 Special Color Spaces, DeviceN Color Spaces Update: To better clarify: I'm not entirely concerned with human perception or how the CMYK dyies react to the paper. If someone specifies 90% tint which, when printed, looks like full intensity colorant, that's ok. In other words, if I asking how to compute 50% of cmyk(40, 30, 30, 100) I'm asking how to compute the new values, regardless of whether the result looks half-dark or not. Update 2: I'm confused now. I checked this in InDesign and Acrobat. For example Pantone 3005 has CMYK 100, 34, 0, 2, and its 25% tint has CMYK 25, 8.5, 0, 0.5. Does it mean I can "monkey around in a linear way"?

    Read the article

  • Detecting one point's location compared to two other points.

    - by WizardOfOdds
    Hi all, you can check my profile, this is not homework. I've got an interesting little problem to solve in a very real software and I'm looking for an easy way to solve it. I've got two fixed points on screen (they're fixed, but I don't know beforehand their position) that are not at the same location. These two fixed points form an imaginary line. Now I've got a third point that is "on one side" of that line (it cannot be on the line). The user can grab the point (the user actually grab an object, whose I track by its center, which is the point I'm interested in) and drag it. But it cannot "cross" the imaginary line. What is the easiest way to detect if the user is crossing the imaginary line? Example: a c / / (c cannot be dragged here) / b Or: c b -------------- a (c cannot be dragged here) So what is an easy to detect if c is staying on the correct "side" of the line (I draw segments here, but it really can be thought of as a line). One way to detect this is to take the destination point d and see if segment (c,d) intersects with line (a,b), but isn't there an easier way? Can't I just do some 2D dot-product magic here and have basically a one or two liner solving my issue?

    Read the article

  • How do I detect a loop in this linked list?

    - by jjujuma
    Say you have a linked list structure in Java. It's made up of Nodes: class Node { Node next; // some user data } and each Node points to the next node, except for the last Node, which has null for next. Say there is a possibility that the list can contain a loop - i.e. the final Node, instead of having a null, has a reference to one of the nodes in the list which came before it. What's the best way of writing boolean hasLoop(Node first) which would return true if the given Node is the first of a list with a loop, and false otherwise? How could you write so that it takes a finite amount of space and a reasonable amount of time?

    Read the article

  • Take the intersection of an arbitrary number of lists in python

    - by thepandaatemyface
    Suppose I have a list of lists of elements which are all the same (i'll use ints in this example) [range(100)[::4], range(100)[::3], range(100)[::2], range(100)[::1]] What would be a nice and/or efficient way to take the intersection of these lists (so you would get every element that is in each of the lists)? For the example that would be: [0, 12, 24, 36, 48, 60, 72, 84, 96]

    Read the article

  • Mapping Hilbert values to 3D points

    - by Alexander Gladysh
    I have a set of Hilbert values (length from the start of the Hilbert curve to the given point). What is the best way to convert these values to 3D points? Original Hilbert curve was not in 3D, so I guess I have to pick by myself the Hilbert curve rank I need. I do have total curve length though (that is, the maximum value in the set). Perhaps there is an existing implementation? Some library that would allow me to work with Hilbert curve / values? Language does not matter much.

    Read the article

  • Android Compass orientation on unreliable (Low pass filter)

    - by madsleejensen
    Hi all Im creating an application where i need to position a ImageView depending on the Orientation of the device. I use the values from a MagneticField and Accelerometer Sensors to calculate the device orientation with SensorManager.getRotationMatrix(rotationMatrix, null, accelerometerValues, magneticFieldValues) SensorManager.getOrientation(rotationMatrix, values); double degrees = Math.toDegrees(values[0]); My problem is that the positioning of the ImageView is very sensitive to changes in the orientation. Making the imageview constantly jumping around the screen. (because the degrees change) I read that this can be because my device is close to things that can affect the magneticfield readings. But this is not the only reason it seems. I tried downloading some applications and found that the "3D compass" application remains extremely steady in its readings, i would like the same behavior in my application. I read that i can tweak the "noise" of my readings by adding a "Low pass filter", but i have no idea how to implement this (because of my lack of Math). Im hoping someone can help me creating a more steady reading on my device, Where a little movement to the device wont affect the current orientation. Right now i do a small if (Math.abs(lastReadingDegrees - newReadingDegrees) > 1) { updatePosition() } To filter abit of the noise. But its not working very well :)

    Read the article

  • How to notice unusual news activity

    - by ??iu
    Suppose you were able keep track of the news mentions of different entities, like say "Steve Jobs" and "Steve Ballmer". What are ways that could you tell whether the amount of mentions per entity per a given time period was unusual relative to their normal degree of frequency of appearance? I imagine that for a more popular person like Steve Jobs an increase of like 50% might be unusual (an increase of 1000 to 1500), while for a relatively unknown CEO an increase of 1000% for a given day could be possible (an increase of 2 to 200). If you didn't have a way of scaling that your unusualness index could be dominated by unheard-ofs getting their 15 minutes of fame.

    Read the article

  • Orbital equations, and power required to run them

    - by Adam Davis
    Due to a discussion on the SO IRC today, I'm curious about orbital mechanics, and The equations needed to solve orbital problems The computing power required to solve complex problems The question in particular is calculating when the Earth will plow into the Sun (or vice versa, depending on the frame of reference). I suspect that all the gravitational pulls within our solar system may need to be calculated, which makes me wonder what type of computer cluster is required, or can this be done on a single box? I don't have the experience to do a back of the napkin test here, but perhaps you do? Also, much thx to Gortok for the original inspiration (see comments).

    Read the article

  • Is this a variation of the traveling salesman problem?

    - by Ville Koskinen
    I'm interested in a function of two word lists, which would return an order agnostic edit distance between them. That is, the arguments would be two lists of (let's say space delimited) words and return value would be the minimum sum of the edit (or Levenshtein) distances of the words in the lists. Distance between "cat rat bat" and "rat bat cat" would be 0. Distance between "cat rat bat" and "fat had bad" would be the same as distance between "rat bat cat" and "had fat bad", 4. In the case the number of words in the lists are not the same, the shorter list would be padded with 0-length words. My intuition (which hasn't been nurtured with computer science classes) does not find any other solution than to use brute force: |had|fat|bad| a solution ---+---+---+---+ +---+---+---+ cat| 2 | 1 | 2 | | | 1 | | ---+---+---+---+ +---+---+---+ rat| 2 | 1 | 2 | | 3 | | | ---+---+---+---+ +---+---+---+ bat| 2 | 1 | 1 | | | | 4 | ---+---+---+---+ +---+---+---+ Starting from the first row, pick a column and go to the next rows without ever revisiting a column you have already visited. Do this over and over again until you've tried all combinations. To me this sounds a bit like the traveling salesman problem. Is it, and how would you solve my particular problem?

    Read the article

  • Storing a bucket of numbers in an efficient data structure

    - by BlitzKrieg
    I have a buckets of numbers e.g. - 1 to 4, 5 to 15, 16 to 21, 22 to 34,.... I have roughly 600,000 such buckets. The range of numbers that fall in each of the bucket varies. I need to store these buckets in a suitable data structure so that the lookups for a number is as fast as possible. So my question is what is the suitable data structure and a sorting mechanism for this type of problem. Thanks in advance

    Read the article

  • question about polynomial multiplication

    - by davit-datuashvili
    i know that horners method for polynomial pultiplication is faster but here i dont know what is happening here is code public class horner{ public static final int n=10; public static final int x=7; public static void main(String[] args){ //non fast version int a[]=new int[]{1,2,3,4,5,6,7,8,9,10}; int xi=1; int y=a[0]; for (int i=1;i<n;i++){ xi=x*xi; y=y+a[i]*xi; } System.out.println(y); //fast method int y1=a[n-1]; for (int i=n-2;i>=0;i--){ y1=x*y+a[i]; } System.out.println(y1); } } result of this two methods are not same result of first method is 462945547 and result of second method is -1054348465 please help

    Read the article

  • Find the element with the most "neighbors" in a sequence

    - by Bao An
    Assume that we have a sequence S with n elements <x1,x2,...,xn>. A pair of elements xi,xj are considered neighbors if |xi-xj|<d, with d a given distance between two neighbors. So how can find out the element that has most neighbors in the sequence? (A simply way is sorting the sequence and then calculating number of each element but it's time complexity is quite large): O(nlogn) May you please help me find a better way to reduce time complexity?

    Read the article

  • Generate 2D cross-section polygon from 3D mesh

    - by nornagon
    I'm writing a game which uses 3D models to draw a scene (top-down orthographic projection), but a 2D physics engine to calculate response to collisions, etc. I have a few 3D assets for which I'd like to be able to automatically generate a hitbox by 'slicing' the 3D mesh with the X-Y plane and creating a polygon from the resultant edges. Google is failing me on this one (and not much helpful material on SO either). Suggestions?

    Read the article

  • Calculating holidays

    - by Ralph Shillington
    A number of holidays move around from year to year. For example, in Canada Victoria day (aka the May two-four weekend) is the Monday before May 25th, or Thanksgiving is the 2nd Monday of October (in Canada). I've been using variations on this Linq query to get the date of a holiday for a given year: var year = 2011; var month = 10; var dow = DayOfWeek.Monday; var instance = 2; var day = (from d in Enumerable.Range(1,DateTime.DaysInMonth(year,month)) let sample = new DateTime(year,month,d) where sample.DayOfWeek == dow select sample).Skip(instance-1).Take(1); While this works, and is easy enough to understand, I can imagine there is a more elegant way of making this calculation versus this brute force approach. Of course this doesn't touch on holidays such as Easter and the many other lunar based dates.

    Read the article

  • Graph diffing and versioning tool

    - by hashable
    I am working with a team that edits large DAGs represented as single files. Currently we are unable to work with multiple users concurrently modifying the DAG. Is there a tool (somewhat like the Eclipse SVN plugin) that can do do revision control on the file (manage timestamps/revision stamps) to identify incoming/outgoing/conflicting changes (Node/Link insertion/deletion/modification) and merge changes just like programmers do with source code files? The system should be able to do dependency management also. E.g. an incoming Link must not be accepted when one of the two Nodes is absent. That is, it should not "break" the existing DAG by allowing partial updates. If there is a framework to do this using generic "Node" and "Link" interfaces? Note: I am aware of Protege and its plugins. They currently do not satisfy my requirements.

    Read the article

  • Lexographical sorting problem

    - by Shawn Mclean
    I'm doing a problem that says concatenate the words to generate the lexicographically lowest possible string. from a competition. Take for example this string: jibw ji jp bw jibw The actual output turns out to be: bw jibw jibw ji jp When I do sorting on this, I get: bw ji jibw jibw jp. Does this mean that this is not sorting? If it is sorting, does lexicographic sorting take into consideration pushing the shorter strings to the back or something? I've been doing some reading on lexigographical order and I dont see any point or scenarios on which this is used, do you have any?

    Read the article

  • Single dimension peak fitting

    - by bufferz
    I have a single dimensional array of floating point values (c# doubles FYI) and I need to find the "peak" of the values ... as if graphed. I can't just take the highest value, as the peak is actually a plateau that has small fluctuations. This plateau is in the middle of a bunch of noise. I'm looking find a solution that would give me the center of this plateau. An example array might look like this: 1,2,1,1,2,1,3,2,4,4,4,5,6,8,8,8,8,7,8,7,9,7,5,4,4,3,3,2,2,1,1,1,1,1,2,1,1,1,1 where the peak is somewhere in the bolded section. Any ideas?

    Read the article

  • flood fill algorithm

    - by user335593
    i want to implement the flood fill algorthm...so that when i get the x and y co-od of a point...it should start flooding from that point and fill till it finds a boundary but it is not filling the entire region...say a pentagon this is the code i am using void setpixel(struct fill fillcolor,int x,int y) { glColor3f(fillcolor.r,fillcolor.g,fillcolor.b); glBegin(GL_POINTS); glVertex2i(x,y); glEnd(); glFlush(); } struct fill getpixcol(int x,int y) { struct fill gotpixel; glReadPixels(x,y,1,1,GL_RGB,GL_UNSIGNED_BYTE,pick_col); gotpixel.r =(float) pick_col[0]/255.0; gotpixel.g =(float) pick_col[1]/255.0; gotpixel.b =(float) pick_col[2]/255.0; return(gotpixel); } void floodFill(int x, int y,struct fill fillcolor,struct fill boundarycolor) { struct fill tmp; // if ((x < 0) || (x >= 500)) return; // if ((y < 0) || (y >= 500)) return; tmp=getpixcol(x,y); while (tmp.r!=boundarycolor.r && tmp.g!=boundarycolor.g && tmp.b!=boundarycolor.b) { setpixel(fillcolor,x,y); setpixel(fillcolor,x+1,y); setpixel(fillcolor,x,y+1); setpixel(fillcolor,x,y-1); setpixel(fillcolor,x-1,y); floodFill(x-1,y+1,fillcolor,boundarycolor); floodFill(x-1,y,fillcolor,boundarycolor); floodFill(x-1,y-1,fillcolor,boundarycolor); floodFill(x,y+1,fillcolor,boundarycolor); floodFill(x,y-1,fillcolor,boundarycolor); floodFill(x+1,y+1,fillcolor,boundarycolor); floodFill(x+1,y,fillcolor,boundarycolor); floodFill(x+1,y-1,fillcolor,boundarycolor); } }

    Read the article

  • Point data structure for a sketching application

    - by bebraw
    I am currently developing a little sketching application based on HTML5 Canvas element. There is one particular problem I haven't yet managed to find a proper solution for. The idea is that the user will be able to manipulate existing stroke data (points) quite freely. This includes pushing point data around (ie. magnet tool) and manipulating it at whim otherwise (ie. altering color). Note that the current brush engine is able to shade by taking existing stroke data in count. It's a quick and dirty solution as it just iterates the points in the current stroke and checks them against a distance rule. Now the problem is how to do this in a nice manner. It is extremely important to be able to perform efficient queries that return all points within given canvas coordinate and radius. Other features, such as space usage, should be secondary to this. I don't mind doing some extra processing between strokes while the user is not painting. Any pointers are welcome. :)

    Read the article

< Previous Page | 91 92 93 94 95 96 97 98 99 100 101 102  | Next Page >