memory efficient - Page 509

Random Pairings that don't Repeat

- by Andrew Robinson

This little project / problem came out of left field for me. Hoping someone can help me here. I have some rough ideas but I am sure (or at least I hope) a simple, fairly efficient solution exists. Thanks in advance.... pseudo code is fine. I generally work in .NET / C# if that sheds any light on your solution. Given: A pool of n individuals that will be meeting on a regular basis. I need to form pairs that have not previously meet. The pool of individuals will slowly change over time. For the purposes of pairing, (A & B) and (B & A) constitute the same pair. The history of previous pairings is maintained. For the purpose of the problem, assume an even number of individuals. For each meeting (collection of pairs) and individual will only pair up once. Is there an algorithm that will allow us to form these pairs? Ideally something better than just ordering the pairs in a random order, generating pairings and then checking against the history of previous pairings. In general, randomness within the pairing is ok.

Read the article

efficiently determining if a polynomial has a root in the interval [0,T]

- by user168715

I have polynomials of nontrivial degree (4+) and need to robustly and efficiently determine whether or not they have a root in the interval [0,T]. The precise location or number of roots don't concern me, I just need to know if there is at least one. Right now I'm using interval arithmetic as a quick check to see if I can prove that no roots can exist. If I can't, I'm using Jenkins-Traub to solve for all of the polynomial roots. This is obviously inefficient since it's checking for all real roots and finding their exact positions, information I don't end up needing. Is there a standard algorithm I should be using? If not, are there any other efficient checks I could do before doing a full Jenkins-Traub solve for all roots? For example, one optimization I could do is to check if my polynomial f(t) has the same sign at 0 and T. If not, there is obviously a root in the interval. If so, I can solve for the roots of f'(t) and evaluate f at all roots of f' in the interval [0,T]. f(t) has no root in that interval if and only if all of these evaluations have the same sign as f(0) and f(T). This reduces the degree of the polynomial I have to root-find by one. Not a huge optimization, but perhaps better than nothing.

Read the article

How toget a list of "fastest miles" from a set of GPS Points

- by santiagobasulto

I'm trying to solve a weird problem. Maybe you guys know of some algorithm that takes care of this. I have data for a cargo freight truck and want to extract some data. Suppose I've got a list of sorted points that I get from the GPS. That's the route for that truck: [ { "lng": "-111.5373066", "lat": "40.7231711", "time": "1970-01-01T00:00:04Z", "elev": "1942.1789265256325" }, { "lng": "-111.5372056", "lat": "40.7228762", "time": "1970-01-01T00:00:07Z", "elev": "1942.109892409177" } ] Now, what I want to get is a list of the "fastest miles". I'll do an example: Given the points: A, B, C, D, E, F the distance from point A to point B is 1 mile, and the cargo took 10:32 minutes. From point B to point D i've got other mile, and the cargo took 10 minutes, etc. So, i need a list sorted by time. Similar to: B -> D: 10 A -> B: 10:32 D -> F: 11:02 Do you know any efficient algorithm that let me calculate that? Thank you all. PS: I'm using Python. EDIT: I've got the distance. I know how to calculate it and there are plenty of posts to do that. What I need is an algorithm to tokenize by mile and get speed from that. Having a distance function is not helpful enough: results = {} for point in points: aux_points = points.takeWhile(point>n) #This doesn't exist, just trying to be simple for aux_point in aux_points: d = distance(point, aux_point) if d == 1_MILE: time_elapsed = time(point, aux_point) results[time_elapsed] = (point, aux_point) I'm still doing some pretty inefficient calculations.

Read the article

What is a data structure for quickly finding non-empty intersections of a list of sets?

- by Andrey Fedorov

I have a set of N items, which are sets of integers, let's assume it's ordered and call it I[1..N]. Given a candidate set, I need to find the subset of I which have non-empty intersections with the candidate. So, for example, if: I = [{1,2}, {2,3}, {4,5}] I'm looking to define valid_items(items, candidate), such that: valid_items(I, {1}) == {1} valid_items(I, {2}) == {1, 2} valid_items(I, {3,4}) == {2, 3} I'm trying to optimize for one given set I and a variable candidate sets. Currently I am doing this by caching items_containing[n] = {the sets which contain n}. In the above example, that would be: items_containing = [{}, {1}, {1,2}, {2}, {3}, {3}] That is, 0 is contained in no items, 1 is contained in item 1, 2 is contained in itmes 1 and 2, 2 is contained in item 2, 3 is contained in item 2, and 4 and 5 are contained in item 3. That way, I can define valid_items(I, candidate) = union(items_containing[n] for n in candidate). Is there any more efficient data structure (of a reasonable size) for caching the result of this union? The obvious example of space 2^N is not acceptable, but N or N*log(N) would be.

Read the article

Alternative to Galileo GWS

- by Anton Gogolev

Please note that this Galileo is absolutely not related to Java. Galileo is basically a set of web services which can be used to book airline tickets. Originally, it was supposed to be used via Galileo Desktop, whereby operators would enter various commands to perform required operations. For example, SA*AZ610J20JULFCOJFK will "Display seat availability map for specified flight and class". Granted, humans can get used to it and be very efficient, but here comes a problem of integrating this with other systems. For that, folks at TravelPort basically slapped a SOAP interface to this system (which must have been written in COBOL or something), without even thinking about actually embracing XML. For example, it can contain <Ind1>N</Ind1> <Ind2>N</Ind2> <Ind3>N</Ind3> ... <Ind72>N</Ind72>  or, better yet <Text>P/RU/4xxx24528/RU/11MAY67/M/23DEC12/AxxxxxxV/MxxxM</Text> In light of this, my question is as follows: are there any sane airline tickets booking systems we can integrate with? Or are there companies which have products that can abstract away all this?

Read the article

Best way to randomly select columns from random rows of SQL results.

- by LesterDove

A search of SO yields many results describing how to select random rows of data from a database table. My requirement is a bit different, though, in that I'd like to select individual columns from across random rows in the most efficient/random/interesting way possible. To better illustrate: I have a large Customers table, and from that I'd like to generate a bunch of fictitious demo Customer records that aren't real people. I'm thinking of just querying randomly from the Customers table, and then randomly pairing FirstNames with LastNames, Address, City, State, etc. So if this is my real Customer data (simplified): FirstName LastName State ========================== Sally Simpson SD Will Warren WI Mike Malone MN Kelly Kline KS Then I'd generate several records that look like this: FirstName LastName State ========================== Sally Warren MN Kelly Malone SD Etc. My initial approach works, but it lacks the elegance that I'm hoping the final answer will provide. (I'm particularly unhappy with the repetitiveness of the subqueries, and the fact that this solution requires a known/fixed number of fields and therefore isn't reusable.) SELECT FirstName = (SELECT TOP 1 FirstName FROM Customer ORDER BY newid()), LastName= (SELECT TOP 1 LastNameFROM Customer ORDER BY newid()), State = (SELECT TOP 1 State FROM Customer ORDER BY newid()) Thanks!

Read the article

How to avoid multiple, unused has_many associations when using multiple models for the same entity (

- by mikep

Hello, I'm looking for a nice, Ruby/Rails-esque solution for something. I'm trying to split up some data using multiple tables, rather than just using one gigantic table. My reasoning is pretty much to try and avoid the performance drop that would come with having a big table. So, rather than have one table called books, I have multiple tables: books1, books2, books3, etc. (I know that I could use a partition, but, for now, I've decided to go the 'multiple tables' route.) Each user has their books placed into a specific table. The actual book table is chosen when the user is created, and all of their books go into the same table. The goal is to try and keep each table pretty much even -- but that's a different issue. One thing I don't particularly want to have is a bunch of unused associations in the User class. Right now, it looks like I'd have to do the following: class User < ActiveRecord::Base has_many :books1, :books2, :books3, :books4, :books5 end class Books1 < ActiveRecord::Base belongs_to :user end class Books2 < ActiveRecord::Base belongs_to :user end First off, for each specific user, only one of the book tables would be usable/applicable, since all of a user's books are stored in the same table. So, only one of the associations would be in use at any time and any other has_many :bookX association that was loaded would be a waste. I don't really know Ruby/Rails does internally with all of those has_many associations though, so maybe it's not so bad. But right now I'm thinking that it's really wasteful, and that there may just be a better, more efficient way of doing this. Is there's some sort of special Ruby/Rails methodology that could be applied here to avoid having to have all of those has_many associations? Also, does anyone have any advice on how to abstract the fact that there's multiple book tables behind a single books model/class?

Read the article

C# String.Replace with a start/index (Added my (slow) implementation)

- by Chris T

I'd like an efficient method that would work something like this EDIT: Sorry I didn't put what I'd tried before. I updated the example now. // Method signature, Only replaces first instance or how many are specified in max public int MyReplace(ref string source,string org, string replace, int start, int max) { int ret = 0; int len = replace.Length; int olen = org.Length; for(int i = 0; i < max; i++) { // Find the next instance of the search string int x = source.IndexOf(org, ret + olen); if(x > ret) ret = x; else break; // Insert the replacement source = source.Insert(x, replace); // And remove the original source = source.Remove(x + len, olen); // removes original string } return ret; } string source = "The cat can fly but only if he is the cat in the hat"; int i = MyReplace(ref source,"cat", "giraffe", 8, 1); // Results in the string "The cat can fly but only if he is the giraffe in the hat" // i contains the index of the first letter of "giraffe" in the new string The only reason I'm asking is because my implementation I'd imagine getting slow with 1,000s of replaces.

Read the article

Automatic tracking algorithm

- by nico

Hi everyone, I'm trying to write a simple tracking routine to track some points on a movie. Essentially I have a series of 100-frames-long movies, showing some bright spots on dark background. I have ~100-150 spots per frame, and they move over the course of the movie. I would like to track them, so I'm looking for some efficient (but possibly not overkilling to implement) routine to do that. A few more infos: the spots are a few (es. 5x5) pixels in size the movement are not big. A spot generally does not move more than 5-10 pixels from its original position. The movements are generally smooth. the "shape" of these spots is generally fixed, they don't grow or shrink BUT they become less bright as the movie progresses. the spots don't move in a particular direction. They can move right and then left and then right again the user will select a region around each spot and then this region will be tracked, so I do not need to automatically find the points. As the videos are b/w, I though I should rely on brigthness. For instance I thought I could move around the region and calculate the correlation of the region's area in the previous frame with that in the various positions in the next frame. I understand that this is a quite naïve solution, but do you think it may work? Does anyone know specific algorithms that do this? It doesn't need to be superfast, as long as it is accurate I'm happy. Thank you nico

Read the article

Reversing permutation of an array in Java efficiently

- by HansDampf

Okay, here is my problem: Im implementing an algorithm in Java and part of it will be following: The Question is to how to do what I will explain now in an efficient way. given: array a of length n integer array perm, which is a permutation of [1..n] now I want to shuffle the array a, using the order determined by array perm, i.e. a=[a,b,c,d], perm=[2,3,4,1] ------ shuffledA[b,c,d,a], I figured out I can do that by iterating over the array with: shuffledA[i]=a[perm[i-1]], (-1 because the permutation indexes in perm start with 1 not 0) Now I want to do some operations on shuffledA... And now I want to do the reverse the shuffle operation. This is where I am not sure how to do it. Note that a can hold an item more than once, i.e. a=[a,a,a,a] If that was not the case, I could iterate perm, and find the corresponding indexes to the values. Now I thought that using a Hashmap instead of the the perm array will help. But I am not sure if this is the best way to do.

Read the article

Finding Common Phrases in MS SQL TEXT Column

- by regex

Hello All, Short Desc: I'm curious to see if I can use SQL Analysis services or some other MS SQL service to mine some data for me that will show commonalities between SQL TEXT fields in a dataset. Long Desc I am looking at a subset of data that consists of about 10,000 rows of TEXT blobs which are used as a notes column in a issue tracking (ticketing) software. I would like to use something out of the box (without having to build something) that might be able to parse through all of the rows and find commonly used byte sequences in the "Notes" column. In other words, I want to find commonly used phrases (two to three word phrases, so 9 - 20 character sections of the TEXT blob). This will help me better determine if associate's notes contain similar phrases (troubleshooting techniques) that we could standardize in our troubleshooting process flow. Closing Note I'd really rather not build an application to do this as my method will probably not be the most efficient way to do it. Hopefully all this makes sense. Please let me know in the comments if anything needs clarification. Thanks in advance for your help.

Read the article

Enumerate all paths in a weighted graph from A to B where path length is between C1 and C2

- by awmross

Given two points A and B in a weighted graph, find all paths from A to B where the length of the path is between C1 and C2. Ideally, each vertex should only be visited once, although this is not a hard requirement. I supose I could use a heuristic to sort the results of the algorithm to weed out "silly" paths (e.g. a path that just visits the same two nodes over and over again) I can think of simple brute force algorithms, but are there any more sophisticed algorithms that will make this more efficient? I can imagine as the graph grows this could become expensive. In the application I am developing, A & B are actually the same point (i.e. the path must return to the start), if that makes any difference. Note that this is an engineering problem, not a computer science problem, so I can use an algorithm that is fast but not necessarily 100% accurate. i.e. it is ok if it returns most of the possible paths, or if most of the paths returned are within the given length range.

Read the article

mysql subselect alternative

- by Arnold

Hi, Lets say I am analyzing how high school sports records affect school attendance. So I have a table in which each row corresponds to a high school basketball game. Each game has an away team id and a home team id (FK to another "team table") and a home score and an away score and a date. I am writing a query that matches attendance with this seasons basketball games. My sample output will be (#_students_missed_class, day_of_game, home_team, away_team, home_team_wins_this_season, away_team_wins_this_season) I now want to add how each team did the previous season to my analysis. Well, I have their previous season stored in the game table but i should be able to accomplish that with a subselect. So in my main select statement I add the subselect: SELECT COUNT(*) FROM game_table WHERE game_table.date BETWEEN 'start of previous season' AND 'end of previous season' AND ( (game_table.home_team = team_table.id AND game_table.home_score > game_table.away_score) OR (game_table.away_team = team_table.id AND game_table.away_score > game_table.home_score)) In this case team-table.id refers to the id of the home_team so I now have all their wins calculated from the previous year. This method of calculation is neither time nor resource intensive. The Explain SQL shows that I have ALL in the Type field and I am not using a Key and the query times out. I'm not sure how I can accomplish a more efficient query with a subselect. It seems proposterously inefficient to have to write 4 of these queries (for home wins, home losses, away wins, away losses). I am sure this could be more lucid. I'll absolutely add color tomorrow if anyone has questions

Read the article

Best way to randomly select rows per column in SQL Server

- by LesterDove

A search of SO yields many results describing how to select random rows of data from a database table. My requirement is a bit different, though, in that I'd like to select individual columns from across random rows in the most efficient/random/interesting way possible. To better illustrate: I have a large Customers table, and from that I'd like to generate a bunch of fictitious demo Customer records that aren't real people. I'm thinking of just querying randomly from the Customers table, and then randomly pairing FirstNames with LastNames, Address, City, State, etc. So if this is my real Customer data (simplified): FirstName LastName State ========================== Sally Simpson SD Will Warren WI Mike Malone MN Kelly Kline KS Then I'd generate several records that look like this: FirstName LastName State ========================== Sally Warren MN Kelly Malone SD Etc. My initial approach works, but it lacks the elegance that I'm hoping the final answer will provide. (I'm particularly unhappy with the repetitiveness of the subqueries, and the fact that this solution requires a known/fixed number of fields and therefore isn't reusable.) SELECT FirstName = (SELECT TOP 1 FirstName FROM Customer ORDER BY newid()), LastName= (SELECT TOP 1 LastNameFROM Customer ORDER BY newid()), State = (SELECT TOP 1 State FROM Customer ORDER BY newid()) Thanks!

Read the article

Running an existing LINQ query against a dynamic object (DataTable like)

- by TomTom

Hello, I am working on a generic OData provider to go against a custom data provider that we have here. Thsi is fully dynamic in that I query the data provider for the table it knows. I have a basic storage structure in place so far based on the OData sample code. My problem is: OData supports queries and expects me to hand in an IQueryable implementation. On the lowe rside, I dont have any query support. Not a joke - the provider returns tables and the WHERE clause is not supported. Performance is not an issue here - the tables are small. It is ok to sort them in the OData provider. My main problem is this. I submit a SQL statement to get out the data of a table. The result is some sort of ADO.NET data reader here. I need to expose an IQueryable implementation for this data to potentially allow later filtering. Any ide ahow to best touch that? .NET 3.5 only (no 4.0 planned for some time). I was seriously thinking of creating dynamic DTO classes for every table (emitting bytecode) so I can use standard LINQ. Right now I am using a dictionary per entry (not too efficient) but I see no real way to filter / sort based on them.

Read the article

Coding the Python way

- by Aaron Moodie

I've just spent the last half semester at Uni learning python. I've really enjoyed it, and was hoping for a few tips on how to write more 'pythonic' code. This is the __init__ class from a recent assignment I did. At the time I wrote it, I was trying to work out how I could re-write this using lambdas, or in a neater, more efficient way, but ran out of time. def __init__(self, dir): def _read_files(_, dir, files): for file in files: if file == "classes.txt": class_list = readtable(dir+"/"+file) for item in class_list: Enrol.class_info_dict[item[0]] = item[1:] if item[1] in Enrol.classes_dict: Enrol.classes_dict[item[1]].append(item[0]) else: Enrol.classes_dict[item[1]] = [item[0]] elif file == "subjects.txt": subject_list = readtable(dir+"/"+file) for item in subject_list: Enrol.subjects_dict[item[0]] = item[1] elif file == "venues.txt": venue_list = readtable(dir+"/"+file) for item in venue_list: Enrol.venues_dict[item[0]] = item[1:] elif file.endswith('.roll'): roll_list = readlines(dir+"/"+file) file = os.path.splitext(file)[0] Enrol.class_roll_dict[file] = roll_list for item in roll_list: if item in Enrol.enrolled_dict: Enrol.enrolled_dict[item].append(file) else: Enrol.enrolled_dict[item] = [file] try: os.path.walk(dir, _read_files, None) except: print "There was a problem reading the directory" As you can see, it's a little bulky. If anyone has the time or inclination, I'd really appreciate a few tips on some python best-practices. Thanks.

Read the article

PHP templating challenge (optimizing front-end templates)

- by Matt

Hey all, I'm trying to do some templating optimizations and I'm wondering if it is possible to do something like this: function table_with_lowercase($data) { $out = '<table>'; for ($i=0; $i < 3; $i++) { $out .= '<tr><td>'; $out .= strtolower($data); $out .= '</td></tr>'; } $out .= "</table>"; return $out; } NOTE: You do not know what $data is when you run this function. Results in: <table> <tr><td><?php echo strtolower($data) ?></td></tr> <tr><td><?php echo strtolower($data) ?></td></tr> <tr><td><?php echo strtolower($data) ?></td></tr> </table> General Case: Anything that can be evaluated (compiled) will be. Any time there is an unknown variable, the variable and the functions enclosing it, will be output in a string format. Here's one more example: function capitalize($str) { return ucwords(strtolower($str)); } If $str is "HI ALL" then the output is: Hi All If $str is unknown then the output is: <?php echo ucwords(strtolower($str)); ?> In this case it would be easier to just call the function (ie. <?php echo capitalize($str) ?> ), but the example before would allow you to precompile your PHP to make it more efficient

Read the article

PHP templating challenge (optimizing front-end templates)

- by Matt

Hey all, I'm trying to do some templating optimizations and I'm wondering if it is possible to do something like this: function table_with_lowercase($data) { $out = '<table>'; for ($i=0; $i < 3; $i++) { $out .= '<tr><td>'; $out .= strtolower($data); $out .= '</td></tr>'; } $out .= "</table>"; return $out; } NOTE: You do not know what $data is when you run this function. Results in: <table> <tr><td><?php echo strtolower($data) ?></td></tr> <tr><td><?php echo strtolower($data) ?></td></tr> <tr><td><?php echo strtolower($data) ?></td></tr> </table> General Case: Anything that can be evaluated (compiled) will be. Any time there is an unknown variable, the variable and the functions enclosing it, will be output in a string format. Here's one more example: function capitalize($str) { return ucwords(strtolower($str)); } If $str is "HI ALL" then the output is: Hi All If $str is unknown then the output is: <?php echo ucwords(strtolower($str)); ?> In this case it would be easier to just call the function (ie. <?php echo capitalize($str) ?> ), but the example before would allow you to precompile your PHP to make it more efficient

Read the article

Implementing Model-level caching

- by Byron

I was posting some comments in a related question about MVC caching and some questions about actual implementation came up. How does one implement a Model-level cache that works transparently without the developer needing to manually cache, yet still remains efficient? I would keep my caching responsibilities firmly within the model. It is none of the controller's or view's business where the model is getting data. All they care about is that when data is requested, data is provided - this is how the MVC paradigm is supposed to work. (Source: Post by Jarrod) The reason I am skeptical is because caching should usually not be done unless there is a real need, and shouldn't be done for things like search results. So somehow the Model itself has to know whether or not the SELECT statement being issued to it worthy of being cached. Wouldn't the Model have to be astronomically smart, and/or store statistics of what is being most often queried over a long period of time in order to accurately make a decision? And wouldn't the overhead of all this make the caching useless anyway? Also, how would you uniquely identify a query from another query (or more accurately, a resultset from another resultset)? What about if you're using prepared statements, with only the parameters changing according to user input? Another poster said this: I would suggest using the md5 hash of your query combined with a serialized version of your input arguments. This would require twice the number of serialization options. I was under the impression that serialization was quite expensive, and for large inputs this might be even worse than just re-querying. And is the minuscule chance of collision worth worrying about? Conceptually, caching in the Model seems like a good idea to me, but it seems in practicality the developer should have direct control over caching and write it into the controller. Thoughts/ideas? Edit: I'm using PHP and MySQL if that helps to narrow your focus.

Read the article

Passing data between ViewControllers versus doing local Fetch in each VC

- by Tofrizer

Hi All, I'm developing an iPhone app using Core Data and I'm looking for some general advice and recommendations on whether its acceptable to pass data between ViewControllers versus doing a local fetch in each ViewController as you navigate to it. Ordinarily I would say it all depends on various factors (e.g. performance etc) but the passing data approach is so prevalent in my app and I'm spooked by all the stories about Apple rejecting apps because of not conforming to their standard guidelines. So let me put another way -- is it non-standard to pass data between VC's? The reason I pass data so much is because each ViewController is just another view on to data present in my object model / graph. Once I have a handle on my first object in the first view controller (which I of course do have to fetch), I can use the existing object composition / relationships to drill down into the next level of detail into data and so I just pass these objects to the next VC. Separately, one possible downside with this passing-data-to-each-VC approach is I don't benefit from (what I perceive to be) the optimisation/benefits that NSFetchedResultsController provides in terms of efficient memory usage and section handling. My app is read-only but I do have one table with 5000 rows and I'm curious if I am missing out on NSFetchedResultsController benefits. Any thoughts on this as well? Can I somehow still benefit from NSFetchedResultsController goodness without having to do a full fetch (as I would have already passed in the data from my previous VC)? Thanks a lot.

Read the article

Java: split a List into two sub-Lists?

- by Chris Conway

What's the simplest, most standard, and/or most efficient way to split a List into two sub-Lists in Java? It's OK to mutate the original List, so no copying should be necessary. The method signature could be /** Split a list into two sublists. The original list will be modified to * have size i and will contain exactly the same elements at indices 0 * through i-1 as it had originally; the returned list will have size * len-i (where len is the size of the original list before the call) * and will have the same elements at indices 0 through len-(i+1) as * the original list had at indices i through len-1. */ <T> List<T> split(List<T> list, int i); [EDIT] List.subList returns a view on the original list, which becomes invalid if the original is modified. So split can't use subList unless it also dispenses with the original reference (or, as in Marc Novakowski's answer, uses subList but immediately copies the result).

Read the article

Help making userscript work in chrome

- by Vishal Shah

I've written a userscript for Gmail Pimp.my.Gmail & i'd like it to be compatible with Google Chrome too. Now i have tried a couple of things, to the best of my Javascript knowledge (which is very weak) & have been successful up-to a certain extent, though im not sure if it's the right way. Here's what i tried, to make it work in Chrome: The very first thing i found is that contentWindow.document doesn't work in chrome, so i tried contentDocument, which works. BUT i noticed one thing, checking the console messages in Firefox and Chrome, i saw that the script gets executed multiple times in Firefox whereas in Chrome it just executes once! So i had to abandon the window.addEventListener('load', init, false); line and replace it with window.setTimeout(init, 5000); and i'm not sure if this is a good idea. The other thing i tried is keeping the window.addEventListener('load', init, false); line and using window.setTimeout(init, 1000); inside init() in case the canvasframe is not found. So please do lemme know what would be the best way to make this script cross-browser compatible. Oh and im all ears for making this script better/efficient code wise (which is sure there is)

Read the article

php parsing speed optimization

- by Arnaud

I would like to add tooltip or generate link according to the element available in the database, for exemple if the html page printed is: to reboot your linux host in single-user mode you can ... I will use explode(" ", $row[page]) and the idea is now to lookup for every single word in the page to find out if they have a related referance in this exemple let's say i've got a table referance an one entry for reboot and one for linux reboot: restart a computeur linux: operating system now my output will look like (replaced < and by @) to @a href="ref/reboot"@reboot@/a@ your @a href="ref/linux"@linux@/a@ host in single-user mode you can ... Instead of have a static list generated when I saved the content, if I add more keyword in the future, then the text will become more interactive. My main concerne and question is how can I create a efficient enough process to do it ? Should I store all the db entry in an array and compare them ? Do an sql query for each word (seems to be crazy) Dump the table in a file and use a very long regex or a "grep -f pattern data" way of doing it? Or or or or I'm sure it must be a better way of doing it, just don't have a clue about it, or maybe this will be far too resource un-friendly and I should avoid doing such things. Cheers!

Read the article

Best way to detect similar email addresses?

- by Chris

I have a list of ~20,000 email addresses, some of which I know to be fraudulent attempts to get around a "1 per e-mail" limit. ([email protected], [email protected], [email protected], etc...). I want to find similar email addresses for evaluation. Currently I'm using a levenshtein algorithm to check each e-mail against the others in the list and report any with an edit distance of less than 2. However, this is painstakingly slow. Is there a more efficient approach? The test code I'm using now is: using System; using System.Collections.Generic; using System.Linq; using System.Text; using System.IO; using System.Threading; namespace LevenshteinAnalyzer { class Program { const string INPUT_FILE = @"C:\Input.txt"; const string OUTPUT_FILE = @"C:\Output.txt"; static void Main(string[] args) { var inputWords = File.ReadAllLines(INPUT_FILE); var outputWords = new SortedSet<string>(); for (var i = 0; i < inputWords.Length; i++) { if (i % 100 == 0) Console.WriteLine("Processing record #" + i); var word1 = inputWords[i].ToLower(); for (var n = i + 1; n < inputWords.Length; n++) { if (i == n) continue; var word2 = inputWords[n].ToLower(); if (word1 == word2) continue; if (outputWords.Contains(word1)) continue; if (outputWords.Contains(word2)) continue; var distance = LevenshteinAlgorithm.Compute(word1, word2); if (distance <= 2) { outputWords.Add(word1); outputWords.Add(word2); } } } File.WriteAllLines(OUTPUT_FILE, outputWords.ToArray()); Console.WriteLine("Found {0} words", outputWords.Count); } } }

Read the article

How to do this Python / MySQL manipulation (match) more efficiently?

- by NJTechie

Following is my data : Company Table : ID Company Address City State Zip Phone 1 ABC 123 Oak St Philly PA 17542 7329878901 2 CDE 111 Joe St Newark NJ 08654 3 GHI 211 Foe St Brick NJ 07740 7321178901 4 JAK 777 Wall Ocean NJ 07764 7322278901 5 KLE 87 Ilk St Plains NY 07654 7376578901 6 AB 1 W.House SField PA 87656 7329878901 Branch Office Table : ID Address City State Zip Phone 1 323 Alk St Philly PA 17542 7329832221 1 171 Joe St Newark NJ 08654 3 287 Foe St Brick NJ 07740 7321178901 3 700 Wall Ocean NJ 07764 7322278901 1 89 Blk St Surrey NY 07154 7376222901 File to be Matched (In MySQL): ID Company Address City State Zip Phone 1 ABC 123 Oak St Philly PA 17542 7329878901 2 AB 171 Joe St Newark NJ 08654 3 GHI 211 Foe St Brick NJ 07740 7321178901 4 JAK 777 Wall Ocean NJ 07764 7322278901 5 K 87 Ilk St Plains NY 07654 7376578901 Resulting File : ID Company Address City State Zip Phone appendedID 1 ABC 123 Oak St Philly PA 17542 7329878901 [Original record, field always empty] 1 ABC 171 Joe St Newark NJ 08654 1 [Company Table] 1 ABC 323 Alk St Philly PA 17542 7329832221 1 [Branch Office Table] 1 AB 1 W.House SField PA 87656 7329878901 6 [Partial firm and State, Zip match] 2 CDE 111 Joe St Newark NJ 08654 3 GHI 211 Foe St Brick NJ 07740 7321178901 3 GHI 700 Wall Ocean NJ 07764 7322278901 3 3 GHI 287 Foe St Brick NJ 07740 7321178901 3 4 JAK 777 Wall Ocean NJ 07764 7322278901 5 KLE 87 Ilk St Surrey NY 07654 7376578901 5 KLE 89 Blk St Surrey NY 07154 7376222901 5 Requirement : 1) I have to match each firm on the 'File to be Matched' to that of Company and Branch Office tables (MySQL). 2) If there are multiple exact/partial matches, then the ID from Company, Branch Office table is inserted as a new row in the resulting file. 3) Not all the firms will be matched perfectly, in that case I have to match on partial Company names (like 5/8th of the company name) and any of the address fields and insert them in the resulting file. Please help me out in the most efficient solution for this problem.

Search Results

Search found 15637 results on 626 pages for 'memory efficient'.

Page 509/626 | < Previous Page | 505 506 507 508 509 510 511 512 513 514 515 516 | Next Page >

- by Andrew Robinson

- by user168715

- by santiagobasulto

- by Andrey Fedorov

- by Anton Gogolev

- by LesterDove

- by mikep

- by Chris T

- by nico

- by HansDampf

- by regex

- by awmross

- by Arnold

- by LesterDove

- by TomTom

- by Aaron Moodie

- by Matt

- by Matt

- by Byron

- by Tofrizer

- by Chris Conway

- by Vishal Shah

- by Arnaud

- by Chris

- by NJTechie

< Previous Page | 505 506 507 508 509 510 511 512 513 514 515 516 | Next Page >