faster - Page 159 - Developer IT

How to check user input for correct formatting

- by Arcadian

This is what I've come up with so far:

    private void CheckFormatting()
    {
        StringReader objReaderf = new StringReader(txtInput.Text);
        List<String> formatTextList = new List<String>();

        do
        {
            formatTextList.Add(objReaderf.ReadLine());
        } while (objReaderf.Peek() != -1);

        objReaderf.Close();

        for (int i = 0; i < formatTextList.Count; i++)
        {
        }
    }

It is designed to check that the user has entered their information in this format:

    Gxx:xx:xx:xx JGxx

where "x" can be any integer. As you can see, the user enters their data into a multi-line textbox. I then take that data and add it to a list. The next part is where I'm stuck. I created a for loop to go through the list line by line, but I guess I will also need to go through each line character by character. How do I do this? Or is there a faster way of doing it? Thanks in advance.
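One common alternative to a character-by-character loop is a regular expression. The sketch below is in Python rather than the post's C#, and it assumes that each "x" stands for exactly one decimal digit; the names `LINE_FORMAT` and `invalid_lines` are made up for illustration.

```python
import re

# Assumption: each "x" is exactly one decimal digit, e.g. "G12:34:56:78 JG90".
LINE_FORMAT = re.compile(r"G[0-9]{2}:[0-9]{2}:[0-9]{2}:[0-9]{2} JG[0-9]{2}")


def invalid_lines(text):
    """Return (line_number, line) pairs for lines that do not match the format."""
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        # fullmatch requires the whole (stripped) line to fit the pattern.
        if LINE_FORMAT.fullmatch(line.strip()) is None:
            bad.append((number, line))
    return bad
```

Calling `invalid_lines` on the textbox contents would give the lines to report back to the user; an empty list means every line fits the pattern.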

Optimize Duplicate Detection

- by Dave Jarvis

Background

This is an optimization problem. Oracle Forms XML files have elements such as:

    <Trigger TriggerName="name" TriggerText="SELECT * FROM DUAL" ... />

where the TriggerText is arbitrary SQL code. Each SQL statement has been extracted into uniquely named files such as:

    sql/module=DIAL_ACCESS+trigger=KEY-LISTVAL+filename=d_access.fmb.sql
    sql/module=REP_PAT_SEEN+trigger=KEY-LISTVAL+filename=rep_pat_seen.fmb.sql

I wrote a script to generate a list of exact duplicates using a brute force approach.

Problem

There are 37,497 files to compare against each other; it takes 8 minutes to compare one file against all the others. Logically, if A = B and A = C, then there is no need to check if B = C. So the problem is: how do you eliminate the redundant comparisons? At the current rate, the script will complete in approximately 208 days.

Script Source Code

The comparison script is as follows:

    #!/bin/bash

    echo Loading directory ...

    for i in $(find sql/ -type f -name \*.sql); do
      echo Comparing $i ...

      for j in $(find sql/ -type f -name \*.sql); do
        if [ "$i" = "$j" ]; then
          continue;
        fi

        # Case insensitive compare, ignore spaces
        diff -IEbwBaq $i $j > /dev/null

        # 0 = no difference (i.e., duplicate code)
        if [ $? = 0 ]; then
          echo $i :: $j >> clones.txt
        fi
      done
    done

Question

How would you optimize the script so that checking for cloned code is a few orders of magnitude faster?

System Constraints

Using a quad-core CPU with an SSD; trying to avoid using cloud services if possible. The system is a Windows-based machine with Cygwin installed -- algorithms or solutions in other languages are welcome. Thank you!
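One way to avoid the pairwise comparisons is to read each file once, reduce it to a canonical form, and group files by a hash of that form. The sketch below is in Python, and it assumes that "duplicate" means identical after lowercasing and removing all whitespace, which only approximates what the `diff` flags in the script were meant to do. The function names are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def normalized_digest(path):
    """Hash a file's contents after lowercasing and dropping all whitespace."""
    text = Path(path).read_text(errors="replace")
    canonical = "".join(text.lower().split())
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def find_clones(root="sql"):
    """Group *.sql files under root by digest; return groups of 2+ files."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*.sql")):
        groups[normalized_digest(path)].append(str(path))
    return [paths for paths in groups.values() if len(paths) > 1]


if __name__ == "__main__":
    for clone_group in find_clones():
        print(" :: ".join(clone_group))
```

Each file is read exactly once, so the work grows linearly with the number of files instead of quadratically. Files that land in the same group could still be confirmed with a direct comparison if hash collisions are a concern.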

Complexity of subset product

- by threenplusone

I have a set of numbers produced using the following formula, with integers 0 < x < a:

    f(x) = f(x-1)^2 % a

For example, starting at 2 with a = 649:

    {2, 4, 16, 256, 636, 169, 5, 25, 625, 576, 137, ...}

I am after a subset of these numbers that, when multiplied together, equals 1 mod a. I believe this problem by itself to be NP-complete (based on similarities to the Subset-Sum problem). However, starting with any integer x gives the same solution pattern. E.g. for a = 649:

    {2, 4, 16, 256, 636, 169, 5, 25, 625, 576, 137, ...}  =  16 * 5 * 576   = 1 % 649
    {3, 9, 81, 71, 498, 86, 257, 500, 135, 53, 213, ...}  =  81 * 257 * 53  = 1 % 649
    {4, 16, 256, 636, 169, 5, 25, 625, 576, 137, 597, ...} = 256 * 25 * 137 = 1 % 649

I am wondering if this additional fact makes this problem solvable faster? Or has anyone run into this problem previously, or have any advice?
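The three examples can be reproduced with a few lines of Python. The sketch below only regenerates the sequences and checks the products from the post; `squaring_sequence` is a made-up helper name.

```python
def squaring_sequence(start, modulus, length):
    """Return [start, start^2, start^4, ...] reduced modulo `modulus`."""
    values = [start % modulus]
    while len(values) < length:
        values.append(values[-1] ** 2 % modulus)
    return values


A = 649
for start in (2, 3, 4):
    seq = squaring_sequence(start, A, 11)
    # The post picks the elements at positions 2, 6 and 9 (counting from 0).
    product = seq[2] * seq[6] * seq[9] % A
    print(start, seq, product)
```

The element at position i is start^(2^i) mod a, so multiplying positions 2, 6 and 9 raises the start value to the power 4 + 64 + 512 = 580. Since 649 = 11 * 59, Euler's totient of 649 is 10 * 58 = 580, and Euler's theorem gives x^580 = 1 mod 649 for any x coprime to 649. That is at least consistent with the same positions working for every start value shown here.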

Simple aggregating query very slow in PostgreSql, any way to improve?

- by Ash

Hi, I have a table which holds files and their types, such as

    CREATE TABLE files (
        id SERIAL PRIMARY KEY,
        name VARCHAR(255),
        filetype VARCHAR(255),
        ...
    );

and another table for holding file properties, such as

    CREATE TABLE properties (
        id SERIAL PRIMARY KEY,
        file_id INTEGER CONSTRAINT fk_files REFERENCES files(id),
        size INTEGER,
        ... -- other property fields
    );

The file_id field has an index. The files table has around 800k rows, and the properties table around 200k (not all files necessarily have or need properties). I want to do aggregating queries, for example find the average size and standard deviation for all file types. But it's very slow: around 70 seconds for the latter query. I understand it needs a sequential scan, but still it seems too much. Here's the query:

    SELECT f.filetype, avg(size), stddev(size)
    FROM files as f, properties as pr
    WHERE f.id = pr.file_id
    GROUP BY f.filetype;

and the explain output:

    HashAggregate  (cost=140292.20..140293.94 rows=116 width=13) (actual time=74013.621..74013.954 rows=110 loops=1)
      ->  Hash Join  (cost=6780.19..138945.47 rows=179564 width=13) (actual time=1520.104..73156.531 rows=179499 loops=1)
            Hash Cond: (f.id = pr.file_id)
            ->  Seq Scan on files f  (cost=0.00..108365.41 rows=1140941 width=9) (actual time=0.998..62569.628 rows=805270 loops=1)
            ->  Hash  (cost=3658.64..3658.64 rows=179564 width=12) (actual time=1131.053..1131.053 rows=179499 loops=1)
                  ->  Seq Scan on properties pr  (cost=0.00..3658.64 rows=179564 width=12) (actual time=0.753..557.171 rows=179574 loops=1)
    Total runtime: 74014.520 ms

Any ideas why it is so slow, or how to make it faster?

Algorithm to determine if array contains n...n+m?

- by Kyle Cronin

I saw this question on Reddit, and there were no positive solutions presented, and I thought it would be a perfect question to ask here. This was in a thread about interview questions:

Write a method that takes an int array of size m, and returns (True/False) if the array consists of the numbers n...n+m-1, all numbers in that range and only numbers in that range. The array is not guaranteed to be sorted. (For instance, {2,3,4} would return true. {1,3,1} would return false, {1,2,4} would return false.)

The problem I had with this one is that my interviewer kept asking me to optimize (faster O(n), less memory, etc), to the point where he claimed you could do it in one pass of the array using a constant amount of memory. Never figured that one out.

Along with your solutions, please indicate if they assume that the array contains unique items. Also indicate if your solution assumes the sequence starts at 1. (I've modified the question slightly to allow cases where it goes 2, 3, 4...)

edit: I am now of the opinion that there does not exist a linear-time, constant-space algorithm that handles duplicates. Can anyone verify this? The duplicate problem boils down to testing whether the array contains duplicates in O(n) time and O(1) space. If this can be done, you can simply test first and, if there are no duplicates, run the algorithms posted. So can you test for dupes in O(n) time and O(1) space?
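As a baseline for the discussion, here is a minimal Python sketch that handles duplicates and any starting value n in linear time, but uses O(m) extra space for a set. It is not the one-pass, constant-memory solution the interviewer claimed exists, and the function name is made up.

```python
def is_consecutive_run(values):
    """True if values is a permutation of n, n+1, ..., n+m-1 for some n."""
    m = len(values)
    if m == 0:
        return True  # assumption: an empty array counts as a valid (empty) run
    lowest = min(values)
    highest = max(values)
    if highest - lowest != m - 1:
        return False
    # The range is exactly m wide, so m distinct values must cover all of it.
    return len(set(values)) == m
```

The range check alone rejects {1,2,4}; the distinctness check is what rejects arrays with duplicates, and it is the only part that needs the extra memory.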

PHP shell_exec() - Run directly, or perform a cron (bash/php) and include MySQL layer?

- by Jimbo

Sorry if the title is vague - I wasn't quite sure how to word it!

What I'm Doing

I'm running a Linux command to output data into a variable, parse the data, and output it as an array. Array values will be displayed on a page using PHP, and this PHP page output is requested via AJAX every 10 seconds so, in effect, the data will be retrieved and displayed/updated every 10 seconds. There could be as many as 10,000 characters being parsed on every request, although this is usually much lower.

Alternative Idea

I want to know if there is a better* alternative method of retrieving this data every 10 seconds, as multiple users (<10) will be having this command executed automatically for them. A cronjob running on the server could execute either bash or php (which is faster?) to grab the data and store it in a MySQL database. Then, any AJAX calls to the PHP output would return values in the MySQL database rather than making a direct call to execute server code every 10 seconds.

Why?

I know there are security concerns with running execs directly from PHP, and (I hope this isn't micro-optimisation) I'm worried about CPU usage on the server. The server is running a Sempron processor. Yes, they do still exist. Having this only execute when the user is on the page (idea #1) means that the server isn't running code that doesn't need to be run. However, is this slow and insecure?

Just in case the type of Linux command may be of assistance in determining its efficiency:

    shell_exec("transmission-remote $host:$port --auth $username:$password -l");

I'm hoping that there are differences in efficiency and level of security between the two methods I have outlined above, and that this isn't just micro-micro-optimisation. If there are alternative methods that are better*, I'd love to learn about these! :)

Running Boggle Solver takes over an hour to run. What is wrong with my code?

- by user1872912

So I am running a Boggle Solver in Java on the NetBeans IDE. When I run it, I have to quit after 10 minutes or so because it will end up taking about 2 hours to run completely. Is there something wrong with my code, or a way to make it substantially faster?

    public void findWords(String word, int iLoc, int jLoc, ArrayList<JLabel> labelsUsed){

        if(iLoc < 0 || iLoc >= 4 || jLoc < 0 || jLoc >= 4){
            return;
        }

        if(labelsUsed.contains(jLabels[iLoc][jLoc])){
            return;
        }

        word += jLabels[iLoc][jLoc].getText();
        labelsUsed.add(jLabels[iLoc][jLoc]);

        if(word.length() >= 3 && wordsPossible.contains(word)){
            wordsMade.add(word);
        }

        findWords(word, iLoc-1, jLoc, labelsUsed);
        findWords(word, iLoc+1, jLoc, labelsUsed);
        findWords(word, iLoc, jLoc-1, labelsUsed);
        findWords(word, iLoc, jLoc+1, labelsUsed);
        findWords(word, iLoc-1, jLoc+1, labelsUsed);
        findWords(word, iLoc-1, jLoc-1, labelsUsed);
        findWords(word, iLoc+1, jLoc-1, labelsUsed);
        findWords(word, iLoc+1, jLoc+1, labelsUsed);

        labelsUsed.remove(jLabels[iLoc][jLoc]);
    }

Here is where I call this method from:

    public void findWords(){
        ArrayList <JLabel> labelsUsed = new ArrayList<JLabel>();
        for(int i=0; i<jLabels.length; i++){
            for(int j=0; j<jLabels[i].length; j++){
                findWords(jLabels[i][j].getText(), i, j, labelsUsed);
                //System.out.println("Done");
            }
        }
    }

edit: BTW I am using a GUI, and the letters on the board are displayed using JLabels.
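A common way to speed up this kind of search is to stop exploring a path as soon as no dictionary word starts with the letters collected so far. The sketch below is in Python rather than Java and works on a plain list of strings instead of JLabels; the prefix set, the function names and the board representation are all assumptions made for illustration, not the poster's design.

```python
def build_prefixes(words):
    """Collect every prefix of every word so dead-end paths can be cut early."""
    prefixes = set()
    for word in words:
        for end in range(1, len(word) + 1):
            prefixes.add(word[:end])
    return prefixes


def find_words(board, words, min_length=3):
    """Return the dictionary words that can be traced on the board."""
    words = set(words)  # set membership is O(1) on average
    prefixes = build_prefixes(words)
    rows, cols = len(board), len(board[0])
    visited = [[False] * cols for _ in range(rows)]
    found = set()

    def visit(r, c, current):
        if not (0 <= r < rows and 0 <= c < cols) or visited[r][c]:
            return
        current += board[r][c]
        if current not in prefixes:
            return  # no dictionary word starts with this path; stop early
        if len(current) >= min_length and current in words:
            found.add(current)
        visited[r][c] = True
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if dr or dc:
                    visit(r + dr, c + dc, current)
        visited[r][c] = False

    for r in range(rows):
        for c in range(cols):
            visit(r, c, "")
    return found
```

Without the prefix check, the recursion walks every possible path on the 4x4 grid; with it, most paths are abandoned after two or three letters. Whether this is the bottleneck in the posted code depends on details not shown there, such as the type of `wordsPossible`.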

Need to sort 3 arrays by one key array

- by jeff6461

I am trying to get 3 arrays sorted by one key array in Objective-C for the iPhone. Here is an example to help out:

    Array 1   Array 2   Array 3   Array 4
    1         15        21        7
    3         12        8         9
    6         7         8         0
    2         3         4         8

When sorted, I want this to look like:

    Array 1   Array 2   Array 3   Array 4
    1         15        21        7
    2         3         4         8
    3         12        8         9
    6         7         8         0

So arrays 2, 3 and 4 move with Array 1 when it is sorted. Currently I am using a bubble sort to do this, but it lags so badly that it crashes my app. The code I am using to do this is:

    int flag = 0;
    int i = 0;
    int temp = 0;
    do {
        flag=1;
        for(i = 0; i < distancenumber; i++) {
            if(distance[i] > distance[i+1]) {
                temp = distance[i];
                distance[i]=distance[i + 1];
                distance[i + 1]=temp;

                temp = FlowerarrayNumber[i];
                FlowerarrayNumber[i] = FlowerarrayNumber[i+1];
                FlowerarrayNumber[i + 1] = temp;

                temp = BeearrayNumber[i];
                BeearrayNumber[i] = BeearrayNumber[i + 1];
                BeearrayNumber[i + 1] = temp;

                flag=0;
            }
        }
    } while (flag==0);

where distancenumber is the number of elements in each of the arrays, distance is Array 1 (my key array), and the other two are the arrays being sorted along with it. If anyone can help me get a merge sort (or something faster; it is running on an iPhone, so it needs to be quick and light) to do this, that would be great. I cannot figure out how the recursion works in that method, so I am having a hard time getting the code to work. Any help would be greatly appreciated.
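One way to avoid writing a sort by hand is to sort a list of indices by the key array and then reorder every array with that one permutation. The sketch below shows the idea in Python rather than Objective-C, using the table from the post as sample data; `sort_by_key` is a made-up name.

```python
def sort_by_key(keys, *others):
    """Sort `keys` ascending and reorder every other list the same way."""
    # `order` lists the original positions in the order they should appear.
    order = sorted(range(len(keys)), key=keys.__getitem__)
    return tuple([column[i] for i in order] for column in (keys, *others))


array1 = [1, 3, 6, 2]
array2 = [15, 12, 7, 3]
array3 = [21, 8, 8, 4]
array4 = [7, 9, 0, 8]

sorted1, sorted2, sorted3, sorted4 = sort_by_key(array1, array2, array3, array4)
print(sorted1, sorted2, sorted3, sorted4)
```

The built-in sort does the O(n log n) work once, on the indices only, so no recursion has to be written by hand. The same approach should carry over to any language whose standard library can sort with a custom comparison.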

C Population Count of unsigned 64-bit integer with a maximum value of 15

- by BitTwiddler1011

I use a population count (Hamming weight) function intensively in a Windows C application and have to optimize it as much as possible in order to boost performance. In more than half the cases where I use the function, I only need to know the value up to a maximum of 15. The software will run on a wide range of processors, both old and new. I already make use of the POPCNT instruction when Intel's SSE4.2 or AMD's SSE4a is present, but would like to optimize the software implementation (used as a fallback if no SSE4 is present) as much as possible. Currently I have the following software implementation of the function:

    inline int population_count64(unsigned __int64 w)
    {
        w -= (w >> 1) & 0x5555555555555555ULL;
        w = (w & 0x3333333333333333ULL) + ((w >> 2) & 0x3333333333333333ULL);
        w = (w + (w >> 4)) & 0x0f0f0f0f0f0f0f0fULL;

        return int((w * 0x0101010101010101ULL) >> 56);
    }

So to summarize:

(1) I would like to know if it is possible to optimize this for the case when I only want to know the value up to a maximum of 15.

(2) Is there a faster software implementation (for both Intel and AMD CPUs) than the function above?

Custom punctuation function making script run over the php's 60s runtime limit

- by webmasters

I am importing a lot of product data from an XML file (about 5000 products). When I run the script, it finishes in about 10-12 seconds. Now, when I add this punctuation function, which makes sure each product description ends with a punctuation mark, the code runs until it hits the 60-second PHP time limit on my server, but I'm not getting any other errors. I have error reporting turned on. I just get a final error that the script could not load in 60 seconds. The question is, looking at this function, is it that resource-consuming? What can I do to make it faster?

    function punctuation($string){
        if(strlen($string) > 5){
            // Get $last_char
            $desired_punctuation = array(".",",","?","!");
            $last_char = substr($string, -1);

            // Check if $last_char is in the $desired_punctuation array
            if(!in_array($last_char, $desired_punctuation)){
                // strip the $mytrim string and get only letters at the end;
                while(!preg_match("/^[a-zA-Z]$/", $last_char)){
                    $string = substr($string, 0, -1);
                    $last_char = substr($string, -1);
                }
                // add "." to the string
                $string .= '.';
            }
        }
        return $string;
    }

If the function is OK, the long runtime must come from something else, which I'll have to discover. I just want your input on this part.
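One thing worth checking in the function as posted: if a description longer than 5 characters contains no ASCII letter at all (for example only digits or symbols), the `while` loop strips the string down to nothing and then keeps going, because the empty last character never matches the letter pattern. Whether any of the 5000 descriptions look like that is an assumption, not something the post confirms. The Python sketch below mirrors the function's logic with a single regular-expression pass and an explicit guard for that case; the names are made up.

```python
import re

DESIRED_PUNCTUATION = (".", ",", "?", "!")


def punctuate(text):
    """End `text` with a full stop unless it already ends with punctuation."""
    if len(text) <= 5 or text.endswith(DESIRED_PUNCTUATION):
        return text
    # Drop everything after the last ASCII letter in one pass.
    trimmed = re.sub(r"[^a-zA-Z]+\Z", "", text)
    if not trimmed:
        return text  # assumption: leave letter-free descriptions unchanged
    return trimmed + "."
```

Because the trimming is one bounded pass, the function always terminates, and the guard makes the letter-free case an explicit decision rather than an endless loop.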

Java: multi-threaded maps: how do the implementations compare?

- by user346629

I'm looking for a good hash map implementation. Specifically, one that's good for creating a large number of maps, most of them small. So memory is an issue. It should be thread-safe (though losing the odd put might be an OK compromise in return for better performance), and fast for both get and put. And I'd also like the moon on a stick, please, with a side-order of justice.

The options I know are:

- HashMap. Disastrously un-thread-safe.
- ConcurrentHashMap. My first choice, but this has a hefty memory footprint: about 2k per instance.
- Collections.synchronizedMap(HashMap). That's working OK for me, but I'm sure there must be faster alternatives.
- Trove or Colt. I think neither of these is thread-safe, but perhaps the code could be adapted to be thread-safe.

Any others? Any advice on what beats what when? Any really good new hash map algorithms that Java could use an implementation of? Thanks in advance for your input!

What is the fastest (to access) struct-like object in Python?

- by DNS

I'm optimizing some code whose main bottleneck is running through and accessing a very large list of struct-like objects. Currently I'm using namedtuples, for readability. But some quick benchmarking using 'timeit' shows that this is really the wrong way to go where performance is a factor:

Named tuple with a, b, c:

    >>> timeit("z = a.c", "from __main__ import a")
    0.38655471766332994

Class using __slots__, with a, b, c:

    >>> timeit("z = b.c", "from __main__ import b")
    0.14527461047146062

Dictionary with keys a, b, c:

    >>> timeit("z = c['c']", "from __main__ import c")
    0.11588272541098377

Tuple with three values, using a constant key:

    >>> timeit("z = d[2]", "from __main__ import d")
    0.11106188992948773

List with three values, using a constant key:

    >>> timeit("z = e[2]", "from __main__ import e")
    0.086038238242508669

Tuple with three values, using a local key:

    >>> timeit("z = d[key]", "from __main__ import d, key")
    0.11187358437882722

List with three values, using a local key:

    >>> timeit("z = e[key]", "from __main__ import e, key")
    0.088604143037173344

First of all, is there anything about these little timeit tests that would render them invalid? I ran each several times, to make sure no random system event had thrown them off, and the results were almost identical.

It would appear that dictionaries offer the best balance between performance and readability, with classes coming in second. This is unfortunate, since, for my purposes, I also need the object to be sequence-like; hence my choice of namedtuple.

Lists are substantially faster, but constant keys are unmaintainable; I'd have to create a bunch of index-constants, i.e. KEY_1 = 1, KEY_2 = 2, etc., which is also not ideal.

Am I stuck with these choices, or is there an alternative that I've missed?

Build OpenGL model in parallel?

- by Brendan Long

I have a program which draws some terrain and simulates water flowing over it (in a cheap and easy way). Updating the water was easy to parallelize using OpenMP, so I can do ~50 updates per second. The problem is that even with a small amount of water, my draws per second are very, very low (it starts at 5 and drops to around 2 once there's a significant amount of water).

It's not a problem with the video card, because the terrain is more complicated and gets drawn so quickly that boost::timer tells me I get infinity draws per second if I turn the water off. It may be related to memory bandwidth though (since I assume the model stays on the card and doesn't have to be transferred every time).

What I'm concerned about is that on every draw, I'm calling glVertex3f() about a million times (max size is 450*600, 4 vertices each), and it's done entirely sequentially because Glut won't let me call anything in parallel.

So, is there some way of building the list in parallel and then passing it to OpenGL all at once? Or some other way of making it draw faster? Am I using the wrong method (besides the obvious "use fewer vertices")?

Logic: Best way to sample & count bytes of a 100MB+ file

- by Jami

Let's say I have this 170mb file (roughly 180 million bytes). What I need to do is to create a table that lists:

- all 4096 byte combinations found [column 'bytes'], and
- the number of times each byte combination appeared in it [column 'occurrences']

Assume two things: I can save data very fast, but I can update my saved data only very slowly. How should I sample the file and save the needed information?

Here are some suggestions that are (extremely) slow:

- Go through each 4096 byte combination in the file and save each one, but search the table first for existing combinations and update its values. This is unbelievably slow.
- Go through each 4096 byte combination in the file and save up to 1 million rows of data in a temporary table. Go through that table and fix the entries (combine repeating byte combinations), then copy to the big table. Repeat with the next 1 million rows of data. This is faster by a bit, but still unbelievably slow.

This is kind of like taking the statistics of the file.

NOTE: I know that sampling the file can generate tons of data (around 22Gb from experience), and I know that any solution posted would take a bit of time to finish. I need the most efficient saving process.

jQuery - Wait until image loads before performing function

- by Steven

I'm trying to create a simple portfolio page. I have a list of thumbs and an image. When you click on a thumb, the image will change. When a thumbnail is clicked, I'd like to have the image fade out, wait until the image is loaded, then fade back in. The problem I have right now is that some of the images are pretty big, so it fades out, then fades back in immediately, sometimes while the image is still loading. I'd like to avoid using setTimeout, since sometimes an image will load faster or slower than the time I set. Here's my code:

    $(function() {
        $('img#image').attr("src", $('ul#thumbs li:first img').attr("src"));

        $('ul#thumbs li img').click(function() {
            $('img#image').fadeOut(700);

            var src = $(this).attr("src");
            $('img#image').attr("src", src);

            $('img#image').fadeIn(700);
        });
    });

    <img id="image" src="" alt="" />
    <ul id="thumbs">
        <li><img src="/images/thumb1.png" /></li>
        <li><img src="/images/thumb2.png" /></li>
        <li><img src="/images/thumb3.png" /></li>
    </ul>

Heap Algorithmic Issue

- by OberynMarDELL

I have an algorithmic problem that I want to discuss. It's not about finding a solution, but about optimization in terms of runtime. So here it is:

Suppose we have a race track of length L and a total of N cars that participate in the race. The race rules are simple. Once a car overtakes another car, the second car is eliminated from the race. The race ends when no more overtakes are possible. The tricky part is that the k'th car has a starting point x[k] and a velocity v[k]. The points are given in ascending order, but the velocities may differ.

What I've done so far: given that a car can only be overtaken by the car directly behind it, I calculated the time that it takes for each car to reach the next one,

    t = (x[i+1] - x[i]) / (v[i] - v[i+1])

and I insert these times into a min heap in O(n log n). So in theory I have to pop the first element in O(log n), find its previous, pop it as well, update its time and insert it in the heap once more, much like a priority queue.

My main problem is how I can access specific elements of a heap in O(log n) or faster, in order to keep the complexity at O(n log n). This program should be written in Haskell, so I would like to keep things as simple as possible.

EDIT: I forgot to write the actual point of the race. The goal is to find the order in which cars exit the game.
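One way to sidestep updating arbitrary heap entries is lazy deletion: leave outdated events in the heap and discard them when they are popped. The sketch below is in Python rather than Haskell and rests on several assumptions not stated in the post: the track length L is ignored (the track is treated as long enough for every possible overtake), simultaneous overtakes are resolved in arbitrary order, and the catch-up time is taken as the gap divided by the speed difference. All names are hypothetical.

```python
import heapq


def elimination_order(x, v):
    """Return car indices in the order they are overtaken (eliminated)."""
    n = len(x)
    nxt = list(range(1, n)) + [-1]  # nxt[i]: nearest surviving car ahead of i
    alive = [True] * n
    heap = []

    def schedule(i, j):
        # Car i catches car j (directly ahead) only if it is strictly faster.
        if j != -1 and v[i] > v[j]:
            heapq.heappush(heap, ((x[j] - x[i]) / (v[i] - v[j]), i, j))

    for i in range(n - 1):
        schedule(i, i + 1)

    order = []
    while heap:
        _, i, j = heapq.heappop(heap)
        # Lazy deletion: skip events whose cars are gone or no longer adjacent.
        if not (alive[i] and alive[j] and nxt[i] == j):
            continue
        alive[j] = False
        order.append(j)
        nxt[i] = nxt[j]        # car i now chases whoever was ahead of j
        schedule(i, nxt[i])
    return order
```

Each elimination pushes at most one new event, so the heap never holds more than about 2n entries and the whole run stays within O(n log n). In Haskell the same idea would need a priority queue plus a map from each car to its current car ahead, for example `Data.Map` for both roles.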

Parallelize or vectorize all-against-all operation on a large number of matrices?

- by reve_etrange

I have approximately 5,000 matrices with the same number of rows and varying numbers of columns (20 x ~200). Each of these matrices must be compared against every other in a dynamic programming algorithm. In this question, I asked how to perform the comparison quickly and was given an excellent answer involving a 2D convolution. Serially, iteratively applying that method, like so

    list = who('data_matrix_prefix*')
    H = cell(numel(list),numel(list));
    for i=1:numel(list)
        for j=1:numel(list)
            if i ~= j
                eval([ 'H{i,j} = compare(' char(list(i)) ',' char(list(j)) ');']);
            end
        end
    end

is fast for small subsets of the data (e.g. for 9 matrices, 9*9 - 9 = 72 calls are made in ~1 s). However, operating on all the data requires almost 25 million calls.

I have also tried using deal() to make a cell array composed entirely of the next element in data, so I could use cellfun() in a single loop:

    % who(), load() and struct2cell() calls place k data matrices in a 1D cell array called data.
    nextData = cell(k,1);
    for i=1:k
        [nextData{:}] = deal(data{i});
        H{:,i} = cellfun(@compare,data,nextData,'UniformOutput',false);
    end

Unfortunately, this is not really any faster, because all the time is in compare(). Both of these code examples seem ill-suited for parallelization. I'm having trouble figuring out how to make my variables sliced.

compare() is totally vectorized; it uses matrix multiplication and conv2() exclusively (I am under the impression that all of these operations, including the cellfun(), should be multithreaded in MATLAB?).

Does anyone see an (explicitly) parallelized solution or a better vectorization of the problem?

Planning a programming project by example (C# or C++)

- by Lunan

I am in the last year of my undergraduate degree, and I am stumped by the lack of examples of large C++ and C# projects at my university. All the mini projects and assignments are based on a text-based database, which is so inefficient, and on console display and commands, which is frustrating.

I want to develop a complete prototype of corporate software which deals with Inventory, Sales, Marketing, etc. Everything you would usually find in SAP. I would be grateful if any of you could direct me to books, articles or sample programs. Some of my questions are:

- How to plan for this kind of programming? Should I use the concept that each object (such as inventory) has its own process and program, with an integrator sitting on top of all the programs, or should I integrate everything into one big program?
- How to build and address a database? I have a little knowledge of databases and I know SQL, but I have never addressed a database from a program before. Databases are tables, and how are you supposed to represent a table in an OOP way?
- For development, which is better: PHP and C++, or C# and ASP.NET? I am planning to use a web interface for forms and information, with a background program to handle the computation. .NET is very much integrated and coding should be much faster, but I really wonder about performance compared to a PHP and C++ package.

Thank you for the info.

A question of style/readability regarding the C# "using" statement

- by Charles

I'd like to know your opinion on a matter of coding style that I'm on the fence about. I realize there probably isn't a definitive answer, but I'd like to see if there is a strong preference in one direction or the other.

I'm going through a solution adding using statements in quite a few places. Often I will come across something like so:

    {
        log = new log();
        log.SomeProperty = something;  // several of these
        log.Connection = new OracleConnection("...");
        log.InsertData(); // this is where log.Connection will be used
        ... // do other stuff with log, but connection won't be used again
    }

where log.Connection is an OracleConnection, which implements IDisposable.

The neatnik in me wants to change it to:

    {
        log = new log();
        using (OracleConnection connection = new OracleConnection("..."))
        {
            log.SomeProperty = something;
            log.Connection = connection;
            log.InsertData();
            ...
        }
    }

But the lover of brevity and getting-the-job-done-slightly-faster wants to do:

    {
        log = new log();
        log.SomeProperty = something;
        using (log.Connection = new OracleConnection("..."))
            log.InsertData();
        ...
    }

For some reason I feel a bit dirty doing this. Do you consider this bad or not? If you think this is bad, why? If it's good, why?

Speeding up a group by date query on a big table in postgres

- by zaius

I've got a table with around 20 million rows. For argument's sake, let's say there are two columns in the table: an id and a timestamp. I'm trying to get a count of the number of items per day. Here's what I have at the moment.

    SELECT DATE(timestamp) AS day, COUNT(*)
    FROM actions
    WHERE DATE(timestamp) >= '20100101'
      AND DATE(timestamp) < '20110101'
    GROUP BY day;

Without any indices, this takes about 30s to run on my machine. Here's the explain analyze output:

    GroupAggregate  (cost=675462.78..676813.42 rows=46532 width=8) (actual time=24467.404..32417.643 rows=346 loops=1)
      ->  Sort  (cost=675462.78..675680.34 rows=87021 width=8) (actual time=24466.730..29071.438 rows=17321121 loops=1)
            Sort Key: (date("timestamp"))
            Sort Method:  external merge  Disk: 372496kB
            ->  Seq Scan on actions  (cost=0.00..667133.11 rows=87021 width=8) (actual time=1.981..12368.186 rows=17321121 loops=1)
                  Filter: ((date("timestamp") >= '2010-01-01'::date) AND (date("timestamp") < '2011-01-01'::date))
    Total runtime: 32447.762 ms

Since I'm seeing a sequential scan, I tried to index on the date aggregate:

    CREATE INDEX ON actions (DATE(timestamp));

which cuts the runtime by about 50%:

    HashAggregate  (cost=796710.64..796716.19 rows=370 width=8) (actual time=17038.503..17038.590 rows=346 loops=1)
      ->  Seq Scan on actions  (cost=0.00..710202.27 rows=17301674 width=8) (actual time=1.745..12080.877 rows=17321121 loops=1)
            Filter: ((date("timestamp") >= '2010-01-01'::date) AND (date("timestamp") < '2011-01-01'::date))
    Total runtime: 17038.663 ms

I'm new to this whole query-optimization business, and I have no idea what to do next. Any clues as to how I could get this query running faster?

JAVA : How to get the positions of all matches in a String?

- by user692704

I have a text document and a query (the query could be more than one word). I want to find the positions of all occurrences of the query in the document. I thought of using documentText.indexOf(query) or a regular expression, but I could not make it work. I ended up with the following method.

First, I created a data type called QueryOccurrence:

    public class QueryOccurrence implements Serializable {
        public QueryOccurrence() {}

        private int start;
        private int end;

        public QueryOccurrence(int nameStart, int nameEnd, String nameText) {
            start = nameStart;
            end = nameEnd;
        }

        public int getStart() {
            return start;
        }

        public int getEnd() {
            return end;
        }

        public void SetStart(int i) {
            start = i;
        }

        public void SetEnd(int i) {
            end = i;
        }
    }

Then, I used this data type in the following method:

    public static List<QueryOccurrence> FindQueryPositions(String documentText, String query) {
        // Normalize does the following: lower case, trim, and remove punctuation
        String normalizedQuery = Normalize.Normalize(query);
        String normalizedDocument = Normalize.Normalize(documentText);

        String[] documentWords = normalizedDocument.split(" ");
        String[] queryArray = normalizedQuery.split(" ");

        List<QueryOccurrence> foundQueries = new ArrayList<QueryOccurrence>();
        QueryOccurrence foundQuery = new QueryOccurrence();

        int index = 0;
        for (String word : documentWords) {
            if (word.equals(queryArray[0])) {
                foundQuery.SetStart(index);
            }
            if (word.equals(queryArray[queryArray.length - 1])) {
                foundQuery.SetEnd(index);
                if ((foundQuery.getEnd() - foundQuery.getStart()) + 1 == queryArray.length) {
                    // add the found query to the list
                    foundQueries.add(foundQuery);
                    // flush the foundQuery variable to use it again
                    foundQuery = new QueryOccurrence();
                }
            }
            index++;
        }
        return foundQueries;
    }

This method returns a list of all occurrences of the query in the document, each one with its position. Could you suggest an easier and faster way to accomplish this task? Thanks.
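For comparison, here is a minimal sliding-window version in Python rather than Java. It compares every word of the query against the document at each position, whereas the loop in the post checks only the first and last query words and the distance between them. It assumes the text has already been normalized and split into words; the function name is made up.

```python
def find_query_positions(document_words, query_words):
    """Return (start, end) word indices of every occurrence of the query."""
    width = len(query_words)
    if width == 0:
        return []
    return [
        (start, start + width - 1)
        for start in range(len(document_words) - width + 1)
        # Compare the whole window, not just its first and last word.
        if document_words[start:start + width] == query_words
    ]
```

For a document of n words and a query of k words this does at most n * k word comparisons, which is usually fine for short queries.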

Top n items in a List ( including duplicates )

- by Krishnan

Trying to find an efficient way to obtain the top N items in a very large list, possibly containing duplicates.

I first tried sorting and slicing, which works. But this seems unnecessary. You shouldn't need to sort a very large list if you just want the top 20 members. So I wrote a recursive routine which builds the top-n list. This also works, but is very much slower than the non-recursive one!

Question: Why is my second routine (elite2) so much slower than elite, and how do I make it faster? My code is attached below. Thanks.

    import scala.collection.SeqView
    import scala.math.min

    object X {

      def elite(s: SeqView[Int, List[Int]], k:Int):List[Int] = {
        s.sorted.reverse.force.slice(0,min(k,s.size))
      }

      def elite2(s: SeqView[Int, List[Int]], k:Int, s2:List[Int]=Nil):List[Int] = {
        if( k == 0 || s.size == 0) s2.reverse
        else {
          val m = s.max
          val parts = s.force.partition(_==m)
          val whole = if( parts._1.size > 1) parts._1.tail:::parts._2 else parts._2
          elite2( whole.view, k-1, m::s2 )
        }
      }

      def main(args:Array[String]) = {
        val N = 1000000/3
        val x = List(N to 1 by -1).flatten.map(x=>List(x,x,x)).flatten.view
        println(elite2(x,20))
        println(elite(x,20))
      }
    }
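A standard alternative to both routines is a single pass that keeps only the k largest items seen so far in a small heap, which costs roughly O(n log k) instead of a full sort or k passes over the list. The sketch below uses Python's `heapq.nlargest` rather than Scala; the wrapper name `elite` is borrowed from the post purely for illustration.

```python
import heapq


def elite(values, k):
    """Return the k largest items, largest first, keeping duplicates."""
    return heapq.nlargest(k, values)


print(elite([5, 1, 5, 3, 9, 9, 2], 3))
```

In Scala, a similar effect could presumably be had by folding over the list while maintaining a bounded priority queue of size k.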

Are programming languages and methods ineffective? (assembler and C knowledge needed)

- by b-gen-jack-o-neill

Hi, I have been studying the assembler output of C compilers, as well as CPU architecture, for a long time. I know this may seem silly to you, but it seems to me that something is very inefficient. Please don't be angry if I am wrong and there is some reason I do not see for all these principles. I will be very glad if you tell me why it is designed this way. I actually truly believe I am wrong; I know the brilliant people who put PCs together had a reason to do so. What exactly, you ask? I'll tell you right away, using C as an example:

1. Stack local scope memory allocation: Typical local memory allocation uses the stack. Just copy esp to ebp and then allocate all the memory via ebp. OK, I would understand this if you explicitly needed to allocate RAM by default stack values, but if I understand it correctly, modern OSes use paging as a translation layer between the application and physical RAM, so the address you ask for is translated further before reaching the actual RAM byte. So why not just say 0x00000000 is int a, 0x00000004 is int b, and so on? And access them just by mov 0x00000000,#10? Because you won't actually access memory blocks 0x00000000 and 0x00000004, but those the OS set the paging tables to. Actually, since memory allocation by ebp and esp uses indirect addressing, "my" way would be even faster.

2. Duplicate variable allocation: When you run an application, the loader loads its code into RAM. When you create a variable or a string, the compiler generates code that pushes these values onto the top of the stack when created in main. So there is an actual instruction to do so, and that actual number in memory. So there are 2 entries of the same value in RAM: one in the form of an instruction, the second in the form of actual bytes in RAM. But why? Why not, when declaring a variable, just work out which memory block it would be in, and then, when it is used, just insert this memory location?


Optimizing landing pages

- by Oleg Shaldybin

In my current project (Rails 2.3) we have a collection of 1.2 million keywords, and each of them is associated with a landing page, which is effectively a search results page for the given keyword. Each of those pages is pretty complicated, so it can take a long time to generate (up to 2 seconds under moderate load, even longer during traffic spikes, with current hardware). The problem is that 99.9% of visits to those pages are new visits (via search engines), so caching a page on its first visit doesn't help much: it will still be slow for that visit, and the next visit could be several weeks away. I'd really like to make those pages faster, but I don't have many ideas on how to do it. A couple of things that come to mind: build a cache for all keywords beforehand (with a very long TTL, a month or so), although building and maintaining this cache can be a real pain, and the search results on the page might be outdated or even no longer accessible; or, given the volatile nature of this data, don't try to cache anything at all, and just scale out to keep up with traffic. I'd really appreciate any feedback on this problem.


Using "as bool?" instead of "object something = ViewState["hi"]"

- by Programmin Tool

So I'm going through old code (2.0) and I came across this: object isReviewingValue = ViewState["IsReviewing"]; if (isReviewingValue is bool) { return (bool)isReviewingValue; } My first thought was to use the "as" keyword to avoid the unneeded (bool)isReviewingValue; but "as" only works with non-value types. No problem, I just went ahead and did this: bool? isReviewingValue = ViewState["IsReviewing"] as bool?; if (isReviewingValue.HasValue) { return isReviewingValue.Value; } The question is: besides looking a bit more readable, is this in fact better? EDIT: So this is getting more interesting. I decided to test it using a simple stopwatch, and it turns out that the second version is much faster, which, after reading some of the responses here, I didn't expect at all. I was sure my way would be much slower. Tell me what I did wrong: public Stopwatch AsRun() { Stopwatch watch = new Stopwatch(); watch.Start(); for (Int32 loopCounter = 0; loopCounter < 10000; loopCounter++) { Boolean? test = true as Boolean?; if (test.HasValue) { Boolean something = test.Value; } } watch.Stop(); return watch; } public Stopwatch ObjectIsRun() { Stopwatch watch = new Stopwatch(); watch.Start(); for (Int32 loopCounter = 0; loopCounter < 10000; loopCounter++) { Object test = true; if (test is Boolean) { Boolean something = (Boolean)test; } } watch.Stop(); return watch; } Every time I run these methods against each other, AsRun is twice as fast.

Search Results

Search found 4580 results on 184 pages for 'faster'.

