Search Results

Search found 3706 results on 149 pages for 'nano optimization'.

Page 60/149 | < Previous Page | 56 57 58 59 60 61 62 63 64 65 66 67  | Next Page >

  • Rewriting a for loop in pure NumPy to decrease execution time

    - by Statto
    I recently asked about trying to optimise a Python loop for a scientific application, and received an excellent, smart way of recoding it within NumPy which reduced execution time by a factor of around 100 for me! However, calculation of the B value is actually nested within a few other loops, because it is evaluated at a regular grid of positions. Is there a similarly smart NumPy rewrite to shave time off this procedure? I suspect the performance gain for this part would be less marked, and the disadvantages would presumably be that it would not be possible to report back to the user on the progress of the calculation, that the results could not be written to the output file until the end of the calculation, and possibly that doing this in one enormous step would have memory implications? Is it possible to circumvent any of these? import numpy as np import time def reshape_vector(v): b = np.empty((3,1)) for i in range(3): b[i][0] = v[i] return b def unit_vectors(r): return r / np.sqrt((r*r).sum(0)) def calculate_dipole(mu, r_i, mom_i): relative = mu - r_i r_unit = unit_vectors(relative) A = 1e-7 num = A*(3*np.sum(mom_i*r_unit, 0)*r_unit - mom_i) den = np.sqrt(np.sum(relative*relative, 0))**3 B = np.sum(num/den, 1) return B N = 20000 # number of dipoles r_i = np.random.random((3,N)) # positions of dipoles mom_i = np.random.random((3,N)) # moments of dipoles a = np.random.random((3,3)) # three basis vectors for this crystal n = [10,10,10] # points at which to evaluate sum gamma_mu = 135.5 # a constant t_start = time.clock() for i in range(n[0]): r_frac_x = np.float(i)/np.float(n[0]) r_test_x = r_frac_x * a[0] for j in range(n[1]): r_frac_y = np.float(j)/np.float(n[1]) r_test_y = r_frac_y * a[1] for k in range(n[2]): r_frac_z = np.float(k)/np.float(n[2]) r_test = r_test_x +r_test_y + r_frac_z * a[2] r_test_fast = reshape_vector(r_test) B = calculate_dipole(r_test_fast, r_i, mom_i) omega = gamma_mu*np.sqrt(np.dot(B,B)) # write r_test, B and omega to a file frac_done = np.float(i+1)/(n[0]+1) t_elapsed = (time.clock()-t_start) t_remain = (1-frac_done)*t_elapsed/frac_done print frac_done*100,'% done in',t_elapsed/60.,'minutes...approximately',t_remain/60.,'minutes remaining'

    Read the article

  • Can anyone recommend a decent tool for optimising images other than photoshop

    - by toomanyairmiles
    Can anyone recommend a decent tool for optimising images other than adobe photoshop, the gimp etc? I'm looking to optimise images for the web preferably online and free. Basically I have a client who can't install additional software on their work PC but needs to optimise photographs and other images for their website and is presently uploading 1 or 2 Mb files. On a personal level I'm interested to see what other people are using...

    Read the article

  • Google Web Optimizer -- How long until winning combination?

    - by Django Reinhardt
    I've had an A/B Test running in Google Web Optimizer for six weeks now, and there's still no end in sight. Google is still saying: "We have not gathered enough data yet to show any significant results. When we collect more data we should be able to show you a winning combination." Is there any way of telling how close Google is to making up its mind? (Does anyone know what algorithm does it use to decide if there's been any "high confidence winners"?) According to the Google help documentation: Sometimes we simply need more data to be able to reach a level of high confidence. A tested combination typically needs around 200 conversions for us to judge its performance with certainty. But all of our conversions have over 200 conversations at the moment: 230 / 4061 (Original) 223 / 3937 (Variation 1) 205 / 3984 (Variation 2) 205 / 4007 (Variation 3) How much longer is it going to have to run?? Thanks for any help.

    Read the article

  • Code runs 6 times slower with 2 threads than with 1

    - by Edward Bird
    So I have written some code to experiment with threads and do some testing. The code should create some numbers and then find the mean of those numbers. I think it is just easier to show you what I have so far. I was expecting with two threads that the code would run about 2 times as fast. Measuring it with a stopwatch I think it runs about 6 times slower! void findmean(std::vector<double>*, std::size_t, std::size_t, double*); int main(int argn, char** argv) { // Program entry point std::cout << "Generating data..." << std::endl; // Create a vector containing many variables std::vector<double> data; for(uint32_t i = 1; i <= 1024 * 1024 * 128; i ++) data.push_back(i); // Calculate mean using 1 core double mean = 0; std::cout << "Calculating mean, 1 Thread..." << std::endl; findmean(&data, 0, data.size(), &mean); mean /= (double)data.size(); // Print result std::cout << " Mean=" << mean << std::endl; // Repeat, using two threads std::vector<std::thread> thread; std::vector<double> result; result.push_back(0.0); result.push_back(0.0); std::cout << "Calculating mean, 2 Threads..." << std::endl; // Run threads uint32_t halfsize = data.size() / 2; uint32_t A = 0; uint32_t B, C, D; // Split the data into two blocks if(data.size() % 2 == 0) { B = C = D = halfsize; } else if(data.size() % 2 == 1) { B = C = halfsize; D = hsz + 1; } // Run with two threads thread.push_back(std::thread(findmean, &data, A, B, &(result[0]))); thread.push_back(std::thread(findmean, &data, C, D , &(result[1]))); // Join threads thread[0].join(); thread[1].join(); // Calculate result mean = result[0] + result[1]; mean /= (double)data.size(); // Print result std::cout << " Mean=" << mean << std::endl; // Return return EXIT_SUCCESS; } void findmean(std::vector<double>* datavec, std::size_t start, std::size_t length, double* result) { for(uint32_t i = 0; i < length; i ++) { *result += (*datavec).at(start + i); } } I don't think this code is exactly wonderful, if you could suggest ways of improving it then I would be grateful for that also.

    Read the article

  • An image from byte to optimized web page presentation

    - by blgnklc
    I get the data of the stored image on database as byte[] array; then I convert it to System.Drawing.Image like the code shown below; public System.Drawing.Image CreateImage(byte[] bytes) { System.IO.MemoryStream memoryStream = new System.IO.MemoryStream(bytes); System.Drawing.Image image = System.Drawing.Image.FromStream(memoryStream); return image; } (*) On the other hand I am planning to show a list of images on asp.net pages as the client scrolls downs the page. The more user gets down and down on the page he/she does see the more photos. So it means fast page loads and rich user experience. (you may see what I mean on www.mashable.com, just take care the new loads of the photos as you scroll down.) Moreover, the returned imgae object from the method above, how can i show it in a loop dynamically using the (*) conditions above. Regards bk

    Read the article

  • Does the <script> tag position in HTML affects performance of the webpage?

    - by Rahul Joshi
    If the script tag is above or below the body in a HTML page, does it matter for the performance of a website? And what if used in between like this: <body> ..blah..blah.. <script language="JavaScript" src="JS_File_100_KiloBytes"> function f1() { .. some logic reqd. for manipulating contents in a webpage } </script> ... some text here too ... </body> Or is this better?: <script language="JavaScript" src="JS_File_100_KiloBytes"> function f1() { .. some logic reqd. for manipulating contents in a webpage } </script> <body> ..blah..blah.. ..call above functions on some events like onclick,onfocus,etc.. </body> Or this one?: <body> ..blah..blah.. ..call above functions on some events like onclick,onfocus,etc.. <script language="JavaScript" src="JS_File_100_KiloBytes"> function f1() { .. some logic reqd. for manipulating contents in a webpage } </script> </body> Need not tell everything is again in the <html> tag!! How does it affect performance of webpage while loading? Does it really? Which one is the best, either out of these 3 or some other which you know? And one more thing, I googled a bit on this, from which I went here: Best Practices for Speeding Up Your Web Site and it suggests put scripts at the bottom, but traditionally many people put it in <head> tag which is above the <body> tag. I know it's NOT a rule but many prefer it that way. If you don't believe it, just view source of this page! And tell me what's the better style for best performance.

    Read the article

  • when is java faster than c++ (or when is JIT faster then precompiled)?

    - by kostja
    I have heard that under certain circumstances, Java programs or rather parts of java programs are able to be executed faster than the "same" code in C++ (or other precompiled code) due to JIT optimizations. This is due to the compiler being able to determine the scope of some variables, avoid some conditionals and pull similar tricks at runtime. Could you give an (or better - some) example, where this applies? And maybe outline the exact conditions under which the compiler is able to optimize the bytecode beyond what is possible with precompiled code? NOTE : This question is not about comparing Java to C++. Its about the possibilities of JIT compiling. Please no flaming. I am also not aware of any duplicates. Please point them out if you are.

    Read the article

  • Oracle Sql Query taking a day long to return results using dblink

    - by Suresh S
    Guys i have the following oracle sql query that gives me the monthwise report between the dates. Basically for nov month i want sum of values between the dates 01nov to 30 nov. The table tha is being queried is residing in another database and accesssed using dblink. The DT columns is of NUMBER type (for ex 20101201) .The execution of the query is taking a day long and not completed. kindly suggest me , if their is any optimisation that can be suggested to my DBA on the dblink, or any tuning that can be done on the query , or rewriting the same. SELECT /*+ PARALLEL (A 8) */ TO_CHAR(TRUNC(TRUNC(SYSDATE,'MM')- 1,'MM'),'MONYYYY') "MONTH", TYPE AS "TYPE", COLUMN, COUNT (DISTINCT A) AS "A_COUNT", COUNT (COLUMN) AS NO_OF_COLS, SUM (DURATION) AS "SUM_DURATION", SUM (COST) AS "COST" FROM **A@LN_PROD A** WHERE DT >=TO_NUMBER(TO_CHAR(TRUNC(TRUNC(SYSDATE,'MM')-1,'MM'),'YYYYMMDD')) AND DT < TO_NUMBER(TO_CHAR(TRUNC(TRUNC(SYSDATE,'MM'),'MM'),'YYYYMMDD')) GROUP BY TYPE, COLUMN

    Read the article

  • How to index a date column with null values?

    - by Heinz Z.
    How should I index a date column when some rows has null values? We have to select rows between a date range and rows with null dates. We use Oracle 9.2 and higher. Options I found Using a bitmap index on the date column Using an index on date column and an index on a state field which value is 1 when the date is null Using an index on date column and an other granted not null column My thoughts to the options are: to 1: the table have to many different values to use an bitmap index to 2: I have to add an field only for this purpose and to change the query when I want to retrieve the null date rows to 3: locks tricky to add an field to an index which is not really needed What is the best practice for this case? Thanks in advance Some infos I have read: Oracle Date Index When does Oracle index null column values?

    Read the article

  • Data Access from single table in sql server 2005 is too slow

    - by Muhammad Kashif Nadeem
    Following is the script of table. Accessing data from this table is too slow. SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO CREATE TABLE [dbo].[Emails]( [id] [int] IDENTITY(1,1) NOT NULL, [datecreated] [datetime] NULL CONSTRAINT [DF_Emails_datecreated] DEFAULT (getdate()), [UID] [nvarchar](250) COLLATE Latin1_General_CI_AS NULL, [From] [nvarchar](100) COLLATE Latin1_General_CI_AS NULL, [To] [nvarchar](100) COLLATE Latin1_General_CI_AS NULL, [Subject] [nvarchar](max) COLLATE Latin1_General_CI_AS NULL, [Body] [nvarchar](max) COLLATE Latin1_General_CI_AS NULL, [HTML] [nvarchar](max) COLLATE Latin1_General_CI_AS NULL, [AttachmentCount] [int] NULL, [Dated] [datetime] NULL ) ON [PRIMARY] Following query takes 50 seconds to fetch data. select id, datecreated, UID, [From], [To], Subject, AttachmentCount, Dated from emails If I include Body and Html in select then time is event worse. indexes are on: id unique clustered From Non unique non clustered To Non unique non clustered Tabls has currently 180000+ records. There might be 100,000 records each month so this will become more slow as time will pass. Does splitting data into two table will solve the problem? What other indexes should be there?

    Read the article

  • Verifying compiler optimizations in gcc/g++ by analyzing assembly listings

    - by Victor Liu
    I just asked a question related to how the compiler optimizes certain C++ code, and I was looking around SO for any questions about how to verify that the compiler has performed certain optimizations. I was trying to look at the assembly listing generated with g++ (g++ -c -g -O2 -Wa,-ahl=file.s file.c) to possibly see what is going on under the hood, but the output is too cryptic to me. What techniques do people use to tackle this problem, and are there any good references on how to interpret the assembly listings of optimized code or articles specific to the GCC toolchain that talk about this problem?

    Read the article

  • Is SQL DATEDIFF(year, ..., ...) an Expensive Computation?

    - by rlb.usa
    I'm trying to optimize up some horrendously complicated SQL queries because it takes too long to finish. In my queries, I have dynamically created SQL statements with lots of the same functions, so I created a temporary table where each function is only called once instead of many, many times - this cut my execution time by 3/4. So my question is, can I expect to see much of a difference if say, 1,000 datediff computations are narrowed to 100?

    Read the article

  • What does ER_WARN_FIELD_RESOLVED mean?

    - by VolkerK
    When SHOW WARNINGS after a EXPLAIN EXTENDED shows a Note 1276 Field or reference 'test.foo.bar' of SELECT #2 was resolved in SELECT #1 what exactly does that mean and what impact does it have? In my case it prevents mysql from using what seems to be a perfectly good index. But it's not about fixing that specific query (as it is an irrelevant test). I found http://dev.mysql.com/doc/refman/5.0/en/error-messages-server.html butError: 1276 SQLSTATE: HY000 (ER_WARN_FIELD_RESOLVED) Message: Field or reference '%s%s%s%s%s' of SELECT #%d was resolved in SELECT #%d isn't much of an explaination.

    Read the article

  • Completely remove ViewState for specific pages

    - by Kerido
    Hi everybody, I have a site that features some pages which do not require any post-back functionality. They simply display static HTML and don't even have any associated code. However, since the Master Page has a <form runat="server"> tag which wraps all ContentPlaceHolders, the resulting HTML always contains the ViewState field, i.e: <input type="hidden" id="__VIEWSTATE" value="/wEPDwUKMjEwNDQyMTMxM2Rk0XhpfvawD3g+fsmZqmeRoPnb9kI=" /> I realize, that when decrypted, this string corresponds to the <form> tag which I cannot remove. However, I would still like to remove the ViewState field for pages that only display static HTML. Is it possible?

    Read the article

  • How to optimize an database suggestion engine

    - by Dimitar Vouldjeff
    Hi, I`m making an online engine for item-to-item recommending movies. I have made some researches and I think that the best way to implement that is using pearson correlation and make a table with item1, item2 and correlation fields, but the problem is that after each rate of item I have to regenerate the correlation for in the worst case N records (where N is the number of items). Another think that I read is the following article, but I haven`t thought a way to implement it. So what is your suggestion to optimize this process? Or any other suggestions? Thanks.

    Read the article

  • Database indexes - what should they be

    - by WebweaverD
    Most of my database tables have a clear unique index through which lookups are done 90% of the time but I am a bit unsure on this one - I have a table which keeps track of user rating totals for items in my database, I now want to add another table, to track individual ratings with an ip address column to make sure no one can rate something twice. Since I can see this becoming a big, high use table it is important to optimize it correctly. (MYSQL table) This table will have the following fields: rating_id(always - unique), item_id (always - not unique), user_id (optional - not unique), ip_address (always - not unique), rating_value(always - not unique), has_review(bool) Now I envisions 90% the queries going something like this: When a user rates something - select where item_id = x and ip_address = y, (if rows = 0) insert rating When in user account pages - select where ip_address = x or username = y Now none of the fields searched on are unique, can I still use them as indexes (for example item _id and ip_address), can I have two indexes and will this still improve performance over a non indexed table?

    Read the article

  • Should not a tail-recursive function also be faster?

    - by Balint Erdi
    I have the following Clojure code to calculate a number with a certain "factorable" property. (what exactly the code does is secondary). (defn factor-9 ([] (let [digits (take 9 (iterate #(inc %) 1)) nums (map (fn [x] ,(Integer. (apply str x))) (permutations digits))] (some (fn [x] (and (factor-9 x) x)) nums))) ([n] (or (= 1 (count (str n))) (and (divisible-by-length n) (factor-9 (quot n 10)))))) Now, I'm into TCO and realize that Clojure can only provide tail-recursion if explicitly told so using the recur keyword. So I've rewritten the code to do that (replacing factor-9 with recur being the only difference): (defn factor-9 ([] (let [digits (take 9 (iterate #(inc %) 1)) nums (map (fn [x] ,(Integer. (apply str x))) (permutations digits))] (some (fn [x] (and (factor-9 x) x)) nums))) ([n] (or (= 1 (count (str n))) (and (divisible-by-length n) (recur (quot n 10)))))) To my knowledge, TCO has a double benefit. The first one is that it does not use the stack as heavily as a non tail-recursive call and thus does not blow it on larger recursions. The second, I think is that consequently it's faster since it can be converted to a loop. Now, I've made a very rough benchmark and have not seen any difference between the two implementations although. Am I wrong in my second assumption or does this have something to do with running on the JVM (which does not have automatic TCO) and recur using a trick to achieve it? Thank you.

    Read the article

  • Javascriptlibrary more efficient than Rickshaw for realtime visualizations

    - by dan kutz
    I want to visualize data as time-series graphs on mobile devices(tablets) and therefore stumbled upon rickshaw, which is based on D3. First I must say I was a little bit confused when I realized that realtime in web design is defined totally different to realtime in engineering which has fixed(and often very short) timeframes. Anyway my aim is to visualize the data as fast as possible, and on older tablets visualization with rickshaw is quite slow. Can anybody recommend another library, which may be more efficient in rendering? Or is there no way out and I have to go native? regards Dan.

    Read the article

  • "Anagram solver" based on statistics rather than a dictionary/table?

    - by James M.
    My problem is conceptually similar to solving anagrams, except I can't just use a dictionary lookup. I am trying to find plausible words rather than real words. I have created an N-gram model (for now, N=2) based on the letters in a bunch of text. Now, given a random sequence of letters, I would like to permute them into the most likely sequence according to the transition probabilities. I thought I would need the Viterbi algorithm when I started this, but as I look deeper, the Viterbi algorithm optimizes a sequence of hidden random variables based on the observed output. I am trying to optimize the output sequence. Is there a well-known algorithm for this that I can read about? Or am I on the right track with Viterbi and I'm just not seeing how to apply it?

    Read the article

  • most efficient method of turning multiple 1D arrays into columns of a 2D array

    - by Ty W
    As I was writing a for loop earlier today, I thought that there must be a neater way of doing this... so I figured I'd ask. I looked briefly for a duplicate question but didn't see anything obvious. The Problem: Given N arrays of length M, turn them into a M-row by N-column 2D array Example: $id = [1,5,2,8,6] $name = [a,b,c,d,e] $result = [[1,a], [5,b], [2,c], [8,d], [6,e]] My Solution: Pretty straight forward and probably not optimal, but it does work: <?php // $row is returned from a DB query // $row['<var>'] is a comma separated string of values $categories = array(); $ids = explode(",", $row['ids']); $names = explode(",", $row['names']); $titles = explode(",", $row['titles']); for($i = 0; $i < count($ids); $i++) { $categories[] = array("id" => $ids[$i], "name" => $names[$i], "title" => $titles[$i]); } ?> note: I didn't put the name = value bit in the spec, but it'd be awesome if there was some way to keep that as well.

    Read the article

  • unroll nested for loops in C++

    - by Hristo
    How would I unroll the following nested loops? for(k = begin; k != end; ++k) { for(j = 0; j < Emax; ++j) { for(i = 0; i < N; ++i) { if (j >= E[i]) continue; array[k] += foo(i, tr[k][i], ex[j][i]); } } } I tried the following, but my output isn't the same, and it should be: for(k = begin; k != end; ++k) { for(j = 0; j < Emax; ++j) { for(i = 0; i+4 < N; i+=4) { if (j >= E[i]) continue; array[k] += foo(i, tr[k][i], ex[j][i]); array[k] += foo(i+1, tr[k][i+1], ex[j][i+1]); array[k] += foo(i+2, tr[k][i+2], ex[j][i+2]); array[k] += foo(i+3, tr[k][i+3], ex[j][i+3]); } if (i < N) { for (; i < N; ++i) { if (j >= E[i]) continue; array[k] += foo(i, tr[k][i], ex[j][i]); } } } } I will be running this code in parallel using Intel's TBB so that it takes advantage of multiple cores. After this is finished running, another function prints out what is in array[] and right now, with my unrolling, the output isn't identical. Any help is appreciated. Thanks, Hristo

    Read the article

  • How to insert zeros between bits in a bitmap?

    - by anatolyg
    I have some performance-heavy code that performs bit manipulations. It can be reduced to the following well-defined problem: Given a 13-bit bitmap, construct a 26-bit bitmap that contains the original bits spaced at even positions. To illustrate: 0000000000000000000abcdefghijklm (input, 32 bits) 0000000a0b0c0d0e0f0g0h0i0j0k0l0m (output, 32 bits) I currently have it implemented in the following way in C: if (input & (1 << 12)) output |= 1 << 24; if (input & (1 << 11)) output |= 1 << 22; if (input & (1 << 10)) output |= 1 << 20; ... My compiler (MS Visual Studio) turned this into the following: test eax,1000h jne 0064F5EC or edx,1000000h ... (repeated 13 times with minor differences in constants) I wonder whether i can make it any faster. I would like to have my code written in C, but switching to assembly language is possible. Can i use some MMX/SSE instructions to process all bits at once? Maybe i can use multiplication? (multiply by 0x11111111 or some other magical constant) Would it be better to use condition-set instruction (SETcc) instead of conditional-jump instruction? If yes, how can i make the compiler produce such code for me? Any other idea how to make it faster? Any idea how to do the inverse bitmap transformation (i have to implement it too, bit it's less critical)?

    Read the article

  • approximating log10[x^k0 + k1]

    - by Yale Zhang
    Greetings. I'm trying to approximate the function Log10[x^k0 + k1], where .21 < k0 < 21, 0 < k1 < ~2000, and x is integer < 2^14. k0 & k1 are constant. For practical purposes, you can assume k0 = 2.12, k1 = 2660. The desired accuracy is 5*10^-4 relative error. This function is virtually identical to Log[x], except near 0, where it differs a lot. I already have came up with a SIMD implementation that is ~1.15x faster than a simple lookup table, but would like to improve it if possible, which I think is very hard due to lack of efficient instructions. My SIMD implementation uses 16bit fixed point arithmetic to evaluate a 3rd degree polynomial (I use least squares fit). The polynomial uses different coefficients for different input ranges. There are 8 ranges, and range i spans (64)2^i to (64)2^(i + 1). The rational behind this is the derivatives of Log[x] drop rapidly with x, meaning a polynomial will fit it more accurately since polynomials are an exact fit for functions that have a derivative of 0 beyond a certain order. SIMD table lookups are done very efficiently with a single _mm_shuffle_epi8(). I use SSE's float to int conversion to get the exponent and significand used for the fixed point approximation. I also software pipelined the loop to get ~1.25x speedup, so further code optimizations are probably unlikely. What I'm asking is if there's a more efficient approximation at a higher level? For example: Can this function be decomposed into functions with a limited domain like log2((2^x) * significand) = x + log2(significand) hence eliminating the need to deal with different ranges (table lookups). The main problem I think is adding the k1 term kills all those nice log properties that we know and love, making it not possible. Or is it? Iterative method? don't think so because the Newton method for log[x] is already a complicated expression Exploiting locality of neighboring pixels? - if the range of the 8 inputs fall in the same approximation range, then I can look up a single coefficient, instead of looking up separate coefficients for each element. Thus, I can use this as a fast common case, and use a slower, general code path when it isn't. But for my data, the range needs to be ~2000 before this property hold 70% of the time, which doesn't seem to make this method competitive. Please, give me some opinion, especially if you're an applied mathematician, even if you say it can't be done. Thanks.

    Read the article

< Previous Page | 56 57 58 59 60 61 62 63 64 65 66 67  | Next Page >