Search Results

Search found 14719 results on 589 pages for 'optimization level'.

Page 92/589 | < Previous Page | 88 89 90 91 92 93 94 95 96 97 98 99 | Next Page >

Queries within queries: Is there a better way?

- by mririgo

As I build bigger, more advanced web applications, I'm finding myself writing extremely long and complex queries. I tend to write queries within queries a lot because I feel making one call to the database from PHP is better than making several and correlating the data. However, anyone who knows anything about SQL knows about JOINs. Personally, I've used a JOIN or two before, but quickly stopped when I discovered using subqueries because it felt easier and quicker for me to write and maintain. Commonly, I'll do subqueries that may contain one or more subqueries from relative tables. Consider this example: SELECT (SELECT username FROM users WHERE records.user_id = user_id) AS username, (SELECT last_name||', '||first_name FROM users WHERE records.user_id = user_id) AS name, in_timestamp, out_timestamp FROM records ORDER BY in_timestamp Rarely, I'll do subqueries after the WHERE clause. Consider this example: SELECT user_id, (SELECT name FROM organizations WHERE (SELECT organization FROM locations WHERE records.location = location_id) = organization_id) AS organization_name FROM records ORDER BY in_timestamp In these two cases, would I see any sort of improvement if I decided to rewrite the queries using a JOIN? As more of a blanket question, what are the advantages/disadvantages of using subqueries or a JOIN? Is one way more correct or accepted than the other?

Read the article
MySQL subqueries

- by swamprunner7

Can we do this query without subqueries? SELECT login, post_n, (SELECT SUM(vote) FROM votes WHERE votes.post_n=posts.post_n)AS votes, (SELECT COUNT(comments.post_n) FROM comments WHERE comments.post_n=posts.post_n)AS comments_count FROM users, posts WHERE posts.id=users.id AND (visibility=2 OR visibility=3) ORDER BY date DESC LIMIT 0, 15 tables: Users: id, login Posts: post_n, id, visibility Votes: post_n, vote id — it`s user id, Users the main table.

Read the article
Meassure website

- by s0mmer

Hi, I was wondering if it is possible to install or use any online service to measure your website's performance? I've seen many just checking the download speed of images, external files etc. But is it possible to meassure how long asp/php code takes to execute? I have a site running a bit slowly, and it would be very nice with some app/service guiding where to optimize.

Read the article
help me improve my sse yuv to rgb ssse3 code

- by David McPaul

Hello, I am looking to optimise some sse code I wrote for converting yuv to rgb (both planar and packed yuv functions). i am using SSSE3 at the moment but if there are useful functions from later sse versions thats ok. I am mainly interested in how I would work out processor stalls and the like. Anyone know of any tools that do static analysis of sse code? ; ; Copyright (C) 2009-2010 David McPaul ; ; All rights reserved. Distributed under the terms of the MIT License. ; ; A rather unoptimised set of ssse3 yuv to rgb converters ; does 8 pixels per loop ; inputer: ; reads 128 bits of yuv 8 bit data and puts ; the y values converted to 16 bit in xmm0 ; the u values converted to 16 bit and duplicated into xmm1 ; the v values converted to 16 bit and duplicated into xmm2 ; conversion: ; does the yuv to rgb conversion using 16 bit integer and the ; results are placed into the following registers as 8 bit clamped values ; r values in xmm3 ; g values in xmm4 ; b values in xmm5 ; outputer: ; writes out the rgba pixels as 8 bit values with 0 for alpha ; xmm6 used for scratch ; xmm7 used for scratch %macro cglobal 1 global _%1 %define %1 _%1 align 16 %1: %endmacro ; conversion code %macro yuv2rgbsse2 0 ; u = u - 128 ; v = v - 128 ; r = y + v + v >> 2 + v >> 3 + v >> 5 ; g = y - (u >> 2 + u >> 4 + u >> 5) - (v >> 1 + v >> 3 + v >> 4 + v >> 5) ; b = y + u + u >> 1 + u >> 2 + u >> 6 ; subtract 16 from y movdqa xmm7, [Const16] ; loads a constant using data cache (slower on first fetch but then cached) psubsw xmm0,xmm7 ; y = y - 16 ; subtract 128 from u and v movdqa xmm7, [Const128] ; loads a constant using data cache (slower on first fetch but then cached) psubsw xmm1,xmm7 ; u = u - 128 psubsw xmm2,xmm7 ; v = v - 128 ; load r,b with y movdqa xmm3,xmm0 ; r = y pshufd xmm5,xmm0, 0xE4 ; b = y ; r = y + v + v >> 2 + v >> 3 + v >> 5 paddsw xmm3, xmm2 ; add v to r movdqa xmm7, xmm1 ; move u to scratch pshufd xmm6, xmm2, 0xE4 ; move v to scratch psraw xmm6,2 ; divide v by 4 paddsw xmm3, xmm6 ; and add to r psraw xmm6,1 ; divide v by 2 paddsw xmm3, xmm6 ; and add to r psraw xmm6,2 ; divide v by 4 paddsw xmm3, xmm6 ; and add to r ; b = y + u + u >> 1 + u >> 2 + u >> 6 paddsw xmm5, xmm1 ; add u to b psraw xmm7,1 ; divide u by 2 paddsw xmm5, xmm7 ; and add to b psraw xmm7,1 ; divide u by 2 paddsw xmm5, xmm7 ; and add to b psraw xmm7,4 ; divide u by 32 paddsw xmm5, xmm7 ; and add to b ; g = y - u >> 2 - u >> 4 - u >> 5 - v >> 1 - v >> 3 - v >> 4 - v >> 5 movdqa xmm7,xmm2 ; move v to scratch pshufd xmm6,xmm1, 0xE4 ; move u to scratch movdqa xmm4,xmm0 ; g = y psraw xmm6,2 ; divide u by 4 psubsw xmm4,xmm6 ; subtract from g psraw xmm6,2 ; divide u by 4 psubsw xmm4,xmm6 ; subtract from g psraw xmm6,1 ; divide u by 2 psubsw xmm4,xmm6 ; subtract from g psraw xmm7,1 ; divide v by 2 psubsw xmm4,xmm7 ; subtract from g psraw xmm7,2 ; divide v by 4 psubsw xmm4,xmm7 ; subtract from g psraw xmm7,1 ; divide v by 2 psubsw xmm4,xmm7 ; subtract from g psraw xmm7,1 ; divide v by 2 psubsw xmm4,xmm7 ; subtract from g %endmacro ; outputer %macro rgba32sse2output 0 ; clamp values pxor xmm7,xmm7 packuswb xmm3,xmm7 ; clamp to 0,255 and pack R to 8 bit per pixel packuswb xmm4,xmm7 ; clamp to 0,255 and pack G to 8 bit per pixel packuswb xmm5,xmm7 ; clamp to 0,255 and pack B to 8 bit per pixel ; convert to bgra32 packed punpcklbw xmm5,xmm4 ; bgbgbgbgbgbgbgbg movdqa xmm0, xmm5 ; save bg values punpcklbw xmm3,xmm7 ; r0r0r0r0r0r0r0r0 punpcklwd xmm5,xmm3 ; lower half bgr0bgr0bgr0bgr0 punpckhwd xmm0,xmm3 ; upper half bgr0bgr0bgr0bgr0 ; write to output ptr movntdq [edi], xmm5 ; output first 4 pixels bypassing cache movntdq [edi+16], xmm0 ; output second 4 pixels bypassing cache %endmacro SECTION .data align=16 Const16 dw 16 dw 16 dw 16 dw 16 dw 16 dw 16 dw 16 dw 16 Const128 dw 128 dw 128 dw 128 dw 128 dw 128 dw 128 dw 128 dw 128 UMask db 0x01 db 0x80 db 0x01 db 0x80 db 0x05 db 0x80 db 0x05 db 0x80 db 0x09 db 0x80 db 0x09 db 0x80 db 0x0d db 0x80 db 0x0d db 0x80 VMask db 0x03 db 0x80 db 0x03 db 0x80 db 0x07 db 0x80 db 0x07 db 0x80 db 0x0b db 0x80 db 0x0b db 0x80 db 0x0f db 0x80 db 0x0f db 0x80 YMask db 0x00 db 0x80 db 0x02 db 0x80 db 0x04 db 0x80 db 0x06 db 0x80 db 0x08 db 0x80 db 0x0a db 0x80 db 0x0c db 0x80 db 0x0e db 0x80 ; void Convert_YUV422_RGBA32_SSSE3(void *fromPtr, void *toPtr, int width) width equ ebp+16 toPtr equ ebp+12 fromPtr equ ebp+8 ; void Convert_YUV420P_RGBA32_SSSE3(void *fromYPtr, void *fromUPtr, void *fromVPtr, void *toPtr, int width) width1 equ ebp+24 toPtr1 equ ebp+20 fromVPtr equ ebp+16 fromUPtr equ ebp+12 fromYPtr equ ebp+8 SECTION .text align=16 cglobal Convert_YUV422_RGBA32_SSSE3 ; reserve variables push ebp mov ebp, esp push edi push esi push ecx mov esi, [fromPtr] mov edi, [toPtr] mov ecx, [width] ; loop width / 8 times shr ecx,3 test ecx,ecx jng ENDLOOP REPEATLOOP: ; loop over width / 8 ; YUV422 packed inputer movdqa xmm0, [esi] ; should have yuyv yuyv yuyv yuyv pshufd xmm1, xmm0, 0xE4 ; copy to xmm1 movdqa xmm2, xmm0 ; copy to xmm2 ; extract both y giving y0y0 pshufb xmm0, [YMask] ; extract u and duplicate so each u in yuyv becomes u0u0 pshufb xmm1, [UMask] ; extract v and duplicate so each v in yuyv becomes v0v0 pshufb xmm2, [VMask] yuv2rgbsse2 rgba32sse2output ; endloop add edi,32 add esi,16 sub ecx, 1 ; apparently sub is better than dec jnz REPEATLOOP ENDLOOP: ; Cleanup pop ecx pop esi pop edi mov esp, ebp pop ebp ret cglobal Convert_YUV420P_RGBA32_SSSE3 ; reserve variables push ebp mov ebp, esp push edi push esi push ecx push eax push ebx mov esi, [fromYPtr] mov eax, [fromUPtr] mov ebx, [fromVPtr] mov edi, [toPtr1] mov ecx, [width1] ; loop width / 8 times shr ecx,3 test ecx,ecx jng ENDLOOP1 REPEATLOOP1: ; loop over width / 8 ; YUV420 Planar inputer movq xmm0, [esi] ; fetch 8 y values (8 bit) yyyyyyyy00000000 movd xmm1, [eax] ; fetch 4 u values (8 bit) uuuu000000000000 movd xmm2, [ebx] ; fetch 4 v values (8 bit) vvvv000000000000 ; extract y pxor xmm7,xmm7 ; 00000000000000000000000000000000 punpcklbw xmm0,xmm7 ; interleave xmm7 into xmm0 y0y0y0y0y0y0y0y0 ; extract u and duplicate so each becomes 0u0u punpcklbw xmm1,xmm7 ; interleave xmm7 into xmm1 u0u0u0u000000000 punpcklwd xmm1,xmm7 ; interleave again u000u000u000u000 pshuflw xmm1,xmm1, 0xA0 ; copy u values pshufhw xmm1,xmm1, 0xA0 ; to get u0u0 ; extract v punpcklbw xmm2,xmm7 ; interleave xmm7 into xmm1 v0v0v0v000000000 punpcklwd xmm2,xmm7 ; interleave again v000v000v000v000 pshuflw xmm2,xmm2, 0xA0 ; copy v values pshufhw xmm2,xmm2, 0xA0 ; to get v0v0 yuv2rgbsse2 rgba32sse2output ; endloop add edi,32 add esi,8 add eax,4 add ebx,4 sub ecx, 1 ; apparently sub is better than dec jnz REPEATLOOP1 ENDLOOP1: ; Cleanup pop ebx pop eax pop ecx pop esi pop edi mov esp, ebp pop ebp ret SECTION .note.GNU-stack noalloc noexec nowrite progbits

Read the article
Difference between Logarithmic and Uniform cost criteria

- by Marthin

I'v got some problem to understand the difference between Logarithmic(Lcc) and Uniform(Ucc) cost criteria and also how to use it in calculations. Could someone please explain the difference between the two and perhaps show how to calculate the complexity for a problem like A+B*C (Yes this is part of an assignment =) ) Thx for any help! /Marthin

Read the article
Rewriting a for loop in pure NumPy to decrease execution time

- by Statto

I recently asked about trying to optimise a Python loop for a scientific application, and received an excellent, smart way of recoding it within NumPy which reduced execution time by a factor of around 100 for me! However, calculation of the B value is actually nested within a few other loops, because it is evaluated at a regular grid of positions. Is there a similarly smart NumPy rewrite to shave time off this procedure? I suspect the performance gain for this part would be less marked, and the disadvantages would presumably be that it would not be possible to report back to the user on the progress of the calculation, that the results could not be written to the output file until the end of the calculation, and possibly that doing this in one enormous step would have memory implications? Is it possible to circumvent any of these? import numpy as np import time def reshape_vector(v): b = np.empty((3,1)) for i in range(3): b[i][0] = v[i] return b def unit_vectors(r): return r / np.sqrt((r*r).sum(0)) def calculate_dipole(mu, r_i, mom_i): relative = mu - r_i r_unit = unit_vectors(relative) A = 1e-7 num = A*(3*np.sum(mom_i*r_unit, 0)*r_unit - mom_i) den = np.sqrt(np.sum(relative*relative, 0))**3 B = np.sum(num/den, 1) return B N = 20000 # number of dipoles r_i = np.random.random((3,N)) # positions of dipoles mom_i = np.random.random((3,N)) # moments of dipoles a = np.random.random((3,3)) # three basis vectors for this crystal n = [10,10,10] # points at which to evaluate sum gamma_mu = 135.5 # a constant t_start = time.clock() for i in range(n[0]): r_frac_x = np.float(i)/np.float(n[0]) r_test_x = r_frac_x * a[0] for j in range(n[1]): r_frac_y = np.float(j)/np.float(n[1]) r_test_y = r_frac_y * a[1] for k in range(n[2]): r_frac_z = np.float(k)/np.float(n[2]) r_test = r_test_x +r_test_y + r_frac_z * a[2] r_test_fast = reshape_vector(r_test) B = calculate_dipole(r_test_fast, r_i, mom_i) omega = gamma_mu*np.sqrt(np.dot(B,B)) # write r_test, B and omega to a file frac_done = np.float(i+1)/(n[0]+1) t_elapsed = (time.clock()-t_start) t_remain = (1-frac_done)*t_elapsed/frac_done print frac_done*100,'% done in',t_elapsed/60.,'minutes...approximately',t_remain/60.,'minutes remaining'

Read the article
AppFabric caching's local cache isnt working for us... What are we doing wrong?

- by Olly

We are using appfabric as the 2ndlevel cache for an NHibernate asp.net application comprising a customer facing website and an admin website. They are both connected to the same cache so when admin updates something, the customer facing site is updated. It seems to be working OK - we have a CacheCLuster on a seperate server and all is well but we want to enable localcache to get better performance, however, it dosnt seem to be working. We have enabled it like this... bool UseLocalCache = int LocalCacheObjectCount = int.MaxValue; TimeSpan LocalCacheDefaultTimeout = TimeSpan.FromMinutes(3); DataCacheLocalCacheInvalidationPolicy LocalCacheInvalidationPolicy = DataCacheLocalCacheInvalidationPolicy.TimeoutBased; if (UseLocalCache) { configuration.LocalCacheProperties = new DataCacheLocalCacheProperties( LocalCacheObjectCount, LocalCacheDefaultTimeout, LocalCacheInvalidationPolicy ); // configuration.NotificationProperties = new DataCacheNotificationProperties(500, TimeSpan.FromSeconds(300)); } Initially we tried using a timeout invalidation policy (3mins) and our app felt like it was running faster. HOWEVER, we noticed that if we changed something in the admin site, it was immediatley updated in the live site. As we are using timeouts not notifications, this demonstrates that the local cache isnt being queried (or is, but is always missing). The cache.GetType().Name returns "LocalCache" - so the factory has made a local cache. Running "Get-Cache-Statistics MyCache" in PS on my dev environment (asp.net app running local from vs2008, cache cluster running on a seperate w2k8 machine) show a handful of Request Counts. However, on the Production environment, the Request Count increases dramaticaly. We tried following the method here to se the cache cliebt-server traffic... http://blogs.msdn.com/b/appfabriccat/archive/2010/09/20/appfabric-cache-peeking-into-client-amp-server-wcf-communication.aspx but the log file had nothing but the initial header in it - i.e no loggin either. I cant find anything in SO or Google. Have we done something wrong? Have we got a screwy install of AppFabric - we installed it via WebPlatform Installer - I think? (note: the IIS box running ASp.net isnt in yhe cluster - it is just the client). Any insights greatfully received!

Read the article
How to optimize an database suggestion engine

- by Dimitar Vouldjeff

Hi, I`m making an online engine for item-to-item recommending movies. I have made some researches and I think that the best way to implement that is using pearson correlation and make a table with item1, item2 and correlation fields, but the problem is that after each rate of item I have to regenerate the correlation for in the worst case N records (where N is the number of items). Another think that I read is the following article, but I haven`t thought a way to implement it. So what is your suggestion to optimize this process? Or any other suggestions? Thanks.

Read the article
Javascriptlibrary more efficient than Rickshaw for realtime visualizations

- by dan kutz

I want to visualize data as time-series graphs on mobile devices(tablets) and therefore stumbled upon rickshaw, which is based on D3. First I must say I was a little bit confused when I realized that realtime in web design is defined totally different to realtime in engineering which has fixed(and often very short) timeframes. Anyway my aim is to visualize the data as fast as possible, and on older tablets visualization with rickshaw is quite slow. Can anybody recommend another library, which may be more efficient in rendering? Or is there no way out and I have to go native? regards Dan.

Read the article
"variable tracking" is eating my compile time!

- by wowus

I have an auto-generated file which looks something like this... static void do_SomeFunc1(void* parameter) { // Do stuff. } // Continues on for another 4000 functions... void dispatch(int id, void* parameter) { switch(id) { case ::SomeClass1::id: return do_SomeFunc1(parameter); case ::SomeClass2::id: return do_SomeFunc2(parameter); // This continues for the next 4000 cases... } } When I build it like this, the build time is enormous. If I inline all the functions automagically into their respective cases using my script, the build time is cut in half. GCC 4.5.0 says ~50% of the build time is being taken up by "variable tracking" when I use -ftime-report. What does this mean and how can I speed compilation while still maintaining the superior cache locality of pulling out the functions from the switch? EDIT: Interestingly enough, the build time has exploded only on debug builds, as per the following profiling information of the whole project (which isn't just the file in question, but still a good metric; the file in question takes the most time to build): Debug: 8 minutes 50 seconds Release: 4 minutes, 25 seconds

Read the article
Data Access from single table in sql server 2005 is too slow

- by Muhammad Kashif Nadeem

Following is the script of table. Accessing data from this table is too slow. SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO CREATE TABLE [dbo].[Emails]( [id] [int] IDENTITY(1,1) NOT NULL, [datecreated] [datetime] NULL CONSTRAINT [DF_Emails_datecreated] DEFAULT (getdate()), [UID] [nvarchar](250) COLLATE Latin1_General_CI_AS NULL, [From] [nvarchar](100) COLLATE Latin1_General_CI_AS NULL, [To] [nvarchar](100) COLLATE Latin1_General_CI_AS NULL, [Subject] [nvarchar](max) COLLATE Latin1_General_CI_AS NULL, [Body] [nvarchar](max) COLLATE Latin1_General_CI_AS NULL, [HTML] [nvarchar](max) COLLATE Latin1_General_CI_AS NULL, [AttachmentCount] [int] NULL, [Dated] [datetime] NULL ) ON [PRIMARY] Following query takes 50 seconds to fetch data. select id, datecreated, UID, [From], [To], Subject, AttachmentCount, Dated from emails If I include Body and Html in select then time is event worse. indexes are on: id unique clustered From Non unique non clustered To Non unique non clustered Tabls has currently 180000+ records. There might be 100,000 records each month so this will become more slow as time will pass. Does splitting data into two table will solve the problem? What other indexes should be there?

Read the article
most efficient method of turning multiple 1D arrays into columns of a 2D array

- by Ty W

As I was writing a for loop earlier today, I thought that there must be a neater way of doing this... so I figured I'd ask. I looked briefly for a duplicate question but didn't see anything obvious. The Problem: Given N arrays of length M, turn them into a M-row by N-column 2D array Example: $id = [1,5,2,8,6] $name = [a,b,c,d,e] $result = [[1,a], [5,b], [2,c], [8,d], [6,e]] My Solution: Pretty straight forward and probably not optimal, but it does work: <?php // $row is returned from a DB query // $row['<var>'] is a comma separated string of values $categories = array(); $ids = explode(",", $row['ids']); $names = explode(",", $row['names']); $titles = explode(",", $row['titles']); for($i = 0; $i < count($ids); $i++) { $categories[] = array("id" => $ids[$i], "name" => $names[$i], "title" => $titles[$i]); } ?> note: I didn't put the name = value bit in the spec, but it'd be awesome if there was some way to keep that as well.

Read the article
Database structure, optimizing table with lots of rows

- by aepheus

I have a table with a lot of rows. The structure is something like this: UserID | ItemID | Item Data... Would I see any gains in query time if I separated this into a table per user, or per smaller group of users? Queries are always single user getting/modifying a selection of items.

Read the article
Is this implementation truely tail-recursive?

- by CFP

Hello everyone! I've come up with the following code to compute in a tail-recursive way the result of an expression such as 3 4 * 1 + cos 8 * (aka 8*cos(1+(3*4))) The code is in OCaml. I'm using a list refto emulate a stack. type token = Num of float | Fun of (float->float) | Op of (float->float->float);; let pop l = let top = (List.hd !l) in l := List.tl (!l); top;; let push x l = l := (x::!l);; let empty l = (l = []);; let pile = ref [];; let eval data = let stack = ref data in let rec _eval cont = match (pop stack) with | Num(n) -> cont n; | Fun(f) -> _eval (fun x -> cont (f x)); | Op(op) -> _eval (fun x -> cont (op x (_eval (fun y->y)))); in _eval (fun x->x) ;; eval [Fun(fun x -> x**2.); Op(fun x y -> x+.y); Num(1.); Num(3.)];; I've used continuations to ensure tail-recursion, but since my stack implements some sort of a tree, and therefore provides quite a bad interface to what should be handled as a disjoint union type, the call to my function to evaluate the left branch with an identity continuation somehow irks a little. Yet it's working perfectly, but I have the feeling than in calling the _eval (fun y->y) bit, there must be something wrong happening, since it doesn't seem that this call can replace the previous one in the stack structure... Am I misunderstanding something here? I mean, I understand that with only the first call to _eval there wouldn't be any problem optimizing the calls, but here it seems to me that evaluation the _eval (fun y->y) will require to be stacked up, and therefore will fill the stack, possibly leading to an overflow... Thanks!

Read the article
when is java faster than c++ (or when is JIT faster then precompiled)?

- by kostja

I have heard that under certain circumstances, Java programs or rather parts of java programs are able to be executed faster than the "same" code in C++ (or other precompiled code) due to JIT optimizations. This is due to the compiler being able to determine the scope of some variables, avoid some conditionals and pull similar tricks at runtime. Could you give an (or better - some) example, where this applies? And maybe outline the exact conditions under which the compiler is able to optimize the bytecode beyond what is possible with precompiled code? NOTE : This question is not about comparing Java to C++. Its about the possibilities of JIT compiling. Please no flaming. I am also not aware of any duplicates. Please point them out if you are.

Read the article
Faster way to split a string and count characters using R?

- by chrisamiller

I'm looking for a faster way to calculate GC content for DNA strings read in from a FASTA file. This boils down to taking a string and counting the number of times that the letter 'G' or 'C' appears. I also want to specify the range of characters to consider. I have a working function that is fairly slow, and it's causing a bottleneck in my code. It looks like this: ## ## count the number of GCs in the characters between start and stop ## gcCount <- function(line, st, sp){ chars = strsplit(as.character(line),"")[[1]] numGC = 0 for(j in st:sp){ ##nested ifs faster than an OR (|) construction if(chars[[j]] == "g"){ numGC <- numGC + 1 }else if(chars[[j]] == "G"){ numGC <- numGC + 1 }else if(chars[[j]] == "c"){ numGC <- numGC + 1 }else if(chars[[j]] == "C"){ numGC <- numGC + 1 } } return(numGC) } Running Rprof gives me the following output: > a = "GCCCAAAATTTTCCGGatttaagcagacataaattcgagg" > Rprof(filename="Rprof.out") > for(i in 1:500000){gcCount(a,1,40)}; > Rprof(NULL) > summaryRprof(filename="Rprof.out") self.time self.pct total.time total.pct "gcCount" 77.36 76.8 100.74 100.0 "==" 18.30 18.2 18.30 18.2 "strsplit" 3.58 3.6 3.64 3.6 "+" 1.14 1.1 1.14 1.1 ":" 0.30 0.3 0.30 0.3 "as.logical" 0.04 0.0 0.04 0.0 "as.character" 0.02 0.0 0.02 0.0 $by.total total.time total.pct self.time self.pct "gcCount" 100.74 100.0 77.36 76.8 "==" 18.30 18.2 18.30 18.2 "strsplit" 3.64 3.6 3.58 3.6 "+" 1.14 1.1 1.14 1.1 ":" 0.30 0.3 0.30 0.3 "as.logical" 0.04 0.0 0.04 0.0 "as.character" 0.02 0.0 0.02 0.0 $sampling.time [1] 100.74 Any advice for making this code faster?

Read the article
Does the order of columns in a query matter?

- by James Simpson

When selecting columns from a MySQL table, is performance affected by the order that you select the columns as compared to their order in the table (not considering indexes that may cover the columns)? For example, you have a table with rows uid, name, bday, and you have the following query. SELECT uid, name, bday FROM table Does MySQL see the following query any differently and thus cause any sort of performance hit? SELECT uid, bday, name FROM table

Read the article
Filtering with joined tables

- by viraptor

I'm trying to get some query performance improved, but the generated query does not look the way I expect it to. The results are retrieved using: query = session.query(SomeModel). options(joinedload_all('foo.bar')). options(joinedload_all('foo.baz')). options(joinedload('quux.other')) What I want to do is filter on the table joined via 'first', but this way doesn't work: query = query.filter(FooModel.address == '1.2.3.4') It results in a clause like this attached to the query: WHERE foos.address = '1.2.3.4' Which doesn't do the filtering in a proper way, since the generated joins attach tables foos_1 and foos_2. If I try that query manually but change the filtering clause to: WHERE foos_1.address = '1.2.3.4' AND foos_2.address = '1.2.3.4' It works fine. The question is of course - how can I achieve this with sqlalchemy itself?

Read the article
Script Speed vs Memory Usage

- by Doug Neiner

I am working on an image generation script in PHP and have gotten it working two ways. One way is slow but uses a limited amount of memory, the second is much faster, but uses 6x the memory . There is no leakage in either script (as far as I can tell). In a limited benchmark, here is how they performed: -------------------------------------------- METHOD | TOTAL TIME | PEAK MEMORY | IMAGES -------------------------------------------- One | 65.626 | 540,036 | 200 Two | 20.207 | 3,269,600 | 200 -------------------------------------------- And here is the average of the previous numbers (if you don't want to do your own math): -------------------------------------------- METHOD | TOTAL TIME | PEAK MEMORY | IMAGES -------------------------------------------- One | 0.328 | 540,036 | 1 Two | 0.101 | 3,269,600 | 1 -------------------------------------------- Which method should I use and why? I anticipate this being used by a high volume of users, with each user making 10-20 requests to this script during a normal visit. I am leaning toward the faster method because though it uses more memory, it is for a 1/3 of the time and would reduce the number of concurrent requests.

Read the article
Optimizing Vector elements swaps using CUDA

- by Orion Nebula

Hi all, Since I am new to cuda .. I need your kind help I have this long vector, for each group of 24 elements, I need to do the following: for the first 12 elements, the even numbered elements are multiplied by -1, for the second 12 elements, the odd numbered elements are multiplied by -1 then the following swap takes place: Graph: because I don't yet have enough points, I couldn't post the image so here it is: http://www.freeimagehosting.net/image.php?e4b88fb666.png I have written this piece of code, and wonder if you could help me further optimize it to solve for divergence or bank conflicts .. //subvector is a multiple of 24, Mds and Nds are shared memory _shared_ double Mds[subVector]; _shared_ double Nds[subVector]; int tx = threadIdx.x; int tx_mod = tx ^ 0x0001; int basex = __umul24(blockDim.x, blockIdx.x); Mds[tx] = M.elements[basex + tx]; __syncthreads(); // flip the signs if (tx < (tx/24)*24 + 12) { //if < 12 and even if ((tx & 0x0001)==0) Mds[tx] = -Mds[tx]; } else if (tx < (tx/24)*24 + 24) { //if >12 and < 24 and odd if ((tx & 0x0001)==1) Mds[tx] = -Mds[tx]; } __syncthreads(); if (tx < (tx/24)*24 + 6) { //for the first 6 elements .. swap with last six in the 24elements group (see graph) Nds[tx] = Mds[tx_mod + 18]; Mds [tx_mod + 18] = Mds [tx]; Mds[tx] = Nds[tx]; } else if (tx < (tx/24)*24 + 12) { // for the second 6 elements .. swp with next adjacent group (see graph) Nds[tx] = Mds[tx_mod + 6]; Mds [tx_mod + 6] = Mds [tx]; Mds[tx] = Nds[tx]; } __syncthreads(); Thanks in advance ..

Read the article
Oracle Sql Query taking a day long to return results using dblink

- by Suresh S

Guys i have the following oracle sql query that gives me the monthwise report between the dates. Basically for nov month i want sum of values between the dates 01nov to 30 nov. The table tha is being queried is residing in another database and accesssed using dblink. The DT columns is of NUMBER type (for ex 20101201) .The execution of the query is taking a day long and not completed. kindly suggest me , if their is any optimisation that can be suggested to my DBA on the dblink, or any tuning that can be done on the query , or rewriting the same. SELECT /*+ PARALLEL (A 8) */ TO_CHAR(TRUNC(TRUNC(SYSDATE,'MM')- 1,'MM'),'MONYYYY') "MONTH", TYPE AS "TYPE", COLUMN, COUNT (DISTINCT A) AS "A_COUNT", COUNT (COLUMN) AS NO_OF_COLS, SUM (DURATION) AS "SUM_DURATION", SUM (COST) AS "COST" FROM **A@LN_PROD A** WHERE DT >=TO_NUMBER(TO_CHAR(TRUNC(TRUNC(SYSDATE,'MM')-1,'MM'),'YYYYMMDD')) AND DT < TO_NUMBER(TO_CHAR(TRUNC(TRUNC(SYSDATE,'MM'),'MM'),'YYYYMMDD')) GROUP BY TYPE, COLUMN

Read the article
Splitting tables by field to optimize MySQL?

- by AK

Do splitting fields into multiple tables ever yield faster queries? Consider the following two scenarios: Table1 ----------- int PersonID text Value1 float Value2 or Table1 ----------- int PersonID text Value1 Table2 ----------- int PersonID float Value2 If Value1 and Value2 are always being displayed together, I imagine Table1 is always faster because the second schema would require two SELECT statements. But are there any situations where you would choose the second? If the number of records were expected to be really large?

Read the article
Optimizing an embedded SELECT query in mySQL

- by Crazy Serb

Ok, here's a query that I am running right now on a table that has 45,000 records and is 65MB in size... and is just about to get bigger and bigger (so I gotta think of the future performance as well here): SELECT count(payment_id) as signup_count, sum(amount) as signup_amount FROM payments p WHERE tm_completed BETWEEN '2009-05-01' AND '2009-05-30' AND completed > 0 AND tm_completed IS NOT NULL AND member_id NOT IN (SELECT p2.member_id FROM payments p2 WHERE p2.completed=1 AND p2.tm_completed < '2009-05-01' AND p2.tm_completed IS NOT NULL GROUP BY p2.member_id) And as you might or might not imagine - it chokes the mysql server to a standstill... What it does is - it simply pulls the number of new users who signed up, have at least one "completed" payment, tm_completed is not empty (as it is only populated for completed payments), and (the embedded Select) that member has never had a "completed" payment before - meaning he's a new member (just because the system does rebills and whatnot, and this is the only way to sort of differentiate between an existing member who just got rebilled and a new member who got billed for the first time). Now, is there any possible way to optimize this query to use less resources or something, and to stop taking my mysql resources down on their knees...? Am I missing any info to clarify this any further? Let me know... EDIT: Here are the indexes already on that table: PRIMARY PRIMARY 46757 payment_id member_id INDEX 23378 member_id payer_id INDEX 11689 payer_id coupon_id INDEX 1 coupon_id tm_added INDEX 46757 tm_added, product_id tm_completed INDEX 46757 tm_completed, product_id

Read the article
Efficient implementation of natural logarithm (ln) and exponentiation

- by Donotalo

Basically, I'm looking for implementation of log() and exp() functions provided in C library <math.h>. I'm working with 8 bit microcontrollers (OKI 411 and 431). I need to calculate Mean Kinetic Temperature. The requirement is that we should be able to calculate MKT as fast as possible and with as little code memory as possible. The compiler comes with log() and exp() functions in <math.h>. But calling either function and linking with the library causes the code size to increase by 5 Kilobytes, which will not fit in one of the micro we work with (OKI 411), because our code already consumed ~12K of available ~15K code memory. The implementation I'm looking for should not use any other C library functions (like pow(), sqrt() etc). This is because all library functions are packed in one library and even if one function is called, the linker will bring whole 5K library to code memory.

Read the article
"Anagram solver" based on statistics rather than a dictionary/table?

- by James M.

My problem is conceptually similar to solving anagrams, except I can't just use a dictionary lookup. I am trying to find plausible words rather than real words. I have created an N-gram model (for now, N=2) based on the letters in a bunch of text. Now, given a random sequence of letters, I would like to permute them into the most likely sequence according to the transition probabilities. I thought I would need the Viterbi algorithm when I started this, but as I look deeper, the Viterbi algorithm optimizes a sequence of hidden random variables based on the observed output. I am trying to optimize the output sequence. Is there a well-known algorithm for this that I can read about? Or am I on the right track with Viterbi and I'm just not seeing how to apply it?

Read the article

< Previous Page | 88 89 90 91 92 93 94 95 96 97 98 99 | Next Page >