Search Results

Search found 3481 results on 140 pages for 'convex optimization'.

Page 7/140 | < Previous Page | 3 4 5 6 7 8 9 10 11 12 13 14  | Next Page >

  • Execution Plan Optimization when where clause is removed then added back

    - by nmushov
    I have a stored procedure that uses a table valued function which executes in 9 seconds. If I alter the table valued function and remove the where clause, the stored procedure executes in 3 seconds. If I add the where clause back, the query still executes in 3 seconds. I took a look at the execution plans and it appears that after I remove the where clause, the execution plan includes parallelism and the scan count for 2 of my tables drops for 50000 and 65000 down to 5 and 3. After I add the where clause back, the optimized execution plan still runs unless I run DBCC FREEPROCCACHE. Questions 1. Why would SQL Server start using the optimized execution plan for both queries only when I first remove the where clause? Is there a way to force SQL Server to use this execution plan? Also, this is a paramaterized all-in-one query that uses the (Parameter is null or Parameter) in the where clause, which I believe is bad for performance. RETURNS TABLE AS RETURN ( SELECT TOP (@PageNumber * @PageSize) CASE WHEN @SortOrder = 'Expensive' THEN ROW_NUMBER() OVER (ORDER BY SellingPrice DESC) WHEN @SortOrder = 'Inexpensive' THEN ROW_NUMBER() OVER (ORDER BY SellingPrice ASC) WHEN @SortOrder = 'LowMiles' THEN ROW_NUMBER() OVER (ORDER BY Mileage ASC) WHEN @SortOrder = 'HighMiles' THEN ROW_NUMBER() OVER (ORDER BY Mileage DESC) WHEN @SortOrder = 'Closest' THEN ROW_NUMBER() OVER (ORDER BY P1.Distance ASC) WHEN @SortOrder = 'Newest' THEN ROW_NUMBER() OVER (ORDER BY [Year] DESC) WHEN @SortOrder = 'Oldest' THEN ROW_NUMBER() OVER (ORDER BY [Year] ASC) ELSE ROW_NUMBER() OVER (ORDER BY InventoryID ASC) END as rn, P1.InventoryID, P1.SellingPrice, P1.Distance, P1.Mileage, Count(*) OVER () RESULT_COUNT, dimCarStatus.[year] FROM (SELECT InventoryID, SellingPrice, Zip.Distance, Mileage, ColorKey, CarStatusKey, CarKey FROM facInventory JOIN @ZipCodes Zip ON Zip.DealerKey = facInventory.DealerKey) as P1 JOIN dimColor ON dimColor.ColorKey = P1.ColorKey JOIN dimCarStatus ON dimCarStatus.CarStatusKey = P1.CarStatusKey JOIN dimCar ON dimCar.CarKey = P1.CarKey WHERE (@ExteriorColor is NULL OR dimColor.ExteriorColor like @ExteriorColor) AND (@InteriorColor is NULL OR dimColor.InteriorColor like @InteriorColor) AND (@Condition is NULL OR dimCarStatus.Condition like @Condition) AND (@Year is NULL OR dimCarStatus.[Year] like @Year) AND (@Certified is NULL OR dimCarStatus.Certified like @Certified) AND (@Make is NULL OR dimCar.Make like @Make) AND (@ModelCategory is NULL OR dimCar.ModelCategory like @ModelCategory) AND (@Model is NULL OR dimCar.Model like @Model) AND (@Trim is NULL OR dimCar.Trim like @Trim) AND (@BodyType is NULL OR dimCar.BodyType like @BodyType) AND (@VehicleTypeCode is NULL OR dimCar.VehicleTypeCode like @VehicleTypeCode) AND (@MinPrice is NULL OR P1.SellingPrice >= @MinPrice) AND (@MaxPrice is NULL OR P1.SellingPrice < @MaxPrice) AND (@Mileage is NULL OR P1.Mileage < @Mileage) ORDER BY CASE WHEN @SortOrder = 'Expensive' THEN -SellingPrice WHEN @SortOrder = 'Inexpensive' THEN SellingPrice WHEN @SortOrder = 'LowMiles' THEN Mileage WHEN @SortOrder = 'HighMiles' THEN -Mileage WHEN @SortOrder = 'Closest' THEN P1.Distance WHEN @SortOrder = 'Newest' THEN -[YEAR] WHEN @SortOrder = 'Oldest' THEN [YEAR] ELSE InventoryID END )

    Read the article

  • Query with many CASE statements - optimization

    - by Nemanja Vujacic
    Hi guys, I have one very dirty query that per sure can be optimized because there are so many CASE statements in it! SELECT (CASE pa.KplusTable_Id WHEN 1 THEN sp.sp_id WHEN 2 THEN fw.fw_id WHEN 3 THEN s.sw_Id WHEN 4 THEN id.ia_id END) as Deal_Id, max(CASE pa.KplusTable_Id WHEN 1 THEN sp.Trans_Id WHEN 2 THEN fw.Trans_Id WHEN 3 THEN s.Trans_Id WHEN 4 THEN id.Trans_Id END) as TransId_CurrentMax INTO #MaxRazlicitOdNull FROM #PotencijalniAktuelni pa LEFT JOIN kplus_sp sp (nolock) on sp.sp_id=pa.Deal_Id AND pa.KplusTable_Id=1 LEFT JOIN kplus_fw fw (nolock) on fw.fw_id=pa.Deal_Id AND pa.KplusTable_Id=2 LEFT JOIN dev_sw s (nolock) on s.sw_Id=pa.Deal_Id AND pa.KplusTable_Id=3 LEFT JOIN kplus_ia id (nolock) on id.ia_id=pa.Deal_Id AND pa.KplusTable_Id=4 WHERE isnull(CASE pa.KplusTable_Id WHEN 1 THEN sp.BROJ_TIKETA WHEN 2 THEN fw.BROJ_TIKETA WHEN 3 THEN s.tiket WHEN 4 THEN id.BROJ_TIKETA END, '')<>'' GROUP BY CASE pa.KplusTable_Id WHEN 1 THEN sp.sp_id WHEN 2 THEN fw.fw_id WHEN 3 THEN s.sw_Id WHEN 4 THEN id.ia_id END Because I have same condition couple times, do you have idea how to optimize query, make it simpler and better. All suggestions are welcome! TnX in advance! Nemanja

    Read the article

  • C# 'is' type check on struct - odd .NET 4.0 x86 optimization behavior

    - by Jacob Stanley
    Since upgrading to VS2010 I'm getting some very strange behavior with the 'is' keyword. The program below (test.cs) outputs True when compiled in debug mode (for x86) and False when compiled with optimizations on (for x86). Compiling all combinations in x64 or AnyCPU gives the expected result, True. All combinations of compiling under .NET 3.5 give the expected result, True. I'm using the batch file below (runtest.bat) to compile and test the code using various combinations of compiler .NET framework. Has anyone else seen these kind of problems under .NET 4.0? Does everyone else see the same behavior as me on their computer when running runtests.bat? #@$@#$?? Is there a fix for this? test.cs using System; public class Program { public static bool IsGuid(object item) { return item is Guid; } public static void Main() { Console.Write(IsGuid(Guid.NewGuid())); } } runtest.bat @echo off rem Usage: rem runtest -- runs with csc.exe x86 .NET 4.0 rem runtest 64 -- runs with csc.exe x64 .NET 4.0 rem runtest v3.5 -- runs with csc.exe x86 .NET 3.5 rem runtest v3.5 64 -- runs with csc.exe x64 .NET 3.5 set version=v4.0.30319 set platform=Framework for %%a in (%*) do ( if "%%a" == "64" (set platform=Framework64) if "%%a" == "v3.5" (set version=v3.5) ) echo Compiler: %platform%\%version%\csc.exe set csc="C:\Windows\Microsoft.NET\%platform%\%version%\csc.exe" set make=%csc% /nologo /nowarn:1607 test.cs rem CS1607: Referenced assembly targets a different processor rem This happens if you compile for x64 using csc32, or x86 using csc64 %make% /platform:x86 test.exe echo =^> x86 %make% /platform:x86 /optimize test.exe echo =^> x86 (Optimized) %make% /platform:x86 /debug test.exe echo =^> x86 (Debug) %make% /platform:x86 /debug /optimize test.exe echo =^> x86 (Debug + Optimized) %make% /platform:x64 test.exe echo =^> x64 %make% /platform:x64 /optimize test.exe echo =^> x64 (Optimized) %make% /platform:x64 /debug test.exe echo =^> x64 (Debug) %make% /platform:x64 /debug /optimize test.exe echo =^> x64 (Debug + Optimized) %make% /platform:AnyCPU test.exe echo =^> AnyCPU %make% /platform:AnyCPU /optimize test.exe echo =^> AnyCPU (Optimized) %make% /platform:AnyCPU /debug test.exe echo =^> AnyCPU (Debug) %make% /platform:AnyCPU /debug /optimize test.exe echo =^> AnyCPU (Debug + Optimized) Test Results When running the runtest.bat I get the following results on my Win7 x64 install. > runtest 32 v4.0 Compiler: Framework\v4.0.30319\csc.exe False => x86 False => x86 (Optimized) True => x86 (Debug) False => x86 (Debug + Optimized) True => x64 True => x64 (Optimized) True => x64 (Debug) True => x64 (Debug + Optimized) True => AnyCPU True => AnyCPU (Optimized) True => AnyCPU (Debug) True => AnyCPU (Debug + Optimized) > runtest 64 v4.0 Compiler: Framework64\v4.0.30319\csc.exe False => x86 False => x86 (Optimized) True => x86 (Debug) False => x86 (Debug + Optimized) True => x64 True => x64 (Optimized) True => x64 (Debug) True => x64 (Debug + Optimized) True => AnyCPU True => AnyCPU (Optimized) True => AnyCPU (Debug) True => AnyCPU (Debug + Optimized) > runtest 32 v3.5 Compiler: Framework\v3.5\csc.exe True => x86 True => x86 (Optimized) True => x86 (Debug) True => x86 (Debug + Optimized) True => x64 True => x64 (Optimized) True => x64 (Debug) True => x64 (Debug + Optimized) True => AnyCPU True => AnyCPU (Optimized) True => AnyCPU (Debug) True => AnyCPU (Debug + Optimized) > runtest 64 v3.5 Compiler: Framework64\v3.5\csc.exe True => x86 True => x86 (Optimized) True => x86 (Debug) True => x86 (Debug + Optimized) True => x64 True => x64 (Optimized) True => x64 (Debug) True => x64 (Debug + Optimized) True => AnyCPU True => AnyCPU (Optimized) True => AnyCPU (Debug) True => AnyCPU (Debug + Optimized) tl;dr

    Read the article

  • compile time if && return string reference optimization

    - by Truncheon
    Hi. I'm writing a series classes that inherit from a base class using virtual. They are INT, FLOAT and STRING objects that I want to use in a scripting language. I'm trying to implement weak typing, but I don't want STRING objects to return copies of themselves when used in the following way (instead I would prefer to have a reference returned which can be used in copying): a = "hello "; b = "world"; c = a + b; I have written the following code as a mock example: #include <iostream> #include <string> #include <cstdio> #include <cstdlib> std::string dummy("<int object cannot return string reference>"); struct BaseImpl { virtual bool is_string() = 0; virtual int get_int() = 0; virtual std::string get_string_copy() = 0; virtual std::string const& get_string_ref() = 0; }; struct INT : BaseImpl { int value; INT(int i = 0) : value(i) { std::cout << "constructor called\n"; } INT(BaseImpl& that) : value(that.get_int()) { std::cout << "copy constructor called\n"; } bool is_string() { return false; } int get_int() { return value; } std::string get_string_copy() { char buf[33]; sprintf(buf, "%i", value); return buf; } std::string const& get_string_ref() { return dummy; } }; struct STRING : BaseImpl { std::string value; STRING(std::string s = "") : value(s) { std::cout << "constructor called\n"; } STRING(BaseImpl& that) { if (that.is_string()) value = that.get_string_ref(); else value = that.get_string_copy(); std::cout << "copy constructor called\n"; } bool is_string() { return true; } int get_int() { return atoi(value.c_str()); } std::string get_string_copy() { return value; } std::string const& get_string_ref() { return value; } }; struct Base { BaseImpl* impl; Base(BaseImpl* p = 0) : impl(p) {} ~Base() { delete impl; } }; int main() { Base b1(new INT(1)); Base b2(new STRING("Hello world")); Base b3(new INT(*b1.impl)); Base b4(new STRING(*b2.impl)); std::cout << "\n"; std::cout << b1.impl->get_int() << "\n"; std::cout << b2.impl->get_int() << "\n"; std::cout << b3.impl->get_int() << "\n"; std::cout << b4.impl->get_int() << "\n"; std::cout << "\n"; std::cout << b1.impl->get_string_ref() << "\n"; std::cout << b2.impl->get_string_ref() << "\n"; std::cout << b3.impl->get_string_ref() << "\n"; std::cout << b4.impl->get_string_ref() << "\n"; std::cout << "\n"; std::cout << b1.impl->get_string_copy() << "\n"; std::cout << b2.impl->get_string_copy() << "\n"; std::cout << b3.impl->get_string_copy() << "\n"; std::cout << b4.impl->get_string_copy() << "\n"; return 0; } It was necessary to add an if check in the STRING class to determine whether its safe to request a reference instead of a copy: Script code: a = "test"; b = a; c = 1; d = "" + c; /* not safe to request reference by standard */ C++ code: STRING(BaseImpl& that) { if (that.is_string()) value = that.get_string_ref(); else value = that.get_string_copy(); std::cout << "copy constructor called\n"; } If was hoping there's a way of moving that if check into compile time, rather than run time.

    Read the article

  • Optimization t-sql query

    - by phenevo
    Hi, I'm newbie in t-sql, and I wonder why this query executes so long ? Is there any way to optimize this ?? update aggregateflags set value=@value where objecttype=@objecttype and objectcode=@objectcode and storagetype=@storagetype and value != 2 and type=@type IF @@ROWCOUNT=0 Select * from aggregateflags where objecttype=@objecttype and objectcode=@objectcode and storagetype=@storagetype and value = 2 and type=@type IF @@ROWCOUNT=0 insert into aggregateflags (objectcode,objecttype,value,type,storagetype) select @objectcode,@objecttype,@value,@type,@storagetype @value int @storagetype int @type int @objectcode nvarchar(100) @objecttype int There is not foreign key.

    Read the article

  • MySQL query optimization - distinct, order by and limit

    - by Manuel Darveau
    I am trying to optimize the following query: select distinct this_.id as y0_ from Rental this_ left outer join RentalRequest rentalrequ1_ on this_.id=rentalrequ1_.rental_id left outer join RentalSegment rentalsegm2_ on rentalrequ1_.id=rentalsegm2_.rentalRequest_id where this_.DTYPE='B' and this_.id<=1848978 and this_.billingStatus=1 and rentalsegm2_.endDate between 1273631699529 and 1274927699529 order by rentalsegm2_.id asc limit 0, 100; This query is done multiple time in a row for paginated processing of records (with a different limit each time). It returns the ids I need in the processing. My problem is that this query take more than 3 seconds. I have about 2 million rows in each of the three tables. Explain gives: +----+-------------+--------------+--------+-----------------------------------------------------+---------------+---------+--------------------------------------------+--------+----------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+--------------+--------+-----------------------------------------------------+---------------+---------+--------------------------------------------+--------+----------------------------------------------+ | 1 | SIMPLE | rentalsegm2_ | range | index_endDate,fk_rentalRequest_id_BikeRentalSegment | index_endDate | 9 | NULL | 449904 | Using where; Using temporary; Using filesort | | 1 | SIMPLE | rentalrequ1_ | eq_ref | PRIMARY,fk_rental_id_BikeRentalRequest | PRIMARY | 8 | solscsm_main.rentalsegm2_.rentalRequest_id | 1 | Using where | | 1 | SIMPLE | this_ | eq_ref | PRIMARY,index_billingStatus | PRIMARY | 8 | solscsm_main.rentalrequ1_.rental_id | 1 | Using where | +----+-------------+--------------+--------+-----------------------------------------------------+---------------+---------+--------------------------------------------+--------+----------------------------------------------+ I tried to remove the distinct and the query ran three times faster. explain without the query gives: +----+-------------+--------------+--------+-----------------------------------------------------+---------------+---------+--------------------------------------------+--------+-----------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+--------------+--------+-----------------------------------------------------+---------------+---------+--------------------------------------------+--------+-----------------------------+ | 1 | SIMPLE | rentalsegm2_ | range | index_endDate,fk_rentalRequest_id_BikeRentalSegment | index_endDate | 9 | NULL | 451972 | Using where; Using filesort | | 1 | SIMPLE | rentalrequ1_ | eq_ref | PRIMARY,fk_rental_id_BikeRentalRequest | PRIMARY | 8 | solscsm_main.rentalsegm2_.rentalRequest_id | 1 | Using where | | 1 | SIMPLE | this_ | eq_ref | PRIMARY,index_billingStatus | PRIMARY | 8 | solscsm_main.rentalrequ1_.rental_id | 1 | Using where | +----+-------------+--------------+--------+-----------------------------------------------------+---------------+---------+--------------------------------------------+--------+-----------------------------+ As you can see, the Using temporary is added when using distinct. I already have an index on all fields used in the where clause. Is there anything I can do to optimize this query? Thank you very much!

    Read the article

  • Python optimization problem?

    - by user342079
    Alright, i had this homework recently (don't worry, i've already done it, but in c++) but I got curious how i could do it in python. The problem is about 2 light sources that emit light. I won't get into details tho. Here's the code (that I've managed to optimize a bit in the latter part): import math, array import numpy as np from PIL import Image size = (800,800) width, height = size s1x = width * 1./8 s1y = height * 1./8 s2x = width * 7./8 s2y = height * 7./8 r,g,b = (255,255,255) arr = np.zeros((width,height,3)) hy = math.hypot print 'computing distances (%s by %s)'%size, for i in xrange(width): if i%(width/10)==0: print i, if i%20==0: print '.', for j in xrange(height): d1 = hy(i-s1x,j-s1y) d2 = hy(i-s2x,j-s2y) arr[i][j] = abs(d1-d2) print '' arr2 = np.zeros((width,height,3),dtype="uint8") for ld in [200,116,100,84,68,52,36,20,8,4,2]: print 'now computing image for ld = '+str(ld) arr2 *= 0 arr2 += abs(arr%ld-ld/2)*(r,g,b)/(ld/2) print 'saving image...' ar2img = Image.fromarray(arr2) ar2img.save('ld'+str(ld).rjust(4,'0')+'.png') print 'saved as ld'+str(ld).rjust(4,'0')+'.png' I have managed to optimize most of it, but there's still a huge performance gap in the part with the 2 for-s, and I can't seem to think of a way to bypass that using common array operations... I'm open to suggestions :D

    Read the article

  • SQL optimization: deletes taking a long time

    - by Will
    I have an Oracle SQL query as part of a stored proc: DELETE FROM item i WHERE NOT EXISTS (SELECT 1 FROM item_queue q WHERE q.n=i.n) AND NOT EXISTS (SELECT 1 FROM tool_queue t WHERE t.n=i.n); A bit about the tables: item contains about 10k rows with an index on the n column item_queue contains about 1mil rows also with index on n column tool_queue contains about 5mil rows indexed as well I am wondering if the query/subqueries can be optimized somehow to make them run faster, I thought that deletes were generally fairly fast

    Read the article

  • MySQL Optimization 20 gig table

    - by user169743
    I have a 20 gig table that has a large amount of inserts and updates daily. This table is also frequently searched. I'd like to know if the MySQL indices can become fragmented and perhaps need to be rebuilt or something similar. I'm finding it difficult to figure out which of the CHECK TABLE, REPAIR TABLE or something similar? Any guidance appreciated, I'm a db newb.

    Read the article

  • Rails/mysql SUM distinct records - optimization

    - by pepernik
    Hey. How would you optimize this SQL SELECT SUM(tmp.cost) FROM ( SELECT DISTINCT clients.id as client, countries.credits_cost AS cost FROM countries INNER JOIN clients ON clients.country_id = countries.id INNER JOIN clients_groups ON clients_groups.client_id=clients.id WHERE clients_groups.group_id IN (1,2,3,4,5,6,7,8,9) GROUP BY clients.id ) AS tmp; I'm using this example as part of my Ruby on Rails project. Note that my nested SQL (tmp) can have more then 10 milion records. You can split that in more SQLs if the performance is better. Should I add any indexes to make it quicker (i have it on IDs)?

    Read the article

  • Haskell optimization of the following function

    - by me2
    Profiling of some code of mine showed that about 65% of the time I was running the following code. What it does is use the Data.Binary.Get monad to walk through a bytestring looking for the terminator. If it detects 0xff, it checks if the next byte is 0x00. If it is, it drops the 0x00 and continues. If it is not 0x00, then it drops both bytes and the resulting list of bytes is converted to a bytestring and returned. Any obvious ways to optimize this code? I can't see it. parseECS = f [] False where f acc ff = do b <- getWord8 if ff then if b == 0x00 then f (0xff:acc) False else return $ L.pack (reverse acc) else if b == 0xff then f acc True else f (b:acc) False

    Read the article

  • mysql query optimization

    - by vamsivanka
    I would need some help on how to optimize the query. select * from transaction where id < 7500001 order by id desc limit 16 when i do an explain plan on this - the type is "range" and rows is "7500000" According to the some online reference's this is explained as, it took the query 7,500,000 rows to scan and get the data. Is there any way i can optimize so it uses less rows to scan and get the data. Also, id is the primary key column.

    Read the article

  • Odd optimization problem under MSVC

    - by Goz
    I've seen this blog: http://igoro.com/archive/gallery-of-processor-cache-effects/ The "weirdness" in part 7 is what caught my interest. My first thought was "Thats just C# being weird". Its not I wrote the following C++ code. volatile int* p = (volatile int*)_aligned_malloc( sizeof( int ) * 8, 64 ); memset( (void*)p, 0, sizeof( int ) * 8 ); double dStart = t.GetTime(); for (int i = 0; i < 200000000; i++) { //p[0]++;p[1]++;p[2]++;p[3]++; // Option 1 //p[0]++;p[2]++;p[4]++;p[6]++; // Option 2 p[0]++;p[2]++; // Option 3 } double dTime = t.GetTime() - dStart; The timing I get on my 2.4 Ghz Core 2 Quad go as follows: Option 1 = ~8 cycles per loop. Option 2 = ~4 cycles per loop. Option 3 = ~6 cycles per loop. Now This is confusing. My reasoning behind the difference comes down to the cache write latency (3 cycles) on my chip and an assumption that the cache has a 128-bit write port (This is pure guess work on my part). On that basis in Option 1: It will increment p[0] (1 cycle) then increment p[2] (1 cycle) then it has to wait 1 cycle (for cache) then p[1] (1 cycle) then wait 1 cycle (for cache) then p[3] (1 cycle). Finally 2 cycles for increment and jump (Though its usually implemented as decrement and jump). This gives a total of 8 cycles. In Option 2: It can increment p[0] and p[4] in one cycle then increment p[2] and p[6] in another cycle. Then 2 cycles for subtract and jump. No waits needed on cache. Total 4 cycles. In option 3: It can increment p[0] then has to wait 2 cycles then increment p[2] then subtract and jump. The problem is if you set case 3 to increment p[0] and p[4] it STILL takes 6 cycles (which kinda blows my 128-bit read/write port out of the water). So ... can anyone tell me what the hell is going on here? Why DOES case 3 take longer? Also I'd love to know what I've got wrong in my thinking above, as i obviously have something wrong! Any ideas would be much appreciated! :) It'd also be interesting to see how GCC or any other compiler copes with it as well! Edit: Jerry Coffin's idea gave me some thoughts. I've done some more tests (on a different machine so forgive the change in timings) with and without nops and with different counts of nops case 2 - 0.46 00401ABD jne (401AB0h) 0 nops - 0.68 00401AB7 jne (401AB0h) 1 nop - 0.61 00401AB8 jne (401AB0h) 2 nops - 0.636 00401AB9 jne (401AB0h) 3 nops - 0.632 00401ABA jne (401AB0h) 4 nops - 0.66 00401ABB jne (401AB0h) 5 nops - 0.52 00401ABC jne (401AB0h) 6 nops - 0.46 00401ABD jne (401AB0h) 7 nops - 0.46 00401ABE jne (401AB0h) 8 nops - 0.46 00401ABF jne (401AB0h) 9 nops - 0.55 00401AC0 jne (401AB0h) I've included the jump statetements so you can see that the source and destination are in one cache line. You can also see that we start to get a difference when we are 13 bytes or more apart. Until we hit 16 ... then it all goes wrong. So Jerry isn't right (though his suggestion DOES help a bit), however something IS going on. I'm more and more intrigued to try and figure out what it is now. It does appear to be more some sort of memory alignment oddity rather than some sort of instruction throughput oddity. Anyone want to explain this for an inquisitive mind? :D Edit 3: Interjay has a point on the unrolling that blows the previous edit out of the water. With an unrolled loop the performance does not improve. You need to add a nop in to make the gap between jump source and destination the same as for my good nop count above. Performance still sucks. Its interesting that I need 6 nops to improve performance though. I wonder how many nops the processor can issue per cycle? If its 3 then that account for the cache write latency ... But, if thats it, why is the latency occurring? Curiouser and curiouser ...

    Read the article

  • Code optimization - Unused methods

    - by Yochai Timmer
    How can I tell if a method will never be used ? I know that for dll files and libraries you can't really know if someone else (another project) will ever use the code. In general I assume that anything public might be used somewhere else. But what about private methods ? Is it safe to assume that if I don't see an explicit call to that method, it won't be used ? I assume that for private methods it's easier to decide. But is it safe to decide it ONLY for private methods ?

    Read the article

  • Image/"most resembling pixel" search optimization?

    - by SigTerm
    The situation: Let's say I have an image A, say, 512x512 pixels, and image B, 5x5 or 7x7 pixels. Both images are 24bit rgb, and B have 1bit alpha mask (so each pixel is either completely transparent or completely solid). I need to find within image A a pixel which (with its' neighbors) most closely resembles image B, OR the pixel that probably most closely resembles image B. Resemblance is calculated as "distance" which is sum of "distances" between non-transparent B's pixels and A's pixels divided by number of non-transparent B's pixels. Here is a sample SDL code for explanation: struct Pixel{ unsigned char b, g, r, a; }; void fillPixel(int x, int y, SDL_Surface* dst, SDL_Surface* src, int dstMaskX, int dstMaskY){ Pixel& dstPix = *((Pixel*)((char*)(dst->pixels) + sizeof(Pixel)*x + dst->pitch*y)); int xMin = x + texWidth - searchWidth; int xMax = xMin + searchWidth*2; int yMin = y + texHeight - searchHeight; int yMax = yMin + searchHeight*2; int numFilled = 0; for (int curY = yMin; curY < yMax; curY++) for (int curX = xMin; curX < xMax; curX++){ Pixel& cur = *((Pixel*)((char*)(dst->pixels) + sizeof(Pixel)*(curX & texMaskX) + dst->pitch*(curY & texMaskY))); if (cur.a != 0) numFilled++; } if (numFilled == 0){ int srcX = rand() % src->w; int srcY = rand() % src->h; dstPix = *((Pixel*)((char*)(src->pixels) + sizeof(Pixel)*srcX + src->pitch*srcY)); dstPix.a = 0xFF; return; } int storedSrcX = rand() % src->w; int storedSrcY = rand() % src->h; float lastDifference = 3.40282347e+37F; //unsigned char mask = for (int srcY = searchHeight; srcY < (src->h - searchHeight); srcY++) for (int srcX = searchWidth; srcX < (src->w - searchWidth); srcX++){ float curDifference = 0; int numPixels = 0; for (int tmpY = -searchHeight; tmpY < searchHeight; tmpY++) for(int tmpX = -searchWidth; tmpX < searchWidth; tmpX++){ Pixel& tmpSrc = *((Pixel*)((char*)(src->pixels) + sizeof(Pixel)*(srcX+tmpX) + src->pitch*(srcY+tmpY))); Pixel& tmpDst = *((Pixel*)((char*)(dst->pixels) + sizeof(Pixel)*((x + dst->w + tmpX) & dstMaskX) + dst->pitch*((y + dst->h + tmpY) & dstMaskY))); if (tmpDst.a){ numPixels++; int dr = tmpSrc.r - tmpDst.r; int dg = tmpSrc.g - tmpDst.g; int db = tmpSrc.g - tmpDst.g; curDifference += dr*dr + dg*dg + db*db; } } if (numPixels) curDifference /= (float)numPixels; if (curDifference < lastDifference){ lastDifference = curDifference; storedSrcX = srcX; storedSrcY = srcY; } } dstPix = *((Pixel*)((char*)(src->pixels) + sizeof(Pixel)*storedSrcX + src->pitch*storedSrcY)); dstPix.a = 0xFF; } This thing is supposed to be used for texture generation. Now, the question: The easiest way to do this is brute force search (which is used in example routine). But it is slow - even using GPU acceleration and dual core cpu won't make it much faster. It looks like I can't use modified binary search because of B's mask. So, how can I find desired pixel faster? Additional Info: It is allowed to use 2 cores, GPU acceleration, CUDA, and 1.5..2 gigabytes of RAM for the task. I would prefer to avoid some kind of lengthy preprocessing phase that will take 30 minutes to finish. Ideas?

    Read the article

  • Optimization in Common Decalaration

    - by Pratik
    Its a 3-tier ASP.NET Website Project In Data Layer there is class "Common Decalaration" in which lot of common things are mentioned. Something this way : public class CommonDeclartion { #region Common Messages public const string RECORD_INSERT_MSG = "Record Inserted Successfully "; public const string RECORD_UPDATE_MSG = "Record Updated Successfully"; public const string RECORD_DELETE_MSG = "Record Deleted Successfully"; public const string ERROR_MSG = "Error Ocuured while Perfoming This Action."; public const string UserID_Incorrect = "Please Enter The Correct User ID."; public const string RECORD_ALREADY_EXIT = "Record Already Exit"; public const string NO_RECORD = "No Record found."; #endregion } Can this be more optimized in terms of : 1.Perfomance 2.Security(if any) 3.Code Readablity or Reusablity I thought of using enum but can't figure that out : enum CommonMessages { RECORD_INSERT_MSG "Record Inserted Successfully.", RECORD_UPDATE_MSG "Record Updated Successfully.", RECORD_DELETE_MSG "Record Deleted Successfully.", ERROR_MSG "Error Ocuured while Perfoming This Action.", UserID_Incorrect "Please Enter The Correct User ID.", RECORD_ALREADY_EXIT "Record Already Exit.", NO_RECORD "No Record found.", } or else should keep them in some collections like dictionary/NameValueCollection or so or i have to keep them in XML in form of key/value pair and reterive from it ? What can be better way keeping in mind 1.Perfomance 2.Security(if any) 3.Code Readablity or Reusablity

    Read the article

  • Algorithm for generating an array of non-equal costs for a transport problem optimization

    - by Carlos
    I have an optimizer that solves a transportation problem, using a cost matrix of all the possible paths. The optimiser works fine, but if two of the costs are equal, the solution contains one more path that the minimum number of paths. (Think of it as load balancing routers; if two routes are same cost, you'll use them both.) I would like the minimum number of routes, and to do that I need a cost matrix that doesn't have two costs that are equal within a certain tolerance. At the moment, I'm passing the cost matrix through a baking function which tests every entry for equality to each of the other entries, and moves it a fixed percentage if it matches. However, this approach seems to require N^2 comparisons, and if the starting values are all the same, the last cost will be r^N bigger. (r is the arbitrary fixed percentage). Also there is the problem that by multiplying by the percentage, you end up on top of another value. So the problem seems to have an element of recursion, or at least repeated checking, which bloats the code. The current implementation is basically not very good (I won't paste my GOTO-using code here for you all to mock), and I'd like to improve it. Is there a name for what I'm after, and is there a standard implementation? Example: {1,1,2,3,4,5} (tol = 0.05) becomes {1,1.05,2,3,4,5}

    Read the article

  • Insert takes too long, code optimization needed

    - by Pentium10
    I have some code I use to transfer a table1 values to another table2, they are sitting in different database. It's slow when I have 100.000 records. It takes forever to finish, 10+ minutes. (Windows Mobile smartphone) What can I do? cmd.CommandText = "insert into " + TableName + " select * from sync2." + TableName+""; cmd.ExecuteNonQuery(); EDIT The problem is not resolved. I am still after answers.

    Read the article

  • Optimization of Function with Dictionary and Zip()

    - by eWizardII
    Hello, I have the following function: def filetxt(): word_freq = {} lvl1 = [] lvl2 = [] total_t = 0 users = 0 text = [] for l in range(0,500): # Open File if os.path.exists("C:/Twitter/json/user_" + str(l) + ".json") == True: with open("C:/Twitter/json/user_" + str(l) + ".json", "r") as f: text_f = json.load(f) users = users + 1 for i in range(len(text_f)): text.append(text_f[str(i)]['text']) total_t = total_t + 1 else: pass # Filter occ = 0 import string for i in range(len(text)): s = text[i] # Sample string a = re.findall(r'(RT)',s) b = re.findall(r'(@)',s) occ = len(a) + len(b) + occ s = s.encode('utf-8') out = s.translate(string.maketrans("",""), string.punctuation) # Create Wordlist/Dictionary word_list = text[i].lower().split(None) for word in word_list: word_freq[word] = word_freq.get(word, 0) + 1 keys = word_freq.keys() numbo = range(1,len(keys)+1) WList = ', '.join(keys) NList = str(numbo).strip('[]') WList = WList.split(", ") NList = NList.split(", ") W2N = dict(zip(WList, NList)) for k in range (0,len(word_list)): word_list[k] = W2N[word_list[k]] for i in range (0,len(word_list)-1): lvl1.append(word_list[i]) lvl2.append(word_list[i+1]) I have used the profiler to find that it seems the greatest CPU time is spent on the zip() function and the join and split parts of the code, I'm looking to see if there is any way I have overlooked that I could potentially clean up the code to make it more optimized, since the greatest lag seems to be in how I am working with the dictionaries and the zip() function. Any help would be appreciated thanks!

    Read the article

  • C optimization breaks algorithm

    - by Halpo
    I am programming an algorithm that contains 4 nested for loops. The problem is at at each level a pointer is updated. The innermost loop only uses 1 of the pointers. The algorithm does a complicated count. When I include a debugging statement that logs the combination of the indexes and the results of the count I get the correct answer. When the debugging statement is omitted, the count is incorrect. The program is compiled with the -O3 option on gcc. Why would this happen?

    Read the article

  • sql server procedure optimization

    - by stackoverflow
    SQl Server 2005: Option: 1 CREATE TABLE #test (customerid, orderdate, field1 INT, field2 INT, field3 INT) CREATE UNIQUE CLUSTERED INDEX Idx1 ON #test(customerid) CREATE INDEX Idx2 ON #test(field1 DESC) CREATE INDEX Idx3 ON #test(field2 DESC) CREATE INDEX Idx4 ON #test(field3 DESC) INSERT INTO #test (customerid, orderdate, field1 INT, field2 INT, field3 INT) SELECT customerid, orderdate, field1, field2, field3 FROM ATABLERETURNING4000000ROWS compared to Option: 2 CREATE TABLE #test (customerid, orderdate, field1 INT, field2 INT, field3 INT) INSERT INTO #test (customerid, orderdate, field1 INT, field2 INT, field3 INT) SELECT customerid, orderdate, field1, field2, field3 FROM ATABLERETURNING4000000ROWS CREATE UNIQUE CLUSTERED INDEX Idx1 ON #test(customerid) CREATE INDEX Idx2 ON #test(field1 DESC) CREATE INDEX Idx3 ON #test(field2 DESC) CREATE INDEX Idx4 ON #test(field3 DESC) When we use the second option it runs close to 50% faster. Why is this?

    Read the article

  • Seeking free ODBC database optimization tool for non-experts

    - by mawg
    I'm a database n00b and am reading as many books as I can. I have been given responsibility for an ODBC tool where the databases were designed by a hardware engineer with some VB experience - which made him a s/w guru in the small firm at that time. Things are running slowly and I suspect that the db could have been designed better. I hope to learn enough to use Explain/Describe, etc maybe add some indices, but, in the meantime, is there any free for commercial use tool which can examine an ODBC database and suggest improvements. I'm just talking about db schema here, but maybe I should also be looking at optimizing Selects with Joins? Is there a tool for that? ODBC compliant.

    Read the article

  • Website optimization

    - by MB1
    Hi, have can i speed up the loading of images - specialy when i open the website for the first time it takes some time for images to load... Is there anything i can do to improve this (html, css)? link

    Read the article

  • Help with code optimization

    - by Ockonal
    Hello, I've written a little particle system for my 2d-application. Here is raining code: // HPP ----------------------------------- struct Data { float x, y, x_speed, y_speed; int timeout; Data(); }; std::vector<Data> mData; bool mFirstTime; void processDrops(float windPower, int i); // CPP ----------------------------------- Data::Data() : x(rand()%ScreenResolutionX), y(0) , x_speed(0), y_speed(0), timeout(rand()%130) { } void Rain::processDrops(float windPower, int i) { int posX = rand() % mWindowWidth; mData[i].x = posX; mData[i].x_speed = WindPower*0.1; // WindPower is float mData[i].y_speed = Gravity*0.1; // Gravity is 9.8 * 19.2 // If that is first time, process drops randomly with window height if (mFirstTime) { mData[i].timeout = 0; mData[i].y = rand() % mWindowHeight; } else { mData[i].timeout = rand() % 130; mData[i].y = 0; } } void update(float windPower, float elapsed) { // If this is first time - create array with new Data structure objects if (mFirstTime) { for (int i=0; i < mMaxObjects; ++i) { mData.push_back(Data()); processDrops(windPower, i); } mFirstTime = false; } for (int i=0; i < mMaxObjects; i++) { // Sleep until uptime > 0 (To make drops fall with randomly timeout) if (mData[i].timeout > 0) { mData[i].timeout--; } else { // Find new x/y positions mData[i].x += mData[i].x_speed * elapsed; mData[i].y += mData[i].y_speed * elapsed; // Find new speeds mData[i].x_speed += windPower * elapsed; mData[i].y_speed += Gravity * elapsed; // Drawing here ... // If drop has been falled out of the screen if (mData[i].y > mWindowHeight) processDrops(windPower, i); } } } So the main idea is: I have some structure which consist of drop position, speed. I have a function for processing drops at some index in the vector-array. Now if that's first time of running I'm making array with max size and process it in cycle. But this code works slower that all another I have. Please, help me to optimize it. I tried to replace all int with uint16_t but I think it doesn't matter.

    Read the article

  • WPF PathGeometry/RotateTransform optimization

    - by devinb
    I am having performance issues when rendering/rotating WPF triangles If I had a WPF triangle being displayed and it will be rotated to some degree around a centrepoint, I can do it one of two ways: Programatically determine the points and their offset in the backend, use XAML to simply place them on the canvas where they belong, it would look like this: <Path Stroke="Black"> <Path.Data> <PathGeometry> <PathFigure StartPoint ="{Binding CalculatedPointA, Mode=OneWay}"> <LineSegment Point="{Binding CalculatedPointB, Mode=OneWay}" /> <LineSegment Point="{Binding CalculatedPointC, Mode=OneWay}" /> <LineSegment Point="{Binding CalculatedPointA, Mode=OneWay}" /> </PathFigure> </PathGeometry> </Path.Data> </Path> Generate the 'same' triangle every time, and then use a RenderTransform (Rotate) to put it where it belongs. In this case, the rotation calculations are being obfuscated, because I don't have any access to how they are being done. <Path Stroke="Black"> <Path.Data> <PathGeometry> <PathFigure StartPoint ="{Binding TriPointA, Mode=OneWay}"> <LineSegment Point="{Binding TriPointB, Mode=OneWay}" /> <LineSegment Point="{Binding TriPointC, Mode=OneWay}" /> <LineSegment Point="{Binding TriPointA, Mode=OneWay}" /> </PathFigure> </PathGeometry> </Path.Data> <Path.RenderTransform> <RotateTransform CenterX="{Binding Centre.X, Mode=OneWay}" CenterY="{Binding Centre.Y, Mode=OneWay}" Angle="{Binding Orientation, Mode=OneWay}" /> </Path.RenderTransform> </Path> My question is which one is faster? I know I should test it myself but how do I measure the render time of objects with such granularity. I would need to be able to time how long the actual rendering time is for the form, but since I'm not the one that's kicking off the redraw, I don't know how to capture the start time.

    Read the article

< Previous Page | 3 4 5 6 7 8 9 10 11 12 13 14  | Next Page >