Search Results

Search found 6387 results on 256 pages for 'cpu allocation'.

Page 234/256 | < Previous Page | 230 231 232 233 234 235 236 237 238 239 240 241  | Next Page >

  • c++ floating point precision loss: 3015/0.00025298219406977296

    - by SigTerm
    The problem. Microsoft Visual C++ 2005 compiler, 32bit windows xp sp3, amd 64 x2 cpu. Code: double a = 3015.0; double b = 0.00025298219406977296; //*((unsigned __int64*)(&a)) == 0x40a78e0000000000 //*((unsigned __int64*)(&b)) == 0x3f30945640000000 double f = a/b;//3015/0.00025298219406977296; the result of calculation (i.e. "f") is 11917835.000000000 (*((unsigned __int64*)(&f)) == 0x4166bb4160000000) although it should be 11917834.814763514 (i.e. *((unsigned __int64*)(&f)) == 0x4166bb415a128aef). I.e. fractional part is lost. Unfortunately, I need fractional part to be correct. Questions: 1) Why does this happen? 2) How can I fix the problem? Additional info: 0) The result is taken directly from "watch" window (it wasn't printed, and I didn't forget to set printing precision). I also provided hex dump of floating point variable, so I'm absolutely sure about calculation result. 1) The disassembly of f = a/b is: fld qword ptr [a] fdiv qword ptr [b] fstp qword ptr [f] 2) f = 3015/0.00025298219406977296; yields correct result (f == 11917834.814763514 , *((unsigned __int64*)(&f)) == 0x4166bb415a128aef ), but it looks like in this case result is simply calculated during compile-time: fld qword ptr [__real@4166bb415a128aef (828EA0h)] fstp qword ptr [f] So, how can I fix this problem? P.S. I've found a temporary workaround (i need only fractional part of division, so I simply use f = fmod(a/b)/b at the moment), but I still would like to know how to fix this problem properly - double precision is supposed to be 16 decimal digits, so such calculation isn't supposed to cause problems.

    Read the article

  • Big-O for GPS data

    - by HH
    A non-critical GPS module use lists because it needs to be modifiable, new routes added, new distances calculated, continuos comparisons. Well so I thought but my team member wrote something I am very hard to get into. His pseudo code int k =0; a[][] <- create mapModuleNearbyDotList -array //CPU O(n) for(j = 1 to n) // O(nlog(m)) for(i =1 to n) for(k = 1 to n) if(dot is nearby) adj[i][j]=min(adj[i][j], adj[i][k] + adj[k][j]); His ideas transformations of lists to tables His worst case time complexity is O(n^3), where n is number of elements in his so-called table. Exception to the last point with Finite structure: O(mlog(n)) where n is number of vertices and m is an arbitrary constants Questions about his ideas why to waste resources to transform constantly-modified lists to table? Fast? only point where I to some extent agree but cannot understand the same upper limits n for each for-loops -- perhaps he supposed it circular why does the code take O(mlog(n)) to proceed in time as finite structure? The term finite may be wrong, explicit?

    Read the article

  • Simple POSIX threads question

    - by Andy
    Hi, I have this POSIX thread: void subthread(void) { while(!quit_thread) { // do something ... // don't waste cpu cycles if(!quit_thread) usleep(500); } // free resources ... // tell main thread we're done quit_thread = FALSE; } Now I want to terminate subthread() from my main thread. I've tried the following: quit_thread = TRUE; // wait until subthread() has cleaned its resources while(quit_thread); But it does not work! The while() clause does never exit although my subthread clearly sets quit_thread to FALSE after having freed its resources! If I modify my shutdown code like this: quit_thread = TRUE; // wait until subthread() has cleaned its resources while(quit_thread) usleep(10); Then everything is working fine! Could someone explain to me why the first solution does not work and why the version with usleep(10) suddenly works? I know that this is not a pretty solution. I could use semaphores/signals for this but I'd like to learn something about multithreading, so I'd like to know why my first solution doesn't work. Thanks!

    Read the article

  • trouble calculating offset index into 3D array

    - by Derek
    Hello, I am writing a CUDA kernel to create a 3x3 covariance matrix for each location in the rows*cols main matrix. So that 3D matrix is rows*cols*9 in size, which i allocated in a single malloc accordingly. I need to access this in a single index value the 9 values of the 3x3 covariance matrix get their values set according to the appropriate row r and column c from some other 2D arrays. In other words - I need to calculate the appropriate index to access the 9 elements of the 3x3 covariance matrix, as well as the row and column offset of the 2D matrices that are inputs to the value, as well as the appropriate index for the storage array. i have tried to simplify it down to the following: //I am calling this kernel with 1D blocks who are 512 cols x 1row. TILE_WIDTH=512 int bx = blockIdx.x; int by = blockIdx.y; int tx = threadIdx.x; int ty = threadIdx.y; int r = by + ty; int c = bx*TILE_WIDTH + tx; int offset = r*cols+c; int ndx = r*cols*rows + c*cols; if((r < rows) && (c < cols)){ //this IF statement is trying to avoid the case where a threadblock went bigger than my original array..not sure if correct d_cov[ndx + 0] = otherArray[offset]; d_cov[ndx + 1] = otherArray[offset] d_cov[ndx + 2] = otherArray[offset] d_cov[ndx + 3] = otherArray[offset] d_cov[ndx + 4] = otherArray[offset] d_cov[ndx + 5] = otherArray[offset] d_cov[ndx + 6] = otherArray[offset] d_cov[ndx + 7] = otherArray[offset] d_cov[ndx + 8] = otherArray[offset] } When I check this array with the values calculated on the CPU, which loops over i=rows, j=cols, k = 1..9 The results do not match up. in other words d_cov[i*rows*cols + j*cols + k] != correctAnswer[i][j][k] Can anyone give me any tips on how to sovle this problem? Is it an indexing problem, or some other logic error?

    Read the article

  • Standard term for a thread I/O reorder buffer?

    - by Crashworks
    I have a case where many threads all concurrently generate data that is ultimately written to one long, serial file. I need to somehow serialize these writes so that the file gets written in the right order. ie, I have an input queue of 2048 jobs j0..jn, each of which produces a chunk of data oi. The jobs run in parallel on, say, eight threads, but the output blocks have to appear in the file in the same order as the corresponding input blocks — the output file has to be in the order o0o1o2... The solution to this is pretty self evident: I need some kind of buffer that accumulates and writes the output blocks in the correct order, similar to a CPU reorder buffer in Tomasulo's algorithm, or to the way that TCP reassembles out-of-order packets before passing them to the application layer. Before I go code it, I'd like to do a quick literature search to see if there are any papers that have solved this problem in a particularly clever or efficient way, since I have severe realtime and memory constraints. I can't seem to find any papers describing this though; a Scholar search on every permutation of [threads, concurrent, reorder buffer, reassembly, io, serialize] hasn't yielded anything useful. I feel like I must just not be searching the right terms. Is there a common academic name or keyword for this kind of pattern that I can search on?

    Read the article

  • DirectX: Game loop order, draw first and then handle input?

    - by Ricket
    I was just reading through the DirectX documentation and encountered something interesting in the page for IDirect3DDevice9::BeginScene : To enable maximal parallelism between the CPU and the graphics accelerator, it is advantageous to call IDirect3DDevice9::EndScene as far ahead of calling present as possible. I've been accustomed to writing my game loop to handle input and such, then draw. Do I have it backwards? Maybe the game loop should be more like this: (semi-pseudocode, obviously) while(running) { d3ddev->Clear(...); d3ddev->BeginScene(); // draw things d3ddev->EndScene(); // handle input // do any other processing // play sounds, etc. d3ddev->Present(NULL, NULL, NULL, NULL); } According to that sentence of the documentation, this loop would "enable maximal parallelism". Is this commonly done? Are there any downsides to ordering the game loop like this? I see no real problem with it after the first iteration... And I know the best way to know the actual speed increase of something like this is to actually benchmark it, but has anyone else already tried this and can you attest to any actual speed increase?

    Read the article

  • are mobile can be used as a devices to develop application

    - by Richa Media and services
    I say that we can work @ mobile using a technique and work with any IDE and use any OS without problem like work on visual studio and use Window 7 How it possible ? 1. We use Mobile like a CPU and Use a monitor to watch code. We use samsung's techniques to display on monitor. it's give signal to monitor wirelessly to display code on monitor 2 We use Wireless keyboard and Mouse (if user like USB then he also use USB keyboard and mouse) 3. We use a component inside of mobile to control all devices like internet , wi-fi bluethoth. by component user easily setup , control and use feature. 4 don't be confused. i am sure to say that we not use mobile to watch code on mobile screen and Mobile 's keyboard because it's too smaller to work so we use Monitor (LCD) to display code and a keyboard to work comfortably and freely. 5. what are you think if you see a developer who work using this way 6. it is not impossible. give me some feedback and suggestion about your thinking on this technologies.

    Read the article

  • Was Visual Studio 2008 or 2010 written to use multi cores?

    - by Erx_VB.NExT.Coder
    basically i want to know if the visual studio IDE and/or compiler in 2010 was written to make use of a multi core environment (i understand we can target multi core environments in 08 and 10, but that is not my question). i am trying to decide on if i should get a higher clock dual core or a lower clock quad core, as i want to try and figure out which processor will give me the absolute best possible experience with Visual Studio 2010 (ide and background compiler). if they are running the most important section (background compiler and other ide tasks) in one core, then the core will get cut off quicker if running a quad core, esp if background compiler is the heaviest task, i would imagine this would b e difficult to seperate in more then one process, so even if it uses multi cores you might still be better off with going for a higher clock cpu if the majority of the processing is still bound to occur in one core (ie the most significant part of the VS environment). i am a vb programmer, they've made great performance improvements in beta 2, congrats, but i would love to be able to use VS seamlessly... anyone have any ideas? thanks, erx

    Read the article

  • What hash algorithms are parallelizable? Optimizing the hashing of large files utilizing on multi-co

    - by DanO
    I'm interested in optimizing the hashing of some large files (optimizing wall clock time). The I/O has been optimized well enough already and the I/O device (local SSD) is only tapped at about 25% of capacity, while one of the CPU cores is completely maxed-out. I have more cores available, and in the future will likely have even more cores. So far I've only been able to tap into more cores if I happen to need multiple hashes of the same file, say an MD5 AND a SHA256 at the same time. I can use the same I/O stream to feed two or more hash algorithms, and I get the faster algorithms done for free (as far as wall clock time). As I understand most hash algorithms, each new bit changes the entire result, and it is inherently challenging/impossible to do in parallel. Are any of the mainstream hash algorithms parallelizable? Are there any non-mainstream hashes that are parallelizable (and that have at least a sample implementation available)? As future CPUs will trend toward more cores and a leveling off in clock speed, is there any way to improve the performance of file hashing? (other than liquid nitrogen cooled overclocking?) or is it inherently non-parallelizable?

    Read the article

  • Want to Receive dynamic length data from a message queue in IPC?

    - by user1089679
    Here I have to send and receive dynamic data using a SysV message queue. so in structure filed i have dynamic memory allocation char * because its size may be varies. so how can i receive this type of message at receiver side. Please let me know how can i send dynamic length of data with message queue. I am getting problem in this i posted my code below. send.c /*filename : send.c *To compile : gcc send.c -o send */ #include <stdio.h> #include <stdlib.h> #include <errno.h> #include <string.h> #include <sys/types.h> #include <sys/ipc.h> #include <sys/msg.h> struct my_msgbuf { long mtype; char *mtext; }; int main(void) { struct my_msgbuf buf; int msqid; key_t key; static int count = 0; char temp[5]; int run = 1; if ((key = ftok("send.c", 'B')) == -1) { perror("ftok"); exit(1); } printf("send.c Key is = %d\n",key); if ((msqid = msgget(key, 0644 | IPC_CREAT)) == -1) { perror("msgget"); exit(1); } printf("Enter lines of text, ^D to quit:\n"); buf.mtype = 1; /* we don't really care in this case */ int ret = -1; while(run) { count++; buf.mtext = malloc(50); strcpy(buf.mtext,"Hi hello test message here"); snprintf(temp, sizeof (temp), "%d",count); strcat(buf.mtext,temp); int len = strlen(buf.mtext); /* ditch newline at end, if it exists */ if (buf.mtext[len-1] == '\n') buf.mtext[len-1] = '\0'; if (msgsnd(msqid, &buf, len+1, IPC_NOWAIT) == -1) /* +1 for '\0' */ perror("msgsnd"); if(count == 100) run = 0; usleep(1000000); } if (msgctl(msqid, IPC_RMID, NULL) == -1) { perror("msgctl"); exit(1); } return 0; } receive.c /* filename : receive.c * To compile : gcc receive.c -o receive */ #include <stdio.h> #include <stdlib.h> #include <errno.h> #include <sys/types.h> #include <sys/ipc.h> #include <sys/msg.h> struct my_msgbuf { long mtype; char *mtext; }; int main(void) { struct my_msgbuf buf; int msqid; key_t key; if ((key = ftok("send.c", 'B')) == -1) { /* same key as send.c */ perror("ftok"); exit(1); } if ((msqid = msgget(key, 0644)) == -1) { /* connect to the queue */ perror("msgget"); exit(1); } printf("test: ready to receive messages, captain.\n"); for(;;) { /* receive never quits! */ buf.mtext = malloc(50); if (msgrcv(msqid, &buf, 50, 0, 0) == -1) { perror("msgrcv"); exit(1); } printf("test: \"%s\"\n", buf.mtext); } return 0; }

    Read the article

  • Windows 2008 VPS hosting experiences

    - by Luke Bennett
    Whilst similar questions exist, I couldn't find any which quite match my request. I'm looking for hosting for some personal .NET projects which for various reasons I do not want to host on our servers at work. I need to be able to host multiple sites and for that reason I'm thinking of a VPS with RDP access for the time being - don't fancy shared hosting as I feel that doesn't offer me the flexbility and control I'm looking for. What experiences do people have of Windows 2008 VPS providers? I've come across a few possibilities although it seems a lot of places are still on Windows 2003 with 2008 'coming soon'. Is VPS the best way to go? Eventually (depending on how the projects take off) I intend to get a dedicated box but at this stage it's not cost-effective. Also, what are people's experiences of running SQL Server Express on a VPS? What would you say the minimum requirements are for CPU/memory? I know it's not going to be anywhere near as performant as SQL Server 2005/8 running on a dedicated box but I'm hoping it will be an acceptable starting point. Any other tips/advice also welcome! Edit: Forgot to mention, I'm ideally looking for UK hosting although I'm open to alternatives.

    Read the article

  • Why is CoRegisterClassObject creating two extra threads?

    - by Stijn Sanders
    I'm trying to fix a problem that only recently happens on a number of machine's on a VPN. They each run a client application I wrote that exposes a COM automation object. For some strange reason I haven't been able to discover yet, one thread in the application takes up all of the available CPU time, slowing other operation on the machine. In observing the application's strange behaviour, I've noticed it's the third thread started, and if I debug on my machine I notice the first call to CoRegisterClassObject created two extra threads. If the second of these two threads is the one that gets into an infinite loop, I'm not at all shure how to fix this. Where could I check next about what's wrong? Could it have started by one of the recent patches rolled out by Microsoft this last 'patch tuesday'? I had a go with ProcessExplorer to extract a stack trace of the thread: ntoskrnl.exe!ExReleaseResourceLite+0x1a3 ntoskrnl.exe!PsGetContextThread+0x329 WLDAP32.dll!Ordinal325+0x1231 WLDAP32.dll!Ordinal325+0x129e WLDAP32.dll!Ordinal325+0x1178 ntdll.dll!LdrInitializeThunk+0x24 ntdll.dll!LdrShutdownThread+0xe9 kernel32.dll!ExitThread+0x3e kernel32.dll!FreeLibraryAndExitThread+0x1e ole32.dll!StringFromGUID2+0x65d kernel32.dll!GetModuleFileNameA+0x1ba

    Read the article

  • Can knowing C actually hurt the code you write in higher level languages?

    - by Jurily
    The question seems settled, beaten to death even. Smart people have said smart things on the subject. To be a really good programmer, you need to know C. Or do you? I was enlightened twice this week. The first one made me realize that my assumptions don't go further than my knowledge behind them, and given the complexity of software running on my machine, that's almost non-existent. But what really drove it home was this Slashdot comment: The end result is that I notice the many naive ways in which traditional C "bare metal" programmers assume that higher level languages are implemented. They make bad "optimization" decisions in projects they influence, because they have no idea how a compiler works or how different a good runtime system may be from the naive macro-assembler model they understand. Then it hit me: C is just one more abstraction, like all others. Even the CPU itself is only an abstraction! I've just never seen it break, because I don't have the tools to measure it. I'm confused. Has my mind been mutilated beyond recovery, like Dijkstra said about BASIC? Am I living in a constant state of premature optimization? Is there hope for me, now that I realized I know nothing about anything? Is there anything to know, even? And why is it so fascinating, that everything I've written in the last five years might have been fundamentally wrong? To sum it up: is there any value in knowing more than the API docs tell me? EDIT: Made CW. Of course this also means now you must post examples of the interpreter/runtime optimizing better than we do :)

    Read the article

  • JAR files, don't they just bloat and slow Java down?

    - by Josamoto
    Okay, the question might seem dumb, but I'm asking it anyways. After struggling for hours to get a Spring + BlazeDS project up and running, I discovered that I was having problems with my project as a result of not including the right dependencies for Spring etc. There were .jars missing from my WEB-INF/lib folder, yes, silly me. After a while, I managed to get all the .jar files where they belong, and it comes at a whopping 12.5MB at that, and there's more than 30 of them! Which concerns me, but it probably and hopefully shouldn't be concerned. How does Java operate in terms of these JAR files, they do take up quite a bit of hard drive space, taking into account that it's compressed and compiled source code. So that can really quickly populate a lot of RAM and in an instant. My questions are: Does Java load an entire .jar file into memory when say for instance a class in that .jar is instantiated? What about stuff that's in the .jar that never gets used. Do .jars get cached somehow, for optimized application performance? When a single .jar is loaded, I understand that the thing sits in memory and is available across multiple HTTP requests (i.e. for the lifetime of the server instance running), unlike PHP where objects are created on the fly with each request, is this assumption correct? When using Spring, I'm thinking, I had to include all those fiddly .jars, wouldn't I just be better off just using native Java, with say at least and ORM solution like Hibernate? So far, Spring just took extra time configuring, extra hard drive space, extra memory, cpu consumption, so I'm concerned that the framework is going to cost too much application performance just to get for example, IoC implemented with my BlazeDS server. There still has to come ORM, a unit testing framework and bits and pieces here and there. It's just so easy to bloat up a project quickly and irresponsibly easily. Where do I draw the line?

    Read the article

  • A generic C++ library that provides QtConcurrent functionality?

    - by Lucas
    QtConcurrent is awesome. I'll let the Qt docs speak for themselves: QtConcurrent includes functional programming style APIs for parallel list processing, including a MapReduce and FilterReduce implementation for shared-memory (non-distributed) systems, and classes for managing asynchronous computations in GUI applications. For instance, you give QtConcurrent::map() an iterable sequence and a function that accepts items of the type stored in the sequence, and that function is applied to all the items in the collection. This is done in a multi-threaded manner, with a thread pool equal to the number of logical CPU's on the system. There are plenty of other function in QtConcurrent, like filter(), filteredReduced() etc. The standard CompSci map/reduce functions and the like. I'm totally in love with this, but I'm starting work on an OSS project that will not be using the Qt framework. It's a library, and I don't want to force others to depend on such a large framework like Qt. I'm trying to keep external dependencies to a minimum (it's the decent thing to do). I'm looking for a generic C++ framework that provides me with the same/similar high-level primitives that QtConcurrent does. AFAIK boost has nothing like this (I may be wrong though). boost::thread is very low-level compared to what I'm looking for. I know C# has something very similar with their Parallel Extensions so I know this isn't a Qt-only idea. What do you suggest I use?

    Read the article

  • 3 questions about JSON-P format.

    - by Tristan
    Hi, I have to write a script which will be hosted on differents domains. This script has to get information from my server. So, stackoverflow's user told me that i have to use JSON-P format, which, after research, is what i'm going to do. (the data provided in JSON-P is for displaying some information hosted on my server on other website) How do I output JSON-P from my server ? Is it the same as the json_encode function from PHP How do i design the tree pattern for the output JSON-P (you know, like : ({"name" : "foo", "id" : "xxxxx", "blog" : "http://xxxxxx.com"}); can I steal this from my XML output ? (http://bit.ly/9kzBDP) Each time a visitor browse a website on which my widget is it'll make a request on my server, requesting the JSON-P data to display on the client side. It'll increase dramatically the CPU load (1 visitor on the website who will have the script = 1 SQL request on my server to output data), so is there any way to 'caching' the JSON-P information output to refresh it only one or twice a day and stores it into a 'file' (in which extension?). BUT on the other hand i would say that requesting the JSON-P data directly (without caching it) is a plus, because, websites which will integrates the script only want to display THEIR information and not the whole data. So, making a script with something like that: jQuery.getJSON("http://www.something.com/json-p/outpout?filter=5&callback=?", function(data) { ................); }); Where filter= the information the website wants to display What do you think ? Thank you very much ;)

    Read the article

  • Flash browser game - HTTP + PHP vs Socket + Something else

    - by Maurycy Zarzycki
    I am developing a non-real time browser RPG game (think Kingdom of Loathing) which would be played from within a Flash app. At first I just wanted to make the communication with server using simply URLLoader to tell PHP what I am doing, and using $_SESSION to store data needed in-between request. I wonder if it wouldn't be better to base it on a socket connection, an app residing on a server written in Java or Python. The problem is I have never ever written such an app so I have no idea how much I'd have to "shift" my thoughts from simple responding do request (like PHP) to continuously working application. I won't hide I am also concerned about the memory and CPU usage of such Server app, when for example there would be hundreds of users connected. I've done some research. I have tried to do some research, but thanks to my nil knowledge on the sockets subject I haven't found anything helpful. So, considering the fact I don't need real time data exchange, will it be wise to develop the server side part as socket server, not in plain ol' PHP?

    Read the article

  • I need to make a multithreading program (python)

    - by Andreawu98
    import multiprocessing import time from itertools import product out_file = open("test.txt", 'w') P = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p','q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z',] N = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] M = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z'] c = int(input("Insert the number of digits you want: ")) n = int(input("If you need number press 1: ")) m = int(input("If you need upper letters press 1: ")) i = [] if n == 1: P = P + N if m == 1: P = P + M then = time.time() def worker(): for i in product(P, repeat=c): #check every possibilities k = '' for z in range(0, c): # k = k + str(i[z]) # print each possibility in a txt without parentesis or comma out_file.write( k + '\n') # out_file.close() now = time.time() diff = str(now - then) # To see how long does it take print(diff) worker() time.sleep(10) # just to check console The code check every single possibility and print it out in a test.txt file. It works but I really can't understand how can I speed it up. I saw it use 1 core out of my quad core CPU so I thought Multi-threading might work even though I don't know how. Please help me. Sorry for my English, I am from Italy.

    Read the article

  • Many users, many cpus, no delays. Good for cloud?

    - by Eric
    I wish to set up a CPU-intensive time-important query service for users on the internet. A usage scenario is described below. Is cloud computing the right way to go for such an implementation? If so, what cloud vendor(s) cater to this type of application? I ask specifically, in terms of: 1) pricing 2) latency resulting from: - slow CPUs, instance creations, JIT compiles, etc.. - internal management and communication of processes inside the cloud (e.g. a queuing process and a calculation process) - communication between cloud and end user 3) ease of deployment A usage scenario I am expecting is: - A typical user sends a query (XML of size around 1K) once every 30 seconds on average. - Each query requires a numerical computation of average time 0.2 sec and max time 1 sec on a 1 GHz Pentium. The computation requires no data other than the query itself and is performed by the same piece of code each time. - The delay a user experiences between sending a query and receiving a response should be on average no more than 2 seconds and in general no more than 5 seconds. - A background save to a DB of the response should occur (not time critical) - There can be up to 30000 simultaneous users - i.e., on average 1000 queries a second, each requiring an average 0.2 sec calculation, so that would necessitate around 200 CPUs. Currently I'm look at GAE Java (for quicker deployment and less IT hassle) and EC2 (Speed and price optimization) as options. Where can I learn more about the right way to set ups such a system? past threads, different blogs, books, etc.. BTW, if my terminology is wrong or confusing, please let me know. I'd greatly appreciate any help.

    Read the article

  • Link failure with either abnormal memory consumption or LNK1106 in Visual Studio 2005.

    - by Corvin
    Hello, I am trying to build a solution for windows XP in Visual Studio 2005. This solution contains 81 projects (static libs, exe's, dlls) and is being successfully used by our partners. I copied the solution bundle from their repository and tried setting it up on 3 similar machines of people in our group. I was successful on two machines and the solution failed to build on my machine. The build on my machine encountered two problems: During a simple build creation of the biggest static library (about 522Mb in debug mode) would fail with the message "13libd\ui1d.lib : fatal error LNK1106: invalid file or disk full: cannot seek to 0x20101879" Full solution rebuild creates this library, however when it comes to linking the library to main .exe file, devenv.exe spawns link.exe which consumes about 80Mb of physical memory and 250MB of virtual and spawns another link.exe, which does the same. This goes on until the system runs out of memory. On PCs of my colleagues where successful build could be performed, there is only one link.exe process which uses all the memory required for linking (about 500Mb physical). There is a plenty of hard drive space on my machine and the file system is NTFS. All three of our systems are similar - Core2Quad processors, 4Gb of RAM, Windows XP SP3. We are using Visual studio installed from the same source. I tried using a different RAM and CPU, using dedicated graphics adapter to eliminate possibility of video memory sharing influencing the build, putting solution files to different location, using different versions of VS 2005 (Professional, Standard and Team Suite), changing the amount of available virtual memory, running memtest86 and building the project from scratch (i.e. a clean bundle). I have read what MSDN says about LNK1106, none of the cases apply to me except for maybe "out of heap space", however I am not sure how I should fight this. The only idea that I have left is reinstalling the OS, however I am not sure that it would help and I am not sure that my situation wouldn't repeat itself on a different machine. Would anyone have any sort of advice for me? Thanks

    Read the article

  • Determing if an unordered vector<T> has all unique elements

    - by Hooked
    Profiling my cpu-bound code has suggested I that spend a long time checking to see if a container contains completely unique elements. Assuming that I have some large container of unsorted elements (with < and = defined), I have two ideas on how this might be done: The first using a set: template <class T> bool is_unique(vector<T> X) { set<T> Y(X.begin(), X.end()); return X.size() == Y.size(); } The second looping over the elements: template <class T> bool is_unique2(vector<T> X) { typename vector<T>::iterator i,j; for(i=X.begin();i!=X.end();++i) { for(j=i+1;j!=X.end();++j) { if(*i == *j) return 0; } } return 1; } I've tested them the best I can, and from what I can gather from reading the documentation about STL, the answer is (as usual), it depends. I think that in the first case, if all the elements are unique it is very quick, but if there is a large degeneracy the operation seems to take O(N^2) time. For the nested iterator approach the opposite seems to be true, it is lighting fast if X[0]==X[1] but takes (understandably) O(N^2) time if all the elements are unique. Is there a better way to do this, perhaps a STL algorithm built for this very purpose? If not, are there any suggestions eek out a bit more efficiency?

    Read the article

  • C++ pimpl idiom wastes an instruction vs. C style?

    - by Rob
    (Yes, I know that one machine instruction usually doesn't matter. I'm asking this question because I want to understand the pimpl idiom, and use it in the best possible way; and because sometimes I do care about one machine instruction.) In the sample code below, there are two classes, Thing and OtherThing. Users would include "thing.hh". Thing uses the pimpl idiom to hide it's implementation. OtherThing uses a C style – non-member functions that return and take pointers. This style produces slightly better machine code. I'm wondering: is there a way to use C++ style – ie, make the functions into member functions – and yet still save the machine instruction. I like this style because it doesn't pollute the namespace outside the class. Note: I'm only looking at calling member functions (in this case, calc). I'm not looking at object allocation. Below are the files, commands, and the machine code, on my Mac. thing.hh: class ThingImpl; class Thing { ThingImpl *impl; public: Thing(); int calc(); }; class OtherThing; OtherThing *make_other(); int calc(OtherThing *); thing.cc: #include "thing.hh" struct ThingImpl { int x; }; Thing::Thing() { impl = new ThingImpl; impl->x = 5; } int Thing::calc() { return impl->x + 1; } struct OtherThing { int x; }; OtherThing *make_other() { OtherThing *t = new OtherThing; t->x = 5; } int calc(OtherThing *t) { return t->x + 1; } main.cc (just to test the code actually works...) #include "thing.hh" #include <cstdio> int main() { Thing *t = new Thing; printf("calc: %d\n", t->calc()); OtherThing *t2 = make_other(); printf("calc: %d\n", calc(t2)); } Makefile: all: main thing.o : thing.cc thing.hh g++ -fomit-frame-pointer -O2 -c thing.cc main.o : main.cc thing.hh g++ -fomit-frame-pointer -O2 -c main.cc main: main.o thing.o g++ -O2 -o $@ $^ clean: rm *.o rm main Run make and then look at the machine code. On the mac I use otool -tv thing.o | c++filt. On linux I think it's objdump -d thing.o. Here is the relevant output: Thing::calc(): 0000000000000000 movq (%rdi),%rax 0000000000000003 movl (%rax),%eax 0000000000000005 incl %eax 0000000000000007 ret calc(OtherThing*): 0000000000000010 movl (%rdi),%eax 0000000000000012 incl %eax 0000000000000014 ret Notice the extra instruction because of the pointer indirection. The first function looks up two fields (impl, then x), while the second only needs to get x. What can be done?

    Read the article

  • Effective optimization strategies on modern C++ compilers

    - by user168715
    I'm working on scientific code that is very performance-critical. An initial version of the code has been written and tested, and now, with profiler in hand, it's time to start shaving cycles from the hot spots. It's well-known that some optimizations, e.g. loop unrolling, are handled these days much more effectively by the compiler than by a programmer meddling by hand. Which techniques are still worthwhile? Obviously, I'll run everything I try through a profiler, but if there's conventional wisdom as to what tends to work and what doesn't, it would save me significant time. I know that optimization is very compiler- and architecture- dependent. I'm using Intel's C++ compiler targeting the Core 2 Duo, but I'm also interested in what works well for gcc, or for "any modern compiler." Here are some concrete ideas I'm considering: Is there any benefit to replacing STL containers/algorithms with hand-rolled ones? In particular, my program includes a very large priority queue (currently a std::priority_queue) whose manipulation is taking a lot of total time. Is this something worth looking into, or is the STL implementation already likely the fastest possible? Along similar lines, for std::vectors whose needed sizes are unknown but have a reasonably small upper bound, is it profitable to replace them with statically-allocated arrays? I've found that dynamic memory allocation is often a severe bottleneck, and that eliminating it can lead to significant speedups. As a consequence I'm interesting in the performance tradeoffs of returning large temporary data structures by value vs. returning by pointer vs. passing the result in by reference. Is there a way to reliably determine whether or not the compiler will use RVO for a given method (assuming the caller doesn't need to modify the result, of course)? How cache-aware do compilers tend to be? For example, is it worth looking into reordering nested loops? Given the scientific nature of the program, floating-point numbers are used everywhere. A significant bottleneck in my code used to be conversions from floating point to integers: the compiler would emit code to save the current rounding mode, change it, perform the conversion, then restore the old rounding mode --- even though nothing in the program ever changed the rounding mode! Disabling this behavior significantly sped up my code. Are there any similar floating-point-related gotchas I should be aware of? One consequence of C++ being compiled and linked separately is that the compiler is unable to do what would seem to be very simple optimizations, such as move method calls like strlen() out of the termination conditions of loop. Are there any optimization like this one that I should look out for because they can't be done by the compiler and must be done by hand? On the flip side, are there any techniques I should avoid because they are likely to interfere with the compiler's ability to automatically optimize code? Lastly, to nip certain kinds of answers in the bud: I understand that optimization has a cost in terms of complexity, reliability, and maintainability. For this particular application, increased performance is worth these costs. I understand that the best optimizations are often to improve the high-level algorithms, and this has already been done.

    Read the article

  • Handling large (object) datasets with PHP

    - by Aron Rotteveel
    I am currently working on a project that extensively relies on the EAV model. Both entities as their attributes are individually represented by a model, sometimes extending other models (or at least, base models). This has worked quite well so far since most areas of the application only rely on filtered sets of entities, and not the entire dataset. Now, however, I need to parse the entire dataset (IE: all entities and all their attributes) in order to provide a sorting/filtering algorithm based on the attributes. The application currently consists of aproximately 2200 entities, each with aproximately 100 attributes. Every entity is represented by a single model (for example Client_Model_Entity) and has a protected property called $_attributes, which is an array of Attribute objects. Each entity object is about 500KB, which results in an incredible load on the server. With 2000 entities, this means a single task would take 1GB of RAM (and a lot of CPU time) in order to work, which is unacceptable. Are there any patterns or common approaches to iterating over such large datasets? Paging is not really an option, since everything has to be taken into account in order to provide the sorting algorithm.

    Read the article

  • Drawing line graphics leads Flash to spiral out of control!

    - by drpepper
    Hi, I'm having problems with some AS3 code that simply draws on a Sprite's Graphics object. The drawing happens as part of a larger procedure called on every ENTER_FRAME event of the stage. Flash neither crashes nor returns an error. Instead, it starts running at 100% CPU and grabs all the memory that it can, until I kill the process manually or my computer buckles under the pressure when it gets up to around 2-3 GB. This will happen at a random time, and without any noticiple slowdown beforehand. WTF? Has anyone seen anything like this? PS: I used to do the drawing within a MOUSE_MOVE event handler, which brought this problem on even faster. PPS: I'm developing on Linux, but reproduced the same problem on Windows. UPDATE: You asked for some code, so here we are. The drawing function looks like this: public static function drawDashedLine(i_graphics : Graphics, i_from : Point, i_to : Point, i_on : Number, i_off : Number) : void { const vecLength : Number = Point.distance(i_from, i_to); i_graphics.moveTo(i_from.x, i_from.y); var dist : Number = 0; var lineIsOn : Boolean = true; while(dist < vecLength) { dist = Math.min(vecLength, dist + (lineIsOn ? i_on : i_off)); const p : Point = Point.interpolate(i_from, i_to, 1 - dist / vecLength); if(lineIsOn) i_graphics.lineTo(p.x, p.y); else i_graphics.moveTo(p.x, p.y); lineIsOn = !lineIsOn; } } and is called like this (m_graphicsLayer is a Sprite): m_graphicsLayer.graphics.clear(); if (m_destinationPoint) { m_graphicsLayer.graphics.lineStyle(2, m_fixedAim ? 0xff0000 : 0x333333, 1); drawDashedLine(m_graphicsLayer.graphics, m_initialPos, m_destinationPoint, 10, 10); }

    Read the article

< Previous Page | 230 231 232 233 234 235 236 237 238 239 240 241  | Next Page >