Search Results

Search found 22 results on 1 pages for 'omp'.

Page 1/1 | 1 

  • How to break out of a nested parallel (OpenMP) Fortran loop idiomatically?

    - by J.F. Sebastian
    Here's sequential code: do i = 1, n do j = i+1, n if ("some_condition") then result = "here's result" return end if end do end do Is there a cleaner way to execute iterations of the outer loop concurrently other than: !$OMP PARALLEL private(i,j) !$OMP DO do i = 1, n if (found) goto 10 do j = i+1, n if (found) goto 10 if ("some_condition") then !$OMP CRITICAL !$OMP FLUSH if (.not.found) then found = .true. result = "here's result" end if !$OMP FLUSH !$OMP END CRITICAL goto 10 end if end do 10 continue end do !$OMP END DO NOWAIT !$OMP END PARALLEL

    Read the article

  • Dividing sections inside an omp parallel for : OpenMP

    - by Sayan Ghosh
    Hi, I have a situation like: #pragma omp parallel for private(i, j, k, val, p, l) for (i = 0; i < num1; i++) { for (j = 0; j < num2; j++) { for (k = 0; k < num3; k++) { val = m[i + j*somenum + k*2] if (val != 0) for (l = start; l <= end; l++) { someFunctionThatWritesIntoGlobalArray((i + l), j, k, (someFunctionThatGetsValueFromAnotherArray((i + l), j, k) * val)); } } } for (p = 0; p < num4; p++) { m[p] = 0; } } Thanks for reading, phew! Well I am noticing a very minor difference in the results (0.999967[omp] against 1[serial]), when I use the above (which is 3 times faster) against the serial implementation. Now I know I am doing a mistake here...especially the connection between loops is evident. Is it possible to parallelize this using omp sections? I tried some options like making shared(p) {doing this, I got correct values, as in the serial form}, but there was no speedup then. Any general advice on handling openmp pragmas over a slew of for loops would also be great for me!

    Read the article

  • The correct usage of nested #pragma omp for directives

    - by GoldenLee
    The following code runs like a charm before OpenMP parallelization was applied. In fact, the following code was in a state of endless loop! I'm sure that's result from my incorrect use to the OpenMP directives. Would you please show me the correct way? Thank you very much. #pragma omp parallel for for (int nY = nYTop; nY <= nYBottom; nY++) { for (int nX = nXLeft; nX <= nXRight; nX++) { // Use look-up table for performance dLon = theApp.m_LonLatLUT.LonGrid()[nY][nX] + m_FavoriteSVISSRParams.m_dNadirLon; dLat = theApp.m_LonLatLUT.LatGrid()[nY][nX]; // If you don't want to use longitude/latitude look-up table, uncomment the following line //NOMGeoLocate.XYToGEO(dLon, dLat, nX, nY); if (dLon > 180 || dLat > 180) { continue; } if (Navigation.GeoToXY(dX, dY, dLon, dLat, 0) > 0) { continue; } // Skip void data scanline dY = dY - nScanlineOffset; // Compute coefficients as well as its four neighboring points' values nX1 = int(dX); nX2 = nX1 + 1; nY1 = int(dY); nY2 = nY1 + 1; dCx = dX - nX1; dCy = dY - nY1; dP1 = pIRChannelData->operator [](nY1)[nX1]; dP2 = pIRChannelData->operator [](nY1)[nX2]; dP3 = pIRChannelData->operator [](nY2)[nX1]; dP4 = pIRChannelData->operator [](nY2)[nX2]; // Bilinear interpolation usNomDataBlock[nY][nX] = (unsigned short)BilinearInterpolation(dCx, dCy, dP1, dP2, dP3, dP4); } }

    Read the article

  • openmp sections running sequentially

    - by chi42
    I have the following code: #pragma omp parallel sections private(x,y,cpsrcptr) firstprivate(srcptr) lastprivate(srcptr) { #pragma omp section { //stuff } #pragma omp section { //stuff } } According to the Zoom profiler, two threads are created, one thread executes both the sections, and the other thread simply blocks! Has anyone encountered anything like this before? (And yes, I do have a dual core machine).

    Read the article

  • Small openmp programm freezes sometimes (gcc, c, linux)

    - by osgx
    Hello Just write a small omp test, and it does not work correctly all the times: #include <omp.h> int main() { int i,j=0; #pragma omp parallel for(i=0;i<1000;i++) { #pragma omp barrier j+= j^i; } return j; } The usage of j for writing from all threads is incorrect in this example, BUT there must be only nondeterministic value of j I have a freeze. Compiled with gcc-4.3.1 -fopenmp a.c -o gcc -static Run on 4-core x86_Core2 Linux server: $ ./gcc and got freeze (sometimes; like 1 freeze for 4-5 fast runs). Strace: [pid 13118] <... futex resumed> ) = 0 [pid 13118] futex(0x80d3014, FUTEX_WAIT, 2, NULL <unfinished ...> [pid 13120] <... futex resumed> ) = 0 [pid 13119] futex(0x80d3014, FUTEX_WAIT, 2, NULL <unfinished ...> [pid 13120] futex(0x80d3014, FUTEX_WAKE, 1) = 1 [pid 13120] futex(0x80cd798, FUTEX_WAIT, 1, NULL <unfinished ...> [pid 13109] <... futex resumed> ) = 0 [pid 13109] futex(0x80d3014, FUTEX_WAKE, 1) = 1 [pid 13109] futex(0x80d3020, FUTEX_WAIT, 251, NULL <unfinished ...> [pid 13118] <... futex resumed> ) = 0 [pid 13118] futex(0x80d3014, FUTEX_WAKE, 1) = 1 [pid 13119] <... futex resumed> ) = 0 [pid 13118] futex(0x80d3020, FUTEX_WAIT, 251, NULL <unfinished ...> [pid 13119] futex(0x80d3014, FUTEX_WAKE, 1) = 0 [pid 13119] futex(0x80d3020, FUTEX_WAIT, 251, NULL <freeze> Why do I have a freeze (deadlock)?

    Read the article

  • openmp in mex : stackoverflow error

    - by Edwin
    i have got the following fraction of code that getting me the stack overflow error #pragma omp parallel shared(Mo1, Mo2, sum_normalized_p_gn, Data, Mean_Out,Covar_Out,Prior_Out, det) private(i) num_threads( number_threads ) { //every thread has a new copy double* normalized_p_gn = (double*)malloc(NMIX*sizeof(double)); #pragma omp critical { int id = omp_get_thread_num(); int threads = omp_get_num_threads(); mexEvalString("drawnow"); } #pragma omp for //some parallel process..... } the variables declared in the shared are created by malloc. and they consumes with large amount of memory there are 2 questions regarding to the above code. 1) why this would generate the stack overflow error( i.e. segmentation fault) before it goes into the parallel for loop? it works fine when it runs in the sequential mode.... 2) am i right to dynamic allocate memory for each thread like "normalized_p_gn" above? Regards Edwin

    Read the article

  • Linker library for OpenMP for Snow Leopard?

    - by unknownthreat
    Currently, I am trying out OpenMP on XCode 3.2.2 on Snow Leopard: #include <omp.h> #include <iostream> #include <stdio.h> int main (int argc, char * const argv[]) { #pragma omp parallel printf("Hello from thread %d, nthreads %d\n", omp_get_thread_num(), omp_get_num_threads()); return 0; } I didn't include any linking libraries yet, so the linker complains: "_omp_get_thread_num", referenced from: _main in main.o "_omp_get_num_threads", referenced from: _main in main.o OK, fine, no problem, I take a look in the existing framework, looking for keywords such as openmp or omp... here comes the problem, where is the linking library? Or should I say, what is the name of the linking library for openMP? Is it dylib, framework or what? Or do I need to get it from somewhere first?

    Read the article

  • Synchronisation construct inside pragma for

    - by Sayan Ghosh
    Hi, I have a program block like: for (iIndex1=0; iIndex1 < iSize; iIndex1++) { for (iIndex2=iIndex1+1; iIndex2 < iSize; iIndex2++) { iCount++; fDist =(*this)[iIndex1].distance( (*this)[iIndex2] ); m_oPDF.addPairDistance( fDist ); if ((bShowProgress) && (iCount % 1000000 == 0)) xyz_exception::ui()->progress( iCount, (size()-1)*((size()-1))/2 ); } } } } I have tried parallelising the inner and outer loop and by putting iCount in a critical region. What would be the best approach to parallelise this? If I wrap iCount with omp single or omp atomic then the code gives an error and I figured out that would be invalid inside omp for. I guess I am adding many extraneous stuffs to paralellise this. Need some advice... Thanks, Sayan

    Read the article

  • how to implement a "soft barrier" in multithreaded c++

    - by Jason
    I have some multithreaded c++ code with the following structure: do_thread_specific_work(); update_shared_variables(); //checkpoint A do_thread_specific_work_not_modifying_shared_variables(); //checkpoint B do_thread_specific_work_requiring_all_threads_have_updated_shared_variables(); What follows checkpoint B is work that could have started if all threads have reached only checkpoint A, hence my notion of a "soft barrier". Typically multithreading libraries only provide "hard barriers" in which all threads must reach some point before any can continue. Obviously a hard barrier could be used at checkpoint B. Using a soft barrier can lead to better execution time, especially since the work between checkpoints A and B may not be load-balanced between the threads (i.e. 1 slow thread who has reached checkpoint A but not B could be causing all the others to wait at the barrier just before checkpoint B). I've tried using atomics to synchronize things and I know with 100% certainty that is it NOT guaranteed to work. For example using openmp syntax, before the parallel section start with: shared_thread_counter = num_threads; //known at compile time #pragma omp flush Then at checkpoint A: #pragma omp atomic shared_thread_counter--; Then at checkpoint B (using polling): #pragma omp flush while (shared_thread_counter > 0) { usleep(1); //can be removed, but better to limit memory bandwidth #pragma omp flush } I've designed some experiments in which I use an atomic to indicate that some operation before it is finished. The experiment would work with 2 threads most of the time but consistently fail when I have lots of threads (like 20 or 30). I suspect this is because of the caching structure of modern CPUs. Even if one thread updates some other value before doing the atomic decrement, it is not guaranteed to be read by another thread in that order. Consider the case when the other value is a cache miss and the atomic decrement is a cache hit. So back to my question, how to CORRECTLY implement this "soft barrier"? Is there any built-in feature that guarantees such functionality? I'd prefer openmp but I'm familiar with most of the other common multithreading libraries. As a workaround right now, I'm using a hard barrier at checkpoint B and I've restructured my code to make the work between checkpoint A and B automatically load-balancing between the threads (which has been rather difficult at times). Thanks for any advice/insight :)

    Read the article

  • Time with and without OpenMP

    - by was
    I have a question.. I tried to improve a well known program algorithm in C, FOX algorithm for matrix multiplication.. relative link without openMP: (http://web.mst.edu/~ercal/387/MPI/ppmpi_c/chap07/fox.c). The initial program had only MPI and I tried to insert openMP in the matrix multiplication method, in order to improve the time of computation: (This program runs in a cluster and computers have 2 cores, thus I created 2 threads.) The problem is that there is no difference of time, with and without openMP. I observed that using openMP sometimes, time is equivalent or greater than the time without openMP. I tried to multiply two 600x600 matrices. void Local_matrix_multiply( LOCAL_MATRIX_T* local_A /* in */, LOCAL_MATRIX_T* local_B /* in */, LOCAL_MATRIX_T* local_C /* out */) { int i, j, k; chunk = CHUNKSIZE; // 100 #pragma omp parallel shared(local_A, local_B, local_C, chunk, nthreads) private(i,j,k,tid) num_threads(2) { /* tid = omp_get_thread_num(); if(tid == 0){ nthreads = omp_get_num_threads(); printf("O Pollaplasiamos pinakwn ksekina me %d threads\n", nthreads); } printf("Thread %d use the matrix: \n", tid); */ #pragma omp for schedule(static, chunk) for (i = 0; i < Order(local_A); i++) for (j = 0; j < Order(local_A); j++) for (k = 0; k < Order(local_B); k++) Entry(local_C,i,j) = Entry(local_C,i,j) + Entry(local_A,i,k)*Entry(local_B,k,j); } //end pragma omp parallel } /* Local_matrix_multiply */

    Read the article

  • OpenMP: Get total number of running threads

    - by Konrad Rudolph
    I need to know the total number of threads that my application has spawned via OpenMP. Unfortunately, the omp_get_num_threads() function does not work here since it only yields the number of threads in the current team. However, my code runs recursively (divide and conquer, basically) and I want to spawn new threads as long as there are still idle processors, but no more. Is there a way to get around the limitations of omp_get_num_threads and get the total number of running threads? If more detail is required, consider the following pseudo-code that models my workflow quite closely: function divide_and_conquer(Job job, int total_num_threads): if job.is_leaf(): # Recurrence base case. job.process() return left, right = job.divide() current_num_threads = omp_get_num_threads() if current_num_threads < total_num_threads: # (1) #pragma omp parallel num_threads(2) #pragma omp section divide_and_conquer(left, total_num_threads) #pragma omp section divide_and_conquer(right, total_num_threads) else: divide_and_conquer(left, total_num_threads) divide_and_conquer(right, total_num_threads) job = merge(left, right) If I call this code with a total_num_threads value of 4, the conditional annotated with (1) will always evaluate to true (because each thread team will contain at most two threads) and thus the code will always spawn two new threads, no matter how many threads are already running at a higher level. I am searching for a platform-independent way of determining the total number of threads that are currently running in my application.

    Read the article

  • Segmentation fault while matrix multiplication using openMp?

    - by harshit
    My matrix multiplication code is int matMul(int ld, double** matrix) { //local variables initialize omp_set_num_threads(nthreads); #pragma omp parallel private(tid,diag,ld) shared(i,j,k,matrix) { /* Obtain and print thread id */ tid = omp_get_thread_num(); for ( k=0; k<ld; k++) { if (matrix[k][k] == 0.0) { error = 1; return error; } diag = 1.0 / matrix[k][k]; #pragma omp for for ( i=k+1; i < ld; i++) { matrix[i][k] = diag * matrix[i][k]; } for ( j=k+1; j<ld; j++) { for ( i=k+1; i<ld; i++) { matrix[i][j] = matrix[i][j] - matrix[i][k] * matrix[k][j]; } } } } return error; } I assume that it is because of matrix object only but why will it be null even though it is passed as a parameter..

    Read the article

  • How to install OpenCV 2.0 on win32

    - by Jive Dadson
    I need to install OpenCV on Win32. I do not have it installed currently. I downloaded OpenCV-2.0.0a-win32.exe and ran it. What the heck do I do now? There are no .lib's and whatnot. I found some instructions for building the release using cmake at http://opencv.willowgarage.com/wiki/InstallGuide . I downloaded the latest and greatest cmake, and tried to follow the instructions, but I was guessing. No joy. I specified VC++9 when I did the "configure," but cmake built a VC++ 6 dsw file. No vcproj. I converted the dsw into a vc++9 vcproj anyway, just to see if it would work. Nope. It compiled lots of files, but many failed because it could not find omp.h. Sure enough, it's not there, anywhere. The build log said, 'A tool returned an error code from "Performing Custom Build Step".' I am lost. Ideally, I would like to find a full installation with all the files pre-built for Win32 vc++ 2008. Failing that, I need instructions that even I can follow. Short sentences and small words, but lots of them. Please help! UPDATE: I tried to build just CXCORE. It complained, "cannot open file 'VCOMPD.lib'" There's that OMP again.

    Read the article

  • [C++][OpenMP] Proper use of "atomic directive" to lock STL container

    - by conradlee
    I have a large number of sets of integers, which I have, in turn, put into a vector of pointers. I need to be able to update these sets of integers in parallel without causing a race condition. More specifically. I am using OpenMP's "parallel for" construct. For dealing with shared resources, OpenMP offers a handy "atomic directive," which allows one to avoid a race condition on a specific piece of memory without using locks. It would be convenient if I could use the "atomic directive" to prevent simultaneous updating to my integer sets, however, I'm not sure whether this is possible. Basically, I want to know whether the following code could lead to a race condition vector< set<int>* > membershipDirectory(numSets, new set<int>); #pragma omp for schedule(guided,expandChunksize) for(int i=0; i<100; i++) { set<int>* sp = membershipDirectory[5]; #pragma omp atomic sp->insert(45); } (Apologies for any syntax errors in the code---I hope you get the point) I have seen a similar example of this for incrementing an integer, but I'm not sure whether it works when working with a pointer to a container as in my case.

    Read the article

  • OpenMP implementations in VC++ 2008, 2010

    - by John
    Depending on implementation, OMP can be quite useful to parallelize fairly arbitrary bits of code - e.g a parallel section inside a method that calls two independent methods - or it can be bad. It depends on how threads are created/cached, I think. How does the VC++ 2008 implementation work? And is the 2010 implementation significantly different in terms of features and performance/flexibility?

    Read the article

  • Iteration through std containers in openmp

    - by Sasun Hambardzumyan
    Hi, people. I try to use openmp for multithreading the loop through std::set. When I write the following code - #pragma omp parallel for for (std::set<A>::const_iterator i = s.begin(); i != s.end(); ++i) { const A a = *i; operate(a); } I get an error - error: invalid type for iteration variable 'i' error: invalid controlling predicate error: invalid increment expression. So is there an another way to correct iteration in std containers using openmp? There is a workaround to use int i and iterate from 0 to s.size() and using iterator inside a loop body, but this is not looks good.

    Read the article

  • No speed-up with useless printf's using OpenMP

    - by t2k32316
    I just wrote my first OpenMP program that parallelizes a simple for loop. I ran the code on my dual core machine and saw some speed up when going from 1 thread to 2 threads. However, I ran the same code on a school linux server and saw no speed-up. After trying different things, I finally realized that removing some useless printf statements caused the code to have significant speed-up. Below is the main part of the code that I parallelized: #pragma omp parallel for private(i) for(i = 2; i <= n; i++) { printf("useless statement"); prime[i-2] = is_prime(i); } I guess that the implementation of printf has significant overhead that OpenMP must be duplicating with each thread. What causes this overhead and why can OpenMP not overcome it?

    Read the article

  • Why aren't unsigned OpenMP index variables allowed?

    - by Moe
    I have a loop in my C++/OpenMP code that looks like this: #pragma omp parallel for for(unsigned int i=0; i<count; i++) { // do stuff } When I compile it (with Visual Studio 2005) I get the following error: error C3016: 'i' : index variable in OpenMP 'for' statement must have signed integral type I understand that the error occurs because i is unsigned instead of signed, and changing i to be signed removed this error. What I want to know is why is this an error? Why aren't unsigned index variables allowed? Looking at the MSDN page for this error gives me no clues.

    Read the article

  • Are there deprecated practices for multithread and multiprocessor programming that I should no longer use?

    - by DeveloperDon
    In the early days of FORTRAN and BASIC, essentially all programs were written with GOTO statements. The result was spaghetti code and the solution was structured programming. Similarly, pointers can have difficult to control characteristics in our programs. C++ started with plenty of pointers, but use of references are recommended. Libraries like STL can reduce some of our dependency. There are also idioms to create smart pointers that have better characteristics, and some version of C++ permit references and managed code. Programming practices like inheritance and polymorphism use a lot of pointers behind the scenes (just as for, while, do structured programming generates code filled with branch instructions). Languages like Java eliminate pointers and use garbage collection to manage dynamically allocated data instead of depending on programmers to match all their new and delete statements. In my reading, I have seen examples of multi-process and multi-thread programming that don't seem to use semaphores. Do they use the same thing with different names or do they have new ways of structuring protection of resources from concurrent use? For example, a specific example of a system for multithread programming with multicore processors is OpenMP. It represents a critical region as follows, without the use of semaphores, which seem not to be included in the environment. th_id = omp_get_thread_num(); #pragma omp critical { cout << "Hello World from thread " << th_id << '\n'; } This example is an excerpt from: http://en.wikipedia.org/wiki/OpenMP Alternatively, similar protection of threads from each other using semaphores with functions wait() and signal() might look like this: wait(sem); th_id = get_thread_num(); cout << "Hello World from thread " << th_id << '\n'; signal(sem); In this example, things are pretty simple, and just a simple review is enough to show the wait() and signal() calls are matched and even with a lot of concurrency, thread safety is provided. But other algorithms are more complicated and use multiple semaphores (both binary and counting) spread across multiple functions with complex conditions that can be called by many threads. The consequences of creating deadlock or failing to make things thread safe can be hard to manage. Do these systems like OpenMP eliminate the problems with semaphores? Do they move the problem somewhere else? How do I transform my favorite semaphore using algorithm to not use semaphores anymore?

    Read the article

  • Thread-safty of boost RNG

    - by Maciej Piechotka
    I have a loop which should be nicely pararellized by insering one openmp pragma: boost::normal_distribution<double> ddist(0, pow(retention, i - 1)); boost::variate_generator<gen &, BOOST_TYPEOF(ddist)> dgen(rng, ddist); // Diamond const std::uint_fast32_t dno = 1 << i - 1; // #pragma omp parallel for for (std::uint_fast32_t x = 0; x < dno; x++) for (std::uint_fast32_t y = 0; y < dno; y++) { const std::uint_fast32_t diff = size/dno; const std::uint_fast32_t x1 = x*diff, x2 = (x + 1)*diff; const std::uint_fast32_t y1 = y*diff, y2 = (y + 1)*diff; double avg = (arr[x1][y1] + arr[x1][y2] + arr[x2][y1] + arr[x2][y2])/4; arr[(x1 + x2)/2][(y1 + y2)/2] = avg + dgen(); } (unless I make an error each execution does not depend on others at all. Sorry that not all of code is inserted). However my question is - are boost RNG thread-safe? They seems to refer to gcc code for gcc so even if gcc code is thread-safe it may not be the case for other platforms.

    Read the article

  • What limits scaling in this simple OpenMP program?

    - by Douglas B. Staple
    I'm trying to understand limits to parallelization on a 48-core system (4xAMD Opteron 6348, 2.8 Ghz, 12 cores per CPU). I wrote this tiny OpenMP code to test the speedup in what I thought would be the best possible situation (the task is embarrassingly parallel): // Compile with: gcc scaling.c -std=c99 -fopenmp -O3 #include <stdio.h> #include <stdint.h> int main(){ const uint64_t umin=1; const uint64_t umax=10000000000LL; double sum=0.; #pragma omp parallel for reduction(+:sum) for(uint64_t u=umin; u<umax; u++) sum+=1./u/u; printf("%e\n", sum); } I was surprised to find that the scaling is highly nonlinear. It takes about 2.9s for the code to run with 48 threads, 3.1s with 36 threads, 3.7s with 24 threads, 4.9s with 12 threads, and 57s for the code to run with 1 thread. Unfortunately I have to say that there is one process running on the computer using 100% of one core, so that might be affecting it. It's not my process, so I can't end it to test the difference, but somehow I doubt that's making the difference between a 19~20x speedup and the ideal 48x speedup. To make sure it wasn't an OpenMP issue, I ran two copies of the program at the same time with 24 threads each (one with umin=1, umax=5000000000, and the other with umin=5000000000, umax=10000000000). In that case both copies of the program finish after 2.9s, so it's exactly the same as running 48 threads with a single instance of the program. What's preventing linear scaling with this simple program?

    Read the article

  • Valgrind says "stack allocation," I say "heap allocation"

    - by Joel J. Adamson
    Dear Friends, I am trying to trace a segfault with valgrind. I get the following message from valgrind: ==3683== Conditional jump or move depends on uninitialised value(s) ==3683== at 0x4C277C5: sparse_mat_mat_kron (sparse.c:165) ==3683== by 0x4C2706E: rec_mating (rec.c:176) ==3683== by 0x401C1C: age_dep_iterate (age_dep.c:287) ==3683== by 0x4014CB: main (age_dep.c:92) ==3683== Uninitialised value was created by a stack allocation ==3683== at 0x401848: age_dep_init_params (age_dep.c:131) ==3683== ==3683== Conditional jump or move depends on uninitialised value(s) ==3683== at 0x4C277C7: sparse_mat_mat_kron (sparse.c:165) ==3683== by 0x4C2706E: rec_mating (rec.c:176) ==3683== by 0x401C1C: age_dep_iterate (age_dep.c:287) ==3683== by 0x4014CB: main (age_dep.c:92) ==3683== Uninitialised value was created by a stack allocation ==3683== at 0x401848: age_dep_init_params (age_dep.c:131) However, here's the offending line: /* allocate mating table */ age_dep_data->mtable = malloc (age_dep_data->geno * sizeof (double *)); if (age_dep_data->mtable == NULL) error (ENOMEM, ENOMEM, nullmsg, __LINE__); for (int j = 0; j < age_dep_data->geno; j++) { 131=> age_dep_data->mtable[j] = calloc (age_dep_data->geno, sizeof (double)); if (age_dep_data->mtable[j] == NULL) error (ENOMEM, ENOMEM, nullmsg, __LINE__); } What gives? I thought any call to malloc or calloc allocated heap space; there is no other variable allocated here, right? Is it possible there's another allocation going on (the offending stack allocation) that I'm not seeing? You asked to see the code, here goes: /* Copyright 2010 Joel J. Adamson <[email protected]> $Id: age_dep.c 1010 2010-04-21 19:19:16Z joel $ age_dep.c:main file Joel J. Adamson -- http://www.unc.edu/~adamsonj Servedio Lab University of North Carolina at Chapel Hill CB #3280, Coker Hall Chapel Hill, NC 27599-3280 This file is part of an investigation of age-dependent sexual selection. This code is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This software is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with haploid. If not, see <http://www.gnu.org/licenses/>. */ #include "age_dep.h" /* global variables */ extern struct argp age_dep_argp; /* global error message variables */ char * nullmsg = "Null pointer: %i"; /* error message for conversions: */ char * errmsg = "Representation error: %s"; /* precision for formatted output: */ const char prec[] = "%-#9.8f "; const size_t age_max = AGEMAX; /* maximum age of males */ static int keep_going_p = 1; int main (int argc, char ** argv) { /* often used counters: */ int i, j; /* read the command line */ struct age_dep_args age_dep_args = { NULL, NULL, NULL }; argp_parse (&age_dep_argp, argc, argv, 0, 0, &age_dep_args); /* set the parameters here: */ /* initialize an age_dep_params structure, set the members */ age_dep_params_t * params = malloc (sizeof (age_dep_params_t)); if (params == NULL) error (ENOMEM, ENOMEM, nullmsg, __LINE__); age_dep_init_params (params, &age_dep_args); /* initialize frequencies: this initializes a list of pointers to initial frqeuencies, terminated by a NULL pointer*/ params->freqs = age_dep_init (&age_dep_args); params->by = 0.0; /* what range of parameters do we want, and with what stepsize? */ /* we should go from 0 to half-of-theta with a step size of about 0.01 */ double from = 0.0; double to = params->theta / 2.0; double stepsz = 0.01; /* did you think I would spell the whole word? */ unsigned int numparts = floor(to / stepsz); do { #pragma omp parallel for private(i) firstprivate(params) \ shared(stepsz, numparts) for (i = 0; i < numparts; i++) { params->by = i * stepsz; int tries = 0; while (keep_going_p) { /* each time through, modify mfreqs and mating table, then go again */ keep_going_p = age_dep_iterate (params, ++tries); if (keep_going_p == ERANGE) error (ERANGE, ERANGE, "Failure to converge\n"); } fprintf (stdout, "%i iterations\n", tries); } /* for i < numparts */ params->freqs = params->freqs->next; } while (params->freqs->next != NULL); return 0; } inline double age_dep_pmate (double age_dep_t, unsigned int genot, double bp, double ba) { /* the probability of mating between these phenotypes */ /* the female preference depends on whether the female has the preference allele, the strength of preference (parameter bp) and the male phenotype (age_dep_t); if the female lacks the preference allele, then this will return 0, which is not quite accurate; it should return 1 */ return bits_isset (genot, CLOCI)? 1.0 - exp (-bp * age_dep_t) + ba: 1.0; } inline double age_dep_trait (int age, unsigned int genot, double by) { /* return the male trait, a function of the trait locus, age, the age-dependent scaling parameter (bx) and the males condition genotype */ double C; double T; /* get the male's condition genotype */ C = (double) bits_popcount (bits_extract (0, CLOCI, genot)); /* get his trait genotype */ T = bits_isset (genot, CLOCI + 1)? 1.0: 0.0; /* return the trait value */ return T * by * exp (age * C); } int age_dep_iterate (age_dep_params_t * data, unsigned int tries) { /* main driver routine */ /* number of bytes for female frequencies */ size_t geno = data->age_dep_data->geno; size_t genosize = geno * sizeof (double); /* female frequencies are equal to male frequencies at birth (before selection) */ double ffreqs[geno]; if (ffreqs == NULL) error (ENOMEM, ENOMEM, nullmsg, __LINE__); /* do not set! Use memcpy (we need to alter male frequencies (selection) without altering female frequencies) */ memmove (ffreqs, data->freqs->freqs[0], genosize); /* for (int i = 0; i < geno; i++) */ /* ffreqs[i] = data->freqs->freqs[0][i]; */ #ifdef PRMTABLE age_dep_pr_mfreqs (data); #endif /* PRMTABLE */ /* natural selection: */ age_dep_ns (data); /* normalized mating table with new frequencies */ age_dep_norm_mtable (ffreqs, data); #ifdef PRMTABLE age_dep_pr_mtable (data); #endif /* PRMTABLE */ double * newfreqs; /* mutate here */ /* i.e. get the new frequency of 0-year-olds using recombination; */ newfreqs = rec_mating (data->age_dep_data); /* return block */ { if (sim_stop_ck (data->freqs->freqs[0], newfreqs, GENO, TOL) == 0) { /* if we have converged, stop the iterations and handle the data */ age_dep_sim_out (data, stdout); return 0; } else if (tries > MAXTRIES) return ERANGE; else { /* advance generations */ for (int j = age_max - 1; j < 0; j--) memmove (data->freqs->freqs[j], data->freqs->freqs[j-1], genosize); /* advance the first age-class */ memmove (data->freqs->freqs[0], newfreqs, genosize); return 1; } } } void age_dep_ns (age_dep_params_t * data) { /* calculate the new frequency of genotypes given additive fitness and selection coefficient s */ size_t geno = data->age_dep_data->geno; double w[geno]; double wbar, dtheta, ttheta, dcond, tcond; double t, cond; /* fitness parameters */ double mu, nu; mu = data->wparams[0]; nu = data->wparams[1]; /* calculate fitness */ for (int j = 0; j < age_max; j++) { int i; for (i = 0; i < geno; i++) { /* calculate male trait: */ t = age_dep_trait(j, i, data->by); /* calculate condition: */ cond = (double) bits_popcount (bits_extract(0, CLOCI, i)); /* trait-based fitness term */ dtheta = data->theta - t; ttheta = (dtheta * dtheta) / (2.0 * nu * nu); /* condition-based fitness term */ dcond = CLOCI - cond; tcond = (dcond * dcond) / (2.0 * mu * mu); /* calculate male fitness */ w[i] = 1 + exp(-tcond) - exp(-ttheta); } /* calculate mean fitness */ /* as long as we calculate wbar before altering any values of freqs[], we're safe */ wbar = gen_mean (data->freqs->freqs[j], w, geno); for (i = 0; i < geno; i++) data->freqs->freqs[j][i] = (data->freqs->freqs[j][i] * w[i]) / wbar; } } void age_dep_norm_mtable (double * ffreqs, age_dep_params_t * params) { /* this function produces a single mating table that forms the input for recombination () */ /* i is female genotype; j is male genotype; k is male age */ int i,j,k; double norm_denom; double trait; size_t geno = params->age_dep_data->geno; for (i = 0; i < geno; i++) { double norm_mtable[geno]; /* initialize the denominator: */ norm_denom = 0.0; /* find the probability of mating and add it to the denominator */ for (j = 0; j < geno; j++) { /* initialize entry: */ norm_mtable[j] = 0.0; for (k = 0; k < age_max; k++) { trait = age_dep_trait (k, j, params->by); norm_mtable[j] += age_dep_pmate (trait, i, params->bp, params->ba) * (params->freqs->freqs)[k][j]; } norm_denom += norm_mtable[j]; } /* now calculate entry (i,j) */ for (j = 0; j < geno; j++) params->age_dep_data->mtable[i][j] = (ffreqs[i] * norm_mtable[j]) / norm_denom; } } My current suspicion is the array newfreqs: I can't memmove, memcpy or assign a stack variable then hope it will persist, can I? rec_mating() returns double *.

    Read the article

1