Search Results

Search found 26 results on 2 pages for 'gprof'.

Page 1 of 2 | Next Page >

  • Using gprof with sockets

    - by Chris
    I have a program I want to profile with gprof. The problem (seemingly) is that it uses sockets. So I get things like this:

        ::select(): Interrupted system call

    I hit this problem a while back, gave up, and moved on. But I would really like to be able to profile my code, using gprof if possible. What can I do? Is there a gprof option I'm missing? A socket option? Is gprof totally useless in the presence of these types of system calls? If so, is there a viable alternative?

    EDIT: Platform: Linux 2.6 (x64), GCC 4.4.1, gprof 2.19
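
    A minimal sketch of the usual workaround (editor's addition, not from the post), assuming the EINTR comes from gprof's SIGPROF sampling signal: treat an interrupted select() as a cue to retry rather than as an error.

        // Hypothetical sketch: retry select() when gprof's profiling signal
        // interrupts it, instead of treating EINTR as a fatal error.
        #include <sys/select.h>
        #include <cerrno>

        int select_retry(int nfds, fd_set *rd, fd_set *wr, fd_set *ex, timeval *tv)
        {
            for (;;) {
                int rc = select(nfds, rd, wr, ex, tv);
                if (rc >= 0 || errno != EINTR)
                    return rc;            // success, or a genuine error
                // EINTR: the profiling timer fired mid-call; simply retry.
                // (A real program may want to recompute tv before retrying.)
            }
        }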

    Read the article

  • gprof and execl() - is it possible?

    - by Chris
    Background: I have a game (an old-school-esque MUD) which I've been attempting to profile with gprof. The documentation of gprof (on Linux 2.6) states that the profiled program must call exit(2) or return normally for the profiling information to be saved in the gmon.out file. Now, if I kill the server with the shutdown command, the application "returns normally" (i.e., main() returns) and I get a gmon.out to analyze. However, it's far more common to reboot the server. The reboot command does the following:

        1. Writes usernames and socket FD numbers to disc.
        2. Makes a call to execl().
        3. The new process looks for the stored data, picks up the FDs, and moves on.

    I see the following error on the command line as the whole process fails:

        Profiling timer expired ./program

    Question: Is it possible to get a gmon.out file from the execl()-calling process? Perhaps some environment parameter to execl(), or else perhaps a different, gprof-friendly system call that achieves the same effect (beginning a new process while preserving file descriptors)?
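
    One hedged idea for the "Profiling timer expired" symptom (editor's addition, not from the post): interval timers survive execve, so the new image inherits ITIMER_PROF but has no SIGPROF handler. Disabling the timer just before execl() may let the replacement process start; it does not by itself produce a gmon.out. The function name below is illustrative.

        // Hypothetical sketch: stop the inherited profiling timer before exec,
        // so the replacement image is not killed by an unhandled SIGPROF.
        #include <sys/time.h>
        #include <unistd.h>

        static void reexec(const char *path)
        {
            struct itimerval off = {};              // zero value and interval = timer disabled
            setitimer(ITIMER_PROF, &off, nullptr);  // silence gprof's SIGPROF source
            execl(path, path, (char *)nullptr);     // FDs without FD_CLOEXEC survive the exec
        }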

    Read the article

  • Compiling in g++ for gprof

    - by myahya
    I do not understand the gprof documentation regarding how to compile your program for profiling. In g++, is it required to compile with the -g option (debugging information) in addition to the -pg option, or not? In each case I get different results, and I would like to see where the bottlenecks in my application are in release mode, not in debug mode, where many optimizations (e.g. inlining) are left out by the compiler.

    Read the article

  • Get gprof to profile based on wall-clock time?

    - by jetwolf
    My understanding is that by default gprof takes into account CPU time. Is there a way to get it to profile based on wall-clock time? My program does a lot of disk I/O, so the CPU time it uses represents only a fraction of the actual execution time. I need to know which portions of the disk I/O take up the most time.
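
    A minimal fallback sketch (editor's addition, not from the post), assuming you can bracket the suspected I/O sections yourself: gprof's samples only fire while the process is on the CPU, whereas a monotonic wall clock keeps counting while the program is blocked in disk I/O.

        // Hypothetical sketch: accumulate wall-clock time spent in an I/O-heavy
        // section with std::chrono::steady_clock, which keeps ticking while the
        // process is blocked in read()/write(), unlike gprof's CPU-time samples.
        #include <chrono>
        #include <cstdio>

        static double io_seconds = 0.0;            // running total for the section of interest

        template <class F>
        void timed_io(F &&do_io)
        {
            auto t0 = std::chrono::steady_clock::now();
            do_io();                               // the disk work being measured
            auto t1 = std::chrono::steady_clock::now();
            io_seconds += std::chrono::duration<double>(t1 - t0).count();
        }

        int main()
        {
            timed_io([] { /* open, read, and parse files here */ });
            std::printf("time spent in disk I/O: %.3f s\n", io_seconds);
        }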

    Read the article

  • how to profile multi threaded c++ app on linux?

    - by anon
    I used to do all my Linux profiling with gprof. However, with my multi-threaded app, its output appears to be inconsistent. Now, I dug this up: http://sam.zoy.org/writings/programming/gprof.html However, it's from a long time ago, and in my gprof output, it appears my gprof is listing functions used by non-main threads. So, my questions are: 1) in 2010, can I easily use gprof to profile multi-threaded Linux C++ apps? (Ubuntu 9.10) 2) what other tools should I look into for profiling? Thanks!
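
    A rough sketch of the idea behind the linked article (editor's addition): on setups where the profiling timer is not shared across threads, each new thread re-arms the ITIMER_PROF timer that gprof's instrumentation relies on. The names profiled_pthread_create and wrapped_start are hypothetical.

        // Hypothetical sketch of the classic "gprof-helper" technique: copy the
        // creating thread's ITIMER_PROF settings into every new thread so SIGPROF
        // samples cover code running outside the main thread.
        #include <pthread.h>
        #include <sys/time.h>

        struct wrapped_args {
            void *(*start)(void *);
            void  *arg;
            itimerval prof_timer;                  // profiling timer captured at create time
        };

        static void *wrapped_start(void *p)
        {
            wrapped_args a = *static_cast<wrapped_args *>(p);
            delete static_cast<wrapped_args *>(p);
            setitimer(ITIMER_PROF, &a.prof_timer, nullptr);  // re-arm profiling in this thread
            return a.start(a.arg);
        }

        int profiled_pthread_create(pthread_t *t, const pthread_attr_t *attr,
                                    void *(*start)(void *), void *arg)
        {
            wrapped_args *a = new wrapped_args{start, arg, {}};
            getitimer(ITIMER_PROF, &a->prof_timer);          // inherit the caller's timer settings
            return pthread_create(t, attr, wrapped_start, a);
        }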

    Read the article

  • Is it possible to get a graphical representation of gprof results?

    - by Werner
    Hi, I am interested in profiling a number-crunching program. I compiled it with the -g and -pg options, linked it, and got a gmon.out. After reading the info (plain text) it looks a bit ugly. I wonder if there are some open source tools for getting a graphical representation of the 10 functions where the program spends most of its time, as well as a flow diagram. Thanks

    Read the article

  • in free(): error: junk pointer, too high to make sense Segmentation fault: 11 (core dumped) gprof

    - by Mayank
    I am trying to profile my application. For this I compiled my code with the -pg and -lc_p options, and it compiled successfully. At run time I am getting the following error:

        in free(): error: junk pointer, too high to make sense
        Segmentation fault: 11 (core dumped)

    Running it under GDB gives this:

        (gdb) b main
        Breakpoint 1 at 0x5124d4:
        (gdb) r
        warning: Unable to get location for thread creation breakpoint: generic error
        [New LWP 100085]
        cacheIp in free(): error: junk pointer, too high to make sense
        Program received signal SIGSEGV, Segmentation fault.
        [Switching to LWP 100085]
        0x00000000006c3a1f in pthread_sigmask ()

    My application is multi-threaded and is a combination of C and C++ code.

        uname -a
        FreeBSD 6.3-RELEASE FreeBSD 6.3-RELEASE #0: Wed Jan 16 01:43:02 UTC 2008 [email protected]:/usr/obj/usr/src/sys/SMP amd64

    Why is the code crashing? Am I missing something?

    Read the article

  • Strange profiler behavior: same functions, different performances

    - by arthurprs
    I was learning to use gprof and then I got weird results for this code:

        int one(int a, int b) { return a / (b + 1); }
        int two(int a, int b) { return a / (b + 1); }

        int main()
        {
            for (int i = 1; i < 30000000; i++) {
                two(i, i * 2);
                one(i, i * 2);
            }
            return 0;
        }

    and this is the profiler output:

          %   cumulative   self              self     total
         time   seconds   seconds    calls  ns/call  ns/call  name
        48.39      0.90     0.90 29999999    30.00    30.00  one(int, int)
        40.86      1.66     0.76 29999999    25.33    25.33  two(int, int)
        10.75      1.86     0.20                             main

    If I call one and then two, the result is the inverse: two takes more time than one. Both are the same function, but whichever is called first always takes less time than the one called second. Why is that? Note: the assembly code is exactly the same, and the code is compiled with no optimizations.

    Read the article

  • Measuring execution time of selected loops

    - by user95281
    I want to measure the running times of selected loops in a C program so as to see what percentage of the total time for executing the program (on Linux) is spent in these loops. I should be able to specify the loops for which the performance should be measured. I have tried several tools (VTune, HPCToolkit, OProfile) in the last few days and none of them seem to do this. They all find the performance bottlenecks and just show the time for those. That's because these tools only store the time taken that is above a threshold (~1ms), so if one loop takes less time than that, its execution time won't be reported. The basic block counting feature of gprof depends on a feature in older compilers that's not supported now. I could manually write a simple timer using gettimeofday or something like that, but for some cases it won't give accurate results. For example:

        for (i = 0; i < 1000; ++i) {
            for (j = 0; j < N; ++j) {
                // do some work here
            }
        }

    Now here I want to measure the total time spent in the inner loop, and I will have to put a call to gettimeofday inside the first loop. So gettimeofday itself will get called 1000 times, which introduces its own overhead, and the result will be inaccurate.
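
    A low-overhead variant, sketched as an editor's addition (N and the work are placeholders): read CLOCK_MONOTONIC around the inner loop and accumulate the differences; clock_gettime is typically cheaper than gettimeofday, and timing an empty inner loop with the same harness gives an estimate of the measurement overhead to subtract out.

        // Hypothetical sketch: accumulate the inner loop's elapsed time with
        // clock_gettime(CLOCK_MONOTONIC); the two clock reads per outer iteration
        // are usually much cheaper than the work being measured.
        #include <time.h>
        #include <stdio.h>

        #define N 100000

        static double seconds_between(const timespec &a, const timespec &b)
        {
            return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
        }

        int main()
        {
            volatile long sink = 0;
            double inner_total = 0.0;

            for (int i = 0; i < 1000; ++i) {
                timespec t0, t1;
                clock_gettime(CLOCK_MONOTONIC, &t0);
                for (int j = 0; j < N; ++j)
                    sink += j;                      // stands in for "do some work here"
                clock_gettime(CLOCK_MONOTONIC, &t1);
                inner_total += seconds_between(t0, t1);
            }

            printf("total time in inner loop: %.6f s\n", inner_total);
            return 0;
        }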

    Read the article

  • Why do debug symbols so adversely affect the performance of threaded applications on Linux?

    - by fluffels
    Hi. I'm writing a ray tracer. Recently, I added threading to the program to exploit the additional cores on my i5 Quad Core. In a weird turn of events, the debug version of the application is now running slower, but the optimized build is running faster than before I added threading. I'm passing the "-g -pg" flags to gcc for the debug build and the "-O3" flag for the optimized build. Host system: Ubuntu Linux 10.4 AMD64. I know that debug symbols add significant overhead to the program, but the relative performance has always been maintained, i.e. a faster algorithm will always run faster in both debug and optimized builds. Any idea why I'm seeing this behavior?

    Read the article

  • Brief explanation for executables in a GNU/Clang Toolchain?

    - by ZhangChn
    I roughly understand that cc, ld and other parts are called in a certain sequence according to schemes like Makefiles etc. Some of those commands are used to generate those configs and Makefiles, and some other tools are used to deal with libraries. But what are the other parts used for? How are they called in this process? Which tools would use the various parser generators? Which parts are optional, and why? Is there a brief summary that explains how the tools in a GNU or LLVM/Clang toolchain are organised and invoked when building a C/C++ project? Thanks in advance. EDIT: Here is a list of executables for Clang/LLVM on Mac OS X:

        ar    clang              dsymutil   gperf              libtool  nmedit          rpcgen      unwinddump
        as    clang++            dwarfdump  gprof              lorder   otool           segedit     vgrind
        asa   cmpdylib           dyldinfo   indent             m4       pagestuff       size        what
        bison codesign_allocate  flex       install_name_tool  mig      ranlib          strip       yacc
        c++   ctags              flex++     ld                 mkdep    rebase          unifdef
        cc    ctf_insert         gm4        lex                nm       redo_prebinding unifdefall

    Read the article

  • Please recommend a Java profiler

    - by Yuval F
    I am looking for the Java equivalent of gprof. I did a little Java profiling using System.currentTimeMillis(), and saw several GUI tools which seem like too much. A good compromise would be a text-based Java profiler, preferably free or low-cost, which works on either Windows XP or Linux.

    Read the article

  • Linux C++: how to profile time wasted due to cache misses?

    - by anon
    I know that I can use gprof to benchmark my code. However, I have this problem: I have a smart pointer with an extra level of indirection (think of it as a proxy object). As a result, I have this extra layer that affects pretty much all functions and screws with caching. Is there a way to measure the time my CPU wastes due to cache misses? Thanks!
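
    Two commonly cited options here are Valgrind's cachegrind tool and the kernel's hardware counters. The sketch below (editor's addition, not from the post) counts last-level-cache read misses around a region of interest via perf_event_open(2); the event constants follow that man page, and availability depends on the kernel.

        // Hypothetical sketch: count LLC read misses for a region of code using
        // the Linux perf_event_open(2) interface.
        #include <linux/perf_event.h>
        #include <sys/ioctl.h>
        #include <sys/syscall.h>
        #include <unistd.h>
        #include <cstdint>
        #include <cstdio>

        static long perf_event_open(perf_event_attr *attr, pid_t pid, int cpu,
                                    int group_fd, unsigned long flags)
        {
            return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
        }

        int main()
        {
            perf_event_attr attr{};
            attr.type = PERF_TYPE_HW_CACHE;
            attr.size = sizeof(attr);
            attr.config = PERF_COUNT_HW_CACHE_LL |
                          (PERF_COUNT_HW_CACHE_OP_READ << 8) |
                          (PERF_COUNT_HW_CACHE_RESULT_MISS << 16);
            attr.disabled = 1;
            attr.exclude_kernel = 1;

            int fd = static_cast<int>(perf_event_open(&attr, 0, -1, -1, 0));
            if (fd == -1) { perror("perf_event_open"); return 1; }

            ioctl(fd, PERF_EVENT_IOC_RESET, 0);
            ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

            // ... code under test goes here, e.g. traversals through the proxy object ...

            ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
            uint64_t misses = 0;
            read(fd, &misses, sizeof(misses));
            std::printf("LL cache read misses: %llu\n", (unsigned long long)misses);
            close(fd);
            return 0;
        }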

    Read the article

  • Is there memory usage profiler available?

    - by prosseek
    For time profiling of a program XYZ, I can just run 'time XYZ', or, if I have the source code in C/C++, I can even use gprof to get profiled results. Is there a similar tool for memory usage? Is there a tool I can run like 'memory XYZ' to get info such as min/max/median memory usage? What tool do you use for memory profiling with C++/Objective-C/C#/Java?
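
    For the "max" part of that, one small self-measurement sketch (editor's addition, not from the post) is to ask the OS for the process's peak resident set size via getrusage(2); heap-profiling tools such as Valgrind's Massif give a much more detailed picture over time.

        // Hypothetical sketch: report this process's peak resident set size.
        // On Linux, ru_maxrss is reported in kilobytes.
        #include <sys/resource.h>
        #include <cstdio>
        #include <vector>

        int main()
        {
            std::vector<int> v(10 * 1024 * 1024);   // ~40 MB, just to have something to measure

            rusage usage{};
            if (getrusage(RUSAGE_SELF, &usage) == 0)
                std::printf("peak RSS: %ld kB\n", usage.ru_maxrss);
            return 0;
        }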

    Read the article

  • Enable H.264 (or x264) in AVIDemux

    - by Thomas
    I am trying to get AVIDemux set up with the x264 codec using this tutorial. The following is what happens when I get to the ./configure --enable-mp4-output step:

        Thomas-Phillipss-MacBook:x264 tomdabomb2u$ sudo ./configure --enable-mp4-output
        Password:
        Unknown option --enable-mp4-output, ignored
        Found no assembler
        Minimum version is yasm-0.6.2
        If you really want to compile without asm, configure with --disable-asm.

    So I tried it:

        Thomas-Phillipss-MacBook:x264 tomdabomb2u$ sudo ./configure --enable-mp4-output --disable-asm
        Unknown option --enable-mp4-output, ignored
        Warning: gpac is too old, update to 2007-06-21 UTC or later
        Platform:   X86_64
        System:     MACOSX
        asm:        no
        avs:        no
        lavf:       no
        ffms:       no
        gpac:       no
        pthread:    yes
        filters:    crop select_every
        debug:      no
        gprof:      no
        PIC:        no
        shared:     no
        visualize:  no
        bit depth:  8
        You can run 'make' or 'make fprofiled' now.

    I issued make, and then:

        Thomas-Phillipss-MacBook:x264 tomdabomb2u$ ./x264 -v -q 20 -o foreman.mp4 foreman_part_qcif.yuv 176x144

    And as expected, the result is:

        x264 [error]: not compiled with MP4 output support

    So I'm stuck. Any ideas?

    Read the article

  • Measuring daemon CPU utilization over a portion of it's wall clock run time

    - by WhirlWind
    I am dealing with a network-related daemon: it takes data in, processes it, and spits it out. I would like to increase the performance of this daemon by profiling it and reducing its CPU utilization. I can do this easily on Linux with gprof. However, I would also like to use something like "time" to measure its total CPU utilization over a period of time. If possible, I would like to time it over a period that is less than its total run time: that is, I would like to start the daemon, wait a while, start gathering CPU statistics, stop gathering them, then stop the daemon at some later time. The "time" command would work well for me, but it seems to require that I start and stop the daemon as a child of time. Is there a way to measure CPU utilization for only a portion of the daemon's wall-clock time?
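
    One way to do this without wrapping the daemon in "time", sketched as an editor's addition (the field positions follow proc(5); the program and its argument handling are illustrative): sample the daemon's accumulated CPU time from /proc/<pid>/stat at the start and end of the window of interest and divide by the elapsed wall-clock time.

        // Hypothetical sketch: report a running daemon's CPU utilisation over an
        // arbitrary observation window by reading utime+stime from /proc/<pid>/stat.
        #include <fstream>
        #include <sstream>
        #include <string>
        #include <thread>
        #include <chrono>
        #include <iostream>
        #include <unistd.h>

        static double cpu_seconds(pid_t pid)
        {
            std::ifstream stat("/proc/" + std::to_string(pid) + "/stat");
            std::string line;
            std::getline(stat, line);
            // utime and stime are fields 14 and 15; skip past the parenthesised
            // comm field first, since it may contain spaces.
            std::istringstream in(line.substr(line.rfind(')') + 2));
            std::string field;
            unsigned long utime = 0, stime = 0;
            for (int i = 3; i <= 15 && in >> field; ++i) {
                if (i == 14) utime = std::stoul(field);
                if (i == 15) stime = std::stoul(field);
            }
            return double(utime + stime) / sysconf(_SC_CLK_TCK);
        }

        int main(int argc, char **argv)
        {
            if (argc < 2) { std::cerr << "usage: " << argv[0] << " <pid> [seconds]\n"; return 1; }
            pid_t pid = std::stoi(argv[1]);                     // the daemon's PID
            int window = argc > 2 ? std::stoi(argv[2]) : 10;    // seconds to observe

            double before = cpu_seconds(pid);
            std::this_thread::sleep_for(std::chrono::seconds(window));
            double after = cpu_seconds(pid);

            std::cout << "CPU over last " << window << "s: "
                      << 100.0 * (after - before) / window << "%\n";
            return 0;
        }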

    Read the article

  • Deeper function profiling/emulation

    - by Syntax_Error
    Hello everyone, Merry Christmas. I need some advice. I have the following code:

        int main()
        {
            int k = 5000000;
            int p;
            int sum = 0;
            for (p = 0; p < k; p++) {
                sum += p;
            }
            return 0;
        }

    When I assemble it I get:

        main:
            pushl   %ebp
            movl    %esp, %ebp
            subl    $16, %esp
            movl    $5000000, -4(%ebp)
            movl    $0, -12(%ebp)
            movl    $0, -8(%ebp)
            jmp     .L2
        .L3:
            movl    -8(%ebp), %eax
            addl    %eax, -12(%ebp)
            addl    $1, -8(%ebp)
        .L2:
            movl    -8(%ebp), %eax
            cmpl    -4(%ebp), %eax
            jl      .L3
            movl    $0, %eax
            leave
            ret

    If I run it through gprof I get that main executed the most, which is quite obvious! Yet I want to go a step further and be able to know whether .L2 or .L3 executed the most; here it is obvious that .L3 executed the most. Is there some kind of profiler or emulator that can give me that data for an entire program?

    Read the article

  • The cost of passing by shared_ptr

    - by Artem
    I use std::tr1::shared_ptr extensively throughout my application. This includes passing objects in as function arguments. Consider the following:

        class Dataset { ... };

        void f( shared_ptr< Dataset const > pds ) { ... }
        void g( shared_ptr< Dataset const > pds ) { ... }
        ...

    While passing a dataset object around via shared_ptr guarantees its existence inside f and g, the functions may be called millions of times, which causes a lot of shared_ptr objects to be created and destroyed. Here's a snippet of the flat gprof profile from a recent run:

        Each sample counts as 0.01 seconds.
          %   cumulative   self                 self     total
         time   seconds   seconds      calls   s/call   s/call  name
         9.74     295.39     35.12 2451177304     0.00     0.00  std::tr1::__shared_count::__shared_count(std::tr1::__shared_count const&)
         8.03     324.34     28.95 2451252116     0.00     0.00  std::tr1::__shared_count::~__shared_count()

    So, ~17% of the runtime was spent on reference counting with shared_ptr objects. Is this normal? A large portion of my application is single-threaded, and I was thinking about re-writing some of the functions as

        void f( const Dataset& ds ) { ... }

    and replacing the calls

        shared_ptr< Dataset > pds( new Dataset(...) );
        f( pds );

    with

        f( *pds );

    in places where I know for sure the object will not get destroyed while the flow of the program is inside f(). But before I run off to change a bunch of function signatures / calls, I wanted to know what the typical performance hit of passing by shared_ptr is. It seems like shared_ptr should not be used for functions that get called very often. Any input would be appreciated. Thanks for reading. -Artem
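
    A middle-ground sketch (editor's addition, not from the post, written against std::shared_ptr rather than the tr1 spelling): passing the shared_ptr by const reference keeps the shared_ptr-based signature but avoids the __shared_count copy and destroy on every call.

        // Hypothetical sketch: pass the shared_ptr by const reference so calling
        // f() does not touch the reference count at all.
        #include <memory>

        class Dataset { /* ... */ };

        void f(const std::shared_ptr<const Dataset> &pds)
        {
            // use *pds as before; the caller's reference keeps the object alive
            (void)pds;
        }

        int main()
        {
            auto pds = std::make_shared<const Dataset>();
            for (int i = 0; i < 1000000; ++i)
                f(pds);                              // no ref-count increment/decrement per call
        }

    This only helps when f() does not need to keep its own reference; if it stores the pointer, the copy and its cost are unavoidable.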

    Read the article

  • gcc compilations (sometimes) result in cpu underload

    - by confusedCoder
    I have a larger C++ program which starts out by reading thousands of small text files into memory and storing data in STL containers. This takes about a minute. Periodically, a compilation will exhibit behavior where that initial part of the program runs at about 22-23% CPU load. Once that step is over, it goes back to ~100% CPU. It is more likely to happen with the -O2 flag turned on, but not consistently. It happens even less often with the -p flag, which makes it almost impossible to profile. I did capture it once, but the gprof output wasn't helpful: everything runs at the same relative speed, just at low CPU usage. I am quite certain that this has nothing to do with multiple cores. I do have a quad-core CPU, and most of the code is multi-threaded, but I tested this issue running a single thread. Also, when I run the problematic step in multiple threads, each thread only runs at ~20% CPU. I apologize ahead of time for the vagueness of the question, but I have run out of ideas as to how to troubleshoot it further, so any hints might be helpful. UPDATE: Just to make sure it's clear, the problematic part of the code does sometimes (~30-40% of the compilations) run at 100% CPU, so it's hard to buy the (otherwise reasonable) argument that I/O is the bottleneck.

    Read the article

  • Monitoring UDP socket in glib(mm) eats up CPU time

    - by Gyorgy Szekely
    Hi, I have a GTKmm Windows application (built with MinGW) that receives UDP packets (no sending). The socket is native Winsock, and I use the glibmm IOChannel to connect it to the application main loop. The socket is read with recvfrom. My problem is: this setup eats 25% CPU time on a 3 GHz workstation. Can somebody tell me why? The application is idle in this case, and if I remove the UDP code, CPU usage drops down to almost zero. As the application has to perform some CPU-intensive tasks, I could imagine better ways to spend that 25%. Here are some code excerpts (sorry for the printf's ;) ):

        /* bind */
        void UDPInterface::bindToPort(unsigned short port)
        {
            struct sockaddr_in target;
            WSADATA wsaData;

            target.sin_family = AF_INET;
            target.sin_port = htons(port);
            target.sin_addr.s_addr = 0;

            if ( WSAStartup( 0x0202, &wsaData ) )
            {
                printf("WSAStartup failed!\n");
                exit(0); // :)
                WSACleanup();
            }

            sock = socket( AF_INET, SOCK_DGRAM, 0 );
            if (sock == INVALID_SOCKET)
            {
                printf("invalid socket!\n");
                exit(0);
            }

            if (bind(sock, (struct sockaddr*) &target, sizeof(struct sockaddr_in)) == SOCKET_ERROR)
            {
                printf("failed to bind to port!\n");
                exit(0);
            }

            printf("[UDPInterface::bindToPort] listening on port %i\n", port);
        }

        /* read */
        bool UDPInterface::UDPEvent(Glib::IOCondition io_condition)
        {
            recvfrom(sock, (char*)buf, BUF_SIZE*4, 0, NULL, NULL);
            /* process packet... */
        }

        /* glibmm connect */
        Glib::RefPtr<Glib::IOChannel> channel = Glib::IOChannel::create_from_win32_socket(udp.sock);
        Glib::signal_io().connect( sigc::mem_fun(udp, &UDPInterface::UDPEvent), channel, Glib::IO_IN );

    I've read here in some other question, and also in the glib docs (g_io_channel_win32_new_socket()), that the socket is put into nonblocking mode, and that this is "a side-effect of the implementation and unavoidable". Does this explain the CPU usage? It's not clear to me. Whether I use glib to access the socket or call recvfrom() directly doesn't seem to make much difference, since CPU is used up before any packet arrives and before the read handler gets invoked. Also, the glibmm docs state that it's OK to call recvfrom() even if the socket is polled (Glib::IOChannel::create_from_win32_socket()). I've tried compiling the program with -pg and created a per-function CPU usage report with gprof. This wasn't useful, because the time is not spent in my program but in some external glib/glibmm DLL.

    Read the article
