How does loop address alignment affect the speed on Intel x86_64?

Posted by Alexander Gololobov on Stack Overflow
Published on 2010-12-25T22:40:25Z

I'm seeing a 15% performance degradation for the same C++ code compiled to exactly the same machine instructions but located at differently aligned addresses. When my tiny main loop starts at 0x415220, it's faster than when it starts at 0x415250. I'm running this on an Intel Core 2 Duo, using gcc 4.4.5 on x86_64 Ubuntu.

Can anybody explain the cause of the slowdown, and how I can force gcc to align the loop optimally?

Here is the disassembly for both cases with profiler annotation:

  415220 576      12.56% |XXXXXXXXXXXXXX       48 c1 eb 08           shr    $0x8,%rbx
  415224 110       2.40% |XX                   0f b6 c3              movzbl %bl,%eax
  415227           0.00% |                     41 0f b6 04 00        movzbl (%r8,%rax,1),%eax
  41522c 40        0.87% |                     48 8b 04 c1           mov    (%rcx,%rax,8),%rax
  415230 806      17.58% |XXXXXXXXXXXXXXXXXXX  4c 63 f8              movslq %eax,%r15
  415233 186       4.06% |XXXX                 48 c1 e8 20           shr    $0x20,%rax
  415237 102       2.22% |XX                   4c 01 f9              add    %r15,%rcx
  41523a 414       9.03% |XXXXXXXXXX           a8 0f                 test   $0xf,%al
  41523c 680      14.83% |XXXXXXXXXXXXXXXX     74 45                 je     415283 ::Run(char const*, char const*)+0x4b3>
  41523e           0.00% |                     41 89 c7              mov    %eax,%r15d
  415241           0.00% |                     41 83 e7 01           and    $0x1,%r15d
  415245           0.00% |                     41 83 ff 01           cmp    $0x1,%r15d
  415249           0.00% |                     41 89 c7              mov    %eax,%r15d

  415250 679      13.05% |XXXXXXXXXXXXXXXX     48 c1 eb 08           shr    $0x8,%rbx
  415254 124       2.38% |XX                   0f b6 c3              movzbl %bl,%eax
  415257           0.00% |                     41 0f b6 04 00        movzbl (%r8,%rax,1),%eax
  41525c 43        0.83% |X                    48 8b 04 c1           mov    (%rcx,%rax,8),%rax
  415260 828      15.91% |XXXXXXXXXXXXXXXXXXX  4c 63 f8              movslq %eax,%r15
  415263 388       7.46% |XXXXXXXXX            48 c1 e8 20           shr    $0x20,%rax
  415267 141       2.71% |XXX                  4c 01 f9              add    %r15,%rcx
  41526a 634      12.18% |XXXXXXXXXXXXXXX      a8 0f                 test   $0xf,%al
  41526c 749      14.39% |XXXXXXXXXXXXXXXXXX   74 45                 je     4152b3 ::Run(char const*, char const*)+0x4c3>
  41526e           0.00% |                     41 89 c7              mov    %eax,%r15d
  415271           0.00% |                     41 83 e7 01           and    $0x1,%r15d
  415275           0.00% |                     41 83 ff 01           cmp    $0x1,%r15d
  415279           0.00% |                     41 89 c7              mov    %eax,%r15d

Filed under: c++, optimization, gcc, intel, x86-64
