Need some constructive criticism on my SSE/Assembly attempt

Posted by Brett on Stack Overflow
Published on 2010-05-27T17:40:14Z Indexed on 2010/05/27 17:41 UTC
Filed under: c++, assembly, sse

Hello, I'm working on converting a bit of code to SSE, and while it produces the correct output, it turns out to be slower than the standard C++ code.

The bit of code that I need to do this for is:

float ox = p2x - (px * c - py * s)*m;
float oy = p2y - (px * s - py * c)*m;
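For comparison purposes, the two expressions can be wrapped in a small scalar reference function. This is only a sketch: the names `Point2` and `reference_calc` are hypothetical and do not appear in my real code; only the two formulas are taken from above.

```cpp
// Hypothetical helper type; not part of the original code.
struct Point2 { float x; float y; };

// Scalar reference: evaluates the two expressions exactly as written above.
inline Point2 reference_calc(float px, float py, float p2x, float p2y,
                             float c, float s, float m)
{
    Point2 o;
    o.x = p2x - (px * c - py * s) * m;  // ox
    o.y = p2y - (px * s - py * c) * m;  // oy
    return o;
}
```

Any vectorised version can then be checked against `reference_calc` on a few known inputs before timing it.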

What I've got for SSE code is:

void assemblycalc(vector4 &p, vector4 &sc, float &m, vector4 &xy)
{
    vector4 r;
    __m128 scale = _mm_set1_ps(m);

    __asm
    {
        mov     eax,    p           // load the address of p into a CPU register
        mov     ebx,    sc          // load the address of sc
        movups  xmm0,   [eax]       // move the vectors into SSE registers
        movups  xmm1,   [ebx]

        mulps   xmm0,   xmm1        // multiply the elements

        movaps  xmm2,   xmm0        // make a copy of the products
        shufps  xmm2,   xmm0,  0x1B // reverse the element order

        subps   xmm0,   xmm2        // subtract the elements

        mulps   xmm0,   scale       // multiply the vector by the scale

        mov     ecx,    xy          // load the address of xy into a CPU register
        movups  xmm3,   [ecx]       // move the vector into an SSE register

        subps   xmm3,   xmm0        // xmm3 = xmm3 - xmm0

        movups  [r],    xmm3        // save the return vector; elements 0 and 2 are used
    }
}

Since the code is very difficult to read, I'll explain what I did:

load vector4,      xmm0 = p  = [px, py, px, py]
mult. by vector4,  xmm1 = sc = [c, c, s, s]
-------------------- multiply --------------------
result,            xmm0 = [px*c, py*c, px*s, py*s]

reuse result,      xmm0 = [px*c, py*c, px*s, py*s]
shuffle result,    xmm2 = [py*s, px*s, py*c, px*c]
-------------------- subtract --------------------
result,            xmm0 = [px*c-py*s, py*c-px*s, px*s-py*c, py*s-px*c]

reuse result,      xmm0 = [px*c-py*s, py*c-px*s, px*s-py*c, py*s-px*c]
load m vector4,    scale = [m, m, m, m]
-------------------- multiply --------------------
result,            xmm0 = [(px*c-py*s)*m, (py*c-px*s)*m, (px*s-py*c)*m, (py*s-px*c)*m]

load xy vector4,   xmm3 = [p2x, p2x, p2y, p2y]
reuse,             xmm0 = [(px*c-py*s)*m, (py*c-px*s)*m, (px*s-py*c)*m, (py*s-px*c)*m]
-------------------- subtract --------------------
result,            xmm3 = [p2x-(px*c-py*s)*m, p2x-(py*c-px*s)*m, p2y-(px*s-py*c)*m, p2y-(py*s-px*c)*m]

Then ox = xmm3[0] and oy = xmm3[2] (counting elements from 0), so I essentially don't use xmm3[1] or xmm3[3].
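The same register walk-through can be written with compiler intrinsics, which makes it easier to inspect each lane. This is a minimal sketch rather than the code I am actually running: `Lanes` and `sse_calc` are hypothetical names, plain `float[4]` arrays stand in for my `vector4` type, and unaligned loads are assumed as in the assembly. Counting lanes from 0, the layout in the walk-through puts ox in lane 0 and oy in lane 2.

```cpp
#include <xmmintrin.h>  // SSE intrinsics (x86 / x86-64 only)

// Hypothetical result holder; v[0] is the lowest lane of the register.
struct Lanes { float v[4]; };

// Expected layouts, as in the walk-through:
//   p  = [px,  py,  px,  py ]
//   sc = [c,   c,   s,   s  ]
//   xy = [p2x, p2x, p2y, p2y]
inline Lanes sse_calc(const float p[4], const float sc[4], float m,
                      const float xy[4])
{
    // [px*c, py*c, px*s, py*s]
    __m128 prod = _mm_mul_ps(_mm_loadu_ps(p), _mm_loadu_ps(sc));
    // 0x1B reverses the lane order: [py*s, px*s, py*c, px*c]
    __m128 rev = _mm_shuffle_ps(prod, prod, 0x1B);
    // [px*c-py*s, py*c-px*s, px*s-py*c, py*s-px*c]
    __m128 diff = _mm_sub_ps(prod, rev);
    // multiply every lane by m
    __m128 scaled = _mm_mul_ps(diff, _mm_set1_ps(m));
    // subtract from [p2x, p2x, p2y, p2y]
    __m128 out = _mm_sub_ps(_mm_loadu_ps(xy), scaled);

    Lanes r;
    _mm_storeu_ps(r.v, out);  // r.v[0] holds ox, r.v[2] holds oy
    return r;
}
```

Returning the lanes by value, instead of storing into a local that is then discarded, also makes the result visible to the caller.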

I apologize for the difficulty reading this, but I'm hoping someone can provide some guidance: the standard C++ code runs in 0.001444 ms and the SSE code runs in 0.00198 ms.
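For anyone who wants to reproduce per-call figures like these, one rough approach is to average over many iterations. The harness below is only a sketch under assumptions: `mean_ms_per_call` is a hypothetical name, it is not how the numbers above were measured, and it expects any callable that takes the loop index and returns a float.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical harness: runs `kernel` `iterations` times and returns the
// mean wall-clock time per call in milliseconds (0.0 if iterations is 0).
template <typename Kernel>
double mean_ms_per_call(Kernel kernel, std::size_t iterations)
{
    volatile float sink = 0.0f;  // keeps the compiler from discarding the work
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        sink = sink + kernel(i);
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return iterations ? elapsed.count() / static_cast<double>(iterations) : 0.0;
}
```

Timing a single call is dominated by timer resolution and call overhead, so an average over a large iteration count tends to give a more stable comparison between the scalar and SSE versions.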

Let me know if there is anything I can do to further explain/clean this up a bit. The reason I'm trying to use SSE is because I run this calculation millions of times, and it is a part of what is slowing down my current code.

Thanks in advance for any help! Brett
