Add uchar values to a ushort array with SSE2 or SSE3
- by pompolus
I have an unsigned short dst[16][16] matrix and a larger unsigned char src[m][n] matrix.
Now I have to read a 16x16 submatrix from src and add it to dst, using SSE2 or SSE3.
In an older implementation, I was sure that the summed values were never greater than 255, so I could do this:
for (int row = 0; row < 16; ++row)
{
    __m128i subMat = _mm_lddqu_si128(reinterpret_cast<const __m128i*>(src));
    dst[row] = _mm_add_epi8(dst[row], subMat);
    src += W; // Step to the next row I need to add
}
where W is the offset needed to reach the next row of the submatrix.
This code works, but now the values in src are larger and the sums can exceed 255, so I need to store them as ushort.
I've tried this:
for (int row = 0; row < 16; ++row)
{
    __m128i subMat = _mm_lddqu_si128(reinterpret_cast<const __m128i*>(src));
    dst[row] = _mm_add_epi16(dst[row], subMat);
    src += W; // Step to the next row I need to add
}
but it doesn't work.
I'm not very experienced with SSE, so any help would be appreciated.
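For reference, here is one possible way to do this, as a sketch rather than a definitive answer. The likely problem with the second loop is that _mm_add_epi16 treats the 16 loaded bytes as 8 lanes of 16 bits, so each pair of adjacent src bytes is added as a single number. The bytes need to be zero-extended to 16 bits first, which in SSE2 can be done by interleaving them with a zero register using _mm_unpacklo_epi8 and _mm_unpackhi_epi8. The function name add_block_u8_to_u16 and the parameter srcStride (playing the role of W) are made up for illustration, and the sketch assumes dst really is a plain unsigned short [16][16] array, so each row takes two 128-bit registers. It uses _mm_loadu_si128 (SSE2) instead of _mm_lddqu_si128 (SSE3).

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstddef>
#include <cstdint>

// Hypothetical helper: adds a 16x16 block of unsigned bytes starting at src
// to the 16x16 unsigned short matrix dst.
// srcStride is the number of bytes between consecutive source rows (W in the question).
void add_block_u8_to_u16(std::uint16_t dst[16][16],
                         const std::uint8_t* src,
                         std::size_t srcStride)
{
    const __m128i zero = _mm_setzero_si128();

    for (int row = 0; row < 16; ++row)
    {
        // Load 16 unsigned bytes; no alignment is assumed.
        const __m128i bytes =
            _mm_loadu_si128(reinterpret_cast<const __m128i*>(src));

        // Zero-extend to 16 bits by interleaving with zero:
        // lo holds bytes 0..7, hi holds bytes 8..15, one per 16-bit lane.
        const __m128i lo = _mm_unpacklo_epi8(bytes, zero);
        const __m128i hi = _mm_unpackhi_epi8(bytes, zero);

        // One dst row is 16 x 16 bits = two 128-bit registers.
        __m128i* out = reinterpret_cast<__m128i*>(dst[row]);
        const __m128i d0 = _mm_loadu_si128(out);
        const __m128i d1 = _mm_loadu_si128(out + 1);

        // 16-bit adds wrap around on overflow, like unsigned short arithmetic.
        _mm_storeu_si128(out,     _mm_add_epi16(d0, lo));
        _mm_storeu_si128(out + 1, _mm_add_epi16(d1, hi));

        src += srcStride;  // Step to the next source row
    }
}
```

A call would look like add_block_u8_to_u16(dst, &src[y][x], n), with n being the row length of src. If dst is known to be 16-byte aligned, the unaligned loads and stores on dst could be replaced with _mm_load_si128 and _mm_store_si128.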