I have an application that requires me to embed data in an image and recover it losslessly.
To that end I've been experimenting with steganography, specifically by modifying DCT coefficients. Apart from being lossless, the method I select must also be relatively resilient to format conversion, scaling, DSP and so on, and from the research I've done so far this seems to be the best candidate. I've seen a number of papers on the subject, but they all seem to omit specific details (some don't say whether zero-valued coefficients are modified, others whether AC coefficients are, etc.). I combined their findings and made a few modifications of my own:
1) Using a more quantized version of the DCT matrix to ensure we only modify coefficients that would still be present should the image be JPEG'ed further or processed (I'm using this in place of simply following a zig-zag pattern).
2) I'm modifying bit 4 instead of the LSB and then, based on the original value of that bit, adjusting the lower bits to minimize the difference.
3) I'm only modifying the blue channel as it should be the least visible.
This process must modify the actual image and not the DCT values stored in the file (as jsteg does), because there is no guarantee the file will be a JPEG; it may also be opened and re-saved later in a different format.
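To make modification 2 concrete, here is a minimal sketch of "set bit 4, then move to the nearest value". It is not the code from my listing: `BitEmbedSketch`, `EmbedBit` and `ReadBit` are names made up for this illustration, and the sketch assumes the bit is read from the magnitude of the coefficient, whereas the listing below works on the two's complement value directly.

```csharp
using System;

public static class BitEmbedSketch
{
    // Reads the hidden bit from the magnitude of a coefficient (assumption of this sketch).
    public static int ReadBit(int coeff, int bitPos)
    {
        return (Math.Abs(coeff) >> bitPos) & 1;
    }

    // Returns the value closest to coeff whose magnitude carries `bit` at bitPos.
    public static int EmbedBit(int coeff, int bit, int bitPos)
    {
        if (ReadBit(coeff, bitPos) == bit) return coeff;   // already correct

        int sign = coeff < 0 ? -1 : 1;
        int mag = Math.Abs(coeff);
        int step = 1 << bitPos;
        int bandStart = mag & ~(step - 1);   // start of the current (wrong) band

        int below = bandStart - 1;           // top of the band underneath
        int above = bandStart + step;        // bottom of the band above
        bool useBelow = below >= 0 && (mag - below) < (above - mag);
        return sign * (useBelow ? below : above);
    }
}
```

For example, with `bitPos = 3` a coefficient of 15 that must carry a 0 becomes 16 (distance 1) rather than 7 (distance 8), which is where this sketch differs from always stepping downwards.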
For added robustness I embed the message multiple times and, for each bit, use the value that occurs most often. I had considered using a QR code as the message data, or simply applying Reed-Solomon error correction, but for this simple application, where the "message" is usually going to be 10-32 bytes, I have plenty of room to repeat it, which should provide sufficient redundancy to recover the true bits.
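The repetition scheme described above can be sketched in isolation from the image code. This is only an illustration under my own naming (`MajorityVoteSketch`, `Repeat`, `Recover` are hypothetical); it assumes the copies are laid out back to back, as in the listing below, and it uses a strict majority, so an odd number of copies avoids ties.

```csharp
using System;

public static class MajorityVoteSketch
{
    // Concatenates `copies` copies of the message.
    public static byte[] Repeat(byte[] message, int copies)
    {
        byte[] result = new byte[message.Length * copies];
        for (int c = 0; c < copies; c++)
            Array.Copy(message, 0, result, c * message.Length, message.Length);
        return result;
    }

    // Recovers each bit as the value seen in more than half of the copies.
    public static byte[] Recover(byte[] repeated, int messageLength, int copies)
    {
        byte[] result = new byte[messageLength];
        for (int i = 0; i < messageLength * 8; i++)
        {
            int bytePos = i >> 3;
            int bitPos = i & 7;
            int ones = 0;
            for (int c = 0; c < copies; c++)
                ones += (repeated[bytePos + c * messageLength] >> bitPos) & 1;
            if (ones * 2 > copies)
                result[bytePos] |= (byte)(1 << bitPos);
        }
        return result;
    }
}
```

With 5 copies, each bit tolerates up to 2 corrupted copies. Note that this only helps against bit flips; if the decoder skips or adds a coefficient, every later bit shifts position and the vote no longer lines up.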
No matter what I do I don't seem to be able to recover the bits at the decode stage.
I've tried including/excluding various checks (even if that degrades image quality for the time being), fixed-point vs. double arithmetic, and moving the bit to encode. I suspect that the message bits are being lost during the IDCT back to image pixels.
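One way to test that suspicion is to measure how far a coefficient block moves after being written out as 8-bit pixels and transformed again. The sketch below is an assumption-laden illustration, not part of my listing: `RoundTripSketch` and its methods are hypothetical names, and it uses the same DCT scaling as the code further down. Because that scaling is orthonormal, rounding 64 pixels by at most 0.5 each cannot move any coefficient by more than 4, while clamping to 0..255 can move it much further.

```csharp
using System;

public static class RoundTripSketch
{
    static double C(int k) { return k == 0 ? 1.0 / Math.Sqrt(2.0) : 1.0; }

    static double Basis(int x, int y, int u, int v)
    {
        return Math.Cos((2 * y + 1) * v * Math.PI / 16)
             * Math.Cos((2 * x + 1) * u * Math.PI / 16);
    }

    // Orthonormal 8x8 DCT-II on level-shifted pixels.
    public static double[] Forward(double[] px)
    {
        double[] f = new double[64];
        for (int v = 0; v < 8; v++)
            for (int u = 0; u < 8; u++)
            {
                double sum = 0;
                for (int y = 0; y < 8; y++)
                    for (int x = 0; x < 8; x++)
                        sum += px[x + y * 8] * Basis(x, y, u, v);
                f[u + v * 8] = 0.25 * C(u) * C(v) * sum;
            }
        return f;
    }

    // Matching inverse transform.
    public static double[] Inverse(double[] f)
    {
        double[] px = new double[64];
        for (int y = 0; y < 8; y++)
            for (int x = 0; x < 8; x++)
            {
                double sum = 0;
                for (int v = 0; v < 8; v++)
                    for (int u = 0; u < 8; u++)
                        sum += C(u) * C(v) * f[u + v * 8] * Basis(x, y, u, v);
                px[x + y * 8] = 0.25 * sum;
            }
        return px;
    }

    // Largest coefficient change caused by storing the block as 8-bit pixels.
    public static double MaxRoundTripError(double[] coeffs)
    {
        double[] px = Inverse(coeffs);
        for (int i = 0; i < 64; i++)
        {
            double stored = Math.Round(px[i]) + 128;        // level shift to 0..255
            stored = Math.Max(0, Math.Min(255, stored));    // clamp like MapToRGBRange
            px[i] = stored - 128;
        }
        double[] back = Forward(px);
        double max = 0;
        for (int i = 0; i < 64; i++)
            max = Math.Max(max, Math.Abs(back[i] - coeffs[i]));
        return max;
    }
}
```

Even an error below 1 may matter here: the bit adjustment leaves values such as 7 or 8 right next to the point where bit 3 flips, and a coefficient that rounds to 0 on one side but not the other changes which coefficients the decoder visits.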
Any thoughts or suggestions on how to get this working would be hugely appreciated.
(PS: I am aware that the actual DCT/IDCT could be optimized from its naive O(n^4) operation using a row-column algorithm, or a fast DCT like AAN, but for now it just needs to work :) )
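For reference, the row-column variant mentioned in the PS might look roughly like this. It is a sketch only (`SeparableDctSketch`, `Dct1D` and `Dct2D` are hypothetical names) and assumes the same `u + v * 8` coefficient layout and scaling as the listing below: a 1-D transform is applied to each row and then to each column, which brings the cost per block down from O(n^4) to O(n^3).

```csharp
using System;

public static class SeparableDctSketch
{
    // 1-D orthonormal 8-point DCT-II over src[offset + n * stride], n = 0..7.
    static void Dct1D(double[] src, double[] dst, int offset, int stride)
    {
        for (int k = 0; k < 8; k++)
        {
            double sum = 0;
            for (int n = 0; n < 8; n++)
                sum += src[offset + n * stride] * Math.Cos((2 * n + 1) * k * Math.PI / 16);
            double ck = k == 0 ? 1.0 / Math.Sqrt(2.0) : 1.0;
            dst[offset + k * stride] = 0.5 * ck * sum;
        }
    }

    // 2-D DCT as a row pass followed by a column pass.
    public static double[] Dct2D(double[] block)
    {
        double[] tmp = new double[64];
        double[] result = new double[64];
        for (int row = 0; row < 8; row++) Dct1D(block, tmp, row * 8, 1);  // along x
        for (int col = 0; col < 8; col++) Dct1D(tmp, result, col, 8);     // along y
        return result;
    }
}
```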
Reference Papers:
http://www.lokminglui.com/dct.pdf
http://arxiv.org/ftp/arxiv/papers/1006/1006.1186.pdf  
Code for the Encode/Decode process in C# below:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Drawing.Imaging;
using System.Drawing;
namespace ImageKey
{
public class Encoder
{
    public const int HIDE_BIT_POS = 3; // use bit position 4 (1 << 3).
    public const int HIDE_COUNT = 16; // Number of times to repeat the message to avoid error.
    // JPEG Standard Quantization Matrix.
    // To get higher quality (above 50) multiply by (100-quality)/50;
    // for quality lower than 50 multiply by 50/quality. Then round to integers and clip to ensure only positive integers.
    public static double[] Q = {16,11,10,16,24,40,51,61,  
                                12,12,14,19,26,58,60,55,
                                14,13,16,24,40,57,69,56,
                                14,17,22,29,51,87,80,62,
                                18,22,37,56,68,109,103,77,
                                24,35,55,64,81,104,113,92,
                                49,64,78,87,103,121,120,101,
                                72,92,95,98,112,100,103,99};
    // Maximum quality quantization matrix (all 1's, so it only rounds the coefficients to integers).
    public  static double[] Q2 = {1,1,1,1,1,1,1,1,
                                  1,1,1,1,1,1,1,1,
                                  1,1,1,1,1,1,1,1,
                                  1,1,1,1,1,1,1,1,
                                  1,1,1,1,1,1,1,1,
                                  1,1,1,1,1,1,1,1,
                                  1,1,1,1,1,1,1,1,
                                  1,1,1,1,1,1,1,1};
    public static Bitmap Encode(Bitmap b, string key)
    {
        Bitmap response = new Bitmap(b.Width, b.Height, PixelFormat.Format32bppArgb);
        uint imgWidth = ((uint)b.Width) & ~((uint)7);           // Maximum usable X resolution (divisible by 8).
        uint imgHeight = ((uint)b.Height) & ~((uint)7);         // Maximum usable Y resolution (divisible by 8).
        // Start by transferring the unmodified image portions.
        // As we'll be using slightly less width/height for the encoding process we'll need the edges to be populated.
        for (int y = 0; y < b.Height; y++)
            for (int x = 0; x < b.Width; x++)
            {
                if( (x >= imgWidth && x < b.Width) || (y>=imgHeight && y < b.Height))
                    response.SetPixel(x, y, b.GetPixel(x, y));
            }
        // Setup the counters and byte data for the message to encode.
        StringBuilder sb = new StringBuilder();
        for(int i=0;i<HIDE_COUNT;i++) sb.Append(key);
        byte[] codeBytes = System.Text.Encoding.ASCII.GetBytes(sb.ToString());
        int bitofs = 0;                                         // Current bit position we've encoded to.
        int totalBits = (codeBytes.Length * 8);                 // Total number of bits to encode.
        for (int y = 0; y < imgHeight; y += 8)
        {
            for (int x = 0; x < imgWidth; x += 8)
            {
                int[] redData = GetRedChannelData(b, x, y);
                int[] greenData = GetGreenChannelData(b, x, y);
                int[] blueData = GetBlueChannelData(b, x, y);
                int[] newRedData;
                int[] newGreenData;
                int[] newBlueData;
                if (bitofs < totalBits)
                {
                    double[] redDCT = DCT(ref redData);
                    double[] greenDCT = DCT(ref greenData);
                    double[] blueDCT = DCT(ref blueData);
                    int[] redDCTI = Quantize(ref redDCT, ref Q2);
                    int[] greenDCTI = Quantize(ref greenDCT, ref Q2);
                    int[] blueDCTI = Quantize(ref blueDCT, ref Q2);
                    int[] blueDCTC = Quantize(ref blueDCT, ref Q);
                    HideBits(ref blueDCTI, ref blueDCTC, ref bitofs, ref totalBits, ref codeBytes);
                    double[] redDCT2 = DeQuantize(ref redDCTI, ref Q2);
                    double[] greenDCT2 = DeQuantize(ref greenDCTI, ref Q2);
                    double[] blueDCT2 = DeQuantize(ref blueDCTI, ref Q2);
                    newRedData = IDCT(ref redDCT2);
                    newGreenData = IDCT(ref greenDCT2);
                    newBlueData = IDCT(ref blueDCT2);
                }
                else
                {
                    newRedData = redData;
                    newGreenData = greenData;
                    newBlueData = blueData;
                }
                MapToRGBRange(ref newRedData);
                MapToRGBRange(ref newGreenData);
                MapToRGBRange(ref newBlueData);
                for(int dy=0;dy<8;dy++)
                {
                    for(int dx=0;dx<8;dx++)
                    {
                        int col = (0xff<<24) + (newRedData[dx+(dy*8)]<<16) + (newGreenData[dx+(dy*8)]<<8) + (newBlueData[dx+(dy*8)]);
                        response.SetPixel(x+dx,y+dy,Color.FromArgb(col));
                    }
                }
            }
        }
        if (bitofs < totalBits) throw new Exception("Failed to encode data - insufficient cover image coefficients");
        return (response);
    }
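    // Embeds message bits into bit HIDE_BIT_POS of every non-DC coefficient that is nonzero
    // both at full precision (DCTMatrix) and after coarse quantization (CMatrix).
    public static void HideBits(ref int[] DCTMatrix, ref int[] CMatrix, ref int bitofs, ref int totalBits, ref byte[] codeBytes)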
    {
        int tempValue = 0;
        for (int u = 0; u < 8; u++)
        {
            for (int v = 0; v < 8; v++)
            {
                if ( (u != 0 || v != 0) && CMatrix[v+(u*8)] != 0 && DCTMatrix[v+(u*8)] != 0)
                {
                    if (bitofs < totalBits)
                    {
                        tempValue = DCTMatrix[v + (u * 8)];
                        int bytePos = (bitofs) >> 3;
                        int bitPos = (bitofs) % 8;
                        byte mask = (byte)(1 << bitPos);
                        byte value = (byte)((codeBytes[bytePos] & mask) >> bitPos); // 0 or 1.
                        if (value == 0)
                        {
                            int a = DCTMatrix[v + (u * 8)] & (1 << HIDE_BIT_POS);
                            if (a != 0) DCTMatrix[v + (u * 8)] |= (1 << HIDE_BIT_POS) - 1;
                            DCTMatrix[v + (u * 8)] &= ~(1 << HIDE_BIT_POS);
                        }
                        else if (value == 1)
                        {
                            int a = DCTMatrix[v + (u * 8)] & (1 << HIDE_BIT_POS);
                            if (a == 0) DCTMatrix[v + (u * 8)] &= ~((1 << HIDE_BIT_POS) - 1);
                            DCTMatrix[v + (u * 8)] |= (1 << HIDE_BIT_POS);
                        }
                        if (DCTMatrix[v + (u * 8)] != 0)
                            bitofs++;
                        else
                            DCTMatrix[v + (u * 8)] = tempValue;
                    }
                }
            }
        }
    }
    public static void MapToRGBRange(ref int[] data)
    {
        for(int i=0;i<data.Length;i++)
        {
            data[i] += 128;
            if(data[i] < 0) data[i] = 0;
            else if(data[i] > 255) data[i] = 255;
        }
    }
    public static int[] GetRedChannelData(Bitmap b, int sx, int sy)
    {
        int[] data = new int[8 * 8];
        for (int y = sy; y < (sy + 8); y++)
        {
            for (int x = sx; x < (sx + 8); x++)
            {
                uint col = (uint)b.GetPixel(x,y).ToArgb();
                data[(x - sx) + ((y - sy) * 8)] = (int)((col >> 16) & 0xff) - 128;
            }
        }
        return (data);
    }
    public static int[] GetGreenChannelData(Bitmap b, int sx, int sy)
    {
        int[] data = new int[8 * 8];
        for (int y = sy; y < (sy + 8); y++)
        {
            for (int x = sx; x < (sx + 8); x++)
            {
                uint col = (uint)b.GetPixel(x, y).ToArgb();
                data[(x - sx) + ((y - sy) * 8)] = (int)((col >> 8) & 0xff) - 128;
            }
        }
        return (data);
    }
    public static int[] GetBlueChannelData(Bitmap b, int sx, int sy)
    {
        int[] data = new int[8 * 8];
        for (int y = sy; y < (sy + 8); y++)
        {
            for (int x = sx; x < (sx + 8); x++)
            {
                uint col = (uint)b.GetPixel(x, y).ToArgb();
                data[(x - sx) + ((y - sy) * 8)] = (int)((col >> 0) & 0xff) - 128;
            }
        }
        return (data);
    }
    public static int[] Quantize(ref double[] DCTMatrix, ref double[] Q)
    {
        int[] DCTMatrixOut = new int[8*8];
        for (int u = 0; u < 8; u++)
        {
            for (int v = 0; v < 8; v++)
            {
                DCTMatrixOut[v + (u * 8)] = (int)Math.Round(DCTMatrix[v + (u * 8)] / Q[v + (u * 8)]);
            }
        }
        return(DCTMatrixOut);
    }
    public static double[] DeQuantize(ref int[] DCTMatrix, ref double[] Q)
    {
        double[] DCTMatrixOut = new double[8*8];
        for (int u = 0; u < 8; u++)
        {
            for (int v = 0; v < 8; v++)
            {
                DCTMatrixOut[v + (u * 8)] = (double)DCTMatrix[v + (u * 8)] * Q[v + (u * 8)];
            }
        }
        return(DCTMatrixOut);
    }
    public static double[] DCT(ref int[] data)
    {
        double[] DCTMatrix = new double[8 * 8];
        for (int v = 0; v < 8; v++)
        {
            for (int u = 0; u < 8; u++)
            {
                double cu = 1;
                if (u == 0) cu = (1.0 / Math.Sqrt(2.0));
                double cv = 1;
                if (v == 0) cv = (1.0 / Math.Sqrt(2.0));
                double sum = 0.0;
                for (int y = 0; y < 8; y++)
                {
                    for (int x = 0; x < 8; x++)
                    {
                        double s = data[x + (y * 8)];
                        double dctVal = Math.Cos((2 * y + 1) * v * Math.PI / 16)
                                      * Math.Cos((2 * x + 1) * u * Math.PI / 16);
                        sum += s * dctVal;
                    }
                }
                DCTMatrix[u + (v * 8)] = (0.25 * cu * cv * sum);
            }
        }
        return (DCTMatrix);
    }
    public static int[] IDCT(ref double[] DCTMatrix)
    {
        int[] Matrix = new int[8 * 8];
        for (int y = 0; y < 8; y++)
        {
            for (int x = 0; x < 8; x++)
            {
                double sum = 0;
                for (int v = 0; v < 8; v++)
                {
                    for (int u = 0; u < 8; u++)
                    {
                        double cu = 1;
                        if (u == 0) cu = (1.0 / Math.Sqrt(2.0));
                        double cv = 1;
                        if (v == 0) cv = (1.0 / Math.Sqrt(2.0));
                        double idctVal = (cu * cv) / 4.0 * Math.Cos((2 * y + 1) * v * Math.PI / 16)
                                                   * Math.Cos((2 * x + 1) * u * Math.PI / 16);
                        sum += (DCTMatrix[u + (v * 8)] * idctVal);
                    }
                }
                Matrix[x + (y * 8)] = (int)Math.Round(sum);
            }
        }
        return (Matrix);
    }
}
public class Decoder
{
    public static string Decode(Bitmap b, int expectedLength)
    {
        expectedLength *= Encoder.HIDE_COUNT;
        uint imgWidth = ((uint)b.Width) & ~((uint)7);           // Maximum usable X resolution (divisible by 8).
        uint imgHeight = ((uint)b.Height) & ~((uint)7);         // Maximum usable Y resolution (divisible by 8).
        // Setup the counters and byte data for the message to decode.
        byte[] codeBytes = new byte[expectedLength];
        byte[] outBytes = new byte[expectedLength / Encoder.HIDE_COUNT];
        int bitofs = 0;                                         // Current bit position we've decoded to.
        int totalBits = (codeBytes.Length * 8);                 // Total number of bits to decode.
        for (int y = 0; y < imgHeight; y += 8)
        {
            for (int x = 0; x < imgWidth; x += 8)
            {
                int[] blueData = ImageKey.Encoder.GetBlueChannelData(b, x, y);
                double[] blueDCT = ImageKey.Encoder.DCT(ref blueData);
                int[] blueDCTI = ImageKey.Encoder.Quantize(ref blueDCT, ref Encoder.Q2);
                int[] blueDCTC = ImageKey.Encoder.Quantize(ref blueDCT, ref Encoder.Q);
                if (bitofs < totalBits)
                    GetBits(ref blueDCTI, ref blueDCTC, ref bitofs, ref totalBits, ref codeBytes);
            }
        }
        bitofs = 0;
        for (int i = 0; i < (expectedLength / Encoder.HIDE_COUNT) * 8; i++)
        {
            int bytePos = (bitofs) >> 3;
            int bitPos = (bitofs) % 8;
            byte mask = (byte)(1 << bitPos);
            List<int> values = new List<int>();
            int zeroCount = 0;
            int oneCount = 0;
            for (int j = 0; j < Encoder.HIDE_COUNT; j++)
            {
                int val = (codeBytes[bytePos + ((expectedLength / Encoder.HIDE_COUNT) * j)] & mask) >> bitPos;
                values.Add(val);
                if (val == 0) zeroCount++;
                else oneCount++;
            }
            if (oneCount >= zeroCount) outBytes[bytePos] |= mask;
            bitofs++;
            values.Clear();
        }
        return (System.Text.Encoding.ASCII.GetString(outBytes));
    }
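    // Reads bit HIDE_BIT_POS from the same coefficients HideBits selects; the selection is
    // recomputed from the received image, so it must match the encoder's for the bits to line up.
    public static void GetBits(ref int[] DCTMatrix, ref int[] CMatrix, ref int bitofs, ref int totalBits, ref byte[] codeBytes)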
    {
        for (int u = 0; u < 8; u++)
        {
            for (int v = 0; v < 8; v++)
            {
                if ((u != 0 || v != 0) && CMatrix[v + (u * 8)] != 0 && DCTMatrix[v + (u * 8)] != 0)
                {
                    if (bitofs < totalBits)
                    {
                        int bytePos = (bitofs) >> 3;
                        int bitPos = (bitofs) % 8;
                        byte mask = (byte)(1 << bitPos);
                        int value = DCTMatrix[v + (u * 8)] & (1 << Encoder.HIDE_BIT_POS);
                        if (value != 0) codeBytes[bytePos] |= mask;
                        bitofs++;
                    }
                }
            }
        }
    }
}
}
UPDATE:
By switching to a QR code as the source message and swapping a pair of coefficients in each block instead of manipulating bits, I've been able to get the message to survive the transform. However, to get the message through without corruption I have to adjust both coefficients as well as swap them. For example, swapping (3,4) and (4,3) in the DCT matrix and then adding 8 to one and subtracting 8 from the other (an arbitrary constant) seems to work. This survives re-JPEG'ing at a quality of 96, but any form of scaling/cropping destroys the message again. I was hoping that by operating on mid-to-low frequency values the message would be preserved even under some light image manipulation.
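In outline, the pair-swap idea from this update looks like the sketch below. It is a simplified illustration rather than my exact code: `PairSwapSketch`, `Embed` and `Extract` are hypothetical names, the two indices assume the `u + v * 8` layout used above, and instead of always adding and subtracting a constant it only pushes the pair apart when the gap is smaller than a chosen margin.

```csharp
using System;

public static class PairSwapSketch
{
    const int A = 3 + 4 * 8;   // assumed index of coefficient (3,4)
    const int B = 4 + 3 * 8;   // assumed index of coefficient (4,3)

    // Encodes one bit per block: bit 1 means block[A] > block[B], bit 0 the reverse.
    public static void Embed(double[] block, int bit, double margin)
    {
        double hi = Math.Max(block[A], block[B]);
        double lo = Math.Min(block[A], block[B]);
        if (hi - lo < margin)
        {
            // Push the pair apart symmetrically so the order survives small errors.
            double mid = (hi + lo) / 2;
            hi = mid + margin / 2;
            lo = mid - margin / 2;
        }
        block[A] = bit == 1 ? hi : lo;
        block[B] = bit == 1 ? lo : hi;
    }

    // Reads the bit back from the order of the two coefficients.
    public static int Extract(double[] block)
    {
        return block[A] > block[B] ? 1 : 0;
    }
}
```

With a margin of 16 the bit should survive any disturbance that moves each of the two coefficients by less than 8. Scaling or cropping changes the 8x8 block grid itself, so no margin on its own protects against that.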