I'm currently looking at ways to pass lists of integers in a SQL query, and try to decide which of them is best in which situation, what are the benefots of each, and what are the pitfalls, what should be avoided :)
Right now I know of 3 ways that we currently use in our application.
1) Table valued parameter:
Create a new Table Valued Parameter in sql server:
CREATE TYPE [dbo].[TVP_INT] AS TABLE(
[ID] [int] NOT NULL
)
Then run the query against it:
using (var conn = new SqlConnection(DataContext.GetDefaultConnectionString))
{
var comm = conn.CreateCommand();
comm.CommandType = CommandType.Text;
comm.CommandText = @"
UPDATE DA
SET [tsLastImportAttempt] = CURRENT_TIMESTAMP
FROM [Account] DA
JOIN @values IDs ON DA.ID = IDs.ID";
comm.Parameters.Add(new SqlParameter("values", downloadResults.Select(d => d.ID).ToDataTable()) { TypeName = "TVP_INT" });
conn.Open();
comm.ExecuteScalar();
}
The major disadvantages of this method is the fact that Linq doesn't support table valued params (if you create an SP with a TVP param, linq won't be able to run it) :(
2) Convert the list to Binary and use it in Linq!
This is a bit better.. Create an SP, and you can run it within linq :)
To do this, the SP will have an IMAGE parameter, and we'll be using a user defined function (udf) to convert this to a table.. We currently have implementations of this function written in C++ and in assembly, both have pretty much the same performance :)
Basically, each integer is represented by 4 bytes, and passed to the SP. In .NET we have an extension method that convers an IEnumerable to a byte array
The extension method:
public static Byte[] ToBinary(this IEnumerable intList)
{
return ToBinaryEnum(intList).ToArray();
}
private static IEnumerable<Byte> ToBinaryEnum(IEnumerable<Int32> intList)
{
IEnumerator<Int32> marker = intList.GetEnumerator();
while (marker.MoveNext())
{
Byte[] result = BitConverter.GetBytes(marker.Current);
Array.Reverse(result);
foreach (byte b in result)
yield return b;
}
}
The SP:
CREATE PROCEDURE [Accounts-UpdateImportAttempts]
@values IMAGE
AS
BEGIN
UPDATE DA
SET [tsLastImportAttempt] = CURRENT_TIMESTAMP
FROM [Account] DA
JOIN dbo.udfIntegerArray(@values, 4) IDs ON DA.ID = IDs.Value4
END
And we can use it by running the SP directly, or in any linq query we need
using (var db = new DataContext())
{
db.Accounts_UpdateImportAttempts(downloadResults.Select(d => d.ID).ToBinary());
// or
var accounts = db.Accounts
.Where(a => db.udfIntegerArray(downloadResults.Select(d => d.ID).ToBinary(), 4)
.Select(i => i.Value4)
.Contains(a.ID));
}
This method has the benefit of using compiled queries in linq (which will have the same sql definition, and query plan, so will also be cached), and can be used in SPs as well.
Both these methods are theoretically unlimited, so you can pass millions of ints at a time :)
3) The simple linq .Contains()
It's a more simple approach, and is perfect in simple scenarios. But is of course limited by this.
using (var db = new DataContext())
{
var accounts = db.Accounts
.Where(a => downloadResults.Select(d => d.ID).Contains(a.ID));
}
The biggest drawback of this method is that each integer in the downloadResults variable will be passed as a separate int.. In this case, the query is limited by sql (max allowed parameters in a sql query, which is a couple of thousand, if I remember right).
So I'd like to ask.. What do you think is the best of these, and what other methods and approaches have I missed?