Sensible Way to Pass Web Data in XML to a SQL Server Database
- by Emtucifor
After exploring several different ways to pass web data to a database for update purposes, I'm wondering if XML might be a good strategy. The database is currently SQL 2000. In a few months it will move to SQL 2005 and I will be able to change things if needed, but I need a SQL 2000 solution now.
First of all, the database in question uses the EAV model. I know that this kind of database is generally highly frowned on, so for the purposes of this question, please just accept that this is not going to change.
The current update method has the web server inserting values (that have all been converted first to their correct underlying types, then to sql_variant) to a temp table. A stored procedure is then run which expects the temp table to exist and it takes care of updating, inserting, or deleting things as needed.
So far, only a single element has needed to be updated at a time. But now, there is a requirement to be able to edit multiple elements at once, and also to support hierarchical elements, each of which can have its own list of attributes. Here's some example XML I hand-typed to demonstrate what I'm thinking of.
Note that in this database the Entity is Element and an ID of 0 signifies "create" aka an insert of a new item.
<Elements>
  <Element ID="1234">
    <Attr ID="221">Value</Attr>
    <Attr ID="225">287</Attr>
    <Attr ID="234">
      <Element ID="99825">
        <Attr ID="7">Value1</Attr>
        <Attr ID="8">Value2</Attr>
        <Attr ID="9" Action="delete" />
      </Element>
      <Element ID="99826" Action="delete" />
      <Element ID="0" Type="24">
        <Attr ID="7">Value4</Attr>
        <Attr ID="8">Value5</Attr>
        <Attr ID="9">Value6</Attr>
      </Element>
      <Element ID="0" Type="24">
        <Attr ID="7">Value7</Attr>
        <Attr ID="8">Value8</Attr>
        <Attr ID="9">Value9</Attr>
      </Element>
    </Attr>
    <Rel ID="3827" Action="delete" />
    <Rel ID="2284" Role="parent">
      <Element ID="3827" />
      <Element ID="3829" />
      <Attr ID="665">1</Attr>
    </Rel>
    <Rel ID="0" Type="23" Role="child">
      <Element ID="3830" />
      <Attr ID="67"
    </Rel>
  </Element>
  <Element ID="0" Type="87">
    <Attr ID="221">Value</Attr>
    <Attr ID="225">569</Attr>
    <Attr ID="234">
      <Element ID="0" Type="24">
        <Attr ID="7">Value10</Attr>
        <Attr ID="8">Value11</Attr>
        <Attr ID="9">Value12</Attr>
      </Element>
    </Attr>
  </Element>
  <Element ID="1235" Action="delete" />
</Elements>
Some Attributes are straight value types, such as AttrID 221. But AttrID 234 is a special "multi-value" type that can have a list of elements underneath it, and each one can have one or more values. Types only need to be presented when a new item is created, since the ElementID fully implies the type if it already exists. I'll probably support only passing in changed items (as detected by javascript). And there may be an Action="Delete" on Attr elements as well, since NULLs are treated as "unselected"--sometimes it's very important to know if a Yes/No question has intentionally been answered No or if no one's bothered to say Yes yet.
There is also a different kind of data, a Relationship. At this time, those are updated through individual AJAX calls as things are edited in the UI, but I'd like to include those so that changes to relationships can be canceled (right now, once you change it, it's done). So those are really elements, too, but they are called Rel instead of Element. Relationships are implemented as ElementID1 and ElementID2, so the RelID 2284 in the XML above is in the database as:
ElementID 2284 ElementID1 1234 ElementID2 3827
Having multiple children in one relationship isn't currently supported, but it would be nice later.
Does this strategy and the example XML make sense? Is there a more sensible way? I'm just looking for some broad critique to help save me from going down a bad path. Any aspect that you'd like to comment on would be helpful.
The web language happens to be Classic ASP, but that could change to ASP.Net at some point. A persistence engine like Linq or nHibernate is probably not acceptable right now--I just want to get this already working application enhanced without a huge amount of development time.
I'll choose the answer that shows experience and has a balance of good warnings about what not to do, confirmations of what I'm planning to do, and recommendations about something else to do. I'll make it as objective as possible.
P.S. I'd like to handle unicode characters as well as very long strings (10k +).
UPDATE
I have had this working for some time and I used the ADO Recordset Save-To-Stream trick to make creating the XML really easy. The result seems to be fairly fast, though if speed ever becomes a problem I may revisit this.
In the meantime, my code works to handle any number of elements and attributes on the page at once, including updating, deleting, and creating new items all in one go.
I settled on a scheme like so for all my elements:
Existing data elements
Example: input name e12345_a678 (element 12345, attribute 678), the input value is the value of the attribute.
New elements
Javascript copies a hidden template of the set of HTML elements needed for the type into the correct location on the page, increments a counter to get a new ID for this item, and prepends the number to the names of the form items.
var newid = 0;
function metadataAdd(reference, nameid, value) {
   var t = document.createElement('input');
   t.setAttribute('name', nameid);
   t.setAttribute('id', nameid);
   t.setAttribute('type', 'hidden');
   t.setAttribute('value', value);
   reference.appendChild(t);
}
function multiAdd(target, parentelementid, attrid, elementtypeid) {
   var proto = document.getElementById('a' + attrid + '_proto');
   var instance = document.createElement('p');
   target.parentNode.parentNode.insertBefore(instance, target.parentNode);
   var thisid = ++newid;
   instance.innerHTML = proto.innerHTML.replace(/{prefix}/g, 'n' + thisid + '_');
   instance.id = 'n' + thisid;  
   instance.className += ' new';
   metadataAdd(instance, 'n' + thisid + '_p', parentelementid);
   metadataAdd(instance, 'n' + thisid + '_c', attrid);
   metadataAdd(instance, 'n' + thisid + '_t', elementtypeid);
   return false;
}
Example: Template input name _a678 becomes n1_a678 (a new element, the first one on the page, attribute 678). all attributes of this new element are tagged with the same prefix of n1. The next new item will be n2, and so on. Some hidden form inputs are created:
n1_t, value is the elementtype of the element to be created
n1_p, value is the parent id of the element (if it is a relationship)
n1_c, value is the child id of the element (if it is a relationship)
Deleting elements
A hidden input is created in the form e12345_t with value set to 0. The existing controls displaying that attribute's values are disabled so they are not included in the form post. So "set type to 0" is treated as delete.
With this scheme, every item on the page has a unique name and can be distinguished properly, and every action can be represented properly.
When the form is posted, here's a sample of building one of the two recordsets used (classic ASP code):
Set Data = Server.CreateObject("ADODB.Recordset")
Data.Fields.Append "ElementID", adInteger, 4, adFldKeyColumn
Data.Fields.Append "AttrID", adInteger, 4, adFldKeyColumn
Data.Fields.Append "Value", adLongVarWChar, 2147483647, adFldIsNullable Or adFldMayBeNull
Data.CursorLocation = adUseClient
Data.CursorType = adOpenDynamic
Data.Open
This is the recordset for values, the other is for the elements themselves.
I step through the posted form and for the element recordset use a Scripting.Dictionary populated with instances of a custom Class that has the properties I need, so that I can add the values piecemeal, since they don't always come in order. New elements are added as negative to distinguish them from regular elements (rather than requiring a separate column to indicate if it is new or addresses an existing element). I use regular expression to tear apart the form keys: "^(e|n)([0-9]{1,10})_(a|p|t|c)([0-9]{0,10})$"
Then, adding an attribute looks like this.
Data.AddNew
ElementID.Value = DataID
AttrID.Value = Integerize(Matches(0).SubMatches(3))
AttrValue.Value = Request.Form(Key)
Data.Update
ElementID, AttrID, and AttrValue are references to the fields of the recordset. This method is hugely faster than using Data.Fields("ElementID").Value each time.
I loop through the Dictionary of element updates and ignore any that don't have all the proper information, adding the good ones to the recordset.
Then I call my data-updating stored procedure like so:
Set Cmd = Server.CreateObject("ADODB.Command")
With Cmd
   Set .ActiveConnection = MyDBConn
   .CommandType = adCmdStoredProc
   .CommandText = "DataPost"
   .Prepared = False
   .Parameters.Append .CreateParameter("@ElementMetadata", adLongVarWChar, adParamInput, 2147483647, XMLFromRecordset(Element))
   .Parameters.Append .CreateParameter("@ElementData", adLongVarWChar, adParamInput, 2147483647, XMLFromRecordset(Data))
End With
Result.Open Cmd ' previously created recordset object with options set
Here's the function that does the xml conversion:
Private Function XMLFromRecordset(Recordset)
   Dim Stream
   Set Stream = Server.CreateObject("ADODB.Stream")
   Stream.Open
   Recordset.Save Stream, adPersistXML
   Stream.Position = 0
   XMLFromRecordset = Stream.ReadText
End Function
Just in case the web page needs to know, the SP returns a recordset of any new elements, showing their page value and their created value (so I can see that n1 is now e12346 for example).
Here are some key snippets from the stored procedure. Note this is SQL 2000 for now, though I'll be able to switch to 2005 soon:
CREATE PROCEDURE [dbo].[DataPost]
   @ElementMetaData ntext,
   @ElementData ntext
AS
DECLARE @hdoc int
--- snip ---
EXEC sp_xml_preparedocument @hdoc OUTPUT, @ElementMetaData, '<xml xmlns:s="uuid:BDC6E3F0-6DA3-11d1-A2A3-00AA00C14882" xmlns:dt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882" xmlns:rs="urn:schemas-microsoft-com:rowset" xmlns:z="#RowsetSchema" />'
INSERT #ElementMetadata (ElementID, ElementTypeID, ElementID1, ElementID2)
SELECT *
FROM
   OPENXML(@hdoc, '/xml/rs:data/rs:insert/z:row', 0)
   WITH (
      ElementID int,
      ElementTypeID int,
      ElementID1 int,
      ElementID2 int
   )
ORDER BY ElementID -- orders negative items (new elements) first so they begin counting at 1 for later ID calculation
EXEC sp_xml_removedocument @hdoc
--- snip ---
UPDATE E
SET E.ElementTypeID = M.ElementTypeID
FROM
   Element E
   INNER JOIN #ElementMetadata M ON E.ElementID = M.ElementID
WHERE
   E.ElementID >= 1
   AND M.ElementTypeID >= 1
The following query does the correlation of the negative new element ids to the newly inserted ones:
UPDATE #ElementMetadata -- Correlate the new ElementIDs with the input rows
SET NewElementID = Scope_Identity() - @@RowCount + DataID
WHERE ElementID < 0
Other set-based queries do all the other work of validating that the attributes are allowed, are the correct data type, and inserting, updating, and deleting elements and attributes.
I hope this brief run-down is useful to others some day! Converting ADO Recordsets to an XML stream was a huge winner for me as it saved all sorts of time and had a namespace and schema already defined that made the results come out correctly.
Using a flatter XML format with 2 inputs was also much easier than sticking to some ideal about having everything in a single XML stream.