Search for multiple values in an xml column
- by Yuriy Gettya
Environment: SQL Server 2012. A primary XML index and a secondary VALUE XML index are built on the xml column.
Say I have a table Message with an xml column WordIndex. I also have a table Word with columns WordId and WordText. The XML in Message.WordIndex has the following schema:
<xs:schema attributeFormDefault="unqualified"
           elementFormDefault="qualified"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.example.com">
    <xs:element name="wi">
        <xs:complexType>
            <xs:sequence>
                <xs:element maxOccurs="unbounded" name="w">
                    <xs:complexType>
                        <xs:sequence>
                            <xs:element maxOccurs="unbounded" name="p" type="xs:unsignedByte" />
                        </xs:sequence>
                        <xs:attribute name="wid" type="xs:unsignedByte" use="required" />
                    </xs:complexType>
                </xs:element>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
</xs:schema>
And some data to go with it:
<wi xmlns="http://www.example.com">
  <w wid="1">
    <p>28</p>
    <p>72</p>
    <p>125</p>
  </w>
  <w wid="4">
    <p>89</p>
  </w>
  <w wid="5">
    <p>11</p>
  </w>
</wi>
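The index setup mentioned at the top can be pictured roughly as the following DDL. This is a sketch under assumptions: the dbo schema, the constraint and index names, and the MessageId/Subject types are made up, and WordIndex is shown untyped, although the real column may be bound to an XML schema collection built from the schema above.

```sql
-- Hypothetical DDL: names and column types are assumptions, and WordIndex is
-- left untyped (the real column may be typed against the schema above).
CREATE TABLE dbo.Message
(
    MessageId int           NOT NULL,
    Subject   nvarchar(200) NOT NULL,
    WordIndex xml           NULL,
    -- A primary XML index requires a clustered primary key on the table.
    CONSTRAINT PK_Message PRIMARY KEY CLUSTERED (MessageId)
);

-- Primary XML index: persists the shredded nodes of each WordIndex document.
CREATE PRIMARY XML INDEX PXML_Message_WordIndex
    ON dbo.Message (WordIndex);

-- Secondary VALUE index: keyed on node value and path of the primary index,
-- intended for value-based predicates such as /wi/w[@wid=64].
CREATE XML INDEX IXML_Message_WordIndex_Value
    ON dbo.Message (WordIndex)
    USING XML INDEX PXML_Message_WordIndex
    FOR VALUE;
```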
I need to search for multiple values in the WordIndex xml column, combined with either OR or AND. What I'm doing is fairly rudimentary, since I'm a n00b in XQuery (the query below is taken from debug output, hence the hard-coded values):
with xmlnamespaces(default 'http://www.example.com')
select
    m.Subject,
    m.MessageId,
    m.WordIndex.query('
        let $dummy := 0
        return
            <word_list>
            {
                for $w in /wi/w
                    where $w/@wid=64
                    return <word wid="64" pos="{data($w/p)}"/>
            }
            {
                for $w in /wi/w
                    where $w/@wid=70
                    return <word wid="70" pos="{data($w/p)}"/>
            }
            {
                for $w in /wi/w
                    where $w/@wid=63
                    return <word wid="63" pos="{data($w/p)}"/>
            }
            </word_list>
        ') as WordPosition
from
    Message as m
-- more joins go here ...
where
    -- more conditions go here ...
    and m.WordIndex.exist('/wi/w[@wid=64]') = 1
    and m.WordIndex.exist('/wi/w[@wid=70]') = 1
    and m.WordIndex.exist('/wi/w[@wid=63]') = 1
How can this be optimized?
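For reference, the OR and AND filters and the three repeated FLWOR blocks can each be written more compactly. The following is a minimal sketch, assuming SQL Server 2012's XQuery support for general comparisons against a sequence such as (1, 70). To stay self-contained it queries a hypothetical xml variable @wi holding the sample document above instead of the Message table, and the column aliases are made up. Whether these forms make better use of the VALUE index than separate exist() calls has not been measured and would need to be checked in the actual execution plan.

```sql
-- Sketch only: @wi is a hypothetical untyped xml variable holding the sample
-- document from above, used instead of Message.WordIndex.
DECLARE @wi xml = N'<wi xmlns="http://www.example.com">
  <w wid="1"><p>28</p><p>72</p><p>125</p></w>
  <w wid="4"><p>89</p></w>
  <w wid="5"><p>11</p></w>
</wi>';

WITH XMLNAMESPACES (DEFAULT 'http://www.example.com')
SELECT
    -- OR: a general comparison against a sequence is true if @wid
    -- equals ANY item in it, so one exist() covers every wid.
    @wi.exist('/wi/w[@wid = (1, 70)]')        AS AnyWordPresent,   -- 1
    -- AND: one predicate per wanted wid, all applied to the same /wi element.
    @wi.exist('/wi[w/@wid = 1][w/@wid = 4]')  AS AllWordsPresent,  -- 1
    @wi.exist('/wi[w/@wid = 1][w/@wid = 70]') AS AllWordsPresent2, -- 0
    -- One FLWOR over the filtered <w> elements instead of one block per wid.
    -- Results come back in document order, not in the order the wids are listed.
    @wi.query('
        <word_list>
        {
            for $w in /wi/w[@wid = (1, 5)]
            return <word wid="{data($w/@wid)}" pos="{data($w/p)}"/>
        }
        </word_list>')                        AS WordPosition;
```

Against the real column, the wids from the debug output would be substituted: /wi/w[@wid = (64, 70, 63)] for the OR case and /wi[w/@wid = 64][w/@wid = 70][w/@wid = 63] for the AND case. Note that the single FLWOR returns the word elements in document order, whereas the original three blocks return them in the order 64, 70, 63.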