Full Text Search in MSSQL2008 shows wrong display_item for Thai language

Posted by ensecoz on Stack Overflow See other posts from Stack Overflow or by ensecoz
Published on 2010-04-04T16:59:33Z Indexed on 2010/04/04 17:13 UTC
Read the original article Hit count: 335

Filed under:
|
|

I am working with MSSQL2008. My task is to investigate the issue where FTS cannot find the right result for Thai.

First, I have the table which enables the FTS on the column 'ItemName' which is nvarchar. The Catalog is created with the Thai Language. Note that the Thai language is one of the languages that doesn't separate the word by spaces, so '????' '???' '????' are written like this in a sentence: '???????????'

  • In the table, there are many rows that include the word (????); for example row#1 (ItemName: '???????????')
  • On the webpage, I try to search for '????' but SQLServer cannot find it.

So I try to investigate it by trying the following query in SQLServer:

select * from sys.dm_fts_parser(N'"???????????"', 1054, 0, 0)

...to see how the words are broken. The first one is the text to be broken. The second parameter is to specify that we're using Thai (WorkBreaker, so on). Here is the result:

  • row#1 (display_item: '????', source_item: '???????????')
  • row#2 (display_item: '????', source_item: '???????????')
  • row#3 (display_item: '??', source_item: '???????????')

Notice that the first and second row display the wrong display_item '?' in the '????' isn't even Thai characters. '?' in '????' is not a Thai character either.

So the question is where did those alien characters come from? I guess this why I cannot search for '????' because the word breaker is broken and keeping the wrong character in the indexes.

Please help!

© Stack Overflow or respective owner

Related posts about bugs

Related posts about sql-server