Finding Common Byte Sequences in MS SQL TEXT Column

Posted by regex on Stack Overflow
Published on 2010-04-26T23:55:20Z Indexed on 2010/04/27 0:33 UTC

Hello All,

Short Desc:

I'm curious whether I can use SQL Server Analysis Services, or some other MS SQL service, to mine a dataset for commonalities between SQL TEXT fields.

Long Desc:

I am looking at a subset of data consisting of about 10,000 rows of TEXT blobs, which are used as a notes column in an issue-tracking (ticketing) application. I would like to use something out of the box (without having to build anything) that can parse all of the rows and find commonly used byte sequences in the "Notes" column. In other words, I want to find commonly used phrases (two- to three-word phrases, so roughly 9 to 20 characters of the TEXT blob). This would help me determine whether associates' notes contain similar phrases (troubleshooting techniques) that we could standardize in our troubleshooting process flow.
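
To make the target concrete, here is a minimal Python sketch of the kind of phrase counting described above. It is only an illustration of the desired result, not an out-of-the-box SQL Server feature. It assumes the notes have already been read out of the table into a list of strings; the function name `common_phrases` and the sample notes are hypothetical. Each phrase is counted at most once per note, so the totals show how many notes share a phrase.

```python
import re
from collections import Counter

def common_phrases(notes, min_words=2, max_words=3, top=10):
    """Count word n-grams across notes; each phrase counts once per note."""
    counts = Counter()
    for note in notes:
        # Lowercase and split into simple word tokens (an assumed tokenizer).
        words = re.findall(r"[a-z0-9']+", note.lower())
        seen = set()
        for n in range(min_words, max_words + 1):
            for i in range(len(words) - n + 1):
                seen.add(" ".join(words[i:i + n]))
        # Updating with a set adds 1 per distinct phrase in this note.
        counts.update(seen)
    return counts.most_common(top)

# Hypothetical sample notes standing in for rows of the "Notes" column.
notes = [
    "Rebooted the router and cleared the cache.",
    "Customer rebooted the router, issue resolved.",
    "Cleared the cache and rebooted the router again.",
]

for phrase, note_count in common_phrases(notes, top=5):
    print(note_count, phrase)
```

At 10,000 rows this fits comfortably in memory. Filtering out phrases made up only of common words such as "the" or "and" would probably be needed before the results are useful.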

Closing Note:

I'd really rather not build an application to do this, as my own method would probably not be the most efficient way to do it.

Hopefully all this makes sense. Please let me know in the comments if anything needs clarification. Thanks in advance for your help.

© Stack Overflow or respective owner
