Finding Common Byte Sequences in MS SQL TEXT Column

Posted by regex on Stack Overflow
Published on 2010-04-26T23:55:20Z Indexed on 2010/04/27 0:33 UTC

Filed under: sql-server | data-mining | analysis-services | data-analysis

Hello All,

Short Desc:

I'm curious whether I can use SQL Server Analysis Services, or some other MS SQL service, to mine a dataset and show commonalities between its SQL TEXT fields.

Long Desc:

I am looking at a subset of data consisting of about 10,000 rows of TEXT blobs, which are used as a notes column in issue-tracking (ticketing) software. I would like to use something out of the box (without having to build something) that can parse all of the rows and find commonly used byte sequences in the "Notes" column. In other words, I want to find commonly used phrases (two- to three-word phrases, so sections of roughly 9 to 20 characters within the TEXT blob). This will help me determine whether associates' notes contain similar phrases (troubleshooting techniques) that we could standardize in our troubleshooting process flow.
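
To make the goal concrete, here is a minimal sketch of the kind of phrase counting described above. It is not an out-of-the-box SQL Server feature, and several things in it are assumptions for illustration only: the notes are assumed to have already been exported from the "Notes" column into a list of strings, the function name `common_phrases` and the sample notes are hypothetical, and it splits on words rather than raw bytes. Each phrase is counted once per row, so the result reads as "how many notes contain this phrase".

```python
import re
from collections import Counter


def common_phrases(notes, min_words=2, max_words=3, top=10):
    """Return the most common word phrases, counted once per note."""
    counts = Counter()
    for note in notes:
        # Lowercase and keep only word-like tokens, dropping punctuation.
        words = re.findall(r"[a-z0-9']+", note.lower())
        seen = set()  # phrases found in this note (deduplicated)
        for n in range(min_words, max_words + 1):
            for i in range(len(words) - n + 1):
                seen.add(" ".join(words[i:i + n]))
        counts.update(seen)
    return counts.most_common(top)


# Hypothetical sample notes standing in for rows of the "Notes" column.
sample_notes = [
    "Rebooted the router and cleared the cache.",
    "Cleared the cache, then rebooted the router.",
    "User asked to reset password; cleared the cache first.",
]

for phrase, rows in common_phrases(sample_notes, top=5):
    print(rows, phrase)
```

Counting per row rather than per occurrence keeps one very long note from dominating the ranking. With about 10,000 rows this should fit comfortably in memory, though that has not been measured here.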

Closing Note:

I'd really rather not build an application for this, as my own method would probably not be the most efficient way to do it.

Hopefully all this makes sense. Please let me know in the comments if anything needs clarification. Thanks in advance for your help.

© Stack Overflow or respective owner
