Configuring Full-Text Search for pdf and docx files

Posted by Lukasz Kurylo on Geeks with Blogs
Published on Wed, 10 Oct 2012 18:23:22 GMT

I think it was in May that I built a small filters module based on Full-Text Search. I configured my dev machine, then two testing servers (the ones we use in our company for internal testing before deploying to the client), and then the client's testing server. Until last week the build was still sitting on that testing server, and we finally got feedback that we could deploy it to production.

I'll only say that I lost half a day because I did not remember exactly what I had done to configure FTS on the previous servers, and I had no notes. I foolishly trusted my memory. Lesson learned.

 

For future reference, here are the steps to configure FTS for searching in *.pdf and *.docx files (and, by the way, in other Office files like *.xlsx).

 

1. Download the *.pdf IFilter for FTS from the page (link) and install it.

2. Add the directory where you installed the plugin to the global PATH system variable, including the \bin at the end. The default for this version is: C:\Program Files\Adobe\Adobe PDF iFilter 9 for 64-bit platforms\bin

3. Download FilterPackx64.exe from the page (link) and install it.

4. Now execute the following procedures from SSMS:

- EXEC sp_fulltext_service 'load_os_resources', 1;

- EXEC sp_fulltext_service 'verify_signature', 0;

5. Restart the server

6. Now we must check whether the plugins are visible:

- SELECT document_type, path FROM sys.fulltext_document_types WHERE document_type = '.pdf';

- SELECT document_type, path FROM sys.fulltext_document_types WHERE document_type = '.docx';

7. If each query returns a row, we can assume that everything is OK*.

8. Now we can create a catalog for FTS and indexes on the appropriate columns.
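
For the last step, here is a minimal sketch of what the catalog and index could look like. The table dbo.Documents, its columns Id, FileExtension and Content, the key index PK_Documents and the catalog name DocumentsCatalog are all made up for illustration and do not come from my project, so adjust them to your schema. The part that matters for *.pdf and *.docx is the TYPE COLUMN, which tells FTS which IFilter to use for each row.

```sql
-- Hypothetical schema (an assumption, not from the original project):
--   dbo.Documents(Id INT with primary key PK_Documents,
--                 FileExtension NVARCHAR(10), Content VARBINARY(MAX))
-- FileExtension holds values such as '.pdf' or '.docx', i.e. the same form
-- as document_type in sys.fulltext_document_types.

CREATE FULLTEXT CATALOG DocumentsCatalog;
GO

CREATE FULLTEXT INDEX ON dbo.Documents
(
    Content TYPE COLUMN FileExtension LANGUAGE 1033  -- 1033 = English (US)
)
KEY INDEX PK_Documents       -- must be a unique, single-column, non-nullable index
ON DocumentsCatalog
WITH CHANGE_TRACKING AUTO;   -- populate now and track later changes automatically
GO

-- Example search once the population has finished ('invoice' is just a sample term)
SELECT Id, FileExtension
FROM dbo.Documents
WHERE CONTAINS(Content, N'invoice');
```

GO is the SSMS batch separator, so run this from SSMS or sqlcmd. If the search returns nothing for files you know contain the term, check the footnote below before blaming the query.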

 

 

*I lost a lot of hours figuring out why the *.pdf plugin wasn't indexing any file in the database, even though sys.fulltext_document_types had a row for it. After a deeper investigation I found that the *.pdf files actually were being indexed, sort of: for each file only the EOF marker was added to the index and nothing more. In the end the problem was that I had forgotten the \bin at the end of the plugin path in the PATH variable.
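
A quick way to catch this earlier is to look at what actually landed in the index instead of trusting sys.fulltext_document_types alone. A sketch, assuming the full-text index lives on a hypothetical table dbo.Documents in the current database; sys.dm_fts_index_keywords itself is a standard function in SQL Server 2008 and later:

```sql
-- List indexed terms other than the end-of-file marker.
-- dbo.Documents is a placeholder table name; run this in the database that owns the index.
SELECT TOP (50) display_term, column_id, document_count
FROM sys.dm_fts_index_keywords(DB_ID(), OBJECT_ID(N'dbo.Documents'))
WHERE display_term <> N'END OF FILE'
ORDER BY document_count DESC;
```

If this returns no rows after the population has finished, only the end-of-file marker was indexed, which is what I was seeing. If the table mixes *.pdf with other file types, it is probably easier to test on a table that holds only *.pdf rows, because terms from the other files would hide the problem.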
