ASP.NET library to extract plain text from .docx, .pptx and .xlsx (for a search index)

Posted by Myster on Stack Overflow
Published on 2010-05-06T03:37:59Z


Is there an existing library that extracts plain text from .docx, .pptx and .xlsx files?

I need this to populate a Lucene.Net index.

I've found this example, which extracts text from .docx files and seems to work. Before I build my own solution on top of it, though, I'd like to know whether something already exists that also handles the other two formats.
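For reference: all three formats are Office Open XML packages, which are ZIP archives of XML parts. A basic extractor therefore needs only a ZIP reader and an XML parser. The following is a minimal sketch in Python for brevity. It is not the linked example and not any library's API, though the same steps port to C#. It assumes that Word body text sits in <w:t> runs in word/document.xml, slide text sits in <a:t> runs in ppt/slides/slideN.xml, and Excel strings sit in <t> entries in xl/sharedStrings.xml. The names extract_text, _paragraphs and _natural_key are hypothetical. The sketch has some known gaps: it skips Word headers, footers and footnotes, which live in separate parts; it may repeat text-box paragraphs; and it reads only the Excel shared-string table.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# Namespaces of the OOXML elements that carry paragraph and text content.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
A = "{http://schemas.openxmlformats.org/drawingml/2006/main}"
S = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def _paragraphs(xml_bytes, para_tag, text_tag):
    """Join the text runs of each paragraph-like element into one string."""
    root = ET.fromstring(xml_bytes)
    out = []
    for para in root.iter(para_tag):
        # Runs can split a word ("Hel" + "lo"), so join them without spaces.
        line = "".join(t.text or "" for t in para.iter(text_tag))
        if line:
            out.append(line)
    return out


def _natural_key(name):
    # Sort slide1.xml, slide2.xml, ..., slide10.xml in numeric order.
    return [int(p) if p.isdigit() else p for p in re.split(r"(\d+)", name)]


def extract_text(source):
    """Return plain text (one line per paragraph) from a .docx, .pptx or
    .xlsx given as a path or binary file object. Hypothetical helper."""
    with zipfile.ZipFile(source) as z:
        names = z.namelist()
        if "word/document.xml" in names:
            # Word body only; text boxes nested in a paragraph may repeat.
            paras = _paragraphs(z.read("word/document.xml"), W + "p", W + "t")
        elif "ppt/presentation.xml" in names:
            # One XML part per slide; runs are <a:t> inside <a:p>.
            slides = sorted(
                (n for n in names
                 if n.startswith("ppt/slides/slide") and n.endswith(".xml")),
                key=_natural_key,
            )
            paras = []
            for slide in slides:
                paras += _paragraphs(z.read(slide), A + "p", A + "t")
        elif "xl/workbook.xml" in names:
            # Shared-string table only: numeric cells, formulas and inline
            # strings in xl/worksheets/*.xml are not read by this sketch.
            paras = []
            if "xl/sharedStrings.xml" in names:
                paras = _paragraphs(z.read("xl/sharedStrings.xml"),
                                    S + "si", S + "t")
        else:
            raise ValueError("not a .docx, .pptx or .xlsx package")
    return "\n".join(paras)
```

For example, extract_text("report.docx") (a hypothetical file) returns the document's paragraphs as newline-separated text, ready to be added as a field of the index document. Joining runs within a paragraph, rather than across the whole part, keeps words that Word splits across formatting runs intact.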

© Stack Overflow or respective owner