How to remove control characters from a UTF-8 string

Posted by Mimefilt on Stack Overflow
Published on 2010-12-21T15:28:12Z

Hi there,

I have a VB.NET program that processes the content of documents. It handles high volumes of documents as a batch (more than 2 million documents, about 1 TB in total). Some of these documents may contain control characters or characters like U+F0E8 (http://www.fileformat.info/info/unicode/char/f0e8/browsertest.htm).
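One way to see what kind of characters these are is to look at their Unicode general category. The sketch below is in Python for illustration only; `describe`, `suspicious_chars` and the sample string are hypothetical. It lists every control (`Cc`) or private-use (`Co`) character in a string. It suggests that U+F0E8 lies in the Private Use Area, and that tab and newline are themselves control characters, which is why they need an explicit exception.

```python
import unicodedata

def describe(ch: str) -> str:
    """Format one character as 'U+XXXX <general category>'."""
    return f"U+{ord(ch):04X} {unicodedata.category(ch)}"

def suspicious_chars(text: str) -> list[str]:
    """List every control (Cc) or private-use (Co) character in text."""
    return [describe(ch) for ch in text if unicodedata.category(ch) in ("Cc", "Co")]

# Hypothetical sample: tab, newline, BEL (U+0007) and the private-use U+F0E8.
sample = "ok\tline\n\x07bell\uf0e8"
print(suspicious_chars(sample))
# ['U+0009 Cc', 'U+000A Cc', 'U+0007 Cc', 'U+F0E8 Co']
```

Space is category `Zs`, so it never shows up in this list.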

Is there an easy and, above all, fast way to remove these characters (except space, newline, tab, ...)? If the answer is a regex, does anyone have a complete regex for me?
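A minimal sketch of both approaches, again in Python for illustration and under the assumption that "control characters" means general categories `Cc` (controls) and `Co` (private use, which includes U+F0E8), with tab, line feed and carriage return kept. The names `build_delete_table`, `clean_translate`, `CONTROL_RE` and `clean_regex` are hypothetical. The first variant builds a `str.translate` deletion table once; the second is a single precompiled regex that spells out the same code point ranges.

```python
import re
import sys
import unicodedata

# Assumption: these control characters are wanted and must survive.
# (Space is category Zs, so it is never stripped anyway.)
KEEP = {"\t", "\n", "\r"}

# Assumption: strip controls (Cc) and private-use characters (Co).
# Format characters (Cf) or unassigned code points (Cn) could be added here.
STRIP_CATEGORIES = {"Cc", "Co"}

def build_delete_table() -> dict[int, None]:
    """Map every code point to delete onto None, as str.translate expects."""
    table: dict[int, None] = {}
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if ch not in KEEP and unicodedata.category(ch) in STRIP_CATEGORIES:
            table[cp] = None
    return table

# Build once at startup; scanning every code point is the slow part.
DELETE_TABLE = build_delete_table()

def clean_translate(text: str) -> str:
    """Remove unwanted characters using the precomputed translate table."""
    return text.translate(DELETE_TABLE)

# The same set as an explicit regex: C0 controls except \t \n \r, DEL plus
# the C1 controls, and the three Private Use Areas (BMP, planes 15 and 16).
CONTROL_RE = re.compile(
    r"[\x00-\x08\x0B\x0C\x0E-\x1F\x7F-\x9F"
    r"\uE000-\uF8FF\U000F0000-\U000FFFFD\U00100000-\U0010FFFD]"
)

def clean_regex(text: str) -> str:
    """Remove unwanted characters with one precompiled regex."""
    return CONTROL_RE.sub("", text)

# Hypothetical sample: BEL, U+F0E8 and NEL (U+0085, a C1 control) are removed.
sample = "ok\tline\n\x07bell\uf0e8\x85end\r\n"
print(repr(clean_translate(sample)))  # 'ok\tline\nbellend\r\n'
print(clean_translate(sample) == clean_regex(sample))  # True
```

Both variants do their setup (the table or the compiled pattern) once, which matters when the same cleanup runs over millions of documents. In .NET, whose regex flavor supports Unicode categories and character-class subtraction, the same set can likely be written as `[\p{Cc}\p{Co}-[\t\n\r]]`.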

Thanks!
