Handling UTF-8 with BOM in HTTP

Posted by Alois Mahdal on Server Fault
Published on 2012-04-15T09:16:53Z Indexed on 2012/04/15 11:33 UTC

Say I have a script which at some point serves a plain text file as the response body (right after the "\n\n" that ends the headers). These files are provided by users, but I can expect them to be UTF-8, so I hard-wire Content-Type: text/plain; charset=UTF-8.
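As a rough sketch of that setup (Python purely for illustration; `build_response` is a hypothetical name and the real script may look quite different), the output is just the hard-wired header, a blank line, and then the file bytes untouched:

```python
def build_response(body: bytes) -> bytes:
    """Prefix the file content with the hard-wired header block.

    Hypothetical helper: it assumes a CGI-style script where the
    headers are terminated by an empty line ("\n\n").
    """
    header = b"Content-Type: text/plain; charset=UTF-8\n\n"
    # The body is passed through as-is, so a leading BOM in the file
    # would end up as the first three bytes of the served content.
    return header + body
```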

But while I can teach users to save everything as UTF-8, I can't be sure the files will come without a BOM ("\xEF\xBB\xBF"): on Windows at least, common plain text editors don't make the distinction obvious, and they don't all use the same default.

So what about files created on Windows, which may or may not start with a BOM? Should (or will) the server or the UA get rid of this debris for me? Or is it my job to prepare clean UTF-8, i.e. open each file and check whether a BOM needs to be removed?
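To make that last option concrete, here is a minimal sketch of what "check and remove" could look like, again in Python. The helper name `strip_utf8_bom` is my own; `codecs.BOM_UTF8` is the standard library constant for the three bytes EF BB BF. It says nothing about whether the server or the UA would handle this anyway.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM, if one is present.

    Hypothetical helper: it only looks at the first three bytes and
    does not validate that the rest of the content is valid UTF-8.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data
```

A script could run the file content through this once, just before writing the body. Files without a BOM come back unchanged.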

© Server Fault or respective owner
