Normalizing (WebDAV) Unicode paths

Posted by Evert on Stack Overflow
Published on 2010-03-26T11:04:53Z Indexed on 2010/03/26 12:23 UTC

Hi guys,

I'm working on a WebDAV implementation for PHP. In order to make it easier for Windows and other operating systems to work together, I need to jump through some character encoding hoops.

Windows uses ISO-8859-1 in its HTTP requests, while most other clients encode anything beyond ASCII as UTF-8.

My first approach was to ignore this altogether, but I quickly ran into issues when returning URLs. I then figured it's probably best to normalize all URLs.

Take ü as an example. OS X sends this over the wire as:

u%CC%88 (a plain u followed by %CC%88, the UTF-8 encoding of the combining diaeresis, code point U+0308)

Windows sends this as:

%FC (latin1)

But when I run utf8_encode on %FC, I get:

%C3%BC (this is codepoint U+00FC)
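To make the three wire forms above easier to compare, here is a small sketch that decodes each one and lists the code points it contains. It is written in Python purely for illustration (the helper name `describe` is hypothetical), and it assumes the charset of each form is already known, which is exactly what a server cannot take for granted.

```python
import unicodedata
from urllib.parse import unquote_to_bytes


def describe(encoded, charset):
    """Percent-decode `encoded`, decode the bytes with `charset`,
    and return one 'U+XXXX NAME' string per code point."""
    raw = unquote_to_bytes(encoded)
    text = raw.decode(charset)
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


# The forms quoted in the post, with the charset each one is assumed to use.
osx = describe("u%CC%88", "utf-8")      # decomposed: base letter + combining mark
win = describe("%FC", "latin-1")        # single Latin-1 byte
enc = describe("%C3%BC", "utf-8")       # result of utf8_encode on the Latin-1 byte

for label, points in (("OS X", osx), ("Windows", win), ("utf8_encode", enc)):
    print(label, points)
```

The Windows form and the utf8_encode result decode to the same single code point, U+00FC. The OS X form decodes to two code points, so a byte-for-byte or code-point-for-code-point comparison treats it as a different name.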

Should I treat %C3%BC and u%CC%88 as the same thing? If so, how? Leaving the URL untouched seems to work OK for Windows: it somehow understands that it's a Unicode character. Updating the same file, however, throws an error (for no apparent reason).
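For reference, U+00FC and the sequence u + U+0308 are canonically equivalent in Unicode, so one possible approach is to bring every incoming path to a single normalization form (NFC here) before comparing or storing it. The following is a minimal sketch of that idea, not a definitive implementation. It is in Python for illustration, and `normalize_path` is a hypothetical helper; in PHP, the intl extension's `Normalizer::normalize` does the same normalization. The "try UTF-8, fall back to Latin-1" step is a heuristic assumption, since some Latin-1 byte sequences also happen to be valid UTF-8.

```python
import unicodedata
from urllib.parse import quote, unquote_to_bytes


def normalize_path(encoded_path):
    """Return a percent-encoded UTF-8 path in Unicode NFC form."""
    raw = unquote_to_bytes(encoded_path)
    try:
        # Most clients send UTF-8.
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 is Latin-1.
        text = raw.decode("latin-1")
    # Compose base letters and combining marks (u + U+0308 becomes U+00FC).
    text = unicodedata.normalize("NFC", text)
    # Re-encode as UTF-8 and percent-encode, keeping path separators.
    return quote(text, safe="/")


# All three forms from the question end up as the same path.
print(normalize_path("/files/u%CC%88.txt"))
print(normalize_path("/files/%FC.txt"))
print(normalize_path("/files/%C3%BC.txt"))
```

With this in place, lookups and comparisons work on the normalized form. Whether a response should echo the normalized URL or the form the client originally sent is a separate decision that the sketch does not address.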

I'd be happy to provide more information.

© Stack Overflow or respective owner
