I have some legacy code I am dealing with (so no I can't just use a URL with an encoded filename component) that allows a user to download a file from our website. Since our filenames are often in many different languages they are all stored as UTF-8. I wrote some code to handle the RFC5987 conversion to a proper filename* parameter. This works great until I have a filename with non-ascii characters and spaces. Per RFC, the space character is not part of attr_char so it gets encoded as %20. I have new versions of Chrome as well as Firefox and they are all converting to %20 to + on download. I have tried not encoding the space and putting the encoded filename in quotes and get the same result. I have sniffed the response coming from the server to verify that the servlet container wasn't mucking with my headers and they look correct to me. The RFC even has examples that contain %20. Am I missing something, or do all of these browsers have a bug related to this?
Many thanks in advance. The code I use to encode the filename is below.
Peter
public static boolean bcsrch(final char[] chars, final char c) {
final int len = chars.length;
int base = 0;
int last = len - 1; /* Last element in table */
int p;
while (last >= base) {
p = base + ((last - base) >> 1);
if (c == chars[p])
return true; /* Key found */
else if (c < chars[p])
last = p - 1;
else
base = p + 1;
}
return false; /* Key not found */
}
public static String rfc5987_encode(final String s) {
final int len = s.length();
final StringBuilder sb = new StringBuilder(len << 1);
final char[] digits = {'0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F'};
final char[] attr_char = {'!','#','$','&','\'','+','-','.','0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z','^','_','a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z','|', '~'};
for (int i = 0; i < len; ++i) {
final char c = s.charAt(i);
if (bcsrch(attr_char, c))
sb.append(c);
else {
final char[] encoded = {'%', 0, 0};
encoded[1] = digits[0x0f & (c >>> 4)];
encoded[2] = digits[c & 0x0f];
sb.append(encoded);
}
}
return sb.toString();
}
Update
Here is a screen shot of the download dialog I get for a file with Chinese characters with spaces as mentioned in my comment.