Java UTF-8 to ASCII conversion with supplements

Posted by bozo on Stack Overflow
Published on 2010-03-30T12:43:19Z Indexed on 2010/03/30 13:13 UTC

We accept all sorts of national characters in UTF-8 strings on input, and we need to convert them to ASCII strings on output for some legacy use. (We don't accept Chinese or Japanese characters, only European languages.)

We have a small utility to get rid of all the diacritics:

import java.text.Normalizer;

public static final String toBaseCharacters(final String sText) {
    if (sText == null || sText.length() == 0)
        return sText;

    final char[] chars = sText.toCharArray();
    final int iSize = chars.length;
    final StringBuilder sb = new StringBuilder(iSize);

    for (int i = 0; i < iSize; i++) {
        String sLetter = new String(new char[] { chars[i] });
        // NFD (not NFC) decomposes an accented letter into its base letter
        // followed by combining marks, e.g. "é" becomes "e" + U+0301.
        sLetter = Normalizer.normalize(sLetter, Normalizer.Form.NFD);

        // Keep only the base letter. Characters that have no decomposition
        // pass through unchanged.
        sb.append(sLetter.charAt(0));
    }
    return sb.toString();
}

The question is how to replace characters such as the German sharp s (ß), Ð and đ, which get through the above normalization method, with their supplements (in the case of ß, the supplement would probably be "ss", and in the case of Ð it would be either "D" or "Dj").

Is there some simple way to do it, without a million .replaceAll() calls?

So for example: Ðonardan = Djonardan, Blaß = Blass and so on.

We can replace all "problematic" chars with empty space, but would like to avoid this to make the output as similar to the input as possible.
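To make the goal concrete, here is a minimal sketch of a table-driven approach, not a definitive implementation. The class name AsciiFolder, the method toAscii and the entries in the table are assumptions made for illustration; a real table would need an entry for every character we want to keep. It decomposes the input with NFD, looks each character up in the table, keeps plain ASCII, and drops everything else (including the combining marks).

```java
import java.text.Normalizer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: name and table contents are illustrative assumptions.
final class AsciiFolder {

    // Supplements for characters that NFD does not decompose.
    private static final Map<Character, String> SUPPLEMENTS =
            new HashMap<Character, String>();
    static {
        SUPPLEMENTS.put('\u00DF', "ss"); // ß, German sharp s
        SUPPLEMENTS.put('\u00D0', "Dj"); // Ð, capital eth
        SUPPLEMENTS.put('\u0110', "Dj"); // Đ, capital D with stroke
        SUPPLEMENTS.put('\u0111', "dj"); // đ, small d with stroke
    }

    static String toAscii(final String text) {
        if (text == null || text.length() == 0)
            return text;

        // Split accented letters into base letter + combining marks.
        final String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        final StringBuilder sb = new StringBuilder(decomposed.length());

        for (int i = 0; i < decomposed.length(); i++) {
            final char c = decomposed.charAt(i);
            final String supplement = SUPPLEMENTS.get(c);
            if (supplement != null) {
                sb.append(supplement);   // mapped character, e.g. ß becomes "ss"
            } else if (c < 128) {
                sb.append(c);            // already ASCII
            }
            // Anything else (combining marks, unmapped characters) is dropped.
        }
        return sb.toString();
    }
}
```

With this table, AsciiFolder.toAscii("Blaß") gives "Blass". A single pass with one map lookup per character avoids chaining .replaceAll() calls, at the cost of maintaining the table by hand.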

Thank you for your answers,

Bozo

Filed under: java, special-characters, character-encoding
