How to map code points to unicode characters depending on the font used?

Posted by Alex Schröder on Stack Overflow See other posts from Stack Overflow or by Alex Schröder
Published on 2012-10-09T15:16:38Z Indexed on 2012/10/09 15:37 UTC
Read the original article Hit count: 233

Filed under:
|
|
|
|

The client prints labels and has been using a set of symbolic (?) fonts to do this. The application uses a single byte database (Oracle with Latin-1). The old application I am replacing was not Unicode aware. It somehow did OK. The replacement application I am writing is supposed to handle the old data.

The symbols picked from the charmap application often map to particular Unicode characters, but sometimes they don't. What looks like the Moon using the LAB3 font, for example, is in fact U+2014 (EM DASH). When users paste this character into a Swing text field, the character has the code point 8212. It was "moved" into the Private Use Area (by Windows? Java?). When saving this character to the database, Oracle decides that it cannot be safely encoded and replaces it with the dreaded ¿. Thus, I started shifting the characters by 8000: -= 8000 when saving, += 8000 when displaying the field. Unfortunately I discovered that other characters were not shifted by the same amount. In one particular font, for example, ž has the code point 382, so I shifted it by +/-256 to "fix" it.

By now I'm dreading the discovery of more strange offsets and I wonder: Can I get at this mapping using Java? Perhaps the TTF font has a list of the 255 glyphs it encodes and what Unicode characters those correspond to and I can do it "right"?

Right now I'm using the following kludge:

static String fromDatabase(String str, String fontFamily) {

  if (str != null && fontFamily != null) {
    Font font = new Font(fontFamily, Font.PLAIN, 1);

    boolean changed = false;
    char[] chars = str.toCharArray();
    for (int i = 0; i < chars.length; i++) {
      if (font.canDisplay(chars[i] + 0xF000)) {
        // WE8MSWIN1252 + WinXP
        chars[i] += 0xF000;
        changed = true;
      }
      else if (chars[i] >= 128 && font.canDisplay(chars[i] + 8000)) {
        // WE8ISO8859P1 + WinXP
        chars[i] += 8000;
        changed = true;
      }
      else if (font.canDisplay(chars[i] + 256)) {
        // ž in LAB1 Eastern = 382
        chars[i] += 256;
        changed = true;
      }
    }
    if (changed) str = new String(chars);
  }
  return str;
}

static String toDatabase(String str, String fontFamily) {

  if (str != null && fontFamily != null) {
    boolean changed = false;
    char[] chars = str.toCharArray();
    for (int i = 0; i < chars.length; i++) {
      int chr = chars[i];
      if (chars[i] > 0xF000) {
        // WE8MSWIN1252 + WinXP
        chars[i] -= 0xF000;
        changed = true;
      }
      else if (chars[i] > 8000) {
        // WE8ISO8859P1 + WinXP
        chars[i] = (char) (chars[i] - 8000);
        changed = true;
      }
      else if (chars[i] > 256) {
        // ž in LAB1 Eastern = 382
        chars[i] = (char) (chars[i] - 256);
        changed = true;
      }
    }
    if (changed) return new String(chars);
  }

  return str;
}

© Stack Overflow or respective owner

Related posts about java

Related posts about Windows