How to support 3-byte UTF-8 Characters in ANT

Posted by efelton on Stack Overflow See other posts from Stack Overflow or by efelton
Published on 2012-06-18T19:38:04Z Indexed on 2012/06/19 15:16 UTC
Read the original article Hit count: 754

Filed under:
|

I am trying to support UTF-8 characters in my ANT script.

As long as the character string are made up of 2-byte UTF-8 characters, such as:

  • Lògìñ
  • Ùsèr ÌÐ

Then things work fine.

When I use Unicode Han Character:

  • ?

Which, according to this site: http://www.fileformat.info/info/unicode/char/6211/index.htm Has a UTF-8 encoding of 0xE6 0x88 0x91

I can see in UltraEdit, my input properties file has the values "E6 88 91" all in a row, so I'm fairly confident that my input is correct. And When I open the same file in Notepad++ I can see all the characters correctly.

Here is my Build Script:

    <?xml version="1.0" encoding="UTF-8" ?>

    <project
        name="utf8test"
        default="all"
        basedir=".">

        <target name="all">
            <loadproperties  encoding="UTF-8" srcfile="./apps.properties.all.txt"  />

            <echo>No encoding ${common.app.name}</echo>
            <echo encoding="UTF-8">UTF-8 ${common.app.name}</echo>
            <echo encoding="UnicodeLittle">UnicodeLittle ${common.app.name}</echo>
            <echo encoding="UnicodeLittleUnmarked">UnicodeLittleUnmarked ${common.app.name}</echo>
            <echo>${common.app.ServerName}</echo>
            <echo>${bb.vendor}</echo>

            <echo>No encoding ${common.app.UserIdText}</echo>
            <echo encoding="UTF-8">UTF-8 ${common.app.UserIdText}</echo>
            <echo encoding="UnicodeLittle">UnicodeLittle ${common.app.UserIdText}</echo>
            <echo encoding="UnicodeLittleUnmarked">UnicodeLittleUnmarked ${common.app.UserIdText}</echo>

            <echoproperties />
      </target>
    </project>

And here is my properties file:

    common.app=VrvPsLTst
    common.app.name=??
    common.app.description=Pseudo Loc Test App for Build Script testing
    common.app.ServerName=http://Vèrìvò.com
    bb.vendor=Vèrìvò
    common.app.PasswordText=Pàsswòrð
    bb.override.list=MP_COPYRIGHTTEXT, "Çòpÿrìght 2012 Vèrívó Bùîlð TéàM"
    common.app.LoginButtonText=Lògìñ
    common.app.UserIdText=Ùsèr ÌÐ
    bb.SMSSuccess=Mèssàgéß Sùççêssfúllÿ Sëñt
    common.app.LoginScreenMessage=WèlçòMé Mêssàgë
    common.app.LoginProgressMessage=Àùthèñtìçàtíòñ îñ prógréss...
    ios.RegistrationText=Règìstràtíòñ Téxt
    ios.RegistrationURL=http://www.josscrowcroft.com/2011/code/utf-8-multibyte-characters-in-url-parameters-%E2%9C%93/

Here is what the output looks like:

    Buildfile: C:\Temp\utf8\build.xml

    all:
         [echo] No encoding ??
         [echo] UTF-8 ??
         [echo] ÿþU n i c o d e L i t t l e   ? ? 
         [echo] U n i c o d e L i t t l e U n m a r k e d   ? ? 
         [echo] http://Vèrìvò.com
         [echo] Vèrìvò
         [echo] No encoding Ùsèr ÌÐ
         [echo] UTF-8 Ùsèr Ì�
         [echo] ÿþU n i c o d e L i t t l e   Ù s è r   Ì Ð 
         [echo] U n i c o d e L i t t l e U n m a r k e d   Ù s è r   Ì Ð 
    [echoproperties] #Ant properties
    [echoproperties] #Mon Jun 18 15:25:13 EDT 2012
    [echoproperties] ant.core.lib=C\:\\ant\\lib\\ant.jar
    [echoproperties] ant.file=C\:\\Temp\\utf8\\build.xml
    [echoproperties] ant.file.type=file
    [echoproperties] ant.file.type.utf8test=file
    [echoproperties] ant.file.utf8test=C\:\\Temp\\utf8\\build.xml
    [echoproperties] ant.home=c\:\\ant\\bin\\..
    [echoproperties] ant.java.version=1.6
    [echoproperties] ant.library.dir=C\:\\ant\\lib
    [echoproperties] ant.project.default-target=all
    [echoproperties] ant.project.invoked-targets=all
    [echoproperties] ant.project.name=utf8test
    [echoproperties] ant.version=Apache Ant version 1.8.1 compiled on April 30 2010
    [echoproperties] awt.toolkit=sun.awt.windows.WToolkit
    [echoproperties] basedir=C\:\\Temp\\utf8
    [echoproperties] bb.SMSSuccess=M\u00E8ss\u00E0g\u00E9\u00DF S\u00F9\u00E7\u00E7\u00EAssf\u00FAll\u00FF S\u00EB\u00F1t
    [echoproperties] bb.override.list=MP_COPYRIGHTTEXT, "\u00C7\u00F2p\u00FFr\u00ECght 2012 V\u00E8r\u00EDv\u00F3 B\u00F9\u00EEl\u00F0 T\u00E9\u00E0?"
    [echoproperties] bb.vendor=V\u00E8r\u00ECv\u00F2
    [echoproperties] common.app=VrvPsLTst
    [echoproperties] common.app.LoginButtonText=L\u00F2g\u00EC\u00F1
    [echoproperties] common.app.LoginProgressMessage=\u00C0\u00F9th\u00E8\u00F1t\u00EC\u00E7\u00E0t\u00ED\u00F2\u00F1 \u00EE\u00F1 pr\u00F3gr\u00E9ss...
    [echoproperties] common.app.LoginScreenMessage=W\u00E8l\u00E7\u00F2?\u00E9 M\u00EAss\u00E0g\u00EB
    [echoproperties] common.app.PasswordText=P\u00E0ssw\u00F2r\u00F0
    [echoproperties] common.app.ServerName=http\://V\u00E8r\u00ECv\u00F2.com
    [echoproperties] common.app.UserIdText=\u00D9s\u00E8r \u00CC\u00D0
    [echoproperties] common.app.description=Pseudo Loc Test App for Build Script testing
    [echoproperties] common.app.name=??
    [echoproperties] file.encoding=Cp1252
    [echoproperties] file.encoding.pkg=sun.io
    [echoproperties] file.separator=\\
    [echoproperties] ios.RegistrationText=R\u00E8g\u00ECstr\u00E0t\u00ED\u00F2\u00F1 T\u00E9xt
    [echoproperties] ios.RegistrationURL=http\://www.josscrowcroft.com/2011/code/utf-8-multibyte-characters-in-url-parameters-%E2%9C%93/
    [echoproperties] java.awt.graphicsenv=sun.awt.Win32GraphicsEnvironment
    [echoproperties] java.awt.printerjob=sun.awt.windows.WPrinterJob
    [echoproperties] java.class.path=c\:\\ant\\bin\\..\\lib\\ant-launcher.jar;C\:\\Temp\\utf8\\.\\;C\:\\Program Files (x86)\\Java\\jre7\\lib\\ext\\QTJava.zip;C\:\\ant\\lib\\ant-antlr.jar;C\:\\ant\\lib\\ant-apache-bcel.jar;C\:\\ant\\lib\\ant-apache-bsf.jar;C\:\\ant\\lib\\ant-apache-log4j.jar;C\:\\ant\\lib\\ant-apache-oro.jar;C\:\\ant\\lib\\ant-apache-regexp.jar;C\:\\ant\\lib\\ant-apache-resolver.jar;C\:\\ant\\lib\\ant-apache-xalan2.jar;C\:\\ant\\lib\\ant-commons-logging.jar;C\:\\ant\\lib\\ant-commons-net.jar;C\:\\ant\\lib\\ant-contrib-1.0b3.jar;C\:\\ant\\lib\\ant-jai.jar;C\:\\ant\\lib\\ant-javamail.jar;C\:\\ant\\lib\\ant-jdepend.jar;C\:\\ant\\lib\\ant-jmf.jar;C\:\\ant\\lib\\ant-jsch.jar;C\:\\ant\\lib\\ant-junit.jar;C\:\\ant\\lib\\ant-launcher.jar;C\:\\ant\\lib\\ant-netrexx.jar;C\:\\ant\\lib\\ant-nodeps.jar;C\:\\ant\\lib\\ant-starteam.jar;C\:\\ant\\lib\\ant-stylebook.jar;C\:\\ant\\lib\\ant-swing.jar;C\:\\ant\\lib\\ant-testutil.jar;C\:\\ant\\lib\\ant-trax.jar;C\:\\ant\\lib\\ant-weblogic.jar;C\:\\ant\\lib\\ant.jar;C\:\\ant\\lib\\bb-ant-tools.jar;C\:\\ant\\lib\\xercesImpl.jar;C\:\\ant\\lib\\xml-apis.jar;C\:\\Program Files\\Java\\jre7\\lib\\tools.jar
    [echoproperties] java.class.version=51.0
    [echoproperties] java.endorsed.dirs=C\:\\Program Files\\Java\\jre7\\lib\\endorsed
    [echoproperties] java.ext.dirs=C\:\\Program Files\\Java\\jre7\\lib\\ext;C\:\\Windows\\Sun\\Java\\lib\\ext
    [echoproperties] java.home=C\:\\Program Files\\Java\\jre7
    [echoproperties] java.io.tmpdir=C\:\\Users\\efelton\\AppData\\Local\\Temp\\
    [echoproperties] java.library.path=C\:\\Windows\\SYSTEM32;C\:\\Windows\\Sun\\Java\\bin;C\:\\Windows\\system32;C\:\\Windows;C\:\\Windows\\SYSTEM32;C\:\\Windows;C\:\\Windows\\SYSTEM32\\WBEM;C\:\\Windows\\SYSTEM32\\WINDOWSPOWERSHELL\\V1.0\\;C\:\\PROGRAM FILES\\INTEL\\WIFI\\BIN\\;C\:\\PROGRAM FILES\\COMMON FILES\\INTEL\\WIRELESSCOMMON\\;C\:\\PROGRAM FILES (X86)\\MICROSOFT SQL SERVER\\100\\TOOLS\\BINN\\;C\:\\PROGRAM FILES\\MICROSOFT SQL SERVER\\100\\TOOLS\\BINN\\;C\:\\PROGRAM FILES\\MICROSOFT SQL SERVER\\100\\DTS\\BINN\\;C\:\\PROGRAM FILES (X86)\\MICROSOFT SQL SERVER\\100\\TOOLS\\BINN\\VSSHELL\\COMMON7\\IDE\\;C\:\\PROGRAM FILES (X86)\\MICROSOFT SQL SERVER\\100\\DTS\\BINN\\;C\:\\Program Files\\ThinkPad\\Bluetooth Software\\;C\:\\Program Files\\ThinkPad\\Bluetooth Software\\syswow64;C\:\\Program Files (x86)\\QuickTime\\QTSystem\\;C\:\\Program Files (x86)\\AccuRev\\bin;C\:\\Program Files\\Java\\jdk1.7.0_04\\bin;C\:\\Program Files (x86)\\IDM Computer Solutions\\UltraEdit\\;.
    [echoproperties] java.runtime.name=Java(TM) SE Runtime Environment
    [echoproperties] java.runtime.version=1.7.0_04-b22
    [echoproperties] java.specification.name=Java Platform API Specification
    [echoproperties] java.specification.vendor=Oracle Corporation
    [echoproperties] java.specification.version=1.7
    [echoproperties] java.vendor=Oracle Corporation
    [echoproperties] java.vendor.url=http\://java.oracle.com/
    [echoproperties] java.vendor.url.bug=http\://bugreport.sun.com/bugreport/
    [echoproperties] java.version=1.7.0_04
    [echoproperties] java.vm.info=mixed mode
    [echoproperties] java.vm.name=Java HotSpot(TM) 64-Bit Server VM
    [echoproperties] java.vm.specification.name=Java Virtual Machine Specification
    [echoproperties] java.vm.specification.vendor=Oracle Corporation
    [echoproperties] java.vm.specification.version=1.7
    [echoproperties] java.vm.vendor=Oracle Corporation
    [echoproperties] java.vm.version=23.0-b21
    [echoproperties] line.separator=\r\n
    [echoproperties] os.arch=amd64
    [echoproperties] os.name=Windows 7
    [echoproperties] os.version=6.1
    [echoproperties] path.separator=;
    [echoproperties] sun.arch.data.model=64
    [echoproperties] sun.boot.class.path=C\:\\Program Files\\Java\\jre7\\lib\\resources.jar;C\:\\Program Files\\Java\\jre7\\lib\\rt.jar;C\:\\Program Files\\Java\\jre7\\lib\\sunrsasign.jar;C\:\\Program Files\\Java\\jre7\\lib\\jsse.jar;C\:\\Program Files\\Java\\jre7\\lib\\jce.jar;C\:\\Program Files\\Java\\jre7\\lib\\charsets.jar;C\:\\Program Files\\Java\\jre7\\lib\\jfr.jar;C\:\\Program Files\\Java\\jre7\\classes
    [echoproperties] sun.boot.library.path=C\:\\Program Files\\Java\\jre7\\bin
    [echoproperties] sun.cpu.endian=little
    [echoproperties] sun.cpu.isalist=amd64
    [echoproperties] sun.desktop=windows
    [echoproperties] sun.io.unicode.encoding=UnicodeLittle
    [echoproperties] sun.java.command=org.apache.tools.ant.launch.Launcher -cp .;C\:\\Program Files (x86)\\Java\\jre7\\lib\\ext\\QTJava.zip
    [echoproperties] sun.java.launcher=SUN_STANDARD
    [echoproperties] sun.jnu.encoding=Cp1252
    [echoproperties] sun.management.compiler=HotSpot 64-Bit Tiered Compilers
    [echoproperties] sun.os.patch.level=Service Pack 1
    [echoproperties] user.country=US
    [echoproperties] user.dir=C\:\\Temp\\utf8
    [echoproperties] user.home=C\:\\Users\\efelton
    [echoproperties] user.language=en
    [echoproperties] user.name=efelton
    [echoproperties] user.script=
    [echoproperties] user.timezone=
    [echoproperties] user.variant=

    BUILD SUCCESSFUL
    Total time: 1 second

Thank you for your help

EDIT\UPDATE 6/19/2012

I am developing in a Windows environment.

I have installed a TTF from: http://freedesktop.org/wiki/Software/CJKUnifonts/Download

I have updated UltraEdit to use the TTF and I can see the Chinese characters.

    <?xml version="1.0" encoding="UTF-8" ?>

    <project name="utf8test" default="all" basedir="."> 

    <target name="all"> 


      <echo>??</echo>
      <echo encoding="ISO-8859-1">ISO-8859-1 ??</echo> 
      <echo encoding="UTF-8">UTF-8 ??</echo> 


      <echo file="echo_output.txt" append="true" >?? ${line.separator}</echo>
      <echo file="echo_output.txt" append="true"  encoding="ISO-8859-1">ISO-8859-1 ?? ${line.separator}</echo> 
      <echo file="echo_output.txt" append="true"  encoding="UTF-8">UTF-8 ?? ${line.separator}</echo> 
      <echo file="echo_output.txt" append="true"  encoding="UnicodeLittle">UnicodeLittle ?? ${line.separator}</echo> 
      <echo file="echo_output.txt" append="true"  encoding="UnicodeLittleUnmarked">UnicodeLittleUnmarked ?? ${line.separator}</echo> 

    </target> 
    </project> 

The output captured by running inside UltraEdit is: Buildfile: E:\temp\utf8\build.xml

    all:
         [echo] ??
         [echo] ISO-8859-1 ??
         [echo] UTF-8 ??

    BUILD SUCCESSFUL
    Total time: 1 second

And the echo_output.txt file shows up like this:

    ?? 
    ISO-8859-1 ?? 
    UTF-8 ?? 
    ÿþU n i c o d e L i t t l e   ? ?   

     U n i c o d e L i t t l e U n m a r k e d   ? ?   

So there appears to be somehting fundamentally wrong with how my ANT environment is set up since I cannot simply echo the character to the screen or to a file.

© Stack Overflow or respective owner

Related posts about ant

Related posts about utf-8