Search Results

Search found 4604 results on 185 pages for 'utf 8'.

Page 1/185 | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >

How can I tell if a CSV is in UTF-7 or UTF-8

- by dru-zod

Excel seems to save CSV files in (what I think is) UTF-7, despite the fact that most information I have read suggest that in general, you should not UTF-7. Indeed, other applications (Text pad, which lets me choose) save things in UTF-8 (or Unicode etc, but UTF-7 is not even an option). Using .NET, I read the stream, and have to provide the encoding. If I get it wrong, accented characters are replaced with question marks. If I try and let StreamReader work it out (using detectEncodingFromByteOrderMarks), it gets it wrong (at least, it does if the file has been saved in Excel). It is unlikely that anything other then Excel will be used, so I could just assume UTF-7. Are there any other options? I need to support French (accented), German, Dutch, and Norwegian characters.

Read the article
Confused about C++'s std::wstring, UTF-16, UTF-8 and displaying strings in a windows GUI

- by dfrey

I'm working on a english only C++ program for Windows where we were told "always use std::wstring", but it seems like nobody on the team really has much of an understanding beyond that. I already read the question titled "std::wstring VS std::string. It was very helpful, but I still don't quite understand how to apply all of that information to my problem. The program I'm working on displays data in a Windows GUI. That data is persisted as XML. We often transform that XML using XSLT into HTML or XSL:FO for reporting purposes. My feeling based on what I have read is that the HTML should be encoded as UTF-8. I know very little about GUI development, but the little bit I have read indicates that the GUI stuff is all based on UTF-16 encoded strings. I'm trying to understand where this leaves me. Say we decide that all of our persisted data should be UTF-8 encoded XML. Does this mean that in order to display persisted data in a UI component, I should really be performing some sort of explicit UTF-8 to UTF-16 transcoding process? I suspect my explanation could use clarification, so I'll try to provide that if you have any questions.

Read the article
Why can't I change the AU_AU locale to en_US?

- by sachin

/bin/bash: warning: setlocale: LC_ALL: cannot change locale ( (unset)) Generating locales... en_US.ISO-8859-1... /usr/sbin/locale-gen: line 177: warning: setlocale: LC_ALL: cannot change locale ( (unset)) done Generation complete. ganesha@ubuntu:~$ sudo update_locale LANG=en_US sudo: update_locale: command not found ganesha@ubuntu:~$ sudo update-locale LANG=en_US perl: warning: Setting locale failed. perl: warning: Please check that your locale settings: LANGUAGE = "en_US:en", LC_ALL = " (unset)", LC_PAPER = "AU_AU.UTF-8", LC_ADDRESS = "AU_AU.UTF-8", LC_MONETARY = "AU_AU.UTF-8", LC_MEASUREMAUT = "AU_AU.UTF-8", LC_NUMERIC = "AU_AU.UTF-8", LC_TELEPHONE = "AU_AU.UTF-8", LC_MESSAGES = "AU_AU.UTF-8", LC_COLLATE = "AU_AU.UTF-8", LC_IDAUTIFICATION = "AU_AU.UTF-8", LC_CTYPE = "AU_AU.UTF-8", LC_TIME = "AU_AU.UTF-8", LC_NAME = "AU_AU.UTF-8", LANG = "en_US.UTF-8" are supported and installed on your system. perl: warning: Falling back to the standard locale ("C"). ganesha@ubuntu:~$ sudo update-locale LANG=en_US.UTF8 perl: warning: Setting locale failed. perl: warning: Please check that your locale settings: LANGUAGE = "en_US:en", LC_ALL = " (unset)", LC_PAPER = "AU_AU.UTF-8", LC_ADDRESS = "AU_AU.UTF-8", LC_MONETARY = "AU_AU.UTF-8", LC_MEASUREMAUT = "AU_AU.UTF-8", LC_NUMERIC = "AU_AU.UTF-8", LC_TELEPHONE = "AU_AU.UTF-8", LC_MESSAGES = "AU_AU.UTF-8", LC_COLLATE = "AU_AU.UTF-8", LC_IDAUTIFICATION = "AU_AU.UTF-8", LC_CTYPE = "AU_AU.UTF-8", LC_TIME = "AU_AU.UTF-8", LC_NAME = "AU_AU.UTF-8", LANG = "en_US.UTF-8" are supported and installed on your system. perl: warning: Falling back to the standard locale ("C"). ganesha@ubuntu:~$ gedit ~/.profile (process:6467): Gtk-WARNING **: Locale not supported by C library. Using the fallback 'C' locale. ganesha@ubuntu:~$ sudo update-locale LANG=en_US.UTF8 perl: warning: Setting locale failed. perl: warning: Please check that your locale settings: LANGUAGE = "en_US:en", LC_ALL = " (unset)", LC_PAPER = "AU_AU.UTF-8", LC_ADDRESS = "AU_AU.UTF-8", LC_MONETARY = "AU_AU.UTF-8", LC_MEASUREMAUT = "AU_AU.UTF-8", LC_NUMERIC = "AU_AU.UTF-8", LC_TELEPHONE = "AU_AU.UTF-8", LC_MESSAGES = "AU_AU.UTF-8", LC_COLLATE = "AU_AU.UTF-8", LC_IDAUTIFICATION = "AU_AU.UTF-8", LC_CTYPE = "AU_AU.UTF-8", LC_TIME = "AU_AU.UTF-8", LC_NAME = "AU_AU.UTF-8", LANG = "en_US.UTF-8" are supported and installed on your system. perl: warning: Falling back to the standard locale ("C"). ganesha@ubuntu:~$ sudo apt-get install --reinstall locales Reading package lists... Done Building dependency tree Reading state information... Done 0 upgraded, 0 newly installed, 1 reinstalled, 0 to remove and 0 not upgraded. Need to get 0 B/3359 kB of archives. After this operation, 0 B of additional disk space will be used. perl: warning: Setting locale failed. perl: warning: Please check that your locale settings: LANGUAGE = "en_US:en", LC_ALL = " (unset)", LC_TIME = "AU_AU.UTF-8", LC_MONETARY = "AU_AU.UTF-8", LC_CTYPE = "AU_AU.UTF-8", LC_COLLATE = "AU_AU.UTF-8", LC_ADDRESS = "AU_AU.UTF-8", LC_TELEPHONE = "AU_AU.UTF-8", LC_MESSAGES = "AU_AU.UTF-8", LC_NAME = "AU_AU.UTF-8", LC_MEASUREMAUT = "AU_AU.UTF-8", LC_IDAUTIFICATION = "AU_AU.UTF-8", LC_NUMERIC = "AU_AU.UTF-8", LC_PAPER = "AU_AU.UTF-8", LANG = "en_US.UTF-8" are supported and installed on your system. perl: warning: Falling back to the standard locale ("C"). (Reading database ... 157848 files and directories currently installed.) Preparing to replace locales 2.13+git20120306-3 (using .../locales_2.13+git20120306-3_all.deb) ... Unpacking replacement locales ... Processing triggers for man-db ... Setting up locales (2.13+git20120306-3) ... /bin/bash: warning: setlocale: LC_ALL: cannot change locale ( (unset)) Generating locales... en_AG.UTF-8... /usr/sbin/locale-gen: line 177: warning: setlocale: LC_ALL: cannot change locale ( (unset)) done en_AU.UTF-8... /usr/sbin/locale-gen: line 177: warning: setlocale: LC_ALL: cannot change locale ( (unset)) done en_BW.UTF-8... /usr/sbin/locale-gen: line 177: warning: setlocale: LC_ALL: cannot change locale ( (unset)) done en_CA.UTF-8... /usr/sbin/locale-gen: line 177: warning: setlocale: LC_ALL: cannot change locale ( (unset)) done en_DK.UTF-8... /usr/sbin/locale-gen: line 177: warning: setlocale: LC_ALL: cannot change locale ( (unset)) done en_GB.UTF-8... /usr/sbin/locale-gen: line 177: warning: setlocale: LC_ALL: cannot change locale ( (unset)) done en_HK.UTF-8... /usr/sbin/locale-gen: line 177: warning: setlocale: LC_ALL: cannot change locale ( (unset)) done en_IE.UTF-8... /usr/sbin/locale-gen: line 177: warning: setlocale: LC_ALL: cannot change locale ( (unset)) done en_IN.UTF-8... /usr/sbin/locale-gen: line 177: warning: setlocale: LC_ALL: cannot change locale ( (unset)) done en_NG.UTF-8... /usr/sbin/locale-gen: line 177: warning: setlocale: LC_ALL: cannot change locale ( (unset)) done en_NZ.UTF-8... /usr/sbin/locale-gen: line 177: warning: setlocale: LC_ALL: cannot change locale ( (unset)) done en_PH.UTF-8... /usr/sbin/locale-gen: line 177: warning: setlocale: LC_ALL: cannot change locale ( (unset)) done en_SG.UTF-8... /usr/sbin/locale-gen: line 177: warning: setlocale: LC_ALL: cannot change locale ( (unset)) done en_US.ISO-8859-1... /usr/sbin/locale-gen: line 177: warning: setlocale: LC_ALL: cannot change locale ( (unset)) done en_US.UTF-8... /usr/sbin/locale-gen: line 177: warning: setlocale: LC_ALL: cannot change locale ( (unset)) done en_ZA.UTF-8... /usr/sbin/locale-gen: line 177: warning: setlocale: LC_ALL: cannot change locale ( (unset)) done en_ZM.UTF-8... /usr/sbin/locale-gen: line 177: warning: setlocale: LC_ALL: cannot change locale ( (unset)) done en_ZW.UTF-8... /usr/sbin/locale-gen: line 177: warning: setlocale: LC_ALL: cannot change locale ( (unset)) done Generation complete. ganesha@ubuntu:~$ How to correct this?

Read the article
Is there any reason to prefer UTF-16 over UTF-8?

- by Oak

Examining the attributes of UTF-16 and UTF-8, I can't find any reason to prefer UTF-16. However, checking out Java and C#, it looks like strings and chars there default to UTF-16. I was thinking that it might be for historic reasons, or perhaps for performance reasons, but couldn't find any information. Anyone knows why these languages chose UTF-16? And is there any valid reason for me to do that as well?

Read the article
UTF GET parameter codification problem in JSP (JBoss 2.0.1)

- by GreyMen

I´m trying to take a string from a GET or POST parameter in JSP with some accents in UTF-8: <%@ page contentType="text/html; charset=UTF-8" pageEncoding="UTF-8" % <% request.setCharacterEncoding("UTF-8"); String value = request.getParameter("q"); out.print(value+" | aáa"); % The codification of the hardcoded string is correct but the codification of the value obtrained from the parameter (example: http://whatever/utf.jsp?q=a%E9a) it´s wrong... I have already modify the the server.xml dropping the URIEnconding UTF-8... So I don´t now what I have to do to show the data in the correct format... Any idea?

Read the article
Convert UTF-16 to UTF-8 under Windows and Linux, in C

- by DooriBar

I was wondering if there is a recommended 'cross' Windows and Linux method for the purpose of converting strings from UTF-16LE to UTF-8? or one should use different methods for each environment? I've managed to google few references to 'iconv' , but for somreason I can't find samples of basic conversions, such as - converting a wchar_t UTF-16 to UTF-8. Anybody can recommend a method that would be 'cross', and if you know of references or a guide with samples, would very appreciate it. Thanks, Doori Bar

Read the article
How to detect UTF-8-based encoded strings [closed]

- by Diego Sendra

A customer of asked us to build him a multi-language based support VB6 scraper, for which we had the need to detect UTF-8 based encoded strings to decode it later for proper displaying in application UI. It's necessary to point out that this need arises based on VB6 limitations to natively support UTF-8 in its controls, contrary to what it happens in .NET where you can tell a control that it should expect UTF-8 encoding. VB6 natively supports ISO 8859-1 and/or Windows-1252 encodings only, for which textboxes, dropdowns, listview controls, others can't be defined to natively support/expect UTF-8 as you can do in .NET considering what we just explained; so we would see weird symbols such as Ã©, Ã¨ among others, making it a whole mess at the time of displaying. So, next function contains whole UTF-8 encoded punctuation marks and symbols from languages like Spanish, Italian, German, Portuguese, French and others, based on an excellent UTF-8 based list we got from this link - Ref. http://home.telfort.nl/~t876506/utf8tbl.html Basically, the function compares if each and one of the listed UTF-8 encoded sentences, separated by | (pipe) are found in our passed string making a substring search first. Whether it's not found, it makes an alternative ASCII value based search to get a match. Say, a string like "Societé" (Society in english) would return FALSE through calling isUTF8("Societé") while it would return TRUE when calling isUTF8("SocietÃˆ") since Ãˆ is the UTF-8 encoded representation of é. Once you got it TRUE or FALSE, you can decode the string through DecodeUTF8() function for properly displaying it, a function we found somewhere else time ago and also included in this post. Function isUTF8(ByVal ptstr As String) Dim tUTFencoded As String Dim tUTFencodedaux Dim tUTFencodedASCII As String Dim ptstrASCII As String Dim iaux, iaux2 As Integer Dim ffound As Boolean ffound = False ptstrASCII = "" For iaux = 1 To Len(ptstr) ptstrASCII = ptstrASCII & Asc(Mid(ptstr, iaux, 1)) & "|" Next tUTFencoded = "Ã„|Ã…|Ã‡|Ã‰|Ã‘|Ã–|ÃŒ|Ã¡|Ã|Ã¢|Ã¤|Ã£|Ã¥|Ã§|Ã©|Ã¨|Ãª|Ã«|Ã|Ã¬|Ã®|Ã¯|Ã±|Ã³|Ã²|Ã´|Ã¶|Ãµ|Ãº|Ã¹|Ã»|Ã¼|â€|Â°|Â¢|Â£|Â§|â€¢|Â¶|ÃŸ|Â®|Â©|â„¢|Â´|Â¨|â‰|Ã†|Ã˜|âˆž|Â±|â‰¤|â‰¥|Â¥|Âµ|âˆ‚|âˆ‘|âˆ|Ï€|âˆ«|Âª|Âº|Î©|Ã¦|Ã¸|Â¿|Â¡|Â¬|âˆš|Æ’|â‰ˆ|âˆ†|Â«|Â»|â€¦|Â|Ã€|Ãƒ|Ã•|Å’|Å“|â€“|â€”|â€œ|â€|â€˜|â€™|Ã·|â—Š|Ã¿|Å¸|â„|â‚¬|â€¹|â€º|ï¬|ï¬‚|â€¡|Â·|â€š|â€ž|â€°|Ã‚|Ãš|Ã|Ã‹|Ãˆ|Ã|ÃŽ|Ã|ÃŒ|Ã“|Ã”|ï£¿|Ã’|Ãš|Ã›|Ã™|Ä±|Ë†|Ëœ|Â¯|Ë˜|Ë™|Ëš|Â¸|Ë|Ë›|Ë‡" & _ "Å|Å¡|Â¦|Â²|Â³|Â¹|Â¼|Â½|Â¾|Ã|Ã—|Ã|Ãž|Ã°|Ã½|Ã¾" & _ "â‰|âˆž|â‰¤|â‰¥|âˆ‚|âˆ‘|âˆ|Ï€|âˆ«|Î©|âˆš|â‰ˆ|âˆ†|â—Š|â„|ï¬|ï¬‚|ï£¿|Ä±|Ë˜|Ë™|Ëš|Ë|Ë›|Ë‡" tUTFencodedaux = Split(tUTFencoded, "|") If UBound(tUTFencodedaux) > 0 Then iaux = 0 Do While Not ffound And Not iaux > UBound(tUTFencodedaux) If InStr(1, ptstr, tUTFencodedaux(iaux), vbTextCompare) > 0 Then ffound = True End If If Not ffound Then 'ASCII numeric search tUTFencodedASCII = "" For iaux2 = 1 To Len(tUTFencodedaux(iaux)) 'gets ASCII numeric sequence tUTFencodedASCII = tUTFencodedASCII & Asc(Mid(tUTFencodedaux(iaux), iaux2, 1)) & "|" Next 'tUTFencodedASCII = Left(tUTFencodedASCII, Len(tUTFencodedASCII) - 1) 'compares numeric sequences If InStr(1, ptstrASCII, tUTFencodedASCII) > 0 Then ffound = True End If End If iaux = iaux + 1 Loop End If isUTF8 = ffound End Function Function DecodeUTF8(s) Dim i Dim c Dim n s = s & " " i = 1 Do While i <= Len(s) c = Asc(Mid(s, i, 1)) If c And &H80 Then n = 1 Do While i + n < Len(s) If (Asc(Mid(s, i + n, 1)) And &HC0) <> &H80 Then Exit Do End If n = n + 1 Loop If n = 2 And ((c And &HE0) = &HC0) Then c = Asc(Mid(s, i + 1, 1)) + &H40 * (c And &H1) Else c = 191 End If s = Left(s, i - 1) + Chr(c) + Mid(s, i + n) End If i = i + 1 Loop DecodeUTF8 = s End Function

Read the article
What if I put two kinds of encoded strings, say utf-8 and utf-16, in one file?

- by jonny

In Python, for example: f = open('test','w') f.write('this is a test\n'.encode('utf-16')) f.write('another test\n'.encode('utf-8')) f.close() That file gets messy when I re-open it: f = open("test") print f.readline().decode('utf-16') # it leads to UnicodeDecodeError print f.readline().decode('utf-8') # it works fine However if I keep the texts encoded in one style (say utf-16 only), it could read back ok. So I'm guessing mixing two types of encoding in the same file is wrong and couldn't be decoded back, even if I do know the encoding rules of each specific string? Any suggestion is welcome, thank you!

Read the article
Find non-ascii characters from a UTF-8 string

- by user10607

I need to find the non-ASCII characters from a UTF-8 string. my understanding: UTF-8 is a superset of character encoding in which 0-127 are ascii characters. So if in a UTF-8 string , a characters value is Not between 0-127, then it is not a ascii character , right? Please correct me if i'm wrong here. On the above understanding i have written following code in C : Note: I'm using the Ubuntu gcc compiler to run C code utf-string is xvab c long i; char arr[] = "xvab c"; printf("length : %lu \n", sizeof(arr)); for(i=0; i<sizeof(arr); i++){ char ch = arr[i]; if (isascii(ch)) printf("Ascii character %c\n", ch); else printf("Not ascii character %c\n", ch); } Which prints the output like: length : 9 Ascii character x Not ascii character Not ascii character ? Not ascii character ? Ascii character a Ascii character b Ascii character Ascii character c Ascii character To naked eye length of xvab c seems to be 6, but in code it is coming as 9 ? Correct answer for the xvab c is 1 ...i.e it has only 1 non-ascii character , but in above output it is coming as 3 (times Not ascii character). How can i find the non-ascii character from UTF-8 string, correctly. Please guide on the subject.

Read the article
Python UTF-16 encoding hex representation

- by Romeno

I have a string in Python 2.7.2 say u"\u0638". When I write it to file: f = open("J:\\111.txt", "w+") f.write(u"\u0638".encode('utf-16')) f.close() In hex it looks like: FF FE 38 06 When i print such a string to stdout i will see: '\xff\xfe8\x06'. The querstion: Where is \x38 in the string output to stdout? In other words why the string output to stdout is not '\xff\xfe\x38\x06'? If I write the string to file twice: f = open("J:\\111.txt", "w+") f.write(u"\u0638".encode('utf-16')) f.write(u"\u0638".encode('utf-16')) f.close() The hex representation in file contains byte order mark (BOM) \xff\xfe twice: FF FE 38 06 FF FE 38 06 I wonder what is the techique to avoid writting BOM in UTF-16 encoded strings?

Read the article
Reading a plist utf-8 value as utf-16

- by ennuikiller

I'm working on an iphone app that needs to display superscripts and subscripts. I'm using a picker to read in data from a plist but the unicode values aren't being displayed corretly in the pickerview. Subscripts and superscripts are not being recognized. I'm assuming this is due to the encoding of the plist as utf-8, so the question is how do a convert a plist string encoding from utf-8 to utf-16 ? Just a little more elaboration: If I do this it displays properly at least in a textfield: NSString *equation = @"x\u00B2 + y\u00B2 = z\u00B2" However if I define the same string in a plist and try to read it in and assign it to a string and display it on a pickerview it just displays the the encoding and not the superscripts. @Matt: thanks for your suggestion the unicode is being escaped that is \u00B2 = \u00B2. Googling for "escaped values in plists" returned no useful results, and I haven't been able to use the keyboard cmd-ctrl-shift-+ to work. Any further suggestions would be greatly appreciated!!

Read the article
Forcing a mixed ISO-8859-1 and UTF-8 multi-line string into UTF-8

- by knorv

Consider the following problem: A multi-line string $junk contains some lines which are encoded in UTF-8 and some in ISO-8859-1. I don't know a priori which lines are in which encoding, so heuristics will be needed. I want to turn $junk into pure UTF-8 with proper re-encoding of the ISO-8859-1 lines. Also, in the event of errors in the processing I want to provide a "best effort result" rather than throwing an error. My current attempt looks like this: $junk = &force_utf8($junk); sub force_utf8 { my $input = shift; my $output = ''; foreach my $line (split(/\n/, $input)) { if (utf8::valid($line)) { utf8::decode($line); } $output .= "$line\n"; } return $output; } While this appears to work I'm certain this is not the optimal solution. How would you improve my force_utf8(...) sub?

Read the article
remove non-UTF-8 characters from xml with declared encoding=utf-8 - Java

- by St Nietzke

I have to handle this scenario in Java: I'm getting a request in XML form from a client with declared encoding=utf-8. Unfortunately it may contain not utf-8 characters and there is a requirement to remove these characters from the xml on my side (legacy). Let's consider an example where this invalid XML contains £ (pound). 1) I get xml as java String with £ in it (I don't have access to interface right now, but I probably get xml as a java String). Can I use replaceAll(£, "") to get rid of this character? Any potential issues? 2) I get xml as an array of bytes - how to handle this operation safely in that case?

Read the article
Convert matched string of UTF-8 values to UTF-8 characters in Ruby

- by user1475154

Trying to convert output from a rest_client GET to the characters that are represented with escape sequences. Input: ..."sub_id":"\u0d9c\u8138\u8134\u3f30\u8139\u2b71"... (which I put in 'all_subs') Match: m = /sub_id\"\:\"([^\"]+)\"/.match(all_subs.to_str) [1] Print: puts m.force_encoding("UTF-8").unpack('U*').pack('U*') But it just comes out the same way I put it in. ie, "\u0d9c\u8138\u8134\u3f30\u8139\u2b71" However, if I convert a raw string of it: puts "\u0d9c\u8138\u8134\u3f30\u8139\u2b71".unpack('U*').pack('U*') The output is perfect as "??????"

Read the article
PHP utf encoding problem

- by shyam

How can I encode strings on UTF-16BE format in PHP? For "Demo Message!!!" the encoded string should be '00440065006D006F0020004D00650073007300610067006'. Also, I need to encode Arabic characters to this format.

Read the article
Can not input or print Chinese on PuTTY

- by hetaoblog

On Red Hat Enterprise Linux AS release 3, I've set my environment variable as below $ echo $LANG zh_CN.UTF-8 $ echo $LANGUAGE zh_CN.UTF-8 $ echo $SUPPORTED en_US.UTF-8:en_US:en:zh_CN.UTF-8 $ locale LANG=zh_CN.UTF-8 LC_CTYPE="zh_CN.UTF-8" LC_NUMERIC="zh_CN.UTF-8" LC_TIME="zh_CN.UTF-8" LC_COLLATE="zh_CN.UTF-8" LC_MONETARY="zh_CN.UTF-8" LC_MESSAGES="zh_CN.UTF-8" LC_PAPER="zh_CN.UTF-8" LC_NAME="zh_CN.UTF-8" LC_ADDRESS="zh_CN.UTF-8" LC_TELEPHONE="zh_CN.UTF-8" LC_MEASUREMENT="zh_CN.UTF-8" LC_IDENTIFICATION="zh_CN.UTF-8" LC_ALL=zh_CN.UTF-8 Meanwhile I've set PuTTY's transmission encoding as utf-8 and appearance-font setting to have a font as 'Fixedsys' which does support chinese. However, when I try to print a file with Chinese, it can not print it correctly $ cat 1.txt hello¦¦¦ $ and I can not input Chinese correctly on shell.

Read the article
All password with '$' inside (phpmyadmin) won't work [UTF-8 problem]

- by Tristan

Hello, i set up a dedicaced server with a tutorial. I set in PHP : mbstring.language=UTF-8 mbstring.internal_encoding=UTF-8 mbstring.http_input=UTF-8 mbstring.http_output=UTF-8 mbstring.detect_order=auto But each time there is a $ in the password (i have one for the root of mysql + other script) the password won't work. For example, I just removed the $ in the password for a script, and it worked. When i connect @root on mysql via phpmyadmin : don't work When i connect @root via PHP : works What can i do for this problem please ?

Read the article
Mod_rewrite with UTF-8 accent, multiviews , .htaccess

- by GuruJR

Problem: with Mod_rewrite, multiview & Apache config Introduction: The website is in french and i had problem with unicode encoding and mod_rewrite within php wihtout multiviews Old server was not handling utf8 correctly (somewhere between PHP, apache mod rewrite or mysql) Updated Server to Ubuntu 11.04 , the process was destructive lost all files in var/www/ (the site was mainly 2 files index.php & static.php) lost the site specific .Htaccess file lost MySQL dbs lost old apache.conf What i have done so far: What works: Setup GNutls for SSL, Listen 443 = port.conf Created 2 Vhosts in one file for :80 and :443 = website.conf Enforce SSL = Redirecting :80 to :443 with a mod_rewrite redirect Tried to set utf-8 everywhere.. Set charset and collation , db connection , mb_settings , names utf-8 and utf8_unicode_ci, everywhere (php,mysql,apache) to be sure to serve files as UTF-8 i enabled multiview renamed index.php.utf8.fr and static.php.utf8.fr With multiview enabled, Multibytes Accents in URL works SSL TLS 1.0 What dont work: With multiview enabled , mod_rewrite works for only one of my rewriterules With multiview Disabled, i loose access to the document root as "Forbidden" With multiview Disabled, i loose Multibytes (single charater accent) The Apache Default server is full of settings. (what can i safely remove ?) these are my configuration files so far :80 Vhost file (this one work you can use this to force redirect to https) RewriteEngine On RewriteCond %{HTTPS} off RewriteRule (.*) https://%{HTTP_HOST}%{REQUEST_URI} LanguagePriority fr :443 Vhost file (GnuTls is working) DocumentRoot /var/www/x ServerName example.com ServerAlias www.example.com <Directory "/var/www/x"> allow from all Options FollowSymLinks +MultiViews AddLanguage fr .fr AddCharset UTF-8 .utf8 LanguagePriority fr </Directory> GnuTLSEnable on GnuTLSPriorities SECURE:+VERS-TLS1.1:+AES-256-CBC:+RSA:+SHA1:+COMP-NULL GnuTLSCertificateFile /path/to/certificate.crt GnuTLSKeyFile /path/to/certificate.key <Directory "/var/www/x/base"> </Directory> Basic .htaccess file AddDefaultCharset utf-8 Options FollowSymLinks +MultiViews RewriteEngine on RewriteRule ^api/$ /index.php.utf8.fr?v=4 [L,NC,R] RewriteRule ^contrib/$ /index.php.utf8.fr?v=2 [L,NC,R] RewriteRule ^coop/$ /index.php.utf8.fr?v=3 [L,NC,R] RewriteRule ^crowd/$ /index.php.utf8.fr?v=2 [L,NC,R] RewriteRule ^([^/]*)/([^/]*)$ /static.php.utf8.fr?VALUEONE=$2&VALUETWO=$1 [L] So my quesiton is whats wrong , what do i have missing is there extra settings that i need to kill from the apache default . in order to be sure all parts are using utf-8 at all time, and that my mod_rewrite rules work with accent Thank you all in advance for your help, I will follow this question closely , to add any needed information.

Read the article
Handling UTF-8 with BOM in HTTP

- by Alois Mahdal

Say I have a script which at some point serves a plain text file as a content (right after "\n\n"). These files are provided by users, but I can expect they will be UTF-8. So I hard-wire Content-Type: text/plain; charset=UTF-8. But while I can teach users to save everything in UTF-8, I can't be very sure that the files will be without BOM ("\xEE\xBB\xBF"), as at least on Windows, this is not very clearly distinguished in common plain text editors and not every one of them uses the same default. So what about these files created on Windows, where they may/may not start with BOM? Should/will server or UA get rid of this debris for me? Or is it my task to prepare clean UTF-8, i.e. open each file and check whether BOM needs to be removed?

Read the article
How to support 3-byte UTF-8 Characters in ANT

- by efelton

I am trying to support UTF-8 characters in my ANT script. As long as the character string are made up of 2-byte UTF-8 characters, such as: Lògìñ Ùsèr ÌÐ Then things work fine. When I use Unicode Han Character: ? Which, according to this site: http://www.fileformat.info/info/unicode/char/6211/index.htm Has a UTF-8 encoding of 0xE6 0x88 0x91 I can see in UltraEdit, my input properties file has the values "E6 88 91" all in a row, so I'm fairly confident that my input is correct. And When I open the same file in Notepad++ I can see all the characters correctly. Here is my Build Script: <?xml version="1.0" encoding="UTF-8" ?> <project name="utf8test" default="all" basedir="."> <target name="all"> <loadproperties encoding="UTF-8" srcfile="./apps.properties.all.txt" /> <echo>No encoding ${common.app.name}</echo> <echo encoding="UTF-8">UTF-8 ${common.app.name}</echo> <echo encoding="UnicodeLittle">UnicodeLittle ${common.app.name}</echo> <echo encoding="UnicodeLittleUnmarked">UnicodeLittleUnmarked ${common.app.name}</echo> <echo>${common.app.ServerName}</echo> <echo>${bb.vendor}</echo> <echo>No encoding ${common.app.UserIdText}</echo> <echo encoding="UTF-8">UTF-8 ${common.app.UserIdText}</echo> <echo encoding="UnicodeLittle">UnicodeLittle ${common.app.UserIdText}</echo> <echo encoding="UnicodeLittleUnmarked">UnicodeLittleUnmarked ${common.app.UserIdText}</echo> <echoproperties /> </target> </project> And here is my properties file: common.app=VrvPsLTst common.app.name=?? common.app.description=Pseudo Loc Test App for Build Script testing common.app.ServerName=http://Vèrìvò.com bb.vendor=Vèrìvò common.app.PasswordText=Pàsswòrð bb.override.list=MP_COPYRIGHTTEXT, "Çòpÿrìght 2012 Vèrívó Bùîlð TéàM" common.app.LoginButtonText=Lògìñ common.app.UserIdText=Ùsèr ÌÐ bb.SMSSuccess=Mèssàgéß Sùççêssfúllÿ Sëñt common.app.LoginScreenMessage=WèlçòMé Mêssàgë common.app.LoginProgressMessage=Àùthèñtìçàtíòñ îñ prógréss... ios.RegistrationText=Règìstràtíòñ Téxt ios.RegistrationURL=http://www.josscrowcroft.com/2011/code/utf-8-multibyte-characters-in-url-parameters-%E2%9C%93/ Here is what the output looks like: Buildfile: C:\Temp\utf8\build.xml all: [echo] No encoding ?? [echo] UTF-8 ?? [echo] ÿþU n i c o d e L i t t l e ? ? [echo] U n i c o d e L i t t l e U n m a r k e d ? ? [echo] http://Vèrìvò.com [echo] Vèrìvò [echo] No encoding Ùsèr ÌÐ [echo] UTF-8 Ã™sÃ¨r ÃŒÃ? [echo] ÿþU n i c o d e L i t t l e Ù s è r Ì Ð [echo] U n i c o d e L i t t l e U n m a r k e d Ù s è r Ì Ð [echoproperties] #Ant properties [echoproperties] #Mon Jun 18 15:25:13 EDT 2012 [echoproperties] ant.core.lib=C\:\\ant\\lib\\ant.jar [echoproperties] ant.file=C\:\\Temp\\utf8\\build.xml [echoproperties] ant.file.type=file [echoproperties] ant.file.type.utf8test=file [echoproperties] ant.file.utf8test=C\:\\Temp\\utf8\\build.xml [echoproperties] ant.home=c\:\\ant\\bin\\.. [echoproperties] ant.java.version=1.6 [echoproperties] ant.library.dir=C\:\\ant\\lib [echoproperties] ant.project.default-target=all [echoproperties] ant.project.invoked-targets=all [echoproperties] ant.project.name=utf8test [echoproperties] ant.version=Apache Ant version 1.8.1 compiled on April 30 2010 [echoproperties] awt.toolkit=sun.awt.windows.WToolkit [echoproperties] basedir=C\:\\Temp\\utf8 [echoproperties] bb.SMSSuccess=M\u00E8ss\u00E0g\u00E9\u00DF S\u00F9\u00E7\u00E7\u00EAssf\u00FAll\u00FF S\u00EB\u00F1t [echoproperties] bb.override.list=MP_COPYRIGHTTEXT, "\u00C7\u00F2p\u00FFr\u00ECght 2012 V\u00E8r\u00EDv\u00F3 B\u00F9\u00EEl\u00F0 T\u00E9\u00E0?" [echoproperties] bb.vendor=V\u00E8r\u00ECv\u00F2 [echoproperties] common.app=VrvPsLTst [echoproperties] common.app.LoginButtonText=L\u00F2g\u00EC\u00F1 [echoproperties] common.app.LoginProgressMessage=\u00C0\u00F9th\u00E8\u00F1t\u00EC\u00E7\u00E0t\u00ED\u00F2\u00F1 \u00EE\u00F1 pr\u00F3gr\u00E9ss... [echoproperties] common.app.LoginScreenMessage=W\u00E8l\u00E7\u00F2?\u00E9 M\u00EAss\u00E0g\u00EB [echoproperties] common.app.PasswordText=P\u00E0ssw\u00F2r\u00F0 [echoproperties] common.app.ServerName=http\://V\u00E8r\u00ECv\u00F2.com [echoproperties] common.app.UserIdText=\u00D9s\u00E8r \u00CC\u00D0 [echoproperties] common.app.description=Pseudo Loc Test App for Build Script testing [echoproperties] common.app.name=?? [echoproperties] file.encoding=Cp1252 [echoproperties] file.encoding.pkg=sun.io [echoproperties] file.separator=\\ [echoproperties] ios.RegistrationText=R\u00E8g\u00ECstr\u00E0t\u00ED\u00F2\u00F1 T\u00E9xt [echoproperties] ios.RegistrationURL=http\://www.josscrowcroft.com/2011/code/utf-8-multibyte-characters-in-url-parameters-%E2%9C%93/ [echoproperties] java.awt.graphicsenv=sun.awt.Win32GraphicsEnvironment [echoproperties] java.awt.printerjob=sun.awt.windows.WPrinterJob [echoproperties] java.class.path=c\:\\ant\\bin\\..\\lib\\ant-launcher.jar;C\:\\Temp\\utf8\\.\\;C\:\\Program Files (x86)\\Java\\jre7\\lib\\ext\\QTJava.zip;C\:\\ant\\lib\\ant-antlr.jar;C\:\\ant\\lib\\ant-apache-bcel.jar;C\:\\ant\\lib\\ant-apache-bsf.jar;C\:\\ant\\lib\\ant-apache-log4j.jar;C\:\\ant\\lib\\ant-apache-oro.jar;C\:\\ant\\lib\\ant-apache-regexp.jar;C\:\\ant\\lib\\ant-apache-resolver.jar;C\:\\ant\\lib\\ant-apache-xalan2.jar;C\:\\ant\\lib\\ant-commons-logging.jar;C\:\\ant\\lib\\ant-commons-net.jar;C\:\\ant\\lib\\ant-contrib-1.0b3.jar;C\:\\ant\\lib\\ant-jai.jar;C\:\\ant\\lib\\ant-javamail.jar;C\:\\ant\\lib\\ant-jdepend.jar;C\:\\ant\\lib\\ant-jmf.jar;C\:\\ant\\lib\\ant-jsch.jar;C\:\\ant\\lib\\ant-junit.jar;C\:\\ant\\lib\\ant-launcher.jar;C\:\\ant\\lib\\ant-netrexx.jar;C\:\\ant\\lib\\ant-nodeps.jar;C\:\\ant\\lib\\ant-starteam.jar;C\:\\ant\\lib\\ant-stylebook.jar;C\:\\ant\\lib\\ant-swing.jar;C\:\\ant\\lib\\ant-testutil.jar;C\:\\ant\\lib\\ant-trax.jar;C\:\\ant\\lib\\ant-weblogic.jar;C\:\\ant\\lib\\ant.jar;C\:\\ant\\lib\\bb-ant-tools.jar;C\:\\ant\\lib\\xercesImpl.jar;C\:\\ant\\lib\\xml-apis.jar;C\:\\Program Files\\Java\\jre7\\lib\\tools.jar [echoproperties] java.class.version=51.0 [echoproperties] java.endorsed.dirs=C\:\\Program Files\\Java\\jre7\\lib\\endorsed [echoproperties] java.ext.dirs=C\:\\Program Files\\Java\\jre7\\lib\\ext;C\:\\Windows\\Sun\\Java\\lib\\ext [echoproperties] java.home=C\:\\Program Files\\Java\\jre7 [echoproperties] java.io.tmpdir=C\:\\Users\\efelton\\AppData\\Local\\Temp\\ [echoproperties] java.library.path=C\:\\Windows\\SYSTEM32;C\:\\Windows\\Sun\\Java\\bin;C\:\\Windows\\system32;C\:\\Windows;C\:\\Windows\\SYSTEM32;C\:\\Windows;C\:\\Windows\\SYSTEM32\\WBEM;C\:\\Windows\\SYSTEM32\\WINDOWSPOWERSHELL\\V1.0\\;C\:\\PROGRAM FILES\\INTEL\\WIFI\\BIN\\;C\:\\PROGRAM FILES\\COMMON FILES\\INTEL\\WIRELESSCOMMON\\;C\:\\PROGRAM FILES (X86)\\MICROSOFT SQL SERVER\\100\\TOOLS\\BINN\\;C\:\\PROGRAM FILES\\MICROSOFT SQL SERVER\\100\\TOOLS\\BINN\\;C\:\\PROGRAM FILES\\MICROSOFT SQL SERVER\\100\\DTS\\BINN\\;C\:\\PROGRAM FILES (X86)\\MICROSOFT SQL SERVER\\100\\TOOLS\\BINN\\VSSHELL\\COMMON7\\IDE\\;C\:\\PROGRAM FILES (X86)\\MICROSOFT SQL SERVER\\100\\DTS\\BINN\\;C\:\\Program Files\\ThinkPad\\Bluetooth Software\\;C\:\\Program Files\\ThinkPad\\Bluetooth Software\\syswow64;C\:\\Program Files (x86)\\QuickTime\\QTSystem\\;C\:\\Program Files (x86)\\AccuRev\\bin;C\:\\Program Files\\Java\\jdk1.7.0_04\\bin;C\:\\Program Files (x86)\\IDM Computer Solutions\\UltraEdit\\;. [echoproperties] java.runtime.name=Java(TM) SE Runtime Environment [echoproperties] java.runtime.version=1.7.0_04-b22 [echoproperties] java.specification.name=Java Platform API Specification [echoproperties] java.specification.vendor=Oracle Corporation [echoproperties] java.specification.version=1.7 [echoproperties] java.vendor=Oracle Corporation [echoproperties] java.vendor.url=http\://java.oracle.com/ [echoproperties] java.vendor.url.bug=http\://bugreport.sun.com/bugreport/ [echoproperties] java.version=1.7.0_04 [echoproperties] java.vm.info=mixed mode [echoproperties] java.vm.name=Java HotSpot(TM) 64-Bit Server VM [echoproperties] java.vm.specification.name=Java Virtual Machine Specification [echoproperties] java.vm.specification.vendor=Oracle Corporation [echoproperties] java.vm.specification.version=1.7 [echoproperties] java.vm.vendor=Oracle Corporation [echoproperties] java.vm.version=23.0-b21 [echoproperties] line.separator=\r\n [echoproperties] os.arch=amd64 [echoproperties] os.name=Windows 7 [echoproperties] os.version=6.1 [echoproperties] path.separator=; [echoproperties] sun.arch.data.model=64 [echoproperties] sun.boot.class.path=C\:\\Program Files\\Java\\jre7\\lib\\resources.jar;C\:\\Program Files\\Java\\jre7\\lib\\rt.jar;C\:\\Program Files\\Java\\jre7\\lib\\sunrsasign.jar;C\:\\Program Files\\Java\\jre7\\lib\\jsse.jar;C\:\\Program Files\\Java\\jre7\\lib\\jce.jar;C\:\\Program Files\\Java\\jre7\\lib\\charsets.jar;C\:\\Program Files\\Java\\jre7\\lib\\jfr.jar;C\:\\Program Files\\Java\\jre7\\classes [echoproperties] sun.boot.library.path=C\:\\Program Files\\Java\\jre7\\bin [echoproperties] sun.cpu.endian=little [echoproperties] sun.cpu.isalist=amd64 [echoproperties] sun.desktop=windows [echoproperties] sun.io.unicode.encoding=UnicodeLittle [echoproperties] sun.java.command=org.apache.tools.ant.launch.Launcher -cp .;C\:\\Program Files (x86)\\Java\\jre7\\lib\\ext\\QTJava.zip [echoproperties] sun.java.launcher=SUN_STANDARD [echoproperties] sun.jnu.encoding=Cp1252 [echoproperties] sun.management.compiler=HotSpot 64-Bit Tiered Compilers [echoproperties] sun.os.patch.level=Service Pack 1 [echoproperties] user.country=US [echoproperties] user.dir=C\:\\Temp\\utf8 [echoproperties] user.home=C\:\\Users\\efelton [echoproperties] user.language=en [echoproperties] user.name=efelton [echoproperties] user.script= [echoproperties] user.timezone= [echoproperties] user.variant= BUILD SUCCESSFUL Total time: 1 second Thank you for your help EDIT\UPDATE 6/19/2012 I am developing in a Windows environment. I have installed a TTF from: http://freedesktop.org/wiki/Software/CJKUnifonts/Download I have updated UltraEdit to use the TTF and I can see the Chinese characters. <?xml version="1.0" encoding="UTF-8" ?> <project name="utf8test" default="all" basedir="."> <target name="all"> <echo>??</echo> <echo encoding="ISO-8859-1">ISO-8859-1 ??</echo> <echo encoding="UTF-8">UTF-8 ??</echo> <echo file="echo_output.txt" append="true" >?? ${line.separator}</echo> <echo file="echo_output.txt" append="true" encoding="ISO-8859-1">ISO-8859-1 ?? ${line.separator}</echo> <echo file="echo_output.txt" append="true" encoding="UTF-8">UTF-8 ?? ${line.separator}</echo> <echo file="echo_output.txt" append="true" encoding="UnicodeLittle">UnicodeLittle ?? ${line.separator}</echo> <echo file="echo_output.txt" append="true" encoding="UnicodeLittleUnmarked">UnicodeLittleUnmarked ?? ${line.separator}</echo> </target> </project> The output captured by running inside UltraEdit is: Buildfile: E:\temp\utf8\build.xml all: [echo] ?? [echo] ISO-8859-1 ?? [echo] UTF-8 ?? BUILD SUCCESSFUL Total time: 1 second And the echo_output.txt file shows up like this: ?? ISO-8859-1 ?? UTF-8 ?? ÿþU n i c o d e L i t t l e ? ? U n i c o d e L i t t l e U n m a r k e d ? ? So there appears to be somehting fundamentally wrong with how my ANT environment is set up since I cannot simply echo the character to the screen or to a file.

Read the article
php regex word boundary matching in utf-8

- by dontomaso

Hi, I have the following php code in a utf-8 php file: var_dump(setlocale(LC_CTYPE, 'de_DE.utf8', 'German_Germany.utf-8', 'de_DE', 'german')); var_dump(mb_internal_encoding()); var_dump(mb_internal_encoding('utf-8')); var_dump(mb_internal_encoding()); var_dump(mb_regex_encoding()); var_dump(mb_regex_encoding('utf-8')); var_dump(mb_regex_encoding()); var_dump(preg_replace('/\bweiß\b/iu', 'weiss', 'weißbier')); I would like the last regex to replace only full words and not parts of words. On my windows computer, it returns: string 'German_Germany.1252' (length=19) string 'ISO-8859-1' (length=10) boolean true string 'UTF-8' (length=5) string 'EUC-JP' (length=6) boolean true string 'UTF-8' (length=5) string 'weißbier' (length=9) On the webserver (linux), I get: string(10) "de_DE.utf8" string(10) "ISO-8859-1" bool(true) string(5) "UTF-8" string(10) "ISO-8859-1" bool(true) string(5) "UTF-8" string(9) "weissbier" Thus, the regex works as I expected on windows but not on linux. So the main question is, how should I write my regex to only match at word boundaries? A secondary questions is how I can let windows know that I want to use utf-8 in my php application.

Read the article
Vietnamese character in .NET Console Application (UTF-8)

- by DucDigital

Im trying to write down an UTF8 string (Vietnamese) into C# Console but no success. Im running on windows 7. I tried to use the Encoding class that convert string to char[] to byte[] and then to String, but no help, the string is input directly fron the database. Here is some example Tôi tên là Ð?c, cu?c s?ng th?t vui v? tuy?t v?i It does not show the special character like : Ð or ?... instead it show up ?, much worse with the Encoding class. Does anyone can try this out or know about this problem? Thank you My code static void Main(string[] args) { XDataContext _new = new XDataContext(); Console.OutputEncoding = Encoding.GetEncoding("UTF-8"); string srcString = _new.Posts.First().TITLE; Console.WriteLine(srcString); // Convert the UTF-16 encoded source string to UTF-8 and ASCII. byte[] utf8String = Encoding.UTF8.GetBytes(srcString); byte[] asciiString = Encoding.ASCII.GetBytes(srcString); // Write the UTF-8 and ASCII encoded byte arrays. Console.WriteLine("UTF-8 Bytes: {0}", BitConverter.ToString(utf8String)); Console.WriteLine("ASCII Bytes: {0}", BitConverter.ToString(asciiString)); // Convert UTF-8 and ASCII encoded bytes back to UTF-16 encoded // string and write. Console.WriteLine("UTF-8 Text : {0}", Encoding.UTF8.GetString(utf8String)); Console.WriteLine("ASCII Text : {0}", Encoding.ASCII.GetString(asciiString)); Console.WriteLine(Encoding.UTF8.GetString(utf8String)); Console.WriteLine(Encoding.ASCII.GetString(asciiString)); } and here is the outstanding output NhÃ bÃ¡o Ä‘i há»™i bÃ¡o XuÃ¢n UTF-8 Bytes: 4E-68-C3-A0-20-62-C3-A1-6F-20-C4-91-69-20-68-E1-BB-99-69-20-62-C3- A1-6F-20-58-75-C3-A2-6E ASCII Bytes: 4E-68-3F-20-62-3F-6F-20-3F-69-20-68-3F-69-20-62-3F-6F-20-58-75-3F- 6E UTF-8 Text : NhÃ bÃ¡o Ä‘i há»™i bÃ¡o XuÃ¢n ASCII Text : Nh? b?o ?i h?i b?o Xu?n NhÃ bÃ¡o Ä‘i há»™i bÃ¡o XuÃ¢n Nh? b?o ?i h?i b?o Xu?n Press any key to continue . . .

Read the article
how to properly display utf encoded characters on my utf-8 encoded page?

- by Ali

Hi guys I'm retrieving emails and some of my emails have utf encoded text. However even though my page is encoded as utf 8 - in some places when I try to out put utf text I get funny characters like : =?utf-8?B?Rlc6INqp24zYpyDYotm+INin2LMg2YXYs9qp2LHYp9uB2bkg2qnbjCDZhtmC?= =?utf-8?B?2YQg2qnYsdiz2qnYqtuSINuB24zaug==?= Whereas in other areas of the same page it displays fine. WHats going on?

Read the article
Getting this error in rails: `ActionView::Template::Error (code converter not found (UTF-8 to UTF-16))`

- by Victor S

I've started getting the following error in one of my views for some reason, I don't get it in development, but only in production. Here is a snippet of the backtrace, any ideas? ActionView::Template::Error (code converter not found (UTF-8 to UTF-16)): 19: [title, summary.gsub(/^/, " "), nil].join("\n\n") 20: end 21: end.join 22: sections = sections.force_encoding('UTF-8').encode('UTF-16', :invalid => :replace).encode('UTF-8') if sections.respond_to?(:force_encoding) 23: %> 24: 25: <%= raw sections %>

Read the article
UTF-8 and !# shell scripts

- by sal

Is there a way to configure bash on Linux (red hat and ubuntu) to allow shell scripts to be encoded in UTF-8? I can't find a simple way to change just one little thing and have the whole system just use UTF-8 files without having to worry about encoding.

Read the article

1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >