utf - Page 7 - Developer IT

Dreamweaver utf-8 encoded php page displays wrong chinese character in IE and Chrome, correct in FF

- by user1334485

I have an issue with character encoding on this page: http://www.studiomille.jp/class/ (it's in Japanese, but I think the character in question comes from Chinese). FF shows it correctly; IE (all versions) and Chrome don't. There are other characters that differ throughout the site; this is just one example.

Everything is set to UTF-8:
* PHP sends the header: Content-Type:text/html; charset=UTF-8
* PHP starts with: mb_language('uni'); mb_internal_encoding('UTF-8');
* meta tag: <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
* all files are saved with UTF-8 encoding in Dreamweaver CS3
* the same font is used in all the browsers

On that page nothing comes from the db; everything is hard-coded. The site behaves the same way on my localhost. So why does only FF get it right, and how can I make it work in IE as well?


Servlet receiving data both in ISO-8859-1 and UTF-8. How to URL-decode?

- by AJPerez

I have a web application (in fact it is just a servlet) which receives data from three different sources:
* Source A is an HTML document written in UTF-8, and sends the data via <form method="get">.
* Source B is written in ISO-8859-1, and also sends the data via <form method="get">.
* Source C is written in ISO-8859-1, and sends the data via <a href="http://my-servlet-url?param=value&param2=value2&etc">.

The servlet receives the request params and URL-decodes them using UTF-8. As you would expect, A works without problems, while B and C fail (you can't URL-decode as UTF-8 something that was encoded in ISO-8859-1). I can make slight modifications to B and C, but I am not allowed to change them from ISO-8859-1 to UTF-8, which would solve all the problems.

In B, I've been able to solve the problem by adding accept-charset="UTF-8" to the <form>, so the <form> sends the data in UTF-8 even though the page is ISO-8859-1. What can I do to fix C? Alternatively, is there any way to determine the charset in the servlet, so I can URL-decode with the right encoding in each case?
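One common heuristic for the "determine the charset" part is to percent-decode to raw bytes first, try strict UTF-8, and fall back to ISO-8859-1 only when that fails. The sketch below shows the idea in Python rather than in servlet code; `decode_param` is a hypothetical helper name, and the approach is a guess, not a guarantee (a Latin-1 value such as "Ã©" happens to be valid UTF-8 and would be misread).

```python
from urllib.parse import unquote_to_bytes


def decode_param(raw_value: str) -> str:
    """Percent-decode a query value, guessing between UTF-8 and ISO-8859-1."""
    # '+' means space in form-encoded query strings.
    raw_bytes = unquote_to_bytes(raw_value.replace("+", " "))
    try:
        # Strict UTF-8 rejects most ISO-8859-1 byte sequences outright.
        return raw_bytes.decode("utf-8")
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte to a character, so this cannot fail.
        return raw_bytes.decode("iso-8859-1")


print(decode_param("caf%C3%A9"))  # value as a UTF-8 page would send it
print(decode_param("caf%E9"))     # value as an ISO-8859-1 page would send it
```

The same order of attempts matters in any language: UTF-8 has to be tried first, because the ISO-8859-1 decode accepts every input.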


What are the commonly confused encodings that may result in identical test data?

- by makerofthings7

I'm fixing code that uses ASCIIEncoding in some places and UTF-8 encoding in other functions. Since we aren't using any UTF-8-specific features, all of our unit tests passed, but I want to raise awareness of encodings that produce similar results and may not be fully tested. I don't want to limit this to just UTF-8 vs ASCII, since I think there may also be issues with code that handles ASN.1 fields and with other code working with Base64. So, what are the commonly confused encodings that may result in identical test data?
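To make the "identical test data" problem concrete, here is a small sketch that encodes one ASCII-only string and one string with non-ASCII characters under several encodings. The candidate list and the helper name `encode_all` are illustrative choices, not a complete catalogue of confusable encodings.

```python
ASCII_ONLY = "Hello, world"
NON_ASCII = "na\u00efve \u20ac5"  # "naïve €5"

CANDIDATES = ["ascii", "utf-8", "iso-8859-1", "cp1252"]


def encode_all(text):
    """Map each candidate encoding to its bytes, or None if it cannot encode the text."""
    results = {}
    for name in CANDIDATES:
        try:
            results[name] = text.encode(name)
        except UnicodeEncodeError:
            results[name] = None
    return results


ascii_results = encode_all(ASCII_ONLY)
mixed_results = encode_all(NON_ASCII)

# ASCII-only test data cannot tell these encodings apart.
print(len(set(ascii_results.values())), "distinct byte string(s) for ASCII-only input")
for name, data in mixed_results.items():
    print(name, data)
```

Test data that includes at least one character outside ASCII and one outside Latin-1 is what separates these encodings from each other.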


How do you get Matlab to write the BOM (byte order markers) for UTF-16 text files?

- by Richard Povinelli

I am creating UTF16 text files with Matlab, which I later read in using Java. In Matlab, I open a file called fileName and write to it as follows:

    fid = fopen(fileName, 'w','n','UTF16-LE');
    fprintf(fid,"Some stuff.");

In Java, I can read the text file using the following code:

    FileInputStream fileInputStream = new FileInputStream(fileName);
    Scanner scanner = new Scanner(fileInputStream, "UTF-16LE");
    String s = scanner.nextLine();

Here is the hex output:

    Offset(h) 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F 10 11 12 13
    00000000  73 00 6F 00 6D 00 65 00 20 00 73 00 74 00 75 00 66 00 66 00  s.o.m.e. .s.t.u.f.f.

The above approach works fine. But I want to be able to write out the file using UTF16 with a BOM, to give me more flexibility so that I don't have to worry about big or little endian. In Matlab, I've coded:

    fid = fopen(fileName, 'w','n','UTF16');
    fprintf(fid,"Some stuff.");

In Java, I change the code to:

    FileInputStream fileInputStream = new FileInputStream(fileName);
    Scanner scanner = new Scanner(fileInputStream, "UTF-16");
    String s = scanner.nextLine();

In this case, the string s is garbled, because Matlab is not writing the BOM. I can get the Java code to work just fine if I add the BOM manually. With the added BOM, the following file works fine:

    Offset(h) 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F 10 11 12 13 14 15
    00000000  FF FE 73 00 6F 00 6D 00 65 00 20 00 73 00 74 00 75 00 66 00 66 00  ÿþs.o.m.e. .s.t.u.f.f.

How can I get Matlab to write out the BOM? I know I could write the BOM out separately, but I'd rather have Matlab do it automatically.

Addendum: I selected the answer below from Amro because it exactly solves the question I posed. One key discovery for me was the difference between the Unicode Standard and a UTF (Unicode transformation format) (see http://unicode.org/faq/utf_bom.html). The Unicode Standard provides unique identifiers (code points) for characters. UTFs provide mappings of every code point "to a unique byte sequence."

Since all but a handful of the characters I am using are in the first 128 code points, I'm going to switch to using UTF-8 as Romeo suggests. UTF-8 is supported by Matlab (the warning shown below won't need to be suppressed) and Java, and for my application it will generate smaller text files.

I suppress the Matlab warning

    Warning: The encoding 'UTF-16LE' is not supported.

with

    warning off MATLAB:iofun:UnsupportedEncoding;
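The relationship between the two hex dumps can be reproduced outside Matlab. This is only an illustration in Python of what "UTF-16LE without BOM" versus "UTF-16 with BOM" means at the byte level; it says nothing about how Matlab's fopen behaves internally.

```python
import codecs

text = "Some stuff."

# An explicit byte order ("utf-16-le") never writes a BOM.
without_bom = text.encode("utf-16-le")

# Prepending the BOM by hand gives the FF FE prefix seen in the second hex dump.
with_bom = codecs.BOM_UTF16_LE + without_bom

# A reader that expects plain "utf-16" uses the BOM to pick the byte order,
# then strips it from the decoded text.
decoded = with_bom.decode("utf-16")

print(with_bom[:4].hex(" "))
print(decoded)
```

Writing the two BOM bytes once, immediately after opening the file, is the manual workaround the question mentions.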


How to test an application for correct encoding (e.g. UTF-8)

- by Olaf

Encoding issues are among the topics that have bitten me most often during development. Every platform insists on its own encoding, and most likely some non-UTF-8 defaults are in play. (I usually work on Linux, which defaults to UTF-8; my colleagues mostly work on German Windows, which defaults to ISO-8859-1 or a similar Windows code page.)

I believe that UTF-8 is a suitable standard for developing an i18nable application. However, in my experience encoding bugs are usually discovered late (even though I'm located in Germany and we have some special characters that, along with ISO-8859-1, provide some detectable differences). I believe that developers with a completely non-ASCII character set (or those who know a language that uses such a character set) get a head start in providing test data. But there must be a way to ease this for the rest of us as well.

What [technique|tool|incentive] are people here using? How do you get your co-developers to care about these issues? How do you test for compliance? Are those tests conducted manually or automatically?

Adding one possible answer upfront: I've recently discovered fliptitle.com (they provide an easy way to get weird characters written "uʍop ǝpısdn" *) and I'm planning on using them to provide easily verifiable UTF-8 character strings (as most of the characters used there are at some weird binary encoding position), but there surely must be more systematic tests, patterns or techniques for ensuring UTF-8 compatibility/usage.

Note: Even though there's an accepted answer, I'd like to know of more techniques and patterns if there are some. Please add more answers if you have more ideas. It has not been easy choosing only one answer for acceptance. I've chosen the regexp answer for the least expected angle on the problem, although there would be reasons to choose other answers as well. Too bad only one answer can be accepted. Thank you for your input.

*) that's "upside down" written "upside down", for those who cannot see those characters due to font problems


Does Process.StartInfo.Arguments support a UTF-8 string?

- by Patrick Klug

Can you use a UTF-8 string as the Arguments for a StartInfo? I am trying to pass a UTF-8 string (in this case a Japanese string) to an application as a console argument. Something like this (this is just an example; cmd.exe would be a custom app):

    var process = new System.Diagnostics.Process();
    process.StartInfo.Arguments = "/K \"echo ????????\"";
    process.StartInfo.FileName = "cmd.exe";
    process.StartInfo.UseShellExecute = true;
    process.Start();
    process.WaitForExit();

Executing this seems to lose the UTF-8 string, and all the target application sees is "echo ?????????". When executing this command directly on the command line (by pasting the arguments), the target application receives the string correctly, even though the command line itself doesn't seem to display it correctly. Do I need to do anything special to enable UTF-8 support in the arguments, or is this just not supported?


How to force javax xslt transformer to encode entities in utf-8?

- by calavera.info

I'm working on a filter that should transform the output with a stylesheet. The important sections of the code look like this:

    PrintWriter out = response.getWriter();
    ...
    StringReader sr = new StringReader(content);
    Source xmlSource = new StreamSource(sr, requestSystemId);
    transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
    transformer.setParameter("encoding", "UTF-8");
    //same result when using ByteArrayOutputStream xo = new java.io.ByteArrayOutputStream();
    StringWriter xo = new StringWriter();
    StreamResult result = new StreamResult(xo);
    transformer.transform(xmlSource, result);
    out.write(xo.toString());

The problem is that national characters are encoded as HTML entities and not as UTF-8. Is there any way to force the transformer to use UTF-8 instead of entities?


Best way to convert a Unicode URL to ASCII (UTF-8 percent-escaped) in Python?

- by benhoyt

I'm wondering what's the best way -- or if there's a simple way with the standard library -- to convert a URL with Unicode chars in the domain name and path to the equivalent ASCII URL, encoded with the domain as IDNA and the path %-encoded, as per RFC 3986.

I get from the user a URL in UTF-8. So if they've typed in http://➡.ws/♥ I get 'http://\xe2\x9e\xa1.ws/\xe2\x99\xa5' in Python. And what I want out is the ASCII version: 'http://xn--hgi.ws/%E2%99%A5'.

What I do at the moment is split the URL up into parts via a regex, then manually IDNA-encode the domain, and separately encode the path and query string with different urllib.quote() calls.

    # url is UTF-8 here, eg: url = u'http://➡.ws/㉌'.encode('utf-8')
    match = re.match(r'([a-z]{3,5})://(.+\.[a-z0-9]{1,6})'
                     r'(:\d{1,5})?(/.*?)(\?.*)?$', url, flags=re.I)
    if not match:
        raise BadURLException(url)
    protocol, domain, port, path, query = match.groups()

    try:
        domain = unicode(domain, 'utf-8')
    except UnicodeDecodeError:
        return ''  # bad UTF-8 chars in domain
    domain = domain.encode('idna')

    if port is None:
        port = ''

    path = urllib.quote(path)

    if query is None:
        query = ''
    else:
        query = urllib.quote(query, safe='=&?/')

    url = protocol + '://' + domain + port + path + query
    # url is ASCII here, eg: url = 'http://xn--hgi.ws/%E3%89%8C'

Is this correct? Any better suggestions? Is there a simple standard-library function to do this?
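For comparison, here is a sketch of the same steps in Python 3, where urllib.parse.urlsplit replaces the hand-written regex. `url_to_ascii` is a hypothetical helper name. Known limits: the standard "idna" codec implements the older IDNA 2003 rules, any user:password part of the URL is dropped, and a URL without a host would fail on the `encode` call.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def url_to_ascii(url: str) -> str:
    """Return an ASCII-only form of a URL given as a Unicode string."""
    parts = urlsplit(url)
    # Host: IDNA-encode each label (stdlib codec, IDNA 2003 rules).
    host = parts.hostname.encode("idna").decode("ascii")
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    # Path, query, fragment: percent-encode the UTF-8 bytes. "%" is kept
    # safe so input that is already escaped is not escaped twice.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&?/%")
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, host, path, query, fragment))


print(url_to_ascii("http://\u27a1.ws/\u2665"))
```

Keeping "%" in the safe set is a design choice: it avoids double-encoding, at the cost of passing through a stray literal "%" unchanged.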


Is there a standard literal constant that I can use instead of "utf-8" in C# (.Net 3.5)?

- by Hamish Grubijan

Hi, I would like to find a better way to do this: XmlNode nodeXML = xmlDoc.AppendChild( xmlDoc.CreateXmlDeclaration( "1.0", "utf-8", String.Empty) ); I do not want to think about "utf-8" vs "UTF-8" vs "UTF8" vs "utf8" as I type code. I would like to make my code less prone to typos. I am sure that some standard library has declared "utf-8" as a const / readonly string. How can I find it? Also, what about "1.0"? I am assuming that major XML versions have been enumerated somewhere as well. Thanks!


How to convert any possible format to UTF-8 using Iconv?

- by Ole Jak

So, for example, this will convert Windows-1251 to UTF-8: $utf8 = iconv('windows-1251', 'utf-8', $ansi); But how do I convert an unknown encoding to UTF-8? When the data reaches us we do not yet know what encoding it is in; in general it could be any encoding that iconv knows.


C++ iterate or split utf-8 string into array of symbols?

- by topright

I'm looking for a platform-independent way, without third-party libraries, of iterating over a UTF-8 string or splitting it into an array of UTF-8 symbols.
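The usual library-free technique is to read the length of each sequence from its lead byte. The sketch below shows the logic in Python for brevity; it uses only byte comparisons and shifts, so it ports line by line to a C++ loop over a std::string. `split_utf8` is a hypothetical name, and the function assumes well-formed input (it does not validate continuation bytes) and yields code points, not user-perceived characters with combining marks.

```python
def split_utf8(data: bytes) -> list:
    """Split well-formed UTF-8 bytes into one bytes object per code point."""
    chunks = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:            # 0xxxxxxx: single byte (ASCII)
            length = 1
        elif lead >> 5 == 0b110:   # 110xxxxx: two-byte sequence
            length = 2
        elif lead >> 4 == 0b1110:  # 1110xxxx: three-byte sequence
            length = 3
        elif lead >> 3 == 0b11110: # 11110xxx: four-byte sequence
            length = 4
        else:
            raise ValueError(f"invalid lead byte 0x{lead:02x} at offset {i}")
        chunks.append(data[i:i + length])
        i += length
    return chunks


print(split_utf8("a\u00e9\u20ac\U0001F600".encode("utf-8")))
```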


PHP: is urlencode() a safe way to allow valid UTF-8 strings in the URL?

- by Xeoncross

I have user-submitted tags that can be any type of (valid) UTF-8 string. I want to know if it is safe to include them in the URL merely by running them through urlencode(). In other words, is urlencode() safe to use for valid UTF-8 strings? (By valid I mean I'd have already force-encoded them to UTF-8.)
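Percent-encoding works on bytes, so any valid UTF-8 string survives a round trip. A small illustration in Python, whose quote_plus follows the same "+ for space" convention as PHP's urlencode(); the sample tag is made up.

```python
from urllib.parse import quote_plus, unquote_plus

tag = "caf\u00e9 & cr\u00e8me/\u65e5\u672c"  # hypothetical user-submitted tag

# Every byte outside the unreserved set becomes %XX, including the UTF-8
# bytes of non-ASCII characters and reserved characters such as & and /.
encoded = quote_plus(tag)
restored = unquote_plus(encoded)

print(encoded)
print(restored == tag)
```

This only covers transport in the URL; escaping the tag again when it is echoed into HTML is a separate concern.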


SFML title bar with weird characters when using UTF-8

- by TheOm3ga

(Previously asked at http://stackoverflow.com/questions/4922478/sfml-title-bar-with-weird-characters-when-using-utf-8) I've just started using SFML, and one of the first problems I've come across is some weird characters on the title bar whenever I try to use accents or any other extended char. For instance, I've got: sf::RenderWindow Ventana(sf::VideoMode(800, 600, 32), "Año nuevóóó"); And the title bar renders like AÂ+o nuevoA³A³A³ This ONLY HAPPENS if my source code file is encoded in UTF-8. If I change the file encoding to ISO-8859-1, it shows properly. Obviously all of my files use UTF-8, as it's the system-wide encoding. I'm using GCC under Ubuntu GNU/Linux. I've tried using the different utilities in sf::Unicode to adapt the text, but none of them seems to work.
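The symptom is consistent with the UTF-8 bytes of the string literal being interpreted one byte per character, as in ISO-8859-1. That explanation is an assumption about what happens between the literal and the window title, not something confirmed from SFML's documentation. The effect itself is easy to reproduce:

```python
title = "A\u00f1o nuev\u00f3\u00f3\u00f3"  # "Año nuevóóó"

# In a UTF-8 source file each accented letter is stored as two bytes.
utf8_bytes = title.encode("utf-8")

# Reading those bytes as ISO-8859-1 turns every two-byte sequence into two
# separate characters, which is the classic mojibake pattern.
misread = utf8_bytes.decode("iso-8859-1")

print(len(title), len(utf8_bytes))
print(misread)
```

When the file is saved as ISO-8859-1 instead, each accented letter is a single byte, so a byte-per-character interpretation happens to give the right result.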


How can I convert a bunch of files from ISO-8859-1 to UTF-8 using Perl?

- by tau

I have several documents I need to convert from ISO-8859-1 to UTF-8 (without the BOM, of course). This is the issue, though: I have so many of these documents (it is actually a mix of documents, some UTF-8 and some ISO-8859-1) that I need an automated way of converting them. Unfortunately I only have ActivePerl installed and don't know much about encoding in that language. I may be able to install PHP, but I am not sure, as this is not my personal computer.

Just so you know, I use Scite or Notepad++, but neither converts correctly. For example, if I open a document in Czech that contains the character "ž" and go to the "Convert to UTF-8" option in Notepad++, it incorrectly converts it to an unreadable character.

There is a way I CAN convert them, but it is tedious. If I open the document with the special characters, copy the document to the Windows clipboard, then paste it into a UTF-8 document and save it, it is okay. This is too tedious (opening every file and copying/pasting into a new document) for the number of documents I have. Any ideas? Thanks!!!
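For a mixed batch, one workable approach is to leave files that already decode as strict UTF-8 untouched and convert only the rest. The sketch below is in Python rather than Perl; `to_utf8`, `convert_tree` and the "*.txt" pattern are hypothetical. Note that the Czech "ž" does not exist in ISO-8859-1, so the real source encoding may be ISO-8859-2 or Windows-1250, which is why the fallback is a parameter.

```python
from pathlib import Path


def to_utf8(raw: bytes, fallback: str = "iso-8859-1") -> bytes:
    """Return raw unchanged if it is valid UTF-8, else transcode from fallback."""
    try:
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        return raw.decode(fallback).encode("utf-8")  # no BOM is written


def convert_tree(root: str, pattern: str = "*.txt", fallback: str = "iso-8859-1") -> int:
    """Convert matching files under root in place; return how many changed."""
    changed = 0
    for path in Path(root).rglob(pattern):
        raw = path.read_bytes()
        converted = to_utf8(raw, fallback)
        if converted != raw:
            path.write_bytes(converted)
            changed += 1
    return changed
```

Running it on a copy of the documents first is advisable, since the conversion overwrites files in place.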


UTF-8 MySQL and Charset, pls help me understand this once and for all!

- by FFish

Can someone explain why, when I set everything to UTF-8, I keep getting those damn ???

* MySQL Server version: 5.1.44
* MySQL charset: UTF-8 Unicode (utf8)

I create a new database, name: utf8test, collation: utf8_general_ci, MySQL connection collation: utf8_general_ci.

My SQL looks like this:

    SET SQL_MODE="NO_AUTO_VALUE_ON_ZERO";

    CREATE TABLE IF NOT EXISTS `test_table` (
      `test_id` int(11) NOT NULL,
      `test_text` text NOT NULL,
      PRIMARY KEY (`test_id`)
    ) ENGINE=MyISAM DEFAULT CHARSET=utf8;

    INSERT INTO `test_table` (`test_id`, `test_text`) VALUES
    (1, 'hééélo'),
    (2, 'wööörld');

My PHP / HTML:

    <?php
    $db_conn = mysql_connect("localhost", "root", "") or die("Can't connect to db");
    mysql_select_db("utf8test", $db_conn) or die("Can't select db");

    // $result = mysql_query("set names 'utf8'"); // this works... why??
    $query = "SELECT * FROM test_table";
    $result = mysql_query($query);

    $output = "";
    while($row = mysql_fetch_assoc($result)) {
        $output .= "id: " . $row['test_id'] . " - text: " . $row['test_text'] . "<br />";
    }
    ?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html lang="it" xmlns="http://www.w3.org/1999/xhtml" xml:lang="it">
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <title>UTF-8 test</title>
    </head>
    <body>
    <?php echo $output; ?>
    </body>
    </html>


How can I decode UTF-16 data in Perl when I don't know the byte order?

- by Geo

If I open a file (and specify an encoding directly):

    open(my $file,"<:encoding(UTF-16)","some.file") || die "error $!\n";
    while(<$file>) {
        print "$_\n";
    }
    close($file);

I can read the file contents nicely. However, if I do:

    use Encode;
    open(my $file,"some.file") || die "error $!\n";
    while(<$file>) {
        print decode("UTF-16",$_);
    }
    close($file);

I get the following error:

    UTF-16:Unrecognised BOM d at F:/Perl/lib/Encode.pm line 174

How can I make it work with decode?
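A likely reason for the difference: reading the raw file "line by line" splits on the single byte 0x0A, which cuts every two-byte UTF-16 newline in half, and only the first piece starts with the BOM that a generic "UTF-16" decoder needs. The sketch below reproduces that at the byte level in Python (not Perl), with a made-up two-line sample.

```python
import codecs

# A little-endian UTF-16 file with BOM, as raw bytes (sample content is made up).
data = codecs.BOM_UTF16_LE + "first\nsecond\n".encode("utf-16-le")

# Splitting raw bytes on 0x0A leaves the 0x00 half of each newline at the
# start of the next piece; none of the later pieces carries a BOM.
pieces = data.split(b"\n")
print(pieces)

# Decoding the complete byte string lets the decoder see the BOM once;
# splitting into lines is safe after decoding.
lines = data.decode("utf-16").splitlines()
print(lines)
```

The equivalent fix for the second Perl snippet is to read the whole file in binary mode and call decode once on the complete contents, or to keep using the :encoding layer.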



the characters except 0x00-0x7F are not been shown when converted to "UTF-8" from "ISO-8859-1"

- by Mike.Huang

I need to get a string from a browser's URL request and then create a text image from the requested text. I know the default encoding of the Java net transmission is "ISO-8859-1", and it works normally with all characters defined in "ISO-8859-1". But when I request a multi-byte Unicode character (e.g. Chinese or something like ¤?), I need to decode it as "UTF-8" from "ISO-8859-1". My code looks like this: String reslut = new String(requestString.getBytes("ISO-8859-1"), "UTF-8"); Everything is fine, except that some ISO-8859-1 characters are no longer shown, namely those in the range 0x80 - 0xFF (defined in "ISO-8859-1"). In other words, the characters outside 0x00-0x7F are not shown when converted to "UTF-8" from "ISO-8859-1". Is there another method that solves this?
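The getBytes("ISO-8859-1") / "UTF-8" round trip only repairs values that were UTF-8 bytes to begin with. A value that really was ISO-8859-1 contains single bytes in 0x80-0xFF, which are not valid UTF-8 on their own. The Python sketch below mirrors the Java expression; `reinterpret` is a hypothetical name.

```python
def reinterpret(received: str) -> str:
    """Mirror of: new String(s.getBytes("ISO-8859-1"), "UTF-8")."""
    raw = received.encode("iso-8859-1")
    return raw.decode("utf-8", errors="replace")


# Case 1: the client sent UTF-8 bytes that were misread as ISO-8859-1.
utf8_sent = "\u4e2d".encode("utf-8").decode("iso-8859-1")
print(repr(reinterpret(utf8_sent)))    # the original character is recovered

# Case 2: the client really sent ISO-8859-1; the lone byte 0xE9 is invalid UTF-8.
latin1_sent = "\u00e9"
print(repr(reinterpret(latin1_sent)))  # replacement character instead of the letter
```

So the conversion cannot be applied unconditionally; the encoding the client actually used has to be known or detected first.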


using C# how to convert iso8859-1 encoded text files that contain Latin-1 accented characters to utf

- by Tim

I am being sent text files saved in ISO-8859-1 format that contain accented characters from the Latin-1 range (as well as normal ASCII a-z etc.). How can I convert these files to UTF-8 using C#, so that the single-byte accented characters in ISO-8859-1 become valid UTF-8 characters? I have tried to use a StreamReader with ASCIIEncoding, and then to convert the ASCII string to UTF-8 by instantiating an ASCII encoding and a UTF-8 encoding and using Encoding.Convert(ascii, utf8, ascii.GetBytes( asciiString) ), but the accented characters are being rendered as question marks. What step am I missing? Thanks
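The question marks come from the ASCII step: ASCII has no characters above 0x7F, so the accented bytes are lost before any UTF-8 conversion happens. The Python sketch below shows the same effect (Python substitutes U+FFFD where .NET's ASCIIEncoding substitutes "?"); the sample text is made up.

```python
latin1_bytes = "d\u00e9j\u00e0 vu".encode("iso-8859-1")  # bytes as stored in the file

# Decoding as ASCII discards every byte above 0x7F.
lossy = latin1_bytes.decode("ascii", errors="replace")

# Decoding with the encoding the file was actually written in keeps them,
# and encoding the result as UTF-8 completes the conversion.
text = latin1_bytes.decode("iso-8859-1")
utf8_bytes = text.encode("utf-8")

print(repr(lossy))
print(utf8_bytes)
```

In C# terms, the reader needs an ISO-8859-1 encoding rather than ASCIIEncoding.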


How do I obtain a code point integer from a 1 to 4 byte UTF-8 encoded sequence in Windows?

- by Patrick Niedzielski

Hello, I am Patrick Niedzielski, a programmer for the Free Software 3D adventure game Humm and Strumm. I'm working on a minimal Unicode character class in C++. I currently have an array of four bytes representing a UTF-8 sequence. On GNU/Linux, I can just convert to UTF-32 with iconv(), but on Windows, I cannot do this. Is it possible to convert the array to a single code point? Thanks, Patrick
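The conversion needs no platform API at all: the lead byte gives the sequence length and its low bits, and each continuation byte contributes six more bits. The sketch below is in Python, but it is pure integer arithmetic and translates directly to C++. `utf8_to_code_point` is a hypothetical name, and the sketch does not reject overlong forms or surrogate code points.

```python
def utf8_to_code_point(seq: bytes) -> int:
    """Decode the first UTF-8 sequence in seq (1 to 4 bytes) to a code point."""
    lead = seq[0]
    if lead < 0x80:
        length, value = 1, lead            # 0xxxxxxx
    elif lead >> 5 == 0b110:
        length, value = 2, lead & 0x1F     # 110xxxxx
    elif lead >> 4 == 0b1110:
        length, value = 3, lead & 0x0F     # 1110xxxx
    elif lead >> 3 == 0b11110:
        length, value = 4, lead & 0x07     # 11110xxx
    else:
        raise ValueError("invalid lead byte")
    if len(seq) < length:
        raise ValueError("truncated sequence")
    for byte in seq[1:length]:
        if byte >> 6 != 0b10:              # continuation bytes are 10xxxxxx
            raise ValueError("invalid continuation byte")
        value = (value << 6) | (byte & 0x3F)
    return value


print(hex(utf8_to_code_point("\u20ac".encode("utf-8"))))
```

Bytes after the first complete sequence are ignored, which suits a fixed four-byte array that is padded at the end.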


Can base64 encoding be applied to multibyte UTF-8 characters?

- by cppdev

Can base64 encoding be applied to multibyte UTF-8 characters? How is a base64-encoded string converted back to a multibyte UTF-8 string?
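Base64 operates on bytes and has no notion of characters, so multibyte UTF-8 text goes through it like any other byte sequence. A short round trip in Python, with a made-up sample string:

```python
import base64

text = "h\u00e9llo \u65e5\u672c\u8a9e"  # sample text with 2- and 3-byte characters

# Text -> UTF-8 bytes -> base64 (ASCII-only output).
encoded = base64.b64encode(text.encode("utf-8"))

# base64 -> the same UTF-8 bytes -> text.
decoded = base64.b64decode(encoded).decode("utf-8")

print(encoded)
print(decoded == text)
```

The only requirement is that both sides agree on UTF-8 for the text-to-bytes step; base64 itself does not record the encoding.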


delphi vs c# post returns different strings - utf problem?

- by argh

I'm posting two forms, one in C# and one in Delphi, but the result strings seem to be different:

    c# returns:     ¤@@1@@@@1@@@@1@@xsmË±Â0Ð...
    delphi returns: #$1E'@@1@@@@1@@@@1@@x'#$009C...

Since both are compressed streams, I'm getting errors while trying to decompress the Delphi one. The C# one is 'correct', i.e. it extracts. I'm not an expert on Delphi; I just need to convert a piece of code from C# to Delphi.

c# code:

    string GetData(Hashtable aParam, string ServerURL)
    {
        string Result = "";
        WebRequest Request = HttpWebRequest.Create(ServerURL);
        Request.Method = "POST";
        Request.ContentType = "application/x-www-form-urlencoded; charset=UTF-8";
        UTF8Encoding encUTF8 = new System.Text.UTF8Encoding(false);
        StreamWriter writer = new StreamWriter(Request.GetRequestStream(), encUTF8);
        foreach (DictionaryEntry element in aParam)
        {
            writer.Write(element.Key + "=" + element.Value + "&");
        }
        writer.Close();
        writer.Dispose();
        WebResponse Response = Request.GetResponse();
        StreamReader Reader = new StreamReader(Response.GetResponseStream(), System.Text.Encoding.Default);
        Result = Reader.ReadToEnd();
        Reader.Close();
        Response.Close();
        Reader.Dispose();
        return Result;
    }

delphi code:

    function GetData(aParam:TStringList; ServerURL:string):string;
    var
      req: TIdHTTP;
      res: string;
    begin
      req := TIdHTTP.Create();
      with req do
      begin
        Request.ContentType := 'application/x-www-form-urlencoded; charset=UTF-8';
        Request.Method := 'POST';
        Request.CharSet := 'utf-8';
        Request.AcceptCharSet := 'utf-8';
        res := Post(ServerURL, aParam);
      end;
      Result := res;
      req.Free;
    end;

-edit- I'm using Delphi 2010


How can I convert a large ANSI text file to UTF-8?

- by mjustin

For a database migration I need a tool which can convert a 1 GB file from Ansi codepage to UTF8 on Windows. Maybe I can use sed (given that I know all search and replace values), but is there something included in GNU or Windows tools?
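If a scripting language is available, a 1 GB file can be transcoded in fixed-size chunks so that memory use stays flat. The sketch below is in Python; `transcode` is a hypothetical helper, and "cp1252" is an assumption about which ANSI code page the file uses (the name of the actual code page should be substituted).

```python
def transcode(src_path, dst_path, src_encoding="cp1252", chunk_chars=1 << 20):
    """Stream src_path (in src_encoding) to dst_path as UTF-8, chunk by chunk."""
    # newline="" disables newline translation on both sides, so CRLF line
    # endings pass through unchanged.
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        while True:
            chunk = src.read(chunk_chars)
            if not chunk:
                break
            dst.write(chunk)
```

A call would look like transcode("dump_ansi.sql", "dump_utf8.sql"), where both file names are placeholders. No BOM is written to the output.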


Can PuTTY be configured to display the following UTF-8 characters?

- by Stuart Powers

I'd like to be able to render the characters as seen in this tweet: I saved the tweet's JSON data and wrote a one-liner python script for testing. python -c 'import json,urllib; print json.load(urllib.urlopen("http://c.sente.cc/BUCq/tweet.json"))["text"]' This next image shows the output of this command on two different putty sessions, one with Bitstream Vera Sans Mono font and the other is using Courier New: Next is an example of correct output (I wasn't using PuTTY): The original JSON is at this link using Twitter's API. How can I get PuTTY to display those characters?


How to make emacs accept UTF-8 from the keyboard

- by Brent.Longborough

My friends have persuaded me to "try again" (about the 5th time in about 12 years) with emacs. I'm currently suffering a little, and need help with emacs + utf-8. I'm running the 23.3.1 emacs GUI on Windows 7 with my own custom keyboard layout (built with MS Keyboard Layout Creator). The layout has a full ISO-8859-1 (Latin-1) character set, plus some additional characters from ISO-8859-9 (Latin-5: ğ, ı, ş etc. for Turkish) and ŵ for Welsh (don't know where that one lives).

In my .emacs, I have (blindly) added these lines:

    ;; key board / input method settings
    (setq locale-coding-system 'utf-8)
    (set-terminal-coding-system 'utf-8)
    (set-keyboard-coding-system 'utf-8)
    (set-language-environment 'UTF-8) ; prefer utf-8 for language settings

Now, when I enter characters from ISO Latin-1 from the keyboard, they are accepted without problems, but characters from outside Latin-1 are "translated" to an approximate character in Latin-1. Thus, for example, Latin-5 "ğ" gets converted to a plain "g". Cutting and pasting, however, work fine. Can anyone tell me what I'm doing wrong? I would like everything I do with emacs to be UTF-8 with BOM.
