Search Results

Search found 4604 results on 185 pages for 'utf 8'.

Page 5/185 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >

SQLite, python, unicode, and non-utf data

- by Nathan Spears

I started by trying to store strings in sqlite using python, and got the message: sqlite3.ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings. Ok, I switched to Unicode strings. Then I started getting the message: sqlite3.OperationalError: Could not decode to UTF-8 column 'tag_artist' with text 'Sigur Rós' when trying to retrieve data from the db. More research and I started encoding it in utf8, but then 'Sigur Rós' starts looking like 'Sigur RÃ³s' note: My console was set to display in 'latin_1' as @John Machin pointed out. What gives? After reading this, describing exactly the same situation I'm in, it seems as if the advice is to ignore the other advice and use 8-bit bytestrings after all. I didn't know much about unicode and utf before I started this process. I've learned quite a bit in the last couple hours, but I'm still ignorant of whether there is a way to correctly convert 'ó' from latin-1 to utf-8 and not mangle it. If there isn't, why would sqlite 'highly recommend' I switch my application to unicode strings? I'm going to update this question with a summary and some example code of everything I've learned in the last 24 hours so that someone in my shoes can have an easy(er) guide. If the information I post is wrong or misleading in any way please tell me and I'll update, or one of you senior guys can update. Summary of answers Let me first state the goal as I understand it. The goal in processing various encodings, if you are trying to convert between them, is to understand what your source encoding is, then convert it to unicode using that source encoding, then convert it to your desired encoding. Unicode is a base and encodings are mappings of subsets of that base. utf_8 has room for every character in unicode, but because they aren't in the same place as, for instance, latin_1, a string encoded in utf_8 and sent to a latin_1 console will not look the way you expect. In python the process of getting to unicode and into another encoding looks like: str.decode('source_encoding').encode('desired_encoding') or if the str is already in unicode str.encode('desired_encoding') For sqlite I didn't actually want to encode it again, I wanted to decode it and leave it in unicode format. Here are four things you might need to be aware of as you try to work with unicode and encodings in python. The encoding of the string you want to work with, and the encoding you want to get it to. The system encoding. The console encoding. The encoding of the source file Elaboration: (1) When you read a string from a source, it must have some encoding, like latin_1 or utf_8. In my case, I'm getting strings from filenames, so unfortunately, I could be getting any kind of encoding. Windows XP uses UCS-2 (a Unicode system) as its native string type, which seems like cheating to me. Fortunately for me, the characters in most filenames are not going to be made up of more than one source encoding type, and I think all of mine were either completely latin_1, completely utf_8, or just plain ascii (which is a subset of both of those). So I just read them and decoded them as if they were still in latin_1 or utf_8. It's possible, though, that you could have latin_1 and utf_8 and whatever other characters mixed together in a filename on Windows. Sometimes those characters can show up as boxes, other times they just look mangled, and other times they look correct (accented characters and whatnot). Moving on. (2) Python has a default system encoding that gets set when python starts and can't be changed during runtime. See here for details. Dirty summary ... well here's the file I added: \# sitecustomize.py \# this file can be anywhere in your Python path, \# but it usually goes in ${pythondir}/lib/site-packages/ import sys sys.setdefaultencoding('utf_8') This system encoding is the one that gets used when you use the unicode("str") function without any other encoding parameters. To say that another way, python tries to decode "str" to unicode based on the default system encoding. (3) If you're using IDLE or the command-line python, I think that your console will display according to the default system encoding. I am using pydev with eclipse for some reason, so I had to go into my project settings, edit the launch configuration properties of my test script, go to the Common tab, and change the console from latin-1 to utf-8 so that I could visually confirm what I was doing was working. (4) If you want to have some test strings, eg test_str = "ó" in your source code, then you will have to tell python what kind of encoding you are using in that file. (FYI: when I mistyped an encoding I had to ctrl-Z because my file became unreadable.) This is easily accomplished by putting a line like so at the top of your source code file: # -*- coding: utf_8 -*- If you don't have this information, python attempts to parse your code as ascii by default, and so: SyntaxError: Non-ASCII character '\xf3' in file _redacted_ on line 81, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details Once your program is working correctly, or, if you aren't using python's console or any other console to look at output, then you will probably really only care about #1 on the list. System default and console encoding are not that important unless you need to look at output and/or you are using the builtin unicode() function (without any encoding parameters) instead of the string.decode() function. I wrote a demo function I will paste into the bottom of this gigantic mess that I hope correctly demonstrates the items in my list. Here is some of the output when I run the character 'ó' through the demo function, showing how various methods react to the character as input. My system encoding and console output are both set to utf_8 for this run: '?' = original char <type 'str'> repr(char)='\xf3' '?' = unicode(char) ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data 'ó' = char.decode('latin_1') <type 'unicode'> repr(char.decode('latin_1'))=u'\xf3' '?' = char.decode('utf_8') ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data Now I will change the system and console encoding to latin_1, and I get this output for the same input: 'ó' = original char <type 'str'> repr(char)='\xf3' 'ó' = unicode(char) <type 'unicode'> repr(unicode(char))=u'\xf3' 'ó' = char.decode('latin_1') <type 'unicode'> repr(char.decode('latin_1'))=u'\xf3' '?' = char.decode('utf_8') ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data Notice that the 'original' character displays correctly and the builtin unicode() function works now. Now I change my console output back to utf_8. '?' = original char <type 'str'> repr(char)='\xf3' '?' = unicode(char) <type 'unicode'> repr(unicode(char))=u'\xf3' '?' = char.decode('latin_1') <type 'unicode'> repr(char.decode('latin_1'))=u'\xf3' '?' = char.decode('utf_8') ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data Here everything still works the same as last time but the console can't display the output correctly. Etc. The function below also displays more information that this and hopefully would help someone figure out where the gap in their understanding is. I know all this information is in other places and more thoroughly dealt with there, but I hope that this would be a good kickoff point for someone trying to get coding with python and/or sqlite. Ideas are great but sometimes source code can save you a day or two of trying to figure out what functions do what. Disclaimers: I'm no encoding expert, I put this together to help my own understanding. I kept building on it when I should have probably started passing functions as arguments to avoid so much redundant code, so if I can I'll make it more concise. Also, utf_8 and latin_1 are by no means the only encoding schemes, they are just the two I was playing around with because I think they handle everything I need. Add your own encoding schemes to the demo function and test your own input. One more thing: there are apparently crazy application developers making life difficult in Windows. #!/usr/bin/env python # -*- coding: utf_8 -*- import os import sys def encodingDemo(str): validStrings = () try: print "str =",str,"{0} repr(str) = {1}".format(type(str), repr(str)) validStrings += ((str,""),) except UnicodeEncodeError as ude: print "Couldn't print the str itself because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t", print ude try: x = unicode(str) print "unicode(str) = ",x validStrings+= ((x, " decoded into unicode by the default system encoding"),) except UnicodeDecodeError as ude: print "ERROR. unicode(str) couldn't decode the string because the system encoding is set to an encoding that doesn't understand some character in the string." print "\tThe system encoding is set to {0}. See error:\n\t".format(sys.getdefaultencoding()), print ude except UnicodeEncodeError as uee: print "ERROR. Couldn't print the unicode(str) because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t", print uee try: x = str.decode('latin_1') print "str.decode('latin_1') =",x validStrings+= ((x, " decoded with latin_1 into unicode"),) try: print "str.decode('latin_1').encode('utf_8') =",str.decode('latin_1').encode('utf_8') validStrings+= ((x, " decoded with latin_1 into unicode and encoded into utf_8"),) except UnicodeDecodeError as ude: print "The string was decoded into unicode using the latin_1 encoding, but couldn't be encoded into utf_8. See error:\n\t", print ude except UnicodeDecodeError as ude: print "Something didn't work, probably because the string wasn't latin_1 encoded. See error:\n\t", print ude except UnicodeEncodeError as uee: print "ERROR. Couldn't print the str.decode('latin_1') because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t", print uee try: x = str.decode('utf_8') print "str.decode('utf_8') =",x validStrings+= ((x, " decoded with utf_8 into unicode"),) try: print "str.decode('utf_8').encode('latin_1') =",str.decode('utf_8').encode('latin_1') except UnicodeDecodeError as ude: print "str.decode('utf_8').encode('latin_1') didn't work. The string was decoded into unicode using the utf_8 encoding, but couldn't be encoded into latin_1. See error:\n\t", validStrings+= ((x, " decoded with utf_8 into unicode and encoded into latin_1"),) print ude except UnicodeDecodeError as ude: print "str.decode('utf_8') didn't work, probably because the string wasn't utf_8 encoded. See error:\n\t", print ude except UnicodeEncodeError as uee: print "ERROR. Couldn't print the str.decode('utf_8') because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t",uee print print "Printing information about each character in the original string." for char in str: try: print "\t'" + char + "' = original char {0} repr(char)={1}".format(type(char), repr(char)) except UnicodeDecodeError as ude: print "\t'?' = original char {0} repr(char)={1} ERROR PRINTING: {2}".format(type(char), repr(char), ude) except UnicodeEncodeError as uee: print "\t'?' = original char {0} repr(char)={1} ERROR PRINTING: {2}".format(type(char), repr(char), uee) print uee try: x = unicode(char) print "\t'" + x + "' = unicode(char) {1} repr(unicode(char))={2}".format(x, type(x), repr(x)) except UnicodeDecodeError as ude: print "\t'?' = unicode(char) ERROR: {0}".format(ude) except UnicodeEncodeError as uee: print "\t'?' = unicode(char) {0} repr(char)={1} ERROR PRINTING: {2}".format(type(x), repr(x), uee) try: x = char.decode('latin_1') print "\t'" + x + "' = char.decode('latin_1') {1} repr(char.decode('latin_1'))={2}".format(x, type(x), repr(x)) except UnicodeDecodeError as ude: print "\t'?' = char.decode('latin_1') ERROR: {0}".format(ude) except UnicodeEncodeError as uee: print "\t'?' = char.decode('latin_1') {0} repr(char)={1} ERROR PRINTING: {2}".format(type(x), repr(x), uee) try: x = char.decode('utf_8') print "\t'" + x + "' = char.decode('utf_8') {1} repr(char.decode('utf_8'))={2}".format(x, type(x), repr(x)) except UnicodeDecodeError as ude: print "\t'?' = char.decode('utf_8') ERROR: {0}".format(ude) except UnicodeEncodeError as uee: print "\t'?' = char.decode('utf_8') {0} repr(char)={1} ERROR PRINTING: {2}".format(type(x), repr(x), uee) print x = 'ó' encodingDemo(x) Much thanks for the answers below and especially to @John Machin for answering so thoroughly.

Read the article
SQLite, python, unicode, and non-utf data

- by Nathan Spears

I started by trying to store strings in sqlite using python, and got the message: sqlite3.ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings. Ok, I switched to Unicode strings. Then I started getting the message: sqlite3.OperationalError: Could not decode to UTF-8 column 'tag_artist' with text 'Sigur Rós' when trying to retrieve data from the db. More research and I started encoding it in utf8, but then 'Sigur Rós' starts looking like 'Sigur RÃ³s' note: My console was set to display in 'latin_1' as @John Machin pointed out. What gives? After reading this, describing exactly the same situation I'm in, it seems as if the advice is to ignore the other advice and use 8-bit bytestrings after all. I didn't know much about unicode and utf before I started this process. I've learned quite a bit in the last couple hours, but I'm still ignorant of whether there is a way to correctly convert 'ó' from latin-1 to utf-8 and not mangle it. If there isn't, why would sqlite 'highly recommend' I switch my application to unicode strings? I'm going to update this question with a summary and some example code of everything I've learned in the last 24 hours so that someone in my shoes can have an easy(er) guide. If the information I post is wrong or misleading in any way please tell me and I'll update, or one of you senior guys can update. Summary of answers Let me first state the goal as I understand it. The goal in processing various encodings, if you are trying to convert between them, is to understand what your source encoding is, then convert it to unicode using that source encoding, then convert it to your desired encoding. Unicode is a base and encodings are mappings of subsets of that base. utf_8 has room for every character in unicode, but because they aren't in the same place as, for instance, latin_1, a string encoded in utf_8 and sent to a latin_1 console will not look the way you expect. In python the process of getting to unicode and into another encoding looks like: str.decode('source_encoding').encode('desired_encoding') or if the str is already in unicode str.encode('desired_encoding') For sqlite I didn't actually want to encode it again, I wanted to decode it and leave it in unicode format. Here are four things you might need to be aware of as you try to work with unicode and encodings in python. The encoding of the string you want to work with, and the encoding you want to get it to. The system encoding. The console encoding. The encoding of the source file Elaboration: (1) When you read a string from a source, it must have some encoding, like latin_1 or utf_8. In my case, I'm getting strings from filenames, so unfortunately, I could be getting any kind of encoding. Windows XP uses UCS-2 (a Unicode system) as its native string type, which seems like cheating to me. Fortunately for me, the characters in most filenames are not going to be made up of more than one source encoding type, and I think all of mine were either completely latin_1, completely utf_8, or just plain ascii (which is a subset of both of those). So I just read them and decoded them as if they were still in latin_1 or utf_8. It's possible, though, that you could have latin_1 and utf_8 and whatever other characters mixed together in a filename on Windows. Sometimes those characters can show up as boxes, other times they just look mangled, and other times they look correct (accented characters and whatnot). Moving on. (2) Python has a default system encoding that gets set when python starts and can't be changed during runtime. See here for details. Dirty summary ... well here's the file I added: \# sitecustomize.py \# this file can be anywhere in your Python path, \# but it usually goes in ${pythondir}/lib/site-packages/ import sys sys.setdefaultencoding('utf_8') This system encoding is the one that gets used when you use the unicode("str") function without any other encoding parameters. To say that another way, python tries to decode "str" to unicode based on the default system encoding. (3) If you're using IDLE or the command-line python, I think that your console will display according to the default system encoding. I am using pydev with eclipse for some reason, so I had to go into my project settings, edit the launch configuration properties of my test script, go to the Common tab, and change the console from latin-1 to utf-8 so that I could visually confirm what I was doing was working. (4) If you want to have some test strings, eg test_str = "ó" in your source code, then you will have to tell python what kind of encoding you are using in that file. (FYI: when I mistyped an encoding I had to ctrl-Z because my file became unreadable.) This is easily accomplished by putting a line like so at the top of your source code file: # -*- coding: utf_8 -*- If you don't have this information, python attempts to parse your code as ascii by default, and so: SyntaxError: Non-ASCII character '\xf3' in file _redacted_ on line 81, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details Once your program is working correctly, or, if you aren't using python's console or any other console to look at output, then you will probably really only care about #1 on the list. System default and console encoding are not that important unless you need to look at output and/or you are using the builtin unicode() function (without any encoding parameters) instead of the string.decode() function. I wrote a demo function I will paste into the bottom of this gigantic mess that I hope correctly demonstrates the items in my list. Here is some of the output when I run the character 'ó' through the demo function, showing how various methods react to the character as input. My system encoding and console output are both set to utf_8 for this run: '?' = original char <type 'str'> repr(char)='\xf3' '?' = unicode(char) ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data 'ó' = char.decode('latin_1') <type 'unicode'> repr(char.decode('latin_1'))=u'\xf3' '?' = char.decode('utf_8') ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data Now I will change the system and console encoding to latin_1, and I get this output for the same input: 'ó' = original char <type 'str'> repr(char)='\xf3' 'ó' = unicode(char) <type 'unicode'> repr(unicode(char))=u'\xf3' 'ó' = char.decode('latin_1') <type 'unicode'> repr(char.decode('latin_1'))=u'\xf3' '?' = char.decode('utf_8') ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data Notice that the 'original' character displays correctly and the builtin unicode() function works now. Now I change my console output back to utf_8. '?' = original char <type 'str'> repr(char)='\xf3' '?' = unicode(char) <type 'unicode'> repr(unicode(char))=u'\xf3' '?' = char.decode('latin_1') <type 'unicode'> repr(char.decode('latin_1'))=u'\xf3' '?' = char.decode('utf_8') ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data Here everything still works the same as last time but the console can't display the output correctly. Etc. The function below also displays more information that this and hopefully would help someone figure out where the gap in their understanding is. I know all this information is in other places and more thoroughly dealt with there, but I hope that this would be a good kickoff point for someone trying to get coding with python and/or sqlite. Ideas are great but sometimes source code can save you a day or two of trying to figure out what functions do what. Disclaimers: I'm no encoding expert, I put this together to help my own understanding. I kept building on it when I should have probably started passing functions as arguments to avoid so much redundant code, so if I can I'll make it more concise. Also, utf_8 and latin_1 are by no means the only encoding schemes, they are just the two I was playing around with because I think they handle everything I need. Add your own encoding schemes to the demo function and test your own input. One more thing: there are apparently crazy application developers making life difficult in Windows. #!/usr/bin/env python # -*- coding: utf_8 -*- import os import sys def encodingDemo(str): validStrings = () try: print "str =",str,"{0} repr(str) = {1}".format(type(str), repr(str)) validStrings += ((str,""),) except UnicodeEncodeError as ude: print "Couldn't print the str itself because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t", print ude try: x = unicode(str) print "unicode(str) = ",x validStrings+= ((x, " decoded into unicode by the default system encoding"),) except UnicodeDecodeError as ude: print "ERROR. unicode(str) couldn't decode the string because the system encoding is set to an encoding that doesn't understand some character in the string." print "\tThe system encoding is set to {0}. See error:\n\t".format(sys.getdefaultencoding()), print ude except UnicodeEncodeError as uee: print "ERROR. Couldn't print the unicode(str) because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t", print uee try: x = str.decode('latin_1') print "str.decode('latin_1') =",x validStrings+= ((x, " decoded with latin_1 into unicode"),) try: print "str.decode('latin_1').encode('utf_8') =",str.decode('latin_1').encode('utf_8') validStrings+= ((x, " decoded with latin_1 into unicode and encoded into utf_8"),) except UnicodeDecodeError as ude: print "The string was decoded into unicode using the latin_1 encoding, but couldn't be encoded into utf_8. See error:\n\t", print ude except UnicodeDecodeError as ude: print "Something didn't work, probably because the string wasn't latin_1 encoded. See error:\n\t", print ude except UnicodeEncodeError as uee: print "ERROR. Couldn't print the str.decode('latin_1') because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t", print uee try: x = str.decode('utf_8') print "str.decode('utf_8') =",x validStrings+= ((x, " decoded with utf_8 into unicode"),) try: print "str.decode('utf_8').encode('latin_1') =",str.decode('utf_8').encode('latin_1') except UnicodeDecodeError as ude: print "str.decode('utf_8').encode('latin_1') didn't work. The string was decoded into unicode using the utf_8 encoding, but couldn't be encoded into latin_1. See error:\n\t", validStrings+= ((x, " decoded with utf_8 into unicode and encoded into latin_1"),) print ude except UnicodeDecodeError as ude: print "str.decode('utf_8') didn't work, probably because the string wasn't utf_8 encoded. See error:\n\t", print ude except UnicodeEncodeError as uee: print "ERROR. Couldn't print the str.decode('utf_8') because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t",uee print print "Printing information about each character in the original string." for char in str: try: print "\t'" + char + "' = original char {0} repr(char)={1}".format(type(char), repr(char)) except UnicodeDecodeError as ude: print "\t'?' = original char {0} repr(char)={1} ERROR PRINTING: {2}".format(type(char), repr(char), ude) except UnicodeEncodeError as uee: print "\t'?' = original char {0} repr(char)={1} ERROR PRINTING: {2}".format(type(char), repr(char), uee) print uee try: x = unicode(char) print "\t'" + x + "' = unicode(char) {1} repr(unicode(char))={2}".format(x, type(x), repr(x)) except UnicodeDecodeError as ude: print "\t'?' = unicode(char) ERROR: {0}".format(ude) except UnicodeEncodeError as uee: print "\t'?' = unicode(char) {0} repr(char)={1} ERROR PRINTING: {2}".format(type(x), repr(x), uee) try: x = char.decode('latin_1') print "\t'" + x + "' = char.decode('latin_1') {1} repr(char.decode('latin_1'))={2}".format(x, type(x), repr(x)) except UnicodeDecodeError as ude: print "\t'?' = char.decode('latin_1') ERROR: {0}".format(ude) except UnicodeEncodeError as uee: print "\t'?' = char.decode('latin_1') {0} repr(char)={1} ERROR PRINTING: {2}".format(type(x), repr(x), uee) try: x = char.decode('utf_8') print "\t'" + x + "' = char.decode('utf_8') {1} repr(char.decode('utf_8'))={2}".format(x, type(x), repr(x)) except UnicodeDecodeError as ude: print "\t'?' = char.decode('utf_8') ERROR: {0}".format(ude) except UnicodeEncodeError as uee: print "\t'?' = char.decode('utf_8') {0} repr(char)={1} ERROR PRINTING: {2}".format(type(x), repr(x), uee) print x = 'ó' encodingDemo(x) Much thanks for the answers below and especially to @John Machin for answering so thoroughly.

Read the article
Print UTF-8 with VBA

- by Karsten W.

Hello, how can I write UTF-8 encoded strings to a textfile from vba, like Dim fnum As Integer fnum = FreeFile Open "myfile.txt" For Output As fnum Print #fnum, "special characters: äöüß" 'latin-1 or something by default Close fnum Is there some setting on Application level?

Read the article
UTF-8 BOM Problem

- by jsonx

Hi, I am using Komodo Edit as an IDE. I have to encode some files as UTF-8 without BOM in Komodo. In my localhost and site there is no problem but on some sites i am seeing BOM sign and this is a terrible problem for AJAX-JSON response. Any advices? Thanks.

Read the article
Saving FileSystemObject as UTF

- by bob

how can i save the file in utf-8? Dim FSO, File Set FSO = Server.CreateObject("Scripting.FileSystemObject") Set File = FSO.OpenTextFile(Path,2,true,-1) File.Write(xml1) File.Close Set File = Nothing Set FSO = Nothing

Read the article
Parsing a UTF-16 encoded xml file in ruby

- by Matthew Toohey

Hello I've been trying to parse a UTF-16 encoded xml file in Ruby (1.8.7), and I can't seem to find how to do it by searching (google and stack overflow) Here's the xml file url: http://www.abc.net.au/triplej/feeds/playout/triplejsydneyplayout.xml?_5366 Getting the xml string from Net::HTTP and passing it to REXML, then calling logger.info xmlDoc.inspect produces: <UNDEFINED> ... </> Any ideas? Cheers

Read the article
Listings in Latex with UTF-8 (or at least german umlauts)

- by Janosch

Trying to include a source-file into my latex document using the listings package, i got problems with german umlauts inside of the comments in the code. Using \lstset{ extendedchars=\true, inputencoding=utf8x } Umlauts in the source files (encoded in UTF-8 without BOM) are processed, but they are somehow moved to the beginning of the word they are contained in. So // die Größe muss berücksichtigt werden in the input source file, becomes // die ößGre muss übercksichtigt werden in the output file. NOTE: since i found errors in my initial setup, i heavily edited this question

Read the article
Save all files in Visual Studio project as UTF-8

- by jesperlind

I wonder if it's possible to save all files in a Visual Studio 2008 project into a specific character encoding. I got a solution with mixed encodings and I want to make them all the same (UTF-8 with signature). I know how to save single files, but how about all files in a project?

Read the article
C++ simple logging class with UTF-8 output [code example]

- by Andrew

Hello everyone, I was working on one of my academic projects and for the first time I needed pure C++ without GUI. After googling for a while, I did not find any simple and easy to use implementation for logging and created my own. This is a simple implementation with iostreams that logs messages to screen and to the file simultaneously. I was thinking of using templates but then I realized that I do not expect any changes and removed that. It is modified std::wostream with two added modifiers: 1. TimeStamp - prints time-stamp 2. LogMode(LogModes) - switches output: file only, screen only, file+screen. *Boost::utf8_codecvt_facet* is used for UTF-8 output. // ############################################################################ // # Name: MyLog.h # // # Purpose: Logging Class Header # // # Author: Andrew Drach # // # Modified by: <somebody> # // # Created: 03/21/10 # // # SVN-ID: $Id$ # // # Copyright: (c) 2010 Andrew Drach # // # Licence: <license> # // ############################################################################ #ifndef INCLUDED_MYLOG_H #define INCLUDED_MYLOG_H // headers -------------------------------------------------------------------- #include <string> #include <iostream> #include <fstream> #include <exception> #include <boost/program_options/detail/utf8_codecvt_facet.hpp> using namespace std; // definitions ---------------------------------------------------------------- // ---------------------------------------------------------------------------- // DblBuf class // Splits up output stream into two // Inspired by http://wordaligned.org/articles/cpp-streambufs // ---------------------------------------------------------------------------- class DblBuf : public wstreambuf { private: // private member declarations DblBuf(); wstreambuf *bf1; wstreambuf *bf2; virtual int_type overflow(int_type ch) { int_type eof = traits_type::eof(); int_type not_eof = !eof; if ( traits_type::eq_int_type(ch,eof) ) return not_eof; else { char_type ch1 = traits_type::to_char_type(ch); int_type r1( bf1on ? bf1->sputc(ch1) : not_eof ); int_type r2( bf2on ? bf2->sputc(ch1) : not_eof ); return (traits_type::eq_int_type(r1,eof) || traits_type::eq_int_type(r2,eof) ) ? eof : ch; } } virtual int sync() { int r1( bf1on ? bf1->pubsync() : NULL ); int r2( bf2on ? bf2->pubsync() : NULL ); return (r1 == 0 && r2 == 0) ? 0 : -1; } public: // public member declarations explicit DblBuf(wstreambuf *bf1, wstreambuf *bf2) : bf1(bf1), bf2(bf2) { if (bf1) bf1on = true; else bf1on = false; if (bf2) bf2on = true; else bf2on = false; } bool bf1on; bool bf2on; }; // ---------------------------------------------------------------------------- // logstream class // Wrapper for a standard wostream with access to modified buffer // ---------------------------------------------------------------------------- class logstream : public wostream { private: // private member declarations logstream(); public: // public member declarations DblBuf *buf; explicit logstream(wstreambuf *StrBuf, bool isStd = false) : wostream(StrBuf, isStd), buf((DblBuf*)StrBuf) {} }; // ---------------------------------------------------------------------------- // Logging mode Class // ---------------------------------------------------------------------------- enum LogModes{LogToFile=1, LogToScreen, LogToBoth}; class LogMode { private: // private member declarations LogMode(); short mode; public: // public member declarations LogMode(short mode1) : mode(mode1) {} logstream& operator()(logstream &stream1) { switch(mode) { case LogToFile: stream1.buf->bf1on = true; stream1.buf->bf2on = false; break; case LogToScreen: stream1.buf->bf1on = false; stream1.buf->bf2on = true; break; case LogToBoth: stream1.buf->bf1on = true; stream1.buf->bf2on = true; } return stream1; } }; logstream& operator<<(logstream &out, LogMode mode) { return mode(out); } wostream& TimeStamp1(wostream &out1) { time_t time1; struct tm timeinfo; wchar_t timestr[512]; // Get current time and convert it to a string time(&time1); localtime_s (&timeinfo, &time1); wcsftime(timestr, 512,L"[%Y-%b-%d %H:%M:%S %p] ",&timeinfo); return out1 << timestr; } // ---------------------------------------------------------------------------- // MyLog class // Logs events to both file and screen // ---------------------------------------------------------------------------- class MyLog { private: // private member declarations MyLog(); auto_ptr<DblBuf> buf; string mErrorMsg1; string mErrorMsg2; string mErrorMsg3; string mErrorMsg4; public: // public member declarations explicit MyLog(string FileName1, wostream *ScrLog1, locale utf8locale1); ~MyLog(); void NewEvent(wstring str1, bool TimeStamp = true); string FileName; wostream *ScrLog; wofstream File; auto_ptr<logstream> Log; locale utf8locale; }; // ---------------------------------------------------------------------------- // MyLog constructor // ---------------------------------------------------------------------------- MyLog::MyLog(string FileName1, wostream *ScrLog1, locale utf8locale1) : // ctors mErrorMsg1("Failed to open file for application logging! []"), mErrorMsg2("Failed to write BOM! []"), mErrorMsg3("Failed to write to file! []"), mErrorMsg4("Failed to close file! []"), FileName(FileName1), ScrLog(ScrLog1), utf8locale(utf8locale1), File(FileName1.c_str()) { // Adjust error strings mErrorMsg1.insert(mErrorMsg1.length()-1,FileName1); mErrorMsg2.insert(mErrorMsg2.length()-1,FileName1); mErrorMsg3.insert(mErrorMsg3.length()-1,FileName1); mErrorMsg4.insert(mErrorMsg4.length()-1,FileName1); // check for file open errors if ( !File ) throw ofstream::failure(mErrorMsg1); // write UTF-8 BOM File << wchar_t(0xEF) << wchar_t(0xBB) << wchar_t(0xBF); // switch locale to UTF-8 File.imbue(utf8locale); // check for write errors if ( File.bad() ) throw ofstream::failure(mErrorMsg2); buf.reset( new DblBuf(File.rdbuf(),ScrLog->rdbuf()) ); Log.reset( new logstream(&*buf) ); } // ---------------------------------------------------------------------------- // MyLog destructor // ---------------------------------------------------------------------------- MyLog::~MyLog() { *Log << TimeStamp1 << "Log finished." << endl; // clean up objects Log.reset(); buf.reset(); File.close(); // check for file close errors if ( File.bad() ) throw ofstream::failure(mErrorMsg4); } //--------------------------------------------------------------------------- #endif // INCLUDED_MYLOG_H Tested on MSVC 2008, boost 1.42. I do not know if this is the right place to share it. Hope it helps anybody. Feel free to make it better.

Read the article
Jena result in UTF-8 format

- by john

How can I get in Jena (Java language) result in UTF-8 format? My code: Query query= QueryFactory.create(queryString); QueryExecution qexec= QueryExecutionFactory.sparqlService("http://lod.openlinksw.com/sparql", queryString); ResultSet results = qexec.execSelect(); List<QuerySolution> list = ResultSetFormatter.toList(results); System.out.println(list.get(i).get("churchname"));

Read the article
utf-8 word boundary regex in javascript

- by cherouvim

In JavaScript: "ab abc cab ab ab".replace(/\bab\b/g, "AB"); correctly gives me: "AB abc cab AB AB" When I use utf-8 characters though: "aß aß? ?aß aß aß".replace(/\baß\b/g, "AB"); the word boundary operator doesn't seem to work: "aß aß? ?aß aß aß" Is there a solution to this?

Read the article
php validation function for name that allow UTF-8 characters plus - (minus) and space

- by REALSOFO

Hello everybody, Please help me with a function that validate an input string to allow: 1) UTF-8 characters (ex: staîâ) ; 2) space ; 3) minus symbol(-) String cannot start or end with space or minus. Thanks!

Read the article
Windows to Linux utf-8 file

- by Teneke

I have a file UTF-8 encoding in windows, and when i use it under windows it shows everithing right, but when i copy the file in Linux, the Unicode characters are giberish. The file is plain textfile. How can i get this file to be readable in linux, or how can i copy it properly?? thanks in advance

Read the article
How ensure if java program uses UTF-8 encoding

- by Nayn

Hi, I recently discovered that relying on default encoding of JVM causes bugs. I should explicitly use specific encoding ex. UTF-8 while working with String, InputStynreams etc. I have a huge codebase to scan for ensuring this. Could somebody suggest me some simpler way to check this than searching the whole codebase. Thanks Nayn

Read the article
C++ std::string and UTF-8

- by poiloi

Hello, I just want to write some few simple lines to a text file in C++, but I want them to be encoded in UTF-8. What is the easiest and simple way to do so? Thanks

Read the article
How does CouchDB handle UTF-8?

- by borghe

I'm quite puzzled by CouchDB: if I send a PUT request with some JSON string fields encoded as UTF-8, the non 7 bit ASCII characters get converted to the "\uXXXX" escape sequence. Is there any way to tell it not to escape UNICODE?

Read the article
Windows-1251 file inside UTF-8 site?

- by Spoonk

Hello everyone Masters Of Web Delevopment :) I have a piece of PHP script that fetches last 10 played songs from my winamp. This script is inside file (lets call it "lastplayed.php") which is included in my site with php include function inside a "div". My site is on UTF-8 encoding. The problem is that some songs titles are in Windows-1251 encoding. And in my site they displays like "??????"... Is there any known way to tell to this div with included "lastplayed.php" in it, to be with windows-1251 encoding? Or any other suggestions? P.S: The file with fetching script a.k.a. "lastplayed.php", is converted to UTF-8. But if it is ANCII it's the same result. I try to put and meta tag with windows-1251 between head tag but nothing happens again. P.P.S: Script that fetches the Winamp's data (lastplayed.php): <?php /****** * You may use and/or modify this script as long as you: * 1. Keep my name & webpage mentioned * 2. Don't use it for commercial purposes * * If you want to use this script without complying to the rules above, please contact me first at: [email protected] * * Author: Martijn Korse * Website: http://devshed.excudo.net * * Date: 08-05-2006 ***/ /** * version 2.0 */ class Radio { var $fields = array(); var $fieldsDefaults = array("Server Status", "Stream Status", "Listener Peak", "Average Listen Time", "Stream Title", "Content Type", "Stream Genre", "Stream URL", "Current Song"); var $very_first_str; var $domain, $port, $path; var $errno, $errstr; var $trackLists = array(); var $isShoutcast; var $nonShoutcastData = array( "Server Status" => "n/a", "Stream Status" => "n/a", "Listener Peak" => "n/a", "Average Listen Time" => "n/a", "Stream Title" => "n/a", "Content Type" => "n/a", "Stream Genre" => "n/a", "Stream URL" => "n/a", "Stream AIM" => "n/a", "Stream IRC" => "n/a", "Current Song" => "n/a" ); var $altServer = False; function Radio($url) { $parsed_url = parse_url($url); $this->domain = isset($parsed_url['host']) ? $parsed_url['host'] : ""; $this->port = !isset($parsed_url['port']) || empty($parsed_url['port']) ? "80" : $parsed_url['port']; $this->path = empty($parsed_url['path']) ? "/" : $parsed_url['path']; if (empty($this->domain)) { $this->domain = $this->path; $this->path = ""; } $this->setOffset("Current Stream Information"); $this->setFields(); // setting default fields $this->setTableStart("<table border=0 cellpadding=2 cellspacing=2>"); $this->setTableEnd("</table>"); } function setFields($array=False) { if (!$array) $this->fields = $this->fieldsDefaults; else $this->fields = $array; } function setOffset($string) { $this->very_first_str = $string; } function setTableStart($string) { $this->tableStart = $string; } function setTableEnd($string) { $this->tableEnd = $string; } function getHTML($page=False) { if (!$page) $page = $this->path; $contents = ""; $domain = (substr($this->domain, 0, 7) == "http://") ? substr($this->domain, 7) : $this->domain; if (@$fp = fsockopen($domain, $this->port, $this->errno, $this->errstr, 2)) { fputs($fp, "GET ".$page." HTTP/1.1\r\n". "User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows 98)\r\n". "Accept: */*\r\n". "Host: ".$domain."\r\n\r\n"); $c = 0; while (!feof($fp) && $c <= 20) { $contents .= fgets($fp, 4096); $c++; } fclose ($fp); preg_match("/(Content-Type:)(.*)/i", $contents, $matches); if (count($matches) > 0) { $contentType = trim($matches[2]); if ($contentType == "text/html") { $this->isShoutcast = True; return $contents; } else { $this->isShoutcast = False; $htmlContent = substr($contents, 0, strpos($contents, "\r\n\r\n")); $dataStr = str_replace("\r", "\n", str_replace("\r\n", "\n", $contents)); $lines = explode("\n", $dataStr); foreach ($lines AS $line) { if ($dp = strpos($line, ":")) { $key = substr($line, 0, $dp); $value = trim(substr($line, ($dp+1))); if (preg_match("/genre/i", $key)) $this->nonShoutcastData['Stream Genre'] = $value; if (preg_match("/name/i", $key)) $this->nonShoutcastData['Stream Title'] = $value; if (preg_match("/url/i", $key)) $this->nonShoutcastData['Stream URL'] = $value; if (preg_match("/content-type/i", $key)) $this->nonShoutcastData['Content Type'] = $value; if (preg_match("/icy-br/i", $key)) $this->nonShoutcastData['Stream Status'] = "Stream is up at ".$value."kbps"; if (preg_match("/icy-notice2/i", $key)) { $this->nonShoutcastData['Server Status'] = "This is <span style=\"color: red;\">not</span> a Shoutcast server!"; if (preg_match("/ultravox/i", $value)) $this->nonShoutcastData['Server Status'] .= " But an <a href=\"http://ultravox.aol.com/\" target=\"_blank\">Ultravox</a> Server"; $this->altServer = $value; } } } return nl2br($htmlContent); } } else return $contents; } else { return False; } } function getServerInfo($display_array=null, $very_first_str=null) { if (!isset($display_array)) $display_array = $this->fields; if (!isset($very_first_str)) $very_first_str = $this->very_first_str; if ($html = $this->getHTML()) { // parsing the contents $data = array(); foreach ($display_array AS $key => $item) { if ($this->isShoutcast) { $very_first_pos = stripos($html, $very_first_str); $first_pos = stripos($html, $item, $very_first_pos); $line_start = strpos($html, "<td>", $first_pos); $line_end = strpos($html, "</td>", $line_start) + 4; $difference = $line_end - $line_start; $line = substr($html, $line_start, $difference); $data[$key] = strip_tags($line); } else { $data[$key] = $this->nonShoutcastData[$item]; } } return $data; } else { return $this->errstr." (".$this->errno.")"; } } function createHistoryArray($page) { if (!in_array($page, $this->trackLists)) { $this->trackLists[] = $page; if ($html = $this->getHTML($page)) { $fromPos = stripos($html, $this->tableStart); $toPos = stripos($html, $this->tableEnd, $fromPos); $tableData = substr($html, $fromPos, ($toPos-$fromPos)); $lines = explode("</tr><tr>", $tableData); $tracks = array(); $c = 0; foreach ($lines AS $line) { $info = explode ("</td><td>", $line); $time = trim(strip_tags($info[0])); if (substr($time, 0, 9) != "Copyright" && !preg_match("/Tag Loomis, Tom Pepper and Justin Frankel/i", $info[1])) { $this->tracks[$c]['time'] = $time; $this->tracks[$c++]['track'] = trim(strip_tags($info[1])); } } if (count($this->tracks) > 0) { unset($this->tracks[0]); if (isset($this->tracks[1])) $this->tracks[1]['track'] = str_replace("Current Song", "", $this->tracks[1]['track']); } } else { $this->tracks[0] = array("time"=>$this->errno, "track"=>$this->errstr); } } } function getHistoryArray($page="/played.html") { if (!in_array($page, $this->trackLists)) $this->createHistoryArray($page); return $this->tracks; } function getHistoryTable($page="/played.html", $trackColText=False, $class=False) { $title_utf8 = mb_convert_encoding($trackArr ,"utf-8" ,"auto"); if (!in_array($page, $this->trackLists)) $this->createHistoryArray($page); if ($trackColText) $output .= " <div class='lastplayed_top'></div> <div".($class ? " class=\"".$class."\"" : "").">"; foreach ($this->tracks AS $title_utf8) $output .= "<div style='padding:2px 0;'>".$title_utf8['track']."</div>"; $output .= "</div><div class='lastplayed_bottom'></div> <div class='lastplayed_title'>".$trackColText."</div> \n"; return $output; } } // this is needed for those with a php version < 5 // the function is copied from the user comments @ php.net (http://nl3.php.net/stripos) if (!function_exists("stripos")) { function stripos($haystack, $needle, $offset=0) { return strpos(strtoupper($haystack), strtoupper($needle), $offset); } } ?> And the calling script outside the lastplayed.php: include "lastplayed.php"; $radio = new Radio($ip.":".$port); echo $radio->getHistoryTable("/played.html", "<b>Last played:</b>", "lastplayed_content");

Read the article
pre-commit hook in svn: could not be translated from the native locale to UTF-8

- by Alexandre Moraes

Hi everybody, I have a problem with my pre-commit hook. This hook test if a file is locked when the user commits. When a bad condition happens, it should output that the another user is locking this file or if nobody is locking, it should show "you are not locking this file message (file´s name)". The error happens when the file´s name has some latin character like "ç" and tortoise show me this in the output. Commit failed (details follow): Commit blocked by pre-commit hook (exit code 1) with output: [Erro output could not be translated from the native locale to UTF-8.] Do you know how can I solve this? Thanks, Alexandre My shell script is here: #!/bin/sh REPOS="$1" TXN="$2" export LANG="en_US.UTF-8" /app/svn/hooks/ensure-has-need-lock.pl "$REPOS" "$TXN" if [ $? -ne 0 ]; then exit 1; fi exit 0 And my perl is here: !/usr/bin/env perl #Turn on warnings the best way depending on the Perl version. BEGIN { if ( $] >= 5.006_000) { require warnings; import warnings; } else { $^W = 1; } } use strict; use Carp; &usage unless @ARGV == 2; my $repos = shift; my $txn = shift; my $svnlook = "/usr/local/bin/svnlook"; my $user; my $ok = 1; foreach my $program ($svnlook) { if (-e $program) { unless (-x $program) { warn "$0: required program $program' is not executable, ", "edit $0.\n"; $ok = 0; } } else { warn "$0: required program $program' does not exist, edit $0.\n"; $ok = 0; } } exit 1 unless $ok; unless (-e $repos){ &usage("$0: repository directory $repos' does not exist."); } unless (-d $repos){ &usage("$0: repository directory $repos' is not a directory."); } foreach my $user_tmp (&read_from_process($svnlook, 'author', $repos, '-t', $txn)) { $user = $user_tmp; } my @errors; foreach my $transaction (&read_from_process($svnlook, 'changed', $repos, '-t', $txn)){ if ($transaction =~ /^U. (.*[^\/])$/){ my $file = $1; my $err = 0; foreach my $locks (&read_from_process($svnlook, 'lock', $repos, $file)){ $err = 1; if($locks=~ /Owner: (.*)/){ if($1 != $user){ push @errors, "$file : You are not locking this file!"; } } } if($err==0){ push @errors, "$file : You are not locking this file!"; } } elsif($transaction =~ /^D. (.*[^\/])$/){ my $file = $1; my $tchan = &read_from_process($svnlook, 'lock', $repos, $file); foreach my $locks (&read_from_process($svnlook, 'lock', $repos, $file)){ push @errors, "$1 : cannot delete locked Files"; } } elsif($transaction =~ /^A. (.*[^\/])$/){ my $needs_lock; my $path = $1; foreach my $prop (&read_from_process($svnlook, 'proplist', $repos, '-t', $txn, '--verbose', $path)){ if ($prop =~ /^\s*svn:needs-lock : (\S+)/){ $needs_lock = $1; } } if (not $needs_lock){ push @errors, "$path : svn:needs-lock is not set. Pleas ask TCC for support."; } } } if (@errors) { warn "$0:\n\n", join("\n", @errors), "\n\n"; exit 1; } else { exit 0; } sub usage { warn "@_\n" if @_; die "usage: $0 REPOS TXN-NAME\n"; } sub safe_read_from_pipe { unless (@_) { croak "$0: safe_read_from_pipe passed no arguments.\n"; } print "Running @_\n"; my $pid = open(SAFE_READ, '-|'); unless (defined $pid) { die "$0: cannot fork: $!\n"; } unless ($pid) { open(STDERR, ">&STDOUT") or die "$0: cannot dup STDOUT: $!\n"; exec(@_) or die "$0: cannot exec @_': $!\n"; } my @output; while (<SAFE_READ>) { chomp; push(@output, $_); } close(SAFE_READ); my $result = $?; my $exit = $result >> 8; my $signal = $result & 127; my $cd = $result & 128 ? "with core dump" : ""; if ($signal or $cd) { warn "$0: pipe from @_' failed $cd: exit=$exit signal=$signal\n"; } if (wantarray) { return ($result, @output); } else { return $result; } } sub read_from_process { unless (@_) { croak "$0: read_from_process passed no arguments.\n"; } my ($status, @output) = &safe_read_from_pipe(@_); if ($status) { if (@output) { die "$0: @_' failed with this output:\n", join("\n", @output), "\n"; } else { die "$0: @_' failed with no output.\n"; } } else { return @output; } }

Read the article
UTF-8 formatting in SPARQL

- by john

How can I "say" to SPARQL that ?churchname is in UTF-8 formatting? because response is like:PraÅ¾skÃ½ hrad PREFIX lgv: <http://linkedgeodata.org/vocabulary#> PREFIX abc: <http://dbpedia.org/class/yago/> SELECT ?churchname WHERE { <http://dbpedia.org/resource/Prague> geo:geometry ?gm . ?church a lgv:castle . ?church geo:geometry ?churchgeo . ?church lgv:name ?churchname . FILTER ( bif:st_intersects (?churchgeo,?gm, 10)) } GROUP BY ?churchname ORDER BY ?churchname

Read the article
Convert a Mysql database from latin to UTF-8

- by Matthieu

I am converting a website from ISO to UTF-8, so I need to convert the Mysql database too. On internet, I read various solutions, I don't know wich one to choose. Do I really need to convert my varchar columns to binary, then to utf8 like that : ALTER TABLE t MODIFY col BINARY(150); ALTER TABLE t MODIFY col CHAR(150) CHARACTER SET utf8; This is long to do that for each column, of each table, of each database. I have 10 database, wich 20 tables each, wich around 2 - 3 varchar colums (2 queries each column), this gives me around 1000 queries to write ! How come ? How to do ?

Read the article
Guessing UTF-8 encoding

- by Dervin Thunk

I have a question that may be quite naive, but I feel the need to ask, because I don't really know what is going on. I'm on Ubuntu. Suppose I do echo "t" > test.txt if I then file test.txt I get test.txt:ASCII text If I then do echo "å" > test.txt Then I get test.txt: UTF-8 Unicode text How does that happen? How does file "know" the encoding, or, alternatively, how does it guess it? Thanks.

Read the article
Explanation of JAXB error: Invalid byte 1 of 1-byte UTF-8 sequence

- by Marcus

We're parsing an XML document using JAXB and get this error: [org.xml.sax.SAXParseException: Invalid byte 1 of 1-byte UTF-8 sequence.] at javax.xml.bind.helpers.AbstractUnmarshallerImpl.createUnmarshalException(AbstractUnmarshallerImpl.java:315) What exactly does this mean and how can we resolve this?? We are executing the code as: jaxbContext = JAXBContext.newInstance(Results.class); Unmarshaller unmarshaller = jaxbContext.createUnmarshaller(); unmarshaller.setSchema(getSchema()); results = (Results) unmarshaller.unmarshal(new FileInputStream(inputFile)); Update Issue appears to be due to this "funny" character in the XML file: ¿ Why would this cause such a problem??

Read the article
Python utf-8 decoding issue with hashlib.digest() method

- by Sorw

Hello StackOverflow community, Using Google App Engine, I wrote a keyToSha256() method within a model class (extending db.Model) : class Car(db.Model): def keyToSha256(self): keyhash = hashlib.sha256(str(self.key())).digest() return keyhash When displaying the output (ultimately within a Django template), I get garbled text, for example : ?????_??!`?I?!?;?QeqN??Al?'2 I was expecting something more in line with this : 9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08 Am I missing something important ? Despite reading several guides on ASCII, Unicode, utf-8 and the like, I think I'm still far from mastering the secrets of string encoding/decoding. After browsing StackOverflow and searching for insights via Google, I figured out I should ask the question here. Any idea ? Thanks !

Read the article
Python utf-8, howto align printout

- by Fredrik

Hi, I have a array containing japanese caracters as well as "normal". How do I align the printout of these? #!/usr/bin/python # coding=utf-8 a1=['??', '???', 'trazan', '??', '????'] a2=['dipsy', 'laa-laa', 'banarne', 'po', 'tinky winky'] for i,j in zip(a1,a2): print i.ljust(12),':',j print '-'*8 for i,j in zip(a1,a2): print i,len(i) print j,len(j) Output: ?? : dipsy ??? : laa-laa trazan : banarne ?? : po ???? : tinky winky -------- ?? 6 dipsy 5 ??? 9 laa-laa 7 trazan 6 banarne 7 ?? 6 po 2 ???? 12 tinky winky 11 thanks, //Fredrik

Read the article
problems with UTF-8 encoding in PHP

- by user296516

Hi guys, The characters I am getting from the URL, for example www.mydomain.com/?name=john , were fine, as longs as they were not in Russian. If they were are in Russian, I was getting '????'. So I added $name= iconv("cp1251","utf-8" ,$name); and now it works fine for Russian and English characters, but screws up other languages. :))) For example 'Janis' ( Latvian ) that worked fine before iconv, now turns into 'j??nis'. Any idea if there's some universal encoder that would work with both the Cyrillic languages and not screw up other languages?

Read the article

Search Results

Search found 4604 results on 185 pages for 'utf 8'.

Page 5/185 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >

- by Nathan Spears

- by Nathan Spears

- by Karsten W.

- by jsonx

- by bob

- by Matthew Toohey

- by Janosch

- by jesperlind

- by Andrew

- by john

- by cherouvim

- by REALSOFO

- by Teneke

- by Nayn

- by poiloi

- by borghe

- by Spoonk

- by Alexandre Moraes

- by john

- by Matthieu

- by Dervin Thunk

- by Marcus

- by Sorw

- by Fredrik

- by user296516

< Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >