Search Results

Search found 36172 results on 1447 pages for 'unicode string'.

Page 1/1447 | 1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >

  • What is the difference between String and string in C#

    - by SAMIR BHOGAYTA
    string : ------ The string type represents a sequence of zero or more Unicode characters. string is an alias for String in the .NET Framework. 'string' is the intrinsic C# datatype, and is an alias for the system provided type "System.String". The C# specification states that as a matter of style the keyword ('string') is preferred over the full system type name (System.String, or String). Although string is a reference type, the equality operators (== and !=) are defined to compare the values of string objects, not references. This makes testing for string equality more intuitive. For example: String : ------ A String object is called immutable (read-only) because its value cannot be modified once it has been created. Methods that appear to modify a String object actually return a new String object that contains the modification. If it is necessary to modify the actual contents of a string-like object Difference between string & String : ---------- ------- ------ - ------ the string is usually used for declaration while String is used for accessing static string methods we can use 'string' do declare fields, properties etc that use the predefined type 'string', since the C# specification tells me this is good style. we can use 'String' to use system-defined methods, such as String.Compare etc. They are originally defined on 'System.String', not 'string'. 'string' is just an alias in this case. we can also use 'String' or 'System.Int32' when communicating with other system, especially if they are CLR-compliant. I.e. - if I get data from elsewhere, I'd deserialize it into a System.Int32 rather than an 'int', if the origin by definition was something else than a C# system.

    Read the article

  • Delphi Unicode String Type Stored Directly at its Address (or "Unicode ShortString")

    - by Andreas Rejbrand
    I want a string type that is Unicode and that stores the string directly at the adress of the variable, as is the case of the (Ansi-only) ShortString type. I mean, if I declare a S: ShortString and let S := 'My String', then, at @S, I will find the length of the string (as one byte, so the string cannot contain more than 255 characters) followed by the ANSI-encoded string itself. What I would like is a Unicode variant of this. That is, I want a string type such that, at @S, I will find a unsigned 32-bit integer (or a single byte would be enough, actually) containing the length of the string in bytes (or in characters, which is half the number of bytes) followed by the Unicode representation of the string. I have tried WideString, UnicodeString, and RawByteString, but they all appear only to store an adress at @S, and the actual string somewhere else (I guess this has do do with reference counting and such). Update: The most important reason for this is probably that it would be very problematic if sizeof(string) were variable. I suspect that there is no built-in type to use, and that I have to come up with my own way of storing text the way I want (which actually is fun). Am I right? Update I will, among other things, need to use these strings in packed records. I also need manually to read/write these strings to files/the heap. I could live with fixed-size strings, such as <= 128 characters, and I could redesign the problem so it will work with null-terminated strings. But PChar will not work, for sizeof(PChar) = 1 - it's merely an address. The approach I eventually settled for was to use a static array of bytes. I will post my implementation as a solution later today.

    Read the article

  • String.Empty in strings, need some explanation if possible :)

    - by Pabuc
    Hello all, 2 days ago, there was a question related to string.LastIndexOf(String.Empty) returning the last index of string. So I thought that; a string can always contain string.empty between characters like: "testing" == "t" + String.Empty + "e" + String.Empty +"sting" + String.Empty; After this, I wanted to test if String.IndexOf(String.Empty) was returning 0 because since String.Empty can be between any char in a string, that would be what I expect it to return and I wasn't wrong. string testString = "testing"; int index = testString.LastIndexOf(string.Empty); // index is 6 index = testString.IndexOf(string.Empty); // index is 0 It actually returned 0. I started to think that if I could split a string with String.Empty, I would get at least 2 string and those would be String.Empty and rest of the string since String.IndexOf(String.Empty) returned 0 and String.LastIndexOf(String.Empty) returned length of the string.. Here is what I coded: string emptyString = string.Empty; char[] emptyStringCharArr = emptyString.ToCharArray(); string myDummyString = "abcdefg"; string[] result = myDummyString.Split(emptyStringCharArr); The problem here is, I can't obviously convert String.Empty to char[] and result in an empty string[]. I would really love to see the result of this operation and the reason behind this. So my questions are: Is there any way to split a string with String.Empty? If it is not possible but in an absolute world which it would be possible, would it return an array full of chars like [0] = "t" [1] = "e" [2] = "s" and so on or would it just return the complete string? Which would make more sense and why? Thank you for your time.

    Read the article

  • Validate Unicode String and Escape if Unicode is Invalid (C/C++)

    - by vy32
    I have a program that reads arbitrary data from a file system and outputs results in Unicode. The problem I am having is that sometimes filenames are valid Unicode and sometimes they aren't. So I want a function that can validate a string (in C or C++) and tell me if it is a valid UTF-8 encoding. If it is not, I want to have the invalid characters escaped so that it will be a valid UTF-8 encoding. This is different than escaping for XML --- I need to do that also. But first I need to be sure that the Unicode is right. I've seen some code from which I could hack this, but I would rather use some working code if it exists.

    Read the article

  • Unicode license

    - by Eric Grange
    Unicode Terms of use (http://www.unicode.org/copyright.html) state that any software that uses their data files (or a modification of it) should carry the Unicode license references. It seems to me that most Unicode libraries have functions to check if a character is a digit, a letter, a symbol, etc. ans so will contain a modification of the Unicode Data Files (usually in the form of tables). Does that mean the license applies and all applications that use such Unicode libraries should carry the license? I've checked around, and it appears very few Unicode software do carry the license, though arguable most of those that didn't carry the license were from companies that were members of the Unicode consortium (do they get license exemption?). Some (f.i. Mozilla) are only "Liaison Members", and while their software do not carry the license (AFAICT), they do obviously rely on data derived from those data files. Is Mozilla in breach of the license? Should we carry the license in all apps that include any form of advanced Unicode support? (ie. are bound to rely on the Unicode data files) Or is there some form of broad exemption? (since very very few software out there carries the license)

    Read the article

  • How to get full query string parameters not UrlDecoded

    - by developerit
    Introduction While developing Developer IT’s website, we came across a problem when the user search keywords containing special character like the plus ‘+’ char. We found it while looking for C++ in our search engine. The request parameter output in ASP.NET was “c “. I found it strange that it removed the ‘++’ and replaced it with a space… Analysis After a bit of Googling and Reflection, it turns out that ASP.NET calls UrlDecode on each parameters retreived by the Request(“item”) method. The Request.Params property is affected by this two since it mashes all QueryString, Forms and other collections into a single one. Workaround Finally, I solve the puzzle usign the Request.RawUrl property and parsing it with the same RegEx I use in my url re-writter. The RawUrl not affected by anything. As its name say it, it’s raw. Published on http://www.developerit.com/

    Read the article

  • How to display multiple unicode text files in non-unicode program

    - by Stan
    OS:WinXP Say I got some files in Chinese and some files in Korean. And in windows 'Region and Language Options', I set language for non-unicode program = Chinese. Is there any way that I can read some Korean text file in text editor easily without using Microsoft Word? I need an environment that can support multiple unicode easily, I need to read Chinese, Japanese, Korean in text editor (Ultraeditor, notepad++) and terminal clients like SecureCRT. Please advise, thanks.

    Read the article

  • How do I properly implement Unicode passwords?

    - by Sorin Sbarnea
    Adding support for Unicode passwords it an important feature that should not be ignored by the developpers. Still adding support for Unicode in the passwords it's a tricky job because the same text can be encoded in different ways in Unicode and this is not something you may want to prevent people from logging in due to this. Let's say that you'll store the passwords os UTF-8. Now the question is how you should normalize the Unicode data? You had to be sure that you'll be able to compare it. You need to be sure that when the next Unicode standard will be released it will not invalidate your password verification. Note: still there are some places where Unicode passwords are probably never be used, but this question is not about why or when to use Unicode passwords, is about how to implement them the proper way.

    Read the article

  • Python and Unicode: How everything should be Unicode

    - by A A
    Forgive if this a long a question: I have been programming in Python for around six months. Self taught, starting with the Python tutorial and then SO and then just using Google for stuff. Here is the sad part: No one told me all strings should be Unicode. No, I am not lying or making this up, but where does the tutorial mention it? And most examples also I see just make use of byte strings, instead of Unicode strings. I was just browsing and came across this question on SO, which says how every string in Python should be a Unicode string. This pretty much made me cry! I read that every string in Python 3.0 is Unicode by default, so my questions are for 2.x: Should I do a: print u'Some text' or just print 'Text' ? Everything should be Unicode, does this mean, like say I have a tuple: t = ('First', 'Second'), it should be t = (u'First', u'Second')? I read that I can do a from __future__ import unicode_literals and then every string will be a Unicode string, but should I do this inside a container also? When reading/ writing to a file, I should use the codecs module. Right? Or should I just use the standard way or reading/ writing and encode or decode where required? If I get the string from say raw_input(), should I convert that to Unicode also? What is the common approach to handling all of the above issues in 2.x? The from __future__ import unicode_literals statement? Sorry for being a such a noob, but this changes what I have been doing for a long time and so clearly I am confused.

    Read the article

  • Formatting a date string when the string sits inside another string

    - by sf
    Hi, I'm trying to figure out a way to format a date string that sits inside string using javascript. The string can be: "hello there From 2010-03-04 00:00:00.0 to 2010-03-31 00:00:00.0" or "stuff like 2010-03-04 20:00:00.0 and 2010-03-31 00:00:02.0 blah blah" I'd like it to end up like: "stuff like 4 March 2010 and 31 March 2010 blah blah" Does anyone have any idea as to how this could be achieved?

    Read the article

  • SQLite, python, unicode, and non-utf data

    - by Nathan Spears
    I started by trying to store strings in sqlite using python, and got the message: sqlite3.ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings. Ok, I switched to Unicode strings. Then I started getting the message: sqlite3.OperationalError: Could not decode to UTF-8 column 'tag_artist' with text 'Sigur Rós' when trying to retrieve data from the db. More research and I started encoding it in utf8, but then 'Sigur Rós' starts looking like 'Sigur Rós' note: My console was set to display in 'latin_1' as @John Machin pointed out. What gives? After reading this, describing exactly the same situation I'm in, it seems as if the advice is to ignore the other advice and use 8-bit bytestrings after all. I didn't know much about unicode and utf before I started this process. I've learned quite a bit in the last couple hours, but I'm still ignorant of whether there is a way to correctly convert 'ó' from latin-1 to utf-8 and not mangle it. If there isn't, why would sqlite 'highly recommend' I switch my application to unicode strings? I'm going to update this question with a summary and some example code of everything I've learned in the last 24 hours so that someone in my shoes can have an easy(er) guide. If the information I post is wrong or misleading in any way please tell me and I'll update, or one of you senior guys can update. Summary of answers Let me first state the goal as I understand it. The goal in processing various encodings, if you are trying to convert between them, is to understand what your source encoding is, then convert it to unicode using that source encoding, then convert it to your desired encoding. Unicode is a base and encodings are mappings of subsets of that base. utf_8 has room for every character in unicode, but because they aren't in the same place as, for instance, latin_1, a string encoded in utf_8 and sent to a latin_1 console will not look the way you expect. In python the process of getting to unicode and into another encoding looks like: str.decode('source_encoding').encode('desired_encoding') or if the str is already in unicode str.encode('desired_encoding') For sqlite I didn't actually want to encode it again, I wanted to decode it and leave it in unicode format. Here are four things you might need to be aware of as you try to work with unicode and encodings in python. The encoding of the string you want to work with, and the encoding you want to get it to. The system encoding. The console encoding. The encoding of the source file Elaboration: (1) When you read a string from a source, it must have some encoding, like latin_1 or utf_8. In my case, I'm getting strings from filenames, so unfortunately, I could be getting any kind of encoding. Windows XP uses UCS-2 (a Unicode system) as its native string type, which seems like cheating to me. Fortunately for me, the characters in most filenames are not going to be made up of more than one source encoding type, and I think all of mine were either completely latin_1, completely utf_8, or just plain ascii (which is a subset of both of those). So I just read them and decoded them as if they were still in latin_1 or utf_8. It's possible, though, that you could have latin_1 and utf_8 and whatever other characters mixed together in a filename on Windows. Sometimes those characters can show up as boxes, other times they just look mangled, and other times they look correct (accented characters and whatnot). Moving on. (2) Python has a default system encoding that gets set when python starts and can't be changed during runtime. See here for details. Dirty summary ... well here's the file I added: \# sitecustomize.py \# this file can be anywhere in your Python path, \# but it usually goes in ${pythondir}/lib/site-packages/ import sys sys.setdefaultencoding('utf_8') This system encoding is the one that gets used when you use the unicode("str") function without any other encoding parameters. To say that another way, python tries to decode "str" to unicode based on the default system encoding. (3) If you're using IDLE or the command-line python, I think that your console will display according to the default system encoding. I am using pydev with eclipse for some reason, so I had to go into my project settings, edit the launch configuration properties of my test script, go to the Common tab, and change the console from latin-1 to utf-8 so that I could visually confirm what I was doing was working. (4) If you want to have some test strings, eg test_str = "ó" in your source code, then you will have to tell python what kind of encoding you are using in that file. (FYI: when I mistyped an encoding I had to ctrl-Z because my file became unreadable.) This is easily accomplished by putting a line like so at the top of your source code file: # -*- coding: utf_8 -*- If you don't have this information, python attempts to parse your code as ascii by default, and so: SyntaxError: Non-ASCII character '\xf3' in file _redacted_ on line 81, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details Once your program is working correctly, or, if you aren't using python's console or any other console to look at output, then you will probably really only care about #1 on the list. System default and console encoding are not that important unless you need to look at output and/or you are using the builtin unicode() function (without any encoding parameters) instead of the string.decode() function. I wrote a demo function I will paste into the bottom of this gigantic mess that I hope correctly demonstrates the items in my list. Here is some of the output when I run the character 'ó' through the demo function, showing how various methods react to the character as input. My system encoding and console output are both set to utf_8 for this run: '?' = original char <type 'str'> repr(char)='\xf3' '?' = unicode(char) ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data 'ó' = char.decode('latin_1') <type 'unicode'> repr(char.decode('latin_1'))=u'\xf3' '?' = char.decode('utf_8') ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data Now I will change the system and console encoding to latin_1, and I get this output for the same input: 'ó' = original char <type 'str'> repr(char)='\xf3' 'ó' = unicode(char) <type 'unicode'> repr(unicode(char))=u'\xf3' 'ó' = char.decode('latin_1') <type 'unicode'> repr(char.decode('latin_1'))=u'\xf3' '?' = char.decode('utf_8') ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data Notice that the 'original' character displays correctly and the builtin unicode() function works now. Now I change my console output back to utf_8. '?' = original char <type 'str'> repr(char)='\xf3' '?' = unicode(char) <type 'unicode'> repr(unicode(char))=u'\xf3' '?' = char.decode('latin_1') <type 'unicode'> repr(char.decode('latin_1'))=u'\xf3' '?' = char.decode('utf_8') ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data Here everything still works the same as last time but the console can't display the output correctly. Etc. The function below also displays more information that this and hopefully would help someone figure out where the gap in their understanding is. I know all this information is in other places and more thoroughly dealt with there, but I hope that this would be a good kickoff point for someone trying to get coding with python and/or sqlite. Ideas are great but sometimes source code can save you a day or two of trying to figure out what functions do what. Disclaimers: I'm no encoding expert, I put this together to help my own understanding. I kept building on it when I should have probably started passing functions as arguments to avoid so much redundant code, so if I can I'll make it more concise. Also, utf_8 and latin_1 are by no means the only encoding schemes, they are just the two I was playing around with because I think they handle everything I need. Add your own encoding schemes to the demo function and test your own input. One more thing: there are apparently crazy application developers making life difficult in Windows. #!/usr/bin/env python # -*- coding: utf_8 -*- import os import sys def encodingDemo(str): validStrings = () try: print "str =",str,"{0} repr(str) = {1}".format(type(str), repr(str)) validStrings += ((str,""),) except UnicodeEncodeError as ude: print "Couldn't print the str itself because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t", print ude try: x = unicode(str) print "unicode(str) = ",x validStrings+= ((x, " decoded into unicode by the default system encoding"),) except UnicodeDecodeError as ude: print "ERROR. unicode(str) couldn't decode the string because the system encoding is set to an encoding that doesn't understand some character in the string." print "\tThe system encoding is set to {0}. See error:\n\t".format(sys.getdefaultencoding()), print ude except UnicodeEncodeError as uee: print "ERROR. Couldn't print the unicode(str) because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t", print uee try: x = str.decode('latin_1') print "str.decode('latin_1') =",x validStrings+= ((x, " decoded with latin_1 into unicode"),) try: print "str.decode('latin_1').encode('utf_8') =",str.decode('latin_1').encode('utf_8') validStrings+= ((x, " decoded with latin_1 into unicode and encoded into utf_8"),) except UnicodeDecodeError as ude: print "The string was decoded into unicode using the latin_1 encoding, but couldn't be encoded into utf_8. See error:\n\t", print ude except UnicodeDecodeError as ude: print "Something didn't work, probably because the string wasn't latin_1 encoded. See error:\n\t", print ude except UnicodeEncodeError as uee: print "ERROR. Couldn't print the str.decode('latin_1') because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t", print uee try: x = str.decode('utf_8') print "str.decode('utf_8') =",x validStrings+= ((x, " decoded with utf_8 into unicode"),) try: print "str.decode('utf_8').encode('latin_1') =",str.decode('utf_8').encode('latin_1') except UnicodeDecodeError as ude: print "str.decode('utf_8').encode('latin_1') didn't work. The string was decoded into unicode using the utf_8 encoding, but couldn't be encoded into latin_1. See error:\n\t", validStrings+= ((x, " decoded with utf_8 into unicode and encoded into latin_1"),) print ude except UnicodeDecodeError as ude: print "str.decode('utf_8') didn't work, probably because the string wasn't utf_8 encoded. See error:\n\t", print ude except UnicodeEncodeError as uee: print "ERROR. Couldn't print the str.decode('utf_8') because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t",uee print print "Printing information about each character in the original string." for char in str: try: print "\t'" + char + "' = original char {0} repr(char)={1}".format(type(char), repr(char)) except UnicodeDecodeError as ude: print "\t'?' = original char {0} repr(char)={1} ERROR PRINTING: {2}".format(type(char), repr(char), ude) except UnicodeEncodeError as uee: print "\t'?' = original char {0} repr(char)={1} ERROR PRINTING: {2}".format(type(char), repr(char), uee) print uee try: x = unicode(char) print "\t'" + x + "' = unicode(char) {1} repr(unicode(char))={2}".format(x, type(x), repr(x)) except UnicodeDecodeError as ude: print "\t'?' = unicode(char) ERROR: {0}".format(ude) except UnicodeEncodeError as uee: print "\t'?' = unicode(char) {0} repr(char)={1} ERROR PRINTING: {2}".format(type(x), repr(x), uee) try: x = char.decode('latin_1') print "\t'" + x + "' = char.decode('latin_1') {1} repr(char.decode('latin_1'))={2}".format(x, type(x), repr(x)) except UnicodeDecodeError as ude: print "\t'?' = char.decode('latin_1') ERROR: {0}".format(ude) except UnicodeEncodeError as uee: print "\t'?' = char.decode('latin_1') {0} repr(char)={1} ERROR PRINTING: {2}".format(type(x), repr(x), uee) try: x = char.decode('utf_8') print "\t'" + x + "' = char.decode('utf_8') {1} repr(char.decode('utf_8'))={2}".format(x, type(x), repr(x)) except UnicodeDecodeError as ude: print "\t'?' = char.decode('utf_8') ERROR: {0}".format(ude) except UnicodeEncodeError as uee: print "\t'?' = char.decode('utf_8') {0} repr(char)={1} ERROR PRINTING: {2}".format(type(x), repr(x), uee) print x = 'ó' encodingDemo(x) Much thanks for the answers below and especially to @John Machin for answering so thoroughly.

    Read the article

  • SQLite, python, unicode, and non-utf data

    - by Nathan Spears
    I started by trying to store strings in sqlite using python, and got the message: sqlite3.ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings. Ok, I switched to Unicode strings. Then I started getting the message: sqlite3.OperationalError: Could not decode to UTF-8 column 'tag_artist' with text 'Sigur Rós' when trying to retrieve data from the db. More research and I started encoding it in utf8, but then 'Sigur Rós' starts looking like 'Sigur Rós' note: My console was set to display in 'latin_1' as @John Machin pointed out. What gives? After reading this, describing exactly the same situation I'm in, it seems as if the advice is to ignore the other advice and use 8-bit bytestrings after all. I didn't know much about unicode and utf before I started this process. I've learned quite a bit in the last couple hours, but I'm still ignorant of whether there is a way to correctly convert 'ó' from latin-1 to utf-8 and not mangle it. If there isn't, why would sqlite 'highly recommend' I switch my application to unicode strings? I'm going to update this question with a summary and some example code of everything I've learned in the last 24 hours so that someone in my shoes can have an easy(er) guide. If the information I post is wrong or misleading in any way please tell me and I'll update, or one of you senior guys can update. Summary of answers Let me first state the goal as I understand it. The goal in processing various encodings, if you are trying to convert between them, is to understand what your source encoding is, then convert it to unicode using that source encoding, then convert it to your desired encoding. Unicode is a base and encodings are mappings of subsets of that base. utf_8 has room for every character in unicode, but because they aren't in the same place as, for instance, latin_1, a string encoded in utf_8 and sent to a latin_1 console will not look the way you expect. In python the process of getting to unicode and into another encoding looks like: str.decode('source_encoding').encode('desired_encoding') or if the str is already in unicode str.encode('desired_encoding') For sqlite I didn't actually want to encode it again, I wanted to decode it and leave it in unicode format. Here are four things you might need to be aware of as you try to work with unicode and encodings in python. The encoding of the string you want to work with, and the encoding you want to get it to. The system encoding. The console encoding. The encoding of the source file Elaboration: (1) When you read a string from a source, it must have some encoding, like latin_1 or utf_8. In my case, I'm getting strings from filenames, so unfortunately, I could be getting any kind of encoding. Windows XP uses UCS-2 (a Unicode system) as its native string type, which seems like cheating to me. Fortunately for me, the characters in most filenames are not going to be made up of more than one source encoding type, and I think all of mine were either completely latin_1, completely utf_8, or just plain ascii (which is a subset of both of those). So I just read them and decoded them as if they were still in latin_1 or utf_8. It's possible, though, that you could have latin_1 and utf_8 and whatever other characters mixed together in a filename on Windows. Sometimes those characters can show up as boxes, other times they just look mangled, and other times they look correct (accented characters and whatnot). Moving on. (2) Python has a default system encoding that gets set when python starts and can't be changed during runtime. See here for details. Dirty summary ... well here's the file I added: \# sitecustomize.py \# this file can be anywhere in your Python path, \# but it usually goes in ${pythondir}/lib/site-packages/ import sys sys.setdefaultencoding('utf_8') This system encoding is the one that gets used when you use the unicode("str") function without any other encoding parameters. To say that another way, python tries to decode "str" to unicode based on the default system encoding. (3) If you're using IDLE or the command-line python, I think that your console will display according to the default system encoding. I am using pydev with eclipse for some reason, so I had to go into my project settings, edit the launch configuration properties of my test script, go to the Common tab, and change the console from latin-1 to utf-8 so that I could visually confirm what I was doing was working. (4) If you want to have some test strings, eg test_str = "ó" in your source code, then you will have to tell python what kind of encoding you are using in that file. (FYI: when I mistyped an encoding I had to ctrl-Z because my file became unreadable.) This is easily accomplished by putting a line like so at the top of your source code file: # -*- coding: utf_8 -*- If you don't have this information, python attempts to parse your code as ascii by default, and so: SyntaxError: Non-ASCII character '\xf3' in file _redacted_ on line 81, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details Once your program is working correctly, or, if you aren't using python's console or any other console to look at output, then you will probably really only care about #1 on the list. System default and console encoding are not that important unless you need to look at output and/or you are using the builtin unicode() function (without any encoding parameters) instead of the string.decode() function. I wrote a demo function I will paste into the bottom of this gigantic mess that I hope correctly demonstrates the items in my list. Here is some of the output when I run the character 'ó' through the demo function, showing how various methods react to the character as input. My system encoding and console output are both set to utf_8 for this run: '?' = original char <type 'str'> repr(char)='\xf3' '?' = unicode(char) ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data 'ó' = char.decode('latin_1') <type 'unicode'> repr(char.decode('latin_1'))=u'\xf3' '?' = char.decode('utf_8') ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data Now I will change the system and console encoding to latin_1, and I get this output for the same input: 'ó' = original char <type 'str'> repr(char)='\xf3' 'ó' = unicode(char) <type 'unicode'> repr(unicode(char))=u'\xf3' 'ó' = char.decode('latin_1') <type 'unicode'> repr(char.decode('latin_1'))=u'\xf3' '?' = char.decode('utf_8') ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data Notice that the 'original' character displays correctly and the builtin unicode() function works now. Now I change my console output back to utf_8. '?' = original char <type 'str'> repr(char)='\xf3' '?' = unicode(char) <type 'unicode'> repr(unicode(char))=u'\xf3' '?' = char.decode('latin_1') <type 'unicode'> repr(char.decode('latin_1'))=u'\xf3' '?' = char.decode('utf_8') ERROR: 'utf8' codec can't decode byte 0xf3 in position 0: unexpected end of data Here everything still works the same as last time but the console can't display the output correctly. Etc. The function below also displays more information that this and hopefully would help someone figure out where the gap in their understanding is. I know all this information is in other places and more thoroughly dealt with there, but I hope that this would be a good kickoff point for someone trying to get coding with python and/or sqlite. Ideas are great but sometimes source code can save you a day or two of trying to figure out what functions do what. Disclaimers: I'm no encoding expert, I put this together to help my own understanding. I kept building on it when I should have probably started passing functions as arguments to avoid so much redundant code, so if I can I'll make it more concise. Also, utf_8 and latin_1 are by no means the only encoding schemes, they are just the two I was playing around with because I think they handle everything I need. Add your own encoding schemes to the demo function and test your own input. One more thing: there are apparently crazy application developers making life difficult in Windows. #!/usr/bin/env python # -*- coding: utf_8 -*- import os import sys def encodingDemo(str): validStrings = () try: print "str =",str,"{0} repr(str) = {1}".format(type(str), repr(str)) validStrings += ((str,""),) except UnicodeEncodeError as ude: print "Couldn't print the str itself because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t", print ude try: x = unicode(str) print "unicode(str) = ",x validStrings+= ((x, " decoded into unicode by the default system encoding"),) except UnicodeDecodeError as ude: print "ERROR. unicode(str) couldn't decode the string because the system encoding is set to an encoding that doesn't understand some character in the string." print "\tThe system encoding is set to {0}. See error:\n\t".format(sys.getdefaultencoding()), print ude except UnicodeEncodeError as uee: print "ERROR. Couldn't print the unicode(str) because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t", print uee try: x = str.decode('latin_1') print "str.decode('latin_1') =",x validStrings+= ((x, " decoded with latin_1 into unicode"),) try: print "str.decode('latin_1').encode('utf_8') =",str.decode('latin_1').encode('utf_8') validStrings+= ((x, " decoded with latin_1 into unicode and encoded into utf_8"),) except UnicodeDecodeError as ude: print "The string was decoded into unicode using the latin_1 encoding, but couldn't be encoded into utf_8. See error:\n\t", print ude except UnicodeDecodeError as ude: print "Something didn't work, probably because the string wasn't latin_1 encoded. See error:\n\t", print ude except UnicodeEncodeError as uee: print "ERROR. Couldn't print the str.decode('latin_1') because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t", print uee try: x = str.decode('utf_8') print "str.decode('utf_8') =",x validStrings+= ((x, " decoded with utf_8 into unicode"),) try: print "str.decode('utf_8').encode('latin_1') =",str.decode('utf_8').encode('latin_1') except UnicodeDecodeError as ude: print "str.decode('utf_8').encode('latin_1') didn't work. The string was decoded into unicode using the utf_8 encoding, but couldn't be encoded into latin_1. See error:\n\t", validStrings+= ((x, " decoded with utf_8 into unicode and encoded into latin_1"),) print ude except UnicodeDecodeError as ude: print "str.decode('utf_8') didn't work, probably because the string wasn't utf_8 encoded. See error:\n\t", print ude except UnicodeEncodeError as uee: print "ERROR. Couldn't print the str.decode('utf_8') because the console is set to an encoding that doesn't understand some character in the string. See error:\n\t",uee print print "Printing information about each character in the original string." for char in str: try: print "\t'" + char + "' = original char {0} repr(char)={1}".format(type(char), repr(char)) except UnicodeDecodeError as ude: print "\t'?' = original char {0} repr(char)={1} ERROR PRINTING: {2}".format(type(char), repr(char), ude) except UnicodeEncodeError as uee: print "\t'?' = original char {0} repr(char)={1} ERROR PRINTING: {2}".format(type(char), repr(char), uee) print uee try: x = unicode(char) print "\t'" + x + "' = unicode(char) {1} repr(unicode(char))={2}".format(x, type(x), repr(x)) except UnicodeDecodeError as ude: print "\t'?' = unicode(char) ERROR: {0}".format(ude) except UnicodeEncodeError as uee: print "\t'?' = unicode(char) {0} repr(char)={1} ERROR PRINTING: {2}".format(type(x), repr(x), uee) try: x = char.decode('latin_1') print "\t'" + x + "' = char.decode('latin_1') {1} repr(char.decode('latin_1'))={2}".format(x, type(x), repr(x)) except UnicodeDecodeError as ude: print "\t'?' = char.decode('latin_1') ERROR: {0}".format(ude) except UnicodeEncodeError as uee: print "\t'?' = char.decode('latin_1') {0} repr(char)={1} ERROR PRINTING: {2}".format(type(x), repr(x), uee) try: x = char.decode('utf_8') print "\t'" + x + "' = char.decode('utf_8') {1} repr(char.decode('utf_8'))={2}".format(x, type(x), repr(x)) except UnicodeDecodeError as ude: print "\t'?' = char.decode('utf_8') ERROR: {0}".format(ude) except UnicodeEncodeError as uee: print "\t'?' = char.decode('utf_8') {0} repr(char)={1} ERROR PRINTING: {2}".format(type(x), repr(x), uee) print x = 'ó' encodingDemo(x) Much thanks for the answers below and especially to @John Machin for answering so thoroughly.

    Read the article

  • Unicode issue in Django

    - by Kave
    I seem to have a unicode problem with the deal_instance_name in the Deal model. It says: coercing to Unicode: need string or buffer, __proxy__ found The exception happens on this line: return smart_unicode(self.deal_type.deal_name) + _(u' - Set No.') + str(self.set) The line works if I remove smart_unicode(self.deal_type.deal_name) but why? Back then in Django 1.1 someone had the same problem on Stackoverflow I have tried both the unicode() as well as the new smart_unicode() without any joy. What could I be missing please? class Deal(models.Model): def __init__(self, *args, **kwargs): super(Deal, self).__init__(*args, **kwargs) self.deal_instance_name = self.__unicode__() deal_type = models.ForeignKey(DealType) deal_instance_name = models.CharField(_(u'Deal Name'), max_length=100) set = models.IntegerField(_(u'Set Number')) def __unicode__(self): return smart_unicode(self.deal_type.deal_name) + _(u' - Set No.') + str(self.set) class Meta: verbose_name = _(u'Deal') verbose_name_plural = _(u'Deals') Dealtype: class DealType(models.Model): deal_name = models.CharField(_(u'Deal Name'), max_length=40) deal_description = models.TextField(_(u'Deal Description'), blank=True) def __unicode__(self): return smart_unicode(self.deal_name) class Meta: verbose_name = _(u'Deal Type') verbose_name_plural = _(u'Deal Types')

    Read the article

  • vb.net string concatenation string + function output + string = string + function output and no more

    - by Barfieldmv
    The following output produces a string with no closing xml tag. m_rFlight.Layout = m_rFlight.Layout + "<G3Grid:Spots>" + Me.gvwSpots.LayoutToString() + "</G3Grid:Spots>" This following code works correctly m_rFlight.Layout = m_rFlight.Layout + "<G3Grid:Spots>" + Me.gvwSpots.LayoutToString() m_rFlight.Layout = m_rFlight.Layout + "</G3Grid:Spots>" 'add closing tag What's going on here, what's the reason the first example isnt working and the second is? The gvwSpots.LayoutToString() function returns a string.

    Read the article

  • Remove accents from String .NET

    - by developerit
    Private Const ACCENT As String = “ÀÁÂÃÄÅàáâãäåÒÓÔÕÖØòóôõöøÈÉÊËèéêëÌÍÎÏìíîïÙÚÛÜùúûüÿÑñÇç” Private Const SANSACCENT As String = “AAAAAAaaaaaaOOOOOOooooooEEEEeeeeIIIIiiiiUUUUuuuuyNnCc” Public Shared Function FormatForUrl(ByVal uriBase As String) As String If String.IsNullOrEmpty(uriBase) Then Return uriBase End If ‘// Declaration de variables Dim chaine As String = uriBase.Trim.Replace(” “, “-”) chaine = chaine.Replace(” “c, “-”c) chaine = chaine.Replace(“–”, “-”) chaine = chaine.Replace(“‘”c, String.Empty) chaine = chaine.Replace(“?”c, String.Empty) chaine = chaine.Replace(“#”c, String.Empty) chaine = chaine.Replace(“:”c, String.Empty) chaine = chaine.Replace(“;”c, String.Empty) ‘// Conversion des chaines en tableaux de caractŠres Dim tableauSansAccent As Char() = SANSACCENT.ToCharArray Dim tableauAccent As Char() = ACCENT.ToCharArray ‘// Pour chaque accent For i As Integer = 0 To ACCENT.Length – 1 ‘ // Remplacement de l’accent par son ‚quivalent sans accent dans la chaŒne de caractŠres chaine = chaine.Replace(tableauAccent(i).ToString(), tableauSansAccent(i).ToString()) Next ‘// Retour du resultat Return chaine End Function

    Read the article

  • On Windows 7, dir or tree can't show unicode characters, even starting cmd with cmd /U

    - by Jian Lin
    On Windows 7, dir or tree can't show unicode characters, even starting cmd with cmd /U So I would press Window Key + R to run something, and type in cmd /U so that the content might handle Unicode. And then using dir or tree /F, the content in Unicode won't show as Unicode. (in Window Explorer (file manager), the Unicode will show) Is there a way to handle it? To get Unicode characters to test your filenames, you can go to http://news.google.com/news?edchanged=1&ned=tw and you will be able to get many Unicode characters there (UTF-8)

    Read the article

  • On Windows 7, dir or tree can't show unicode characters, even starting cmd with cmd /U

    - by ????
    On Windows 7, dir or tree can't show unicode characters, even starting cmd with cmd /U So I would press Window Key + R to run something, and type in cmd /U so that the content might handle Unicode. And then using dir or tree /F, the content in Unicode won't show as Unicode. (in Window Explorer (file manager), the Unicode will show) Is there a way to handle it? To get Unicode characters to test your filenames, you can go to http://news.google.com/news?edchanged=1&ned=tw and you will be able to get many Unicode characters there (UTF-8)

    Read the article

  • What causes exception in string.concat

    - by Budda
    Here are few classes: public class MyItem : ParamOut { public string Code { get { return _quoteCode; } } public InnerItem[] Skus { get { return _skus; } } public PriceSummary Price { get { return _price; } } public override string ToString() { return string.Concat("Code=", Code, "; SKUs=[", Skus != null ? "{" + string.Join("},{", Array.ConvertAll(Skus, item => item.ToString())) + "}" : "", "]" , "; Price={", Price.ToString(), "}", base.ToString() ); } ... } public abstract class ParamOut { public override string ToString() { return null; } public string ErrorMessage { get; set; } } Here is calling functionality: { MyItem item = new MyItem{ ErrorMessage = "Bla-bla-bla" }; string text = item.ToString(); } I am getting NullReference exception inside of 'ToString()' method (each property of item variable is null). Question: Q1. what overload of string.Concat will be called in this case? I have 9 parameters, so I guess one of the following: public static string Concat(params Object[] args) or public static string Concat(params string[] values) But which of them? Q2. Why exception is generated? Shouldn't 'null' be converted into something like 'null' or "" (empty string)? Thanks a lot!

    Read the article

  • How to convert a unicode charactor array back to unicode sequence in C++

    - by eddyxd
    My problem is how to convert a c/c++ string/chractor array to another string contain the unicode(UTF-16) escape sequence of original one for example, I want to find a function F(char *ch) could do following function. char a[10] = "\u5f53"; printf("a = %s\n",a); char b[10]; b = F(a); //<- F is the function I wanted printf("b = %s\n",b); -------- console will show ------- a = ? b = \u5f53 Anyone has any Idea@@?~ thanks!! ps: I tried to guess \u5f35 means the value store in a, but it is not indeed the value of a[0] = -79 , a[1] = 105 ... So I don't know how to convert it back to the sequence of unicode.... Please give me a hane~ : )

    Read the article

  • How to unicode Myanmar texts on Java? [closed]

    - by Spacez Ly Wang
    I'm just beginner of Java. I'm trying to unicode (display) correctly Myanmar texts on Java GUI ( Swing/Awt ). I have four TrueType fonts which support Myanmar unicode texts. There are Myanmar3, Padauk, Tharlon, Myanmar Text ( Window 8 built-in ). You may need the fonts before the code. Google the fonts, please. Each of the fonts display on Java GUI differently and incorrectly. Here is the code for GUI Label displaying myanmar texts: ++++++++++++++++++++++++ package javaapplication1; import javax.swing.JFrame; import javax.swing.JTextField; public class CusFrom { private static void createAndShowGUI() { JFrame frame = new JFrame("Hello World Swing"); frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE); String s = "\u1015\u102F \u103C\u1015\u102F"; JLabel label = new JLabel(s); label.setFont(new java.awt.Font("Myanmar3", 0, 20));// font insert here, Myanmar Text, Padauk, Myanmar3, Tharlon frame.getContentPane().add(label); frame.pack(); frame.setVisible(true); } public static void main(String[] args) { javax.swing.SwingUtilities.invokeLater(new Runnable() { public void run() { createAndShowGUI(); } }); } } ++++++++++++++++++++++++ Outputs vary. See the pictures: Myanmar3 IMG Padauk IMG Tharlon IMG Myanmar Text IMG What is the correct form? (on notepad) Well, next is the code for GUI Textfield inputting Myanmar texts: ++++++++++++++++++++++++ package javaapplication1; import javax.swing.JFrame; import javax.swing.JTextField; public class XusForm { private static void createAndShowGUI() { JFrame frame = new JFrame("Frame Title"); frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE); JTextField textfield = new JTextField(); textfield.setFont(new java.awt.Font("Myanmar3", 0, 20)); frame.getContentPane().add(textfield); frame.pack(); frame.setVisible(true); } public static void main(String[] args) { javax.swing.SwingUtilities.invokeLater(new Runnable() { public void run() { createAndShowGUI(); } }); } } ++++++++++++++++++++++++ Outputs vary when I input keys( unicode text ) on keyboards. Myanmar Text Output IMG Padauk Output IMG Myanmar3 Output IMG Tharlon Output IMG Those fonts work well on Linux when opening text files with Text Editor application. My Question is how to unicode Myanmar texts on Java GUI. Do I need additional codes left to display well? Or Does Java still have errors? The fonts display well on Web Application (HTML, CSS) but I'm not sure about displaying on Java Web Application.

    Read the article

  • Unicode fonts render incorrectly in Terminal

    - by Sridher
    My Ubuntu 13.04 terminal renders Unicode Indic fonts incorrectly, but they are rendered correctly in Firefox, gedit, Chrome etc. How can I fix this? Works fine in: Firefox, Chrome, Gedit, Open Office Not working in: Terminal UPDATE : Here is the screenshot from my desktop showing the telugu font rendering in various applications (including my sample pygame example) note : pygame unicode, console renders wrong and same but rest of the apps correct

    Read the article

  • Delphi Unicode String Type Stored Directly at its Address

    - by Andreas Rejbrand
    I want a string type that is Unicode and that stores the string directly at the adress of the variable, as is the case of the (Ansi-only) ShortString type. I mean, if I declare a S: ShortString and let S := 'My String', then, at @S, I will find the length of the string (as one byte, so the string cannot contain more than 255 characters) followed by the ANSI-encoded string itself. What I would like is a Unicode variant of this. That is, I want a string type such that, at @S, I will find a unsigned 32-bit integer containing the length of the string in bytes (or in characters, which is half the number of bytes) followed by the Unicode representation of the string. I have tried WideString, UnicodeString, and RawByteString, but they all appear only to store an adress at @S, and the actual string somewhere else (I guess this has do do with reference counting and such). I suspect that there is no built-in type to use, and that I have to come up with my own way of storing text the way I want (which actually is fun). Am I right?

    Read the article

  • Using String+string+string vs using string.replace

    - by Madi D.
    A colleague told me that using the following method: string url = "SomeURL"; string ext = "SomeExt"; string sub = "SomeSub"; string subSub = "moreSub"; string Url = @"http://www." + Url +@"/"+ ext +@"/"+ sub + subSub; is not efficenet (takes more resources) and it is preferred to use the following method: string Url = @"http://www.#URL.#EXT/#sub/#subSub"; string url = "SomeURL"; string ext = "SomeExt"; string sub = "SomeSub"; string subSub = "moreSub"; Url.Replace("#URL",url) Url.Replace("#EXT",ext); Url.Replace("#sub",sub); Url.Replace("#subSub",subSub); Is that true? and what is the explanation behind it?

    Read the article

1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >