Search Results

Search found 1649 results on 66 pages for 'unicode normalization'.


latin1/unicode conversion problem with ajax request and special characters

- by mfn

The server runs PHP5 and the HTML charset is latin1 (ISO-8859-1). With regular form POST requests there is no problem with "special" characters such as the en dash (–). I don't know for sure why it works, but probably because the browser has a representable character at char code 150 (which is what ord() shows me in PHP on the server for a literal en dash). Our application also provides a preview mechanism via ajax: the text is sent to the server and the complete HTML for a preview is sent back. However, when sent via ajax (tested with GET and POST), the ordinary char code 150 en dash mutates into something more: %E2%80%93. I already see this in the Apache log. According to various sources I found, e.g. http://www.tachyonsoft.com/uc0020.htm , this is the UTF-8 byte representation of the en dash, and my current understanding is that JavaScript handles everything in Unicode. Within my app, however, I need everything in latin1. Simply put: just as a regular POST request gives me the dash as char code 150, I need the same result from the UTF-8 representation. That's where I'm failing: when I try to decode it in PHP on the server with either utf8_decode(...) or iconv('UTF-8', 'iso-8859-1', ...), in both cases I get a regular ? in place of this character (and iconv also throws a notice: Detected an illegal character in input string). My goal is to find an automated solution, but maybe I'm trying to be überclever in this case? I've found other people simply doing manual replacement with a predefined input/output set, but that would always leave me with the feeling that I could lose characters. The observant reader will note that I'm behind on understanding the full impact and complexity of Unicode and character conversion, and I would definitely prefer to understand the thing as a whole rather than rely on a simple manual mapping. Thanks.
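
One possible reading of these symptoms, offered as an assumption rather than a diagnosis: byte 150 (0x96) is not a printable character in strict ISO-8859-1, but it is the dash U+2013 in Windows-1252, which browsers commonly substitute when a page declares latin1. The Python sketch below (an illustration, not the PHP from the question; the function name is made up) decodes the percent-encoded UTF-8 and re-encodes it in a legacy charset.

```python
from urllib.parse import unquote_to_bytes

def ajax_bytes_to_legacy(percent_encoded, target="cp1252"):
    """Decode percent-encoded UTF-8 and re-encode it in a legacy charset."""
    text = unquote_to_bytes(percent_encoded).decode("utf-8")
    return text.encode(target)

# Under Windows-1252 the dash becomes a single byte.
dash = ajax_bytes_to_legacy("%E2%80%93")
print(list(dash))

# Strict ISO-8859-1 has no slot for U+2013, so the same call fails.
try:
    ajax_bytes_to_legacy("%E2%80%93", target="iso-8859-1")
except UnicodeEncodeError as exc:
    print("not representable in ISO-8859-1:", exc.reason)
```

If this assumption holds, the equivalent move in PHP would be to target Windows-1252 instead of ISO-8859-1 in the conversion call, which is worth verifying against the actual data.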

Reading Unicode files line by line C++

- by Roger Nelson

What is the correct way to read Unicode files line by line in C++? I am trying to read a file saved as Unicode (LE) by Windows Notepad. Suppose the file simply contains the characters A and B on separate lines. Reading the file byte by byte, I see the following byte sequence (hex):

    FF FE 41 00 0D 00 0A 00 42 00 0D 00 0A 00

So: 2-byte BOM, 2-byte 'A', 2-byte CR, 2-byte LF, 2-byte 'B', 2-byte CR, 2-byte LF. I tried reading the text file using the following code:

    std::wifstream file("test.txt");
    file.seekg(2); // skip BOM
    std::wstring A_line;
    std::wstring B_line;
    getline(file, A_line); // I get "A"
    getline(file, B_line); // I get "\0B"

I get the same results using operator>> instead of getline:

    file >> A_line;
    file >> B_line;

It appears that the CR is being consumed as a single byte only, or that CR NULL LF is being consumed but not the high NULL byte. I would expect wifstream in text mode to read the 2-byte CR and the 2-byte LF. What am I doing wrong? It does not seem right that one should have to read a text file byte by byte in binary mode just to parse the newlines.
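
For comparison only, here is how the same file layout reads in Python, where the decoding step is explicit. This is a sketch of the general idea (decode the bytes to characters first, then split on newlines), not a fix for the C++ code; the helper name and the temporary file are made up for the example.

```python
import os
import tempfile

def read_utf16_lines(path):
    """Read a UTF-16 text file and return its lines without line endings."""
    # The "utf-16" codec consumes the BOM and uses it to pick the byte order.
    # newline=None (the default) translates CRLF to "\n" after decoding.
    with open(path, encoding="utf-16", newline=None) as handle:
        return [line.rstrip("\n") for line in handle]

# Build a small file with the layout described above: BOM, then A and B on CRLF lines.
sample = b"\xff\xfeA\x00\r\x00\n\x00B\x00\r\x00\n\x00"
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(sample)

lines = read_utf16_lines(tmp.name)
os.remove(tmp.name)
print(lines)
```

The point of the comparison is that line splitting happens on decoded characters, so a 0x0A byte inside a two-byte code unit is never mistaken for a newline.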

Cross-platform iteration of Unicode string

- by kizzx2

I want to iterate over each character of a Unicode string, treating each surrogate pair and combining character sequence as a single unit (one grapheme).

Example

The text "नमस्ते" is comprised of the code points U+0928, U+092E, U+0938, U+094D, U+0924, U+0947, of which U+094D and U+0947 are combining marks.

    static void Main(string[] args)
    {
        const string s = "नमस्ते";
        Console.WriteLine(s.Length); // Outputs "6"
        var l = 0;
        var e = System.Globalization.StringInfo.GetTextElementEnumerator(s);
        while(e.MoveNext()) l++;
        Console.WriteLine(l); // Outputs "4"
    }

So there we have it in .NET. We also have Win32's CharNextW():

    #include <Windows.h>
    #include <iostream>
    #include <string>

    int main()
    {
        const wchar_t * s = L"नमस्ते";
        std::cout << std::wstring(s).length() << std::endl; // Gives "6"
        int l = 0;
        while(CharNextW(s) != s)
        {
            s = CharNextW(s);
            ++l;
        }
        std::cout << l << std::endl; // Gives "4"
        return 0;
    }

Question

Both ways I know of are specific to Microsoft. Are there portable ways to do it? I heard about ICU, but I couldn't quickly find anything related (UnicodeString(s).length() still gives 6). Pointing to the relevant function or module in ICU would be an acceptable answer. C++ doesn't have a notion of Unicode, so a lightweight cross-platform library for dealing with these issues would also make an acceptable answer.
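
To make the counting concrete, here is a deliberately simplified Python sketch that attaches every combining mark (general category M*) to the preceding character. It is an approximation for illustration, not the full grapheme cluster algorithm of Unicode Standard Annex #29 and not what ICU implements; the function name is made up.

```python
import unicodedata

def simple_clusters(text):
    """Group each base character with the combining marks that follow it."""
    clusters = []
    for ch in text:
        # Categories Mn, Mc and Me are combining marks; glue them to the previous unit.
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

# The six code points listed in the question.
word = "\u0928\u092e\u0938\u094d\u0924\u0947"
print(len(word))
print(len(simple_clusters(word)))
```

Python strings are sequences of code points, so surrogate pairs do not need separate handling here. Rules such as Hangul syllables or emoji sequences are not covered by this sketch.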

Using JavaMail to send a mail containing Unicode characters

- by NoozNooz42

I'm successfully sending emails through GMail's SMTP servers using the following piece of code:

    Properties props = new Properties();
    props.put("mail.smtp.host", "smtp.gmail.com");
    props.put("mail.smtp.socketFactory.port", "465");
    props.put("mail.smtp.socketFactory.class","javax.net.ssl.SSLSocketFactory");
    props.put("mail.smtp.auth", "true");
    props.put("mail.smtp.port", "465");
    props.put("mail.smtp.ssl", "true");
    props.put("mail.smtp.starttls.enable","true");
    props.put("mail.smtp.timeout", "5000");
    props.put("mail.smtp.connectiontimeout", "5000");

    // Do NOT use Session.getDefaultInstance but Session.getInstance
    // See: http://forums.sun.com/thread.jspa?threadID=5301696
    final Session session = Session.getInstance( props, new javax.mail.Authenticator() {
        protected PasswordAuthentication getPasswordAuthentication() {
            return new PasswordAuthentication( USER, PWD );
        }
    });

    try {
        final Message message = new MimeMessage(session);
        message.setFrom( new InternetAddress( USER ) );
        message.setRecipients( Message.RecipientType.TO, InternetAddress.parse( TO ) );
        message.setSubject( emailSubject );
        message.setText( emailContent );
        Transport.send(message);
        emailSent = true;
    } catch ( final MessagingException e ) {
        e.printStackTrace();
    }

where emailContent is a String that does contain Unicode characters (like the euro symbol). When the email arrives (in another GMail account), the euro symbol has been converted to the ASCII '?' question mark. I don't know much about emails: can email use any character encoding? What should I modify in the code above so that an encoding allowing Unicode characters is used?

Perl: Convert unicode codepoint (\uXXXX) into character

- by Peterim

I have some unicode codepoints (\u5315\u4e03\u58ec\u4e8c\u4e0a\u53b6\u4e4b), which I have to convert into actual characters they represent. What's the simplest way to do so? Thank you.

Convert Unicode char to closest (most similar) char in ASCII (.net)

- by Andrey

Hi all! Do you have any idea how to convert different Unicode characters to their closest ASCII equivalents, like Ä -> A? I googled but didn't find any suitable solution. The trick Encoding.ASCII.GetBytes("Ä")[0] didn't work (the result was ?). I found that there is a class Encoder with a Fallback property that is meant exactly for cases when a char can't be converted, but the implementations (EncoderReplacementFallback) are stupid and convert to ?. Any ideas? Thanks, Andrey
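
One common approach, sketched here in Python rather than .NET, is to apply compatibility decomposition (NFKD), drop the combining marks, and discard whatever is still outside ASCII. The function name is made up for the example, and the result is only "closest" for characters that actually decompose into an ASCII base letter.

```python
import unicodedata

def closest_ascii(text):
    """Approximate text in ASCII by decomposing and dropping combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    # combining() returns a non-zero class for marks such as the diaeresis in "Ä".
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Anything that still is not ASCII has no decomposition and is dropped.
    return stripped.encode("ascii", "ignore").decode("ascii")

print(closest_ascii("\u00c4"))        # Ä
print(closest_ascii("Fr\u00e8res"))   # Frères
```

Letters without a decomposition, such as ß or ø, simply disappear with this approach, so a small manual mapping table is usually needed on top of it.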

SHA-1 and Unicode

- by Andrew

Hi everyone, is the behavior of the SHA-1 algorithm defined for Unicode strings? I do realize that SHA-1 itself does not care about the content of the string; however, it seems to me that in order to pass the standard tests for SHA-1, the input string should be encoded with UTF-8.
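
SHA-1 is defined over a sequence of bytes, so the digest of a "string" depends on which encoding turns it into bytes. The short Python sketch below (helper name made up) shows the well-known "abc" test vector and that the same text gives different digests under UTF-8 and UTF-16.

```python
import hashlib

def sha1_hex(text, encoding="utf-8"):
    """Hash the bytes produced by encoding the text, not the text itself."""
    return hashlib.sha1(text.encode(encoding)).hexdigest()

# For pure ASCII input, UTF-8 yields the bytes used by the standard test vectors.
print(sha1_hex("abc"))

# The same characters under a different encoding are different bytes.
same = sha1_hex("na\u00efve", "utf-8") == sha1_hex("na\u00efve", "utf-16-le")
print(same)
```

In practice both sides of a comparison just need to agree on the encoding; UTF-8 is a common convention, not something the hash itself requires.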

Insert unicode strings into CleverCSS

- by Brian M. Hunt

How can one insert a Unicode CSS string into CleverCSS? In particular, how could one produce the following CSS using CleverCSS:

    li:after { content: "\00BB \0020"; }

I've figured out CleverCSS's parsing rules, but suffice it to say that the permutations I thought sensible have failed, for example:

    li:
      content: "\\00BB \\0020" // becomes content: 'BB 0'

EDIT: My other examples and the rest of my post weren't saved. Suffice it to say that I had a longer list of examples that also failed, as did my closing, which was something like: I'd be grateful for any thoughts and input. Brian

Need a simple tool for analysing unicode characters

- by Steve Bennett

I'm surprised I can't find a simple tool for this. Basically, sometimes as a result of text munging, or using some piece of software, I end up with some text that has some troublesome characters - such as looking a lot like other characters, but being distinct from them. I'd like a tool (preferably online, javascript based) where I can paste the text, and it will tell me all the characters involved, their names, unicode codes etc.
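
As an offline stand-in for such a tool, a few lines of Python with the standard unicodedata module can list the code point, general category, and name of every character in a pasted string. This is only a sketch; the function name and the sample string are made up.

```python
import unicodedata

def describe(text):
    """Return (code point, general category, name) for each character."""
    rows = []
    for ch in text:
        # Some characters (e.g. control codes) have no name, so supply a default.
        name = unicodedata.name(ch, "<unnamed>")
        rows.append(("U+%04X" % ord(ch), unicodedata.category(ch), name))
    return rows

# Latin A, Cyrillic A and a no-break space look alike (or invisible) on screen.
for row in describe("A\u0410\u00a0"):
    print(row)
```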

Convert or strip out "illegal" Unicode characters

- by Oli

I've got a database in MSSQL that I'm porting to SQLite/Django. I'm using pymssql to connect to the database and save a text field to the local SQLite database. However, for some characters it explodes. I get complaints like this:

    UnicodeDecodeError: 'ascii' codec can't decode byte 0x97 in position 1916: ordinal not in range(128)

Is there some way I can convert the chars to proper Unicode versions? Or strip them out?
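
Byte 0x97 is an em dash in Windows-1252, so one plausible guess (an assumption, since the question does not state the database collation) is that the text field holds Windows-1252 bytes. The sketch below tries a list of candidate encodings and only falls back to replacement characters when none fits; the function name is made up.

```python
def decode_legacy(raw, encodings=("utf-8", "cp1252")):
    """Decode bytes with the first encoding that accepts them."""
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: keep the ASCII part and mark everything else as U+FFFD.
    return raw.decode("ascii", errors="replace")

print(repr(decode_legacy(b"before \x97 after")))
```

Trying UTF-8 first is a design choice: valid UTF-8 is rarely valid by accident, whereas almost any byte string decodes under an 8-bit code page.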

SQL Server 2005 Convert Ascii to Unicode

- by Guazz

I have data in an nvarchar field with data in ascii format: "Zard FrÃ¨res Guesta". How do I convert it to a readable (Unicode) format in T-SQL?
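
The sample looks like UTF-8 bytes that were stored as if each byte were a Latin-1 character ("Ã¨" is what the two UTF-8 bytes of "è" look like in Latin-1). Assuming that is what happened, the repair is to reverse the wrong step. The sketch below shows the idea in Python rather than T-SQL; the function name is made up.

```python
def repair_mojibake(text):
    """Undo a UTF-8-read-as-Latin-1 mix-up; return the input if it does not apply."""
    try:
        # Map each character back to the byte it came from, then decode properly.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text

print(repair_mojibake("Zard Fr\u00c3\u00a8res Guesta"))
```

Text that was never mis-decoded usually fails one of the two steps and is returned unchanged, which makes the function reasonably safe to apply to a whole column after exporting it.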

lxml unicode entity parse problems

- by Jon Hadley

I'm using lxml as follows to parse an exported XML file from another system:

    xmldoc = open(filename)
    etree.parse(xmldoc)

But I'm getting:

    lxml.etree.XMLSyntaxError: Entity 'eacute' not defined, line 4495, column 46

Obviously it's having problems with Unicode entity names, but how would I get round this? Via open() or parse()?
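
XML itself predefines only five named entities (amp, lt, gt, quot, apos); names like eacute come from HTML and need a DTD to be valid. One workaround, sketched here with the standard library's ElementTree instead of lxml, is to rewrite HTML entity names as numeric character references before parsing. The helper name is made up, and this is a preprocessing sketch rather than an lxml feature.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

def html_entities_to_numeric(xml_text):
    """Replace HTML named entities with numeric references XML parsers accept."""
    def repl(match):
        name = match.group(1)
        # Leave XML's own entities and unknown names untouched.
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return "&#%d;" % name2codepoint[name]
    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", repl, xml_text)

fixed = html_entities_to_numeric("<name>Caf&eacute; &amp; Bar</name>")
print(fixed)
print(ET.fromstring(fixed).text)
```

The regular expression does not know about CDATA sections or comments, so it may rewrite entity-like text there too; for an exported data file that is often acceptable.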

UILabel displaying Unicode Characters

- by Lee Armstrong

Hello, I have an NSString that then sets a UILabel. This contains unicode such as... E = MC Hammer\U00ac\U2264 and complete ones such as \U2013\U00ee\U2013\U00e6\U2013\U2202\U2013\U220f\U2013\U03c0 \U2013\U00ee\U2013\U220f\U2013\U03c0\U2013\U00aa\U2013\U221e\U2014\U00c5 These are not displaying correctly, is there anything I need to do to parse these at all?

Unicode characters not showing in System.Windows.Forms.TextBox

- by Sean

These characters show fine when I cut-and-paste them here from the VisualStudio debugger, but both in the debugger, and in the TextBox where I am trying to display this text, it just shows squares. ??\r\n???????,3-9 ?????????,???2 ?,???3 ?;10 ????4 ???????????,???2 ??\r\n??\r\n??????????,???????\r\n I thought that the TextBox supported Unicode text. Any idea how I can get this text to display in my application?

Determining whether or not a font can render a Unicode character in Cocoa Touch

- by conmulligan

Hi folks, I'm wondering if there's a way to determine whether or not a font supports a particular Unicode character in Cocoa Touch. Alternatively, is it possible to specify the default substitute character?

Unicode troubles

- by user343803

Hello, I have only known Python for a few days, and Unicode seems to be a problem with Python. I have a text file that stores a text string like this:

    '\u0110\xe8n \u0111\u1ecf n\xfat giao th\xf4ng Ng\xe3 t\u01b0 L\xe1ng H\u1ea1'

I can read the file and print the string out, but it displays incorrectly. How can I print it to the screen correctly, as follows:

    "Đèn đỏ nút giao thông Ngã tư Láng Hạ"

Thanks in advance
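
If the file really contains the backslash escapes as literal text (an assumption based on the sample), the escapes have to be interpreted before printing. A minimal Python 3 sketch using the built-in unicode_escape codec follows; the function name is made up.

```python
def decode_escapes(raw):
    """Turn literal \\uXXXX and \\xXX escape text into the characters it names."""
    # The escapes themselves are plain ASCII, so encode first, then interpret them.
    return raw.encode("ascii").decode("unicode_escape")

# A raw string keeps the backslashes, mimicking what was read from the file.
raw = r"\u0110\xe8n \u0111\u1ecf n\xfat giao th\xf4ng Ng\xe3 t\u01b0 L\xe1ng H\u1ea1"
print(decode_escapes(raw))
```

Whether the result then displays correctly still depends on the console's encoding and font, which is a separate issue from decoding the escapes.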

clang unicode characters for variable name

- by anon

cat test.cpp

    #include <iostream>
    int main()
    {
      int à;
    }

results in:

    clang++ test.cpp
    test.cpp:4:7: error: expected unqualified-id
      int à;
          ^
    1 error generated.

Now, is there a way to get clang to allow Unicode variable names? Thanks!

MySQL doesn't want to store a Unicode character

- by Qiao

Why doesn't MySQL want to store this Unicode character? Yes, it is a rare hieroglyph, and you won't see it in the browser. Its code point is U+2B5EE.

    Warning: #1366 Incorrect string value: '\xF0\xAB\x97\xAE' for column 'ch' at row 1

Is it possible to store this character in MySQL?
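
The bytes in the warning are the UTF-8 encoding of U+2B5EE, which lies outside the Basic Multilingual Plane and therefore needs four bytes. MySQL's legacy utf8 character set (utf8mb3) stores at most three bytes per character, while utf8mb4 accepts four; that the column here uses the three-byte variant is an assumption, since the question does not show the schema. The Python sketch below only verifies the byte count; the helper name is made up.

```python
def utf8_length(code_point):
    """Number of bytes needed to encode one code point in UTF-8."""
    return len(chr(code_point).encode("utf-8"))

rare = chr(0x2B5EE)
print(rare.encode("utf-8"))   # the same four bytes as in the warning
print(utf8_length(0x2B5EE))   # supplementary-plane character
print(utf8_length(0x20AC))    # euro sign, inside the BMP
```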

Printing Unicode in eclipse Pydev console and in Idle

- by Jonathan

My configuration: Win7 + Python 2.6 + eclipse + PyDev

How do I enable Unicode print statements in:

- PyDev console in eclipse
- Idle Python GUI

Example print statement:

    print(u"שלום עולם")

This comes out as:

    ùìåí òåìí
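
The garbled output is consistent with Hebrew text that was encoded as Windows-1255 and then displayed as Latin-1. Assuming the original string was the Hebrew for "hello world" (a reconstruction from the output, not something the page shows), this Python 3 sketch reproduces the effect; the function name is made up.

```python
def show_as_latin1(text, actual_encoding="cp1255"):
    """Encode text in one 8-bit charset and misread the bytes as Latin-1."""
    return text.encode(actual_encoding).decode("latin-1")

# Assumed original text, written with escapes to keep the source ASCII-only.
hebrew = "\u05e9\u05dc\u05d5\u05dd \u05e2\u05d5\u05dc\u05dd"
garbled = show_as_latin1(hebrew)
print(garbled)

# Reversing the two steps recovers the original text.
print(garbled.encode("latin-1").decode("cp1255") == hebrew)
```

This suggests the console is interpreting bytes with a different code page than the one Python used to write them, so the fix lies in making the two sides agree on one encoding.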

Unicode Regex; Invalid XML characters

- by Ambush Commander

The list of valid XML characters is well known, as defined by the spec it's: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] My question is whether or not it's possible to make a PCRE regular expression for this (or its inverse) without actually hard-coding the codepoints, by using Unicode general categories. An inverse might be something like [\p{Cc}\p{Cs}\p{Cn}], except that improperly covers linefeeds and tabs and misses some other invalid characters.
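
For reference, this is the hard-coded baseline the question hopes to avoid, written for Python's re module (which has no \p{...} category support, so it cannot answer the general-category part). The pattern is the complement of the character production quoted above; the names are made up.

```python
import re

# Complement of the XML 1.0 Char production, with explicit code point ranges.
INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

def strip_invalid_xml_chars(text):
    """Remove every character that XML 1.0 does not allow in a document."""
    return INVALID_XML_CHARS.sub("", text)

# NUL, vertical tab and U+FFFE are removed; tab and ordinary text stay.
print(repr(strip_invalid_xml_chars("a\x00b\x0bc\td\ufffe")))
```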

SQL Server 2005 Convert Ascii to Unicode (UTF-8 -> nvarchar)

- by Guazz

I have data in an nvarchar field with data in ascii format: "Zard FrÃ¨res Guesta". How do I convert it to a readable (Unicode) format in T-SQL?

Does Lua support Unicode?

- by TimK

Based on the link below, I'm confused as to whether the Lua programming language supports Unicode. http://lua-users.org/wiki/LuaUnicode It appears that it does, but with limitations. I simply don't understand: are the limitations anything big or key, or are they not a big deal?

Unicode filename to python subprocess.call()

- by otrov

I'm trying to run subprocess.call() with a Unicode filename. Here is the simplified problem:

    n = u'c:\\windows\\notepad.exe '
    f = u'c:\\temp\\nèw.txt'
    subprocess.call(n + f)

which raises the famous error:

    UnicodeEncodeError: 'ascii' codec can't encode character u'\xe8'

Encoding to utf-8 produces the wrong filename, and mbcs passes the filename as new.txt without the accent. I just can't read any more on this confusing subject, and I keep spinning in circles. I have found a lot of answers here for many different problems in the past, so I thought I'd join and ask for help myself. Thanks

How to validate Unicode characters?

- by morningglory

Hello, I have a registration form, and I don't want people to register a login username containing Unicode characters. How can I add server-side validation in PHP and client-side validation in JavaScript or jQuery? Please kindly help me out. Thank you.

regex unicode character in vim

- by aidan

I'm being an idiot. Someone cut and pasted some text from Microsoft Word into my lovely HTML files. I now have these Unicode characters instead of regular quote symbols (i.e. quotes appear as <92> in the text). I want to do a regex replace, but I'm having trouble selecting them.

    :%s/\u92/'/g
    :%s/\u5C/'/g
    :%s/\x92/'/g
    :%s/\x5C/'/g

...all fail. My google-fu has failed me.

