unicode - Page 4 - Developer IT

Converting Unicode strings to escaped ascii string

- by Ali

How can I convert this string: This string contains the unicode character Pi(Π) into an escaped ASCII string: This string contains the unicode character Pi(\u03a0) and vice versa? The current Encoding available in C# converts the Π character into "?". I need to preserve that character.
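
The question is about C#, but the round trip itself is easy to illustrate. Below is a minimal sketch in Python, assuming the `\uXXXX` escape form shown in the post is the target. The helper names `to_escaped_ascii` and `from_escaped_ascii` are hypothetical, and this is not the .NET API.

```python
def to_escaped_ascii(text: str) -> str:
    """Return an ASCII-only string with non-ASCII characters written as escapes."""
    # The unicode_escape codec writes U+03A0 as the six characters \u03a0.
    # It also escapes backslashes and control characters, which keeps the
    # conversion reversible.
    return text.encode("unicode_escape").decode("ascii")


def from_escaped_ascii(escaped: str) -> str:
    """Reverse to_escaped_ascii."""
    return escaped.encode("ascii").decode("unicode_escape")


original = "This string contains the unicode character Pi(\u03a0)"
escaped = to_escaped_ascii(original)
restored = from_escaped_ascii(escaped)
```

Note that this codec writes characters in the range U+0080 to U+00FF as `\xNN` rather than `\u00NN`, so the output may need adjusting if a strict `\uXXXX` format is required.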

Unicode replacement characters for text matching

- by Christian Harms

I am having some fun with Unicode text sources (all correctly encoded) and I want to match names. The classic problem: one source comes through correctly, another has flattened names: "Elbląg" vs. "Elblag" (see the character ą). How can I "flatten" ą, á, â or à to a for better matching? Are there Unicode-to-ASCII matching tables?
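
One common way to "flatten" accented letters is to decompose each character and drop the combining marks. The sketch below does this in Python with the standard `unicodedata` module, using the Polish city name Elbląg as sample input. The function name `flatten` is hypothetical, and this is only one possible approach, not a complete transliteration table.

```python
import unicodedata


def flatten(text: str) -> str:
    """Drop combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() is non-zero for combining marks such as
    # the acute accent or the ogonek.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


flattened_city = flatten("Elbl\u0105g")          # "Elbląg"
flattened_vowels = flatten("\u00e1\u00e2\u00e0")  # "áâà"
unchanged = flatten("\u0142")                    # "ł" has no decomposition
```

Letters without a decomposition, such as ł, pass through unchanged, so a small manual mapping would still be needed for those.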

Java Unicode encoding

- by Marcus

A Java char is 2 bytes (max size of 65,536) but there are 95,221 Unicode characters. Does this mean that you can't handle certain Unicode characters in a Java application? Does this boil down to what character encoding you are using?
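
A Java `char` is a 16-bit UTF-16 code unit, and UTF-16 represents code points above U+FFFF as a pair of surrogate code units. The sketch below uses Python only to show which 16-bit units UTF-16 produces for a character; the helper name `utf16_code_units` is hypothetical.

```python
def utf16_code_units(text: str) -> list:
    """Return the 16-bit UTF-16 code units for text as integers."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


bmp = utf16_code_units("A")                     # one code unit
supplementary = utf16_code_units("\U0001D11E")  # MUSICAL SYMBOL G CLEF, two units
```

A character outside the Basic Multilingual Plane therefore occupies two `char` values in a Java string.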

passing unicode string from C# exe to C++ DLL

- by Martin

Using this function in my C# exe, I try to pass a Unicode string to my C++ DLL:

    [DllImport("Test.dll", CharSet = CharSet.Unicode, CallingConvention = CallingConvention.StdCall)]
    public static extern int xSetTestString(StringBuilder xmlSettings);

This is the function on the C++ DLL side:

    __declspec(dllexport) int xSetTestString(char* pSettingsXML);

Before calling the function in C#, I do a MessageBox.Show(string) and it displays all characters properly. On the C++ side, I do:

    OutputDebugStringW((wchar_t*)pString);

but that shows that the non-ASCII characters were replaced by '?'.

Python, Unicode, and the Windows console

- by James Sulak

When I try to print a Unicode string in a Windows console, I get a "UnicodeEncodeError: 'charmap' codec can't encode character ...." error. I assume this is because the Windows console does not accept Unicode-only characters. What's the best way around this? Is there any way I can make Python automatically print a "?" instead of failing in this situation? Edit: I'm using Python 2.5.
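
Printing a "?" for unencodable characters can be sketched with the `replace` error handler of `str.encode`. The example below is written in Python 3 syntax, whereas the post mentions Python 2.5, so treat it as an illustration of the idea. The helper names `console_safe` and `safe_print` are hypothetical.

```python
import sys


def console_safe(text: str, encoding: str) -> str:
    """Replace characters the target encoding cannot represent with '?'."""
    # errors="replace" substitutes "?" for anything the codec cannot encode.
    return text.encode(encoding, errors="replace").decode(encoding)


def safe_print(text: str) -> None:
    """Print text using whatever encoding stdout reports, without raising."""
    encoding = sys.stdout.encoding or "ascii"
    print(console_safe(text, encoding))


# Code page 437 can represent "é" but not the white heart suit U+2661.
sample = console_safe("caf\u00e9 \u2661", "cp437")
```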

Django approximate matching of unicode strings with ascii equivalents

- by c

I have the following model and instance:

    class Bashable(models.Model):
        name = models.CharField(max_length=100)

    >>> foo = Bashable.objects.create(name=u"piñata")

Now I want to be able to search for objects, but using ASCII characters rather than Unicode, something like this:

    >>> Bashable.objects.filter(name__lookslike="pinata")

Is there a way in Django to do this sort of approximate string matching, using ASCII stand-ins for the Unicode characters in the database? Here is a related question, but for Apple's Core Data.

Unicode characters and IE

- by findmeahamper

I just built a site that relies on certain Unicode characters like Ⓐ, but have just realized that IE doesn't show these characters? Is there some meta tag to get the browser to show it or how do you update IE to handle these Unicode characters?

Entering Unicode characters in LaTeX

- by John D. Cook

How do I enter Unicode characters in LaTeX? What packages do I need to install and what escape sequence do I type to specify Unicode characters in an ASCII source file?

IIS 6.0 Server and Unicode Characters

- by Srikanth

We are performing a pen test on a simple ASP application that uses an MS SQL database. It seems that for authentication they are using dynamically constructed queries but escaping single quotes. When we use Unicode quotes like %uFF07, %u02b9 etc., we are able to inject SQL successfully. We want to understand: is this more of a configuration issue with how the IIS server canonicalizes Unicode characters, or is the cause the way the validation function that escapes single quotes is written?

Displaying unicode character U+2661 ("White Heart Suit") in Windows 7

- by Jordan

I can't get this character: ♡ to display properly in Windows Explorer; it instead shows up as a symbol of three lines, similar to this ?. The strangest thing is that if I use the heart symbol beside another unusual symbol, such as one of these: ??????, it will display correctly as a heart; yet if I delete the symbol which is next to the heart, it will revert to the three-lines symbol. All of these other symbols display correctly when used alone. Does anybody else have this problem? Is it possible that Windows has 2 different characters listed for U+2661? Thanks for any help.

SED and Unicode Quotation Marks

- by Jonathan Patt

When testing against this string:

    “… so that’s that… ”

The following should, but does not, match the opening quotation mark and following ellipsis and space:

    sed "s/\([“‘\"']…\) /\1/g"

However, this correctly matches the second ellipsis and following space and closing quotation mark:

    sed "s/… \([”’\"'.!?]\)/…\1/g"

If I split the first apart it works fine:

    sed -e "s/\(“…\) /\1/g" \
        -e "s/\(‘…\) /\1/g" \
        -e "s/\(\"…\) /\1/g" \
        -e "s/\('…\) /\1/g"

So why doesn't it work when it's grouped together? Especially when it works fine with the closing quotation marks.

Telugu (unicode) font rendering in emacs

- by Prakash K

[I asked the following question on Stack Overflow and was redirected here; I hope I can get some answers. My Stack Overflow question had two small images showing example renderings of the text. As a new user on Super User, I am not allowed to include them here, nor am I allowed to post more than one hyperlink, and I don't have enough reputation on SO to migrate that question. Please look at the Stack Overflow question for the images. Sorry about the inconvenience.]

I sometimes edit text in the Telugu language. However, when I open the file (UTF-8 encoded) in GNU Emacs (version 23.1.50.1 on Ubuntu Jaunty), the text rendering is incorrect. The same text file opened in gedit is rendered correctly. Here's a snippet: ????????? ???? ???? ????????

Rendered in gedit: please see the SO question for the image showing Telugu text rendering in gedit.

And the Emacs rendering of the same text: please see the SO question for the image showing Telugu text rendering in Emacs.

Wherever glyphs need to be composited (not sure if that's the right word), Emacs (or whatever library it uses) is not doing it right. Is there any way to fix this? Perhaps by tuning some setting in my configuration? Any ideas, please?

Cannot use Alt code for Unicode character insertion any more

- by Bergi

I've been using the Alt code for the ellipsis, 8230, for some time now, in several applications. A few days ago it stopped working, and & is displayed instead of … when pressing Alt+8+2+3+0 (on numpad). This happened both on my desktop and on my laptop (where I use it with Fn). Both run on 64bit-Win-7 with code page 850, and both might have recently updated Windows and Opera 12. What could be the reason this input method got disabled, and how do I switch it back? Btw, I just found out that Alt+0+1+3+3 does work.
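
One possible explanation, offered here as an assumption rather than something the post confirms: an Alt code typed without a leading zero is treated as a single byte in the OEM code page, so a value above 255 may wrap around modulo 256, while a code with a leading zero is looked up in the Windows ANSI code page. The arithmetic can be checked with a short Python sketch (code pages 850 and 1252 are taken from the post and from the usual Western European default, respectively):

```python
ALT_CODE = 8230  # the code the poster typed; also the decimal value of U+2026

wrapped = ALT_CODE % 256                     # value left after wrapping to one byte
oem_char = bytes([wrapped]).decode("cp850")  # that byte in code page 850
ansi_char = bytes([133]).decode("cp1252")    # byte 133 (Alt+0133) in Windows-1252
```

Under that assumption, 8230 wraps to 38, which is the ampersand in code page 850, while byte 133 in Windows-1252 is the horizontal ellipsis. This matches the two behaviours described in the post.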

Haskell: convert unicode integer to actual unicode character

- by Thor Thurn

Suppose that my Haskell function is given an input, which is supposed to be the number of a unicode code point. How can one convert this to the corresponding character? Example: 0x0123 to '{'.

Is there stl and utf8 friendly C++ Wrapper for ICU, or other powerful unicode library

- by artyom

Hello, I need a good Unicode library for C++. I need:

- Transformations in a Unicode-sensitive way. For example, sort all strings in a case-insensitive way and get their first characters for an index.
- Converting various Unicode strings to upper and lower case.
- Splitting text at reasonable positions, such as words, in a way that would work for Chinese and Japanese as well.
- Formatting numbers and dates in a locale-sensitive way (should be thread safe).
- Transparent support of UTF-8 (primary internal representation).

As far as I know, the best library is ICU. However, I can't find normal, developer-friendly API documentation with examples. Also, as far as I can see, it is not too friendly with modern C++ design, working with the STL and so on. I would like something like this:

    std::string msg;
    unistring umsg.from_utf8(msg);
    unistring::word_iterator wi;
    for(wi=umsg.words().begin(),n=0;wi!=umsg.words().wi_end(),n<10;++wi,++n)
        ;
    msg=umsg.substr(umsg.words().begin(),wi).to_utf8();
    cout<<_("Five 10 words are ")<<msg;

Does anybody know of a good STL-friendly ICU wrapper released under an open source license? Permissive licenses like MIT or Boost are preferred, but others that are LGPLv2-compatible are OK as well. Is there another high-quality library similar to ICU? Platform: UNIX/POSIX; Windows support is not required.

Thanks, Artyom

Edit: Unfortunately I wasn't logged in, so I can't mark an answer as accepted... I have attached the answer myself.

Regular expression of unicode characters on string

- by Marcus King

I'm working in C# doing some OCR work and have extracted the text I need to work with. Now I need to parse a line using regular expressions.

    string checkNum;
    string routingNum;
    string accountNum;
    Regex regEx = new Regex(@"\u9288\d+\u9288");
    Match match = regEx.Match(numbers);
    if (match.Success)
        checkNum = match.Value.Remove(0, 1).Remove(match.Value.Length - 1, 1);
    regEx = new Regex(@"\u9286\d{9}\u9286");
    match = regEx.Match(numbers);
    if(match.Success)
        routingNum = match.Value.Remove(0, 1).Remove(match.Value.Length - 1, 1);
    regEx = new Regex(@"\d{10}\u9288");
    match = regEx.Match(numbers);
    if (match.Success)
        accountNum = match.Value.Remove(match.Value.Length - 1, 1);

The problem is that the string contains the necessary Unicode characters when I do a .ToCharArray() and inspect the contents of the string, but it never seems to recognize the Unicode characters when I parse the string looking for them. I thought strings in C# were Unicode by default.

Problem using unicode in URLs with cgi.PATH_INFO in ColdFusion

- by Loftx

Hi there,

My ColdFusion (MX7) site has search functionality which appends the search term to the URL, e.g. http://www.example.com/search.cfm/searchterm. The problem I'm running into is that this is a multilingual site, so the search term may be in another language, e.g. ???????, leading to a search URL such as http://www.example.com/search.cfm/???????

The problem arises when I come to retrieve the search term from the URL. I'm using cgi.PATH_INFO to retrieve the path of the search page and the search term, and extracting the search term from this, e.g. /search.cfm/searchterm. However, when Unicode characters are used in the search they are converted to question marks, e.g. /search.cfm/??????. These appear to be actual question marks, rather than the browser not being able to format Unicode characters, or them being mangled on output.

I can't find any information about whether ColdFusion supports Unicode in the URL, or how I can go about resolving this and getting hold of the complete URL in some way. Does anyone have any ideas?

Cheers, Tom

Why are there so many spaces and line breaks in Unicode?

- by maaartinus

Unicode has maybe 50 spaces [\u0009\u000A-\u000D\u0020\u0085\u00A0\u1680\u180E\u2000-\u200A\u2028\u2029\u202F\u205F\u3000] and 6 line breaks: not only CRLF, LF, CR, but also NEL (U+0085), PS (U+2029) and LS (U+2028). Maybe I could understand most of the spaces and PS ("Paragraph separator"), but what are "Next Line" and "Line separator" good for? It all looks like it was invented by a very big committee where everybody wanted their own space and the leaders were granted one line break each. But seriously, how do you deal with it when your programming language doesn't support it (or does it wrong, as e.g. Java does)?
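
As a rough illustration of dealing with these characters explicitly, the sketch below compiles the character class from the post with Python's `re` module, lists the code points it covers, and splits a string on the less common line breaks. Python is used only as an example here; `SPACE_CLASS` is a hypothetical name for the pattern quoted in the post.

```python
import re

# The character class quoted in the post, written as a raw regex pattern.
SPACE_CLASS = (
    r"[\u0009\u000A-\u000D\u0020\u0085\u00A0\u1680\u180E"
    r"\u2000-\u200A\u2028\u2029\u202F\u205F\u3000]"
)
pattern = re.compile(SPACE_CLASS)

# Enumerate every code point in the Basic Multilingual Plane that the class matches.
members = [chr(cp) for cp in range(0x10000) if pattern.fullmatch(chr(cp))]

# str.splitlines() treats NEL, LS and PS as line boundaries, alongside CR, LF and CRLF.
lines = "a\u0085b\u2028c\u2029d\r\ne".splitlines()
```

Counting `members` shows how many distinct characters the quoted class actually contains, which is a quick sanity check on the "maybe 50" estimate.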

Strange characters appearing on websites - ASCII? - UNICODE?

- by Mick

I have created many very simple pure HTML websites over the years. Most of them appear to work fine most of the time. But there is one recurring problem which I have never quite sorted out, involving strange characters.

The scenario goes like this: I create the site. I look at it in my browser and everything appears fine. I may look at it a great many times over the coming weeks or months as I make additions here and there, perhaps on a variety of browsers on a variety of PCs. Then one day I look at the page and see a random sprinkling of white question marks against dark diamond shapes. These might appear where I had expected to see hyphens or quotes or apostrophes. My immediate thought is that my browser got into some strange state because I was looking at some foreign website with strange characters, but I'm never quite sure. I'm left with that nagging feeling that perhaps half the planet is seeing my website with funny question marks all over it.

So my question is: what's going on? What should I do to ensure that as many people as possible around the world can view my text as I originally intended? Should I be using those special HTML sequences like &pound; for all non-alphanumeric characters? Should I worry at all?

Edit: Right now I have the problem occurring on this page: http://www.fullreservebanking.com/papers.htm. I am using Firefox 5 and the character encoding currently appears to be "UNICODE (UTF-8)". I do not remember manually setting the character encoding to anything since installation. I do occasionally look at Japanese websites for work-related reasons, though when I do so, I do not manually make any changes to Firefox settings.

Edit: Now fixed. Web page altered accordingly.

VBA or Vb Scripting: how to write to a file in Unicode using FileSystemObject

- by Craig Johnston

How would you use FileSystemObject to write a string to a file in Unicode?

Why is Django reverse() failing with unicode?

- by JeffS

Here is a Django models file that is not working as I would expect. I would expect the to_url method to do the reverse lookup in the urls.py file, and get a URL that would correspond to calling that view with arguments supplied by the Argument model.

    from django.db import models

    class Element(models.Model):
        viewname = models.CharField(max_length = 200)
        arguments = models.ManyToManyField('Argument', null = True, blank = True )

        @models.permalink
        def to_url(self):
            d = dict( self.arguments.values_list('key', 'value') )
            return (self.viewname, (), d)

    class Argument(models.Model):
        key = models.CharField(max_length=200)
        value = models.CharField(max_length=200)

The value d ends up as a dictionary from a unicode string to another unicode string, which I believe should work fine with the reverse() method that would be called by the permalink decorator. However, it results in:

    TypeError: reverse() keywords must be strings

What DVCS support Unicode filenames?

- by Craig McQueen

I'm interested in trying out distributed version control systems. git sounds promising, but I saw a note somewhere for the Windows port of git that says "don't use non-ASCII filenames". I can't find that now, but there is this link. It's put me off git for now, but I don't know if the other options are any better. Support for non-ASCII filenames is essential for my Japanese company. I'm looking for one that internally stores filenames as Unicode, not a platform-dependent encoding which would cause endless grief. So: What DVCS support Unicode filenames? In both Windows and Linux? Ideally, with the possibility to transfer repositories between Windows and Linux machines with minimal issues?

html tag attribute displayed in unicode

- by user297975

I have the following code, from which you can see that I use the same way to create the text in UTF-8. The text shown between HTML tags is shown correctly, but the text shown as an HTML tag attribute is shown in Unicode. I'm positive that on the server side (PHP), both texts are treated in the same way and are encoded in UTF-8. Why is the text in the HTML tag attribute shown in Unicode? ?????????????????????? ??

Track unicode words from Twitter using Ruby and the Tweetstream API

- by Régis B.

I am trying to track a set of keywords from Twitter by using the Streaming API (can't post the link here because of spam limitations: google twitter streaming API). I am doing this inside Ruby, using the TweetStream gem: http://bit.ly/cODAWI

The problem I have is that I want to track keywords that contain some Unicode/UTF-8 characters. For instance:

    require 'rubygems'
    require 'tweetstream'

    TweetStream::Client.new("my_user_name", "my_password").track("é") do |s|
      puts s.text
    end

(You can try it out, provided you have installed the tweetstream and json gems.) This piece of code does not print anything, while replacing "é" with "e" outputs a bunch of tweets continuously. I did not find any reliable documentation about Unicode in Ruby, so I have no idea where the problem comes from. Thanks for your help!

Search Results

Search found 1474 results on 59 pages for 'unicode'.
