unicode - Page 18 - Developer IT

Really fast C++ html parser

- by Alessandro

Hello all, I'm writing an HTML text feature extractor in C++. The program needs to be REALLY fast: I need to extract these features in milliseconds per HTML page, memory usage needs to be low, and Unicode support would be nice. I know how difficult it is to get all of these things, but I want a parser that comes as close as possible. Does anybody have a suggestion?

Converting strings like \\uXXXX in Python

- by Gregory Lo

I have strings like \uXXXX (the escaped representation) and I need to convert them into Unicode. I receive them from a third-party service, so the Python interpreter doesn't convert them and I need to do the conversion in my code. How do I do it in Python?

    >>> s
    u'\\u0e4f\\u032f\\u0361\\u0e4f'
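
A minimal sketch of one common approach, assuming Python 3 and input that contains only ASCII characters plus \uXXXX escapes (in the Python 2 of the question, the rough equivalent would be calling .decode('unicode_escape') on the byte string). The variable names are illustrative only:

```python
# The text as it arrives from the service: literal backslash-u sequences,
# i.e. six ASCII characters per escape rather than one Unicode character.
raw = "\\u0e4f\\u032f\\u0361\\u0e4f"

# Encode to ASCII bytes, then let the unicode_escape codec interpret the escapes.
# Assumption: the input is pure ASCII; other input would fail at encode("ascii").
decoded = raw.encode("ascii").decode("unicode_escape")

print(len(raw), len(decoded))  # the escaped form is much longer than the decoded one
```

If the service actually returns JSON, parsing it with a JSON library would handle these escapes as part of normal string decoding.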

How do you print raw UTF-8 characters from their numbers? [PHP]

- by Xeoncross

Say I wanted to print a ÿ (Latin small letter y with diaeresis) from its Unicode code point "U+00FF" or its UTF-8 bytes in hex, "c3 bf". How can I do that in PHP? Note: in order for it to show correctly in a browser, I know that the first step is header('Content-Type: text/html; charset=utf-8');
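
The question is about PHP, but the distinction it relies on (a code point versus its UTF-8 byte sequence) can be sketched in a few lines of Python 3. This is only an illustration of the two routes to the same character, not a PHP answer; the variable names are made up for the example:

```python
# Route 1: start from the Unicode code point U+00FF.
code_point = 0x00FF
char = chr(code_point)            # the character itself
utf8_bytes = char.encode("utf-8")  # its UTF-8 encoding

# Route 2: start from the raw UTF-8 bytes given as hex ("c3 bf").
from_bytes = bytes.fromhex("c3 bf").decode("utf-8")

print(char, utf8_bytes.hex(), from_bytes)
```

Both routes end at the same character; the code point is the abstract number, while "c3 bf" is how UTF-8 serializes it.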

Changing text appearance in vim

- by anon

Suppose I have a file whose entire contents is: \u1234 and suppose 1234 is the code for \alpha. Is there a way, in vim, to have the "\u1234" show up as a single \alpha symbol (and be treated as an \alpha symbol)? Thanks! [This problem arises because I want to use Unicode names in g++]

C++: Chr() and unichr() equivalent?

- by alex

I could have sworn I used a chr() function 40 minutes ago, but I can't find the file. I know it can go up to 256, so I use this:

    std::string chars = "";
    chars += (char) 42; // etc.

So that's all right, but I really want to access Unicode characters. Can I do (wchar_t) 512? Or is there something like the unichr() function in Python? I just can't find a way to access any of those characters.

How does uʍop-ǝpisdn text work?

- by flybywire

I found upside-down text on this website: http://www.cheesygames.com/upside-down-text How does it work? Does Unicode have upside-down characters, or what? How can I write my own text-flipping function?
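
One plausible way such sites work, sketched in Python 3: Unicode has no dedicated "upside-down alphabet", but many existing characters (mostly phonetic symbols) happen to look like rotated Latin letters. The table below is a hand-picked, incomplete assumption rather than any standard mapping, and the function name is hypothetical:

```python
# Hand-picked look-alikes for rotated lowercase letters (an illustrative,
# incomplete table; letters without an entry are left unchanged).
FLIP = {
    "a": "ɐ", "b": "q", "c": "ɔ", "d": "p", "e": "ǝ", "f": "ɟ", "g": "ƃ",
    "h": "ɥ", "j": "ɾ", "k": "ʞ", "m": "ɯ", "n": "u", "p": "d", "q": "b",
    "r": "ɹ", "t": "ʇ", "u": "n", "v": "ʌ", "w": "ʍ", "y": "ʎ",
}

def flip(text):
    """Reverse the text and swap each letter for its rotated look-alike."""
    return "".join(FLIP.get(ch, ch) for ch in reversed(text.lower()))

print(flip("upside-down"))
```

The text is reversed as well as substituted, because rotating a line by 180 degrees also reverses the reading order.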

Why do (Russian) characters in some received emails change when reading in David InfoCenter?

- by waszkiewicz

I'm using David InfoCenter as my email software, and I'm having trouble with some of my emails in Russian. Only a few letters are affected, and only in some emails (sent by different people): for example, the "R" ("P" in Russian) is shown as a "T". In other Russian emails the problem doesn't appear. Isn't that strange? Has anyone had the same problem and found where it came from?

When I forward such an email to an external mailbox (an internet email account), it's even worse: I get symbols instead of all the Russian letters. The default encoding was "Russian (ISO)"; I changed it to "Russian (Windows)", but the problem is the same.

Another weird behavior: when I write an internal email, name it TEST in Russian (Тест) and put Тест in the text window, the title changes to "Oano", but the content stays in Russian.

With Mailinator I got the following for a message with subject and body "Тест":

Subject: ????
[..]
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="----_=_NextPart_000_00017783.4AF7FB71"

This message is in MIME format. Since your mail reader does not understand this format, some or all of this message may not be legible.

------_=_NextPart_000_00017783.4AF7FB71
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64

0KLQtdGB0YI=

------_=_NextPart_000_00017783.4AF7FB71
Content-Type: text/html; charset="utf-8"
Content-Transfer-Encoding: base64

PCFET0NUWVBFIEhUTUwgUFVCTElDICItLy9XM0MvL0RURCBIVE1MIDQuMCBUcmFuc2l0aW9uYWwv
L0VOIj4NCjxIVE1MPjxIRUFEPg0KPE1FVEEgaHR0cC1lcXVpdj1Db250ZW50LVR5cGUgY29udGVu
dD0idGV4dC9odG1sOyBjaGFyc2V0PXV0Zi04Ij4NCjxNRVRBIG5hbWU9R0VORVJBVE9SIGNvbnRl
bnQ9Ik1TSFRNTCA4LjAwLjYwMDEuMTg4NTIiPjwvSEVBRD4NCjxCT0RZIHN0eWxlPSJGT05UOiAx
MHB0IENvdXJpZXIgTmV3OyBDT0xPUjogIzAwMDAwMCIgbGVmdE1hcmdpbj01IHRvcE1hcmdpbj01
Pg0KPERJViBzdHlsZT0iRk9OVDogMTBwdCBDb3VyaWVyIE5ldzsgQ09MT1I6ICMwMDAwMDAiPtCi
0LXRgdGCPFNQQU4gDQppZD10b2JpdF9ibG9ja3F1b3RlPjxTUEFOIGlkPXRvYml0X2Jsb2NrcXVv
dGU+PC9ESVY+PC9TUEFOPjwvU1BBTj48L0JPRFk+PC9IVE1MPg==

------_=_NextPart_000_00017783.4AF7FB71--
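
The text/plain part of the dump can be decoded by hand, which shows the body itself arrived intact as UTF-8. The second half of this Python 3 sketch tests a guess (an assumption, not something the question confirms): that the subject was sent as Windows-1251 bytes, read as Latin-1, and then had its accents dropped. Variable names are illustrative:

```python
import base64
import unicodedata

# Decode the base64 text/plain part from the dump above.
body = base64.b64decode("0KLQtdGB0YI=").decode("utf-8")

# Hypothesis: the same word encoded as Windows-1251 but interpreted as Latin-1.
misread = body.encode("cp1251").decode("latin-1")

# Drop combining accents, as a mail client limited to plain ASCII might.
stripped = "".join(
    ch for ch in unicodedata.normalize("NFD", misread)
    if not unicodedata.combining(ch)
)

print(body, misread, stripped)
```

Under that assumption the result matches the "Oano" title described in the question, which would point at a missing or wrong charset declaration on the subject header rather than at the message body.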

compose-key mappings differ between gtk and qt apps

- by intuited

I'm noticing an inconsistency in the output of one of the compose-key combos. When I type ( [Compose] . . ) under Chrome, gedit, gnome-terminal, or roxterm, I get the character '˙'. This is a small raised dot:

    $ echo -n '˙' | xxd
    0000000: cb99 ..

When I type the same combo under konsole, yakuake, or kate, I get the character '…'. This is an ellipsis:

    $ echo -n '…' | xxd
    0000000: e280 a6 ...

This is not a font issue: if I copy-paste a character from an app using one toolkit to an app using the other, its appearance is maintained. I use a few other combos pretty regularly and they seem to work consistently across toolkits.

I think this is a pretty recent phenomenon. I upgraded from Ubuntu 8.10 to 9.10 fairly recently, so this might be related. I'm not sure if this will recur if I restart X, and I'd rather not find out. Can someone explain how this is possible, and what I can do to resolve it? I'd like to have the ellipsis appear in all apps when that combo is entered.
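
To confirm that the two toolkits really produce different characters (rather than different renderings of one character), the xxd byte dumps can be decoded and looked up by name. A small Python 3 sketch; the helper name identify is hypothetical:

```python
import unicodedata

def identify(hex_bytes):
    """Decode a hex dump of UTF-8 bytes and return (code point, Unicode name)."""
    ch = bytes.fromhex(hex_bytes).decode("utf-8")
    return f"U+{ord(ch):04X}", unicodedata.name(ch)

# The two byte sequences reported by xxd in the question.
for dump in ("cb99", "e280a6"):
    print(dump, *identify(dump))
```

The two sequences decode to distinct code points, so the compose tables consulted by the two toolkits must differ for this key sequence.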

Cannot copy non-latin characters from PDF document

- by user17381

Hi, I have a PDF file which contains some non-Latin European characters. If I copy some text with the highlight tool and paste it into another program (Word, Notepad), the 'special' characters do not transfer correctly (I get other odd characters in their place). I have tried copying the text from both Acrobat Reader and Foxit. Is there anything I can do to copy this text correctly? Thanks

How to enable utf-8 in xpdf outline pane and search

- by Thanos D. Papaïoannou

Xpdf version 3.02, downloaded from the Ubuntu repositories and run on Ubuntu 8.04.3, replaces Greek UTF-8 characters with blank characters in the outline pane (i.e. the bookmark pane) and in the search window. In particular, it is impossible to search for Greek words in documents. Is there a way to enable UTF-8 support in xpdf so that both the outline pane and search work properly? Thanks!

Using Chinese Characters With Mod_Rewrite

- by Moak

I'm trying to create a rule using Chinese characters.

    #RewriteRule ^zh(.*) /中文版$1 [L,R=301]

creates error 500 when I change the file to UTF-8.

    #RewriteRule ^zh(.*) /%E4%B8%AD%E6%96%87%E7%89%88$1 [L,R=301]

redirects to /%25E4%25B8%25AD%25E6%2596%2587%25E7%2589%2588 (basically replacing % with %25). Is anybody familiar with this problem?
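
The second symptom looks like double percent-encoding: the already-encoded target is escaped once more, so every % becomes %25. The effect can be reproduced outside Apache with a short Python 3 sketch (this only illustrates the escaping, it is not a claim about mod_rewrite internals):

```python
from urllib.parse import quote, unquote

# The already percent-encoded path from the second rule.
target = "%E4%B8%AD%E6%96%87%E7%89%88"

# Decoding once gives the intended characters.
decoded = unquote(target)

# Encoding the already-encoded string escapes each "%" as "%25".
double_encoded = quote(target)

print(decoded)
print(double_encoded)
```

mod_rewrite documents an NE (noescape) flag for keeping an already-escaped substitution from being escaped again; whether it resolves this particular setup is untested here.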

Graphic Designers: Where can I find some nice Sanskrit/Hindi/Devanagari fonts?

- by ???

I'm looking for fonts other than the default typefaces for the language. Does anyone know of a website where I can view multiple fonts and possibly preview some text? I want to try some different fonts out on this phrase: ??????? ????

Complications registering a punycode domain name

- by chaz

Not sure if any of you have experience with this, but I am trying to include the anchor symbol (⚓) in my domain name (using the appropriate punycode to allow it). Upon registering it, I get an error saying that the symbol is not supported by the language I have chosen. Does anyone know which language would support this, how I would go about registering it, or whether I can do so at all? Thanks
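
For reference, this is roughly what the punycode step looks like in Python 3, using the well-known example label "bücher" rather than the symbol from the question. It only shows the encoding; whether a registry accepts a given character is a policy decision that the encoding itself does not settle:

```python
# Example label with one non-ASCII character (not the symbol from the question).
label = "bücher"

# Punycode-encode the label and add the "xn--" prefix used for such labels in DNS.
ace = "xn--" + label.encode("punycode").decode("ascii")

# Round trip: strip the prefix and decode back to the original label.
back = ace[4:].encode("ascii").decode("punycode")

print(ace, back)
```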

Handling UTF-8 with BOM in HTTP

- by Alois Mahdal

Say I have a script which at some point serves a plain text file as content (right after "\n\n"). These files are provided by users, but I can expect them to be UTF-8, so I hard-wire Content-Type: text/plain; charset=UTF-8. But while I can teach users to save everything as UTF-8, I can't be sure that the files will be without a BOM ("\xEF\xBB\xBF"): at least on Windows, common plain text editors don't distinguish this clearly, and not all of them use the same default. So what about files created on Windows, which may or may not start with a BOM? Should (or will) the server or UA get rid of this debris for me? Or is it my task to prepare clean UTF-8, i.e. open each file and check whether a BOM needs to be removed?
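
If the script does end up cleaning the files itself, the check is small. A minimal Python 3 sketch, assuming the file content is already in memory as bytes; the function name strip_utf8_bom is hypothetical:

```python
import codecs

def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

with_bom = codecs.BOM_UTF8 + "hello".encode("utf-8")
print(strip_utf8_bom(with_bom))   # BOM removed
print(strip_utf8_bom(b"hello"))   # unchanged
```

When the content is decoded to text anyway, Python's "utf-8-sig" codec does the same job: it skips one leading BOM if present and otherwise behaves like plain UTF-8.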

Can PuTTY be configured to display the following UTF-8 characters?

- by Stuart Powers

I'd like to be able to render the characters as seen in this tweet: I saved the tweet's JSON data and wrote a one-liner Python script for testing:

    python -c 'import json,urllib; print json.load(urllib.urlopen("http://c.sente.cc/BUCq/tweet.json"))["text"]'

This next image shows the output of this command in two different PuTTY sessions, one using the Bitstream Vera Sans Mono font and the other using Courier New: Next is an example of correct output (I wasn't using PuTTY): The original JSON is at this link using Twitter's API. How can I get PuTTY to display those characters?

Unable to see some Russian id3 tags in ncmpc

- by ??????? ???????????

I'm running urxvt with the following environment:

    $ env | grep LC
    LC_ALL=en_US.UTF-8

The problem is either with ncurses or with ncmpc, and I was wondering if anyone could shed some light on what it might be. This could also be an issue with the ID3 tags, so any advice on working with broken or misconfigured encodings in MP3 metadata is also welcome. I have been ignoring this matter for years and it has finally gotten to me. The bizarre thing is that some filenames or tags work, while others do not.

I have tried the following:

- setting LC_ALL to these values (whatever is before the space):

    ru_RU.KOI8-R KOI8-R
    ru_RU.UTF-8 UTF-8
    ru_RU ISO-8859-5

- rebuilding the MPD database with id3v1_encoding "ISO-8859-1" or id3v1_encoding "UTF-8"

I can demonstrate the problem with two screenshots, as it's the easiest way to do so: Expected output (mpc works well): Broken encoding (ncmpc):
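
One common cause of "some tags work, others don't" is that part of the collection was tagged in a legacy Cyrillic encoding while being read as ISO-8859-1. The Python 3 sketch below shows how such a misread string can be reversed. It assumes Windows-1251 as the real encoding, which the question does not confirm, and repair_tag is a hypothetical helper, not part of MPD or ncmpc:

```python
def repair_tag(tag, actual="cp1251", assumed="latin-1"):
    """Undo a wrong decode: turn the text back into bytes using the encoding it
    was wrongly read with, then decode those bytes with the encoding we guess
    was really used. Tags that do not fit this pattern are returned unchanged."""
    try:
        return tag.encode(assumed).decode(actual)
    except (UnicodeEncodeError, UnicodeDecodeError):
        return tag

# Simulate a tag written as Windows-1251 but read as ISO-8859-1.
broken = "Кино".encode("cp1251").decode("latin-1")

print(broken, "->", repair_tag(broken))
print(repair_tag("Кино"))  # an already-correct tag is left alone
```

If this guess applies, re-tagging the affected files as UTF-8 would be a more durable fix than changing the locale.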

Emacs quail: Less verbose completions?

- by kdb

Emacs's quail functionality with (set-input-method "TeX") is great for typesetting mathematical notes in plain text. It even has completions, but, well... After \su<TAB> I get:

    Possible completion and corresponding characters are:
    \su: - \sub: - \subs: - \subse: - \subset:(1/1) 1.⊂
    \suc: - \succ:(1/1) 1.≻ \succa: - \succap: - \succc: - \succcu: -
    \succe: - \succeq:(1/1) 1.≽ \succn: - \succna: - \succns: - \succs: - \succsi: -
    \sum:(1/1) 1.∑
    \sup: - \sups: - \supse: - \supset:(1/1) 1.⊃
    \sur: - \surd:(1/1) 1.√

Is there some way to make the completion output less verbose, showing only the full completions rather than the full paths?

Korean characters not appearing in Korean Windows XP computer

- by user13267

I am using Korean software (with a partial English interface) on a Korean version of Windows XP SP3. However, in parts of the software, even when I change the interface to Korean, Korean letters show up as random characters, as shown here: This is happening in other parts of the software as well, and I am not sure what the difference is between the places where it happens and the places where it does not. For example, a command button where Korean letters show up properly is shown below:

This is video conferencing software and it has a chat feature as well. When I type into the chat box, I can see the Korean letters appear properly on my side, but when I press Enter and send the message, it changes into random characters in the chat box, as shown above.

What could be the issue here? Could it be a missing font on my computer? Since this is a Korean Windows installation, I was hoping everything would work properly by default. What can be done here?

EDIT 1: I asked some other people who use this software, and they think that the problem is on my end and that playing around with the Regional and Language Settings might solve it. They also suggested that I install all the language packs related to Korean display. But it looks like all the language packs have been installed, and my location is set to Korea in Regional and Language Settings in Control Panel, and I still have this problem.

Also, I have had similar problems with displaying Korean on an English Windows XP computer. This answer suggested some solutions, but I still do not quite understand what exactly I have to do (I never fixed the problem at the time, as I later changed computers). If I follow that answer, which fonts exactly do I need to install?

Per-character-set font size in Firefox not working?

- by Coderer

Firefox has a setting (Preferences - Content - Fonts & Colors - Advanced) that is supposed to let you set font preferences for different character sets. I've tried setting larger minimum font sizes for some non-Western character sets (I'm still learning, and have to see extra detail to tell them apart!) and nothing seems to happen. For example, if there's Hangul on a page (like this one), it will show in the same size as the Latin characters around it, even if I set "minimum font size" to 24. Am I misunderstanding how that setting is supposed to work, or does it just not do anything? Is there any other way to blow up only non-Western characters while leaving the letters I know how to read intact?

How do I type a square character?

- by John

I have to write x/100000 with a square (superscript) character: x*10-5, but the '-5' should be raised above the 10, so that it is clear that it means x/10/10/10/10/10. How do I do that?
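
In plain text, one option is to use the Unicode superscript digits and the superscript minus sign instead of relying on formatting. A small Python 3 sketch of that idea; the helper name sci is hypothetical, and whether the result displays well depends on the font:

```python
# Map ASCII digits and signs to their Unicode superscript counterparts.
SUPERSCRIPT = str.maketrans("0123456789+-", "⁰¹²³⁴⁵⁶⁷⁸⁹⁺⁻")

def sci(mantissa, exponent):
    """Format mantissa x 10^exponent using superscript characters for the exponent."""
    return f"{mantissa}×10{str(exponent).translate(SUPERSCRIPT)}"

print(sci("x", -5))
```

In a word processor, applying superscript formatting to "-5" is the more usual route; the characters above are mainly useful where only plain text is available.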

In UTF-8 collation, why is 11- less than 1-?

- by ???

I found that the sort result in ASCII is:

    1-
    11-

and in UTF-8:

    11-
    1-

I find this counter-intuitive, and it's not dictionary order. Isn't the character '-' (U+002D) always less than [0-9] (U+0030 to U+0039)? What's the general rule in UTF-8 collation? And how can I bypass it on Linux, making '-' sort before [0-9] while keeping other characters unchanged for UTF-8, so that it affects the results of ls --sort, sort, etc.?
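
The difference usually comes from the locale's collation rules rather than from UTF-8 itself: many UTF-8 locales give punctuation very low weight on the first comparison pass, whereas the C locale compares raw byte values. The Python 3 sketch below only approximates that behavior with a hand-written key function and hypothetical file names (the names in the question appear truncated); it is not the actual locale algorithm:

```python
# Hypothetical names; assumed to continue after the dash, as file names usually do.
names = ["1-a", "11-a"]

# C/ASCII-style ordering: compare code points, so "-" (0x2D) sorts before "1" (0x31).
by_code_point = sorted(names)

def letters_first_key(s):
    """Rough stand-in for locale collation: compare letters and digits first,
    and fall back to the full string only to break ties."""
    return ("".join(ch for ch in s if ch.isalnum()), s)

by_letters_first = sorted(names, key=letters_first_key)

print(by_code_point)
print(by_letters_first)
```

On Linux, setting LC_COLLATE=C (or LC_ALL=C) for a command switches sort and ls back to byte-value ordering.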

overriding ctype<wchar_t>

- by Potatoswatter

I'm writing a lambda calculus interpreter for fun and practice. I got iostreams to properly tokenize identifiers by adding a ctype facet which defines punctuation as whitespace:

    struct token_ctype : ctype<char> {
        mask t[ table_size ];
        token_ctype() : ctype<char>( t ) {
            for ( size_t tx = 0; tx < table_size; ++ tx ) {
                t[tx] = isalnum( tx )? alnum : space;
            }
        }
    };

(classic_table() would probably be cleaner but that doesn't work on OS X!) And then I swap the facet in when I hit an identifier:

    locale token_loc( in.getloc(), new token_ctype );
    …
    locale const &oldloc = in.imbue( token_loc );
    in.unget() >> token;
    in.imbue( oldloc );

There seems to be surprisingly little lambda calculus code on the Web. Most of what I've found so far is full of Unicode λ characters. So I thought I'd try adding Unicode support. But ctype<wchar_t> works completely differently from ctype<char>. There is no master table; there are four methods: do_is x2, do_scan_is, and do_scan_not. So I did this:

    struct token_ctype : ctype< wchar_t > {
        typedef ctype<wchar_t> base;

        bool do_is( mask m, char_type c ) const {
            return base::do_is(m,c)
                || (m&space) && ( base::do_is(punct,c) || c == L'λ' );
        }

        const char_type* do_is (const char_type* lo, const char_type* hi, mask* vec) const {
            base::do_is(lo,hi,vec);
            for ( mask *vp = vec; lo != hi; ++ vp, ++ lo ) {
                if ( *vp & punct || *lo == L'λ' ) *vp |= space;
            }
            return hi;
        }

        const char_type *do_scan_is (mask m, const char_type* lo, const char_type* hi) const {
            if ( m & space ) m |= punct;
            hi = do_scan_is(m,lo,hi);
            if ( m & space ) hi = find( lo, hi, L'λ' );
            return hi;
        }

        const char_type *do_scan_not (mask m, const char_type* lo, const char_type* hi) const {
            if ( m & space ) {
                m |= punct;
                while ( * ( lo = base::do_scan_not(m,lo,hi) ) == L'λ' && lo != hi ) ++ lo;
                return lo;
            }
            return base::do_scan_not(m,lo,hi);
        }
    };

(Apologies for the flat formatting; the preview converted the tabs differently.)

The code is WAY less elegant. It does better express the notion that only punctuation is additional whitespace, but that would've been fine in the original had I had classic_table. Is there a simpler way to do this? Do I really need all those overloads? (Testing showed do_scan_not is extraneous here, but I'm thinking more broadly.) Am I abusing facets in the first place? Is the above even correct? Would it be better style to implement less logic?
