utf 8 - Page 4 - Developer IT

Java: Converting UTF 8 to String

- by kujawk

When I run the following program: public static void main(String args[]) throws Exception { byte str[] = {(byte)0xEC, (byte)0x96, (byte)0xB4}; String s = new String(str, "UTF-8"); } on Linux and inspect the value of s in jdb, I correctly get: s = "ì–´" on Windows, I incorrectly get: s = "?" My byte sequence is a valid UTF-8 character in Korean, why would it be producing two very different results?

Read the article

C++ UTF-8 lightweight & permissive code?

- by xenthral

Anyone know of a more permissive license (MIT / public domain) version of this: http://library.gnome.org/devel/glibmm/unstable/classGlib_1_1ustring.html ('drop-in' replacement for std::string thats UTF-8 aware) Lightweight, does everything I need and even more (doubt I'll use the UTF-XX conversions even) I really don't want to be carrying ICU around with me.

Read the article

Can str_replace be safely used on a UTF-8 encoded string if it's only given valid UTF-8 encoded stri

- by Manos Dilaverakis

PHP's str_replace() was intended only for ANSI strings and as such can mangle UTF-8 strings. However, given that it's binary-safe would it work properly if it was only given valid UTF-8 strings as arguments? Edit: I'm not looking for a replacement function, I would just like to know if this hypothesis is correct.

Read the article

replacing characters with UTF-8 after using mysql_set_charset('utf8') function

- by Ahmet vardar

I converted all mysql tables to utf-8_unicode and started using mysql_set_charset('utf8'); function. But after this, some characters like S, Ö started looking like Ã– , Åž How can i replace this kinda letters in mysql with UTF-8 format ? shortly, can i find a list of all these kinda characters to replace ? EDIT: He is explaining about this issue in this article actually but i cannot understand it properly acutally lol http://www.oreillynet.com/onlamp/blog/2006/01/turning_mysql_data_in_latin1_t.html

Read the article

strnicmp equivalent for UTF-8?

- by Jen

What do I use to perform a case-insensitive comparison on two UTF-8 encoded sub-strings? Essentially, I'm looking for a strnicmp function for UTF-8.

Read the article

Can I convert my database/script to UTF-8 ?

- by Mohannad Otaibi

How can I convert a database to support UTF-8 and convert it's old data from what ever encoding they're in to UTF-8 ? Extra Info: I'm running a server which has many websites on it, and one of them is running WHMCS (php script to manage hosting clients). WHMCS has an iPhone application where i can browse it through iPhone, the problem is that this application will only run if everything in my website is in UTF-8 encoding. I was using windows-1256 as encoding in my script's settings, and i tried changing that in some point of time to UTF-8 for a while then changed it back to windows-1256 so, the data in the database are some inserted using UTF standards and most of them are windows-1256 If someone could clear the picture for me, Do I need to convert every database on the server or just one DB ? what should I change? If i had to do that manually, I'll do it but I need some expert advise.

Read the article

Is PHP serialize function compatible UTF-8 ?

- by Matthieu

I have a site I want to migrate from ISO to UTF-8. I have a record in database indexed by the following primary key : s:22:"Informations générales"; The problem is, now (with UTF-8), when I serialize the string, I get : s:24:"Informations générales"; (notice the size of the string is now the number of bytes, not string length) So this is not compatible with non-utf8 previous records ! Did I do something wrong ? How could I fix this ? Thanks

Read the article

Problem with UTF-8

- by Pablo Fernandez

I'm using castor as an OXM mapper, and I'm having a problem with UTF-8 encoding. The code here shows the issue: //Marshaller configuration ByteArrayOutputStream baos = new ByteArrayOutputStream(); OutputStreamWriter os = new OutputStreamWriter(baos, UTF_8); Marshaller marshaller = new Marshaller(os); marshaller.setSuppressXSIType(true); //Mappings configuration Mapping map = new Mapping(); map.loadMapping(MarshallingService.class.getResource(MAPPINGS_PATH)); marshaller.setMapping(map); //Example //BEFORE MARSHALLING: This prints correctly the UTF-8 Chars object.getName() ; marshaller.marshal(object); //AFTER MARSHALLING: This returns the characters like \435\235\654\345 return baos.toString(UTF_8);

Read the article

Converting UTF-8 to ISO-8859-1 in Java - how to keep it as single byte

- by luckylak

Hi, I am trying to convert a string encoded in java in UTF-8 to ISO-8859-1. Say for example, in the string 'âabcd' 'â' is represented in ISO-8859-1 as E2. In UTF-8 it is represented as two bytes. C3 A2 I believe. When I do a getbytes(encoding) and then create a new string with the bytes in ISO-8859-1 encoding, I get a two different chars. Ã¢. Is there any other way to do this so as to keep the character the same i.e. âabcd?

Read the article

Display EURO Symbol in UTF-8 Format

- by StevieB

Im wondering how I can display a Euro symbol in UTF-8 Format ? Cheers

Read the article

trouble with utf-8 chars & apache2 rewrite rules

- by tixrus

I see the post http://stackoverflow.com/questions/2565864/validating-utf-8-in-htaccess-rewrite-rule and I think that is great, but a more fundamental problem I am having first: I needed to expand to handle utf-8 chars for query string parameters, names of directories, files, and used in displays to users etc. I configured my Apache with DefaultCharset utf-8 and also my php if that matters. My original rewrite rule filtered everything except regular A-Za-z and underscore and hyphen. and it worked. Anything else would give you a 404 (which is what I want!) Now, however it seems that everything matches, including stuff I don't want, however, although it seems to match it doesn't go in the query string unless it is a regular A-Za-z_- character string. I find this confusing, because the rule says put whatever you matched into the query string: Here is the original rule: RewriteRule ^/puzzle/([A-Za-z_-]+)$ /puzzle.php?g=$1 [NC] and here is the revised rule: RewriteRule ^/puzzle/(\w+)$ /puzzle.php?g=$1 [NC] I made the change because somewhere I read that \w matches ALL the alpha chars where as A-Zetc. only matches the ones without accents and stuff. It doesn't seem to matter which of those rules I use: Here is what happens: In the application I have this: echo $_GET['g']; If I feed it a url like http://mydomain.com/puzzle/USA it echoes out "USA" and works fine. If I feed it a url like http://mydomain.com/puzzle/México it echoes nothing for that and warns me that index g is not defined and of course doesn't get resources for Mexico. if I feed it a url like http://mydomain.com/puzzle/fuzzle/buzzle/j.qle it does the same thing. This last case should be a 404! And it does this no matter which of the above rules I use. I configured a rewrite log RewriteLogLevel 5 RewriteLog /opt/local/apache2/logs/puzzles.httpd.rewrite but it is empty. Here is from the regular access log (it gives a status of 200) [26/May/2010:11:21:42 -0700] "GET /puzzle/M%C3%A9xico HTTP/1.1" 200 342 [26/May/2010:11:21:54 -0700] "GET /puzzle/M/l.foo HTTP/1.1" 200 342 What can I do to get these $%#$@(*#@!!! characters but not slash, dot or other non-alpha into my program, and once there, will it decode them correctly??? Would posix char classes work any better? Is there anything else I need to configure?

Read the article

Lost in UTF-8 hell. (Django and Python)

- by user140314

I am working through the Django RSS reader project here. The RSS feed will read something like "OKLAHOMA CITY (AP) — James Harden let". The RSS feed's encoding reads encoding="UTF-8" so I believe I am passing utf-8 to markdown in the code snippet below. The em dash is where it chokes. I get the Django error of "'ascii' codec can't encode character u'\u2014' in position 109: ordinal not in range(128)" which is an UnicodeEncodeError. In the variables being passed I see "OKLAHOMA CITY (AP) \u2014 James Harden". The code line that is not working is: content = content.encode(parsed_feed.encoding, "xmlcharrefreplace") I am using markdown 2.0, django 1.1, and python 2.4. What is the magic sequence of encoding and decoding that I need to do to make this work? Thanks.

Read the article

PHP: Replace umlauts with closest 7-bit ASCII equivalent in an UTF-8 string

- by BlaM

What I want to do is to remove all accents and umlauts from a string, turning "lärm" into "larm" or "andré" into "andre". What I tried to do was to utf8_decode the string and then use strtr on it, but since my source file is saved as UTF-8 file, I can't enter the ISO-8859-15 characters for all umlauts - the editor inserts the UTF-8 characters. Obviously a solution for this would be to have an include that's an ISO-8859-15 file, but there must be a better way than to have another required include? echo strtr(utf8_decode($input), 'ŠŒŽšœžŸ¥µÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýÿ', 'SOZsozYYuAAAAAAACEEEEIIIIDNOOOOOOUUUUYsaaaaaaaceeeeiiiionoooooouuuuyy'); UPDATE: Maybe I was a bit inaccurate with what I try to do: I do not actually want to remove the umlauts, but to replace them with their closest "one character ASCII" aequivalent.

Read the article

Efficient way to ASCII encode UTF-8

- by Andreas Gohr

I'm looking for a simple and efficient way to store UTF-8 strings in ASCII-7. With efficient I mean the following: all ASCII chars in the input should stay ASCII chars in the output the resulting string should be as short as possible the operation needs to be reversable without any data loss there should be no restriction on the input length the whole UTF-8 range should be allowed My first idea was to use Punycode (IDNA) as it fits the first three requirements, but it fails at the last two. Can anyone recommend an alternative encoding scheme? Even better if there's some code available to look at.

Read the article

Reading from the HTML DOM returns UTF-8 characters

- by teehoo

I have a contenteditable div where I'm reading individual characters and sending them off to a server (for more background this is similar to Google Wave where typing a character automatically sends it) I was using a plain old html textfield before and everything worked fine until I "upgraded" to a contenteditable div. My problem is that now the characters are in UTF-8 format, which is causing some weird problems on the server that I would rather not debug. It would be much easier to force everything to be ASCII on the client side. Is there any way to do this? I tried putting in a meta tag stating the html file is charset=ISO-8859-1, but it doesnt seem to work. Reading from the div tag still returns UTF-8 codes. (One example is when I press space I get the pair 0xC2 0xA0 which corresponds to a "non-breaking white space"

Read the article

Forcing a mixed ISO-8859-1 and UTF-8 multi-line string into UTF-8 in Perl

- by knorv

Consider the following problem: A multi-line string $junk contains some lines which are encoded in UTF-8 and some in ISO-8859-1. I don't know a priori which lines are in which encoding, so heuristics will be needed. I want to turn $junk into pure UTF-8 with proper re-encoding of the ISO-8859-1 lines. Also, in the event of errors in the processing I want to provide a "best effort result" rather than throwing an error. My current attempt looks like this: $junk = &force_utf8($junk); sub force_utf8 { my $input = shift; my $output = ''; foreach my $line (split(/\n/, $input)) { if (utf8::valid($line)) { utf8::decode($line); } $output .= "$line\n"; } return $output; } While this appears to work I'm certain this is not the optimal solution. How would you improve the force_utf8(...) sub?

Read the article

Python: UTF-8 problems (again...)

- by blahblah

I have a database which is synchronized against an external web source twice a day. This web source contains a bunch of entries, which have names and some extra information about these names. Some of these names are silly and I want to rename them when inserting them into my own database. To rename these silly names, I have a standard dictionary as such: RENAME_TABLE = { "Wsird" : "Weird", ... } As you can see, this is where UTF-8 comes into play. This is the function which performs renaming of all the problematic entries: def rename_all_entries(): all_keys = RENAME_TABLE.keys() entries = Entry.objects.filter(name__in=all_keys) for entry in entries: entry.name = RENAME_TABLE[entry.name] entry.save() So it tries to find the old name in RENAME_TABLE and renames the entry if found. However, I get a KeyError exception when using RENAME_TABLE[entry.name]. Now I'm lost, what do I do? I have... # -*- coding: utf-8 -*- ...in the top of the Python file.

Read the article

Global UTF-encoding, the right way

- by mowgli

I'm curious, as to what is the right way to have UTF-8 encoding on all web files All my files (incl. CSS and JS) are made and saved in UTF-8 encoding In PHP, I set the char-set on top of the main page (this page includes all others) with: header('Content-type: text/html; charset=utf-8'); In the same page I have this html meta tag: <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> Then I stubled upon an external css file that has this on first line: @charset "UTF-8"; And now I wonder, should I set the charset INSIDE all my CSS/JS files too, like that? And/or should I serve each file with charset=utf-8 in the meta tag?

Read the article

Should UTF-16 be considered harmful?

- by Artyom

I'm going to ask what is probably quite a controversial question: "Should one of the most popular encodings, UTF-16, be considered harmful?" Why do I ask this question? How many programmers are aware of the fact that UTF-16 is actually a variable length encoding? By this I mean that there are code points that, represented as surrogate pairs, take more than one element. I know; lots of applications, frameworks and APIs use UTF-16, such as Java's String, C#'s String, Win32 APIs, Qt GUI libraries, the ICU Unicode library, etc. However, with all of that, there are lots of basic bugs in the processing of characters out of BMP (characters that should be encoded using two UTF-16 elements). For example, try to edit one of these characters: 𝄞 (U+1D11E) MUSICAL SYMBOL G CLEF 𝕥 (U+1D565) MATHEMATICAL DOUBLE-STRUCK SMALL T 𝟶 (U+1D7F6) MATHEMATICAL MONOSPACE DIGIT ZERO 𠂊 (U+2008A) Han Character You may miss some, depending on what fonts you have installed. These characters are all outside of the BMP (Basic Multilingual Plane). If you cannot see these characters, you can also try looking at them in the Unicode Character reference. For example, try to create file names in Windows that include these characters; try to delete these characters with a "backspace" to see how they behave in different applications that use UTF-16. I did some tests and the results are quite bad: Opera has problem with editing them (delete required 2 presses on backspace) Notepad can't deal with them correctly (delete required 2 presses on backspace) File names editing in Window dialogs in broken (delete required 2 presses on backspace) All QT3 applications can't deal with them - show two empty squares instead of one symbol. Python encodes such characters incorrectly when used directly u'X'!=unicode('X','utf-16') on some platforms when X in character outside of BMP. Python 2.5 unicodedata fails to get properties on such characters when python compiled with UTF-16 Unicode strings. StackOverflow seems to remove these characters from the text if edited directly in as Unicode characters (these characters are shown using HTML Unicode escapes). WinForms TextBox may generate invalid string when limited with MaxLength. It seems that such bugs are extremely easy to find in many applications that use UTF-16. So... Do you think that UTF-16 should be considered harmful?

Read the article

Locale variables have no effect in remote shell (perl: warning: Setting locale failed.)

- by Janning

I have a fresh ubuntu 12.04 installation. When i connect to my remote server i got errors like this: ~$ ssh example.com sudo aptitude upgrade ... Traceback (most recent call last): File "/usr/bin/apt-listchanges", line 33, in <module> from ALChacks import * File "/usr/share/apt-listchanges/ALChacks.py", line 32, in <module> sys.stderr.write(_("Can't set locale; make sure $LC_* and $LANG are correct!\n")) NameError: name '_' is not defined perl: warning: Setting locale failed. perl: warning: Please check that your locale settings: LANGUAGE = (unset), LC_ALL = (unset), LC_TIME = "de_DE.UTF-8", LC_MONETARY = "de_DE.UTF-8", LC_ADDRESS = "de_DE.UTF-8", LC_TELEPHONE = "de_DE.UTF-8", LC_NAME = "de_DE.UTF-8", LC_MEASUREMENT = "de_DE.UTF-8", LC_IDENTIFICATION = "de_DE.UTF-8", LC_NUMERIC = "de_DE.UTF-8", LC_PAPER = "de_DE.UTF-8", LANG = "en_US.UTF-8" are supported and installed on your system. perl: warning: Falling back to the standard locale ("C"). locale: Cannot set LC_ALL to default locale: No such file or directory No packages will be installed, upgraded, or removed. 0 packages upgraded, 0 newly installed, 0 to remove and 0 not upgraded. Need to get 0 B of archives. After unpacking 0 B will be used. ... I don't have this problem when i connect from an older ubuntu installation. This is output from my ubuntu 12.04 installation, LANG and LANGUAGE are set $ locale LANG=de_DE.UTF-8 LANGUAGE=de_DE:en_GB:en LC_CTYPE="de_DE.UTF-8" LC_NUMERIC=de_DE.UTF-8 LC_TIME=de_DE.UTF-8 LC_COLLATE="de_DE.UTF-8" LC_MONETARY=de_DE.UTF-8 LC_MESSAGES="de_DE.UTF-8" LC_PAPER=de_DE.UTF-8 LC_NAME=de_DE.UTF-8 LC_ADDRESS=de_DE.UTF-8 LC_TELEPHONE=de_DE.UTF-8 LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=de_DE.UTF-8 LC_ALL= Does anybody know what has changed in ubuntu to get this error message on remote servers?

Read the article

Invalid byte 1 of 1-byte UTF-8 sequence

- by user275886

I have a MyFaces Facelets application, where the page coding is a bit rugged. Anyway, it's developed with Eclipse and built with Ant, and kindof runs ok in Tomcat 2.0.26. So far so good. Now, I'd rather build with Maven, so I made a couple of pom-files, opened them in Netbeans and built, and now I have a war file that deploys ok. However, on any facelet page it barfs out with com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence. at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.invalidByte(UTF8Reader.java:684) at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(UTF8Reader.java:554) at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(XMLEntityScanner.java:1742) So, I've tried a lot of different things, and the application actually run simple pages without facelet stuff. But, everything runs if I just build with Ant instead ... So my question is: What's the most likely difference between an ant build and a maven build that may cause this? It also seems that even though I've configured for UTF-8 in Netbeans and pom-files, Netbeans eventually ends up reporting the facelet files as ISO-8859-1 after some editing. I've made sure that most central libs are of same version (especially xerces 2.3.0), I've added an encoding servlet filter that had no effect. And, I'd rather fix the maven build and keep the buggy pages, than the other way around ... it's my intention to introduce Naven, not fix buggy pages.

Read the article

Does Perl's Net::Cassandra module support UTF-8?

- by knorv

I've run into a really strange UTF-8 problem with Net::Cassandra::Easy (which is built upon Net::Cassandra): UTF-8 strings written to Cassandra are garbled upon retrieval. The following code shows the problem: use strict; use utf8; use warnings; use Net::Cassandra::Easy; binmode(STDOUT, ":utf8"); my $key = "some_key"; my $column = "some_column"; my $set_value = "\x{2603}"; my $cassandra = Net::Cassandra::Easy->new(keyspace => "Keyspace1", server => "localhost"); $cassandra->connect(); $cassandra->mutate([$key], family => "Standard1", insertions => { $column => $set_value }); my $result = $cassandra->get([$key], family => "Standard1", standard => 1); my $get_value = $result->{$key}->{"Standard1"}->{$column}; if ($set_value eq $get_value) { # this is the path I want. print "OK: $set_value == $get_value\n"; } else { # this is the path I get. print "ERR: $set_value != $get_value\n"; } When running the code above $set_value eq $get_value evaluates to false. What am I doing wrong?

Read the article

UTF-8 BOM signature in PHP files

- by skidding

I was writing some commented PHP classes and I stumbled upon a problem. My name (for the @author tag) ends up with a ? (which is a UTF-8 character, ...and a strange name, I know). Even though I save the file as UTF-8, some friends reported that they see that character totally messed up (È™). This problem goes away by adding the BOM signature. But that thing troubles me a bit, since I don't know that much about it, except from what I saw on Wikipedia and on some other similar questions here on SO. I know that it adds some things at the beginning of the file, and from what I understood it's not that bad, but I'm concerned because the only problematic scenarios I read about involved PHP files. And since I'm writing PHP classes to share them, being 100% compatible is more important than having my name in the comments. But I'm trying to understand the implications, should I use it without worrying? or are there cases when it might cause damage? When? Thanks!

Read the article

Ruby : UTF-8 IO

- by subtenante

I use ruby 1.8.7. I try to parse some text files containing greek sentences, encoded in UTF-8. (I can't much paste here sample files, because they are subject to copyright. Really just some greek text encoded in UTF-8.) I want, for each file, to parse the file, extract all the words, and make a list of each new word found in this file. All that saved to one big index file. Here is my code : #!/usr/bin/ruby -KU def prepare_line(l) l.gsub(/^\s*[ST]\d+\s*:\s*|\s+$|$\d+$\s*/u, "") end def tokenize(l) l.split /['·.;!:\s]+/u end $dict = {} $cpt = 0 $out = File.new 'out.txt', 'w' def lesson(file) $cpt = $cpt + 1 file.readlines.each do |l| $out.puts l l = prepare_line l tokenize(l).each do |t| unless $dict[t] $dict[t] = $cpt $out.puts " #{t}\n" end end end end Dir.new('etc/').each do |filename| f = File.new("etc/#{filename}") unless File.directory? f lesson f end end Here is part of my output : ?@???†?†?????????? ?...[snip very long hangul/hanzi mishmash]... ????????†? ???N2 : ?e?te?? (2) µ???µa (Note that the puts l part seems to work fine, at the end of the given output line.) Any idea what is wrong with my code ? (General comments about ruby idioms I could use are very welcome, I'm really a beginner.)

Read the article

How to add UTF-8 support to my hard disk in fstab?

- by punkmexic

I use Ubuntu (Spanish language). Sometimes I get this error when I use special characters (codification error) so I read that if I edit a file of my hard disk by using gedit /etc/fstab and adding utf8 I can fix it.... I had this line: UUID=bfb5b95e-bf68-464a-8abf-d6027b039fa4 / ext4 errors=remount-ro 0 1 I adeed utf8 like this: UUID=bfb5b95e-bf68-464a-8abf-d6027b039fa4 / ext4 errors=remount-ro,iocharset=utf8 0 1 But I messed my Ubuntu and I can't log in now to my Ubuntu so im using live session... so I'll have to remove that code in order to be able to use my Ubuntu again. Can someone tell me how that line should look like?

Search Results

Search found 4604 results on 185 pages for 'utf 8'.

Page 4/185 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >

- by kujawk

- by xenthral

- by Manos Dilaverakis

- by Ahmet vardar

- by Jen

- by Mohannad Otaibi

- by Matthieu

- by Pablo Fernandez

- by luckylak

- by StevieB

- by tixrus

- by user140314

- by BlaM

- by Andreas Gohr

- by teehoo

- by knorv

- by blahblah

- by mowgli

- by Artyom

- by Janning

- by user275886

- by knorv

- by skidding

- by subtenante

- by punkmexic

< Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >