Search Results

Search found 4604 results on 185 pages for 'utf 8'.

Page 8/185 | < Previous Page | 4 5 6 7 8 9 10 11 12 13 14 15 | Next Page >

JMeter CSV Data Set is corrupting Japanese strings stored as proper UTF-8, I get Question Marks instead

- by Mark Bennett

I read in search terms from a simple text file to send to a search engine. It works fine in English, but gives me ???? for any Japanese text. Text with mixed English and Japanese does show the English text, so I know it's reading it. What I'm seeing: Input text: Snow Leopard ??????????????? Turns into: Snow Leopard ??????????????? This is in my POST field of an HTTP. If I set JMeter to encode the data, it just puts in the percent sequence for question marks. Interesting note: In the example above there are 15 Japanese characters, and then 15 question marks, so at some point it's being seen as full characters and not just bytes. About the Data: The CSV file is very simple in structure. There's only one field / one column, which I name TERM, and later use as ${TERM} I don't really need full CSV because it's only one string per line. There's no commas or quotes. When I run the Unix "file" command on the file, it says UTF-8 text. I've also verified it in command line and graphical mode on two machines. JMeter CSV Dataset Config: Filename: japanese-searches.csv File encoding: UTF-8 (also tried without) Variable names: TERM Delimiter: , Allow Quoted Data: False (I also tried True, different, but still wrong) Recycle at EOF: True Stop at EOF: False Staring mode: All threads A few things I've tried: Tried Allow quoted Data. It changed to other strange characters. -Dfile.encoding=UTF-8 Tried encoding the POST, but it just turned into a bunch of %nn for question marks And I'm not sure how "debug" just after the each line of the CSV is read in. I think it's corrupted right away, but I'm not sure. If it's only mangled when I reference it, then instead of ${TERM} perhaps there's some other "to bytes" function call. I'll start checking into that. I haven't done anything with the JMeter functions yet.

Read the article
Unable to convert file to UTF-8

- by antoniocs

I am on windows xp sp3 and I am trying to convert a file from ASCII to UTF-8. I use notepad++ to do this. I go to Encoding Convert to UTF-8 without BOM. I save the file, reopen and it is still on ASCII. I am using this file in a webpage and I need the file to be UTF-8, because I have strings in utf-8 and they am seeing little squares with ? on them.

Read the article
=?UTF-8?B??= in Emails sent via php mail problem

- by Camran

I have a website, and in the "Contact" section I have a form which users may fill in to contact me. The form is a simple form which action is a php page. The php code: $to = "[email protected]"; $name=$_POST['name']; // sender name $email=$_POST['email']; // sender email $tel= $_POST['tel']; // sender tel $subject=$_POST['subject']; // subject CHOSEN FROM DROPLIST, ALL TESTED $text=$_POST['text']; // Message from sender $text.="\n\nTel:".$tel; // Added to message to show me the telephone nr to the sender at bottom of message $headers="MIME-Version: 1.0"."\n"; $headers.="Content-type: text/plain; charset=UTF-8"."\n"; $headers.="From: $name <$email>"."\n"; mail($to, '=?UTF-8?B?'.base64_encode($subject).'?=', $text, $headers, '[email protected]'); Could somebody please tell me why this works most of the time, but sometimes I receive email whith no text and the subject line showing =?UTF-8?B??= I use outlook express, and I have read this http://stackoverflow.com/questions/454833/system-net-mail-and-utf-8bxxxxx-headers but it didn't help. The problem is not in Outlook, because when I log in to the actual mailprogram where I fetch the POP3 emails from, the email looks the same. When I right click in Outlook and chose "message source" then there is no "From" information. Ex, a good message should look like this: Subject: =?UTF-8?B?w5Z2cmlndA==?= MIME-Version: 1.0 Content-type: text/plain; charset=UTF-8 From: John Doe However, the ones with problem looks like this: Subject: =?UTF-8?B??= MIME-Version: 1.0 Content-type: text/plain; charset=UTF-8 From: As if the information has been lost somewhere. You should know also that I have a VPS, which I manage myself. I use postfix as an emailserver, if thats got anything to do with it. But then again, why does it work sometimes? Also another thing that I have noticed is that sometimes special characters are not shown correctly (by both Outlook and the webmail). For instance, the name "Björkman" in swedish is shown like BjÃ¶rkman, but again, only sometimes. I hope anybody knows something about this problem, because it is very hard to track down for me atleast. If you need more input let me know.

Read the article
How relevant is UTF-7 when it comes to parsing emails?

- by J. Pablo Fernández

I recently implemented incoming emails for an application and boy, did I open the gates of hell? Since then every other day an email arrives that makes the app fail in a different way. One of those things is emails encoded as UTF-7. Most emails come as ASCII, some of the Latin encodings, or thankfully, UTF-8. Hotmail error messages (like email address doesn't exist or quota exceeded) seem to come as UTF-7. Unfortunately, UTF-7 is not an encoding Ruby understands: > "hello world".encode("utf-8", "utf-7") Encoding::ConverterNotFoundError: code converter not found (UTF-7 to UTF-8) > Encoding::UTF_7 => #<Encoding:UTF-7 (dummy)> My application doesn't crash, it actually handles the email quite well, but it does send me a notification about the potential error. I spent some time googling and I can't find anyone that implemented the conversion, at least not as a Ruby 1.9.3 Encoding::Converter. So, my question is, since I never got an email with actual content, from an actual person, in UTF-7, how relevant is that encoding? can I safely ignore it?

Read the article
Get correct output from UTF-8 stored in VarChar using Entity Framework or Linq2SQL?

- by jasonpenny

Borland StarTeam seems to store its data as UTF-8 encoded data in VarChar fields. I have an ASP.NET MVC site that returns some custom HTML reports using the StarTeam database, and I would like to find a better solution for getting the correct data, for a rewrite with MVC2. I tried a few things with Encoding GetBytes and GetString, but I couldn't get it to work (we use mostly Delphi at work); then I figured out a T-SQL function to return a NVarChar from UTF-8 stored in a VarChar, and created new SQL views which return the data as NVarChar, but it's slow. The actual problem appears like this: â€œdescriptionâ€ instead of “description”, in both SSMS and in a webpage when using Linq2SQL Is there a way to get the proper data out of these fields using Entity Framework or Linq2SQL?

Read the article
Is there a list of language only character regions for UTF-8 somewhere?

- by Brehtt

I'm trying to analyze some UTF-8 encoded documents in a way that recognizes different language characters. For my approach to work I need to ignore non-language characters, such as control characters, mathematical symbols etc. Just trying to dissect the basic Latin section of the UTF standard has resulted in multiple regions, with characters like the division symbol being right in the middle of a range of valid Latin characters. Is there a list somewhere that identifies these regions? Or better yet, a Regex that defines the regions or something in C# that can identify the different characters?

Read the article
Why does anyone use an encoding other than UTF-8?

- by mostafa

I want to know why any developer would need to use an encoding other than UTF-8.

Read the article
Should I convert overlong UTF-8 strings to their shortest normal form?

- by Grant McLean

I've just been reworking my Encoding::FixLatin Perl module to handle overlong UTF-8 byte sequences and convert them to the shortest normal form. My question is quite simply "is this a bad idea"? A number of sources (including this RFC) suggest that any over-long UTF-8 should be treated as an error and rejected. They caution against "naive implementations" and leave me with the impression that these things are inherently unsafe. Since the whole purpose of my module is to clean up messy data files with mixed encodings and convert them to nice clean utf8, this seems like just one more thing I can clean up so the application layer doesn't have to deal with it. My code does not concern itself with any semantic meaning the resulting characters might have, it simply converts them into a normalised form. Am I missing something. Is there a hidden danger I haven't considered?

Read the article
How can I check if a binary string is UTF-8 in mysql?

- by Piotr Czapla

I've found a Perl regexp that can check if a string is UTF-8 (the regexp is from w3c site). $field =~ m/\A( [\x09\x0A\x0D\x20-\x7E] # ASCII | [\xC2-\xDF][\x80-\xBF] # non-overlong 2-byte | \xE0[\xA0-\xBF][\x80-\xBF] # excluding overlongs | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2} # straight 3-byte | \xED[\x80-\x9F][\x80-\xBF] # excluding surrogates | \xF0[\x90-\xBF][\x80-\xBF]{2} # planes 1-3 | [\xF1-\xF3][\x80-\xBF]{3} # planes 4-15 | \xF4[\x80-\x8F][\x80-\xBF]{2} # plane 16 )*\z/x; But I'm not sure how to port it to MySQL as it seems that MySQL don't support hex representation of characters see this question. Any thoughts how to port the regexp to MySQL? Or maybe you know any other way to check if the string is valid UTF-8? UPDATE: I need this check working on the MySQL as I need to run it on the server to correct broken tables. I can't pass the data through a script as the database is around 1TB.

Read the article
Perl to output processed XML file encoded as UTF-8 with UNIX line endings (in Win32 environment)?

- by Umber Ferrule

Running ActiveState Perl 5.8.8 on WinXP. As the title suggests, I'd like to output an XML file as UTF-8 with UNIX line endings. I've looked at the PerlDoc for binmode, but am unsure of the exact syntax (if I'm not barking up the wrong tree). The following doesn't do it (forgive my Perl - it's a learning process!): sub SaveFile { my($FileName, $Contents) = @_; my $File = "SAVE"; unless( open($File, ">:utf-8 :unix", $FileName) ) { die("Cannot open $FileName"); } print $File @$Contents; close($File); } Any suggestions? Thanks.

Read the article
Why is conversion from UTF-8 to ISO-8859-1 not the same in Windows and Linux?

- by user1895307

I have the following in code to convert from UTF-8 to ISO-8859-1 in a jar file and when I execute this jar in Windows I get one result and in CentOS I get another. Might anyone know why? public static void main(String[] args) { try { String x = "Ã„, Ã¤, Ã‰, Ã©, Ã–, Ã¶, Ãœ, Ã¼, ÃŸ, Â«, Â»"; Charset utf8charset = Charset.forName("UTF-8"); Charset iso88591charset = Charset.forName("ISO-8859-1"); ByteBuffer inputBuffer = ByteBuffer.wrap(x.getBytes()); CharBuffer data = utf8charset.decode(inputBuffer); ByteBuffer outputBuffer = iso88591charset.encode(data); byte[] outputData = outputBuffer.array(); String z = new String(outputData); System.out.println(z); } catch(Exception e) { System.out.println(e.getMessage()); } } In Windows, java -jar test.jar test.txt creates a file containing: Ä, ä, É, é, Ö, ö, Ü, ü, ß, «, » but in CentOS I get: ??, ä, ??, é, ??, ö, ??, ü, ??, «, » Help please!

Read the article
How would you create a string of all UTF-8 characters? [PHP]

- by Xeoncross

There are many ways to represent the +1 million UTF-8 characters. Take the latin capital "A" with macron (A). This is unicode code point U+0100, hex number 0xc4 0x80, decimal number 196 128, and binary 11000100 10000000. I would like to create a collection of the first 65,535 UTF-8 characters for use in testing applications. These are all unicode characters up to code point U+FFFF (byte3). Is it possible to do something like a for($x=0) loop and then convert the resulting decimal to another base (like hex) which would allow the creation of the matching unicode character? I can create the value A using something like this: $char = "\xc4\x80"; // or $char = chr(196).chr(128); However, I am not sure how to turn this into an automated process. // fail! $char = "\x". dechex($a). "\x". dexhex($$b);

Read the article
Should I convert overly-long UTF-8 strings to their shortest normal form?

- by Grant McLean

I've just been reworking my Encoding::FixLatin Perl module to handle overly-long UTF-8 byte sequences and convert them to the shortest normal form. My question is quite simply "is this a bad idea"? A number of sources (including this RFC) suggest that any over-long UTF-8 should be treated as an error and rejected. They caution against "naive implementations" and leave me with the impression that these things are inherently unsafe. Since the whole purpose of my module is to clean up messy data files with mixed encodings and convert them to nice clean utf8, this seems like just one more thing I can clean up so the application layer doesn't have to deal with it. My code does not concern itself with any semantic meaning the resulting characters might have, it simply converts them into a normalised form. Am I missing something. Is there a hidden danger I haven't considered?

Read the article
What is difference between UTF-8 and HTML entities?

- by Ole Jak

What is difference between UTF-8 and HTML entities?

Read the article
UTF-8 bit representation

- by Yanick Rochon

I'm learning about UTF-8 standards and this is what I'm learning : Definition and bytes used UTF-8 binary representation Meaning 0xxxxxxx 1 byte for 1 à 7 bits chars 110xxxxx 10xxxxxx 2 bytes for 8 à 11 bits chars 1110xxxx 10xxxxxx 10xxxxxx 3 bytes for 12 à 16 bits chars 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx 4 bytes for 17 à 21 bits chars And I'm wondering, why 2 bytes UTF-8 code is not 10xxxxxx instead, thus gaining 1 bit all the way up to 22 bits with a 4 bytes UTF-8 code? The way it is right now, 64 possible values are lost (from 1000000 to 10111111). I'm not trying to argue the standards, but I'm wondering why this is so? ** EDIT ** Even, why isn't it UTF-8 binary representation Meaning 0xxxxxxx 1 byte for 1 à 7 bits chars 110xxxxx xxxxxxxx 2 bytes for 8 à 13 bits chars 1110xxxx xxxxxxxx xxxxxxxx 3 bytes for 14 à 20 bits chars 11110xxx xxxxxxxx xxxxxxxx xxxxxxxx 4 bytes for 21 à 27 bits chars ...? Thanks!

Read the article
UTF-8 locale portability (and ssh)

- by kine

I spend a lot of my time sshed into various machines, all of which are different (some are embedded, some run Linux, some run BSD, &c.). On my own local machines, however, i use OS X, which of course has a userland based on FreeBSD. My locale on those machines is set to en_GB.UTF-8, which is one of the available options: % echo `sw_vers` ProductName: Mac OS X ProductVersion: 10.8.2 BuildVersion: 12C60 % locale -a | grep -i 'en_gb.utf' en_GB.UTF-8 Several of the more-capable Linux systems i use appear to have an equivalent option, but i note that on Linux the name is slightly different: % lsb_release -d Description: Debian GNU/Linux 6.0.3 (squeeze) % locale -a | grep -i 'en_gb.utf' en_GB.utf8 This makes me wonder: When i ssh into a Linux machine from my Mac, and it forwards all of my LC_* variables with that 'UTF-8' suffix, does that Linux machine even understand what is being asked of it? Or is it just falling back to some other locale? In either case, what is the mechanism behind its behaviour, and is it dependent on any particular set-up (e.g., will i see the same behaviour on a BusyBox-based system as on a GNU-based one)?

Read the article
Strange display language in gnome shell

- by khalafuf

I logged in gnome-shell, and found that the display language is set to some strange asian language (I think) without my prompt. I tried to change the locale settings but found that the default language is English (how?) despite of that strange language. Here's a snapshot, See the strange word instead of "Activity": I'm on Ubuntu 12.04 LTS. Output of locale: LANG=zh_CN.UTF-8 LANGUAGE=zh_CN:en_US:en LC_CTYPE="zh_CN.UTF-8" LC_NUMERIC=en_US.UTF-8 LC_TIME=en_US.UTF-8 LC_COLLATE="zh_CN.UTF-8" LC_MONETARY=en_US.UTF-8 LC_MESSAGES="zh_CN.UTF-8" LC_PAPER=en_US.UTF-8 LC_NAME=en_US.UTF-8 LC_ADDRESS=en_US.UTF-8 LC_TELEPHONE=en_US.UTF-8 LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=en_US.UTF-8 LC_ALL= Output of locale -a: C C.UTF-8 de_CH.utf8 en_AG en_AG.utf8 en_AU.utf8 en_BW.utf8 en_CA.utf8 en_DK.utf8 en_GB.utf8 en_IE.utf8 en_IN en_IN.utf8 en_NG en_NG.utf8 en_NZ.utf8 en_PH.utf8 en_SG.utf8 en_US.utf8 en_ZA.utf8 en_ZM en_ZM.utf8 en_ZW.utf8 POSIX zh_CN.utf8 zh_SG.utf8 Solved: This answer did it.

Read the article
locale: Reset lost settings

- by Adam Matan

Due to some strange reason, I've lost some of my locale settings. I've managed to restore most of them using sudo dpkg-reconfigure locales: perl: warning: Setting locale failed. perl: warning: Please check that your locale settings: LANGUAGE = (unset), LC_ALL = (unset), LANG = "en_US.UTF-8" are supported and installed on your system. perl: warning: Falling back to the standard locale ("C"). locale: Cannot set LC_MESSAGES to default locale: No such file or directory locale: Cannot set LC_ALL to default locale: No such file or directory So I'm stuck with one missing value: $ locale locale: Cannot set LC_MESSAGES to default locale: No such file or directory locale: Cannot set LC_ALL to default locale: No such file or directory LANG=en_US.UTF-8 LC_CTYPE="en_US.UTF-8" LC_NUMERIC="en_US.UTF-8" LC_TIME="en_US.UTF-8" LC_COLLATE="en_US.UTF-8" LC_MONETARY="en_US.UTF-8" LC_MESSAGES="en_US.UTF-8" LC_PAPER="en_US.UTF-8" LC_NAME="en_US.UTF-8" LC_ADDRESS="en_US.UTF-8" LC_TELEPHONE="en_US.UTF-8" LC_MEASUREMENT="en_US.UTF-8" LC_IDENTIFICATION="en_US.UTF-8" LC_ALL= Any idea how to restore them all? Thanks, Adam

Read the article
Ubuntu 13.10 change first weekday to Monday in calendar applet

- by wonderingapple

Before the update (I was using 13.04), editing: sudo gedit /etc/default/locale so that LC_TIME="en_GB.UTF-8" does the job. However in 13.10, this does not work anymore. I've tried editing: sudo gedit /usr/share/i18n/locales/en_AU sudo gedit /usr/share/i18n/locales/en_GB sudo gedit /usr/share/i18n/locales/en_US so that first_weekday 2 in each of the files, but this also does not work. As a reference, when I run locale, the output is LANG=en_AU.UTF-8 LANGUAGE=en_AU:en LC_CTYPE="en_AU.UTF-8" LC_NUMERIC=en_AU.UTF-8 LC_TIME=en_AU.UTF-8 LC_COLLATE="en_AU.UTF-8" LC_MONETARY=en_AU.UTF-8 LC_MESSAGES="en_AU.UTF-8" LC_PAPER=en_AU.UTF-8 LC_NAME=en_AU.UTF-8 LC_ADDRESS=en_AU.UTF-8 LC_TELEPHONE=en_AU.UTF-8 LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=en_AU.UTF-8 LC_ALL= Please help.

Read the article
How do I configure encodings (UTF-8) for code executed by Quartz scheduled Jobs in Spring framework

- by Martin

I wonder how to configure Quartz scheduled job threads to reflect proper encoding. Code which otherwise executes fine within Springframework injection loaded webapps (java) will get encoding issues when run in threads scheduled by quartz. Is there anyone who can help me out? All source is compiled using maven2 with source and file encodings configured as UTF-8. In the quartz threads any string will have encoding errors if outside ISO 8859-1 characters: Example config <bean name="jobDetail" class="org.springframework.scheduling.quartz.JobDetailBean"> <property name="jobClass" value="example.ExampleJob" /> </bean> <bean id="jobTrigger" class="org.springframework.scheduling.quartz.SimpleTriggerBean"> <property name="jobDetail" ref="jobDetail" /> <property name="startDelay" value="1000" /> <property name="repeatCount" value="0" /> <property name="repeatInterval" value="1" /> </bean> <bean class="org.springframework.scheduling.quartz.SchedulerFactoryBean"> <property name="triggers"> <list> <ref bean="jobTrigger"/> </list> </property> </bean> Example implementation public class ExampleJob extends QuartzJobBean { private Log log = LogFactory.getLog(ExampleJob.class); protected void executeInternal(JobExecutionContext ctx) throws JobExecutionException { log.info("ÅÄÖ"); log.info(Charset.defaultCharset()); } } Example output 2010-05-20 17:04:38,285 1342 INFO [QuartzScheduler_Worker-9] ExampleJob - vÖvÑvñ 2010-05-20 17:04:38,286 1343 INFO [QuartzScheduler_Worker-9] ExampleJob - UTF-8 The same lines of code executed within spring injected beans referenced by servlets in the web-container will output proper encoding. What is it that make Quartz threads encoding dependent?

Read the article
Can I print UTF-8 encoded files from Linux command-line?

- by Ken

enscript doesn't support utf-8 and the only other suggestion I've seen is to use lpr: lpr -o document-format=text/utf8 file_to_print but that gives an "Unsupported format" error. (Ubuntu 9.04 / GNOME Terminal 2.26.0)

Read the article
How do I convert Windows 7 file-name encoding to UTF-8 for Ruby on Rails?

- by Reilly

Hi (Ive looked at the other questions - none seemed to quite fit my problem.) I have some file-names under Windows 7 that need to be translated in to MySQL database (UTF-8) with Ruby on Rails. An example file-name includes "íéó" in some kind of Windows 7 file-system encoding. Ive tried many combinations of gsub and ActiveSupport::Multibyte::Chars. Thanks for the help

Read the article
Any tool to convert bulk php files to UTF-8 without BOM?

- by Axel

Hi, i have a very large script which contains a lot of php files, so i need some windows tool or software which converts all those files into UTF-8 without BOM, i know this can be done with Notepad++ but you should convert each one. Thanks

Read the article
How to get UTF-8 working in java webapps?

- by kosoant

I need to get UTF-8 working in my Java webapp (servlets + JSP, no framework used) to support äöå etc. for regular Finnish text and Cyrillic alphabets like ??? for special cases. My setup is the following: Development encironment: Windows XP Production encironment: Debian Database used: MySQL 5.x Users mainly use Firefox2 but also Opera 9.x, FF3, IE7 and Google Chrome are used to access the site. How to achieve this?

Read the article
What is better for PHP developers - Unicode or UTF-8?

- by Ole Jak

What is better for PHP developers - Unicode or UTF-8? I am going to create an international CMS. So I am going to have clients all over the world. They will speak all possible languages. What encoding format is better for browser recognition and for DB data storage?

Read the article

< Previous Page | 4 5 6 7 8 9 10 11 12 13 14 15 | Next Page >