Search Results

Search found 1649 results on 66 pages for 'unicode normalization'.


  • Beautiful Soup Unicode encode error

    - by iamrohitbanga
    I am trying the following code with a particular HTML file:

        from BeautifulSoup import BeautifulSoup
        import re
        import codecs
        import sys

        f = open('test1.html')
        html = f.read()
        soup = BeautifulSoup(html)
        body = soup.body.contents
        para = soup.findAll('p')
        print str(para).encode('utf-8')

    I get the following error:

        UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 9: ordinal not in range(128)

    How do I debug this?
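
    A minimal Python 2 sketch of one common way around this error, assuming BeautifulSoup 3 as in the snippet above: keep the extracted paragraphs as unicode objects and encode exactly once at the end, instead of calling str() on objects that still contain non-ASCII characters ('test1.html' is the file from the question).

        from BeautifulSoup import BeautifulSoup

        html = open('test1.html').read()
        soup = BeautifulSoup(html)
        paragraphs = soup.findAll('p')

        # unicode(tag) returns the markup as a unicode object; join everything first,
        # then encode to UTF-8 exactly once when writing it out.
        text = u'\n'.join(unicode(p) for p in paragraphs)
        print text.encode('utf-8')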


  • PHP-GD: Dealing with Unicode characters

    - by sehugg
    I am developing a web service that renders characters using the PHP GD extension, with a user-selected TTF font. This works fine in ASCII-land, but there are a few problems: the string to be rendered comes in as UTF-8, and I would like to limit the list of user-selectable fonts to only those that can actually render the string, since some fonts only have glyphs for ASCII, ISO 8859-1, etc. In the case where some decorative characters are included, it would be fine to render the majority of the characters in the selected font and the decorative characters in Arial (or whatever font contains the extended glyphs). PHP-GD does not seem to expose enough font metadata to figure out whether a character can be rendered in a given font. What is a good way to get font metrics into PHP? Is there a command-line utility that can dump them in XML or another parsable format?
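
    Not PHP, but one hedged way to get at that metadata from the command line (which PHP could then shell out to) is to read the font's character-to-glyph table with Python's fontTools package; the font path and the sample text below are placeholders.

        from fontTools.ttLib import TTFont   # pip install fonttools

        def font_covers(ttf_path, text):
            """Return True if every character in text maps to a glyph in the font."""
            cmap = TTFont(ttf_path)['cmap'].getBestCmap()   # {code point: glyph name}
            return all(ord(ch) in cmap for ch in text)

        print(font_covers('SomeFont.ttf', 'Gr\u00fc\u00dfe \u2665'))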


  • Excel 2007 and Unicode

    - by pjlasl
    I have an Israeli spreadsheet that reads right to left. When I read the values (using VBA), a question mark (?) is placed at the beginning and end of the text; in other words the text gets wrapped in question marks (i.e. ?0123456?). Whether you use Range("A2").Value, .Value2 or .Text, the results are the same. Any idea on how to prevent this?


  • cgi.FieldStorage translating unicode strangely

    - by trydyingtolive
    I have a form on a UTF-8 encoded page. When I submit the form, cgi.FieldStorage converts any non-ASCII character to an odd format. For example, if I submit the value ć, the browser will send %c4%87. I want to convert that to the string '\xc4\x87'. However, cgi.FieldStorage appears to be converting it to '\\xc4\\x87'.

        post = cgi.FieldStorage(fp=env['wsgi.input'], environ=env, keep_blank_values=True)

    Python 2.6 on Ubuntu 9.10 SE, Apache 2, mod_wsgi.
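
    For what it's worth, here is a small Python 2 sketch (matching the Python 2.6 setup above) of what %c4%87 decodes to, and of how the same two bytes can look like either '\xc4\x87' or '\\xc4\\x87' depending on whether you look at the value or at its repr(); the literal is just the example from the question.

        import urllib

        raw = urllib.unquote('%c4%87')   # the two bytes 0xC4 0x87
        print repr(raw)                  # '\xc4\x87' -- repr() escapes the bytes for display
        print len(raw)                   # 2, not 8: these are real bytes, not backslash text
        print repr(raw.decode('utf-8'))  # u'\u0107', i.e. LATIN SMALL LETTER C WITH ACUTE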


  • Cross platform unicode path handling

    - by Matt Joiner
    I'm using boost::filesystem for cross-platform path manipulation, but this breaks down when I need to call into interfaces I don't control that won't accept UTF-8. For example, when using the Windows API I need to convert to UTF-16, call the wide-string version of whatever function I was about to call, and then convert any output back to UTF-8. While wpath and the other w* forms of many boost::filesystem functions help keep things sane, are there any suggestions for how best to handle this conversion to wide-string forms where needed, while maintaining consistency in my own code?
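
    The conversion at the API boundary is mechanical; a quick illustration of the round trip, shown in Python only because the byte-level behavior is language-agnostic (on the C++ side this is what a wrapper around MultiByteToWideChar/WideCharToMultiByte, or boost::filesystem's wpath, would be doing). The path is a placeholder.

        path_utf8 = "C:/data/r\u00e9sum\u00e9/\u30d5\u30a1\u30a4\u30eb.txt".encode("utf-8")   # internal UTF-8 form

        wide = path_utf8.decode("utf-8").encode("utf-16-le")   # what a wide-string Windows API expects
        back = wide.decode("utf-16-le").encode("utf-8")        # convert any output back to UTF-8

        assert back == path_utf8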


  • Need unicode characters in UITableView from SQLite database

    - by Lee Armstrong
    I have some NSString variables that include characters such as Ð and Õ, and if I do

        cell.textLabel.text = person.name;

    and the string contains one of those characters, the cell.textLabel is blank! I have discovered that if I use

        NSString *col1 = [NSString stringWithUTF8String:(char *)sqlite3_column_text(compiledStatement, 0)];

    to pull my data back, it pulls back null; however, using the deprecated method

        NSString *col1 = [NSString stringWithCString:(char *)sqlite3_column_text(compiledStatement, 0)];

    shows the characters! Any ideas?
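
    A hedged guess, illustrated in Python rather than Objective-C: if the bytes stored in the SQLite column are Latin-1 rather than UTF-8, a strict UTF-8 read fails (which is roughly why stringWithUTF8String: can hand back nil), while a legacy 8-bit read appears to work.

        latin1_bytes = "\u00d0\u00d5".encode("latin-1")   # b'\xd0\xd5': the characters as single Latin-1 bytes

        try:
            latin1_bytes.decode("utf-8")            # strict UTF-8: 0xD0 needs a continuation byte
        except UnicodeDecodeError as exc:
            print("not valid UTF-8:", exc)

        print(latin1_bytes.decode("latin-1"))       # lenient 8-bit interpretation works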


  • Perl Unicode glitch

    - by RedGrittyBrick
    In this output, why am I getting extra newlines between lines b&c and d&e?

        a: ....v....1....v... (a)
        b: 'Budějovický Budvar' length 18 (b)

        c: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 (c)
        d: B u d ě j o v i c k ý B u d v a r (d)

        e: 42 75 64 11b 6a 6f 76 69 63 6b fd 20 42 75 64 76 61 72 (e)

    from this program:

        #!perl
        use strict;
        use warnings;

        binmode (STDOUT, "encoding(UTF-8)"); # so no "Wide character in print" warning

        print "\n";
        my $r = "Bud\N{U+011B}jovick\N{U+00FD} Budvar";
        print "a: ....v....1....v... (a)\n";
        print "b: '$r' length ", length($r), " (b)\n";
        print "c:"; printf "%4d", $_ for (1..18); print " (c)\n";
        print "d: "; print join(" ", split("", $r)); print " (d)\n";
        print "e: "; printf "%*v3x", " ", $r; print " (e)\n";


  • File.open with ruby on windows with a unicode filename

    - by aussiegeek
    I have a script running on Ruby 1.9.1 on Windows 7. I've distilled my script down to

        File.open("????.txt")

    and still can't get it to work. I know there are issues with Ruby 1.9's filename handling on Windows (it uses the Windows ANSI library), but I would be happy enough with a workaround that is callable from Ruby.
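
    Since a workaround callable from Ruby would be acceptable, one hedged option is shelling out to a small Python 3 script, which on Windows goes through the wide-character Win32 file APIs and therefore sees such names intact; the directory path below is a placeholder.

        import io
        import os
        import sys

        # Write the listing as UTF-8 regardless of the console code page.
        out = io.TextIOWrapper(sys.stdout.buffer, encoding="utf-8")

        for name in os.listdir(r"C:\some\folder"):   # names arrive as real Unicode strings
            out.write(name + "\n")
        out.flush()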


  • How do I correctly decode unicode parameters passed to a servlet

    - by Grant Wagner
    Suppose I have:

        <a href="http://www.yahoo.com/" target="_yahoo"
           title="Yahoo!&#8482;"
           onclick="return gateway(this);">Yahoo!</a>

        <script type="text/javascript">
        function gateway(lnk) {
            window.open(SERVLET +
                '?external_link=' + encodeURIComponent(lnk.href) +
                '&external_target=' + encodeURIComponent(lnk.target) +
                '&external_title=' + encodeURIComponent(lnk.title));
            return false;
        }
        </script>

    I have confirmed external_title gets encoded as Yahoo!%E2%84%A2 and passed to SERVLET. If in SERVLET I do:

        Writer writer = response.getWriter();
        writer.write(request.getParameter("external_title"));

    I get Yahoo!â„¢ in the browser. If I manually switch the browser character encoding to UTF-8, it changes to Yahoo!™ (which is what I want). So I figured the encoding I was sending to the browser was wrong (it was Content-Type: text/html; charset=ISO-8859-1). I changed SERVLET to:

        response.setContentType("text/html; charset=utf-8");
        Writer writer = response.getWriter();
        writer.write(request.getParameter("external_title"));

    Now the browser character encoding is UTF-8, but it outputs Yahoo!â?¢ and I can't get the browser to render the correct character at all. My question is: is there some combination of Content-Type and/or

        new String(request.getParameter("external_title").getBytes(), "UTF-8");

    and/or something else that will result in Yahoo!™ appearing in the SERVLET output?
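
    The visible garbage is consistent with UTF-8 bytes being read back through a single-byte charset somewhere along the chain; here is the mechanism illustrated in Python (windows-1252 is what browsers actually use when a page claims ISO-8859-1), not the servlet-side fix itself.

        tm = "\u2122"                            # the trademark sign, sent by the browser as %E2%84%A2

        utf8_bytes = tm.encode("utf-8")          # b'\xe2\x84\xa2'
        garbled = utf8_bytes.decode("cp1252")    # 'â„¢': the Yahoo!â„¢ symptom above

        # Undoing the mis-decode recovers the character, which is the same idea as
        # new String(param.getBytes("ISO-8859-1"), "UTF-8") on the servlet side,
        # i.e. re-reading the parameter bytes as UTF-8.
        print(garbled.encode("cp1252").decode("utf-8"))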


  • utf8 and unicode getting warning messages in mysql

    - by BufordTaylor
    I have a MySQL table. When I try to insert, I get this:

        Warning: Incorrect string value: '\xAE</...' for column 'value' at row 1

        mysql> show create table Configurations;
        | Configurations | CREATE TABLE `Configurations` (
            `id` int(11) NOT NULL AUTO_INCREMENT,
            `title` varchar(255) NOT NULL,
            `ckey` varchar(255) NOT NULL,
            `value` mediumtext,
            PRIMARY KEY (`id`),
            KEY `ckey` (`ckey`),
        ) ENGINE=InnoDB AUTO_INCREMENT=29 DEFAULT CHARSET=utf8 |

        mysql> SHOW VARIABLES LIKE 'coll%';
        +----------------------+-----------------+
        | Variable_name        | Value           |
        +----------------------+-----------------+
        | collation_connection | utf8_general_ci |
        | collation_database   | utf8_general_ci |
        | collation_server     | utf8_general_ci |
        +----------------------+-----------------+

    I googled the hell out of the error, and it all seemed to boil down to utf8 being set as my default character set, which it has been for a while. I'm not sure what else to do. Help?
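
    A hedged sketch of the usual two checks when a utf8 column rejects a byte like 0xAE (which is the registered-sign character as a single Windows-1252/Latin-1 byte, not valid UTF-8 on its own): make sure the client connection really talks utf8, and transcode any legacy-encoded input before inserting. Shown with Python's MySQLdb purely for illustration; the connection details are placeholders, the table is the one from the question.

        import MySQLdb

        conn = MySQLdb.connect(host="localhost", user="user", passwd="secret",
                               db="mydb", charset="utf8", use_unicode=True)

        # If the incoming text is really Windows-1252 bytes, transcode it first.
        value = b"\xae some text".decode("cp1252")

        cur = conn.cursor()
        cur.execute("INSERT INTO Configurations (title, ckey, value) VALUES (%s, %s, %s)",
                    ("example title", "example.key", value))
        conn.commit()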


  • php mysql flex unicode

    - by JonoB
    I have a problem with saving the £ symbol to a MySQL database. I am running a Flex front end with a PHP + MySQL backend.

    When I save a record from Flex, the string gets sent to the server as "This amount is £10". PHP sees the string correctly, but when it is saved into the DB it gains a stray character and becomes "This amount is Â£10". My understanding is that this matches the behaviour described in "MySQL or PHP is appending a Â whenever the £ is used".

    I now retrieve the above record and it gets sent to Flex with the stray Â still in it. Flex nevertheless displays it correctly in a textarea as "This amount is £10". I then change another field in the same record in Flex and re-save the transaction; the string that goes back to the server still carries the stray Â, and the version saved into the DB gains another one. Each time the record is re-saved, this effect snowballs.

    Thanks for any advice you can give.
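
    The snowball described above is exactly what repeated "encode as UTF-8, read back as Latin-1/Windows-1252" round trips do to a pound sign; a short Python illustration of the mechanism (the usual cure is making the PHP-to-MySQL connection charset agree with the data, e.g. mysql_set_charset('utf8')).

        s = "\u00a310"
        for cycle in range(2):
            # one "save with mismatched charsets, then read back" round trip
            s = s.encode("utf-8").decode("cp1252")
            print(cycle + 1, s)

        # 1 Â£10
        # 2 Ã‚Â£10   ... each further round trip adds another layer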


  • Weird error using preg_match and unicode

    - by Thorpe Obazee
    This code works:

        if (preg_match('(\p{Nd}{4}/\p{Nd}{2}/\p{Nd}{2}/\p{L}+)', '2010/02/14/this-is-something')) {
            // do stuff
        }

    However, this one doesn't:

        if (preg_match('/\p{Nd}{4}/\p{Nd}{2}/\p{Nd}{2}/\p{L}+/u', '2010/02/14/this-is-something')) {
            // do stuff
        }

    Maybe someone could shed some light as to why the second one doesn't work. This is the error that is produced:

        A PHP Error was encountered
        Severity: Warning
        Message: preg_match() [function.preg-match]: Unknown modifier '\'


  • Apache htdocs in folder with unicode name

    - by Zsolti
    I have my Apache (for Windows) htdocs in a folder like c:\anything1\????\anything2. The problem is that in this case PHP won't execute any scripts from there and will display an error message like this:

        Warning: Unknown: failed to open stream: No such file or directory in Unknown on line 0
        Fatal error: Unknown: Failed opening required 'c:/anything1/????/anything2/index.php' (include_path='.;C:\php5\pear') in Unknown on line 0

    If I try to open an HTML file, it is served by Apache, so it seems that the problem appears only with PHP. Do you have an idea how to solve this?


  • Unicode string turns to garbage on the server side

    - by this. __curious_geek
    I have a situation. I have a label in ASP.NET 2.0 (C#). The label should display the Finnish-language text "Sähköpostiosoite". I tried setting Label.Text both from markup and from code-behind, but what I see in the browser response is "SÃ¤hkÃ¶postiosoite"; the originally assigned string "Sähköpostiosoite" gets replaced with "SÃ¤hkÃ¶postiosoite". I have no idea why this happens; can you please help me diagnose the problem?


  • Flex 3 - Full unicode support fonts and CSS

    - by BS_C3
    Hi! I'm developing a web application that will be used either in Europe or in Asia (especially Japan, with Hiragana, Kanji and Katakana, as well as China and Korea). I'm using the following fonts:

        - ericssonga628.TTF
        - HelveticaNeueLTStd-Lt.otf
        - HelveticaNeueLTStd-LtEx.otf
        - HelveticaNeueLTStd-Bd.otf
        - HelveticaNeueLTStd-BdEx.otf

    When I try to display Japanese characters, I don't get anything. I guess these fonts don't support East Asian characters... Do you know of any equivalent fonts that do?

    Also, I was thinking of creating a CSS stylesheet for each language (or pack of languages) and switching when the user changes the display language. For example, if the user selects Japanese, I'll use the Japanese stylesheet. However, how do I switch from one CSS to another?

    Thanks in advance for your answers. Regards,


  • reading unicode

    - by user121196
    I'm using Java I/O to retrieve text from a server that might output characters such as é, then printing it using System.err; the characters turn out as '?'. I am using UTF-8 encoding. What's wrong?

        int len = 0;
        char[] buffer = new char[1024];
        OutputStream os = sock.getOutputStream();
        InputStream is = sock.getInputStream();

        os.write(query.getBytes("UTF8")); //iso8859_1"));

        Reader reader = new InputStreamReader(is, Charset.forName("UTF-8"));
        do {
            len = reader.read(buffer);
            if (len > 0) {
                if (outstring == null) outstring = new StringBuffer();
                outstring.append(buffer, 0, len);
            }
        } while (len > 0);
        System.err.println(outstring);


  • Unicode escaping in C/C++

    - by Geo
    Hi guys! I'm having a dispute with a colleague of mine. She says that the following:

        char* a = "\x000aaxz";

    will/can be seen by the compiler as "\x000aa". I do not agree with her, as I think you can have a maximum of 4 hex characters after the \x. Can you have more than 4 hex chars? Who is right here?


  • Unicode filenames on windows in ruby

    - by delivarator
    I have a piece of code that looks like this:

        Dir.new(path).each do |entry|
          puts entry
        end

    The problem comes when I have a file named ???????.txt in the directory that I list. On a Windows 7 machine I get the output:

        ???????.txt

    From googling around, properly reading this filename on Windows seems to be an impossible task. Any suggestions?


  • Joomla 1.5 & Indic Unicode Fonts - How-to?

    - by Ganesh
    I am using an Inscript keyboard to type directly into TinyMCE. However, when I click on save, all the characters appear as question marks on the website and even in the article list on the admin side. How should I solve the problem? I am specifically talking about Marathi, but the problem (and solution) might be the same for all Devanagari fonts. Thanks in advance.


  • How to do proper Unicode and ANSI output redirection on cmd.exe?

    - by Sorin Sbarnea
    If you are doing automation on Windows and you are redirecting the output of different commands (internal cmd.exe ones or external tools), you'll discover that your log files contain mixed Unicode and ANSI output (meaning they are invalid and will not load well in viewers/editors). Is it possible to make cmd.exe work with UTF-8? This question is not about display; it is about stdin/stdout/stderr redirection and Unicode. I am looking for a solution that would allow you to: (1) redirect the output of internal commands to a file using UTF-8, and (2) redirect the output of external commands that support Unicode to files, also encoded as UTF-8. If it is impossible to obtain this kind of consistency using batch files alone, is there another way of solving the problem, like using Python scripting for it? In that case, I would like to know if it is possible to do the Unicode detection alone (the user of the script should not have to remember whether the called tools output Unicode or not; the script should just convert the output to UTF-8). For simplicity, we'll assume that if a tool's output is not Unicode it can be treated as UTF-8 already (no code-page conversion).
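
    Along the lines of the Python idea above, a minimal sketch of the normalization step: capture the command's output as bytes, treat output that starts with a UTF-16 BOM or contains NUL bytes as Unicode, and otherwise assume UTF-8 as the question proposes. The command (cmd /U makes internal commands write Unicode to pipes/files) and the log file name are placeholders.

        import subprocess

        def run_and_log_utf8(cmd, log_path):
            raw = subprocess.check_output(cmd)                 # captured stdout, as bytes
            if raw[:2] in (b"\xff\xfe", b"\xfe\xff"):          # explicit UTF-16 BOM
                text = raw.decode("utf-16")
            elif b"\x00" in raw[:200]:                         # NUL bytes: almost certainly UTF-16-LE
                text = raw.decode("utf-16-le", errors="replace")
            else:                                              # per the question's assumption
                text = raw.decode("utf-8", errors="replace")
            with open(log_path, "a", encoding="utf-8") as log:
                log.write(text)

        run_and_log_utf8(["cmd", "/u", "/c", "dir"], "build.log")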


  • Delphi 2009 dbExpress and Interbase: Unicode migration steps and risks?

    - by mjustin
    Currently, our database uses Win1252 as its only character encoding. We will have to support Unicode in the database tables soon, which means we have to perform this migration for four databases and around 80 Delphi applications which run in-house in a 24/7 environment. Are there recommendations for database migrations to UTF-8 (or UNICODE_FSS) for Delphi applications? Some questions are listed below; many thanks in advance for your answers!

    - Are there tools which help with the migration of the existing databases (sizes between 250 MB and 2 GB, no BLOB fields), by dumping the data, recreating the database with UNICODE_FSS or UTF-8, and loading the data back?
    - Are there known problems with Delphi 2009, dbExpress and InterBase 7.5 related to Unicode character sets?
    - Would you recommend upgrading the databases to InterBase 2009 first? (This upgrade is planned but does not have a high priority.)
    - Can we simply migrate the database and Delphi will handle the Unicode character sets automatically, or will we have to change all character field types in every data module (dfm and source code) too?
    - Which strategy would you recommend for working on the migration in parallel with the normal development and maintenance of the existing application? The application runs in-house, so development and database administration are done internally.

    Update: one problem I found now is that there are two different persistent field types for Unicode and non-Unicode character fields. For the existing database, dbExpress creates TStringField objects; for the Unicode database fields, dbExpress creates (or expects!) TWideStringField objects. This looks like a lot of work lies ahead. While we could try to avoid persistent fields (and add calculated fields at run time), we would of course prefer a solution which does not require so many changes in existing units and DFM files.


  • Where can I find a useful multi-language Unicode font for Mac OS X?

    - by Stephen Jennings
    On every browser I've tried (Firefox, Safari, Chrome, and Omniweb), when I go to a web page containing somewhat less-common characters, I can't see the glyphs. For example, on the Wikipedia page for the Bengali Language, the very first line contains a string of squares; on Windows, I can see the Bengali writing. Firefox does display code points on the Coptic Language article, but not Bengali. I'm not sure why. On Windows, as long as I have the Arial Unicode MS font installed, these characters fall back to that font and display properly. Mac OS X doesn't seem to ship with a font containing these Unicode characters (it has Arial Unicode MS, but it must be a subset of the Windows version because Bengali doesn't display in that font). I checked on my Snow Leopard DVD and I installed "Additional Fonts" from the Optional Installs package, but I'm still missing many languages. Is there any good, free font that contains a large collection of languages? I know creating fonts is difficult and time-consuming, but it seems like including at least one font like this with operating systems should be standard by now.

