Search Results

Search found 36172 results on 1447 pages for 'unicode string'.

Page 10/1447 | < Previous Page | 6 7 8 9 10 11 12 13 14 15 16 17  | Next Page >

  • Python - pyparsing unicode characters

    - by mgj
    Hi..:) I tried using w = Word(printables), but it isn't working. How should I give the spec for this. 'w' is meant to process Hindi characters (UTF-8) The code specifies the grammar and parses accordingly. 671.assess :: ????? ::2 x=number + "." + src + "::" + w + "::" + number + "." + number If there is only english characters it is working so the code is correct for the ascii format but the code is not working for the unicode format. I mean that the code works when we have something of the form 671.assess :: ahsaas ::2 i.e. it parses words in the english format, but I am not sure how to parse and then print characters in the unicode format. I need this for English Hindi word alignment for purpose. The python code looks like this: # -*- coding: utf-8 -*- from pyparsing import Literal, Word, Optional, nums, alphas, ZeroOrMore, printables , Group , alphas8bit , # grammar src = Word(printables) trans = Word(printables) number = Word(nums) x=number + "." + src + "::" + trans + "::" + number + "." + number #parsing for eng-dict efiledata = open('b1aop_or_not_word.txt').read() eresults = x.parseString(efiledata) edict1 = {} edict2 = {} counter=0 xx=list() for result in eresults: trans=""#translation string ew=""#english word xx=result[0] ew=xx[2] trans=xx[4] edict1 = { ew:trans } edict2.update(edict1) print len(edict2) #no of entries in the english dictionary print "edict2 has been created" print "english dictionary" , edict2 #parsing for hin-dict hfiledata = open('b1aop_or_not_word.txt').read() hresults = x.scanString(hfiledata) hdict1 = {} hdict2 = {} counter=0 for result in hresults: trans=""#translation string hw=""#hin word xx=result[0] hw=xx[2] trans=xx[4] #print trans hdict1 = { trans:hw } hdict2.update(hdict1) print len(hdict2) #no of entries in the hindi dictionary print"hdict2 has been created" print "hindi dictionary" , hdict2 ''' ####################################################################################################################### def translate(d, ow, hinlist): if ow in d.keys():#ow=old word d=dict print ow , "exists in the dictionary keys" transes = d[ow] transes = transes.split() print "possible transes for" , ow , " = ", transes for word in transes: if word in hinlist: print "trans for" , ow , " = ", word return word return None else: print ow , "absent" return None f = open('bidir','w') #lines = ["'\ #5# 10 # and better performance in business in turn benefits consumers . # 0 0 0 0 0 0 0 0 0 0 \ #5# 11 # vHyaapaar mEmn bEhtr kaam upbhOkHtaaomn kE lIe laabhpHrdd hOtaa hAI . # 0 0 0 0 0 0 0 0 0 0 0 \ #'"] data=open('bi_full_2','rb').read() lines = data.split('!@#$%') loc=0 for line in lines: eng, hin = [subline.split(' # ') for subline in line.strip('\n').split('\n')] for transdict, source, dest in [(edict2, eng, hin), (hdict2, hin, eng)]: sourcethings = source[2].split() for word in source[1].split(): tl = dest[1].split() otherword = translate(transdict, word, tl) loc = source[1].split().index(word) if otherword is not None: otherword = otherword.strip() print word, ' <-> ', otherword, 'meaning=good' if otherword in dest[1].split(): print word, ' <-> ', otherword, 'trans=good' sourcethings[loc] = str( dest[1].split().index(otherword) + 1) source[2] = ' '.join(sourcethings) eng = ' # '.join(eng) hin = ' # '.join(hin) f.write(eng+'\n'+hin+'\n\n\n') f.close() ''' if an example input sentence for the source file is: 1# 5 # modern markets : confident consumers # 0 0 0 0 0 1# 6 # AddhUnIk baajaar : AshHvsHt upbhOkHtaa . # 0 0 0 0 0 0 !@#$% the ouptut would look like this :- 1# 5 # modern markets : confident consumers # 1 2 3 4 5 1# 6 # AddhUnIk baajaar : AshHvsHt upbhOkHtaa . # 1 2 3 4 5 0 !@#$% Output Explanation:- This achieves bidirectional alignment. It means the first word of english 'modern' maps to the first word of hindi 'AddhUnIk' and vice versa. Here even characters are take as words as they also are an integral part of bidirectional mapping. Thus if you observe the hindi WORD '.' has a null alignment and it maps to nothing with respect to the English sentence as it doesn't have a full stop. The 3rd line int the output basically represents a delimiter when we are working for a number of sentences for which your trying to achieve bidirectional mapping. What modification should i make for it to work if the I have the hindi sentences in Unicode(UTF-8) format.

    Read the article

  • Recursive String Function (Java)

    - by Jake Brooks
    Hi, I am trying to design a function that essentially does as follows: String s = "BLAH"; store the following to an array: blah lah bah blh bla bl ba bh ah al So basically what I did there was subtract each letter from it one at a time. Then subtract a combination of two letters at a time, until there's 2 characters remaining. Store each of these generations in an array. Hopefully this makes sense, Jake

    Read the article

  • Problems removing a " from a string

    - by Graeme
    Hi, I have a string that ends with a " (quotation mark) that I want to get rid of. However, because XCode usually requires you to enter the text you wish to remove using stringByReplacingOccurrencesOfString in @"texttoremove" format, you can't use the quotation marks in the space as it thinks you are closing the text. Any ideas on how I can do it? Thanks.

    Read the article

  • Filter string in C

    - by Paul Tarjan
    How can I filter a string in c? I want to remove anything that isn't [a-z0-9_]. int main(int argc, char ** argv) { char* name = argv[1]; // remove anything that isn't [a-z0-9_] printf("%s", name); }

    Read the article

  • Best way to code this, string to map conversion in Groovy

    - by Daxon
    I have a string like def data = "session=234567893egshdjchasd&userId=12345673456&timeOut=1800000" I want to convert it to a map ["session", 234567893egshdjchasd] ["userId", 12345673456] ["timeout", 1800000] This is the current way I am doing it, def map = [:] data.splitEachLine("&"){ it.each{ x -> def object = x.split("=") map.put(object[0], object[1]) } } It works, but is there a more efficient way?

    Read the article

  • String Parsing in C#

    - by Betamoo
    What is the most efficient way to parse a C# string in the form of "(params (abc 1.3)(sdc 2.0)....)" into a struct in the form struct Params { double abc,sdc....; } Thanks EDIT The structure always have the same parameters (number and names).. but the order is not granted..

    Read the article

  • python - remove string from words in an array

    - by tekknolagi
    #!/usr/bin/python #this looks for words in dictionary that begin with 'in' and the suffix is a real word wordlist = [line.strip() for line in open('/usr/share/dict/words')] newlist = [] for word in wordlist: if word.startswith("in"): newlist.append(word) for word in newlist: word = word.split('in') print newlist how would I get the program to remove the string "in" from all the words that it starts with? right now it does not work

    Read the article

  • finding and returning a string with a specified prefix

    - by tipu
    I am close but I am not sure what to do with the restuling match object. If I do p = re.search('[/@.* /]', str) I'll get any words that start with @ and end up with a space. This is what I want. However this returns a Match object that I dont' know what to do with. What's the most computationally efficient way of finding and returning a string which is prefixed with a @? For example, "Hi there @guy" After doing the proper calculations, I would be returned guy

    Read the article

  • Split a long JSON string into lines in Ruby

    - by David J.
    First, the background: I'm writing a Ruby app that uses SendGrid to send mass emails. SendGrid uses a custom email header (in JSON format) to set recipients, values to substitute, etc. SendGrid's documentation recommends splitting up the header so that the lines are shorter than 1,000 bytes. My question, then, is this: given a long JSON string, how can I split it into lines < 1,000 so that lines are split at appropriate places (i.e., after a comma) rather than in the middle of a word? This is probably unnecessary, but here's an example of the sort of string I'd like to split: X-SMTPAPI: {"sub": {"pet": ["dog", "cat"]}, "to": ["[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]", "[email protected]"]} Thanks in advance for any help you can provide!

    Read the article

  • String.IsNullOrWhiteSpace

    - by Scott Dorman
    An empty string is different than an unassigned string variable (which is null), and is a string containing no characters between the quotes (""). The .NET Framework provides String.Empty to represent an empty string, and there is no practical difference between ("") and String.Empty. One of the most common string comparisons to perform is to determine if a string variable is equal to an empty string. The fastest and simplest way to determine if a string is empty is to test if the Length property is equal to 0. However, since strings are reference types it is possible for a string variable to be null, which would result in a runtime error when you tried to access the Length property. Since testing to determine if a string is empty is such a common occurrence, the .NET Framework provides the static method String.IsNullOrEmpty method: public static bool IsNullOrEmpty(string value) { if (value != null) { return (value.Length == 0); }   return true; } It is also very common to determine if a string is empty and contains more than just whitespace characters. For example, String.IsNullOrEmpty("   ") would return false, since this string is actually made up of three whitespace characters. In some cases, this may be acceptable, but in many others it is not. TO help simplify testing this scenario, the .NET Framework 4 introduces the String.IsNullOrWhiteSpace method: public static bool IsNullOrWhiteSpace(string value) { if (value != null) { for (int i = 0; i < value.Length; i++) { if (!char.IsWhiteSpace(value[i])) { return false; } } } return true; }   Using either String.IsNullOrEmpty or String.IsNullOrWhiteSpace helps ensure correctness, readability, and consistency, so they should be used in all situations where you need to determine if a string is null, empty, or contains only whitespace characters. Technorati Tags: .NET,C# 4

    Read the article

  • problem using getline with a unicode file

    - by hamishmcn
    UPDATE: Thank you to @Potatoswatter and @Jonathan Leffler for comments - rather embarrassingly I was caught out by the debugger tool tip not showing the value of a wstring correctly - however it still isn't quite working for me and I have updated the question below: If I have a small multibyte file I want to read into a string I use the following trick - I use getline with a delimeter of '\0' e.g. std::string contents_utf8; std::ifstream inf1("utf8.txt"); getline(inf1, contents_utf8, '\0'); This reads in the entire file including newlines. However if I try to do the same thing with a wide character file it doesn't work - my wstring only reads to the the first line. std::wstring contents_wide; std::wifstream inf2(L"ucs2-be.txt"); getline( inf2, contents_wide, wchar_t(0) ); //doesn't work For example my if unicode file contains the chars A and B seperated by CRLF, the hex looks like this: FE FF 00 41 00 0D 00 0A 00 42 Based on the fact that with a multibyte file getline with '\0' reads the entire file I believed that getline( inf2, contents_wide, wchar_t(0) ) should read in the entire unicode file. However it doesn't - with the example above my wide string would contain the following two wchar_ts: FF FF (If I remove the wchar_t(0) it reads in the first line as expected (ie FE FF 00 41 00 0D 00) Why doesn't wchar_t(0) work as a delimiting wchar_t of "00 00"? Thank you

    Read the article

  • Allowed unicode characters in IDN host labels

    - by Roland Franssen
    Hi all, Im currently working on a "proper" URI validator and currently it all comes down to hostname validation, the rest isnt that tricky. Im stuck at IDN hostname labels (e.g. containing unicode; possible punycode encoded strings have been decoded at this point). My first idea was basicly a regex for TLD's not supporting IDN and one for those who do (http://www.mozilla.org/projects/security/tld-idn-policy-list.html (?)). Respectively; ^[a-zA-Z0-9-]+$ and ^[a-zA-Z0-9-\p{L}]+$ However this is not an ideal situation, since every IDN registrar can decide which characters to allow and which not. What im looking for is a proper, consistent, up2date data table of unicode characters allowed in various TLD's; im getting this idea i have to find all the data myself at russian and chinese registry sites (which is quite difficult). So before spitting down the web.. i wondered is there such a list? Or are there better approaches, best/common practices etc? (I want the validation to be as strict as possible.) Any help is welcome! // Roland

    Read the article

  • iPhone app rejection for using ICU (Unicode extensions)

    - by nickbit
    I received the following mail form Apple, considering my application: *Thank you for submitting your update to ??µ??es?a to the App Store. During our review of your application we found it is using private APIs, which is in violation of the iPhone Developer Program License Agreement section 3.3.1; "3.3.1 Applications may only use Documented APIs in the manner prescribed by Apple and must not use or call any private APIs." While your application has not been rejected, it would be appropriate to resolve this issue in your next update. The following non-public APIs are included in your application: u_isspace ubrk_close ubrk_current ubrk_first ubrk_next ubrk_open If you have defined methods in your source code with the same names as the above mentioned APIs, we suggest altering your method names so that they no longer collide with Apple's private APIs to avoid your application being flagged with future submissions. Please resolve this issue in your next update to ??µ??es?a. Sincerely, iPhone App Review Team* The functions mentioned in this mail are used in the ICU library (International Components for Unicode). Although my app is not rejected at this point, I don't feel very secure for the future of my app, because it relies heavily on the Unicode protocol and on this components in particular. Another thing is that I do not call these functions directly, but they are called by a custom 'sqlite' build (with FTS3 extensions enabled). Am I missing something here? Any suggestions?

    Read the article

  • latin1/unicode conversion problem with ajax request and special characters

    - by mfn
    Server is PHP5 and HTML charset is latin1 (iso-8859-1). With regular form POST requests, there's no problem with "special" characters like the em dash (–) for example. Although I don't know for sure, it works. Probably because there exists a representable character for the browser at char code 150 (which is what I see in PHP on the server for a literal em dash with ord). Now our application also provides some kind of preview mechanism via ajax: the text is sent to the server and a complete HTML for a preview is sent back. However, the ordinary char code 150 em dash character when sent via ajax (tested with GET and POST) mutates into something more: %E2%80%93. I see this already in the apache log. According to various sources I found, e.g. http://www.tachyonsoft.com/uc0020.htm , this is the UTF8 byte representation of em dash and my current knowledge is that JavaScript handles everything in Unicode. However within my app, I need everything in latin1. Simply said: just like a regular POST request would have given me that em dash as char code 150, I would need that for the translated UTF8 representation too. That's were I'm failing, because with PHP on the server when I try to decode it with either utf8_decode(...) or iconv('UTF-8', 'iso-8859-1', ...) but in both cases I get a regular ? representing this character (and iconv also throws me a notice: Detected an illegal character in input string ). My goal is to find an automated solution, but maybe I'm trying to be überclever in this case? I've found other people simply doing manual replacing with a predefined input/output set; but that would always give me the feeling I could loose characters. The observant reader will note that I'm behind on understanding the full impact/complexity with things about Unicode and conversion of chars and I definitely prefer to understand the thing as a whole then a simply manual mapping. thanks

    Read the article

  • Reading Unicode files line by line C++

    - by Roger Nelson
    What is the correct way to read Unicode files line by line in C++? I am trying to read a file saved as Unicode (LE) by Windows Notepad. Suppose the file contains simply the characters A and B on separate lines. In reading the file byte by byte, I see the following byte sequence (hex) : FE FF 41 00 0D 00 0A 00 42 00 0D 00 0A 00 So 2 byte BOM, 2 byte 'A', 2byte CR , 2byte LF, 2 byte 'B', 2 byte CR, 2 byte LF . I tried reading the text file using the following code: std::wifstream file("test.txt"); file.seekg(2); // skip BOM std::wstring A_line; std::wstring B_line; getline(file,A_line); // I get "A" getline(file,B_line); // I get "\0B" I get the same results using operator instead of getline file >> A_line; file >> B_line; It appears that the single byte CR character is is being consumed only as the single byte. or CR NULL LF is being consumed but not the high byte NULL. I would expect wifstream in text mode would read the 2byte CR and 2byte LF. What am I doing wrong? It does not seem right that one should have to read a text file byte by byte in binary mode just to parse the new lines.

    Read the article

  • Using JavaMail to send a mail containing Unicode characters

    - by NoozNooz42
    I'm successfully sending emails through GMail's SMTP servers using the following piece of code: Properties props = new Properties(); props.put("mail.smtp.host", "smtp.gmail.com"); props.put("mail.smtp.socketFactory.port", "465"); props.put("mail.smtp.socketFactory.class","javax.net.ssl.SSLSocketFactory"); props.put("mail.smtp.auth", "true"); props.put("mail.smtp.port", "465"); props.put("mail.smtp.ssl", "true"); props.put("mail.smtp.starttls.enable","true"); props.put("mail.smtp.timeout", "5000"); props.put("mail.smtp.connectiontimeout", "5000"); // Do NOT use Session.getDefaultInstance but Session.getInstance // See: http://forums.sun.com/thread.jspa?threadID=5301696 final Session session = Session.getInstance( props, new javax.mail.Authenticator() { protected PasswordAuthentication getPasswordAuthentication() { return new PasswordAuthentication( USER, PWD ); } }); try { final Message message = new MimeMessage(session); message.setFrom( new InternetAddress( USER ) ); message.setRecipients( Message.RecipientType.TO, InternetAddress.parse( TO ) ); message.setSubject( emailSubject ); message.setText( emailContent ); Transport.send(message); emailSent = true; } catch ( final MessagingException e ) { e.printStackTrace(); } where emailContent is a String that does contain Unicode characters (like the euro symbol). When the email arrives (in another GMail account), the euro symbol has been converted to the ASCII '?' question mark. I don't know much about emails: can email use any character encoding? What should I modify in the code above so that an encoding allowing Unicode characters is used?

    Read the article

  • Cross-platform iteration of Unicode string

    - by kizzx2
    I want to iterate each character of a Unicode string, treating each surrogate pair and combining character sequence as a single unit (one grapheme). Example The text "??????" is comprised of the code points: U+0928, U+092E, U+0938, U+094D, U+0924, U+0947, of which, U+0938 and U+0947 are combining marks. static void Main(string[] args) { const string s = "??????"; Console.WriteLine(s.Length); // Ouptuts "6" var l = 0; var e = System.Globalization.StringInfo.GetTextElementEnumerator(s); while(e.MoveNext()) l++; Console.WriteLine(l); // Outputs "4" } So there we have it in .NET. We also have Win32's CharNextW() #include <Windows.h> #include <iostream> #include <string> int main() { const wchar_t * s = L"??????"; std::cout << std::wstring(s).length() << std::endl; // Gives "6" int l = 0; while(CharNextW(s) != s) { s = CharNextW(s); ++l; } std::cout << l << std::endl; // Gives "4" return 0; } Question Both ways I know of are specific to Microsoft. Are there portable ways to do it? I heard about ICU but I couldn't find something related quickly (UnicodeString(s).length() still gives 6). Would be an acceptable answer to point to the related function/module in ICU. C++ doesn't have a notion of Unicode, so a lightweight cross-platform library for dealing with these issues would make an acceptable answer.

    Read the article

  • String replacement problem.

    - by fastcodejava
    I want to provide some template for a code generator I am developing. A typical pattern for class is : public ${class_type} ${class_name} extends ${super_class} implements ${interfaces} { ${class_body} } Problem is if super_class is blank or interfaces. I replace extends ${super_class} with empty string. But I get extra spaces. So a class with no super_class and interfaces end up like : public class Foo { //see the extra spaces before {? ${class_body} } I know I can replace multiple spaces with single, but is there any better approach?

    Read the article

  • Transforming a string to a valid PDO_MYSQL DSN

    - by Alix Axel
    What is the most concise way to transform a string in the following format: mysql:[/[/]][user[:pass]@]host[:port]/db[/] Into a usuable PDO connection/instance (using the PDO_MYSQL DSN), some possible examples: $conn = new PDO('mysql:host=host;dbname=db'); $conn = new PDO('mysql:host=host;port=3307;dbname=db'); $conn = new PDO('mysql:host=host;port=3307;dbname=db', 'user'); $conn = new PDO('mysql:host=host;port=3307;dbname=db', 'user', 'pass'); I've been trying some regular expressions (preg_[match|split|replace]) but they either don't work or are too complex, my gut tells me this is not the way to go but nothing else comes to my mind. Any suggestions?

    Read the article

  • PHP String tokenizer not working correctly

    - by asdadas
    I have no clue why strtok decided to break on me. Here is my code. I am tokenizing a string by dollar symbol $. echo 'Tokenizing this by $: ',$aliases,PHP_EOL; if(strlen($aliases) > 0) { //aliases check $token = strtok($aliases, '$'); while($token != NULL) { echo 'Found a token: ',$token,PHP_EOL; if(!isGoodLookup($token)) { echo 'ERROR: Invalid alias found.',PHP_EOL; stop($db); } $goodAliasesList[] = $token; $token = strtok('$'); } if($token == NULL) echo 'Found null token, moving on',PHP_EOL; } And this is my output: Tokenizing this by $: getaways$aaa Found a token: getaways Found null token, moving on str tok is not supposed to do this!! where is my aaa token!!

    Read the article

< Previous Page | 6 7 8 9 10 11 12 13 14 15 16 17  | Next Page >