Search Results

Search found 36172 results on 1447 pages for 'unicode string'.

Page 6/1447

Python: Removing particular character (u"\u2610") from string

- by duhaime

I have been wrestling with decoding and encoding in Python, and I can't quite figure out how to resolve my problem. I am looping over XML text files (sample) that are apparently encoded in UTF-8, using Beautiful Soup to parse each file, then checking whether any sentence in the file contains one or more words from two different lists of words. Because the XML files are from the eighteenth century, I need to retain the em dashes that are in the XML. The code below does this just fine, but it also retains a pesky box character that I wish to remove. I believe the box character is U+2610. (You can find an example of the character I wish to remove in line 3682 of the sample file above. On this webpage, the character looks like an 'or' pipe, but when I read the XML file in Komodo, it looks like a box. When I try to copy and paste the box into a search engine, it looks like an 'or' pipe. When I print to the console, though, the character looks like an empty box.)

To sum up, the code below runs without errors, but it prints the empty box character that I would like to remove.

    for work in glob.glob(pathtofiles):
        openfile = open(work)
        readfile = openfile.read()
        stringfile = str(readfile)
        decodefile = stringfile.decode('utf-8', 'strict')  # is this the dodgy line?
        soup = BeautifulSoup(decodefile)
        textwithtags = soup.findAll('text')
        textwithtagsasstring = str(textwithtags)
        # this method strips everything between angle brackets as it should
        textwithouttags = stripTags(textwithtagsasstring)
        # clean text
        nonewlines = textwithouttags.replace("\n", " ")
        noextrawhitespace = re.sub(' +', ' ', nonewlines)
        print noextrawhitespace  # the boxes appear

I tried to remove the boxes by using

    noboxes = noextrawhitespace.replace(u"\u2610", "")

but Python threw an error:

    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 280: ordinal not in range(128)

Does anyone know how I can remove the boxes from the XML files? I would be grateful for any help others can offer.
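
In Python 2, calling .replace(u"\u2610", "") on a byte string makes Python implicitly decode that byte string as ASCII, and byte 0xe2 (the first UTF-8 byte of both the em dash and U+2610) is not ASCII, hence the error. A minimal Python 3 sketch of the usual fix, assuming the box really is U+2610 as the poster believes: decode the bytes once, keep everything as text afterwards, and delete the character with str.translate. The names clean_text, BOX_CHARS and DELETE_BOXES are illustrative, not from the post.

```python
import re

# Assumption: the "box" is U+2610; add further code points here if needed.
BOX_CHARS = "\u2610"
# Translation table mapping each unwanted code point to None (i.e. delete it).
DELETE_BOXES = {ord(ch): None for ch in BOX_CHARS}


def clean_text(raw: bytes) -> str:
    """Decode UTF-8 bytes once, then clean the text without touching em dashes."""
    text = raw.decode("utf-8")           # bytes -> str at the input boundary
    text = text.replace("\n", " ")       # newlines become spaces
    text = re.sub(" +", " ", text)       # collapse runs of spaces
    return text.translate(DELETE_BOXES)  # drop the box character(s)


if __name__ == "__main__":
    sample = "a\u2610 \u2014 b\n c".encode("utf-8")
    print(clean_text(sample))  # the em dash survives, the box does not
```

In Python 2 the same idea means keeping every intermediate value a unicode object instead of converting back to byte strings with str().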

Converting a string into ArrayCollection (C#, see string below)

- by Ole Jak

How can I get, from a string such as

    name1{value1,value2};name2{value3}; ... nameN{value12, valueN}

an array of arrays in this form:

    Array = {string, int};{string, int};{string, int};

like this:

    { { name1 ; value1} { name1 ; value2} { name2 ; value3} ... { nameN ; valueN} }

in C# (.NET)?
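
The parsing can be sketched in Python; the same regular expression is also valid for .NET's Regex class. Each name{...} group is captured, then the contents of the braces are split on commas. parse_groups is a hypothetical helper, and the values are kept as strings because the sample values (value1, value12, ...) are not integers; converting them with int() is left to the caller.

```python
import re

# One match per "name{...}" group: group 1 is the name, group 2 the brace contents.
GROUP_RE = re.compile(r"(\w+)\{([^}]*)\}")


def parse_groups(s: str) -> list[tuple[str, str]]:
    """Flatten 'name1{a,b};name2{c}' into [(name1, a), (name1, b), (name2, c)]."""
    pairs = []
    for name, body in GROUP_RE.findall(s):
        for value in body.split(","):
            value = value.strip()   # tolerate spaces such as "{value12, valueN}"
            if value:               # skip empty entries like "{a,,b}"
                pairs.append((name, value))
    return pairs


print(parse_groups("name1{value1,value2};name2{value3}; nameN{value12, valueN}"))
```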

How can we remove a specific substring from a string?

- by Harikrishna

I will have strings of different types (they will not have a fixed format and will be different every time), and from them I want to remove a specific substring. For example, the string can be any of:

    FUTIDX 26FEB2009 NIFTY 0
    FUTSTK ONGC 27 Mar 2008
    FUTIDX MINIFTY 30 Jul 2009
    FUTIDX NIFTY 27 Aug 2009
    NIFTY FUT XP: 29/05/2008

Now I want to remove the part of the string that starts with FUT. How can I do that?
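
A minimal Python sketch, assuming "the string which starts with FUT" means whole whitespace-separated words such as FUTIDX, FUTSTK or FUT; remove_fut_words is a hypothetical name. It drops those words and keeps the rest in order.

```python
def remove_fut_words(s: str) -> str:
    """Drop every whitespace-separated word that starts with 'FUT'."""
    # Assumption: only whole words beginning with FUT are unwanted.
    return " ".join(word for word in s.split() if not word.startswith("FUT"))


print(remove_fut_words("FUTIDX 26FEB2009 NIFTY 0"))   # 26FEB2009 NIFTY 0
print(remove_fut_words("NIFTY FUT XP: 29/05/2008"))   # NIFTY XP: 29/05/2008
```

Splitting and re-joining also collapses any runs of whitespace into single spaces, which seems harmless for this kind of data.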

string.format vs + for string concatenation

- by AMissico

Which is better with respect to performance and memory utilization?

    // + operator
    oMessage.Subject = "Agreement, # " + sNumber + ", Name: " + sName;

    // String.Format
    oMessage.Subject = string.Format("Agreement, # {0}, Name: {1}", sNumber, sName);

My priority is memory utilization. The + operator is used throughout the application; String.Format and StringBuilder are rarely used. I want to reduce the amount of memory fragmentation caused by excessive string allocations.

PHP explode with a Unicode character as separator

- by Young Roger

Xpdf's pdftotext converts a PDF to text and outputs it at the command line. If needed, it inserts page breaks between the pages, as specified in TextOutputDev.cc:

    eopLen = uMap->mapUnicode(0x0c, eop, sizeof(eop));

This character is encoding-independent; -enc ASCII7 wouldn't change it. I'd like to use PHP to convert the PDF file and split it into several TXT pages for database storage. The following loop does work, but it takes twice as long as converting the whole book in one go:

    for ($i = 1; $i <= $pages[0]; $i++)
        $page[$i] = shell_exec('/usr/bin/pdftotext sample.pdf -f '.$i.' -l '.$i.' -');

How am I supposed to explode(0x0c, $wholePDF) with a Unicode character as the separator? Currently, $page[$i] doesn't seem to retrieve those odd Unicode page-break characters from shell_exec(). I tried several encoding headers (UTF-8 especially), but nothing has worked so far.
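
0x0c is the ASCII form feed, a single byte that is the same in ASCII, Latin-1 and UTF-8, so no special Unicode handling should be needed. Convert the whole PDF once and split the output on that byte. In PHP, explode(chr(12), $wholePDF) expresses this; passing the bare integer 0x0c would be treated as the string "12". Below is a Python sketch of the same idea. split_pages and pdf_pages are hypothetical helpers, and pdf_pages assumes pdftotext is on the PATH.

```python
import subprocess


def split_pages(text: str) -> list[str]:
    """Split pdftotext output on form feeds (0x0c), one string per page."""
    pages = text.split("\f")
    # If the output ends with a form feed, the final element is empty; drop it.
    if pages and pages[-1] == "":
        pages.pop()
    return pages


def pdf_pages(path: str) -> list[str]:
    """Run pdftotext once on the whole file ("-" = write to stdout) and split it."""
    result = subprocess.run(
        ["pdftotext", "-enc", "UTF-8", path, "-"],
        capture_output=True,
        check=True,
    )
    return split_pages(result.stdout.decode("utf-8"))
```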

CDOSYS and Unicode in the From field - VBScript

- by Simmo

I've got the code below, and I'm trying to set the From field to allow Unicode. Currently my email client shows "??". The subject line and the body content show the Unicode correctly. According to MSDN, the property should be "urn:schemas:httpmail:from". Has anyone solved this issue? Thanks, M

    Dim AC_EMAIL : AC_EMAIL = "[email protected]"
    Dim AC_EMAIL_FROM : AC_EMAIL_FROM = "?? <[email protected]>"
    Dim strSubject : strSubject = """??"" testing testing"
    Set oMessage = WScript.CreateObject("CDO.Message")
    With oMessage
        .BodyPart.charset = "utf-8" 'unicode-1-1-utf-8
        .Fields("urn:schemas:httpmail:from") = AC_EMAIL_FROM
        .Fields("urn:schemas:httpmail:to") = AC_EMAIL
        .Fields("urn:schemas:httpmail:subject") = strSubject
        .Fields.Update
        .Send
    End With
    Set oMessage = Nothing
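
Mail headers are plain ASCII, so a non-ASCII display name in a From header is normally transmitted as an RFC 2047 encoded-word of the form =?utf-8?b?...?=. If CDOSYS is not producing one for the from field, one workaround to try is to build the encoded-word yourself and assign it to the field. Python's standard library does this in email.utils.formataddr. The sketch below shows what such a header value looks like; encoded_from is a hypothetical helper and the name and address are placeholders.

```python
from email.header import decode_header
from email.utils import formataddr


def encoded_from(display_name: str, address: str) -> str:
    """Return a From value whose display name is RFC 2047-encoded if non-ASCII."""
    return formataddr((display_name, address), charset="utf-8")


value = encoded_from("\u65e5\u672c", "sender@example.com")
print(value)  # something like =?utf-8?b?...?= <sender@example.com>

# Round trip: decoding the encoded-word gives back the original name.
encoded_name = value.rsplit(" <", 1)[0]
raw, charset = decode_header(encoded_name)[0]
print(raw.decode(charset))
```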

WordPress is ignoring Unicode characters in URLs

- by Ankur Gupta

Hi, I am using WordPress with this permalink structure: /%year%/%monthnum%/%postname%/. If I use a URL like example.com/2010/03/?????, WordPress treats it as example.com/2010/03/ (ignoring the Unicode characters) and displays the March 2010 archive list. If I use an English URL such as example.com/2010/03/technology, it works perfectly. The problem also occurs on tag pages: for example, example.com/tag/??????? is treated as example.com/tag/ and displays a 404 page. Why is WordPress ignoring Unicode characters? If I use the default query-string structure, it works perfectly, even with Unicode characters. Server info: IIS7, Windows Server 2008 (URL rewriting enabled), WordPress 2.9.2.

Double-Escaped Unicode Javascript Issue

- by Jeffrey Winter

I am having a problem displaying a JavaScript string with embedded Unicode escape sequences (\uXXXX) where the initial "\" character is itself escaped as "\\". What do I need to do to transform the string so that it properly evaluates the escape sequences and produces output with the correct Unicode characters? For example, I am dealing with input such as:

    "this is a \\u201ctest\\u201d";

Attempting to decode the "\\" using a regex, e.g.:

    var out = text.replace('/\/g','\');

results in the output text:

    "this is a \u201ctest\u201d";

that is, the Unicode escape sequences are displayed as actual escape sequences, not the double-quote characters I would like.
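
The usual approach is to match each literal \uXXXX sequence with a regular expression and replace it with the character whose code is the four hex digits. Note that the pattern in the question is written inside quotes, so replace() treats it as literal text rather than as a regex. In JavaScript this would be something like text.replace(/\\u([0-9a-fA-F]{4})/g, (m, hex) => String.fromCharCode(parseInt(hex, 16))). The same logic as a Python sketch; decode_u_escapes is a hypothetical name.

```python
import re

# A literal backslash, a 'u', then exactly four hex digits.
U_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def decode_u_escapes(text: str) -> str:
    """Turn literal \\uXXXX sequences into the characters they name."""
    return U_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


raw = r"this is a \u201ctest\u201d"  # backslashes are literal characters here
print(decode_u_escapes(raw))         # this is a “test”
```

Escapes above U+FFFF arrive as two \uXXXX surrogate escapes; JavaScript strings handle such pairs natively, whereas this Python version would leave two separate surrogate code points.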

SQLAlchemy automatically converts str to unicode on commit

- by Victor Stanciu

Hello, when inserting an object into a database with SQLAlchemy, all of its properties that correspond to String() columns are automatically transformed from <type 'str'> to <type 'unicode'>. Is there a way to prevent this behavior? Here is the code:

    from sqlalchemy import create_engine, Table, Column, Integer, String, MetaData
    from sqlalchemy.orm import mapper, sessionmaker

    engine = create_engine('sqlite:///:memory:', echo=False)
    metadata = MetaData()
    table = Table('projects', metadata,
        Column('id', Integer, primary_key=True),
        Column('name', String(50))
    )

    class Project(object):
        def __init__(self, name):
            self.name = name

    mapper(Project, table)
    metadata.create_all(engine)
    session = sessionmaker(bind=engine)()
    project = Project("Lorem ipsum")
    print(type(project.name))
    session.add(project)
    session.commit()
    print(type(project.name))

And here is the output:

    <type 'str'>
    <type 'unicode'>

I know I should probably just work with unicode, but this would involve digging through some third-party code, and I don't have the Python skills for that yet :)

C# Button Text Unicode characters.

- by Fossaw

C# doesn't want to put Unicode characters on buttons. If I put \u2129 in the button's Text property, the button displays the literal text \u2129, not the Unicode character. (As an example, I chose 2129 because I could see it in the font currently active on the machine.) I saw this question asked before, but it wasn't really answered, just worked around. I am working on applications that are going all over the world, and I don't want to install all the fonts; more than "don't want", there are so many that I doubt the machine I am working on has sufficient disk space. Our overseas sales agents supply the Unicode character "numbers". Is there another way forward with this? As an aside (out of curiosity), why does it not work?
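
Escape sequences such as \u2129 are interpreted by the C# compiler only inside string literals in source code; text typed into the designer's Text property is stored as-is, which is why the button shows the six characters \u2129. Since the sales agents supply hex numbers, one approach is to convert the number to a character at run time (in C#, for example, char.ConvertFromUtf32(0x2129)). The Python sketch below shows the same conversion; char_from_hex is a hypothetical helper. Either way, the glyph still has to exist in the font the button uses.

```python
def char_from_hex(code: str) -> str:
    """Convert a code point given as hex text ("2129" or "U+2129") to the character."""
    code = code.strip().upper()
    if code.startswith("U+"):
        code = code[2:]
    return chr(int(code, 16))  # raises ValueError for non-hex input


print(char_from_hex("2129"))
print("\u2129" == r"\u2129")   # False: only the non-raw literal is interpreted
```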

VB.NET - Convert Unicode in one TextBox to Shift-JIS in another TextBox

- by Yiu Korochko

I'm developing a text editor with two textboxes and a button below each one. When the button below textbox1 is pressed, it is supposed to convert the Unicode text (intended to be Japanese) to Shift-JIS. I am doing this because the software VOCALOID2 only allows ANSI- and Shift-JIS-encoded text to be pasted into its lyrics system. Users of the application normally have their keyboard set to switch to Japanese already, but it types in Unicode. How can I convert Unicode text to Shift-JIS when SJIS isn't available among the System.Text.Encoding types?
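
.NET strings are UTF-16 internally, so "converting to Shift-JIS" really means encoding the text into Shift-JIS bytes, typically with Encoding.GetEncoding("shift_jis") (code page 932) rather than a static property on System.Text.Encoding. On .NET Core and .NET 5+, that code page additionally requires registering CodePagesEncodingProvider. The Python sketch below shows the same encode step with errors="strict", so characters that Shift-JIS cannot represent are reported instead of silently replaced; to_sjis is a hypothetical helper.

```python
def to_sjis(text: str) -> bytes:
    """Encode text as Shift-JIS; raise if a character has no Shift-JIS mapping."""
    return text.encode("shift_jis", errors="strict")


data = to_sjis("テスト")
print(len(data))                 # 6: each full-width katakana is two bytes
print(data.decode("shift_jis"))  # テスト
```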

Converting datetime.ctime() values to Unicode

- by Malcolm

I would like to convert datetime.ctime() values to Unicode. Using Python 2.6.4 running under Windows, I can set my locale to Spanish like below:

    import locale
    locale.setlocale(locale.LC_ALL, 'esp')

Then I can pass %a, %A, %b, and %B to strftime() to get day and month names and abbreviations.

    >>> import datetime
    >>> dateValue = datetime.date(2010, 5, 15)
    >>> dayName = dateValue.strftime('%A')
    >>> dayName
    's\xe1bado'

How do I convert the 's\xe1bado' value to Unicode? Specifically, what encoding do I use? I'm thinking I might do something like the following, but I'm not sure this is the right approach.

    >>> codePage = locale.getdefaultlocale()[1]
    >>> dayNameUnicode = unicode(dayName, codePage)
    >>> dayNameUnicode
    u's\xe1bado'

Malcolm
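
In Python 2, strftime() returns a byte string encoded in the locale's code page, so decoding it is the right idea. With a Windows Spanish locale that code page is normally cp1252, where byte 0xE1 is "á". The sketch below (Python 3, where strftime() already returns str but the same decoding applies to any bytes a locale-dependent API hands back) makes the encoding explicit and only falls back to locale.getpreferredencoding(False) when none is given. to_text is a hypothetical helper; also note that locale.getdefaultlocale() is deprecated as of Python 3.11.

```python
import locale


def to_text(value, encoding=None):
    """Return value as str, decoding bytes with the given or preferred encoding."""
    if isinstance(value, bytes):
        return value.decode(encoding or locale.getpreferredencoding(False))
    return value  # already text (always the case for strftime() in Python 3)


print(to_text(b"s\xe1bado", "cp1252"))  # sábado
```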

How would you create a string of all UTF-8 characters? [PHP]

- by Xeoncross

There are many ways to represent the more than one million UTF-8 characters. Take the Latin capital "A" with macron (Ā). This is Unicode code point U+0100, encoded in UTF-8 as the hex bytes 0xc4 0x80 (decimal 196 128, binary 11000100 10000000). I would like to create a collection of the first 65,535 UTF-8 characters for use in testing applications. These are all the Unicode characters up to code point U+FFFF (i.e., those encoded in up to three bytes). Is it possible to do something like a for ($x = 0; ...) loop and then convert the resulting decimal to another base (like hex), which would allow the creation of the matching Unicode character? I can create the value Ā using something like this:

    $char = "\xc4\x80";
    // or
    $char = chr(196).chr(128);

However, I am not sure how to turn this into an automated process:

    // fail!
    $char = "\x". dechex($a). "\x". dechex($b);
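
The automated version loops over code points rather than bytes and lets an encoder produce the UTF-8 bytes, skipping U+D800 to U+DFFF because surrogates are not valid characters on their own and cannot be encoded in UTF-8. In PHP 7.2+, mb_chr($codePoint, 'UTF-8') performs the per-character step. A Python sketch; bmp_test_string is a hypothetical name.

```python
def bmp_test_string(start: int = 0x0000, stop: int = 0xFFFF) -> str:
    """All code points from start to stop inclusive, minus the surrogate range."""
    return "".join(
        chr(cp) for cp in range(start, stop + 1)
        if not 0xD800 <= cp <= 0xDFFF  # lone surrogates are not encodable in UTF-8
    )


text = bmp_test_string()
data = text.encode("utf-8")      # one, two or three bytes per character
print(len(text))                 # 63488 = 65536 - 2048 surrogates
print("\u0100".encode("utf-8"))  # b'\xc4\x80', the bytes from the question
```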

Output unicode strings in Windows console app

- by Andrew

Hi, I was trying to output a Unicode string to a console with iostreams and failed. I found this: "Using unicode font in c++ console app", and this snippet works:

    SetConsoleOutputCP(CP_UTF8);
    wchar_t s[] = L"èéøÞ???Sæca";
    int bufferSize = WideCharToMultiByte(CP_UTF8, 0, s, -1, NULL, 0, NULL, NULL);
    char* m = new char[bufferSize];
    WideCharToMultiByte(CP_UTF8, 0, s, -1, m, bufferSize, NULL, NULL);
    wprintf(L"%S", m);

However, I did not find any way to output Unicode correctly with iostreams. Any suggestions?

Writing to a file in Unicode

- by Lefteris

I am having some problems writing to a file in Unicode inside my C program. I am trying to write a Unicode Japanese string to a file. When I go to check the file, though, it is empty. If I try a non-Unicode string, it works just fine. What am I doing wrong?

    setlocale(LC_CTYPE, "");
    FILE* f;
    f = _wfopen(COMMON_FILE_PATH, L"w");
    fwprintf(f, L"???");
    fclose(f);

About my system: I am running Windows, and my IDE is Visual Studio 2008.
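
With the Microsoft C runtime, a file opened in text mode with plain L"w" converts wide characters through the current locale's multibyte code page. Characters that code page cannot represent make the wide output functions fail, which is a likely cause of the empty file. MSVC's fopen family accepts an encoding flag such as L"w, ccs=UTF-8" to request UTF-8 output instead. The Python sketch below shows the same principle of naming the encoding explicitly when the file is opened; write_text_file and the file name are placeholders.

```python
import os
import tempfile


def write_text_file(path: str, text: str) -> None:
    """Write text as UTF-8 regardless of the system's default code page."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


path = os.path.join(tempfile.gettempdir(), "unicode_demo.txt")  # placeholder path
write_text_file(path, "日本語")
with open(path, encoding="utf-8") as f:
    print(f.read())  # 日本語
```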

Unicode escapes generated by the toEscapedUnicode method have no spaces

- by vishvesha

For the words सुखचैनानी रीझुमल जिवतराम the Unicode escapes are:

    \u0938\u0941\u0916\u091A\u0948\u0928\u093E\u0928\u0940 \u0930\u0940\u091D\u0941\u092E\u0932 \u091C\u093F\u0935\u0924\u0930\u093E\u092E

Look: there are spaces before \u0930 and \u091C. But when I try this in my code:

    String tempString = Strings.toEscapedUnicode(strString);

this conversion method gives a result without spaces:

    \u0938\u0941\u0916\u091A\u0948\u0928\u093E\u0928\u0940\u0930\u0940\u091D\u0941\u092E\u0932\u091C\u093F\u0935\u0924\u0930\u093E\u092E

and that's why they are not matching. My toEscapedUnicode method generates escapes without spaces. I want the spaces; how can I get them?
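
The expected string keeps the spaces as literal characters. So either the input passed to toEscapedUnicode has already lost its spaces or the method drops them; printing strString before the call would show which. Below is a Python sketch of an escaper that leaves ASCII characters, including spaces, untouched and writes everything else as \uXXXX, using UTF-16 surrogate pairs above U+FFFF as Java does. escape_non_ascii is a hypothetical helper, not the library method from the question.

```python
def escape_non_ascii(text: str) -> str:
    """Keep ASCII (including spaces) as-is; write other characters as \\uXXXX."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)                      # spaces survive unchanged
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04X}")          # one BMP escape, upper-case hex
        else:
            cp -= 0x10000                       # split into a UTF-16 surrogate pair
            out.append(f"\\u{0xD800 + (cp >> 10):04X}\\u{0xDC00 + (cp & 0x3FF):04X}")
    return "".join(out)


print(escape_non_ascii("\u0938\u0941 \u0916"))  # \u0938\u0941 \u0916
```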

C: Writing to a file in Unicode

- by Lefteris

Hey all, I am having some problems writing to a file in Unicode inside my C program. I am trying to write a Unicode Japanese string to a file. When I go to check the file, though, it is empty. If I try a non-Unicode string, it works just fine. What am I doing wrong?

    setlocale(LC_CTYPE, "");
    FILE* f;
    f = _wfopen(COMMON_FILE_PATH, L"w");
    fwprintf(f, L"???");
    fclose(f);

How to compare two structure strings in C++

- by Arvandor

OK, so this week in class we're working with arrays. I've got an assignment that wants me to create a structure for an employee containing an employee ID, first name, last name, and wage. Then it has me ask users for input for 5 different employees, all stored in an array of this structure, then ask them for a search field type, then a search value. Lastly, it displays all the information for all positive search results. I'm still new, so I'm sure it isn't a terribly elegant program, but what I'm trying to do now is figure out how to compare a user-entered string with the string stored in the structure. I'll try to give all the pertinent code below.

    struct employee {
        int empid,
        string firstname,
        string lastname,
        float wage
    };
    employee emparray[] = {};
    employee value[] = {};
    //Code for populating emparray and structure, then determine search field etc.
    cout << "Enter a search value: ";
    cin >> value.lastname;
    for(i = 0; i < 5; i++)
    {
        if(strcmp(value.lastname.c_str, emparray[i].lastname.c_str) == 0)
        {
            output();
        }
    }

Which... I thought would work, but it's giving me the following error:

    Error 1 error C3867: 'std::basic_string<_Elem,_Traits,_Alloc>::c_str': function call missing argument list; use '&std::basic_string<_Elem,_Traits,_Alloc>::c_str' to create a pointer to member d:\myfile

Any thoughts on what's going on? Is there a way to compare two .name-notated strings without totally revamping the program? If you want to drill me on best practices, please feel free, but please also try to solve my particular problem.

Regarding String manipulation

- by arav

I have a String str which can hold values like the ones below. I want the first letter in the string to be uppercase, and if an underscore appears in the string, I need to remove it and make the letter after it uppercase. All the remaining letters should be lowercase.

    Input                  Output
    ""                     ""
    "abc"                  "Abc"
    "abc_def"              "AbcDef"
    "Abc_def_Ghi12_abd"    "AbcDefGhi12Abd"
    "abc__de"              "AbcDe"
    "_"                    ""
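
The rule can be read as: split on underscores, uppercase the first letter of each non-empty part and lowercase the rest, then join. A Python sketch; in Java the same loop can be written with String.split("_") plus substring, toUpperCase and toLowerCase. to_camel is a hypothetical name.

```python
def to_camel(s: str) -> str:
    """Capitalize each underscore-separated part and join them without underscores."""
    # str.capitalize() uppercases the first character and lowercases the rest;
    # empty parts (from "__" or a lone "_") contribute nothing.
    return "".join(part.capitalize() for part in s.split("_"))


print(to_camel("Abc_def_Ghi12_abd"))  # AbcDefGhi12Abd
```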

How to convert a JSON string to a Map<String, String> with Jackson JSON

- by Infinity

This is my first time trying to do something useful with Java. I'm trying to do something like this, but it doesn't work:

    Map<String, String> propertyMap = new HashMap<String, String>();
    propertyMap = JacksonUtils.fromJSON(properties, Map.class);

But the IDE says: 'Unchecked assignment Map to Map<String,String>'. What's the right way to do this? I'm only using Jackson because that's what is already available in the project; is there a native Java way of converting to/from JSON? In PHP I would simply call json_decode($str) and get back an array. I need basically the same thing here. Thanks!
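
With Jackson itself, the unchecked warning usually goes away when the full generic type is passed, e.g. new ObjectMapper().readValue(json, new TypeReference<Map<String, String>>() {}). JacksonUtils.fromJSON is project-specific, so whether it accepts a TypeReference is unknown. For comparison with PHP's json_decode, here is the equivalent in Python, with an explicit check that every value really is a string, which is what the Map<String, String> type promises; parse_string_map is a hypothetical helper.

```python
import json


def parse_string_map(s: str) -> dict[str, str]:
    """Parse a JSON object and verify it maps strings to strings."""
    data = json.loads(s)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for key, value in data.items():
        if not isinstance(value, str):   # JSON object keys are always strings
            raise ValueError(f"value for {key!r} is not a string: {value!r}")
    return data


print(parse_string_map('{"color": "red", "size": "L"}'))
```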

A simple string array Iteration in C# .NET doesn't work

- by met.lord

This is simple code that should return true or false after comparing each element in a string array with a session variable. The thing is that even when the string array named 'plans' gets the right values, the foreach keeps iterating only over the first element, so if the session variable matches an element other than the first one in the array, it never returns true. You could say the problem is right there in the foreach loop, but I can't see it. I've done this like a hundred times, and I can't understand what I am doing wrong. Thank you.

    protected bool ValidatePlans()
    {
        bool authorized = false;
        if (RequiredPlans.Length > 0)
        {
            string[] plans = RequiredPlans.Split(',');
            foreach (string plan in plans)
            {
                if (MySessionInfo.Plan == plan)
                    authorized = true;
            }
        }
        return authorized;
    }
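
The loop itself looks correct, so one common cause worth checking is whitespace. This is an assumption, since the value of RequiredPlans isn't shown: if RequiredPlans is "Basic, Gold", Split(',') yields "Basic" and " Gold", and only the first item can ever equal the session value. Below is a Python sketch of the comparison with each item trimmed (in C#, plan.Trim() does the same). is_authorized is a hypothetical helper that also stops as soon as a match is found.

```python
def is_authorized(required_plans: str, session_plan: str) -> bool:
    """True if session_plan matches any comma-separated plan, ignoring spaces."""
    if not required_plans:          # mirrors the RequiredPlans.Length > 0 check
        return False
    return any(plan.strip() == session_plan for plan in required_plans.split(","))


print(is_authorized("Basic, Gold", "Gold"))  # True
print(" Gold" == "Gold")                     # False: the untrimmed item never matches
```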

C++ String tokenisation from 3D .obj files

- by Ben

I'm pretty new to C++ and was looking for a good way to pull the data out of this line. A sample line that I might need to tokenise is:

    f 11/65/11 16/70/16 17/69/17

I have a tokenisation method that splits strings into a vector, delimited by a string, which may be useful:

    static void Tokenise(const string& str, vector<string>& tokens, const string& delimiters = " ")

The only way I can think of doing it is to tokenise with " " as a delimiter, remove the first item from the resulting vector, then tokenise each part by itself. Is there a good way to do this all in one?
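
Tokenising twice is a reasonable approach for OBJ face lines: split on whitespace, check that the first token is "f", then split each vertex on "/". The texture index may be missing, as in 1//3, so empty components are worth handling. A Python sketch; parse_face is a hypothetical name, and indices stay 1-based as in the file.

```python
from typing import Optional


def parse_face(line: str) -> list[tuple[Optional[int], ...]]:
    """Parse 'f v/vt/vn ...' into one tuple of 1-based indices per vertex."""
    tokens = line.split()
    if not tokens or tokens[0] != "f":
        raise ValueError(f"not a face line: {line!r}")
    face = []
    for vertex in tokens[1:]:
        # "11/65/11" -> (11, 65, 11); "1//3" -> (1, None, 3); "7" -> (7,)
        face.append(tuple(int(part) if part else None for part in vertex.split("/")))
    return face


print(parse_face("f 11/65/11 16/70/16 17/69/17"))
```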

How to check if string contains a string in string array

- by Abu Hamzah

Edit: the order might change; as you can see in the example below, both strings have the same names but in a different order. How would you go about checking whether both string arrays match? The code below returns true, but in reality it should return false, since _check contains an extra string. What I am trying to achieve is to check whether both string arrays have the same number of strings.

    string _exists = "Adults,Men,Women,Boys";
    string _check = "Men,Women,Boys,Adults,fail";
    if (_exists.All(s => _check.Contains(s))) //tried Equal
    {
        return true;
    }
    else
    {
        return false;
    }
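
In the C# snippet, _exists and _check are plain strings, so _exists.All(...) iterates over individual characters, and every character of _exists does occur somewhere in _check. Splitting both on commas and comparing the sorted lists checks that they contain the same items, the same number of times, in any order. A Python sketch; in C#, Split(',') followed by OrderBy and SequenceEqual expresses the same idea. same_items is a hypothetical helper.

```python
def same_items(a: str, b: str) -> bool:
    """True if two comma-separated lists hold the same items, ignoring order."""
    return sorted(a.split(",")) == sorted(b.split(","))


print(same_items("Adults,Men,Women,Boys", "Men,Women,Boys,Adults,fail"))  # False
print(same_items("Adults,Men,Women,Boys", "Men,Women,Boys,Adults"))       # True
```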

String Occurrence Counting Algorithm

- by Hellnar

Hello, I am curious what the most efficient (or most commonly used) algorithm is for counting the number of occurrences of a string in a chunk of text. From what I have read, the Boyer–Moore string search algorithm is the standard for string search, but I am not sure whether counting occurrences efficiently is the same problem as searching for a string. In Python, this is what I want:

    text_chunck = "one two three four one five six one"
    occurance_count(text_chunck, "one")  # gives 3

Regards

EDIT: It seems like Python's str.count provides such a method; however, I am not able to find out what algorithm it uses.
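
Counting needs nothing beyond repeated searching: find a match, count it, and resume the search just after it, so any search algorithm (Boyer–Moore included) can serve as the find step. The sketch below uses str.find and counts non-overlapping matches, which is also what str.count returns. count_occurrences is a hypothetical name. In CPython, str.find and str.count share the search code in Objects/stringlib/fastsearch.h, whose comments describe it as a mix of Boyer–Moore and Horspool; newer versions also use the two-way algorithm for some longer patterns.

```python
def count_occurrences(text: str, pattern: str) -> int:
    """Count non-overlapping occurrences of pattern in text (like str.count)."""
    if not pattern:
        raise ValueError("empty pattern")  # str.count would return len(text) + 1
    count = 0
    pos = text.find(pattern)
    while pos != -1:
        count += 1
        pos = text.find(pattern, pos + len(pattern))  # resume after this match
    return count


text_chunk = "one two three four one five six one"
print(count_occurrences(text_chunk, "one"))  # 3
print(text_chunk.count("one"))               # 3
```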

string in C++, question

- by user189364

Hi, I created a program in C++ that removes commas (,) from a given integer, i.e. 2,00,00 would return 20000. I am not using any new space. Here is the program I created:

    void removeCommas(string& str1, int len)
    {
        int j = 0;
        for (int i = 0; i < len; i++)
        {
            if (str1[i] == ',')
                continue;
            else
            {
                str1[j] = str1[i];
                j++;
            }
        }
        str1[j] = '\0';
    }

    void main()
    {
        string str1;
        getline(cin, str1);
        int i = str1.length();
        removeCommas(str1, i);
        cout << "the new string " << str1 << endl;
    }

Here is the result I get:

    Input: 2,000,00
    String length = 8
    Output = 200000 0
    Length = 8

My question is: why does it show the length as 8 in the output, and show the rest of the string, when I put a null character there? It should show the output as 200000 and the length as 6.
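
std::string stores its own length and does not treat '\0' as a terminator, so writing '\0' at position j only changes one character; the string still has 8 characters, and cout prints all of them. The usual fix is to shrink the string after compacting it, e.g. str1.resize(j), or to use str1.erase(std::remove(str1.begin(), str1.end(), ','), str1.end()). Below is the same compact-then-truncate idea as a Python sketch over a list of characters, since Python strings are immutable; remove_commas_in_place is a hypothetical helper.

```python
def remove_commas_in_place(buf: list[str]) -> None:
    """Compact non-comma characters to the front, then truncate the list."""
    j = 0
    for ch in buf:
        if ch != ",":
            buf[j] = ch   # j never passes the read position, so this is safe
            j += 1
    del buf[j:]           # the step the C++ version is missing (resize(j))


chars = list("2,000,00")
remove_commas_in_place(chars)
print("".join(chars), len(chars))  # 200000 6
```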

