Search Results

Search found 3804 results on 153 pages for 'regex'.

Page 85/153 | < Previous Page | 81 82 83 84 85 86 87 88 89 90 91 92 | Next Page >

In Python BeautifulSoup How to move tags

- by JJ

I have a partially converted XML document in soup coming from HTML. After some replacement and editing in the soup, the body is essentially - <Text...></Text> # This replaces <a href..> tags but automatically creates the </Text> <p class=norm ...</p> <p class=norm ...</p> <Text...></Text> <p class=norm ...</p> and so forth. I need to "move" the <p> tags to be children to <Text> or know how to suppress the </Text>. I want - <Text...> <p class=norm ...</p> <p class=norm ...</p> </Text> <Text...> <p class=norm ...</p> </Text> I've tried using item.insert and item.append but I'm thinking there must be a more elegant solution. for item in soup.findAll(['p','span']): if item.name == 'span' and item.has_key('class') and item['class'] == 'section': xBCV = short_2_long(item._getAttrMap().get('value','')) if currentnode: pass currentnode = Tag(soup,'Text', attrs=[('TypeOf', 'Section'),... ]) item.replaceWith(currentnode) # works but creates end tag elif item.name == 'p' and item.has_key('class') and item['class'] == 'norm': childcdatanode = None for ahref in item.findAll('a'): if childcdatanode: pass newlink = filter_hrefs(str(ahref)) childcdatanode = Tag(soup, newlink) ahref.replaceWith(childcdatanode) Thanks

Read the article
Invert regexp in vim

- by Chris J

There's a few "how do I invert a regexp" questions here on stackoverflow, but I can't find one for vim (if it does exist, by goggle-fu is lacking today). In essence I want to match all non-printable characters and delete them. I could write a short script, or drop to a shell and use tr or something similar to delete, but a vim solution would be dandy :-) Vim has the atom \p to match printable characters, however trying to do this :s/[^\p]//g to match the inverse failed and just left me with every 'p' in the file. I've seen the (?!xxx) sequence in other questions, and vim seems to not recognise this sequence. I've not found seen an atom for non-printable chars. In the interim, I'm going to drop to external tools, but if anyone's got any trick up their sleeve to do this, it'd be welcome :-) Ta!

Read the article
Filtering out emoticons using sed

- by user349222

Hello, I have a grep expression using cygwin grep on Win. grep -a "\\,,/\|\\m/\|\\m/\\>\.</\\m/\|:u" all_fbs.txt > rockon_fbs.txt Once I identify the emoticon class, however, I want to strip them out of the data. However, the same regexp above within a sed results in a syntax error (yes, I realize I could use /d instead of //g, but this doesn't make a difference, I still get the error.) sed "s/\(\\,,/\|\\m/\|\\m/\\>\.</\\m/\|:u\)*//g" The full line is: grep -a "\\,,/\|\\m/\|\\m/\\>\.</\\m/\|:u" all_fbs.txt | sed "s/\(\\,,/\|\\m/\|\\m/\\>\.</\\m/\|:u\)*//g" | sed "s/^/ROCKON\t/" > rockon_fbs.txt The result is: sed: -e expression #1, char 14: unknown option to `s' I know it's coming from the sed regexp I'm asking about it b/c if I remove that portion of the full line, then I get no error (but, of course, the emoticons are not filtered out). Thanks in advance, Steve

Read the article
Basic regexp help

- by casben79

I am new to programming PHP and am trying to validate a field in a form. The field if for a RAL color code and so would look something like : RAL 1001. so the letters RAL and then 4 numbers. Can someone help me set them into a regular expression to validate them. i have tried this with no success: $string_exp = "/^[RAL][0-9 .-]+$/i"; What can I say but sorry for being a complete NOOB at PHP. Cheers Ben

Read the article
Python - Strange Behavior in re.sub

- by Greg

Here's the code I'm running: import re FIND_TERM = r'C:\\Program Files\\Microsoft SQL Server\\90\\DTS\\Binn\\DTExec\.exe' rfind_term = re.compile(FIND_TERM,re.I) REPLACE_TERM = 'C:\\Program Files\\Microsoft SQL Server\\100\\DTS\\Binn\\DTExec.exe' test = r'something C:\Program Files\Microsoft SQL Server\90\DTS\Binn\DTExec.exe something' print rfind_term.sub(REPLACE_TERM,test) And the result I get is: something C:\Program Files\Microsoft SQL Server@\DTS\Binn\DTExec.exe something Why is there an @ sign?

Read the article
strange behavior in vim with negative look-behind

- by João Portela

So, I am doing this search in vim: /\(\(unum\)\|\(player\)=\)\@<!\"1\" and as expected it does not match lines that have: player="1" but matches lines that have: unum="1" what am i doing wrong? isn't the atom to be negated all of this: \(\(unum\)\|\(player\)=\) naturally just doing: /\(\(unum\)\|\(player\)=\) matches unum= or player=.

Read the article
Pulling out two separate words from a string using reg expressions?

- by Marvin

I need to improve on a regular expression I'm using. Currently, here it is: ^[a-zA-Z\s/-]+ I'm using it to pull out medication names from a variety of formulation strings, for example: SULFAMETHOXAZOLE-TRIMETHOPRIM 200-40 MG/5ML PO SUSP AMOX TR/POTASSIUM CLAVULANATE 125 mg-31.25 mg ORAL TABLET, CHEWABLE AMOXICILLIN TRIHYDRATE 125 mg ORAL TABLET, CHEWABLE AMOX TR/POTASSIUM CLAVULANATE 125 mg-31.25 mg ORAL TABLET, CHEWABLE Amoxicillin 1000 MG / Clavulanate 62.5 MG Extended Release Tablet The resulting matches on these examples are: SULFAMETHOXAZOLE-TRIMETHOPRIM AMOX TR/POTASSIUM CLAVULANATE AMOXICILLIN TRIHYDRATE AMOX TR/POTASSIUM CLAVULANATE Amoxicillin The first four are what I want, but on the fifth, I really need "Amoxicillin / Clavulanate". How would I pull out patterns like "Amoxicillin / Clavulanate" (in fifth row) while missing patterns like "MG/5 ML" (in the first row)?

Read the article
I need a regular expression to substitute pseudo html in .NET

- by netadictos

I have texts like this one: this is a text in [lang lang="en" ]english[/lang] or a text in [lang lang="en" ]spanish[/lang] I need to substitute them for: this is a text in <span lang="en">english </span> or a text in <span lang="es">spanish</span> I need a regular expression, not a simple replace. The languages in the lang tag can be whatever.

Read the article
Search text with a regular expression to match outside specific characters

- by user228164

I have text that looks like: My name is (Richard) and I cannot do [whatever (Jack) can't do] and (Robert) is the same way [unlike (Betty)] thanks (Jill) The goal is to search using a regular expression to find all parenthesized names that occur anywhere in the text BUT in-between any brackets. So in the text above, the result I am looking for is: Richard Robert Jill

Read the article
How can I use Perl regular expressions to parse XML data?

- by Luke

I have a pretty long piece of XML that I want to parse. I want to remove everything except for the subclass-code and city. So that I am left with something like the example below. EXAMPLE TEST SUBCLASS|MIAMI CODE <?xml version="1.0" standalone="no"?> <web-export> <run-date>06/01/2010 <pub-code>TEST <ad-type>TEST <cat-code>Real Estate</cat-code> <class-code>TEST</class-code> <subclass-code>TEST SUBCLASS</subclass-code> <placement-description></placement-description> <position-description>Town House</position-description> <subclass3-code></subclass3-code> <subclass4-code></subclass4-code> <ad-number>0000284708-01</ad-number> <start-date>05/28/2010</start-date> <end-date>06/09/2010</end-date> <line-count>6</line-count> <run-count>13</run-count> <customer-type>Private Party</customer-type> <account-number>100099237</account-number> <account-name>DOE, JOHN</account-name> <addr-1>207 CLARENCE STREET</addr-1> <addr-2> </addr-2> <city>MIAMI</city> <state>FL</state> <postal-code>02910</postal-code> <country>USA</country> <phone-number>4014612880</phone-number> <fax-number></fax-number> <url-addr> </url-addr> <email-addr>[email protected]</email-addr> <pay-flag>N</pay-flag> <ad-description>DEANESTATES2BEDS2BATHSAPPLIANCED</ad-description> <order-source>Import</order-source> <order-status>Live</order-status> <payor-acct>100099237</payor-acct> <agency-flag>N</agency-flag> <rate-note></rate-note> <ad-content> MIAMI/Dean Estates: 2 beds, 2 baths. Applianced. Central air. Carpets. Laundry. 2 decks. Pool. Parking. Close to everything.No smoking. No utilities. $1275 mo. 401-578-1501. </ad-content> </ad-type> </pub-code> </run-date> </web-export> PERL So what I want to do is open an existing file read the contents then use regular expressions to eliminate the unnecessary XML tags. open(READFILE, "FILENAME"); while(<READFILE>) { $_ =~ s/<\?xml version="(.*)" standalone="(.*)"\?>\n.*//g; $_ =~ s/<subclass-code>//g; $_ =~ s/<\/subclass-code>\n.*/|/g; $_ =~ s/(.*)PJ RER Houses /PJ RER Houses/g; $_ =~ s/\G //g; $_ =~ s/<city>//g; $_ =~ s/<\/city>\n.*//g; $_ =~ s/<(\/?)web-export>(.*)\n.*//g; $_ =~ s/<(\/?)run-date>(.*)\n.*//g; $_ =~ s/<(\/?)pub-code>(.*)\n.*//g; $_ =~ s/<(\/?)ad-type>(.*)\n.*//g; $_ =~ s/<(\/?)cat-code>(.*)<(\/?)cat-code>\n.*//g; $_ =~ s/<(\/?)class-code>(.*)<(\/?)class-code>\n.*//g; $_ =~ s/<(\/?)placement-description>(.*)<(\/?)placement-description>\n.*//g; $_ =~ s/<(\/?)position-description>(.*)<(\/?)position-description>\n.*//g; $_ =~ s/<(\/?)subclass3-code>(.*)<(\/?)subclass3-code>\n.*//g; $_ =~ s/<(\/?)subclass4-code>(.*)<(\/?)subclass4-code>\n.*//g; $_ =~ s/<(\/?)ad-number>(.*)<(\/?)ad-number>\n.*//g; $_ =~ s/<(\/?)start-date>(.*)<(\/?)start-date>\n.*//g; $_ =~ s/<(\/?)end-date>(.*)<(\/?)end-date>\n.*//g; $_ =~ s/<(\/?)line-count>(.*)<(\/?)line-count>\n.*//g; $_ =~ s/<(\/?)run-count>(.*)<(\/?)run-count>\n.*//g; $_ =~ s/<(\/?)customer-type>(.*)<(\/?)customer-type>\n.*//g; $_ =~ s/<(\/?)account-number>(.*)<(\/?)account-number>\n.*//g; $_ =~ s/<(\/?)account-name>(.*)<(\/?)account-name>\n.*//g; $_ =~ s/<(\/?)addr-1>(.*)<(\/?)addr-1>\n.*//g; $_ =~ s/<(\/?)addr-2>(.*)<(\/?)addr-2>\n.*//g; $_ =~ s/<(\/?)state>(.*)<(\/?)state>\n.*//g; $_ =~ s/<(\/?)postal-code>(.*)<(\/?)postal-code>\n.*//g; $_ =~ s/<(\/?)country>(.*)<(\/?)country>\n.*//g; $_ =~ s/<(\/?)phone-number>(.*)<(\/?)phone-number>\n.*//g; $_ =~ s/<(\/?)fax-number>(.*)<(\/?)fax-number>\n.*//g; $_ =~ s/<(\/?)url-addr>(.*)<(\/?)url-addr>\n.*//g; $_ =~ s/<(\/?)email-addr>(.*)<(\/?)email-addr>\n.*//g; $_ =~ s/<(\/?)pay-flag>(.*)<(\/?)pay-flag>\n.*//g; $_ =~ s/<(\/?)ad-description>(.*)<(\/?)ad-description>\n.*//g; $_ =~ s/<(\/?)order-source>(.*)<(\/?)order-source>\n.*//g; $_ =~ s/<(\/?)order-status>(.*)<(\/?)order-status>\n.*//g; $_ =~ s/<(\/?)payor-acct>(.*)<(\/?)payor-acct>\n.*//g; $_ =~ s/<(\/?)agency-flag>(.*)<(\/?)agency-flag>\n.*//g; $_ =~ s/<(\/?)rate-note>(.*)<(\/?)rate-note>\n.*//g; $_ =~ s/<ad-content>(.*)\n.*//g; $_ =~ s/\t(.*)\n.*//g; $_ =~ s/<\/ad-content>(.*)\n.*//g; } close( READFILE1 ); Is there an easier way of doing this? I don't want to use any modules. I know that it might make this easier but the file I am reading has a lot of data in it.

Read the article
How do you implement a good profanity filter?

- by Ben Throop

Many of us need to deal with user input, search queries, and situations where the input text can potentially contain profanity or undesirable language. Oftentimes this needs to be filtered out. Where can one find a good list of swear words in various languages and dialects? Are there APIs available to sources that contain good lists? Or maybe an API that simply says "yes this is clean" or "no this is dirty" with some parameters? What are some good methods for catching folks trying to trick the system, like a$$, azz, or a55? Bonus points if you offer solutions for PHP. :) Edit: Response to answers that say simply avoid the programmatic issue: I think there is a place for this kind of filter when, for instance, a user can use public image search to find pictures that get added to a sensitive community pool. If they can search for "penis", then they will likely get many pictures of, yep. If we don't want pictures of that, then preventing the word as a search term is a good gatekeeper, though admittedly not a foolproof method. Getting the list of words in the first place is the real question. So I'm really referring to a way to figure out of a single token is dirty or not and then simply disallow it. I'd not bother preventing a sentiment like the totally hilarious "long necked giraffe" reference. Nothing you can do there. :)

Read the article
Regexp match in Java

- by tinti

Regexp in Java I want to make a regexp who do this verify if a word is like [0-9A-Za-z][._-'][0-9A-Za-z] example for valid words A21a_c32 daA.da2 das'2 dsada ASDA 12SA89 non valid words dsa#da2 34$ Thanks

Read the article
Find ASCII "arrows" in text

- by ulver

I'm trying to find all the occurrences of "Arrows" in text, so in "<----=====><==->>" the arrows are: "<----", "=====>", "<==", "->", ">" This works: String[] patterns = {"<=*", "<-*", "=*>", "-*>"}; for (String p : patterns) { Matcher A = Pattern.compile(p).matcher(s); while (A.find()) { System.out.println(A.group()); } } but this doesn't: String p = "<=*|<-*|=*>|-*>"; Matcher A = Pattern.compile(p).matcher(s); while (A.find()) { System.out.println(A.group()); } No idea why. It often reports "<" instead of "<====" or similar. What is wrong?

Read the article
Problem with Javascript RegExp-mask

- by OrjanL

I have a string that looks something like this: {theField} > YEAR (today, -3) || {theField} < YEAR (today, +3) I want it to be replaced into: {theField} > " + YEAR (today, -3) + " || {theField} < " + YEAR (today, +3) + " I have tried this: String.replace(/(.*)(YEAR|MONTH|WEEK|DAY+)(.*[)]+)/g, "$1 \" + $2 $3 + \"") But that gives me: {theField} > YEAR (today, +3) || {theField} > " + YEAR (today, +3) + " Does anyone have any ideas?

Read the article
Replaceing <a href="mailto: with just email aadress

- by Lauri

I want to replace all "mailto:" links in html with plain emails. In: text .... <a href="mailto:[email protected]">not needed</a> text Out: text .... [email protected] text I did this: $str = preg_replace("/\<a.+href=\"mailto:(.*)\".+\<\/a\>/", "$1", $str); But it fails if there are multiple emails in string or html inside "a" tag In: <a href="mailto:[email protected]">not needed</a><a href="mailto:[email protected]"><font size="3">[email protected]</font></a> Out: [email protected]">

Read the article
preg_replace only part of match

- by Tony Vipros

Hi, I'm using preg_replace to create urls for modrewrite based paging links. I use: $nextURL = preg_replace('%/([\d]+)/%','/'.($pageNumber+1).'/',$currentURL); which works fine, however I was wondering if there is a better way without having to include the '/' in the replacement parameter. I need to match the number as being between two / as the URLs can sometimes contain numbers other than the page part. These numbers are never only numbers however, so have /[\d]+/ stops them from getting replaced.

Read the article
Parsing a context-free grammar in Python

- by Yuval A

What tools are available in Python to assist in parsing a context-free grammar? Of course it is possible to roll my own, but I am looking for a generic tool that can generate a parser for a given CFG.

Read the article
Building a Hashtag in Javascript without matching Anchor Names, BBCode or Escaped Characters

- by Martindale

I would like to convert any instances of a hashtag in a String into a linked URL: #hashtag - should have "#hashtag" linked. This is a #hashtag - should have "#hashtag" linked. This is a [url=http://www.mysite.com/#name]named anchor[/url] - should not be linked. This isn't a pretty way to use quotes - should not be linked. Here is my current code: String.prototype.parseHashtag = function() { return this.replace(/[^&][#]+[A-Za-z0-9-_]+(?!])/, function(t) { var tag = t.replace("#","") return t.link("http://www.mysite.com/tag/"+tag); }); }; Currently, this appears to fix escaped characters (by excluding matches with the amperstand), handles named anchors, but it doesn't link the #hashtag if it's the first thing in the message, and it seems to grab include the 1-2 characters prior to the "#" in the link. Halp!

Read the article
Bash: Extract Range with Regular Expressioin (maybe sed?)

- by sixtyfootersdude

I have a file that is similar to this: <many lines of stuff> SUMMARY: <some lines of stuff> END OF SUMMARY I want to extract just the stuff between SUMMARY and END OF SUMMARY. I suspect I can do this with sed but I am not sure how. I know I can modify the stuff in between with this: sed "/SUMMARY/,/END OF SUMMARY/ s/replace/with/" fileName (But not sure how to just extract that stuff). I am Bash on Solaris.

Read the article
Does Ruby have an addon similar to Perl 6 grammars?

- by dreftymac

Perl has been one of my go-to programming language tools for years and years. Perl 6 grammars looks like a great language feature. I'd like to know if someone has started something like this for Ruby.

Read the article
regular expression of 0's and 1's

- by Lopa

Hello all I got this question which asks me to figure out why is it foolish to write a regular expression for the language that consists of strings of 0's and 1's that are palindromes( they read the same backwards and forwards). part 2 of the question says using any formal mechanism of your choice, show how it is possible to express the language that consists of strings of 0's and 1's that are palindromes?

Read the article
Intersection of two regular expressions

- by Henry

Hi, Im looking for function (PHP will be the best), which returns true whether exists string matches both regexpA and regexpB. Example 1: $regexpA = '[0-9]+'; $regexpB = '[0-9]{2,3}'; hasRegularsIntersection($regexpA,$regexpB) returns TRUE because '12' matches both regexps Example 2: $regexpA = '[0-9]+'; $regexpB = '[a-z]+'; hasRegularsIntersection($regexpA,$regexpB) returns FALSE because numbers never matches literals. Thanks for any suggestions how to solve this. Henry

Read the article
Pattern/Matcher in Java?

- by user1007059

I have a certain text in Java, and I want to use pattern and matcher to extract something from it. This is my program: public String getItemsByType(String text, String start, String end) { String patternHolder; StringBuffer itemLines = new StringBuffer(); patternHolder = start + ".*" + end; Pattern pattern = Pattern.compile(patternHolder); Matcher matcher = pattern.matcher(text); while (matcher.find()) { itemLines.append(text.substring(matcher.start(), matcher.end()) + "\n"); } return itemLines.toString(); } This code works fully WHEN the searched text is on the same line, for instance: String text = "My name is John and I am 18 years Old"; getItemsByType(text, "My", "John"); immediately grabs the text "My name is John" out of the text. However, when my text looks like this: String text = "My name\nis John\nand I'm\n18 years\nold"; getItemsByType(text, "My", "John"); It doesn't grab anything, since "My" and "John" are on different lines. How do I solve this?

Read the article
How to remove a tab attribute in ASP .NET AJAX Toolkit using Regular Expression

- by Nassign

I have tried to remove the following tag generated by the AJAX Control toolkit. The scenario is our GUI team used the AJAX control toolkit to make the GUI but I need to move them to normal ASP .NET view tag using MultiView. I want to remove all the __designer: attributes Here is the code <asp:TextBox ID="a" runat="server" __designer:wfdid="w540" /> <asp:DropdownList ID="a" runat="server" __designer:wfdid="w541" /> ..... <asp:DropdownList ID="a" runat="server" __designer:wfdid="w786" /> I tried to use the regular expression find replace in Visual Studio using: Find: :__designer\:wfdid="w{([0-9]+)}" Replace with empty space Can any regular expression expert help?

Read the article
Regexp: Replace only in specific context

- by blinry

In a text, I would like to replace all occurrences of $word by [$word]($word) (to create a link in Markdown), but only if it is not already in a link. Example: [$word homepage](http://w00tw00t.org) should not become [[$word]($word) homepage](http://w00tw00t.org). Thus, I need to check whether $word is somewhere between [ and ] and only replace if it's not the case. Can you think of a preg_replace command for this?

Read the article

< Previous Page | 81 82 83 84 85 86 87 88 89 90 91 92 | Next Page >