Search Results

Search found 3956 results on 159 pages for 'regex cookbook'.

Page 98/159 | < Previous Page | 94 95 96 97 98 99 100 101 102 103 104 105  | Next Page >

  • What regular expression(s) would I use to remove escaped html from large sets of data.

    - by Elizabeth Buckwalter
    Our database is filled with articles retrieved from RSS feeds. I was unsure of what data I would be getting, and how much filtering was already setup (WP-O-Matic Wordpress plugin using the SimplePie library). This plugin does some basic encoding before insertion using Wordpress's built in post insert function which also does some filtering. I've figured out most of the filters before insertion, but now I have whacko data that I need to remove. This is an example of whacko data that I have data in one field which the content I want in the front, but this part removed which is at the end: <img src="http://feeds.feedburner.com/~ff/SoundOnTheSound?i=xFxEpT2Add0:xFbIkwGc-fk:V_sGLiPBpWU" border="0"></img> <img src="http://feeds.feedburner.com/~ff/SoundOnTheSound?d=qj6IDK7rITs" border="0"></img> &lt;img src=&quot;http://feeds.feedburner.com/~ff/SoundOnTheSound?i=xFxEpT2Add0:xFbIkwGc-fk:D7DqB2pKExk&quot; Notice how some of the images are escape and some aren't. I believe this has to do with the last part being cut off so as to be unrecognizable as an html tag, which then caused it to be html endcoded. Another field has only this which is now filtered before insertion, but I have to get rid of the others: &lt;img src=&quot;http://farm3.static.flickr.com/2183/2289902369_1d95bcdb85.jpg&quot; alt=&quot;post_img&quot; width=&quot;80&quot; (all examples are on one line, but broken up for readability) Question: What is the best way to work with the above escaped html (or portion of an html tag)? I can do it in Perl, PHP, SQL, Ruby, and even Python. I believe Perl to be the best at text parsing, so that's why I used the Perl tag. And PHP times out on large database operations, so that's pretty much out unless I wanted to do batch processing and what not. PS One of the nice things about using Wordpress's insert post function, is that if you use php's strip_tags function to strip out all html, insert post function will insert <p> at the paragraph points. Let me know if there's anything more that I can answer. Some article that didn't quite answer my questions. (http://stackoverflow.com/questions/2016751/remove-text-from-within-a-database-text-field) (http://stackoverflow.com/questions/462831/regular-expression-to-escape-html-ampersands-while-respecting-cdata)

    Read the article

  • Regular Expression :match string containing only non repeating words

    - by nash
    I have this situation(Java code): 1) a string such as : "A wild adventure" should match. 2) a string with adjacent repeated words: "A wild wild adventure" shouldn't match. With this regular expression: .* \b(\w+)\b\s*\1\b.* i can match strings containing adjacent repeated words. How to reverse the situation i.e how to match strings which do not contain adjacent repeat words

    Read the article

  • string parsing help

    - by sprugman
    I've got a string like this: #################### Section One #################### Data A Data B #################### Section Two #################### Data C Data D etc. I want to parse it into something like: $arr( 'Section One' => array('Data A', 'Data B'), 'Section Two' => array('Data C', 'Data D') ) At first I tried this: $sections = preg_split("/(\r?\n)(\r?\n)#/", $file_content); The problem is, the file isn't perfectly clean: sometimes there are different numbers of blank lines between the sections, or blank spaces between data rows. The section head pattern itself seems to be relatively consistent: #################### Section Title #################### The number of #'s is probably consistent, but I don't want to count on it. The white space on the title line is pretty random. Once I have it split into sections, I think it'll be pretty straightforward, but any help writing a killer reg ex to get it there would be appreciated. (Or if there's a better approach than reg ex...)

    Read the article

  • replacing text within quotes until next quote

    - by Jordan Trainor
    String input = "helloj\"iojgeio\r\ngsk\\"jopri\"gj\r\negjoijisgoe\"joijsofeij\"\"\"ojgsoij\""; This is my current code that works but iv added some code that has to run before this which makes some '"' split onto another line thus making the code below obsolute unless under certain cirumstances the '"' is not put onto the next line. firstQuote = input.IndexOf("\""); lastQuote = input.LastIndexOf("\""); input = input.Substring(0, firstQuote) + "<span>quote" + input.Substring(firstQuote + 1, lastQuote - (firstQuote + 1) + "quote</span>" + input.Substring(lastQuote + 1, lines.Length - (lastQuote + 1); How could I change the input string from input = "helloj\"iojgeio\r\ngsk\\"jopri\"gj\r\negjoijisgoe\"joijsofeij\"\"\"ojgsoij\""; to input = "helloj(<span>quoteiojgeio\r\ngsk\\"jopriquote</span>gj\r\negjoijisgoe<span>quotejoijsofeijquote</span>quote<span>quote</span>ojgsoij<span>quote</span>";

    Read the article

  • Extract a sentence out of sentences separated by delimitors

    - by Laura
    Below is a sample line I have extracted from a website: below a satisfactory level; &quot;an off year for tennis&quot;; &quot;his performance was off&quot; The output displays as: below a satisfactory level; "an off year for tennis"; "his performance was off" I want to get only the first sentence "below a satisfactory level"; Here is the code I have tried after exploring many stackoverflow posts: $data=explode('; ',$str); echo $data[0]; But somehow it is not working. Thanks in advance.

    Read the article

  • Regexp look-behind to match internet speeds

    - by Sandman
    So the user may search for "10 mbit" after which I want to capture the "10" so I can use it in a speed-search rather than a string-search. This isn't a problem, the below regexp does this fine: if (preg_match("/(\d+)\smbit/", $string)){ ... } But, the user may search for something like "10/10 mbit" or "10-100 mbit". I don't want to match those with the above regexp - they should be handled in another fashion. So I would like a regexp that matches "10 mbit" if the number is all-numeric as a whole word (i.e. contained by whitespace, newline or lineend/linestart) Using lookbehind, I did this: if (preg_match("#(?<!/)(\d+)\s+mbit#i", $string)){ Just to catch those that doesn't have "/" before them, but this matched true for this string: "10/10 mbit" so I'm obviously doing something wrong here, but what?

    Read the article

  • Delete all characters in a multline string up to a given pattern

    - by biffabacon
    Using Python I need to delete all charaters in a multiline string up to the first occurrence of a given pattern. In Perl this can be done using regular expressions with something like: #remove all chars up to first occurrence of cat or dog or rat $pattern = 'cat|dog|rat' $pagetext =~ s/(.*?)($pattern)/$2/xms; What's the best way to do it in Python?

    Read the article

  • Is there an easy method to combine two relative paths in C# ?

    - by Ioannis
    I want to combine two relative paths in C#. For example: string path1 = "/System/Configuration/Panels/Alpha"; string path2 = "Panels/Alpha/Data"; I want to return string result = "/System/Configuration/Panels/Alpha/Data"; I can implement this by splitting the second array and compare it in a for loop but I was wondering if there is something similar to Path.Combine available or if this can be accomplished with regular expressions or Linq? Thanks

    Read the article

  • Jakarta Regexp 1.5 Backreferences?

    - by Matt Smith
    Why does this match: String str = "099.9 102.2" + (char) 0x0D; RE re = new RE("^([0-9]{3}.[0-9]) ([0-9]{3}.[0-9])\r$"); System.out.println(re.match(str)); But this does not: String str = "099.9 102.2" + (char) 0x0D; RE re = new RE("^([0-9]{3}.[0-9]) \1\r$"); System.out.println(re.match(str)); The back references don't seem to be working... What am I missing?

    Read the article

  • How to use PHP preg_replace regular expression to find and replace text

    - by Roger
    I wrote this PHP code to make some substitutions: function cambio($txt){ $from=array( '/\+\>([^\+\>]+)\<\+/', //finds +>text<+ '/\%([^\%]+)\%/', //finds %text% ); $to=array( '<span class="P">\1</span>', '<span>\1</span>', ); return preg_replace($from,$to,$txt); } echo cambio('The fruit I most like is: +> %apple% %banna% %orange% <+.'); Resulting into this: The fruit I most like is: <span class="P"> <span>apple</span> <span>banna</span> <span>orange</span> </span>. however I needed to identify the fruit's span tags, like this: The fruit I most like is: <span class="P"> <span class="a">apple</span> <span class="b">banna</span> <span class="c">coco</span> </span>. I'd buy a fruit to whom discover a regular expression to accomplish this :-)

    Read the article

  • Dreamweaver regular expression substitution followed by number

    - by mark
    Hi. I'm using Dreamweaver to update copyright dates across my site. I want to preserve the existing spacing (or lack thereof) between years. Examples: © 2002-2008 should update to © 2002-2009 © 2003 - 2008 should update to © 2003 - 2009 This is the regular expression I'm using to accomplish this in Dreamweaver's find & replace function Find: ©\s*(\d{4}\s*-\s*)\d{3}[^9] Replace: © $1 2009 Here's the PROBLEM: This expression works, but has that that extra space between the hyphen and 2009. If I write the replace expression without the space, as © $12009 then dreamweaver looks for the 12,009th substitution in the find expression, and, not finding one, prints $12009. Any ideas?

    Read the article

  • Can a repeated piece of regular expression create multiple groups? Such as this example...

    - by Yousui
    Hi guys, I'm using RUBY 's regular expression to deal with text such as ${1:aaa|bbbb} ${233:aaa | bbbb | ccc ccccc } ${34: aaa | bbbb | cccccccc |d} ${343: aaa | bbbb | cccccccc |dddddd ddddddddd} ${3443:a aa|bbbb|cccccccc|d} ${353:aa a| b b b b | c c c c c c c c | dddddd} I want to get the trimed text between each pipe line. For example, for the first line of my upper example, I want to get the result aaa and bbbb, for the second line, I want aaa, bbbb and ccc ccccc. Now I have wrote a piece of regular expression and a piece of ruby code to test it: array = "${33:aaa|bbbb|cccccccc}".scan(/\$\{\s*(\d+)\s*:(\s*[^\|]+\s*)(?:\|(\s*[^\|]+\s*))+\}/) puts array Now my problem is the (?:\|(\s*[^\|]+\s*))+ part can't create multiple groups. I don't know how to solve this problem, because the number of text I need in each line is variable. Anyone can help? Great thanks.

    Read the article

  • Mod rewrite with multiple query strings

    - by Boris
    Hi, I'm a complete n00b when it comes to regular expressions. I need these redirects: (1) www.mysite.com/products.php?id=001&product=Product-Name&source=Source-Name should become -> www.mysite.com/Source-Name/001-Product-Name (2) www.mysite.com/stores.php?id=002&name=Store-Name should become -> www.mysite.com/002-Store-Name Any help much appreciated :)

    Read the article

  • Help with Regular Expression

    - by shivesh
    Hello I need help with Regular Expression, I want to match each section (number and it's text - 2 groups), the text can be multi line, each section ends when another section starts (another number) or when .END is reached or EOF. Demo Expression: \(\d{1,3}\) ([\s\S]*?)(\.END|\(\d{1,3}\)) Input text: (1) some text some text some text some text some text some text (2) some text some textsome text (3) some textsome text some textsome textsome text (4) some text .END first group should match number (with brackets) and second group should match corresponded text.

    Read the article

  • Regular expression to retrieve everything before first slash

    - by alex
    I need a regular expression to basically get the first part of a string, before the first slash (). For example in the following: C:\MyFolder\MyFile.zip The part I need is "C:" Another example: somebucketname\MyFolder\MyFile.zip I would need "somebucketname" I also need a regular expression to retrieve the "right hand" part of it, so everything after the first slash (excluding the slash.) For example somebucketname\MyFolder\MyFile.zip would return MyFolder\MyFile.zip.

    Read the article

  • [Python] OR in regular expression?

    - by www.yegorov-p.ru
    Hello. I have text file with several thousands lines. I want to parse this file into database and decided to write a regexp. Here's part of file: blablabla checked=12 unchecked=1 blablabla unchecked=13 blablabla checked=14 As a result, I would like to get something like (12,1) (0,13) (14,0) Is it possible?

    Read the article

  • How do I write this URL in Django?

    - by alex
    (r'^/(?P<the_param>[a-zA-z0-9_-]+)/$','myproject.myapp.views.myview'), How can I change this so that "the_param" accepts a URL(encoded) as a parameter? So, I want to pass a URL to it. mydomain.com/http%3A//google.com

    Read the article

< Previous Page | 94 95 96 97 98 99 100 101 102 103 104 105  | Next Page >