Search Results

Search found 4539 results on 182 pages for 'regex grouping'.

Page 108/182 | < Previous Page | 104 105 106 107 108 109 110 111 112 113 114 115  | Next Page >

  • What regular expression(s) would I use to remove escaped html from large sets of data.

    - by Elizabeth Buckwalter
    Our database is filled with articles retrieved from RSS feeds. I was unsure of what data I would be getting, and how much filtering was already setup (WP-O-Matic Wordpress plugin using the SimplePie library). This plugin does some basic encoding before insertion using Wordpress's built in post insert function which also does some filtering. I've figured out most of the filters before insertion, but now I have whacko data that I need to remove. This is an example of whacko data that I have data in one field which the content I want in the front, but this part removed which is at the end: <img src="http://feeds.feedburner.com/~ff/SoundOnTheSound?i=xFxEpT2Add0:xFbIkwGc-fk:V_sGLiPBpWU" border="0"></img> <img src="http://feeds.feedburner.com/~ff/SoundOnTheSound?d=qj6IDK7rITs" border="0"></img> &lt;img src=&quot;http://feeds.feedburner.com/~ff/SoundOnTheSound?i=xFxEpT2Add0:xFbIkwGc-fk:D7DqB2pKExk&quot; Notice how some of the images are escape and some aren't. I believe this has to do with the last part being cut off so as to be unrecognizable as an html tag, which then caused it to be html endcoded. Another field has only this which is now filtered before insertion, but I have to get rid of the others: &lt;img src=&quot;http://farm3.static.flickr.com/2183/2289902369_1d95bcdb85.jpg&quot; alt=&quot;post_img&quot; width=&quot;80&quot; (all examples are on one line, but broken up for readability) Question: What is the best way to work with the above escaped html (or portion of an html tag)? I can do it in Perl, PHP, SQL, Ruby, and even Python. I believe Perl to be the best at text parsing, so that's why I used the Perl tag. And PHP times out on large database operations, so that's pretty much out unless I wanted to do batch processing and what not. PS One of the nice things about using Wordpress's insert post function, is that if you use php's strip_tags function to strip out all html, insert post function will insert <p> at the paragraph points. Let me know if there's anything more that I can answer. Some article that didn't quite answer my questions. (http://stackoverflow.com/questions/2016751/remove-text-from-within-a-database-text-field) (http://stackoverflow.com/questions/462831/regular-expression-to-escape-html-ampersands-while-respecting-cdata)

    Read the article

  • Regular expression to retrieve everything before first slash

    - by alex
    I need a regular expression to basically get the first part of a string, before the first slash (). For example in the following: C:\MyFolder\MyFile.zip The part I need is "C:" Another example: somebucketname\MyFolder\MyFile.zip I would need "somebucketname" I also need a regular expression to retrieve the "right hand" part of it, so everything after the first slash (excluding the slash.) For example somebucketname\MyFolder\MyFile.zip would return MyFolder\MyFile.zip.

    Read the article

  • Regexp look-behind to match internet speeds

    - by Sandman
    So the user may search for "10 mbit" after which I want to capture the "10" so I can use it in a speed-search rather than a string-search. This isn't a problem, the below regexp does this fine: if (preg_match("/(\d+)\smbit/", $string)){ ... } But, the user may search for something like "10/10 mbit" or "10-100 mbit". I don't want to match those with the above regexp - they should be handled in another fashion. So I would like a regexp that matches "10 mbit" if the number is all-numeric as a whole word (i.e. contained by whitespace, newline or lineend/linestart) Using lookbehind, I did this: if (preg_match("#(?<!/)(\d+)\s+mbit#i", $string)){ Just to catch those that doesn't have "/" before them, but this matched true for this string: "10/10 mbit" so I'm obviously doing something wrong here, but what?

    Read the article

  • Using awk to return only certain chunks of data

    - by Koriar
    I'm not 100% certain how to phrase my question simply, so I apologize if this has been answered somewhere and I was just unable to find it. What I have are debug logs with authentication packets in them along with a bunch of other output. I need to search through about 2 million lines of logs to find every packet that contains a certain mac address. The packets look something like this (slightly censored): -----------------[ header ]----------------- Event: Authd-Response (1900) Sequence: -54 Timestamp: 1969-12-31 19:30:00 (0) ---------------[ attributes ]--------------- Auth-Result = Auth-Accept Service-Profile-SID = 53 Service-Profile-SID = 49 RADIUS-Access-Accept-Attr/WiMAX-Capability = 0x(numbers) Session-Timeout = 3600 Service-Profile-SID = 4 Service-Profile-SID = 29 Chargeable-User-Identity = "(Numbers)" User-Password = "(the MAC address I'm looking for)" -------------------------------------------- However there are about 10 different possible types with different possible lengths. They all start with the header line and end with the all-dashes line. I've had success using awk to get the code blocks themselves using this: awk '/-----------------\[ header \]-----------------/,/--------------------------------------------/' filename.txt But I was hoping to be able to use it to return only the packets which contain the MAC address that I need. I've been trying to figure this out for a few days now and I'm pretty stuck. I could try and write a bash script, but I could swear that I've used awk to do something like this before...

    Read the article

  • jquery sortable with regexp

    - by Chris Lively
    I am trying to figure out the right regexp to match on list item id's. For example: <ul id="MyList" class="connectedSortable"> <li id="id=1-32">Item 1</li> <li id="id=2_23">Item 2</li> <li id="id=3">Item 3</li> <li id="id=4">Item 4</li> <li id="id=5">Item 5</li> <li id="id=6">Item 6</li> </ul> On the serialize method, I want it to pull everything after the equal sign (=) $(function () { $("#MyList, #OtherList").sortable({ connectWith: '.connectedSortable', update: function () { $("#MyListOrder").val($("#MyList").sortable('serialize', { regexp: '/(.+)[=](.+)/)' })); } }).disableSelection(); }); I tried the above, but that didn't quite work. My regexp expression is wrong and I don't know what it should be. Ideas?

    Read the article

  • Confusion in RegExp Reluctant quantifier? Java

    - by Dusk
    Hi, Could anyone please tell me the reason of getting an output as: ab for the following RegExp code using Relcutant quantifier? Pattern p = Pattern.compile("abc*?"); Matcher m = p.matcher("abcfoo"); while(m.find()) System.out.println(m.group()); // ab and getting empty indices for the following code? Pattern p = Pattern.compile(".*?"); Matcher m = p.matcher("abcfoo"); while(m.find()) System.out.println(m.group());

    Read the article

  • best REGEXP friendly Text Editors + most powerful REGEXP syntax?

    - by John
    I am fluent with Microsoft Visual 2005 regular expressions and they are a big time saver. I seem to learn them best by having a vaguely organized cheat sheet thrown at me, at which point I read just a little and play with them until I understand what's going on. That learning approach has worked well for me, for now. I would really like to take this to the next level though. Basically -- What is the REGEXP convention that is generally regarded as the most open-ended and powerful? VS2005 Regexps seem kind of gimped, so maybe I'm a kid playing in a sandbox. Are there text editors out there that can perform a highlight all matches, list lines containing string, or some kind of powerful function like that in conjunction with the very strongest REGEXP language? If not I can just use multiple programs and a weird technique but I'd like to avoid that. I wonder if a stronger REGEXP language or a "stronger" regEXP writer might be able to have his search match all results on all lines even by clicking a "find next" by adding some simple criteria to the search. Anyway, please provide advice!

    Read the article

  • Matching several items inside one string with preg_match_all() and end characters

    - by nefo_x
    I have the following code: preg_match_all('/(.*) \((\d+)\) - ([\d\.\d]+)[,?]/U', "E-Book What I Didn't Learn At School... (2) - 3525.01, FREE Intro DVD/Vid (1) - 0.15", $match); var_dump($string, $match); and get the following ouput: array(4) { [0]=> array(1) { [0]=> string(54) "E-Book What I Didn't Learn At School... (2) - 3525.01," } [1]=> array(1) { [0]=> string(39) "E-Book What I Didn't Learn At School..." } [2]=> array(1) { [0]=> string(1) "2" } [3]=> array(1) { [0]=> string(7) "3525.01" } } which matches only one items... what i need is to get all items from such strings. when i've added "," sign to the end of the string - it worked fine. but that is non-sense in adding comma to each string. Any advice?

    Read the article

  • How to export the matches only in a pattern search in vim?

    - by Mert Nuhoglu
    Is there a way to grab and export the match part only in a pattern search without changing the current file? For example, from a file containing: 57","0","37","","http://www.thisamericanlife.org/Radio_Episode.aspx?episode=175" 58","0","37","","http://www.thisamericanlife.org/Radio_Episode.aspx?episode=170" I want to export a new file containing: http://www.thisamericanlife.org/Radio_Episode.aspx?episode=175 http://www.thisamericanlife.org/Radio_Episode.aspx?episode=170 I can do this by using substitution like this: :s/.\{-}\(http:\/\/.\{-}\)".\{-}/\1/g :%w>>data But the substitution command changes the current file. Is there a way to do this without changing the current file?

    Read the article

  • Parsing HTML with XPath and PHP

    - by Peter
    Is there a way (using XPath and PHP) to do the following (WITHOUT external XSLT files)? Remove all tables and their contents Remove everything after the first h1 tag Keep only paragraphs (INCLUDING their inner HTML (links, lists, etc)) I received an XSLT answer here, but I'm looking for XPATH queries that don't require external files. Currently, I've got the HTML in question loaded into a SimpleXmlElement via: $doc = @DOMDocument::loadHTML($xml); $data = simplexml_import_dom($doc); Now I need help with: $data = $data->xpath('??????'); Been working with this one for several days to no avail. I really appreciate the help. Edit: I don't particularly care what's inside the paragraphs, as I can use strip_tags to eliminate what I don't want. All I need to do is to isolate the paragraphs from the rest of the source. I suppose a more specific, accurate requirement would be this: Return only paragraphs (and their html contents) that aren't contained in tables, and only before the first h1 tag

    Read the article

  • C# regular expression

    - by vert
    How would I write a regular expression (C#) which will check a given string to see if any of its characters are characters OTHER than the following: a-z A-Z Æ æ Å å Ø ø - ' Thanks!

    Read the article

  • Is there a way to get the PREMATCH ($`) and POSTMATCH ($') from pcrecpp?

    - by Eric Peers
    Is there a way to obtain the C++ equivalent of Perl's PREMATCH ($`) and POSTMATCH ($') from pcrecpp? I would be happy with a string, a char *, or pairs indices/startpos+length that point at this. StringPiece seems like it might accomplish part of this, but I'm not certain how to get it. in perl: $_ = "Hello world"; if (/lo\s/) { $pre = $`; #should be "Hel" $post = $'; #should be "world" } in C++ I would have something like: string mystr = "Hello world"; //do I need to map this in a StringPiece? if (pcrecpp::RE("lo\s").PartialMatch(mystr)) { //should I use Consume or FindAndConsume? //What should I do here to get pre+post matches??? } pcre plainjane c seems to have the ability to return the vector with the matches including the "end" portion of the string, so I could theoretically extract such a pre/post variable, but that seems like a lot of work. I like the simplicty of the pcrecpp interface. Suggestions? Thanks! --Eric

    Read the article

  • regular expression and escaping

    - by pstanton
    Sorry if this has been asked, my search brought up many off topic posts. I'm trying to convert wildcards from a user defined search string (wildcard is "*") to postgresql like wildcard "%". I'd like to handle escaping so that "%" => "\%" and "\*" => "*" I know i could replace \* with something else prior to replacing * and then swap it back, but i'd prefer not to and instead only convert * using a pattern that selects it when not proceeded by \. String convertWildcard(String like) { like = like.replaceAll("%", "\\%"); like = like.replaceAll("\\*", "%"); return like; } Assert.assertEquals("%", convertWildcard("*")); Assert.assertEquals("\%", convertWildcard("%")); Assert.assertEquals("*", convertWildcard("\*")); // FAIL Assert.assertEquals("a%b", convertWildcard("a*b")); Assert.assertEquals("a\%b", convertWildcard("a%b")); Assert.assertEquals("a*b", convertWildcard("a\*b")); // FAIL ideas welcome.

    Read the article

  • How to find which delimiter was used during string split (VB.NET)

    - by typoknig
    Hi all, lets say I have a string that I want to split based on several characters, like ".", "!", and "?". How do I figure out which one of those characters split my string so I can add that same character back on to the end of the split segments in question? Dim linePunctuation as Integer = 0 Dim myString As String = "some text. with punctuation! in it?" For i = 1 To Len(myString) If Mid$(entireFile, i, 1) = "." Then linePunctuation += 1 Next For i = 1 To Len(myString) If Mid$(entireFile, i, 1) = "!" Then linePunctuation += 1 Next For i = 1 To Len(myString) If Mid$(entireFile, i, 1) = "?" Then linePunctuation += 1 Next Dim delimiters(3) As Char delimiters(0) = "." delimiters(1) = "!" delimiters(2) = "?" currentLineSplit = myString.Split(delimiters) Dim sentenceArray(linePunctuation) As String Dim count As Integer = 0 While linePunctuation > 0 sentenceArray(count) = currentLineSplit(count)'Here I want to add what ever delimiter was used to make the split back onto the string before it is stored in the array.' count += 1 linePunctuation -= 1 End While

    Read the article

  • Help with Regular Expression

    - by shivesh
    Hello I need help with Regular Expression, I want to match each section (number and it's text - 2 groups), the text can be multi line, each section ends when another section starts (another number) or when .END is reached or EOF. Demo Expression: \(\d{1,3}\) ([\s\S]*?)(\.END|\(\d{1,3}\)) Input text: (1) some text some text some text some text some text some text (2) some text some textsome text (3) some textsome text some textsome textsome text (4) some text .END first group should match number (with brackets) and second group should match corresponded text.

    Read the article

  • preg_replace only replaces first occurrence then skips to next line

    - by Dom
    Got a problem where preg_replace only replaces the first match it finds then jumps to the next line and skips the remaining parts on the same line that I also want to be replaced. What I do is that I read a CSS file that sometimes have multiple "url(media/pic.gif)" on a row and replace "media/pic.gif" (the file is then saved as a copy with the replaced parts). The content of the CSS file is put into the variable $resource_content: $resource_content = preg_replace('#(url\((\'|")?)(.*)((\'|")?\))#i', '${1}'.url::base(FALSE).'${3}'.'${4}', $resource_content); Does anyone know a solution for why it only replaces the first match per line?

    Read the article

  • sed - trying to replace first occurrence after a match

    - by wakkaluba
    I am facing a situation that drives me nuts. I am setting up an update server which uses a json file. Don't ask why or how, it sucks and is my only possibility to achieve it. I have been trying and researching for HOURS (many) because I went ballistic and wanted to crack this on my own. But I have to realize I got stuck and need help. So sorry for this chunk but I think it is somewhat important to see... The file is a one liner and repeating the following sequence with changing values (of course). "plugin_name_foo_bar": {"buildDate": "bla", "dependencies": [{"name": "bla", "optional": true, "version": "1.00"}], "developers": [{"developerId": "bla", "email": "[email protected]", "name": "Bla bla2nd"}], "excerpt": "some text {excerpt} !bla.png|thumbnail,border=1! ", "gav": "bla", "labels": ["report", "scm-related"], "name": "plugin_name_foo_bar", "previousTimestamp": "bla", "previousVersion": "1.0", "releaseTimestamp": "bla", "requiredCore": "1", "scm": "github.com", "sha1": "ynnBM2jWo25ZLDdP3ybBOnV/Pio=", "title": "bla", "url": "http://bla.org", "version": "1.0", "wiki": "https://bla.org"}, "Exclusion": {"buildDate": "bla", "dependencies": [], and the next plugin block is glued straight afterwards. What I now want to do is to search for "plugin_foo_bar": {" as this is the unique identifier for a new plugin description block. I want to replace the first sha1 value occuring afterwards. That's where I keep failing. I always grab the first,last or any occurrence in the entire file and not the block :( "title" is the unique identifier after the sha1 value. So I tried to make the .* less greedy but it ain't working out. last attempt was heading towards: sed -i 's/("name": "plugin_name_foo_bar.*sha1": ")([a-zA-Z0-9!@#\$%^&*()\[\]]*)(", "title"\)/\1blablabla\2/1' default.json to find the sha1 value of that plugin but still no joy. I hope someone knows - preferably a simpler approach - before I now continue with trial and error until I have to puke and freakout. I am working with SED on Windows, so Unix approach might help me to figure out how to achieve this in batch but please make it as one-liner if possible. Scripts are a real pain to convert. And I just need SED and no other solution with other tools like AWK. That is absolutely out of discussion. Any help is appreciated :) Cheers Jan

    Read the article

< Previous Page | 104 105 106 107 108 109 110 111 112 113 114 115  | Next Page >