Search Results

Search found 3825 results on 153 pages for 'regex negation'.

Page 103/153 | < Previous Page | 99 100 101 102 103 104 105 106 107 108 109 110  | Next Page >

  • Writing a PHP web crawler using cron

    - by Horse
    Hi all, I have written a web crawler using simplehtmldom, and the crawl process is working quite nicely. It crawls the start page, adds all links to a database table, sets a session pointer, and meta-refreshes the page to carry on to the next page. That keeps going until it runs out of links. This works fine, but the crawl time for larger websites is obviously pretty tedious. I would like to speed things up a bit, and possibly make it a cron job. Any ideas on making it as quick and efficient as possible, other than setting the memory limit / execution time higher?

    Read the article

  • Regular expressions in a Python find-and-replace script?

    - by Haidon
    I'm new to Python scripting, so please forgive me in advance if the answer to this question seems inherently obvious. I'm trying to put together a large-scale find-and-replace script using Python. I'm using code similar to the following: findreplace = [ ('term1', 'term2'), ] inF = open(infile,'rb') s=unicode(inF.read(),charenc) inF.close() for couple in findreplace: outtext=s.replace(couple[0],couple[1]) s=outtext outF = open(outFile,'wb') outF.write(outtext.encode('utf-8')) outF.close() How would I go about having the script do a find and replace for regular expressions? Specifically, I want it to find some information (metadata) specified at the top of a text file. Eg: Title: This is the title Author: This is the author Date: This is the date and convert it into LaTeX format. Eg: \title{This is the title} \author{This is the author} \date{This is the date} Maybe I'm tackling this the wrong way. If there's a better way than regular expressions please let me know! Thanks!
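
    One way to approach the metadata conversion described above is a single multiline pattern with a replacement function. This is a minimal sketch for Python 3, not the asker's script: the FIELDS mapping and the headers_to_latex name are assumptions introduced for illustration, and only the three header fields from the question are handled.

```python
import re

# Hypothetical mapping from header field names to LaTeX command names.
FIELDS = {"Title": "title", "Author": "author", "Date": "date"}

# Matches "Field: value" at the start of any line (MULTILINE makes ^ and $ per-line).
HEADER = re.compile(r"^(Title|Author|Date):[ \t]*(.*)$", re.MULTILINE)


def headers_to_latex(text):
    """Rewrite 'Field: value' header lines as LaTeX commands."""
    def repl(match):
        command = FIELDS[match.group(1)]
        # A replacement function's return value is inserted literally,
        # so the backslash needs no special template escaping.
        return "\\%s{%s}" % (command, match.group(2).strip())
    return HEADER.sub(repl, text)


sample = "Title: This is the title\nAuthor: This is the author\nDate: This is the date\n"
print(headers_to_latex(sample))
```

    Using a function as the replacement sidesteps the escaping rules that apply to backslashes in replacement strings.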

    Read the article

  • parse string with regular expression

    - by llamerr
    I am trying to parse this string: $right = '34601)S(1,6)[2] - 34601)(11)[2] + 34601)(3)[2,4]'; with the following regexp: const word = '(\d{3}\d{2}\)S{0,1}\([^\)]*\)S{0,1}\[[^\]]*\])'; preg_match('/'.word.'{1}(?:\s{1}([+-]{1})\s{1}'.word.'){0,}/', $right, $matches); print_r($matches); I want it to return an array like this: Array ( [0] => 34601)S(1,6)[2] - 34601)(11)[2] + 34601)(3)[2,4] [1] => 34601)S(1,6)[2] [2] => - [3] => 34601)(11)[2] [4] => + [5] => 34601)(3)[2,4] ) but I only get the following: Array ( [0] => 34601)S(1,6)[2] - 34601)(11)[2] + 34601)(3)[2,4] [1] => 34601)S(1,6)[2] [2] => + [3] => 34601)(3)[2,4] ) I think it's because of [^)]* or [^]]* in word, but how should I correct the regexp so that it matches in another way? I tried to make it more specific with \d+(?:[,#]\d+){0,} so word becomes const word = '(\d{3}\d{2}\)S{0,1}\(\d+(?:[,#]\d+){0,}\)S{0,1}\[\d+(?:[,#]\d+){0,}\])'; but that matches nothing.
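
    The missing entries above are consistent with how a capture group inside a repeated group behaves: it keeps only the text from its last iteration. A sketch of an alternative, written in Python rather than PHP, splits on the operators and validates each term separately. The TERM pattern is a simplified reading of the asker's word pattern, and split_terms is a hypothetical helper name.

```python
import re

# One term: five digits, ")", optional "S", "(...)", optional "S", "[...]".
TERM = re.compile(r"\d{5}\)S?\([^)]*\)S?\[[^\]]*\]")


def split_terms(expr):
    """Return terms and operators in order, e.g. [term, op, term, op, term]."""
    # A capturing group in the separator makes re.split keep the operators.
    parts = re.split(r"\s([+-])\s", expr)
    for term in parts[0::2]:  # even positions hold the terms
        if not TERM.fullmatch(term):
            raise ValueError("unexpected term: %r" % term)
    return parts


right = "34601)S(1,6)[2] - 34601)(11)[2] + 34601)(3)[2,4]"
print(split_terms(right))
```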

    Read the article

  • regexp target last main li in list

    - by veilig
    I need to target the starting tag of the last top level LI in a list that may or may-not contain sublists in various positions - without using CSS or Javascript. Is there a simple/elegant regexp that can help with this? I'm no guru w/ them, but it appears the need for greedy/non-greedy selectors when I'm selecting all the middle text (.*) / (.+) changes as nested lists are added and moved around in the list - and this is throwing me off. $pattern = '/^(<ul>.*)<li>(.+<\/li><\/ul>)$/'; $replacement = '$1<li id="lastLi">$3'; Perhaps there is an easier approach?? converting to XML to target the LI and then convert back? ie: Single Element <ul> <li>TARGET</li> </ul> Multiple Elements <ul> <li>foo</li> <li>TARGET</li> </ul> Nested Lists before end <ul> <li> foo <ul> <li>bar</li> </ul> <li> <li>TARGET</li> </ul> Nested List at end <ul> <li>foo</li> <li> TARGET <ul> <li>bar</li> </ul> </li> </ul>

    Read the article

  • javascript split() array contains

    - by Mahesha999
    While learning JavaScript, I did not understand the output when printing the array returned by the String.split() method (with a regular expression as the argument), as explained below. var colorString = "red,blue,green,yellow"; var colors = colorString.split(/[^\,]+/); document.write(colors); //this prints seven commas: ,,,,,,, However, when I print the individual elements of the array colors, I get an empty string, three commas and an empty string: document.write(colors[0]); //empty string document.write(colors[1]); //, document.write(colors[2]); //, document.write(colors[3]); //, document.write(colors[4]); //empty string document.write(colors[5]); //undefined document.write(colors[6]); //undefined Why, then, does printing the array directly give seven commas? Although I think it's correct to have three commas in the second output, I did not understand why there is an empty string at the start (index 0) and at the end (index 4). Please explain; I am confused here.
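
    The same behaviour can be reproduced in Python, which may make the counting easier to follow. This is an analogue of the JavaScript above, not the original code: the pattern matches the words, so the words act as separators and the commas are what remains, with an empty string before the first word and after the last one. Joining those five elements with commas (which is what printing a JavaScript array does implicitly) adds four more commas to the three already present.

```python
import re

color_string = "red,blue,green,yellow"

# The words are the separators here, so the pieces between them are the commas.
parts = re.split(r"[^,]+", color_string)
print(parts)            # ['', ',', ',', ',', '']

# Joining five elements inserts four separators: 3 commas + 4 separators = 7.
joined = ",".join(parts)
print(joined)           # ,,,,,,,
```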

    Read the article

  • PHP: Regular Expression to get a URL from a string

    - by Matthew Iselin
    I'm working on some PHP code which takes input from various sources and needs to find the URLs and save them somewhere. The kind of input that needs to be handled is as follows: http://www.youtube.com/watch?v=IY2j_GPIqRA Try google: http://google.com! (note exclamation mark is not part of the URL) Is http://somesite.com/ down for anyone else? Output: http://www.youtube.com/watch?v=IY2j_GPIqRA http://google.com http://somesite.com/ I've already borrowed one regular expression from the internet which works, but unfortunately wipes the query string out - not good! Any help putting together a regular expression, or perhaps another solution to this problem, would be appreciated.
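
    A rough sketch of one possible approach, in Python rather than PHP: match everything up to the next whitespace, then trim punctuation that is unlikely to belong to the URL. The TRAILING set and the find_urls name are assumptions made for illustration, and the pattern is deliberately loose rather than a full URL grammar.

```python
import re

# Scheme followed by any run of characters that cannot appear unescaped in a URL.
URL = re.compile(r"https?://[^\s<>\"']+")

# Assumed set of sentence punctuation to strip from the end of a match.
TRAILING = ".,;:!?)"


def find_urls(text):
    """Return URLs found in text, keeping query strings intact."""
    return [m.group(0).rstrip(TRAILING) for m in URL.finditer(text)]


for line in (
    "http://www.youtube.com/watch?v=IY2j_GPIqRA",
    "Try google: http://google.com! (note exclamation mark is not part of the URL)",
    "Is http://somesite.com/ down for anyone else?",
):
    print(find_urls(line))
```

    Because the query string is part of the non-whitespace run, it survives; only trailing punctuation is removed.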

    Read the article

  • Reading a line backwards

    - by Jimmy
    Hi, I'm using a regular expression to count the total spaces in a line (first occurrence): match(/^\s*/)[0].length; However, this reads the line from start to end. How can I read it from end to start? Thanks
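
    Assuming the goal is to count the whitespace at the end of the line instead of the start, anchoring the pattern with $ rather than ^ is one option. The sketch below is in Python; trailing_space_count is a hypothetical name. The same pattern, /\s*$/, is also valid JavaScript.

```python
import re


def trailing_space_count(line):
    """Count whitespace characters at the end of the line."""
    # \s*$ always matches (possibly the empty string), so search never returns None.
    return len(re.search(r"\s*$", line).group(0))


print(trailing_space_count("  abc   "))
print(trailing_space_count("abc"))
```

    Without a regular expression, len(line) - len(line.rstrip()) gives the same count.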

    Read the article

  • glibc regexp performance

    - by Jack
    Does anyone have experience measuring the performance of the glibc regexp functions? Are there any generic tests I need to run to make such measurements (in addition to testing the exact patterns I intend to search)? Thanks.

    Read the article

  • multiline sed using backreferences...

    - by pagid
    Hi, I'm converting patch scripts using a command-line script. Within these scripts there is a combination of two lines like: --- /dev/null +++ filename.txt which needs to be converted to: --- filename.txt +++ filename.txt Initially I tried: less file.diff | sed -e "s/---\/dev\null\n+++ \(.*\)/--- \1\n+++ \1/" But I found out that multiline handling is much more complex in sed :( Any help is appreciated...
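
    If a scripting language is an option, the two-line rewrite is straightforward once the whole file is read into one string. A minimal Python sketch follows, offered as an alternative and not as a sed solution; fix_headers is a hypothetical name.

```python
import re

# A "--- /dev/null" line immediately followed by a "+++ <name>" line.
PAIR = re.compile(r"^--- /dev/null\n\+\+\+ (.*)$", re.MULTILINE)


def fix_headers(diff_text):
    """Replace /dev/null in the '---' line with the file name from the '+++' line."""
    return PAIR.sub(lambda m: "--- %s\n+++ %s" % (m.group(1), m.group(1)), diff_text)


sample = "--- /dev/null\n+++ filename.txt\n@@ -0,0 +1 @@\n+hello\n"
print(fix_headers(sample))
```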

    Read the article

  • Correct syntax for matching a string inside a variable against an array

    - by Jamex
    Hi, I have a variable, $var, that contains a string of characters. It is a dynamic variable that contains values from inputs: $var could be 'abc', or $var could be 'blu'. I want to match the string inside the variable against an array and return all the matches. $array = array("blue", "red", "green"); What is the correct syntax for writing this in PHP? My rough code is below: $match = preg_grep($var, $array); (incorrect syntax of course) I tried adding quotes and escaping slashes, but so far no luck. Any suggestions? TIA
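
    The general idea is to escape the input and build the pattern from it. In PHP that means wrapping the escaped value in delimiters, for example '/' . preg_quote($var, '/') . '/'. The sketch below shows the same idea in Python, where re.escape plays the role of preg_quote; grep_list is a hypothetical name.

```python
import re


def grep_list(needle, items):
    """Return the items that contain needle, treating needle as literal text."""
    # Escaping keeps characters such as "." or "(" in the input from acting as regex syntax.
    pattern = re.compile(re.escape(needle))
    return [item for item in items if pattern.search(item)]


colors = ["blue", "red", "green"]
print(grep_list("blu", colors))
print(grep_list("abc", colors))
```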

    Read the article

  • validation of special characters

    - by jpallavi
    I want to validate login name with special characters !@#S%^*()+_-?/<:"';. space using regular expression in ruby on rails. These special characters should not be acceptable. What is the code for that? Thanks, Pallavi

    Read the article

  • replace <br> to new line between pre tag

    - by saturngod
    I want to convert <p>Code is following</p> <pre> &lt;html&gt;<br>&lt;/html&gt; </pre> to <p>Code is following</p> <pre> &lt;html&gt; &lt;/html&gt; </pre> I don't know how to write regular expression for replace between pre tag in PHP. I tried this code http://stackoverflow.com/questions/1517102/replace-newlines-with-br-tags-but-only-inside-pre-tags but it's not working for me.
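
    One way to limit a replacement to the inside of pre blocks is a two-step substitution: match each block, then replace only within the matched text. The sketch below is in Python rather than PHP (PHP's preg_replace_callback supports the same structure) and assumes pre blocks are not nested; br_to_newline_in_pre is a hypothetical name.

```python
import re

# Opening tag, content, closing tag of each <pre> block (non-greedy, spans lines).
PRE = re.compile(r"(<pre\b[^>]*>)(.*?)(</pre>)", re.DOTALL | re.IGNORECASE)
BR = re.compile(r"<br\s*/?>", re.IGNORECASE)


def br_to_newline_in_pre(html):
    """Replace <br> with a newline, but only inside <pre>...</pre>."""
    def repl(match):
        return match.group(1) + BR.sub("\n", match.group(2)) + match.group(3)
    return PRE.sub(repl, html)


sample = "<p>Code is following</p> <pre> &lt;html&gt;<br>&lt;/html&gt; </pre>"
print(br_to_newline_in_pre(sample))
```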

    Read the article

  • Square Brackets in Python Regular Expressions (re.sub)

    - by user1479984
    I'm migrating wiki pages from the FlexWiki engine to the FOSwiki engine using Python regular expressions to handle the differences between the two engines' markup languages. The FlexWiki markup and the FOSwiki markup, for reference. Most of the conversion works very well, except when I try to convert the renamed links. Both wikis support renamed links in their markup. For example, Flexwiki uses: "Link To Wikipedia":[http://www.wikipedia.org/] FOSwiki uses: [[http://www.wikipedia.org/][Link To Wikipedia]] both of which produce something that looks like I'm using the regular expression renameLink = re.compile ("\"(?P<linkName>[^\"]+)\":\[(?P<linkTarget>[^\[\]]+)\]") to parse out the link elements from the FlexWiki markup, which after running through something like "Link Name":[LinkTarget] is reliably producing groups <linkName> = Link Name <linkTarget = LinkTarget My issue occurs when I try to use re.sub to insert the parsed content into the FOSwiki markup. My experience with regular expressions isn't anything to write home about, but I'm under the impression that, given the groups <linkName> = Link Name <linkTarget = LinkTarget a line like line = renameLink.sub ( "[[\g<linkTarget>][\g<linkName>]]" , line ) should produce [[LinkTarget][Link Name]] However, in the output to the text files I'm getting [[LinkTarget [[Link Name]] which breaks the renamed links. After a little bit of fiddling I managed a workaround, where line = renameLink.sub ( "[[\g<linkTarget>][ [\g<linkName>]]" , line ) produces [[LinkTarget][ [[Link Name]] which, when displayed in FOSwiki looks like <[[Link Name> <--- Which WORKS, but isn't very pretty. I've also tried line = renameLink.sub ( "[[\g<linkTarget>]" + "[\g<linkName>]]" , line ) which is producing [[linkTarget [[linkName]] There are probably thousands of instances of these renamed links in the pages I'm trying to convert, so fixing it by hand isn't any good. 
    For the record I've run the script under Python 2.5.4 and Python 2.7.3, and gotten the same results. Am I missing something really obvious with the syntax? Or is there an easy workaround?
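
    Run in isolation, the substitution described above appears to produce the expected markup, which suggests (as an assumption, since the rest of the conversion script is not shown) that some other rule applied before or after it is rewriting the brackets. A minimal check, using raw strings so the backslashes reach the re module unchanged; convert is a hypothetical wrapper name.

```python
import re

# Same pattern as in the question, written as a raw string.
rename_link = re.compile(r'"(?P<linkName>[^"]+)":\[(?P<linkTarget>[^\[\]]+)\]')


def convert(line):
    """Rewrite a FlexWiki renamed link as a FOSwiki renamed link."""
    # Square brackets have no special meaning in a replacement template.
    return rename_link.sub(r"[[\g<linkTarget>][\g<linkName>]]", line)


print(convert('"Link To Wikipedia":[http://www.wikipedia.org/]'))
```

    Testing each conversion rule on its own like this can help narrow down which one introduces the extra brackets.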

    Read the article

  • JavaScript Regular expressions, match and replace link

    - by Thoman
    Hello, please help me. I have: <html> <body> http://domainname.com/abc/xyz.zip http://domainname2.com/abc/xyz.zip </body> </html> I want to replace each URL with a link, so that the output looks like: <html> <body> <a href="http://domainname.com/abc/xyz.zip">http://domainname.com/abc/xyz.zip</a> <a href="http://domainname2.com/abc/xyz.zip">http://domainname2.com/abc/xyz.zip</a> </body> </html> Many thanks

    Read the article

  • Way to partially match a Ruby string using Regexp

    - by Fabiano PS
    I'm working on 2 cases. Assume I have these variables: a = "hello" b = "hello-SP" c = "not_hello" 1 - Any partial match: I want to accept any string that has the variable a inside, so b and c would match. 2 - Patterned match: I want to match a string that has a inside, followed by '-', so b would match and c would not. I am having a problem because I have always used the /expression/ syntax to define a Regexp, so how do I dynamically define a Regexp in Ruby?

    Read the article

  • Find and replace braced tags within a MySQL table

    - by Cy
    I have about 40000 records in a table that contains plain text, and within the plain text there are tags whose only common characteristic is that they are enclosed in [ ], like this: [caption id="attachment_2948" align="alignnone" width="480" caption="the caption goes here"] How could I remove those (replace them with nothing)? I could also run a PHP program if necessary to do the cleanup.
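
    If the cleanup is done in a script rather than in SQL, a character-class pattern that forbids brackets inside the tag is usually enough. A minimal sketch in Python, assuming tags are never nested and that no legitimate text in the records uses square brackets; strip_bracket_tags is a hypothetical name.

```python
import re

# "[" followed by anything except brackets, then "]".
TAG = re.compile(r"\[[^\[\]]*\]")


def strip_bracket_tags(text):
    """Remove every [ ... ] tag from the text."""
    return TAG.sub("", text)


sample = 'before [caption id="attachment_2948" align="alignnone" width="480" caption="the caption goes here"] after'
print(strip_bracket_tags(sample))
```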

    Read the article

  • Filter list of phone numbers using php

    - by LiveEn
    I have a list of phone numbers that start with the below numbers and in different formats...i need to grab the numbers that start only with the below numbers/format using php...... 020 8 07974 +44 (0) 20 +44 0 440203 any help will be appreciated..

    Read the article

  • How to grep lines having specific format.

    - by Nitin
    I have got a file with following format. 1234, 'US', 'IN',...... 324, 'US', 'IN',...... ... ... 53434, 'UK', 'XX', .... ... ... 253, 'IN', 'UP',.... 253, 'IN', 'MH',.... Here I want to extract only those lines having 'IN' as 2nd keyword. i.e. 253, 'IN', 'UP',.... 253, 'IN', 'MH',.... Can any one please tell me a command to grep it.
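
    The key is to anchor the pattern at the start of the line so that 'IN' is only accepted in the second field. The sketch below expresses that in Python; the pattern assumes the first field contains no commas, and select_in_lines is a hypothetical name.

```python
import re

# Start of line, first field (no commas), a comma, then 'IN' as the second field.
SECOND_IS_IN = re.compile(r"^\s*[^,]*,\s*'IN'\s*,")


def select_in_lines(lines):
    """Keep only the lines whose second field is 'IN'."""
    return [line for line in lines if SECOND_IS_IN.match(line)]


rows = [
    "1234, 'US', 'IN',......",
    "53434, 'UK', 'XX', ....",
    "253, 'IN', 'UP',....",
    "253, 'IN', 'MH',....",
]
print(select_in_lines(rows))
```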

    Read the article

  • Regular Expression Pattern for C# with matches

    - by Sumit Gupta
    I am working on a project where I need to find frequencies in a given text. I wrote a regular expression that tries to detect a frequency, but I am stuck on how C# handles it and how exactly to use it in my software. My regular expression is (\d*)(([,\.]?\s*((k|m)?hz)*)|(\s*((k|m)?hz)*))$ and I am trying to find values in: 23,2 Hz 24,4Hz 25,0 Hzsadf 26 Hz 27Khz 28hzzhzhzhdhdwe 29 30.4Hz 31.8 Hz 4343.34.234 Khz 65SD Further explanation: the system needs to work for US and Belgian cultures, hence 23.2 (US) = 23,2 (Be). I try to find a digit followed by either khz, mhz, hz, a space, "," or ".". If it is "," or ".", then it should be followed by another digit and then khz, mhz or hz. Any help is appreciated.

    Read the article

  • How to avoid resetting the java Scanner position

    - by Derek
    I have some code that looks more or less like this: while(scanner.hasNext()) { if(scanner.findInLine("Test") !=null) { //do some things }else{ scanner.nextLine(); } } I am using this to parse a ~10MB text file. The problem is, if I put a breakpoint on the while() and the scanner.nextLine(), I can see that sometimes the scanner's position (in the debug window) goes back to zero. I think this is causing some kind of loop blow-up, because the regex in findInLine() starts at zero, looks through some amount of text, advancing the position, and then the position randomly gets set back to zero, so it has to re-parse all that text again. Any ideas what could be causing that? Am I even doing this the right way? Thanks. Some additional info: the Scanner is instantiated from an InputStream. After doing some debugging, it appears that there is a HeapCharBuffer that Scanner uses, and it only allows 1024 characters at a time and then resets. Is there a way to avoid this, or to do things differently? That seems like a small number of characters to be able to scan. Derek

    Read the article

  • string substitution regular expression not working in tcl

    - by Puneet Mittal
    I am trying to replace all special characters, including white space, hyphens, etc., with underscores in a string variable in Tcl. I wrote the code below but it doesn't seem to be working. set varname $origVar puts "Variable Name :>> $varname" if {$varname != ""} { regsub -all {[\s-\]\[$^?+*()|\\%&#]} $varname "_" $newVar } puts "New Variable :>> $newVar" One issue is that, instead of replacing the string in $varname, it replaces the data inside $origVar. I have no idea why. Also, I read the example code (for the proper syntax) in my Tcl book, and according to that it should be something like this: regsub -all {[\s-][$^?+*()|\\%&#]} $varname "_" newVar So I used the same syntax, but it didn't work and gave the same result: it modified $origVar instead of the required $varname value.

    Read the article

  • Using `rack-rewrite` to Remove the Month and Date from a Permalink

    - by Bryan Veloso
    I've started the process of moving my blog to Octopress, but unfortunately a limitation of Jekyll doesn't allow me to use abbreviated month names in my permalinks. Therefore I'm looking to get rid of the month and day bits altogether. I've read in this article that you can use rack-rewrite to take care of the redirection, since I am using Heroku to host this. So how would I turn this: example.com/journal/2012/jan/03/post-of-the-day/ into this: example.com/journal/2012/post-of-the-day/ Extra points: if I had another rule that redirected /blog/ to /journal/, would that rule still adhere to the above one as well? So from this: example.com/blog/2012/jan/03/post-of-the-day/ to this: example.com/journal/2012/jan/03/post-of-the-day/ and finally to: example.com/journal/2012/post-of-the-day/ Thanks for the assistance in advance. :)

    Read the article

  • How do I process the largest match first in PHP?

    - by animuson
    Ok, so I tried searching around first but I didn't exactly know how to word this question or a search phrase. Let me explain. I have data that looks like this: <!-- data:start --> <!-- 0:start --> <!-- 0:start -->0,9<!-- 0:stop --> <!-- 1:start -->0,0<!-- 1:stop --> <!-- 2:start -->9,0<!-- 2:stop --> <!-- 3:start -->9,9<!-- 3:stop --> <!-- 4:start -->0,9<!-- 4:stop --> <!-- 0:stop --> <!-- 1:start --> <!-- 0:start -->1,5<!-- 0:stop --> <!-- 1:start -->1,6<!-- 1:stop --> <!-- 2:start -->3,6<!-- 2:stop --> <!-- 3:start -->3,8<!-- 3:stop --> <!-- 4:start -->4,8<!-- 4:stop --> <!-- 1:stop --> <!-- 2:start --> <!-- 0:start -->0,7<!-- 0:stop --> <!-- 1:start -->1,7<!-- 1:stop --> <!-- 2:stop --> <!-- data:stop --> So it's basically a bunch of points. Here is the code I'm currently using to try and parse it so that it would create an array like so: Array ( 0 => Array ( 0 => "0,9", 1 => "0,0", 2 => "9,0", 3 => "9,9", 4 => "0,9" ), 1 => Array ( 0 => "1,5", 1 => "1,6", 2 => "3,6", 3 => "3,8", 4 => "4,8" ), 2 => Array ( 0 => "0,7", 1 => "1,7" ) ) However, it is returning an array that looks like this: Array ( 0 => "0,9", 1 => "0,0", 2 => "9,0" ) Viewing the larger array that I have on my screen, you see that it's setting the first instance of that variable when matching. So how do I get it to find the widest match first and then process the insides. Here is the function I am currently using: function explosion($text) { $number = preg_match_all("/(<!-- ([\w]+):start -->)\n?(.*?)\n?(<!-- \\2:stop -->)/s", $text, $matches, PREG_SET_ORDER); if ($number == 0) return $text; else unset($item); foreach ($matches as $item) if (empty($data[$item[2]])) $data[$item[2]] = $this->explosion($item[3]); return $data; } I'm sure it will be something stupid and simple that I've overlooked, but that just makes it an easy answer for you I suppose.
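
    With the data shown above, a lazy (.*?) followed by a backreference to the block number stops at the first matching stop marker, and for the outer block 0 that is the stop marker of the inner block 0. Regular expressions alone tend to handle this kind of nesting poorly, so one alternative is to use the regex only to find the markers and track nesting with a stack. The sketch below is in Python rather than PHP; parse_blocks is a hypothetical name, and keys are kept as strings.

```python
import re

MARKER = re.compile(r"<!-- (\w+):(start|stop) -->")


def parse_blocks(text):
    """Build nested dicts from start/stop markers; leaf blocks hold their text."""
    root = {}
    # Each frame: (block name, dict of child blocks, offset where its content begins).
    stack = [("", root, 0)]
    for m in MARKER.finditer(text):
        name, kind = m.groups()
        if kind == "start":
            stack.append((name, {}, m.end()))
            continue
        open_name, children, content_start = stack.pop()
        if open_name != name:
            raise ValueError("unbalanced marker: %s" % m.group(0))
        # A block with child blocks becomes a dict; otherwise keep its text.
        value = children if children else text[content_start:m.start()].strip()
        stack[-1][1][name] = value
    if len(stack) != 1:
        raise ValueError("unclosed block")
    return root


sample = (
    "<!-- data:start --> <!-- 0:start --> <!-- 0:start -->0,9<!-- 0:stop --> "
    "<!-- 1:start -->0,0<!-- 1:stop --> <!-- 0:stop --> <!-- 1:start --> "
    "<!-- 0:start -->1,5<!-- 0:stop --> <!-- 1:stop --> <!-- data:stop -->"
)
print(parse_blocks(sample)["data"])
```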

    Read the article

  • How would I create a VIM or Vi command to delete all text after a certain character for every line i

    - by Jason Down
    Scenario: I have a text file that has pipe (as in the "|" character) delimited data. Each field of data in the pipe delimited fields can be of variable length, so counting characters won't work (or using some sort of substring function... if that even exists in VIM). Is it possible, using VIM / Vi to delete all data from the second pipe to the end of the line for the entire file? There are approx 150,000 lines, so doing this manually would only be appealing to a masochist... e.g. Change the following lines from: 1111|random sized text 12345|more random data la la la|1111|abcde 2222|random sized text abcdefghijk|la la la la|2222|defgh 3333|random sized text|more random data|33333|ijklmnop to: 1111|random sized text 12345 2222|random sized text abcdefghijk 3333|random sized text I'm sure this can be done somehow... I hope. TIA UPDATE: I should have mentioned that I'm running this on Windows XP, so I don't have access to some of the mentioned *nix commands (CUT is not recognized on Windows).
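
    The underlying pattern is: capture everything up to the second pipe, then discard the rest of the line. The sketch below shows that pattern in Python, as an alternative for cases where a script is acceptable; it is not a Vim command, and keep_first_two_fields is a hypothetical name. Lines with fewer than two pipes are left untouched.

```python
import re

# First field, a pipe, second field (captured), then the second pipe and the rest.
AFTER_SECOND_PIPE = re.compile(r"^([^|\n]*\|[^|\n]*)\|.*$", re.MULTILINE)


def keep_first_two_fields(text):
    """Delete everything from the second pipe to the end of each line."""
    return AFTER_SECOND_PIPE.sub(r"\1", text)


sample = (
    "1111|random sized text 12345|more random data la la la|1111|abcde\n"
    "2222|random sized text abcdefghijk|la la la la|2222|defgh\n"
    "3333|random sized text|more random data|33333|ijklmnop\n"
)
print(keep_first_two_fields(sample))
```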

    Read the article
