regex for html - Page 200

regular expression remove comment tag

- by Thoman

I want to remove an HTML comment tag. Before: </div> Abc. After removal: </div> Abc.

Get main article image with PHP

- by PaulAdamDavis

Hello! I'd like to get the main image for an article, much like Facebook does when you post a link (but without the image-choosing part). The data we have to work with is the whole page's HTML as a variable. The page and URL will be different every time this function runs. Are there any libraries or classes that are particularly good at getting the main body of content, much like Instapaper, that would be of any help?

getting all of the image absolute path in a page?

- by ryanxu

I am trying to get the src of all of the images in a page, but some pages use absolute paths and some do not. What is the best way to do this? Right now I am using this:

    $imgsrc_regex = '#<\s*img [^\>]*src\s*=\s*(["\'])(.*?)\1#im';
    preg_match_all($imgsrc_regex, $html, $matches);
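
One possible approach, sketched in Python rather than PHP: let an HTML parser find the img tags and resolve each src against the page URL, so relative and absolute paths come out in the same form. The names ImgSrcCollector and image_urls, the sample markup and the example.com URLs are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImgSrcCollector(HTMLParser):
    """Collect the src of every <img> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url  # assumed: the URL the page was fetched from
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            # urljoin keeps absolute URLs as they are and resolves relative ones.
            self.sources.append(urljoin(self.base_url, src))


def image_urls(html, base_url):
    collector = ImgSrcCollector(base_url)
    collector.feed(html)
    return collector.sources


# Hypothetical sample page with one root-relative, one absolute and one relative src.
sample = ('<img src="/a.png"><IMG alt="x" src="http://cdn.example.com/b.jpg">'
          '<img src="pics/c.gif" />')
print(image_urls(sample, "http://example.com/blog/post.html"))
```

The same idea should carry over to PHP with a DOM parser plus a URL-resolving helper; the regex in the question only finds the attribute and does nothing about relative paths.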

php: trim br tags from the beginning of a string?

- by Thanos

I know that:

    preg_replace('/<br\s*\/?>/', '', $string);

will remove all br tags from $string. How can we remove <br>, <br/> and <br /> tags only if they are at the very beginning of $string? ($string in my case is HTML code with various tags.)
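
A minimal sketch of the anchoring idea, written in Python: anchor the pattern at the start of the string and repeat the br group, so only the leading run is removed. The name strip_leading_br is hypothetical, and treating whitespace around the leading tags as removable is an assumption.

```python
import re

# ^ anchors at the start; the group repeats so a run of leading tags is consumed.
# Assumption: whitespace before, between and right after the leading tags may go too.
LEADING_BR = re.compile(r"^(?:\s*<br\s*/?>)+\s*", re.IGNORECASE)


def strip_leading_br(html):
    """Remove <br>, <br/> and <br /> tags only at the very beginning of html."""
    return LEADING_BR.sub("", html, count=1)


print(strip_leading_br("<br><br/><br />Hello<br>world"))
```

The pattern itself uses nothing Python-specific, so it should also work as the first argument of preg_replace once it is wrapped in delimiters.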

CSS, HTML issue. How would I get the main body of the document to be a certain way down from the top?

- by orano10000

Basically I have a navbar and a title that both have the properties position: fixed; top: (VALUE I INSERTED); My problem is that when I write the main body of the document, the text ends up hidden behind the title/navbar. I need the text to start below the title and navbar, but without giving it a fixed position. If any more information with code is needed, just comment saying so. Thanks in advance.

Array with nested values. Display in ul list. php html.

- by btwong

I have a record set returned from a database that looks like this:

    id | level | lft | rgt | title
    ---------------------------------
    1  |       | 1   | 8   | title 1
    2  | -     | 2   | 5   | sub title 1-1
    3  | --    | 3   | 4   | sub sub title 1
    4  | -     | 6   | 7   | sub title 1-2
    5  |       | 9   | 12  | title 2
    6  | -     | 10  | 11  | sub title 2

As you can see, it's a hierarchy list with left and right values. I am trying to display this record set in a list with the correct indentation, so that it appears like this:

    Title 1
      Sub title 1-1
        Sub sub title
      sub title 1-2
    Title 2
      sub title 2

Any pointers on how to do this with the one record set? Or should I use multiple queries to display this?
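
The depth of each row can be derived from the lft/rgt values alone, so a single record set should be enough. A rough sketch in Python (not PHP), assuming the rows are available as (id, lft, rgt, title) tuples; with_depth is a hypothetical helper name.

```python
def with_depth(rows):
    """Yield (depth, title) for nested-set rows, visiting them in lft order."""
    open_rights = []  # rgt values of the ancestors of the current row
    for _id, lft, rgt, title in sorted(rows, key=lambda r: r[1]):
        # An ancestor whose rgt is smaller than this lft has already closed.
        while open_rights and open_rights[-1] < lft:
            open_rights.pop()
        yield len(open_rights), title
        open_rights.append(rgt)


# The rows from the question as (id, lft, rgt, title).
rows = [
    (1, 1, 8, "title 1"),
    (2, 2, 5, "sub title 1-1"),
    (3, 3, 4, "sub sub title 1"),
    (4, 6, 7, "sub title 1-2"),
    (5, 9, 12, "title 2"),
    (6, 10, 11, "sub title 2"),
]

for depth, title in with_depth(rows):
    print("  " * depth + title)
```

To emit nested ul/li markup instead of indented text, the same depth value can drive opening and closing tags whenever it rises or falls between consecutive rows.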

Forbidden Patterns Check-In Policy in TFS 2010

- by Jaxidian

I've been trying to use the Forbidden Patterns part of the TFS 2010 Power Tools and I'm just not understanding something: I simply cannot get anything to change when I use it! I'm using the version that was released recently (I believe April 23, 2010), so it's not an old version. First off, yes, I know it's regex based, so let's clear that doubt. I have tried to block the following scenarios:

1) I have modified all of my T4 EF templates to generate files named EntityName.gen.cs. I then attempted to prevent TFS from wanting to check those files in. I used the regular expression \.gen\.cs\z and it didn't change a single thing! I even tried it without the \z and nada!

2) I don't want app.config and web.config files to be checked in by default, because we store these settings in app.config.base and web.config.base files that our build scripts use to generate our per-environment app.config and web.config files. As such, I tried the following regexes and again, nothing worked: web\.config\z, app\.config\z, web\.release\.config\z and web\.debug\.config\z.

What is it that I am screwing up with this?

Mathematica regular expressions on unicode strings.

- by dreeves

This was a fascinating debugging experience. Can you spot the difference between the following two lines?

    StringReplace["–", RegularExpression@"[\\s\\S]" -> "abc"]
    StringReplace["-", RegularExpression@"[\\s\\S]" -> "abc"]

They do very different things when you evaluate them. It turns out it's because the string being replaced in the first line consists of a Unicode en dash, as opposed to a plain old ASCII dash in the second line. In the case of the Unicode string, the regular expression doesn't match. I meant the regex "[\s\S]" to mean "match any character (including newline)" but Mathematica apparently treats it as "match any ASCII character". How can I fix the regular expression so the first line above evaluates the same as the second? Alternatively, is there an asciify filter I can apply to the strings first?

PS: The Mathematica documentation says that its string pattern matching is built on top of the Perl-Compatible Regular Expressions library (http://pcre.org), so the problem I'm having may not be specific to Mathematica.

Passing string with (accidental) escape character loses character even though it's a raw string

- by Steen

I have a function with a Python doctest that fails because one of the test input strings has a backslash that's treated like an escape character, even though I've encoded the string as a raw string. My doctest looks like this:

    >>> infile = [ "Todo: fix me", "/** todo: fix", "* me", "*/", r"""//\todo stuff to fix""", "TODO fix me too", "toDo bug 4663" ]
    >>> find_todos( infile )
    ['fix me', 'fix', 'stuff to fix', 'fix me too', 'bug 4663']

And the function, which is intended to extract the todo texts from a single line following some variation over a todo specification, looks like this:

    todos = list()
    for line in infile:
        print line
        if todo_match_obj.search( line ):
            todos.append( todo_match_obj.search( line ).group( 'todo' ) )

And the regular expression called todo_match_obj is:

    r"""(?:/{0,2}\**\s?todo):?\s*(?P<todo>.+)"""

A quick conversation with my ipython shell gives me:

    In [35]: print "//\todo"
    // odo
    In [36]: print r"""//\todo"""
    //\todo

And, just in case the doctest implementation uses stdout (I haven't checked, sorry):

    In [37]: sys.stdout.write( r"""//\todo""" )
    //\todo

My regex-fu is not high by any standards, and I realize that I could be missing something here.

EDIT: Following Alex Martelli's answer, I would like suggestions on what regular expression would actually match the blasted r"""//\todo fix me""". I know that I did not originally ask for someone to do my homework, and I will accept Alex's answer as it really did answer my question (or confirm my fears). But I promise to upvote any good solutions to my problem here :)

I'm using Python 2.6.4 (r264:75706, Dec 7 2009, 18:45:15)

Thank you for reading this far (if you skipped directly down here, I understand).
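
A likely culprit, assuming the doctest sits in an ordinary (non-raw) docstring: the docstring is itself a string literal, so the \t is turned into a tab before doctest ever sees the example, whatever prefix the inner string carries. The sketch below uses Python 3 syntax rather than the 2.6 of the question, marks the docstring itself as raw, and assumes the pattern is compiled with re.IGNORECASE (the question does not show the flags).

```python
import re

# The pattern from the question; re.IGNORECASE is an assumption about its flags.
todo_match_obj = re.compile(r"(?:/{0,2}\**\s?todo):?\s*(?P<todo>.+)", re.IGNORECASE)


def find_todos(infile):
    r"""Return the todo text of every line that carries a todo marker.

    The r prefix on this docstring keeps the backslash intact for doctest.

    >>> find_todos([r"//\todo stuff to fix", "* me"])
    ['stuff to fix']
    """
    todos = []
    for line in infile:
        match = todo_match_obj.search(line)
        if match:
            todos.append(match.group("todo"))
    return todos


# The input list from the question.
infile = ["Todo: fix me", "/** todo: fix", "* me", "*/",
          r"""//\todo stuff to fix""", "TODO fix me too", "toDo bug 4663"]
print(find_todos(infile))
```

Under these assumptions the original pattern already copes with the backslash, because search() is free to start matching at the "todo" that follows it.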

Detecting Xml namespace fast

- by Anna Tjsoken

Hello there. This may be a very trivial problem I'm trying to solve, but I'm sure there's a better way of doing it, so please go easy on me. I have a bunch of XSD files that are internal to our application, and we have about 20-30 XML files that implement datasets based off those XSDs. Some XML files are small (<100Kb), others are about 3-4Mb, with a few being over 10Mb.

I need to find a way of working out what namespace these XML files are in, in order to provide (something like) IntelliSense based off the XSD. The implementation of this is not an issue; another developer has written the code for it. But I'm not sure what the best (and fastest!) way of detecting the namespace is without the use of XmlDocument (which does a full parse). I'm using C# 3.5 and the documents come through as a Stream (some are remote files). All the files are *.xml (I could detect it if it was extension based), but unfortunately the XML namespace is the only way. Right now I've tried XmlDocument, but I've found it to be inefficient and slow, as the larger documents have to be fully parsed first (even the 100Kb docs).

    public string GetNamespaceForDocument(Stream document);

Something like the above is my method signature; overloads include string for "content". Would a (compiled) RegEx pattern be good? How does Visual Studio manage this so efficiently? Another colleague has told me to find a fast XML parser in C/C++, parse the content and have a stub that gives back the namespace, as it's slower in .NET. Is this a good idea?
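
The general idea, sketched in Python rather than C#: use a streaming parser and stop at the first start-element event, so only the beginning of the stream is read no matter how large the document is. In .NET, XmlReader is the analogous forward-only reader. The function name namespace_of and the sample document are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET


def namespace_of(stream):
    """Return the namespace URI of the root element, or "" if it has none."""
    # iterparse reads the stream incrementally; we leave at the first start event.
    for _event, elem in ET.iterparse(stream, events=("start",)):
        tag = elem.tag  # ElementTree spells qualified names as "{uri}local"
        return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""
    return ""


# Hypothetical document; a real one could be many megabytes long.
doc = io.BytesIO(b'<root xmlns="urn:example:a"><child>1</child></root>')
print(namespace_of(doc))
```

This reports the namespace of the root element only, which is an assumption about where the identifying namespace lives in these files.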

Backreferences in lookbehind

- by polygenelubricants

Can you use backreferences in a lookbehind? Let's say I want to split wherever a character is repeated twice behind me.

    String REGEX1 = "(?<=(.)\\1)"; // DOESN'T WORK!
    String REGEX2 = "(?<=(?=(.)\\1)..)"; // WORKS!
    System.out.println(java.util.Arrays.toString(
        "Bazooka killed the poor aardvark (yummy!)"
        .split(REGEX2)
    )); // prints "[Bazoo, ka kill, ed the poo, r aa, rdvark (yumm, y!)]"

Using REGEX2 (where the backreference is in a lookahead nested inside a lookbehind) works, but REGEX1 gives this error at run-time:

    Look-behind group does not have an obvious maximum length near index 8
    (?<=(.)\1)
            ^

This sort of makes sense, I suppose, because in general the backreference can capture a string of any length (if the regex compiler were a bit smarter, though, it could determine that \1 is (.) in this case, and therefore has a finite length). So is there a way to use a backreference in a lookbehind? And if there isn't, can you always work around it using this nested lookahead? Are there other commonly used techniques?

Regex issue with comma's telling me there are 6 args, instead of intended 4

- by Azher

I have a scenario outline table that looks like the following:

    Scenario Outline: Verify Full ad details
      Given I am on the xxx classified home page
      And I have entered <headline> in the search field & clicked on search
      When I click on full details
      Then I should see <headline> <year> <mileage> <price> displaying correctly and successfully

      Examples:
        |headline       |year |mileage |price |
        |alfa romeo 166 |2005 |73,000  |6,990 |

When I run my scenario it reports that I have 6 args, but I thought I should only have 4 args: headline, year, mileage and price. I think it is taking each comma, and what is before and after it, as two separate args. Is there any way that I can make Cucumber think that there are only 4 args with the example above? I have looked at messing around with regex but I don't seem to be getting anywhere. Any help would be greatly appreciated.

String to array or Array to string tips on formats, etc

- by user316841

Hi, first of all thanks for taking your time! I'm a junior dev, working with PHP + MySQL. My issue: I'm saving data from a form to my database. From this form, only the contact details need to be saved: name, phone number, address. But it would be nice to have a small reference to the user's answers. Let's say for each question we've got a value between 1 and 4. Since only the personal contacts are needed, there's no need to create a table just for the answers. I'm thinking of recording each question/answer as a letter and its corresponding value, for example (A2, B1, C5, D3, etc).

My question is: is there a format I could easily handle afterwards, converting it to an array (string to array) in case the client changes their mind and asks for this data placed in table columns? Just to prevent this situation! Example: from (A2, B1, C5) to array( "A" => "2", "B" => "1", "C" => "5" ).

For now I guess regex is the answer, but it's always hard to figure out and I'm always getting into trouble =) Thanks!
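
As an illustration of how little parsing the "letter plus number" format needs, here is a sketch in Python (the PHP version would be analogous with preg_match_all). parse_answers and encode_answers are hypothetical names, and single upper-case letters as question keys are an assumption.

```python
import re


def parse_answers(encoded):
    """Turn 'A2, B1, C5' into {'A': 2, 'B': 1, 'C': 5}."""
    # Assumption: each question is one upper-case letter followed by its value.
    return {letter: int(value)
            for letter, value in re.findall(r"([A-Z])(\d+)", encoded)}


def encode_answers(answers):
    """Inverse of parse_answers, with the questions in alphabetical order."""
    return ", ".join(f"{letter}{value}" for letter, value in sorted(answers.items()))


print(parse_answers("A2, B1, C5"))
print(encode_answers({"C": 5, "A": 2, "B": 1}))
```

A ready-made serialization such as JSON would avoid the regex altogether; the compact form above mainly keeps the stored column short and readable.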

Extract a pattern from the output of curl

- by allentown

I would like to use curl, on the command line, to grab a URL, pipe it to a pattern, and return a list of URLs that match that pattern. I am running into problems with the greedy aspects of the pattern, and cannot seem to get past it. Any help on this would be appreciated.

    curl http://www.reddit.com/r/pics/ | grep -ioE "http://imgur\.com/.+(jpg|jpeg|gif|png)"

So: grab the data from the URL, which returns a mess of HTML, which may need some line breaks inserted somehow, unless the regex can return more than one match from a single line. The pattern is pretty simple. Any string that matches:

    starts with http://imgur.com/
    has A-Z a-z 0-9 (maybe some others) and is, so far, 5 chars long; 8 should cover it forever if I wanted to limit that aspect of the pattern, which I don't
    ends in a .graphic_file_format_extension (jpg, jpeg, gif, png)

That's about it. At that URL, with default settings, I should generally get back a good set of images. I would not object to using the RSS feed URL for the same page; it may actually be easier to parse. Thanks everyone!
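
The greedy `.+` is what runs past the first extension to the last one on the line. Replacing it with a character class that cannot cross the end of the image id removes the problem, as this Python sketch shows on a made-up HTML line (no network access; imgur_links and the sample links are hypothetical).

```python
import re

# [A-Za-z0-9]+ cannot run past the dot, so each link is matched on its own.
IMGUR_IMAGE = re.compile(r"http://imgur\.com/[A-Za-z0-9]+\.(?:jpe?g|gif|png)",
                         re.IGNORECASE)


def imgur_links(html):
    """Return every imgur image URL found in html, in order of appearance."""
    return IMGUR_IMAGE.findall(html)


# Hypothetical sample: several links on one line, as in minified HTML.
page = ('<a href="http://imgur.com/Ab3dE.jpg">x</a> '
        '<a href="http://imgur.com/zzz99.PNG">y</a> '
        '<a href="http://imgur.com/about">z</a>')
print(imgur_links(page))
```

The same class-based pattern should also work with the original pipeline, e.g. grep -ioE "http://imgur\.com/[a-z0-9]+\.(jpg|jpeg|gif|png)", since -o already prints every match on a line separately.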

Regular Expression doesn't match

- by dododedodonl

Hi all, I've got a string with very unclean HTML. Before I parse it, I want to convert this:

    <TABLE><TR><TD width="33%" nowrap=1><font size="1" face="Arial"> NE </font> </TD> <TD width="33%" nowrap=1><font size="1" face="Arial"> DEK </font> </TD> <TD width="33%" nowrap=1><font size="1" face="Arial"> 143 </font> </TD> </TR></TABLE>

into:

    NE DEK 143

so it is a bit easier to parse. I've got this regular expression (RegexKitLite):

    NSString *str = [dataString stringByReplacingOccurrencesOfRegex:@"<TABLE><TR><TD width=\"33%\" nowrap=1><font size=\"1\" face=\"Arial\">(.+?)<\\/font> <\\/TD>(.+?)<TD width=\"33%\" nowrap=1><font size=\"1\" face=\"Arial\">(.+?)<\\/font> <\\/TD>(.+?)<TD width=\"33%\" nowrap=1><font size=\"1\" face=\"Arial\">(.+?)<\\/font> <\\/TD>(.+?)<\\/TR><\\/TABLE>" withString:@"$1 $3 $5"];

I'm not an expert in regex. Can someone help me out here? Regards, dodo

Why does findstr not handle case properly (in some circumstances)?

- by paxdiablo

While writing some recent scripts in cmd.exe, I had a need to use findstr with regular expressions; the customer required standard cmd.exe commands (no GnuWin32, Cygwin, VBS or PowerShell). I just wanted to know if a variable contained any upper-case characters and attempted to use:

    > set myvar=abc
    > echo %myvar%|findstr /r "[A-Z]"
    abc
    > echo %errorlevel%
    0

When %myvar% is set to abc, that actually outputs the string and sets errorlevel to 0, saying that a match was found. However, the full-list variant:

    > echo %myvar%|findstr /r "[ABCDEFGHIJKLMNOPQRSTUVWXYZ]"
    > echo %errorlevel%
    1

does not output the line and it correctly sets errorlevel to 1. In addition:

    > echo %myvar%|findstr /r "^[A-Z]*$"
    > echo %errorlevel%
    1

also works as expected. I'm obviously missing something here, even if it's only the fact that findstr is somehow broken. Why does the first (range) regex not work in this case? And yet more weirdness:

    > echo %myvar%|findstr /r "[A-Z]"
    abc
    > echo %myvar%|findstr /r "[A-Z][A-Z]"
    abc
    > echo %myvar%|findstr /r "[A-Z][A-Z][A-Z]"
    > echo %myvar%|findstr /r "[A]"

The last two above also do not output the string!!

How to non-greedy multiple lookbehind matches

- by ArtK

    Source:  <prefix><content1><suffix1><prefix><content2><suffix2>
    Engine:  PCRE
    RegEx1:  (?<=<prefix>)(.*)(?=<suffix1>)
    RegEx2:  (?<=<prefix>)(.*)(?=<suffix2>)
    Result1: <content1>
    Result2: <content1><suffix1><prefix><content2>

The desired result for RegEx2 is just <content2>, but it is obviously greedy. How do I make RegEx2 non-greedy and use only the last matching lookbehind?

[I hope I have translated this correctly from the NoteTab syntax. I don't do much RegEx coding. The <prefix>, <content> & <suffix> terms are just meant to represent arbitrary strings. Only the "<" in the "?<=" lookbehind command is significant.]

I suspect it is something simple, but after too many hours of searching I'm giving up on solving it myself. Thanks for the help, Art
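
Making `.*` lazy is not enough here, because the match would still start after the first prefix. One common technique is to forbid the prefix from appearing inside the match, so the engine has to start after the last prefix before the suffix. A sketch using Python's re module with the literal placeholder strings from the question; the constructs used (lookbehind, negative lookahead, lazy quantifier) exist in PCRE as well.

```python
import re

source = "<prefix><content1><suffix1><prefix><content2><suffix2>"

# (?:(?!<prefix>).)*? consumes one character at a time, refusing to step onto
# another "<prefix>", so a match can never span a second prefix.
pattern = re.compile(r"(?<=<prefix>)(?:(?!<prefix>).)*?(?=<suffix2>)")

match = pattern.search(source)
print(match.group(0))
```

With real data, the two literal `<prefix>` occurrences in the pattern would both be replaced by the actual prefix string.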

find and replace values in a flat-file using PHP

- by peirix

I'd think there was a question on this already, but I can't find one. Maybe the solution is too easy... Anyway, I have a flat-file and want to let the user change the values based on a name. I've already sorted out creating new name+value pairs using the fopen('a') mode, using jQuery to send the AJAX call with newValue and newName. But say the content looks like this:

    host|http:www.stackoverflow.com
    folder|/questions/
    folder2|/users/

And now I want to change the folder value. So I'll send in folder as oldName and /tags/ as newValue. What's the best way to overwrite the value? The order in the list doesn't matter, and the name will always be on the left, followed by a | (pipe), the value and then a new-line.

My first thought was to read the list, store it in an array, search all the [0]'s for oldName, then change the [1] that belongs to it, and then write it back to a file. But I feel there is a better way around this? Any ideas? Maybe regex?
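
The read, change, write-back idea from the question is short enough that a regex may not buy much. A sketch of the in-memory part in Python (not PHP); set_value is a hypothetical name, and the file reading and writing around it are left out.

```python
def set_value(text, name, new_value):
    """Return text with the value of `name` replaced; other lines are untouched."""
    out = []
    for line in text.splitlines():
        key, sep, _old = line.partition("|")
        # Compare the whole key, so "folder" does not also hit "folder2".
        out.append(f"{key}|{new_value}" if sep and key == name else line)
    return "\n".join(out) + "\n"


# The file content from the question.
content = "host|http:www.stackoverflow.com\nfolder|/questions/\nfolder2|/users/\n"
print(set_value(content, "folder", "/tags/"), end="")
```

A regex version would need the same care about exact key matching, e.g. by anchoring on the start of the line and the pipe.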

Search and replace hundreds of strings in tens of thousands of files?

- by C Johnson

I am looking into changing the file names of hundreds of files in a (C/C++) project that I work on. The problem is that our software has tens of thousands of files that include (i.e. #include) these hundreds of files that will get changed. This looks like a maintenance nightmare. If I do this I will be stuck in UltraEdit for weeks, rolling hundreds of regexes by hand like so:

    ^\#include.*["<\\/]stupid_name.*$

with

    #include <dir/new_name.h>

Such drudgery would be worse than peeling hundreds of potatoes in a sunken submarine in the Antarctic with a spoon. I think it would be ideal to put the inputs and outputs into a table like so:

    stupid_name.h  <-> <dir/new_name.h>
    stupid_nameb.h <-> <dir/new_nameb.h>
    stupid_namec.h <-> <dir/new_namec.h>

and feed this into a regular expression engine / tool / app / etc.

My ultimate question: is there a tool that will do that? Bonus question: is it multi-threaded?

I looked at quite a few search and replace topics here on this website, and found lots of standard queries that asked a variant of the following question:

    standard question: Replace one term in N files.

as opposed to:

    my question: Replace N terms in N files.

Thanks in advance for any replies.
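
One way to get "N terms" into a single pass is to build one alternation from the rename table and pick the replacement in a callback. A sketch in Python that works on the text of one file; build_rewriter and the renames mapping are hypothetical names, and walking the source tree is left out.

```python
import re


def build_rewriter(renames):
    """renames maps an old header file name to its new include target."""
    # Longest names first, so "stupid_nameb.h" is tried before "stupid_name.h".
    names = "|".join(re.escape(old) for old in sorted(renames, key=len, reverse=True))
    include = re.compile(
        r'^(\s*#\s*include\s*)["<](?:[^">]*[\\/])?(' + names + r')[">]',
        re.MULTILINE,
    )

    def rewrite(source):
        # Keep the "#include " part, swap the quoted or bracketed target.
        return include.sub(lambda m: m.group(1) + renames[m.group(2)], source)

    return rewrite


# The table from the question, as a mapping.
renames = {
    "stupid_name.h": "<dir/new_name.h>",
    "stupid_nameb.h": "<dir/new_nameb.h>",
}
rewrite = build_rewriter(renames)
source = '#include "stupid_name.h"\n#include <old/stupid_nameb.h>\n#include <vector>\n'
print(rewrite(source), end="")
```

Because every file is rewritten independently, the per-file work could be spread over several processes if speed matters; that part is not shown here.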

How can I extract a string between matching braces in Perl?

- by Srilesh

My input file is as below:

    HEADER
    {ABC|*|DEF {GHI 0 1 0} {{Points {}}}}
    {ABC|*|DEF {GHI 0 2 0} {{Points {}}}}
    {ABC|*|XYZ:abc:def {GHI 0 22 0} {{Points {{F1 1.1} {F2 1.2} {F3 1.3} {F4 1.4}}}}}
    {ABC|*|XYZ:ghi:jkl {JKL 0 372 0} {{Points {}}}}
    {ABC|*|XYZ:mno:pqr {GHI 0 34 0} {{Points {}}}}
    { ABC|*|XYZ:abc:pqr {GHI 0 68 0} {{Points {{F1 11.11} {F2 12.10} {F3 14.11} {F4 16.23}}}} }
    TRAILER

I want to extract the file into an array as below:

    $array[0] = "{ABC|*|DEF {GHI 0 1 0} {{Points {}}}}"
    $array[1] = "{ABC|*|DEF {GHI 0 2 0} {{Points {}}}}"
    $array[2] = "{ABC|*|XYZ:abc:def {GHI 0 22 0} {{Points {{F1 1.1} {F2 1.2} {F3 1.3} {F4 1.4}}}}}"
    ..
    ..
    $array[5] = "{ ABC|*|XYZ:abc:pqr {GHI 0 68 0} {{Points {{F1 11.11} {F2 12.10} {F3 14.11} {F4 16.23}}}} }"

Which means I need to match each first opening curly brace with its closing curly brace and extract the string in between. I have checked the link below, but it doesn't apply to my question.

http://stackoverflow.com/questions/413071/regex-to-get-string-between-curly-braces-i-want-whats-between-the-curly-braces

I am trying, but it would really help if someone could assist me with their expertise. Thanks, Sri
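
Matching nested braces is mostly a counting problem: track the depth and cut a block whenever it returns to zero. A sketch of that idea in Python rather than Perl (top_level_blocks is a hypothetical name); it works regardless of how the blocks are split across lines.

```python
def top_level_blocks(text):
    """Return every outermost {...} block in text, braces included."""
    blocks, depth, start = [], 0, None
    for i, ch in enumerate(text):
        if ch == "{":
            if depth == 0:
                start = i  # an outermost block begins here
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                blocks.append(text[start:i + 1])
    return blocks


# A shortened version of the input from the question.
data = ("HEADER {ABC|*|DEF {GHI 0 1 0} {{Points {}}}} "
        "{ABC|*|XYZ:abc:def {GHI 0 22 0} {{Points {{F1 1.1} {F2 1.2}}}}} TRAILER")
for block in top_level_blocks(data):
    print(block)
```

Unbalanced input is not reported in this sketch: a block that is never closed is silently dropped, and a stray closing brace is ignored.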

Pulling specific entries from RSS feed [PHP]

- by n0s

So, I have an RSS feed with variations of each item. What I want to do is just get entries that contain a specific section of text. For example:

    <item>
      <title>RADIO SHOW - CF64K - 05-20-10 + WRAPUP </title>
      <link>http://linktoradioshow.com</link>
      <comments>Radio show from 05-20-10</comments>
      <pubDate>Thu, 20 May 2010 19:12:12 +0200</pubDate>
      <category domain="http://linktoradioshow.com/browse/199">Audio / Other</category>
      <dc:creator>n0s</dc:creator>
      <guid>http://otherlinktoradioshow.com/</guid>
      <enclosure url="http://linktoradioshow.com/" length="13005" />
    </item>
    <item>
      <title>RADIO SHOW - CF128K - 05-20-10 + WRAPUP </title>
      <link>http://linktoradioshow.com</link>
      <comments>Radio show from 05-20-10</comments>
      <pubDate>Thu, 20 May 2010 19:12:12 +0200</pubDate>
      <category domain="http://linktoradioshow.com/browse/199">Audio / Other</category>
      <dc:creator>n0s</dc:creator>
      <guid>http://otherlinktoradioshow.com/</guid>
      <enclosure url="http://linktoradioshow.com/" length="13005" />
    </item>

I only want to display the results that contain the string CF64K. While it's probably a really simple regex, I can't seem to wrap my head around getting it right. I always seem to only be able to display the string 'CF64K', and not the stuff that surrounds it. Thanks in advance.
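
Since the feed is XML, filtering whole items with an XML parser tends to be easier than pulling the surrounding text out with a regex. A sketch in Python (not PHP) on a cut-down feed; items_with is a hypothetical name, and the rss/channel wrapper around the items is assumed.

```python
import xml.etree.ElementTree as ET


def items_with(feed_xml, needle):
    """Return (title, link) for every <item> whose title contains needle."""
    root = ET.fromstring(feed_xml)
    hits = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        if needle in title:
            hits.append((title, item.findtext("link")))
    return hits


# Cut-down, hypothetical feed: only title and link are kept per item.
feed = """<rss version="2.0"><channel>
<item><title>RADIO SHOW - CF64K - 05-20-10 + WRAPUP </title><link>http://linktoradioshow.com</link></item>
<item><title>RADIO SHOW - CF128K - 05-20-10 + WRAPUP </title><link>http://linktoradioshow.com</link></item>
</channel></rss>"""
print(items_with(feed, "CF64K"))
```

A full feed that uses elements such as dc:creator has to declare the dc namespace on its root element, otherwise the parser rejects the document.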

How to extract block of XML from a log file on Linux

- by dragonmantank

I have a log file that looks like the following:

    2010-05-12 12:23:45 Some sort of log entry
    2010-05-12 01:45:12 Request XML:
    <RootTag>
        <Element>Value</Element>
        <Element>Another Value</Element>
    </RootTag>
    2010-05-12 01:45:32 Response XML:
    <ResponseRoot>
        <Element>Value</Element>
    </ResponseRoot>
    2010-05-12 01:45:49 Another log entry

What I want to do is extract the Request and Response XML (and ultimately dump them into their own single files). I had a similar parser that used egrep, but the XML was all on one line, not on multiple lines like above. The log files are also somewhat large, hitting 500-600 megs a log. Smaller logs I would read in via a PHP script and use regex matching, but the amount of memory required for such a large file would more than likely kill the script.

Is there an easy way using the built-in tools on a Linux box (CentOS in this case) to extract multiple lines, or am I going to have to bite the bullet and use Perl or PHP to read in the entire file to extract it?
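
Reading the file line by line keeps memory use flat however large the log is; the whole file never has to be loaded. A sketch of a small state machine in Python, assuming every log entry starts with a timestamp of the form shown and that the XML follows a "Request XML:" or "Response XML:" marker. xml_blocks is a hypothetical name.

```python
import re

# Assumed layout: each log entry starts with "YYYY-MM-DD HH:MM:SS ".
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} ")
XML_START = re.compile(r"(Request|Response) XML:\s*(.*)$")


def xml_blocks(lines):
    """Yield (kind, xml_text) pairs while consuming the log one line at a time."""
    kind, buffer = None, []
    for line in lines:
        line = line.rstrip("\n")
        if TIMESTAMP.match(line):
            if kind:  # a new entry closes the XML block collected so far
                yield kind, "\n".join(buffer).strip()
            started = XML_START.search(line)
            kind = started.group(1) if started else None
            buffer = [started.group(2)] if started else []
        elif kind:
            buffer.append(line)
    if kind:
        yield kind, "\n".join(buffer).strip()


# Hypothetical excerpt; with a real log, pass the open file object instead.
log = [
    "2010-05-12 12:23:45 Some sort of log entry",
    "2010-05-12 01:45:12 Request XML:",
    "<RootTag>",
    "    <Element>Value</Element>",
    "</RootTag>",
    "2010-05-12 01:45:32 Response XML: <ResponseRoot>",
    "</ResponseRoot>",
    "2010-05-12 01:45:49 Another log entry",
]
for kind, xml in xml_blocks(log):
    print(kind, repr(xml))
```

Each yielded pair could be written straight to its own file, so nothing larger than one XML block is held in memory at a time.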

.NET Regular Expression to find actual words in text

- by Mehdi Anis

I am using VB .NET to write a program that will get the words from a supplied text file and count how many times each word appears. I am using this regular expression:

    Dim parser As New Regex("\w+")

It gives me almost 100% correct words, except when I have text like "Ms Word App file name is word.exe." or "is this a c# statment If(ab?1,0) ?" In such cases I get [word & exe] and [If, a, b, 1 and 0] as separate words. It would be nice (for my purpose) if I received word.exe and If(ab?1,0) as words.

I guess \w+ looks for white space, sentence-terminating punctuation marks and other punctuation marks to determine a word. I want a similar regular expression that will not break a word at a punctuation mark if the punctuation mark is not at the end of the word. I think end-of-word can be defined by trailing white space or sentence-terminating punctuation (you may think of others). If you can suggest some regular expression (for VB .NET), that will be a great help. Thanks.
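
One way to get the behaviour described, sketched in Python rather than VB .NET and without a regex: split on white space, then strip sentence punctuation only from the end of each chunk. The names words and word_counts and the exact set of trailing characters are assumptions.

```python
from collections import Counter

# Assumed set of characters that may end a word without belonging to it.
TRAILING = ".,;:!?"


def words(text):
    """Split on white space and drop punctuation only at the end of a chunk."""
    tokens = (chunk.rstrip(TRAILING) for chunk in text.split())
    return [token for token in tokens if token]  # a lone "?" leaves nothing


def word_counts(text):
    """Count case-insensitively how often each word appears."""
    return Counter(word.lower() for word in words(text))


print(words("Ms Word App file name is word.exe."))
print(words("is this a c# statment If(ab?1,0) ?"))
```

In regex terms this corresponds roughly to matching runs of non-white-space characters and trimming trailing punctuation afterwards, rather than matching \w+.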

PHP Preg_replace after a specific amount of characters with a conditional

- by Marc Ripley

I've been working on this for a bit, but my regex is weak. I need to check whether a number is a whole number (single digit) and append ".001" to it if so. The problem is, it's in the middle of a line with values separated by commas.

    MATERIALS,1,1,9999;1 4PL1 PB_Mel,,1,6,0.173,0.173,0.375,0,0.375,0,0,0,0,2,0,1,1

needs to be

    MATERIALS,1,1,9999;1 4PL1 PB_Mel,,1.001,6,0.173,0.173,0.375,0,0.375,0,0,0,0,2,0,1,1

The line must start with "MATERIALS". There is more than one MATERIALS line. The value will always be after 5 commas. I was trying something like this to even replace the number, but I don't think the approach is quite right:

    $stripped = preg_replace('/(MATERIALS)(,.*?){4}(,\d+?),/', '\2,', $stripped);

I tried going through the preg_match_all results in a for loop, to at least get the conditional working, but I still have to replace the lines.

    for ($i = 0; $i < sizeof($materialsLines[0]); $i++) {
        $section = explode(",", $materialsLines[0][$i]);
        if (strlen($section[5]) == 1) {
            $section[5] .= ".001";
        }
        $materialsLines[0][$i] = implode(",", $section);
    }
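
The split-and-join loop in the question can also do the replacing if it is applied line by line to the whole text. A sketch of that in Python (not PHP); pad_materials_line is a hypothetical name.

```python
def pad_materials_line(line):
    """Append .001 to the sixth field of a MATERIALS line if it is one digit."""
    fields = line.split(",")
    if (fields[0] == "MATERIALS" and len(fields) > 5
            and len(fields[5]) == 1 and fields[5].isdigit()):
        fields[5] += ".001"
    return ",".join(fields)


# The line from the question.
line = "MATERIALS,1,1,9999;1 4PL1 PB_Mel,,1,6,0.173,0.173,0.375,0,0.375,0,0,0,0,2,0,1,1"
print(pad_materials_line(line))
```

Applied to every line of the file, with the results joined back together by newlines, this leaves all other lines unchanged.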

How to take name in one preg_match

- by Julianto

Hello guys, I am trying to extract just the names from the hypothetical HTML file below.

    <ul class="cat">
      <li>sport</li>
      <li>movie</li>
    </ul>
    <ul class="person-list">
      <li>name 1</li>
      <li>name 2</li>
      <li>name 3</li>
      <li>name 4</li>
      <li>name 5</li>
      <li>name 6</li>
    </ul>

Ideally, the result should come in an array format like the one below:

    Array( name 1 , name 2 , name 3 , .......... )

OK, I can easily do this with 2 regex matches, but I was wondering if I can do it with just one. Thanks in advance!
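
As an alternative to squeezing this into one pattern, a parser-based sketch in Python (not PHP) that collects the li text only inside the ul with the wanted class. ListItems and names_in are hypothetical names, and nested lists are not handled.

```python
from html.parser import HTMLParser


class ListItems(HTMLParser):
    """Collect the text of <li> elements inside <ul> elements of one class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.in_list = False
        self.in_item = False
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "ul":
            classes = (dict(attrs).get("class") or "").split()
            self.in_list = self.css_class in classes
        elif tag == "li" and self.in_list:
            self.in_item = True
            self.items.append("")

    def handle_endtag(self, tag):
        if tag == "ul":
            self.in_list = False
        elif tag == "li":
            self.in_item = False

    def handle_data(self, data):
        if self.in_item:
            self.items[-1] += data


def names_in(html, css_class="person-list"):
    parser = ListItems(css_class)
    parser.feed(html)
    return parser.items


# A shortened version of the HTML from the question.
page = ('<ul class="cat"> <li>sport</li> <li>movie</li> </ul> '
        '<ul class="person-list"> <li>name 1</li> <li>name 2</li> </ul>')
print(names_in(page))
```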

Search Results

Search found 32731 results on 1310 pages for 'regex for html'.
