I need to match HTML tags (the whole tag) based on the tag name.
For script tags I have this:
<script.+src=.+(\.js|\.axd).+(</script>|>)
It correctly matches both tags in the following html:
<script src="Scripts/JScript1.js" type="text/javascript" />
<script type="text/javascript" src="Scripts/JScript2.js" />
However, when I do link tags with the following:
<link.+href=.+(\.css).+(</link>|>)
It matches all of this at once (i.e. it returns one match containing both items):
<link href="Stylesheets/StyleSheet1.css" rel="Stylesheet" type="text/css" />
<link href="Stylesheets/StyleSheet2.css" rel="Stylesheet" type="text/css" />
What am I missing here? The regexes are essentially identical except for the text they match.
Also, I know that regex is not a great tool for HTML parsing...I will probably end up using the HtmlAgilityPack in the end, but this is driving me nuts and I want an answer if only for my own mental health!
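One possible explanation, sketched in Python rather than .NET: if both link tags end up on the same line (an assumption; the question does not show the exact input), a greedy `.+` is free to run from the first tag into the second, because `.` only stops at a newline. Replacing `.+` with `[^>]+` keeps each match inside a single tag. The sample string below is hypothetical.

```python
import re

# Hypothetical input: both tags on ONE line, so '.' can cross from one tag to the next.
html = (
    '<link href="Stylesheets/StyleSheet1.css" rel="Stylesheet" type="text/css" />'
    '<link href="Stylesheets/StyleSheet2.css" rel="Stylesheet" type="text/css" />'
)

# Same shape as the pattern in the question: greedy '.+' everywhere.
greedy = re.compile(r'<link.+href=.+\.css.+>')

# '[^>]' cannot step over the closing '>' of a tag, so each match stays in one tag.
bounded = re.compile(r'<link[^>]+href=[^>]+\.css[^>]*>')

greedy_matches = [m.group(0) for m in greedy.finditer(html)]
bounded_matches = [m.group(0) for m in bounded.finditer(html)]

print(len(greedy_matches))   # one match spanning both tags
print(len(bounded_matches))  # one match per tag
```

Lazy quantifiers (`.+?`) are another option, but a negated character class tends to be easier to reason about for tag-like input.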
I want to put a space between punctuation marks and the other words in a sentence. But boost::regex_replace() replaces the punctuation with a space, and I want to keep the punctuation in the sentence!
For example, in this code the output should be "Hello . hi , "
regex e1("[.,]");
std::basic_string<char> str = "Hello.hi,";
std::basic_string<char> fmt = " ";
cout<<regex_replace(str, e1, fmt)<<endl;
Can you help me?
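The general idea is to refer back to the matched text in the replacement instead of discarding it. A minimal sketch in Python (not Boost), where `\g<0>` stands for the whole match; the equivalent placeholder in a Boost format string should be checked against the Boost.Regex documentation.

```python
import re

text = "Hello.hi,"

# Replace each '.' or ',' with itself surrounded by spaces.
# \g<0> is Python's way of saying "the text that was matched".
spaced = re.sub(r"[.,]", r" \g<0> ", text)

print(repr(spaced))
```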
I have a URL which looks like this:
https://test.high.com/people/11111111-name-firstname-_custa/deals/new
Now I need to match document.URL to check
whether I'm on that page; if so, I will alert a message.
The important part is /deals/new
How can I match that in JavaScript?
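A sketch of the check in Python (the question asks for JavaScript, where the same pattern could be passed to a regex test on `document.URL`). It assumes the page is identified by a path ending in `/deals/new`, optionally followed by a trailing slash; `is_new_deal_page` is a hypothetical helper name.

```python
import re
from urllib.parse import urlparse

def is_new_deal_page(url):
    """Return True if the URL's path ends in /deals/new (trailing slash allowed)."""
    path = urlparse(url).path  # ignore query string and fragment
    return re.search(r"/deals/new/?$", path) is not None

print(is_new_deal_page("https://test.high.com/people/11111111-name-firstname-_custa/deals/new"))
```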
I'm trying to match the username with a regex. Please don't suggest a split.
USERNAME=geo
Here's my code:
String input = "USERNAME=geo";
Pattern pat = Pattern.compile("USERNAME=(\\w+)");
Matcher mat = pat.matcher(input);
if(mat.find()) {
System.out.println(mat.group());
}
Why doesn't it find geo in the group? I noticed that if I use .group(1), it finds the username. However, group() with no argument returns USERNAME=geo. Why?
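The convention that group 0 is the entire match and group 1 is the first pair of parentheses is shared by most regex APIs. A short Python illustration of the same pattern (Python's `group()` with no argument also means group 0):

```python
import re

m = re.search(r"USERNAME=(\w+)", "USERNAME=geo")

whole = m.group(0)     # everything the pattern matched
username = m.group(1)  # only what the first (...) captured

print(whole)
print(username)
```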
I was able to extract the href values of anchors in an HTML string. Now, what I want to achieve is to extract each href value and replace it with a new GUID. I need to return both the replaced HTML string and a list of the extracted href values with their corresponding GUIDs.
Thanks in advance.
My existing code is like:
Dim sPattern As String = "<a[^>]*href\s*=\s*((\""(?<URL>[^\""]*)\"")|(\'(?<URL>[^\']*)\')|(?<URL>[^\s]* ))"
Dim matches As MatchCollection = Regex.Matches(html, sPattern, RegexOptions.IgnoreCase Or RegexOptions.IgnorePatternWhitespace)
If Not IsNothing(matches) AndAlso matches.Count > 0 Then
Dim urls As List(Of String) = New List(Of String)
For Each m As Match In matches
urls.Add(m.Groups("URL").Value)
Next
End If
Sample HTML string:
<html><body><a title="http://www.google.com" href="http://www.google.com">http://www.google.com</a><br /><a href="http://www.yahoo.com">http://www.yahoo.com</a><br /><a title="http://www.apple.com" href="http://www.apple.com">Apple</a></body></html>
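One way to get both results in a single pass is a replacement callback that records each URL as it swaps it out (in .NET this role is played by a match evaluator passed to the replace call). The sketch below is Python, handles only quoted href values, and uses hypothetical names (`HREF`, `replace_hrefs`).

```python
import re
import uuid

# Group 1: everything up to the opening quote; group 2: the quote; group 3: the URL.
HREF = re.compile(r'(<a\b[^>]*?\bhref\s*=\s*)(["\'])(.*?)\2', re.IGNORECASE)

def replace_hrefs(html):
    """Return (new_html, [(original_url, guid), ...])."""
    mapping = []

    def swap(m):
        new_id = str(uuid.uuid4())
        mapping.append((m.group(3), new_id))
        # Rebuild the attribute with the same quote character around the GUID.
        return f"{m.group(1)}{m.group(2)}{new_id}{m.group(2)}"

    return HREF.sub(swap, html), mapping

sample = ('<html><body><a title="http://www.google.com" href="http://www.google.com">'
          'http://www.google.com</a><br /><a href="http://www.yahoo.com">'
          'http://www.yahoo.com</a></body></html>')
new_html, pairs = replace_hrefs(sample)
print(pairs)
```

Only the href attribute is rewritten; the title attribute and the link text keep the original URL.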
Hi,
I haven't found my answer after reading through all of these posts, so I'm hoping one of you heavy hitter regex folks can help me out. I'm trying to isolate the tag name and any attributes from the following string format:
{TAG:TYPE attr1="foo" attr2="bar" attr3="zing" attr4="zang" attr5="zoom" ...}
NOTE: in the above example, TAG will always be the same and TYPE will be one of several preset strings (e.g. share,print,display etc...). TAG and TYPE are uppercased only for the example but will not be case sensitive for real.
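A two-step approach is often simpler than one large pattern: first match the braces and the type, then pull the attributes out of what is left. A Python sketch, assuming attribute values are always double-quoted and contain no embedded quotes (the question does not say); `parse_tag` is a hypothetical name.

```python
import re

# Outer pattern: {TAG:TYPE followed by zero or more name="value" pairs}
TAG_RE = re.compile(r'\{TAG:(\w+)((?:\s+\w+="[^"]*")*)\s*\}', re.IGNORECASE)
# Inner pattern: a single name="value" pair
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')

def parse_tag(text):
    """Return (type, {attr: value}) for the first tag found, or None."""
    m = TAG_RE.search(text)
    if m is None:
        return None
    return m.group(1).lower(), dict(ATTR_RE.findall(m.group(2)))

print(parse_tag('{TAG:SHARE attr1="foo" attr2="bar" attr3="zing"}'))
```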
I am trying to implement a regular expression that allows only one or two digits after a hyphen '-', and it doesn't work properly. It allows as many digits as the user types after the '-'.
Please advise. My ExtJS code:
Ext.apply(Ext.form.VTypes, {
hyphenText: "Number and hyphen",
hyphenMask: /[\d\-]/,
hyphenRe: /^\d+-\d{1,2}$/,
hyphen: function(v){
return Ext.form.VTypes.hyphenRe.test(v);
}
});
//Input Field for Issue no
var <portlet:namespace/>issueNoField = new Ext.form.TextField({
fieldLabel: 'Issue No',
width: 120,
valueField:'IssNo',
vtype: 'hyphen'
});
This works only to the limit that it allows digits and -. But it also has to allow only 1 to 2 digits after - at most.
Is something wrong in my regex? hyphenRe: /^\d+-\d{1,2}$/,
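Testing the pattern on its own, outside ExtJS, suggests the regex itself does what is described: it accepts one or two digits after the hyphen and rejects three. The mask, by contrast, only filters individual keystrokes and cannot limit how many digits are typed. A quick check in Python, whose syntax for this pattern is the same as JavaScript's:

```python
import re

HYPHEN_RE = re.compile(r"^\d+-\d{1,2}$")

samples = ["12-3", "12-34", "12-345", "12-", "-12"]
results = {value: bool(HYPHEN_RE.search(value)) for value in samples}

for value, ok in results.items():
    print(value, ok)
```

If extra digits still get through in the form, the cause is more likely in when and how the field is validated than in the expression.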
As the title says, I need to find 2 specific words in a sentence. But they can be in any order and any casing. How do I go about doing this using regex?
E.g. This is a very long sentence used as a test
From that sentence I need to extract the words test and long in any order, i.e. test can be first or long can be first.
UPDATE:
What I did not mention in the first part is that it needs to be case insensitive as well.
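A sketch of one approach in Python: build an alternation of the words, search case-insensitively, and then confirm that every word was seen at least once. `find_words` is a hypothetical helper; it returns the words in the order and casing they appear in the sentence, or an empty list if any word is missing.

```python
import re

def find_words(sentence, words):
    """Return the matched words in sentence order, or [] if any word is absent."""
    alternation = "|".join(re.escape(w) for w in words)
    pattern = re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)
    found = pattern.findall(sentence)
    seen = {f.lower() for f in found}
    return found if all(w.lower() in seen for w in words) else []

print(find_words("This is a very long sentence used as a test", ["test", "long"]))
```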
Hi,
How can I take a line like this:
Digital Presentation (10:45), (11:30), 12:00, 12:40, 13:20, 14:00, 14:40, 15:20, 16:00, 16:40, 17:20, 18:00, 18:40, 19:20, 20:00, 20:40, 21:20, 22:00, 22:40, 23:10, 23:40.
And match all the 24 hour times so I can convert to a more human readable format using date()?
Also I want to match times in the 24:00-24:59 range too
Thanks!
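A sketch in Python (the question mentions `date()`, so the target is presumably PHP, where the same pattern should carry over). Hours 0 to 24 and minutes 00 to 59 are accepted; treating 24:xx as just after midnight is an assumption, and the sample line is shortened with a hypothetical 24:15 entry added.

```python
import re

# Hours 0-19 with optional leading digit, or 20-24; minutes 00-59.
TIME_RE = re.compile(r"\b([01]?\d|2[0-4]):([0-5]\d)\b")

def to_12_hour(hour, minute):
    """Format a 24-hour time as e.g. '1:20pm'. 24:xx is treated like 00:xx (assumption)."""
    hour %= 24
    suffix = "am" if hour < 12 else "pm"
    return f"{hour % 12 or 12}:{minute:02d}{suffix}"

line = "Digital Presentation (10:45), (11:30), 12:00, 13:20, 23:40, 24:15."
times = [(int(h), int(m)) for h, m in TIME_RE.findall(line)]
readable = [to_12_hour(h, m) for h, m in times]

print(readable)
```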
I have a regex which I'm using to match user functions inside an IDE (Sublime). This matches what I want (the function name itself), but it also matches the opening parenthesis. Therefore the match is as follows:
this._myFunction('content');
Notice the opening paren.
Here is my expression:
(?:[^\._])?([\w-]+)(?:[\(]){1}
How can I exclude the opening paren from the match?
As a bonus question: how can I avoid matching the string function? As you can expect, function( matches (not fun in JS).
Thank you to anyone who can assist.
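Lookarounds address both parts: a positive lookahead `(?=\()` requires the paren without including it in the match, and a negative lookahead `(?!function\b)` rejects the keyword. A Python sketch; whether the editor's regex engine supports lookaheads is an assumption to verify, and the hyphen from the original character class is dropped here.

```python
import re

# \b            start at a word boundary
# (?!function\b) refuse to start on the keyword 'function'
# \w+           the name itself
# (?=\()        must be followed by '(', which is NOT consumed
CALL_NAME = re.compile(r"\b(?!function\b)\w+(?=\()")

print(CALL_NAME.findall("this._myFunction('content');"))
print(CALL_NAME.findall("var f = function(x) { go(x); }"))
```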
I'm trying to monitor a small section of a web page for changes using the Google Page Monitor extension --
https://chrome.google.com/extensions/detail/pemhgklkefakciniebenbfclihhmmfcd
Under advanced settings I can use either Regex or Selectors to accomplish this, but need help with this. In the following html, I'd like to monitor the following for changes in either the URL in line 4 or the text in line 5. Any pointers gratefully accepted.
<div id="rtBtmBox"><div id="sectHead" style="margin-bottom:5px;">
<h3>SLJ's Pick of the Day</h3></div>
<p align="center">From the March issue</p>
<p align="center"><a target="_blank" href="http://www.schoollibraryjournal.com/article/CA6723937.html">
<font color="#0000ff"><strong><em>The Summer I Turned Pretty</em></strong><br/>
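As a rough starting point, a pattern can anchor on the `rtBtmBox` id and then capture the first href and the first `<em>` text after it. The sketch below tests that idea in Python against the snippet above; whether the extension accepts this exact syntax (in particular, whether `.` can span newlines there) is an assumption that needs checking in its settings.

```python
import re

# DOTALL lets '.' cross line breaks between the div, the link and the title.
PICK = re.compile(
    r'id="rtBtmBox".*?<a\b[^>]*\bhref="([^"]+)".*?<em>(.*?)</em>',
    re.DOTALL,
)

html = '''<div id="rtBtmBox"><div id="sectHead" style="margin-bottom:5px;">
<h3>SLJ's Pick of the Day</h3></div>
<p align="center">From the March issue</p>
<p align="center"><a target="_blank" href="http://www.schoollibraryjournal.com/article/CA6723937.html">
<font color="#0000ff"><strong><em>The Summer I Turned Pretty</em></strong><br/>'''

url, title = PICK.search(html).groups()
print(url)
print(title)
```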
I have the following HTML snippet
<tr>
<td class="1">...</td>
<td class="2">...</td>
<td class="3">...</td>
<td class="4">...</td>
</tr>
etc...
I basically have N rows, and each row contains 4 TD's each with a unique class.
I would like a simple way to split out all the rows and TD's by class so I can choose what data I want to use.
I expect the easiest way to achieve this would be regex (maybe two). One to split up the TR's then another to split up the TDs (by class preferably)
Thanks
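For nested markup like table rows, an HTML parser is usually more robust than one or two regexes. A minimal sketch using Python's standard `html.parser`, collecting each row as a dictionary keyed by the cell's class; `RowParser` and `parse_rows` are hypothetical names, and the sample table is made up.

```python
from html.parser import HTMLParser

class RowParser(HTMLParser):
    """Collect every <tr> as a dict mapping td class -> cell text."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell_class = None  # class of the <td> currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append({})
        elif tag == "td" and self.rows:
            self._cell_class = dict(attrs).get("class")
            if self._cell_class is not None:
                self.rows[-1][self._cell_class] = ""

    def handle_data(self, data):
        if self._cell_class is not None:
            self.rows[-1][self._cell_class] += data

    def handle_endtag(self, tag):
        if tag == "td":
            self._cell_class = None

def parse_rows(html):
    parser = RowParser()
    parser.feed(html)
    parser.close()
    return parser.rows

sample = ('<table><tr><td class="1">a</td><td class="2">b</td></tr>'
          '<tr><td class="1">c</td><td class="2">d</td></tr></table>')
rows = parse_rows(sample)
print(rows)
```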
I am trying this example myhash = {/(\d+)/ => "hello"} with ruby 1.9.2p136 (2010-12-25) [i386-mingw32].
It doesn't work as expected (edit: as it turned out it shouldn't work as I was expecting):
irb(main):004:0> myhash = {/(\d+)/ => "hello"}
=> {/(\d+)/=>"hello"}
irb(main):005:0> myhash[2222]
=> nil
irb(main):006:0> myhash["2222"]
=> nil
In Rubular which is on ruby1.8.7 the regex works.
What am I missing?
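The behaviour in the transcript is consistent with how hash tables generally work: a key is found by equality and hashing, not by running it as a pattern against the lookup value. The same thing can be seen with a Python dictionary (shown here as an analogy, not as Ruby code); to get match-based lookup, the keys have to be iterated and tested. `lookup` is a hypothetical helper.

```python
import re

handlers = {re.compile(r"(\d+)"): "hello"}

# A plain dictionary lookup compares keys for equality; "2222" is not
# equal to the compiled pattern, so nothing is found.
direct = handlers.get("2222")

def lookup(table, text):
    """Return the value of the first pattern key that matches text, else None."""
    for pattern, value in table.items():
        if pattern.search(text):
            return value
    return None

print(direct)
print(lookup(handlers, "2222"))
```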
I need to extract the SAME type of information (e.g. First name, Last Name, Telephone, ...), from numerous different text sources (each with a different format & different order of the variables of interest).
I want a function that does the extraction based on a regular expression and returns the result as DESCRIPTIVE variables. In other words, instead of returning each match result as submatch[0], submatch[1], submatch[2], ..., have it do EITHER of the following:
1.)
return std::map so that the submatches can be accessed via:
submatch["first_name"], submatch["last_name"], submatch["telephone"]
2.)
return variables holding the submatches so that they can be accessed via:
submatch_first_name, submatch_last_name, submatch_telephone
I can write a wrapper class around boost::regex to do #1, but I was hoping there would be a built-in or a more elegant way to do this in C++/Boost/STL/C.
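Many regex engines support named capture groups, which give exactly the map-like access described in option 1 regardless of where each field appears in the pattern. A Python sketch of the idea follows; whether the Boost version in use offers named subexpressions should be confirmed in its documentation. The two patterns and the sample strings are made up for illustration.

```python
import re

# Two hypothetical source formats with the fields in a different order.
FORMAT_A = re.compile(r"(?P<first_name>\w+)\s+(?P<last_name>\w+),\s*(?P<telephone>[\d-]+)")
FORMAT_B = re.compile(r"(?P<telephone>[\d-]+)\s*:\s*(?P<last_name>\w+),\s*(?P<first_name>\w+)")

def extract(pattern, text):
    """Return the named groups of the first match as a dict, or {} if none."""
    m = pattern.search(text)
    return m.groupdict() if m else {}

a = extract(FORMAT_A, "Jane Doe, 555-0100")
b = extract(FORMAT_B, "555-0100: Doe, Jane")
print(a["first_name"], a["last_name"], a["telephone"])
print(a == b)
```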
Suppose I was trying to match the following expression using regex.h in C++, and trying to obtain the subexpressions contained:
/^((1|2)|3) (1|2)$/
Suppose it were matched against the string "3 1", the subexpressions would be:
"3 1"
"3"
"1"
If, instead it were matched against the string "2 1", the subexpressions would be:
"2 1"
"2"
"2"
"1"
Which means that, depending on how the first subexpression evaluates, the final one is in a different element in the pmatch array. I realise this particular example is trivial, as I could remove one of the sets of brackets, or grab the last element of the array, but it becomes problematic in more complicated expressions.
Suppose all I want are the top-level subexpressions, the ones which aren't subexpressions of other subexpressions. Is there any way to only get them? Or, alternatively, to know how many subexpressions are matched within a subexpression, so that I can traverse the array irrespective of how it evaluates?
Thanks
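In most engines, groups are numbered by the position of their opening parenthesis in the pattern, so the index of a group does not shift when an inner group fails to participate; the unused group is reported as empty or unset. POSIX `regexec` follows the same numbering and marks a non-participating subexpression with offsets of -1. The Python sketch below shows the equivalent behaviour, with `None` standing in for the unset group.

```python
import re

PATTERN = re.compile(r"^((1|2)|3) (1|2)$")

# Group 1 = ((1|2)|3), group 2 = inner (1|2), group 3 = final (1|2).
groups_3_1 = PATTERN.match("3 1").groups()
groups_2_1 = PATTERN.match("2 1").groups()

print(groups_3_1)  # inner group did not participate
print(groups_2_1)
```

With this numbering the final subexpression is always group 3, so the array can be indexed directly instead of being traversed.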
I'm trying to write a regex that adds a / to all hrefs and srcs in a string.
It should only add a / to URLs that don't have http:// at their beginning and that don't already start with /.
If we have this:
<a href="ABC">...
<img src="DEFG">...
<a href="/HIJ">...
<a href="http://KLMN">...
The results should be something like this:
<a href="/ABC">...
<img src="/DEFG">...
<a href="/HIJ">...
<a href="http://KLMN">...
This is what I've come up with so far:
&(href|src)="?!(\/|http::\/\/)(.+)"
And the replace would be
$1="/$2"
It isn't working, though.
What am I doing wrong?
What would a working regex look like?
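The attempt above has a few issues: a negative lookahead is written `(?!...)` with its own parentheses, `http::` has a doubled colon, and the lookahead group should not be counted when numbering the replacement groups. A corrected sketch in Python (replacement syntax `\1` instead of `$1`); accepting `https://` as well is an addition beyond what the question asks for.

```python
import re

# (?!/|https?://) fails the match if the value already starts with / or a scheme.
REL_URL = re.compile(r'(href|src)="(?!/|https?://)([^"]+)"')

def add_leading_slash(html):
    return REL_URL.sub(r'\1="/\2"', html)

before = '<a href="ABC">...\n<img src="DEFG">...\n<a href="/HIJ">...\n<a href="http://KLMN">...'
after = add_leading_slash(before)
print(after)
```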
Ok so I have this string:
<img src=images/imagename.gif alt='descriptive text here'>
and I am trying to split it up into the following two strings (array of two strings, what ever, just broken up).
imagename.gif
descriptive text here
Note: yes, it's actually &lt; and not <; the same goes for the closing bracket of the string.
I know regex is the answer, but I'm not good enough at regex to pull it off in PHP.
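A sketch of a pattern that captures both pieces, shown in Python; the expression itself should be usable with PHP's `preg_match` after adding delimiters. It assumes the src value is unquoted and the alt text is single-quoted, as in the sample.

```python
import re

# (?:\S*/)?   skip any directory part of the src value
# ([^/\s]+)   the file name
# alt='...'   the single-quoted description
IMG = re.compile(r"src=(?:\S*/)?([^/\s]+)\s+alt='([^']*)'")

tag = "&lt;img src=images/imagename.gif alt='descriptive text here'&gt;"
file_name, alt_text = IMG.search(tag).groups()

print(file_name)
print(alt_text)
```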
With a cURL request I load a complete website into a variable: $buffer.
In the source of the site there are two labels in between which my relevant content is placed.
****** bunch of code *******
<!-- InstanceBeginEditable name="Kopij" -->
this part I want to store in a match
<!-- InstanceEndEditable -->
****** bunch of code *******
I've been messing around with preg_match and its regexp. Can someone help me?
Thanks in advance.
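The two things that usually matter here are letting `.` match newlines and making the quantifier lazy so it stops at the first end marker. A Python sketch of that pattern; in PHP the corresponding modifier for "dot matches newline" is `s`. The sample buffer is a stand-in for the downloaded page.

```python
import re

EDITABLE = re.compile(
    r'<!--\s*InstanceBeginEditable\s+name="Kopij"\s*-->'
    r'(.*?)'                                   # lazy: stop at the FIRST end marker
    r'<!--\s*InstanceEndEditable\s*-->',
    re.DOTALL,                                 # let '.' cross line breaks
)

buffer = """... bunch of code ...
<!-- InstanceBeginEditable name="Kopij" -->
this part I want to store in a match
<!-- InstanceEndEditable -->
... bunch of code ..."""

m = EDITABLE.search(buffer)
content = m.group(1).strip() if m else None
print(content)
```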
/^\d{1,2}[:][0-5][0-9]$/
is what I have. This limits minutes to 00-59. It does not, however, limit hours to between 0 and 12. For similarity and uniformity I would like to do this with regex alone if possible.
Furthermore, I would like the first digit to be optional, i.e. 09:30 accepted as well as 9:30. I played around with ranges, but something out of range is always accepted.
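A numeric range has to be spelled out as alternatives, because a regex only sees characters: one branch for a single digit with an optional leading zero, and one for 10 to 12. A sketch tested in Python, taking "between 0 and 12" literally so that hour 0 is allowed:

```python
import re

# 0?[0-9] covers 0-9 and 00-09; 1[0-2] covers 10-12.
TIME_12H = re.compile(r"^(0?[0-9]|1[0-2]):[0-5][0-9]$")

def is_valid(text):
    return TIME_12H.match(text) is not None

for sample in ["9:30", "09:30", "12:59", "13:00", "9:60"]:
    print(sample, is_valid(sample))
```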
In my regular expression, I was trying to match a password between 8 and 16 characters, with at least 2 of each of the following: lowercase letters, capital letters, and digits.
In my expression I have:
^((?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,16})$
But I don't understand why it wouldn't work like this:
^((?=\d)(?=[a-z])(?=[A-Z])(?=\d)(?=[a-z])(?=[A-Z]){8,16})$
Doesn't ".*" just mean "zero or more of any character"? So why would I need that if I'm just checking for specific conditions?
And why did I need the period before the curly braces defining the limit of the password?
And one more thing, I don't understand what it means to "not consume any of the string" in reference to "?=".
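A lookahead checks what follows the current position and then leaves the position where it was; that is what "not consuming" means. So `(?=\d)(?=[a-z])` asks for the very next character to be both a digit and a letter, which can never be true, while `(?=.*\d)` says "somewhere ahead there is a digit". Since the lookaheads consume nothing, `.{8,16}` is still needed to actually match the characters and enforce the length. Note also that repeating `(?=.*\d)` twice tests the same thing twice; requiring two digits needs something like `(?=(?:.*\d){2})`. A Python sketch of these points:

```python
import re

# A lookahead succeeds without moving forward: the match is empty and ends at 0.
peek = re.match(r"(?=\d)", "5abc")

# Two contradictory lookaheads at the same position can never both succeed.
impossible = re.search(r"(?=\d)(?=[a-z])", "a1b2")

# At least two digits, two lowercase, two uppercase, 8 to 16 characters in total.
PASSWORD = re.compile(
    r"^(?=(?:.*\d){2})(?=(?:.*[a-z]){2})(?=(?:.*[A-Z]){2}).{8,16}$"
)

def is_valid(password):
    return PASSWORD.match(password) is not None

print(peek.end(), impossible)
print(is_valid("abCD1234"), is_valid("abcdefG1"))
```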
If you missed my sessions at OpenWorld then don't worry - all the content we used for pattern matching (presentation and hands-on lab) is now available for download.
My presentation "SQL: The Best Development Language for Big Data?" is available for download from the OOW Content Catalog, see here: https://oracleus.activeevents.com/2013/connect/sessionDetail.ww?SESSION_ID=9101
For the hands-on lab ("Pattern Matching at the Speed of Thought with Oracle Database 12c") we used the Oracle-By-Example content. The OOW hands-on lab uses Oracle Database 12c Release 1 (12.1) and uses the MATCH_RECOGNIZE clause to perform some basic pattern matching examples in SQL. This lab is broken down into four main steps:
Logically partition and order the data that is used in the MATCH_RECOGNIZE clause with its PARTITION BY and ORDER BY clauses.
Define patterns of rows to seek using the PATTERN clause of the MATCH_RECOGNIZE clause. These patterns use regular expressions syntax, a powerful and expressive feature, applied to the pattern variables you define.
Specify the logical conditions required to map a row to a row pattern variable in the DEFINE clause.
Define measures, which are expressions usable in the MEASURES clause of the SQL query.
You can download the setup files to build the ticker schema and the student notes from the Oracle Learning Library. The direct link to the example on using pattern matching is here: http://apex.oracle.com/pls/apex/f?p=44785:24:0::NO:24:P24_CONTENT_ID,P24_PREV_PAGE:6781,2.
Is there a Python alternative to Perl's YAPE::Regex::Explain?
For example, something that could turn the following regex
\w+=\d+|\w+='[^']+'
into an explanation like this
NODE EXPLANATION
--------------------------------------------------------------------------------
\w+ word characters (a-z, A-Z, 0-9, _) (1 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
= '='
--------------------------------------------------------------------------------
\d+ digits (0-9) (1 or more times (matching
the most amount possible))
--------------------------------------------------------------------------------
| OR
--------------------------------------------------------------------------------
\w+ word characters (a-z, A-Z, 0-9, _) (1 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
=' '=\''
--------------------------------------------------------------------------------
[^']+ any character except: ''' (1 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
' '\''
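The standard library has no direct equivalent that produces prose, but the `re.DEBUG` flag prints the parsed structure of a pattern, which covers part of the same need. A small sketch that captures that dump as a string; `explain` is a hypothetical helper name, and the exact layout of the dump differs between Python versions.

```python
import io
import re
from contextlib import redirect_stdout

def explain(pattern):
    """Return the parse-tree dump that re.DEBUG prints for a pattern."""
    out = io.StringIO()
    with redirect_stdout(out):
        re.compile(pattern, re.DEBUG)
    return out.getvalue()

dump = explain(r"\w+=\d+|\w+='[^']+'")
print(dump)
```

The dump lists nodes such as repeats, character categories and literals rather than English sentences, so it is closer to a syntax tree than to the YAPE output.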
I want to write a regex pattern that matches the part of the string that comes after a given substring.
For example:
I have written a regex pattern like this
^((?!mystring).)*$
which matches lines that do not contain mystring. But I want the regex pattern to match like this: for the input
mystringabcdfrevrgf
the regex matcher should return
abcdfrevrgf
How can I achieve this? Please help. Thanks in advance.
Answer:
((?!mystring)(.*))$
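For comparison, a lookbehind expresses "whatever follows mystring" directly, and a plain capture group after the literal does the same job where lookbehind is unavailable. The Python sketch below also runs the pattern given in the answer; in Python's `re`, at least, that pattern simply skips the first character rather than the whole prefix.

```python
import re

text = "mystringabcdfrevrgf"

# Lookbehind: match only the text that is preceded by 'mystring'.
after = re.search(r"(?<=mystring).*$", text)
rest = after.group(0) if after else None

# Equivalent with a capture group instead of a lookbehind.
captured = re.search(r"mystring(.*)$", text).group(1)

# The negative-lookahead pattern from the answer, for comparison.
posted = re.search(r"((?!mystring)(.*))$", text).group(0)

print(rest)
print(captured)
print(posted)
```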
I have text files formatted as such:
R156484COMP_004A7001_20100104_065119.txt
I need to consistently extract the R****COMP part, the 004A7001 number, and 20100104 (the date); I don't care about the 065119 number. The problem is that not ALL of the files being parsed follow the exact naming convention. Some may be like this:
R168166CRIT_156B2075_SU2_20091223_123456.txt
or
R285476COMP_SU1_125A6025_20100407_123456.txt
So how could I use regex instead of split to ensure I am always getting the serial (e.g. 004A7001), the date (e.g. 20100104), and the R****COMP (or CRIT) part?
Here is what I do now but it only gets the files formatted like my first example.
if (file.Count(c => c == '_') != 3) continue;
and further down in the code I have:
string RNumber = Path.GetFileNameWithoutExtension(file);
string RNumberE = RNumber.Split('_')[0];
string RNumberD = RNumber.Split('_')[1];
string RNumberDate = RNumber.Split('_')[2];
DateTime dateTime = DateTime.ParseExact(RNumberDate, "yyyyMMdd", Thread.CurrentThread.CurrentCulture);
string cmmDate = dateTime.ToString("dd-MMM-yyyy");
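Instead of relying on position, each field can be recognised by its shape. The sketch below is Python rather than C#, and the shapes are assumptions inferred from the three sample names: the serial is three digits, a capital letter and four digits; the date is exactly eight digits between underscores; the first field is R, digits, then COMP or CRIT. `parse_name` is a hypothetical helper.

```python
import re
from datetime import datetime

R_NUMBER = re.compile(r"^R\d+(?:COMP|CRIT)")
SERIAL = re.compile(r"(?<=_)\d{3}[A-Z]\d{4}(?=_)")  # e.g. 004A7001 (assumed shape)
DATE = re.compile(r"(?<=_)\d{8}(?=_)")              # e.g. 20100104

def parse_name(stem):
    """Return (r_number, serial, 'dd-Mon-yyyy') or None if a field is missing."""
    r, s, d = R_NUMBER.search(stem), SERIAL.search(stem), DATE.search(stem)
    if not (r and s and d):
        return None
    date = datetime.strptime(d.group(0), "%Y%m%d")
    return r.group(0), s.group(0), date.strftime("%d-%b-%Y")

for stem in ["R156484COMP_004A7001_20100104_065119",
             "R168166CRIT_156B2075_SU2_20091223_123456",
             "R285476COMP_SU1_125A6025_20100407_123456"]:
    print(parse_name(stem))
```

Because each field is found independently, extra segments such as SU1 or SU2 can appear anywhere between the underscores without affecting the result.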