strip_tags only catches tags that have a beginning and end tag. With the strings I'm working with it's causing issues and I need to remove all HTML tags.
I've noticed that MySQL has an extensive search capacity, allowing both wildcards and regular expressions. However, I'm somewhat in a bind, since I'm trying to extract multiple values from a single string in my select query.
For example, if I had the text "<span>Test</span> this <span>query</span>", perhaps using regular expressions I could find and extract values "Test" or "query", but in my case, I have potentially n such strings to extract. And since I can't define n columns in my select statement, that means I'm stuck.
Is there any way I could have a list of values (ideally separated by commas) of any text contained within span tags?
In other words, if I ran this query, I would get "Test,query" as the value of spanlist:
select <insert logic here> as spanlist from HtmlPages ...
I'm trying to get the contents of the second quotes and only the second quotes from a string. Right now I'm able to get the contents of all three quotes. What am I doing wrong? Is it possible to just print the second value in the output array?
Text
2014-06-02 11:48:41.519 -0700 Information 94 NICOLE Client "[WebDirect] (207.230.229.204) [207.230.229.204]" opening database "FMServer_Sample" as "Admin".
PHP
if (preg_match_all('~(["\'])([^"\']+)\1~', $line, $matches))
$database_names = $matches[2];
print_r($database_names);
Output
[WebDirect] (207.230.229.204) [207.230.229.204], FMServer_Sample, Admin
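One way to pick out only the second quoted value is to collect every quoted field and index into the result. The sketch below uses Python rather than PHP, and the helper name second_quoted is made up for illustration; it assumes the fields are delimited by plain double quotes, as in the sample line.

```python
import re
from typing import Optional

def second_quoted(line: str) -> Optional[str]:
    """Return the contents of the second double-quoted field, or None."""
    fields = re.findall(r'"([^"]*)"', line)
    # fields[0] is the client, fields[1] the database, fields[2] the account
    return fields[1] if len(fields) > 1 else None

line = ('2014-06-02 11:48:41.519 -0700 Information 94 NICOLE Client '
        '"[WebDirect] (207.230.229.204) [207.230.229.204]" '
        'opening database "FMServer_Sample" as "Admin".')
print(second_quoted(line))
```

The same idea carries over to the PHP array: the second capture is the element at index 1 of the captured group.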
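Hello,
Please help with the problem below.
I need to match only words in which each of the given characters occurs the same number of times.
For example, requiring the same count for a, b and c:
abc match, 1 each (abc)
aabbcc match, 2 each (abc)
adabb no match, 2 each (ab) only
ttt match, 0 each (abc)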
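For comparison, the rule in the examples can be written as plain counting instead of a regular expression. This is only a sketch in Python; the function name equal_counts and the default letter set "abc" are assumptions taken from the examples.

```python
def equal_counts(word: str, letters: str = "abc") -> bool:
    """True when every character in `letters` occurs equally often in `word`."""
    # Collect the distinct counts; a single distinct value means all are equal.
    return len({word.count(ch) for ch in letters}) == 1

for w in ("abc", "aabbcc", "adabb", "ttt"):
    print(w, equal_counts(w))
```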
I have this piece of code:
function func1(text) {
var pattern = /([\s\S]*?)(\<\?(?:attrib |if |else-if |else|end-if|search |for |end-for)[\s\S]*?\?\>)/g;
var result;
while (result = pattern.exec(text)) {
if (some condition) {
throw new Error('failed');
}
...
}
}
This works, unless the throw statement is executed. In that case, the next time I call the function, the exec() call starts where it left off, even though I am supplying it with a new value of 'text'.
I can fix it by writing
var pattern = new RegExp('.....');
instead, but I don't understand why the first version is failing. How is the regular expression persisting between function calls? (This is happening in the latest versions of Firefox and Chrome.)
Edit: Complete test case:
<!DOCTYPE HTML>
<html>
<head>
<meta http-equiv="Content-type" content="text/html;charset=UTF-8">
<title>Test Page</title>
<style type='text/css'>
body {
font-family: sans-serif;
}
#log p {
margin: 0;
padding: 0;
}
</style>
<script type='text/javascript'>
function func1(text, count) {
var pattern = /(one|two|three|four|five|six|seven|eight)/g;
log("func1");
var result;
while (result = pattern.exec(text)) {
log("result[0] = " + result[0] + ", pattern.lastIndex = " + pattern.lastIndex);
if (--count <= 0) {
throw "Error";
}
}
}
function go() {
try { func1("one two three four five six seven eight", 3); } catch (e) { }
try { func1("one two three four five six seven eight", 2); } catch (e) { }
try { func1("one two three four five six seven eight", 99); } catch (e) { }
try { func1("one two three four five six seven eight", 2); } catch (e) { }
}
function log(msg) {
var log = document.getElementById('log');
var p = document.createElement('p');
p.innerHTML = msg;
log.appendChild(p);
}
</script>
</head>
<body><div>
<input type='button' id='btnGo' value='Go' onclick='go();'>
<hr>
<div id='log'></div>
</div></body>
</html>
The regular expression continues with 'four' as of the second call on FF and Chrome, not on IE7 or Opera.
I'm trying to extract the postal codes from yell.com using php and preg_replace.
I successfully extracted the postal code but only along with the address. Here is an example
$URL = "http://www.yell.com/ucs/UcsSearchAction.do?scrambleSeed=17824062&keywords=shop&layout=&companyName=&location=London&searchType=advance&broaderLocation=&clarifyIndex=0&clarifyOptions=CLOTHES+SHOPS|CLOTHES+SHOPS+-+LADIES|&ooa=&M=&ssm=1&lCOption32=RES|CLOTHES+SHOPS+-+LADIES&bandedclarifyResults=1";
//get yell.com page in a string
$htmlContent = $baseClass->getContent($URL);
//get postal code along with the address
$result2 = preg_match_all("/(.*)/", $htmlContent, $matches);
print_r($matches);
The above code outputs something like
Array ( [0] => Array ( [0] => 7, Royal Parade, Chislehurst, Kent BR7 6NR [1] => 55, Monmouth St, London, WC2H 9DG .... The problem is that I don't know how to extract the postal code, because it doesn't have a fixed number of characters (sometimes it has 6 and sometimes only 5). Basically, I need to extract the last 2 words from each array element.
Thank you in advance for any help !
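Taking the last two words of each address can be sketched as follows. This is Python rather than PHP, the helper name last_two_words is hypothetical, and it relies on the assumption stated above that the postal code is always the final two whitespace-separated tokens.

```python
import re
from typing import Optional

def last_two_words(address: str) -> Optional[str]:
    """Return the final two whitespace-separated tokens of `address`."""
    m = re.search(r"(\S+)\s+(\S+)\s*$", address)
    return f"{m.group(1)} {m.group(2)}" if m else None

addresses = [
    "7, Royal Parade, Chislehurst, Kent BR7 6NR",
    "55, Monmouth St, London, WC2H 9DG",
]
print([last_two_words(a) for a in addresses])
```

The same pattern should work with preg_match in PHP, since it only uses \S, \s and an end anchor.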
Using Python I need to insert a newline character into a string every 64 characters. In Perl it's easy:
s/(.{64})/$1\n/g
How could this be done using regular expressions in Python?
Is there a more pythonic way to do it?
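Two possible approaches, sketched with made-up function names: a direct translation of the Perl substitution using re.sub, and a slicing version that avoids regular expressions. Note the assumptions: the regex version skips over existing newlines (since . does not match them) and adds a trailing newline when the length is an exact multiple of the width, while the slicing version does neither.

```python
import re

def insert_newlines_regex(text: str, width: int = 64) -> str:
    # Append "\n" after each full run of `width` non-newline characters.
    return re.sub(rf"(.{{{width}}})", r"\1\n", text)

def insert_newlines_slices(text: str, width: int = 64) -> str:
    # Cut the string into fixed-size chunks and join them with newlines.
    return "\n".join(text[i:i + width] for i in range(0, len(text), width))

print(insert_newlines_slices("x" * 130))
```

The standard library's textwrap module is another option when breaking at word boundaries is acceptable.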
Is there a way to obtain the C++ equivalent of Perl's PREMATCH ($`) and POSTMATCH ($') from pcrecpp? I would be happy with a string, a char *, or pairs indices/startpos+length that point at this.
StringPiece seems like it might accomplish part of this, but I'm not certain how to get it.
in perl:
$_ = "Hello world";
if (/lo\s/) {
$pre = $`; #should be "Hel"
$post = $'; #should be "world"
}
in C++ I would have something like:
string mystr = "Hello world"; //do I need to map this in a StringPiece?
if (pcrecpp::RE("lo\\s").PartialMatch(mystr)) { //should I use Consume or FindAndConsume?
//What should I do here to get pre+post matches???
}
The plain C PCRE API seems able to return the vector of matches, including the "end" portion of the string, so I could theoretically extract such pre/post variables, but that seems like a lot of work. I like the simplicity of the pcrecpp interface.
Suggestions? Thanks!
--Eric
I have a searching system that splits the keyword into chunks and searches for it in a string like this:
var regexp_school = new RegExp("(?=.*" + split_keywords[0] + ")(?=.*" + split_keywords[1] + ")(?=.*" + split_keywords[2] + ").*", "i");
I would like to modify this so that I only search at the beginning of words.
For example if the string is:
"Bbe be eb ebb beb"
And the keyword is: "be eb"
Then I want only these to hit "be ebb eb"
In other words I want to combine the above regexp with this one:
var regexp_school = new RegExp("^" + split_keywords[0], "i");
But I'm not sure what the syntax would look like.
I'm also using the split function to split the keywords, but I don't want to set a length since I don't know how many words there are in the keyword string.
split_keywords = school_keyword.split(" ", 3);
If I leave the 3 out, will it have a dynamic length or just a length of 1? I tried doing a
alert(split_keywords.lenght);
But I didn't get the desired response.
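The combination being asked about can be sketched by putting a word boundary (\b) in front of each keyword inside its lookahead, and by building the pattern from however many keywords there are. The example is in Python, not JavaScript, and the function name matches_word_starts is invented for illustration; \b and lookaheads behave the same way in JavaScript's RegExp.

```python
import re

def matches_word_starts(keywords: str, text: str) -> bool:
    """True if every keyword begins some word in `text` (case-insensitive)."""
    parts = keywords.split()  # no limit given: one entry per keyword
    # One lookahead per keyword; \b anchors each keyword to a word start.
    pattern = "".join(rf"(?=.*\b{re.escape(p)})" for p in parts)
    return re.search(pattern, text, re.IGNORECASE | re.DOTALL) is not None

print(matches_word_starts("be eb", "Bbe be eb ebb beb"))
```

Escaping each keyword (re.escape here) keeps characters such as . or ? in the user's input from being read as regex syntax.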
I have a regular expression, links = re.compile('<a(.+?)href=(?:"|\')?((?:https?://|/)[^\'"]+)(?:"|\')?(.*?)>(.+?)</a>',re.I).findall(data)
to find links in some HTML. It takes a long time on certain pages. Any optimization advice?
One that it chokes on is http://freeyourmindonline.net/Blog/
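An alternative worth considering is to swap the regular expression for the standard library's HTML parser, which avoids the backtracking that several lazy and greedy groups can cause on large pages. This is a different technique from the one in the question, and it is only a sketch: the class and function names are made up, and unlike the original pattern it collects every href rather than only those starting with http://, https:// or /.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

def find_links(data: str) -> list:
    parser = LinkCollector()
    parser.feed(data)
    return parser.links

print(find_links('<p><a class="x" href="http://example.com/a">A</a></p>'))
```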
For the love of God I am not getting this easy code to work! It is always alerting out "null" which means that the string does not match the expression.
var pattern = "^\w+@[a-zA-Z_]+?\.[a-zA-Z]{2,3}$";
function isEmailAddress(str) {
str = "[email protected]";
alert(str.match(pattern));
return str.match(pattern);
}
In Brzozowski's "Derivatives of Regular Expressions" and elsewhere, the function δ(R), which returns λ if R is nullable and Ø otherwise, includes clauses such as the following:
δ(R1 + R2) = δ(R1) + δ(R2)
δ(R1 · R2) = δ(R1) ∧ δ(R2)
Clearly, if both R1 and R2 are nullable then (R1 · R2) is nullable, and if either R1 or R2 is nullable then (R1 + R2) is nullable. It is unclear to me what the above clauses are supposed to mean, however. My first thought, mapping (+), (·), or the Boolean operations to regular sets is nonsensical, since in the base case,
δ(a) = Ø (for all a ∈ Σ)
δ(λ) = λ
δ(Ø) = Ø
and λ is not a set (nor is the return type of δ, which is a regular expression). Furthermore, this mapping isn't indicated, and there is a separate notation for it. I understand nullability, but I'm lost on the definition of the sum, product, and Boolean operations in the definition of δ: how are λ or Ø returned from δ(R1) ∧ δ(R2), for instance, in the definition of δ(R1 · R2)?
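One possible reading, offered as an assumption rather than something the text above settles: since the nullability function (written δ in Brzozowski's paper) only ever yields the empty-string expression (λ) or Ø, the sum and the Boolean "and" on the right-hand sides behave like Boolean "or" and "and" over those two values. The sketch below encodes that reading in Python; the tuple encoding of expressions is invented for illustration.

```python
# Expressions as tuples (hypothetical encoding):
#   ("empty",)  Ø          ("eps",)  the empty-string expression
#   ("sym", "a")           ("star", r)
#   ("alt", r1, r2)  sum   ("cat", r1, r2)  product   ("and", r1, r2)

def nullable(r) -> bool:
    kind = r[0]
    if kind in ("eps", "star"):
        return True
    if kind in ("empty", "sym"):
        return False
    if kind == "alt":                      # sum: either side may be nullable
        return nullable(r[1]) or nullable(r[2])
    if kind in ("cat", "and"):             # product / intersection: both must be
        return nullable(r[1]) and nullable(r[2])
    raise ValueError(f"unknown expression kind: {kind}")

def delta(r):
    """Return the empty-string expression if r is nullable, else Ø."""
    return ("eps",) if nullable(r) else ("empty",)

a = ("sym", "a")
print(delta(("cat", ("star", a), ("eps",))))
```

Under this reading, combining the two possible results with "and" gives the empty-string expression only when both operands are that expression, which matches the observation that a product is nullable exactly when both factors are.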
I have this file "file.txt" which I want to split into many smaller ones.
Each line of the file has an id field which looks like "id:1" for a line belonging to id 1.
For each id in the file, I'd like to create a file named id<id>.txt (for example, id1.txt) and put all lines that belong to this id in that file.
My brute force bash script solution reads as follows.
count=1
while [ $count -lt 19945 ]
do
grep "id:$count " file.txt >> ./sets/id$count.txt
count=$((count + 1))
done
Now this is very inefficient, as I have to read through the file about 20,000 times.
Is there a way to do the same operation with only one pass through the file?
What I'm probably asking for is a way to use the value that matches for a regular expression to name the associated output file.
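A single-pass version can be sketched by capturing the id with a regular expression and using the captured value to choose the output file. The example is in Python rather than bash; the function names are made up, the file and directory names come from the script above, and it assumes the grouped lines fit in memory.

```python
import re
from collections import defaultdict
from pathlib import Path

ID_PATTERN = re.compile(r"\bid:(\d+)\b")

def group_lines_by_id(lines):
    """Read the lines once and bucket them by the id captured from each."""
    groups = defaultdict(list)
    for line in lines:
        m = ID_PATTERN.search(line)
        if m:  # lines without an id field are skipped
            groups[m.group(1)].append(line)
    return groups

def write_groups(groups, out_dir="sets"):
    """Write each bucket to <out_dir>/id<id>.txt."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for id_, lines in groups.items():
        (out / f"id{id_}.txt").write_text("".join(lines))

# Usage, with the file name from the question:
# with open("file.txt") as fh:
#     write_groups(group_lines_by_id(fh))
```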
How do I write a switch for the following conditional?
If the url contains "foo", then settings.base_url is "bar".
The following is achieving the effect required but I've a feeling this would be more manageable in a switch:
var doc_location = document.location.href;
var url_strip = new RegExp("http:\/\/.*\/");
var base_url = url_strip.exec(doc_location)
var base_url_string = base_url[0];
//BASE URL CASES
// LOCAL
if (base_url_string.indexOf('xxx.local') > -1) {
settings = {
"base_url" : "http://xxx.local/"
};
}
// DEV
if (base_url_string.indexOf('xxx.dev.yyy.com') > -1) {
settings = {
"base_url" : "http://xxx.dev.yyy.com/xxx/"
};
}
Thanks
I have used regular expressions quite a few times but am still far from being an expert. This time I want to validate a formula (or math expression) with a regular expression. The difficult part is validating that the opening and closing parentheses within the formula are properly matched.
I believe there must be a sample on the web, but I could not find one. Can somebody post a link to such an example, or help me by some other means?
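For the parentheses part specifically, a depth counter is a common alternative to a regular expression. The sketch below is in Python, uses a made-up function name, and checks only that the parentheses pair up; it does not validate the rest of the formula.

```python
def parentheses_balanced(formula: str) -> bool:
    """True if every ')' closes an earlier '(' and none are left open."""
    depth = 0
    for ch in formula:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' appeared before its matching '('
                return False
    return depth == 0

print(parentheses_balanced("(a + b) * (c - (d / e))"))
```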
Hi,
how can I make sure a certain keyword just occurs once in the input with regular expression?
I think there are some mistakes in the expression below, as I can repeat the same keyword,
if (!preg_match('/\b(.php?){1}\b/', $cfg_path))
{
$error = true;
echo '<error elementid="cfg_path" message="PATH - make sure you have a \'.php?\' in the path."/>';
}
I just want this to be true,
form.php?category=something or form.php?
but not this,
form.php?.php?category=something or form.php?.php?
please let me know how to fix it.
thanks.
Regular expressions are often pointed to as the classical example of a language that is not Turing complete. For example, "regular expressions" is given as the answer to this SO question looking for languages that are not Turing complete.
In my, perhaps somewhat basic, understanding of the notion of Turing completeness, this means that regular expressions cannot be used to check for patterns that are "balanced". Balanced meaning having an equal number of opening characters and closing characters. This is because doing so would require some kind of state, to allow you to match the opening and closing characters.
However, the .NET implementation of regular expressions introduces the notion of a balanced group. This construct is designed to let you backtrack and see if a previous group was matched. This means that a .NET regular expression:
^(?<p>a)*(?<-p>b)*(?(p)(?!))$
Could match a pattern that:
ab
aabb
aaabbb
aaaabbbb
... etc. ...
Does this mean .NET's regular expressions are Turing complete? Or are there other things missing that would be required for the language to be Turing complete?
At the moment I have:
$Text = preg_replace("/\[code\](.*?)\[\/code\]/s", "<mytag>\\1</mytag>", $Text);
How can I escape the backreference using htmlentities()?
Hi, everyone
I have a piece of text and I've got to parse usernames and hashes out of it. Right now I'm doing it with two regular expressions. Could I do it with just one multiline regular expression?
#!/usr/bin/env python
import re
test_str = """
Hello, UserName.
Please read this looooooooooooooooong text. hash
Now, write down this hash: fdaf9399jef9qw0j.
Then keep reading this loooooooooong text.
Hello, UserName2.
Please read this looooooooooooooooong text. hash
Now, write down this hash: gtwnhton340gjr2g.
Then keep reading this loooooooooong text.
"""
logins = re.findall('Hello, (?P<login>.+).',test_str)
hashes = re.findall('hash: (?P<hash>.+).',test_str)
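A single pattern can be sketched by letting a lazy .*? span the lines between the greeting and the hash, with re.DOTALL so that . also matches newlines. The names PAIR and parse_pairs are invented, and the pattern assumes that logins contain no dots and that hashes consist of word characters only, as in the sample text.

```python
import re

PAIR = re.compile(
    r"Hello, (?P<login>[^.\n]+)\."   # login up to the first dot
    r".*?"                           # lazily skip the text in between
    r"hash: (?P<hash>\w+)\.",        # hash up to the closing dot
    re.DOTALL,
)

def parse_pairs(text: str):
    """Return (login, hash) tuples in the order they appear."""
    return [(m.group("login"), m.group("hash")) for m in PAIR.finditer(text)]

sample = """
Hello, UserName.
Please read this looooooooooooooooong text. hash
Now, write down this hash: fdaf9399jef9qw0j.
Then keep reading this loooooooooong text.
Hello, UserName2.
Please read this looooooooooooooooong text. hash
Now, write down this hash: gtwnhton340gjr2g.
Then keep reading this loooooooooong text.
"""
print(parse_pairs(sample))
```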
Hi,
Could anyone please tell me the reason for getting ab as the output of the following regular expression code, which uses a reluctant quantifier?
Pattern p = Pattern.compile("abc*?");
Matcher m = p.matcher("abcfoo");
while(m.find())
System.out.println(m.group()); // ab
and why do I get empty matches for the following code?
Pattern p = Pattern.compile(".*?");
Matcher m = p.matcher("abcfoo");
while(m.find())
System.out.println(m.group());
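The behaviour of a reluctant quantifier can be observed in a few lines. The sketch uses Python's re module instead of java.util.regex (an assumption made for brevity; the helper name first_match is made up), and it looks only at the first match, because engines differ in how they continue after an empty match.

```python
import re
from typing import Optional

def first_match(pattern: str, text: str) -> Optional[str]:
    """Return the text matched at the start of `text`, or None."""
    m = re.match(pattern, text)
    return m.group() if m else None

print(repr(first_match(r"abc*?", "abcfoo")))   # reluctant c*? settles for zero c's
print(repr(first_match(r"abc*", "abcfoo")))    # greedy c* takes the c
print(repr(first_match(r".*?", "abcfoo")))     # nothing forces .*? to consume anything
print(repr(first_match(r"abc*?f", "abcfoo")))  # the following f forces c*? to expand
```

A reluctant quantifier matches as little as it can while still letting the overall pattern succeed, so when nothing follows it in the pattern it matches nothing at all.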
I've taken a regular expression from jQuery that detects whether a browser's engine is WebKit and gets its version number. It returns 3 values extracted from the userAgent string: webkit/….…, webkit and ….… [“….…” being the version number].
I would like the regular expression to return just 2 values: webkit and ….….
I'm rubbish at regular expressions, so please can you give an explanation of the expression with your answer.
The regular expression I'm currently working with and wish to improve is: /(webkit)[\/]([\w.]+)/.
I appreciate all your help, thanks in advance!
How do I split on all nonalphanumeric characters, EXCEPT the apostrophe?
re.split('\W+',text)
works, but will also split on apostrophes. How do I add an exception to this rule?
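One way to add the exception is to replace \W with a negated character class that lists what should not be split on: word characters and the apostrophe. A minimal sketch, with a made-up function name; it also drops the empty strings that re.split produces at the edges.

```python
import re

def split_words(text: str):
    """Split on runs of characters that are neither word characters nor apostrophes."""
    return [w for w in re.split(r"[^\w']+", text) if w]

print(split_words("Don't stop, it's fun!"))
```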
Thanks!
Hi, I have some forms that I want to run some basic PHP validation (regular expressions) on. How do you go about doing it? I have general text input, usernames, passwords and dates to validate. I would also like to know how to check for empty input boxes. I have looked on the internet for this but haven't found any good tutorials.
Thanks
I have a lot of lines that contain a number in the format XXXXXXXXX. I want to change the number XXXXXXXXX to XX.XXX.XXX.X
XXXXXXXXX = a random 9-digit number
Can anyone help me? Thanks in advance
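The regrouping can be sketched with four capture groups and a replacement that joins them with dots. The example is in Python since no language is named above; the function name is made up, and the \b anchors are an assumption so that longer digit runs are left alone.

```python
import re

def format_number(line: str) -> str:
    """Rewrite each standalone 9-digit number as XX.XXX.XXX.X."""
    return re.sub(r"\b(\d{2})(\d{3})(\d{3})(\d)\b", r"\1.\2.\3.\4", line)

print(format_number("123456789"))
```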
I have a string that looks like this:
var str = "Hello world, hello >world, hello world!";
... and I'd like to replace all the hellos with e.g. bye and world with earth, except the words that start with   or >. Those should be ignored. So the result should be:
bye earth, hello >world, bye earth!
I tried to do this with
str.replace(/(?!\ )hello/gi,'bye'));
But it doesn't work.