Regexp in Java
I want to write a regexp that does this:
verify whether a word looks like [0-9A-Za-z][._-'][0-9A-Za-z]
Examples of valid words:
A21a_c32
daA.da2
das'2
dsada
ASDA
12SA89
Invalid words:
dsa#da2
34$
Thanks
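A minimal Java sketch, assuming the intent is: one or more letters/digits, optionally followed by groups made of a single separator (., _, - or ') plus more letters/digits, so a separator can never start, end, or repeat. Note that inside the class [._-'] the hyphen forms a range from _ to ', which Java rejects as an illegal character range; putting the hyphen last makes it literal. WordValidator and isValid are hypothetical names.

```java
import java.util.regex.Pattern;

final class WordValidator {
    // Alphanumeric runs joined by single separators; '-' is last in the class so it is literal.
    private static final Pattern WORD = Pattern.compile("[0-9A-Za-z]+(?:[._'-][0-9A-Za-z]+)*");

    static boolean isValid(String word) {
        // matches() requires the whole string to match, so no ^ and $ anchors are needed.
        return WORD.matcher(word).matches();
    }

    public static void main(String[] args) {
        String[] words = {"A21a_c32", "daA.da2", "das'2", "dsada", "ASDA", "12SA89", "dsa#da2", "34$"};
        for (String w : words) {
            System.out.println(w + " -> " + isValid(w));
        }
    }
}
```

Under this assumption a doubled separator such as a..b is also rejected; drop the grouping if that should be allowed.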
I have a string that looks something like this:
{theField} > YEAR (today, -3) || {theField} < YEAR (today, +3)
I want it to be replaced into:
{theField} > " + YEAR (today, -3) + " || {theField} < " + YEAR (today, +3) + "
I have tried this:
String.replace(/(.*)(YEAR|MONTH|WEEK|DAY+)(.*[)]+)/g, "$1 \" + $2 $3 + \"")
But that gives me:
{theField} > YEAR (today, +3) || {theField} > " + YEAR (today, +3) + "
Does anyone have any ideas?
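A likely culprit is the greedy (.*) around the function name, which lets one match swallow text that belongs to other calls. A minimal Java sketch that matches only the call itself: the function name, optional spaces, and a parenthesised argument list that cannot contain ')'. It assumes arguments never contain nested parentheses. Java refers to the whole match as $0 in a replacement string; in JavaScript the equivalent is $&. FunctionQuoter and quoteCalls are hypothetical names.

```java
final class FunctionQuoter {
    // Function name, optional spaces, then "(...)"; [^)]* stops at the first ')',
    // so each call is matched on its own instead of as one greedy span.
    private static final String CALL = "(YEAR|MONTH|WEEK|DAY)\\s*\\([^)]*\\)";

    static String quoteCalls(String expr) {
        // $0 is the whole match; double quotes are literal in a Java replacement string.
        return expr.replaceAll(CALL, "\" + $0 + \"");
    }

    public static void main(String[] args) {
        System.out.println(quoteCalls("{theField} > YEAR (today, -3) || {theField} < YEAR (today, +3)"));
    }
}
```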
I'm trying to find all the occurrences of "Arrows" in text, so in
"<----=====><==->>"
the arrows are:
"<----", "=====>", "<==", "->", ">"
This works:
String[] patterns = {"<=*", "<-*", "=*>", "-*>"};
for (String p : patterns) {
Matcher A = Pattern.compile(p).matcher(s);
while (A.find()) {
System.out.println(A.group());
}
}
but this doesn't:
String p = "<=*|<-*|=*>|-*>";
Matcher A = Pattern.compile(p).matcher(s);
while (A.find()) {
System.out.println(A.group());
}
No idea why. It often reports "<" instead of "<====" or similar.
What is wrong?
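The behaviour comes from how alternation works in Java's regex engine: at each position the alternatives are tried left to right, and the first one that succeeds wins, even if a later one would match more. At "<----" the first alternative <=* succeeds with zero '=' characters, so the match is just "<". A sketch that merges the two left-pointing alternatives (and the two right-pointing ones) so the shaft is an optional, greedy part of a single alternative; ArrowFinder is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ArrowFinder {
    // '<' followed by an optional run of '=' or '-', or an optional run of '=' or '-' followed by '>'.
    // Because the shaft is a greedy part of one alternative, a bare "<" is only
    // returned when no shaft follows it.
    private static final Pattern ARROW = Pattern.compile("<(?:=+|-+)?|(?:=+|-+)?>");

    static List<String> arrows(String s) {
        List<String> found = new ArrayList<>();
        Matcher m = ARROW.matcher(s);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(arrows("<----=====><==->>")); // [<----, =====>, <==, ->, >]
    }
}
```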
I would like to convert every hashtag in a String into a linked URL:
#hashtag - should have "#hashtag" linked.
This is a #hashtag - should have "#hashtag" linked.
This is a [url=http://www.mysite.com/#name]named anchor[/url] - should not be linked.
This isn&#39;t a pretty way to use quotes - should not be linked.
Here is my current code:
String.prototype.parseHashtag = function() {
return this.replace(/[^&][#]+[A-Za-z0-9-_]+(?!])/, function(t) {
var tag = t.replace("#","")
return t.link("http://www.mysite.com/tag/"+tag);
});
};
Currently, this appears to handle escaped characters (by excluding matches preceded by an ampersand) and named anchors, but it doesn't link the #hashtag if it's the first thing in the message, and it seems to include the 1-2 characters before the "#" in the link.
Halp!
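Two parts of the attempt probably explain the symptoms: [^&] must consume a real character before '#', so a tag at the very start of the string cannot match and the consumed character ends up inside the link; and because + can backtrack, (?!]) on "#name]" can still succeed on the shorter "#nam". A Java sketch (the question is JavaScript) that replaces [^&] with a negative lookbehind and uses a possessive quantifier so the lookahead cannot be sidestepped. The link markup mirrors what String.prototype.link produces; HashtagLinker is a hypothetical name. Possessive quantifiers are not available in JavaScript regexes, so porting this back would need another way to reject the anchor case.

```java
import java.util.regex.Pattern;

final class HashtagLinker {
    // (?<![&\w])  '#' must not follow '&' (entities such as &#39;) or a word character;
    //             at the start of the string the lookbehind trivially succeeds.
    // ++          possessive: the tag cannot give characters back to satisfy the lookahead.
    // (?!\])      reject tags followed by ']', as in [url=...#name].
    private static final Pattern TAG = Pattern.compile("(?<![&\\w])#([A-Za-z0-9_-]++)(?!\\])");

    static String link(String text) {
        // $1 is the tag without the '#'; the base URL is the one from the question.
        return TAG.matcher(text).replaceAll("<a href=\"http://www.mysite.com/tag/$1\">#$1</a>");
    }

    public static void main(String[] args) {
        System.out.println(link("This is a #hashtag"));
    }
}
```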
Hi,
I'm using preg_replace to create URLs for mod_rewrite-based paging links.
I use:
$nextURL = preg_replace('%/([\d]+)/%','/'.($pageNumber+1).'/',$currentURL);
This works fine; however, I was wondering if there is a better way that doesn't require including the '/' in the replacement parameter. I need to match the number only when it sits between two slashes, because the URLs can sometimes contain numbers other than the page part. Those other numbers never make up a whole path segment on their own, so /[\d]+/ stops them from being replaced.
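PCRE, which preg_replace uses, supports lookbehind and lookahead, so the slashes can be required without becoming part of the match, e.g. '%(?<=/)\d+(?=/)%' with just the new number as the replacement. A Java sketch of the same pattern; nextPage is a hypothetical helper. It replaces every path segment that consists only of digits, which relies on the guarantee above that only the page number does.

```java
final class PageUrl {
    // (?<=/) and (?=/) require a '/' on each side without consuming it,
    // so only the digits are replaced and the slashes stay where they are.
    static String nextPage(String currentUrl, int pageNumber) {
        return currentUrl.replaceAll("(?<=/)\\d+(?=/)", String.valueOf(pageNumber + 1));
    }

    public static void main(String[] args) {
        System.out.println(nextPage("/shop/widget2/5/", 5)); // /shop/widget2/6/
    }
}
```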
What tools are available in Python to assist in parsing a context-free grammar?
Of course it is possible to roll my own, but I am looking for a generic tool that can generate a parser for a given CFG.
Hi,
I'm looking for a function (PHP would be best) that returns true if there exists a string that matches both regexpA and regexpB.
Example 1:
$regexpA = '[0-9]+';
$regexpB = '[0-9]{2,3}';
hasRegularsIntersection($regexpA,$regexpB) returns TRUE because '12' matches both regexps
Example 2:
$regexpA = '[0-9]+';
$regexpB = '[a-z]+';
hasRegularsIntersection($regexpA,$regexpB) returns FALSE because a string of digits can never match [a-z]+.
Thanks for any suggestions how to solve this.
Henry
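For genuinely regular expressions this is decidable: regular languages are closed under intersection, so one can compile each regex to a DFA, run both DFAs in lockstep (the product construction), and check whether a pair of states that is accepting in both is reachable. A Java sketch, assuming the regex-to-DFA step has already been done (the DFAs for the two examples are written out by hand) and that "matches" means the whole string matches, as the examples imply. Dfa and hasIntersection are hypothetical names. PCRE features such as backreferences go beyond regular languages, so this does not cover every pattern PHP accepts.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** A DFA over chars; any transition missing from the table goes to an implicit dead state. */
final class Dfa {
    final int start;
    final Set<Integer> accepting;
    private final Map<Integer, Map<Character, Integer>> delta = new HashMap<>();

    Dfa(int start, Integer... accepting) {
        this.start = start;
        this.accepting = Set.of(accepting);
    }

    /** Adds a transition from -> to on every character in [lo, hi]. */
    Dfa edge(int from, char lo, char hi, int to) {
        Map<Character, Integer> row = delta.computeIfAbsent(from, k -> new HashMap<>());
        for (char c = lo; c <= hi; c++) {
            row.put(c, to);
        }
        return this;
    }

    Map<Character, Integer> out(int state) {
        return delta.getOrDefault(state, Map.of());
    }
}

final class RegexIntersection {
    /** Breadth-first search over the product automaton, whose states are pairs (a-state, b-state). */
    static boolean hasIntersection(Dfa a, Dfa b) {
        Deque<int[]> queue = new ArrayDeque<>();
        Set<List<Integer>> seen = new HashSet<>();
        queue.add(new int[] {a.start, b.start});
        seen.add(List.of(a.start, b.start));
        while (!queue.isEmpty()) {
            int[] pair = queue.poll();
            if (a.accepting.contains(pair[0]) && b.accepting.contains(pair[1])) {
                return true; // the characters read to get here form a string both accept
            }
            for (Map.Entry<Character, Integer> e : a.out(pair[0]).entrySet()) {
                Integer nextB = b.out(pair[1]).get(e.getKey());
                if (nextB == null) {
                    continue; // b has no transition on this character: dead state
                }
                if (seen.add(List.of(e.getValue(), nextB))) {
                    queue.add(new int[] {e.getValue(), nextB});
                }
            }
        }
        return false; // no reachable pair accepts in both: the intersection is empty
    }

    public static void main(String[] args) {
        Dfa digitsPlus = new Dfa(0, 1).edge(0, '0', '9', 1).edge(1, '0', '9', 1);   // [0-9]+
        Dfa digits2to3 = new Dfa(0, 2, 3)                                          // [0-9]{2,3}
            .edge(0, '0', '9', 1).edge(1, '0', '9', 2).edge(2, '0', '9', 3);
        Dfa lettersPlus = new Dfa(0, 1).edge(0, 'a', 'z', 1).edge(1, 'a', 'z', 1); // [a-z]+
        System.out.println(hasIntersection(digitsPlus, digits2to3));
        System.out.println(hasIntersection(digitsPlus, lettersPlus));
    }
}
```

With the two examples from the question, the first call prints true (a two-digit string is reachable in both) and the second prints false.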
I want to replace all "mailto:" links in html with plain emails.
In: text .... <a href="mailto:[email protected]">not needed</a> text
Out: text .... [email protected] text
I did this:
$str = preg_replace("/\<a.+href=\"mailto:(.*)\".+\<\/a\>/", "$1", $str);
But it fails if there are multiple emails in the string, or HTML inside the "a" tag:
In: <a href="mailto:[email protected]">not needed</a><a href="mailto:[email protected]"><font size="3">[email protected]</font></a>
Out: [email protected]">
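The .+ and (.*) in the attempt are greedy and can run across several <a> elements, which is why the output spans from one link into the next. A Java sketch that keeps each part inside its own tag or attribute: [^>]* cannot cross '>', [^"]* cannot cross the closing quote, and the lazy .*? stops at the first </a>. The addresses are placeholders and MailtoStripper is a hypothetical name; the same pattern should also work with preg_replace and the i and s modifiers, though a real HTML parser is more robust if the markup is irregular.

```java
import java.util.regex.Pattern;

final class MailtoStripper {
    // (?is): case-insensitive, and '.' also matches newlines so multi-line links are handled.
    // Group 1 is the address between "mailto:" and the closing quote.
    private static final Pattern MAILTO =
        Pattern.compile("(?is)<a\\s[^>]*href=\"mailto:([^\"]*)\"[^>]*>.*?</a>");

    static String strip(String html) {
        return MAILTO.matcher(html).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(strip("text .... <a href=\"mailto:joe@example.com\">not needed</a> text"));
    }
}
```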
Perl has been one of my go-to programming tools for years and years. Perl 6 grammars look like a great language feature. I'd like to know whether anyone has started something like this for Ruby.
How can I 301-redirect any URL whose path starts with a number between 1 and 9999? For example:
domain.com/12/something/anotherthing
domain.com/378/product/widgets
domain.com/2560
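The key part is the pattern. A Java sketch that only checks which paths qualify: a first segment from 1 to 9999 without leading zeros, followed by '/' or the end of the path. The question does not say where the requests should go, so no redirect target is shown. In Apache mod_rewrite the same idea would go in a RewriteRule with the R=301 flag; note that in .htaccess context the path is matched without its leading slash. NumericPrefix is a hypothetical name.

```java
import java.util.regex.Pattern;

final class NumericPrefix {
    // A non-zero digit followed by up to three digits (1-9999), then either
    // another '/' with anything after it, or the end of the path.
    private static final Pattern PREFIX = Pattern.compile("^/([1-9][0-9]{0,3})(?:/.*)?$");

    static boolean qualifies(String path) {
        return PREFIX.matcher(path).matches();
    }

    public static void main(String[] args) {
        for (String p : new String[] {"/12/something/anotherthing", "/378/product/widgets", "/2560", "/10000"}) {
            System.out.println(p + " -> " + qualifies(p));
        }
    }
}
```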
Hello all
I got this question, which asks me to explain why it is foolish to write a regular expression for the language consisting of strings of 0's and 1's that are palindromes (strings that read the same backwards and forwards).
Part 2 of the question says: using any formal mechanism of your choice, show how it is possible to express the language consisting of strings of 0's and 1's that are palindromes.
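One mechanism that works is a context-free grammar. A sketch of the standard grammar for binary palindromes, followed by the usual pumping-lemma argument for why no regular expression for this language can exist (which is what makes trying to write one foolish):

```latex
% Context-free grammar for binary palindromes L (start symbol S, includes the empty string)
S \;\to\; 0\,S\,0 \;\mid\; 1\,S\,1 \;\mid\; 0 \;\mid\; 1 \;\mid\; \varepsilon

% L is not regular: assume pumping length p and take w = 0^p 1 0^p \in L.
% Any split w = xyz with |xy| \le p and |y| = k \ge 1 forces y = 0^k, and then
xy^2z \;=\; 0^{p+k}\,1\,0^{p} \;\notin\; L
```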
Many of us need to deal with user input, search queries, and situations where the input text can potentially contain profanity or undesirable language. Oftentimes this needs to be filtered out.
Where can one find a good list of swear words in various languages and dialects?
Are there APIs that provide good lists? Or maybe an API that simply says "yes, this is clean" or "no, this is dirty", with some parameters?
What are some good methods for catching folks trying to trick the system, like a$$, azz, or a55?
Bonus points if you offer solutions for PHP. :)
Edit: in response to answers that say to simply avoid the programmatic issue:
I think there is a place for this kind of filter when, for instance, a user can use public image search to find pictures that get added to a sensitive community pool. If they can search for "penis", then they will likely get many pictures of, yep. If we don't want pictures of that, then preventing the word as a search term is a good gatekeeper, though admittedly not a foolproof method. Getting the list of words in the first place is the real question.
So I'm really referring to a way to figure out whether a single token is dirty and then simply disallow it. I wouldn't bother preventing a sentiment like the totally hilarious "long necked giraffe" reference. Nothing you can do there. :)
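For the look-alike part (a$$, azz, a55), one common approach is to expand each blocklisted word into a pattern in which every letter also accepts its usual substitutes and may repeat. A Java sketch; the look-alike table and the one-word blocklist are placeholders (getting a real list is the open question), and TokenFilter is a hypothetical name. Matching the whole token rather than substrings avoids flagging innocent words that merely contain a blocked word.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;

final class TokenFilter {
    // Hypothetical look-alike table: each letter also matches these substitutes.
    private static final Map<Character, String> LOOKALIKES = Map.of(
        'a', "a@4",
        'e', "e3",
        'i', "i1!",
        'o', "o0",
        's', "s$5z");

    private final List<Pattern> patterns = new ArrayList<>();

    /** Assumes each blocklisted word consists of plain ASCII letters. */
    TokenFilter(Collection<String> blocklist) {
        for (String word : blocklist) {
            StringBuilder re = new StringBuilder();
            for (char c : word.toLowerCase(Locale.ROOT).toCharArray()) {
                // e.g. 's' becomes "[s$5z]+", which also absorbs stretched spellings like "asssss"
                re.append('[').append(LOOKALIKES.getOrDefault(c, String.valueOf(c))).append("]+");
            }
            patterns.add(Pattern.compile(re.toString(), Pattern.CASE_INSENSITIVE));
        }
    }

    /** True if the whole token matches a blocklisted word; substrings are not checked. */
    boolean isDirty(String token) {
        for (Pattern p : patterns) {
            if (p.matcher(token).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        TokenFilter filter = new TokenFilter(List.of("ass"));
        for (String t : new String[] {"a$$", "azz", "a55", "glass"}) {
            System.out.println(t + " -> " + filter.isDirty(t));
        }
    }
}
```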
I have a file that is similar to this:
<many lines of stuff>
SUMMARY:
<some lines of stuff>
END OF SUMMARY
I want to extract just the stuff between SUMMARY and END OF SUMMARY. I suspect I can do this with sed but I am not sure how. I know I can modify the stuff in between with this:
sed "/SUMMARY/,/END OF SUMMARY/ s/replace/with/" fileName
(But I'm not sure how to extract just that stuff.)
I am using Bash on Solaris.
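For the sed part, the usual idiom is to suppress automatic printing with -n and print the address range with the p command, i.e. sed -n '/SUMMARY:/,/END OF SUMMARY/p' fileName, which also prints the two marker lines. The same logic as a small Java sketch, which drops the marker lines and stops at the first END OF SUMMARY; SummaryExtractor is a hypothetical name, and the markers are assumed to sit on lines of their own.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

final class SummaryExtractor {
    // Returns the lines strictly between the first "SUMMARY:" line and the next "END OF SUMMARY" line.
    static List<String> extract(List<String> lines) {
        List<String> out = new ArrayList<>();
        boolean inside = false;
        for (String line : lines) {
            String trimmed = line.trim();
            if (!inside) {
                inside = trimmed.equals("SUMMARY:"); // start collecting after this line
            } else if (trimmed.equals("END OF SUMMARY")) {
                break;                               // stop at the end marker
            } else {
                out.add(line);
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        extract(Files.readAllLines(Paths.get(args[0]))).forEach(System.out::println);
    }
}
```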
I have some text in Java, and I want to use Pattern and Matcher to extract something from it. This is my program:
public String getItemsByType(String text, String start, String end) {
String patternHolder;
StringBuffer itemLines = new StringBuffer();
patternHolder = start + ".*" + end;
Pattern pattern = Pattern.compile(patternHolder);
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
itemLines.append(text.substring(matcher.start(), matcher.end())
+ "\n");
}
return itemLines.toString();
}
This code works fine when the searched text is on a single line, for instance:
String text = "My name is John and I am 18 years Old";
getItemsByType(text, "My", "John");
immediately grabs the text "My name is John" out of the text. However, when my text looks like this:
String text = "My name\nis John\nand I'm\n18 years\nold";
getItemsByType(text, "My", "John");
It doesn't grab anything, since "My" and "John" are on different lines. How do I solve this?
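By default '.' does not match line terminators, so ".*" cannot cross the "\n" between "My" and "John". Compiling with Pattern.DOTALL (equivalently, prefixing the regex with (?s)) lifts that restriction. A sketch of the method with that flag; two further changes are assumptions rather than part of the fix: Pattern.quote so start and end are treated as literal text, and a lazy .*? so each match stops at the first end marker instead of the last one. ItemExtractor is a hypothetical class name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ItemExtractor {
    static String getItemsByType(String text, String start, String end) {
        // DOTALL lets '.' match '\n', so a match can span several lines.
        Pattern pattern = Pattern.compile(
            Pattern.quote(start) + ".*?" + Pattern.quote(end), Pattern.DOTALL);
        Matcher matcher = pattern.matcher(text);
        StringBuilder itemLines = new StringBuilder(); // no synchronization needed, unlike StringBuffer
        while (matcher.find()) {
            itemLines.append(matcher.group()).append('\n');
        }
        return itemLines.toString();
    }

    public static void main(String[] args) {
        System.out.print(getItemsByType("My name\nis John\nand I'm\n18 years\nold", "My", "John"));
    }
}
```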
In a text, I would like to replace all occurrences of $word by [$word]($word) (to create a link in Markdown), but only if it is not already in a link. Example:
[$word homepage](http://w00tw00t.org)
should not become
[[$word]($word) homepage](http://w00tw00t.org).
Thus, I need to check whether $word is somewhere between [ and ] and only replace if it's not the case.
Can you think of a preg_replace command for this?
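One way to express "replace only outside [...]" is to match both things and decide per match: an alternative that matches bracketed text (optionally followed by its (url)) and is copied back unchanged, and an alternative for the bare word that is turned into a link. In PHP, PCRE's (*SKIP)(*FAIL) verbs are a common way to write the same "match it but skip it" idea in a single preg_replace. A Java sketch (Java 9+ for the StringBuilder overload of appendReplacement); MarkdownLinker and linkify are hypothetical names, and treating the word as a whole word (\b boundaries) is an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MarkdownLinker {
    static String linkify(String text, String word) {
        // Group 1: [text] optionally followed by (url); copied through unchanged.
        // Group 2: the word on its own, outside any brackets.
        Pattern p = Pattern.compile(
            "(\\[[^\\]]*\\](?:\\([^)]*\\))?)|\\b(" + Pattern.quote(word) + ")\\b");
        Matcher m = p.matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String replacement = m.group(1) != null
                ? m.group(1)                      // already inside [...]: leave it alone
                : "[" + word + "](" + word + ")"; // bare word: turn it into a link
            // quoteReplacement stops '$' or '\' in the text from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(linkify("foo and [foo homepage](http://w00tw00t.org)", "foo"));
    }
}
```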
How can I convert a regular language to an equivalent context-free grammar (CFG)?
Is it necessary to construct the DFA corresponding to the regular expression, or is there some rule for doing the conversion directly?
For example, considering the following regular expression
01+10(11)*
How can I describe the grammar corresponding to the above RE?
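No DFA is required: a CFG can be built directly from the structure of the regular expression. A sketch, reading "+" as union (the usual textbook notation), so 01+10(11)* denotes {01} together with 10(11)*; if "+" was meant as "one or more", the grammar would be different. The general rules, and the DFA route for comparison, follow the example.

```latex
% Example: L = \{01\} \cup 10(11)^*
S \to 0\,1 \mid 1\,0\,B
B \to 1\,1\,B \mid \varepsilon

% General rules (S_1, S_2 are the start symbols of grammars for r_1, r_2):
a:\quad S \to a
r_1 + r_2:\quad S \to S_1 \mid S_2
r_1 r_2:\quad S \to S_1 S_2
r_1^{*}:\quad S \to S_1 S \mid \varepsilon

% From a DFA instead: for each transition \delta(p, a) = q add P \to a\,Q,
% and for each accepting state p add P \to \varepsilon.
```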
Hello,
I have a grep expression that I run with Cygwin grep on Windows:
grep -a "\\,,/\|\\m/\|\\m/\\>\.</\\m/\|:u" all_fbs.txt > rockon_fbs.txt
Once I have identified the emoticon class, I want to strip the emoticons out of the data. However, the same regexp inside a sed command results in a syntax error (yes, I realize I could use /d instead of //g, but that doesn't make a difference; I still get the error):
sed "s/\(\\,,/\|\\m/\|\\m/\\>\.</\\m/\|:u\)*//g"
The full line is:
grep -a "\\,,/\|\\m/\|\\m/\\>\.</\\m/\|:u" all_fbs.txt | sed "s/\(\\,,/\|\\m/\|\\m/\\>\.</\\m/\|:u\)*//g" | sed "s/^/ROCKON\t/" > rockon_fbs.txt
The result is:
sed: -e expression #1, char 14: unknown option to `s'
I know the error comes from the sed regexp I'm asking about, because if I remove that portion of the full line I get no error (but, of course, the emoticons are not filtered out).
Thanks in advance,
Steve
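The error most likely comes from the unescaped '/' characters inside the emoticons: sed uses '/' as the delimiter of the s command, so the command ends early and the rest of the pattern is read as options (character 14 is where that leftover text begins). sed accepts another delimiter, e.g. s#pattern##g. Also note that in GNU grep \> is an end-of-word anchor rather than a literal '>', so the grep pattern may not match the emoticons you expect. A Java sketch that sidesteps hand-escaping by quoting each emoticon as literal text; the emoticon list is an assumption based on the grep pattern, and EmoticonStripper is a hypothetical name.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class EmoticonStripper {
    // Assumed literal emoticons (in Java source, "\\m/" is the three characters \m/).
    private static final String[] EMOTICONS = {"\\,,/", "\\m/", "\\m/>.<\\m/", ":u"};

    // Longest first, so \m/>.<\m/ is removed whole rather than as two \m/ matches around ">.<".
    // Pattern.quote makes each emoticon literal, so nothing needs escaping by hand.
    private static final Pattern ANY = Pattern.compile(
        Arrays.stream(EMOTICONS)
            .sorted(Comparator.<String>comparingInt(String::length).reversed())
            .map(Pattern::quote)
            .collect(Collectors.joining("|")));

    static String strip(String line) {
        return ANY.matcher(line).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(strip("rock on \\m/ yes"));
    }
}
```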
Hi
I am trying to import this:
http://en.wikipedia.org/wiki/List_of_countries_by_continent_%28data_file%29
which is of the format like:
AS AF AFG 004 Afghanistan, Islamic Republic of
EU AX ALA 248 Åland Islands
EU AL ALB 008 Albania, Republic of
AF DZ DZA 012 Algeria, People's Democratic Republic of
OC AS ASM 016 American Samoa
EU AD AND 020 Andorra, Principality of
AF AO AGO 024 Angola, Republic of
NA AI AIA 660 Anguilla
If I do
<? explode(" ", $data); ?>
that works fine, except for countries whose names have more than one word.
How can I split each line so that I get the first four fields (the letter codes and the number) and a fifth field containing whatever remains?
This is in PHP.
Thank you
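PHP's explode() takes an optional third limit argument; with a positive limit the last element holds the rest of the string, so explode(" ", $line, 5) applied to each line should keep multi-word country names together. The same idea in Java, where String.split also accepts a limit; CountryLineParser is a hypothetical name, and splitting on \s+ rather than a single space is an assumption to tolerate repeated spaces or tabs.

```java
import java.util.Arrays;

final class CountryLineParser {
    // limit = 5: at most five fields, and the fifth keeps the rest of the line, spaces included.
    static String[] parse(String line) {
        return line.trim().split("\\s+", 5);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("AS AF AFG 004 Afghanistan, Islamic Republic of")));
    }
}
```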
I am trying to remove the following attribute generated by the AJAX Control Toolkit.
The scenario: our GUI team used the AJAX Control Toolkit to build the GUI, but I need to move the pages to normal ASP.NET view tags using MultiView.
I want to remove all the __designer: attributes
Here is the code
<asp:TextBox ID="a" runat="server" __designer:wfdid="w540" />
<asp:DropdownList ID="a" runat="server" __designer:wfdid="w541" />
.....
<asp:DropdownList ID="a" runat="server" __designer:wfdid="w786" />
I tried to use the regular expression find replace in Visual Studio using:
Find:
:__designer\:wfdid="w{([0-9]+)}"
Replace with: an empty string
Can any regular expression expert help?
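The attempt starts with ':', which requires a colon directly before __designer, but in the markup the attribute is preceded by a space; that alone would stop anything from matching. A Java sketch of a pattern that removes any __designer: attribute together with the whitespace in front of it; the regex itself is ordinary syntax, so it should carry over to editors that use .NET-style regular expressions (an assumption about your Visual Studio version). DesignerAttributeStripper is a hypothetical name.

```java
final class DesignerAttributeStripper {
    // Leading whitespace, the "__designer:" prefix, an attribute name, and a double-quoted value.
    static String strip(String markup) {
        return markup.replaceAll("\\s+__designer:\\w+=\"[^\"]*\"", "");
    }

    public static void main(String[] args) {
        System.out.println(strip("<asp:TextBox ID=\"a\" runat=\"server\" __designer:wfdid=\"w540\" />"));
    }
}
```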
I have a Joomla plugin (not important in this context) that is designed to take an input with a load of numbers (within a paragraph of text) and replace them with a series of <div>s.
My problem is that I need to do a preg_replace on my $article->text, but I don't know how to then apply the changes to the matched terms. I've seen preg_replace_callback, but I don't know how to call that within a function.
function onPrepareContent( &$article, &$params, $limitstart )
{
global $mainframe;
// define the regular expression
$pattern = "#{lotterynumbers}(.*?){/lotterynumbers}#s";
if(isset($article->text)){
preg_match($pattern, $article->text, $matches);
$numbers = explode("," , $matches[1]);
foreach ($numbers as $number) {
echo "<div class='number'><span>" . $number . "</span></div>";
}
}else{
$article->text = 'No numbers';
}
return true;
}
AMENDED CODE:
function onPrepareContent( &$article, &$params, $limitstart )
{
global $mainframe;
// define the regular expression
$pattern = "#{lotterynumbers}(.*?){/lotterynumbers}#s";
if(isset($article->text)){
preg_match($pattern, $article->text, $matches);
$numbers = explode("," , $matches[1]);
foreach ($numbers as $number) {
$numberlist[] = "<div class='number'><span>" . $number . "</span></div>";
}
$numberlist = implode("", $numberlist);
$article->text = preg_replace($pattern, $numberlist, $article->text);
}else{
$article->text = 'No numbers';
}
return true;
}
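preg_replace_callback takes a callback that receives each match and returns its replacement, which removes the need to build the list separately. The closest Java analogue is Matcher.replaceAll(Function<MatchResult, String>) (Java 9+). A sketch; LotteryNumbers and render are hypothetical names, and trimming the numbers and skipping HTML escaping are assumptions. Unlike the amended PHP code, which reuses the first tag's numbers for every tag, each {lotterynumbers} block here is rendered from its own contents.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class LotteryNumbers {
    // '{' is escaped because Java would otherwise try to read it as a {n,m} quantifier.
    // DOTALL plays the role of the /s modifier in the PHP pattern.
    private static final Pattern TAG =
        Pattern.compile("\\{lotterynumbers\\}(.*?)\\{/lotterynumbers\\}", Pattern.DOTALL);

    static String render(String articleText) {
        // The lambda runs once per match, like a preg_replace_callback callback.
        return TAG.matcher(articleText).replaceAll(match -> {
            String html = Arrays.stream(match.group(1).split(","))
                .map(String::trim)
                .map(n -> "<div class='number'><span>" + n + "</span></div>")
                .collect(Collectors.joining());
            // The returned value is itself a replacement string, so '$' and '\' must be quoted.
            return Matcher.quoteReplacement(html);
        });
    }

    public static void main(String[] args) {
        System.out.println(render("Draw: {lotterynumbers}4, 8, 15{/lotterynumbers}!"));
    }
}
```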
I have an array whose values are user input, like:
aa df rrr5 4323 54 hjy 10 gj @fgf %d
Now I want to check each value in the array to see
whether it is numeric, alphabetic (a-zA-Z), or alphanumeric,
and save it in the respective array.
I have done:
my @num;
my @char;
my @alphanum;
my $str =<>;
my @temp = split(" ",$str);
foreach (@temp)
{
print "input : $_ \n";
if ($_ =~/^(\d+\.?\d*|\.\d+)$/)
{
push(@num,$_);
}
}
This works.
Similarly, I want to check for alphabetic and alphanumeric values.
Note: alphanumeric examples: fr43 6t$ $eed5 *jh
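A Java sketch of the same classification, assuming that anything neither numeric nor purely alphabetic goes into the alphanumeric bucket (the examples above, such as *jh, suggest that). In Perl, the alphabetic test would similarly be /^[a-zA-Z]+$/. TokenClassifier is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class TokenClassifier {
    // Same numeric rule as the Perl code: 12, 12., 12.5 or .5
    private static final Pattern NUMERIC = Pattern.compile("\\d+\\.?\\d*|\\.\\d+");
    // Letters only
    private static final Pattern ALPHA = Pattern.compile("[a-zA-Z]+");

    final List<String> num = new ArrayList<>();
    final List<String> chars = new ArrayList<>(); // @char in the Perl code
    final List<String> alphanum = new ArrayList<>();

    TokenClassifier(String input) {
        for (String token : input.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // blank input produces a single empty token
            }
            // matches() anchors at both ends, like ^...$ in the Perl regex
            if (NUMERIC.matcher(token).matches()) {
                num.add(token);
            } else if (ALPHA.matcher(token).matches()) {
                chars.add(token);
            } else {
                alphanum.add(token); // everything else, following the question's examples
            }
        }
    }

    public static void main(String[] args) {
        TokenClassifier t = new TokenClassifier("aa df rrr5 4323 54 hjy 10 gj @fgf %d");
        System.out.println(t.num + " " + t.chars + " " + t.alphanum);
    }
}
```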
OK... changing the question here. I'm getting an error when I try this:
SELECT COUNT ( DISTINCT mid, regexp_replace(na_fname, '\\s*', '', 'g'), regexp_replace(na_lname, '\\s*', '', 'g'))
FROM masterfile;
Is it possible to use regexp in a distinct clause like this?
The error is this:
WARNING: nonstandard use of \\ in a string literal
LINE 1: ...CT COUNT ( DISTINCT mid, regexp_replace(na_fname, '\\s*', ''...
OK, so I'm executing the following line of code in JavaScript:
RegExp('(http:\/\/t.co\/)[a-zA-Z0-9\-\.]{8}').exec(tcont);
where tcont is a string such as 'Test tweet to http://t.co/GXmaUyNL' (the content of a tweet obtained with jQuery).
However, in the case above, for example, it returns 'http://t.co/GXmaUyNL,http://t.co/'.
This is frustrating, because I want the URL without the part at the end, from the comma onwards.
Any ideas why this is appearing? Thanks
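In JavaScript, exec() returns an array: element 0 is the whole match, followed by one element per capturing group, and converting that array to a string joins the elements with commas. Here the parentheses around http:\/\/t.co\/ create group 1, hence the second "http://t.co/". Taking element [0], or dropping the parentheses, should give just the URL. The Java counterpart is group() (the whole match) versus group(1); a sketch with the capturing group removed and the dot in "t.co" escaped, where TcoLink and firstLink are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TcoLink {
    // "\\." matches a literal dot; an unescaped '.' would match any character.
    private static final Pattern TCO = Pattern.compile("http://t\\.co/[a-zA-Z0-9.-]{8}");

    static String firstLink(String tweet) {
        Matcher m = TCO.matcher(tweet);
        return m.find() ? m.group() : null; // group() is the whole match only
    }

    public static void main(String[] args) {
        System.out.println(firstLink("Test tweet to http://t.co/GXmaUyNL"));
    }
}
```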
I want to change
<lang class='brush:xhtml'>test</lang>
to
<pre class='brush:xhtml'>test</pre>
My code looks like this:
<?php
$content="<lang class='brush:xhtml'>test</lang>";
$pattern=array();
$replace=array();
$pattern[0]="/<lang class=([A-Za-z='\":])* </";
$replace[0]="<pre $1>";
$pattern[1]="/<lang>/";
$replace[1]="</pre>";
echo preg_replace($pattern, $replace,$content);
?>
But it's not working. How should I change my code, or what is wrong with it?
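Two problems stand out in the attempt: the first pattern requires " <" after the attributes while the tag actually ends with '>', and because the group is repeated with *, $1 would only hold its last character; the second pattern looks for <lang> instead of the closing </lang>. A Java sketch of the corrected pair of replacements; in PHP the same regexes should work with preg_replace, with the '/' of the closing tag escaped as <\/lang> when '/' is the delimiter. LangToPre is a hypothetical name.

```java
final class LangToPre {
    static String convert(String html) {
        // <lang ...> -> <pre ...>: \b stops it matching tags such as <language>,
        // and ([^>]*) copies all attributes (possibly none) into $1.
        String opened = html.replaceAll("<lang\\b([^>]*)>", "<pre$1>");
        // The closing tag needs its own pattern, including the '/'.
        return opened.replaceAll("</lang>", "</pre>");
    }

    public static void main(String[] args) {
        System.out.println(convert("<lang class='brush:xhtml'>test</lang>"));
    }
}
```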
I'm looking for a way to split a string that contains text in the following format:
"abcd efgh 'ijklm no pqrs' tuv"
which will produce the following results:
['abcd', 'efgh', 'ijklm no pqrs', 'tuv']
In other words, it splits on whitespace unless inside a single-quoted string. I think it could be done with .NET regexps using lookaround operators, particularly balancing groups. I'm not so sure about Perl.
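Rather than splitting, it may be simpler to match the tokens themselves: either a single-quoted section or a run of non-whitespace characters. A Java sketch, assuming quotes are never escaped or nested; QuotedSplitter is a hypothetical name. No lookaround or balancing groups are involved, so the same pattern should carry over to Perl or .NET.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QuotedSplitter {
    // Either a single-quoted run (group 1, quotes excluded) or a run of non-whitespace (group 2).
    private static final Pattern TOKEN = Pattern.compile("'([^']*)'|(\\S+)");

    static List<String> split(String s) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(s);
        while (m.find()) {
            tokens.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(split("abcd efgh 'ijklm no pqrs' tuv")); // [abcd, efgh, ijklm no pqrs, tuv]
    }
}
```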