Search Results

Search found 5493 results on 220 pages for 'boost regex'.

Page 133/220 | < Previous Page | 129 130 131 132 133 134 135 136 137 138 139 140  | Next Page >

  • ruby parametrized regular expression

    - by astropanic
    I have a string like "{some|words|are|here}" or "{another|set|of|words}" So in general the string consists of an opening curly bracket,words delimited by a pipe and a closing curly bracket. What is the most efficient way to get the selected word of that string ? I would like do something like this: @my_string = "{this|is|a|test|case}" @my_string.get_column(0) # => "this" @my_string.get_column(2) # => "is" @my_string.get_column(4) # => "case" What should the method get_column contain ?

    Read the article

  • Regular Expression to match unlimited number of options

    - by Pekka
    I want to be able to parse file paths like this one: /var/www/index.(htm|html|php|shtml) into an ordered array: array("htm", "html", "php", "shtml") and then produce a list of alternatives: /var/www/index.htm /var/www/index.html /var/www/index.php /var/www/index.shtml Right now, I have a preg_match statement that can split two alternatives: preg_match_all ("/\(([^)]*)\|([^)]*)\)/", $path_resource, $matches); Could somebody give me a pointer how to extend this to accept an unlimited number of alternatives (at least two)? Just regarding the regular expression, the rest I can deal with. The rule is: The list needs to start with a ( and close with a ) There must be one | in the list (i.e. at least two alternatives) Any other occurrence(s) of ( or ) are to remain untouched.

    Read the article

  • PHP regular expression subpattern behaviour

    - by codecowboy
    I want to match both the src and title attributes of an image tag: pattern: <img [^>]*src=["|\']([^"|\']+["|\'])|title=["|\']([^"|\']+) target: <img src="http://someurl.jpg" class="quiz_caption" title="Caption goes here!"> This pattern gives me one unwanted match, title="content", and the match I actually want which is the value between the quotes after the word 'title', i.e 'content'. So, my matches are: <img src="http://someurl.jpg http://someurl.jpg title="Caption goes here!" Caption goes here! Is there a way to avoid the third of these matches? I'm using PCRE in PHP 5.2.x

    Read the article

  • Is it possible to "learn" a regular expression by user-provided examples?

    - by DR
    Is it possible to "learn" a regular expression by user-provided examples? To clarify: I do not want to learn regular expressions. I want to create a program which "learns" a regular expression from examples which are interactively provided by a user, perhaps by selecting parts from a text or selecting begin or end markers. Is it possible? Are there algorithms, keywords, etc. which I can Google for? EDIT: Thank you for the answers, but I'm not interested in tools which provide this feature. I'm looking for theoretical information, like papers, tutorials, source code, names of algorithms, so I can create something for myself.

    Read the article

  • How can I convert SQL comments with -- to # using Perl?

    - by NJTechie
    I have various SQL files with '--' comments and we migrated to the latest version of MySQL and it hates these comments. I want to replace -- with #. I am looking for a recursive, inplace replace one-liner. This is what I have: perl -p -i -e 's/--/# /g'` ``fgrep -- -- * A sample .sql file: use myDB; --did you get an error I get the following error: Unrecognized switch: --did (-h will show valid options). p.s : fgrep skipping 2 dashes was just discussed here if you are interested. Any help is appreciated.

    Read the article

  • Efficient way to organise data file in columns with Python

    - by user1700959
    I'm getting an output data file of a program which looks like this, with more than one line for each time step: 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 0.0000E+00 7.9819E-06 1.7724E-02 2.3383E-02 3.0048E-02 3.8603E-02 4.9581E-02 5.6635E-02 4.9991E-02 3.9052E-02 3.0399E-02 .... I want to arrange it in ten columns I have made a Python script, using regular expressions to delete \n in the proper lines, but I think that there should be a simpler more elegant way to do it, here is my script: import re with open('inputfile', encoding='utf-8') as file1: datai=file1.read() dataf=re.sub(r'(?P<nomb>( \d\.\d\d\d\dE.\d\d){8})\n','\g<nomb>',datai) with open('result.txt',mode='w',encoding='utf-8') as resultfile: resultfile.write(datof) Thanks in advance

    Read the article

  • Java Split not working as expected

    - by daaabears
    I am trying to use a simple split to break up the following string: 00-00000 My expression is: ^([0-9][0-9])(-)([0-9])([0-9])([0-9])([0-9])([0-9]) And my usage is: String s = "00-00000"; String pattern = "^([0-9][0-9])(-)([0-9])([0-9])([0-9])([0-9])([0-9])"; String[] parts = s.split(pattern); If I play around with the Pattern and Matcher classes I can see that my pattern does match and the matcher tells me my groupCount is 7 which is correct. But when I try and split them I have no luck.

    Read the article

  • replaceAll() method using parameter from text file

    - by Herman Plani Ginting
    i have a collection of raw text in a table in database, i need to replace some words in this collection using a set of words. i put all the term to be replace and its substitutes in a text file as below min=admin lelet=lambat lemot=lambat nii=nih ntu=itu and so on. i have successfully initiate a variabel of File and Scanner to read the collection of the term and its substitutes. i loop all the dataset and save the raw text in a string in the same loop i loop all the term collection and save its row to a string name 'pattern', and split the pattern into two string named 'term' and 'replacer' in this loop i initiate a new string which its value is the string from the dataset modified by replaceAll(term,replacer) end loop for term collection then i insert the new string to another table in database end loop for dataset i do it manualy as below replaceAll("min","admin") and its works but its really something to code it manually for almost 2000 terms to be replace it. anyone ever face this kind of really something.. i really need a help now desperate :( package sentimenrepo; import javax.swing.*; import java.sql.*; import java.io.*; //import java.util.HashMap; import java.util.Scanner; //import java.util.Map; /** * * @author herman */ public class synonimReplaceV2 extends SwingWorker { protected Object doInBackground() throws Exception { new skripsisentimen.sentimenttwitter().setVisible(true); Integer row = 0; File synonimV2 = new File("synV2/catatan_kata_sinonim.txt"); String newTweet = ""; DB db = new DB(); Connection conn = db.dbConnect("jdbc:mysql://localhost:3306/tweet", "root", ""); try{ Statement select = conn.createStatement(); select.executeQuery("select * from synonimtweet"); ResultSet RS = select.getResultSet(); Scanner scSynV2 = new Scanner(synonimV2); while(RS.next()){ row++; String no = RS.getString("no"); String tweet = " "+ RS.getString("tweet"); String published = RS.getString("published"); String label = RS.getString("label"); clean2 cleanv2 = new clean2(); newTweet = cleanv2.cleanTweet(tweet); try{ Statement insert = conn.createStatement(); insert.executeUpdate("INSERT INTO synonimtweet_v2(no,tweet,published,label) values('" +no+"','"+newTweet+"','"+published+"','"+label+"')"); String current = skripsisentimen.sentimenttwitter.txtAreaResult.getText(); skripsisentimen.sentimenttwitter.txtAreaResult.setText(current+"\n"+row+"original : "+tweet+"\n"+newTweet+"\n______________________\n"); skripsisentimen.sentimenttwitter.lblStat.setText(row+" tweet read"); skripsisentimen.sentimenttwitter.txtAreaResult.setCaretPosition(skripsisentimen.sentimenttwitter.txtAreaResult.getText().length() - 1); }catch(Exception e){ skripsisentimen.sentimenttwitter.lblStat.setText(e.getMessage()); } skripsisentimen.sentimenttwitter.lblStat.setText(e.getMessage()); } }catch(Exception e){ skripsisentimen.sentimenttwitter.lblStat.setText(e.getMessage()); } return row; } class clean2{ public clean2(){} public String cleanTweet(String tweet){ File synonimV2 = new File("synV2/catatan_kata_sinonim.txt"); String pattern = ""; String term = ""; String replacer = ""; String newTweet=""; try{ Scanner scSynV2 = new Scanner(synonimV2); while(scSynV2.hasNext()){ pattern = scSynV2.next(); term = pattern.split("=")[0]; replacer = pattern.split("=")[1]; newTweet = tweet.replace(term, replacer); } }catch(Exception e){ e.printStackTrace(); } System.out.println(newTweet+"\n"+tweet); return newTweet; } } }

    Read the article

  • Matching content between tags in web source

    - by Semas
    Hello, I was wondering what could be the fastest and the easiest way to grab text that is between tags in string. For example i have this string: Lorem ipsum <a>dolor sit amet</a>, <b>consectetur</b> adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. And i need to find text that is between tags <a> </a> and <b> </b>. Thank you.

    Read the article

  • What is the best regular expression for validating email addresses?

    - by acrosman
    Over the years I have slowly developed a regular expression that validates MOST email addresses correctly, assuming they don't use an IP address as the server part. Currently the expression is: ^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})$ I use this in several PHP programs, and it works most of the time. However, from time to time I get contacted by someone that is having trouble with a site that uses it, and I end up having to make some adjustment (most recently I realized that I wasn't allowing 4-character TLDs). What's the best regular expression you have or have seen for validating emails? I've seen several solutions that use functions that use several shorter expressions, but I'd rather have one long complex expression in a simple function instead of several short expression in a more complex function.

    Read the article

  • php array regular expressions

    - by bell
    I am using regular expressions in php to match postcodes found in a string. The results are being returned as an array, I was wondering if there is any way to assign variables to each of the results, something like $postcode1 = first match found $postcode2 = second match found here is my code $html = "some text here bt123ab and another postcode bt112cd"; preg_match_all("/([a-zA-Z]{2})([0-9]{2,3})([a-zA-Z]{2})/", $html, $matches, PREG_SET_ORDER); foreach ($matches as $val) { echo $val[0]; } I am very new to regular expressions and php, forgive me if this is a stupid question. Thanks in advance

    Read the article

  • Complex regular expression

    - by Jose3d
    Hello, i will like to capture a substring part of a text choosing the number of characters but if any word is cut then get until de last blank. As example if this is the text: "This is an example of text lorem ipsum, etc..." and i would like to get for instance 12 characters that are: "This is an e". In this case example is cutted, then i would like to get "This is an". Its possible do this with Regular Expressions? Thanks in advance. Jose

    Read the article

  • Match Anything Except a Sub-pattern

    - by Tim Lytle
    I'd like to accomplish what this (invalid I believe) regular expression tries to do: <p><a>([^(<\/a>)]+?)<\/a></p>uniquestring Essentially match anything except a closing anchor tag. Simple non-greedy doesn't help here because `uniquestring' may very well be after another distant closing anchor tag: <p><a>text I don't <tag>want</tag> to match</a></p>random data<p><a>text I do <tag>want to</tag> match</a></p>uniquestring more matches <p><a>of <tag>text I do</tag> want to match</a></p>uniquestring So I have more tag in between the anchor tags. And I'm using the presence of uniquestring to determine if I want to match the data. So a simple non-greedy ends up matching everything from the start of the data I don't want to the end of the data I do want. I know I'm edging close to the problems regular expressions (or at least my knowledge of them) aren't good at solving. I could just through the data at an HTML/XML parser, but it is just one simple(ish) search. Is there some easy way to do this that I'm just missing?

    Read the article

  • Why are these strings escaping from my regular expression in python?

    - by dohkoxar
    In my code, I load up an entire folder into a list and then try to get rid of every file in the list except the .mp3 files. import os import re path = '/home/user/mp3/' dirList = os.listdir(path) dirList.sort() i = 0 for names in dirList: match = re.search(r'\.mp3', names) if match: i = i+1 else: dirList.remove(names) print dirList print i After I run the file, the code does get rid of some files in the list but keeps these two especifically: ['00. Various Artists - Indie Rock Playlist October 2008.m3u', '00. Various Artists - Indie Rock Playlist October 2008.pls'] I can't understand what's going on, why are those two specifically escaping my search.

    Read the article

  • Extract IP address from an html string (python)

    - by GoJian
    My Friends, I really want to extract a simple IP address from a string (actually an one-line html) using Python. But it turns out that 2 hours passed I still couldn't come up with a good solution. >>> s = "<html><head><title>Current IP Check</title></head><body>Current IP Address: 165.91.15.131</body></html>" -- '165.91.15.131' is what I want! I tried using regular expression, but so far I can only get to the first number. >>> import re >>> ip = re.findall( r'([0-9]+)(?:\.[0-9]+){3}', s ) >>> ip ['165'] In fact, I don't feel I have a firm grasp on reg-expression and the above code was found and modified from elsewhere on the web. Seek your input and ideas!

    Read the article

  • How would I make this faster? Parsing Word/sorting by heading [on hold]

    - by Doof12
    Currently it takes about 3 minutes to run through a single 53 page word document. Hopefully you all have some advice about speeding up the process. Code: import win32com.client as win32 from glob import glob import io import re from collections import namedtuple from collections import defaultdict import pprint raw_files = glob('*.docx') word = win32.gencache.EnsureDispatch('Word.Application') word.Visible = False oFile = io.open("rawsort.txt", "w+", encoding = "utf-8")#text dump doccat= list() for f in raw_files: word.Documents.Open(f) doc = word.ActiveDocument #whichever document is active at the time doc.ConvertNumbersToText() print doc.Paragraphs.Count for x in xrange(1, doc.Paragraphs.Count+1):#for loop to print through paragraphs oText = doc.Paragraphs(x) if not oText.Range.Tables.Count >0 : results = re.match('(?P<number>(([1-3]*[A-D]*[0-9]*)(.[1-3]*[0-9])+))', oText.Range.Text) stylematch = re.match('Heading \d', oText.Style.NameLocal) if results!= None and oText.Style != None and stylematch != None: doccat.append((oText.Style.NameLocal, oText.Range.Text[:len(results.group('number'))],oText.Range.Text[len(results.group('number')):])) style = oText.Style.NameLocal else: if oText.Range.Font.Bold == True : doccat.append(style, oText) oFile.write(unicode(doccat)) oFile.close() The for Paragraph loop obviously takes the most amount of time. Is there some way of identifying and appending it without going through every Paragraph?

    Read the article

  • What should I know about Python to identify comments in different source files?

    - by Can't Tell
    I have a need to identify comments in different kinds of source files in a given directory. ( For example java,XML, JavaScript, bash). I have decided to do this using Python (as an attempt to learn Python). The questions I have are 1) What should I know about python to get this done? ( I have an idea that Regular Expressions will be useful but are there alternatives/other modules that will be useful? Libraries that I can use to get this done?) 2) Is Python a good choice for such a task? Will some other language make this easier to accomplish?

    Read the article

  • Are there any way to apply regexp in java ignoring letter case?

    - by Roman
    Simple example: we have string "Some sample string Of Text". And I want to filter out all stop words (i.e. "some" and "of") but I don't want to change letter case of other words which should be retained. If letter case was unimportant I would do this: str.toLowerCase().replaceAll ("a|the|of|some|any", ""); Is there an "ignore case" solution with regular expressions in java?

    Read the article

  • Convert Json date string to JavaScript date object

    - by dagda1
    Hi, I have the following JSON object which has a date field in the following format: { "AlertDate": "\/Date(1277334000000+0100)\/", "Progress": 1, "ReviewPeriod": 12 } I want to write a regular expression or a function to convert it to a javascript object so that it is in the form: { "AlertDate": "AlertDate":new Date(1277334000000), "Progress": 1, "ReviewPeriod": 12 } The above date format fails validation in the JQuery parseJSON method. I would like to convert the 1277334000000+0100 into the correct number of milliseconds to create the correct date when eval is called after validation. Can anyone help me out with a good approach to solving this? Cheers Paul

    Read the article

< Previous Page | 129 130 131 132 133 134 135 136 137 138 139 140  | Next Page >