Python Regular Expressions: Capture lookahead value (capturing text without consuming it)

Posted by Lattyware on Stack Overflow
Published on 2012-04-09T23:16:19Z Indexed on 2012/04/09 23:28 UTC

I wish to use regular expressions to split words into groups of (vowels, not_vowels, more_vowels), using a marker to ensure every word begins and ends with a vowel.

import re

MARKER = "~"
VOWELS = {"a", "e", "i", "o", "u", MARKER}

word = "dog"

if word[0] not in VOWELS:
    word = MARKER+word

if word[-1] not in VOWELS:
    word += MARKER

re.findall("([%]+)([^%]+)([%]+)".replace("%", "".join(VOWELS)), word)

In this example we get:

[('~', 'd', 'o')]

The issue is that I want the matches to overlap: the last set of vowels in one match should become the first set of vowels in the next match. This appears possible with a lookahead, if we change the regex as follows:

re.findall("([%]+)([^%]+)(?=[%]+)".replace("%", "".join(VOWELS)), word)

We get:

[('~', 'd'), ('o', 'g')]

This means the matches now start where I want them to. However, the last set of vowels is no longer returned. The output I want is:

[('~', 'd', 'o'), ('o', 'g', '~')]

I feel this should be possible (if the regex can check for the second set of vowels, I see no reason it can't return them), but I can't find any way of doing it beyond the brute-force method: looping through the results after I have them, appending the first group of the next match to each match, and appending the last character of the string to the last match. Is there a better way to do this?
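For reference, the brute-force method described above might look roughly like the sketch below. The helper names `pad` and `overlapping_groups_bruteforce` are made up for illustration, `VOWELS` is written as a string rather than a set so the character class has a stable order, and the word is assumed to be non-empty. It also uses the whole trailing run of vowels rather than only the last character, which is an assumption about the intended behaviour.

```python
import re

MARKER = "~"
VOWELS = "aeiou" + MARKER  # string instead of a set: stable order in the character class


def pad(word):
    """Add the marker so the (non-empty) word begins and ends with a 'vowel'."""
    if word[0] not in VOWELS:
        word = MARKER + word
    if word[-1] not in VOWELS:
        word += MARKER
    return word


def overlapping_groups_bruteforce(word):
    """Hypothetical helper: build (vowels, not_vowels, more_vowels) tuples by post-processing."""
    word = pad(word)
    # The lookahead version from the question: returns (vowels, not_vowels) pairs.
    pairs = re.findall("([%]+)([^%]+)(?=[%]+)".replace("%", VOWELS), word)
    # Trailing run of vowels, used to complete the final tuple (assumption:
    # the whole run is wanted, not just the last character).
    tail = re.search("[%]+$".replace("%", VOWELS), word).group()
    result = []
    for i, (vowels, not_vowels) in enumerate(pairs):
        # The third element is the first group of the next match, or the tail.
        more_vowels = pairs[i + 1][0] if i + 1 < len(pairs) else tail
        result.append((vowels, not_vowels, more_vowels))
    return result


print(overlapping_groups_bruteforce("dog"))
# [('~', 'd', 'o'), ('o', 'g', '~')]
```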

Two things would work: capturing the lookahead value, or capturing the text on a match without consuming it. I can't find any way of doing either.
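One way to capture the lookahead value, sketched here under a few assumptions, is to put a capturing group inside the lookahead. Python's `re` module allows this: the group records the text the lookahead saw, but the lookahead itself still consumes nothing, so the next match can start on those same vowels. The function name `overlapping_groups` is made up for this sketch, `VOWELS` is a string rather than a set, and the word is assumed to be non-empty.

```python
import re

MARKER = "~"
VOWELS = "aeiou" + MARKER  # string instead of a set: stable order in the character class


def overlapping_groups(word):
    """Hypothetical helper: (vowels, not_vowels, more_vowels) tuples with shared vowels."""
    if word[0] not in VOWELS:
        word = MARKER + word
    if word[-1] not in VOWELS:
        word += MARKER
    # The third group sits inside the lookahead (?=...), so it is captured
    # but not consumed, and the next match begins at those vowels.
    pattern = "([%]+)([^%]+)(?=([%]+))".replace("%", VOWELS)
    return re.findall(pattern, word)


print(overlapping_groups("dog"))
# [('~', 'd', 'o'), ('o', 'g', '~')]
```

Because the vowel group is greedy, a run such as "io" in "audio" is captured whole, giving `[('au', 'd', 'io')]` for that word.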
