Search Results

Search found 1 results on 1 pages for 'lattyware'.

Page 1/1 | 1 

  • Python Regular Expressions: Capture lookahead value (capturing text without consuming it)

    - by Lattyware
    I wish to use regular expressions to split words into groups of (vowels, not_vowels, more_vowels), using a marker to ensure every word begins and ends with a vowel. import re MARKER = "~" VOWELS = {"a", "e", "i", "o", "u", MARKER} word = "dog" if word[0] not in VOWELS: word = MARKER+word if word[-1] not in VOWELS: word += MARKER re.findall("([%]+)([^%]+)([%]+)".replace("%", "".join(VOWELS)), word) In this example we get: [('~', 'd', 'o')] The issue is that I wish the matches to overlap - the last set of vowels should become the first set of the next match. This appears possible with lookaheads, if we replace the regex as follows: re.findall("([%]+)([^%]+)(?=[%]+)".replace("%", "".join(VOWELS)), word) We get: [('~', 'd'), ('o', 'g')] Which means we are matching what I want. However, it now doesn't return the last set of vowels. The output I want is: [('~', 'd', 'o'), ('o', 'g', '~')] I feel this should be possible (if the regex can check for the second set of vowels, I see no reason it can't return them), but I can't find any way of doing it beyond the brute force method, looping through the results after I have them and appending the first character of the next match to the last match, and the last character of the string to the last match. Is there a better way in which I can do this? The two things that would work would be capturing the lookahead value, or not consuming the text on a match, while capturing the value - I can't find any way of doing either.

    Read the article

1