Python regex to find multiple consecutive punctuation characters


I am streaming plain text records via MapReduce, and I need to check each plain text record for 2 or more consecutive punctuation symbols. The 12 symbols I need to check for are: -/\()!"+,'&.

I have tried translating the punctuation list into an array like this:

punctuation = [r'-', r'/', r'\\', r'\(', r'\)', r'!', r'"', r'\+', r',', r"'", r'&', r'\.']

I can find individual characters with nested loops, for example:

import re

# Try every single-character pattern against every test case.
for t in test_cases:
    print(t)
    for p in punctuation:
        print(p)
        if re.search(p, t):
            print('found match!', p, t)
        else:
            print('no match')

However, a single backslash character is not found when I test, and I don't know how to get results for 2 or more consecutive occurrences in a row. I've read that I need to use the + symbol, but I don't know the correct syntax to use for this.

Here are my test cases:

the quick '''brown fox
the &&quick brown fox
the quick\brown fox
the quick\\brown fox
the -quick brown// fox
the quick--brown fox
the (quick brown) fox,,,
the quick ++brown fox
the "quick brown" fox
the quick/brown fox
the quick&brown fox
the ""quick"" brown fox
the quick,, brown fox
the quick brown fox...
the quick-brown fox
the ((quick brown fox
the quick brown)) fox
the quick brown fox!!!
the 'quick' brown fox

Which, when translated into a Pythonic list, looks like this:

test_cases = [
    "the quick '''brown fox",
    'the &&quick brown fox',
    'the quick\\brown fox',
    'the quick\\\\brown fox',
    'the -quick brown// fox',
    'the quick--brown fox',
    'the (quick brown) fox,,,',
    'the quick ++brown fox',
    'the "quick brown" fox',
    'the quick/brown fox',
    'the quick&brown fox',
    'the ""quick"" brown fox',
    'the quick,, brown fox',
    'the quick brown fox...',
    'the quick-brown fox',
    'the ((quick brown fox',
    'the quick brown)) fox',
    'the quick brown fox!!!',
    "the 'quick' brown fox"
]

How do I use a Python regex to identify and report matches where a punctuation symbol appears 2 or more times in a row?

The punctuation characters can be put in a character class in square brackets. The rest depends on whether a series of 2 or more punctuation characters may consist of any punctuation characters or whether the punctuation characters must all be the same.

In the first case, curly braces can be appended to specify the minimum (2) and maximum number of repetitions. The latter is unbounded and is left empty:

[...]{2,} # min. 2 or more 
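As a minimal sketch of this first case, the snippet below fills the character class with the 12 symbols from the question and collects every run of 2 or more of them, whether or not the characters in the run are identical. The names `ANY_PUNCT_RUN` and `find_mixed_runs` are hypothetical and chosen only for illustration.

```python
import re

# Character class holding the 12 punctuation symbols from the question.
# The hyphen is placed first so it is taken literally, not as a range.
# '\\\\' in the Python string becomes '\\' in the regex: one literal backslash.
ANY_PUNCT_RUN = re.compile('[-/\\\\()!"+,&\'.]{2,}')


def find_mixed_runs(text):
    """Return every run of 2 or more punctuation characters (mixed allowed)."""
    return ANY_PUNCT_RUN.findall(text)


print(find_mixed_runs("the (quick brown) fox,,,"))   # [',,,']
print(find_mixed_runs('the "quick", brown fox'))     # ['",']
print(find_mixed_runs("the quick-brown fox"))        # []
```

Note that the second call reports `",` even though the two characters differ, which is exactly the behaviour that distinguishes this case from the same-character case.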

If only repetitions of the same character need to be found, the first matched punctuation character is put in a group. Then the same group (= same character) must follow 1 or more times:

([...])\1+ 

The reference \1 means the first group in the expression. Groups, represented by opening parentheses, are numbered from left to right.
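The backreference form can be sketched as follows, again with the 12 symbols from the question in the character class. `SAME_PUNCT_RUN` and `find_same_runs` are hypothetical names. The sketch uses `finditer` with `group(0)` because `findall` would return only the captured group (a single character) instead of the whole run.

```python
import re

# Group 1 captures one punctuation character; \1+ then requires that very
# same character to occur at least once more directly afterwards.
SAME_PUNCT_RUN = re.compile('([-/\\\\()!"+,&\'.])\\1+')


def find_same_runs(text):
    """Return every run in which one punctuation character repeats 2+ times."""
    return [m.group(0) for m in SAME_PUNCT_RUN.finditer(text)]


print(find_same_runs('the ""quick"" brown fox'))   # ['""', '""']
print(find_same_runs("the quick brown fox..."))    # ['...']
print(find_same_runs('the "quick", brown fox'))    # []
```

Unlike the `{2,}` form, the last call finds nothing, because `",` consists of two different characters.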

The next issue is escaping. There are escaping rules for Python strings, and additional escaping is needed in the regular expression. Most characters do not require escaping inside a character class, but the backslash must be doubled. The following example quadruplicates the backslash: one doubling because of the string, the second because of the regular expression.

Raw strings r'...' are useful for patterns, but here both single and double quotation marks are needed.
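One possible way around the quoting problem, not used in the answer itself, is a triple-quoted raw string, which can hold both kinds of quotation mark unescaped. The sketch below only checks that the two spellings produce the same pattern text; the variable names `plain` and `raw` are hypothetical.

```python
import re

# Ordinary string: backslash quadrupled, single quote escaped.
plain = '([-/\\\\()!"+,&\'.])\\1+'

# Triple-quoted raw string: the backslash is doubled only for the regex,
# and neither quotation mark needs escaping.
raw = r'''([-/\\()!"+,&'.])\1+'''

print(plain == raw)                                       # True
print(re.search(raw, "the quick,, brown fox").group(0))   # ,,
```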

>>> import re
>>> test_cases = [
...     "the quick '''brown fox",
...     'the &&quick brown fox',
...     'the quick\\brown fox',
...     'the quick\\\\brown fox',
...     'the -quick brown// fox',
...     'the quick--brown fox',
...     'the (quick brown) fox,,,',
...     'the quick ++brown fox',
...     'the "quick brown" fox',
...     'the quick/brown fox',
...     'the quick&brown fox',
...     'the ""quick"" brown fox',
...     'the quick,, brown fox',
...     'the quick brown fox...',
...     'the quick-brown fox',
...     'the ((quick brown fox',
...     'the quick brown)) fox',
...     'the quick brown fox!!!',
...     "the 'quick' brown fox"
... ]
>>> pattern_any_punctuation = re.compile('([-/\\\\()!"+,&\'.]{2,})')
>>> pattern_same_punctuation = re.compile('(([-/\\\\()!"+,&\'.])\\2+)')
>>> for t in test_cases:
...     match = pattern_same_punctuation.search(t)
...     if match:
...         print("{:24} => {}".format(t, match.group(1)))
...     else:
...         print(t)
...
the quick '''brown fox   => '''
the &&quick brown fox    => &&
the quick\brown fox
the quick\\brown fox     => \\
the -quick brown// fox   => //
the quick--brown fox     => --
the (quick brown) fox,,, => ,,,
the quick ++brown fox    => ++
the "quick brown" fox
the quick/brown fox
the quick&brown fox
the ""quick"" brown fox  => ""
the quick,, brown fox    => ,,
the quick brown fox...   => ...
the quick-brown fox
the ((quick brown fox    => ((
the quick brown)) fox    => ))
the quick brown fox!!!   => !!!
the 'quick' brown fox
>>>
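
The session above uses `search`, which reports only the first run in each record. If every run in a record should be reported, a sketch along the following lines may help. It reuses the same-character pattern from the session, and the function name `report_runs` and the (offset, run) output shape are assumptions made for illustration.

```python
import re

# Same-character pattern as in the session above, with the outer group dropped.
SAME_PUNCT_RUN = re.compile('([-/\\\\()!"+,&\'.])\\1+')


def report_runs(record):
    """Return (offset, run) pairs for every repeated-punctuation run."""
    return [(m.start(), m.group(0)) for m in SAME_PUNCT_RUN.finditer(record)]


for record in ['the ""quick"" brown fox',
               'the (quick brown) fox,,,',
               'the quick-brown fox']:
    print(record, report_runs(record))
```

For the first record this yields two entries, one for each pair of double quotes, where `search` alone would have stopped after the first.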
