Regex

Basic finding

python
import re

text = "The agent's phone number is 407-444-1234. Call soon!"

pattern = "phone"

match = re.search(pattern, text)    # Returns first match, None if not found
print(match.span())                # (12, 17) - start and end index of the match
print(match.start(), match.end())  # 12 17 - the same positions, individually

text = "one one two two three"

matches = re.findall('one', text)
print(matches) # ['one', 'one'] - List of matches, just the text though

# Iterate through matches
for match in re.finditer('one', text):
    # Returns match objects
    print(match.span())
    print(match.group())    # Returns actual text
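
Because re.search returns None when nothing matches, calling .span() or .group() directly on the result raises an AttributeError for a missing pattern. A minimal sketch of guarding against that, where find_span is a hypothetical helper name introduced only for this illustration:

```python
import re

text = "The agent's phone number is 407-444-1234. Call soon!"

def find_span(pattern, text):
    """Return the (start, end) span of the first match, or None if nothing matches.

    find_span is a hypothetical helper, not part of the re module.
    """
    match = re.search(pattern, text)
    if match is None:
        return None
    return match.span()

print(find_span("phone", text))  # (12, 17)
print(find_span("email", text))  # None
```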

Character identifiers and Quantifiers

Identifier  Meaning
\d          Digit (123)
\D          Not a digit (AbC)
\w          Word character: letter, digit, or underscore
\W          Not a word character (symbols such as + = - *, and whitespace)
\s          Whitespace
\S          Not whitespace
.           Wildcard (any character except a newline)

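
As a quick check of the identifiers listed above, the sketch below runs each one against a short string. The sample string is an assumption chosen only so that every class has something to match; note that the underscore counts as a \w character.

```python
import re

sample = "A_1 b+2"  # made-up sample string for illustration

digits = re.findall(r'\d', sample)
non_digits = re.findall(r'\D', sample)
word_chars = re.findall(r'\w', sample)
non_word_chars = re.findall(r'\W', sample)
whitespace = re.findall(r'\s', sample)
any_char = re.findall(r'.', sample)

print(digits)          # ['1', '2']
print(non_digits)      # ['A', '_', ' ', 'b', '+']
print(word_chars)      # ['A', '_', '1', 'b', '2']
print(non_word_chars)  # [' ', '+']
print(whitespace)      # [' ']
print(any_char)        # ['A', '_', '1', ' ', 'b', '+', '2']
```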
Quantifier  Meaning
+           Occurs one or more times
{n}         Occurs exactly n times
{s,e}       Occurs s to e times (inclusive)
{n,}        Occurs n or more times
*           Occurs zero or more times
?           Occurs zero or one time (makes the preceding item optional)
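
One way to see what each quantifier accepts is to test it against strings of different lengths. The sketch below uses re.fullmatch, which is not covered elsewhere in these notes and only succeeds when the whole string matches the pattern; the candidate strings are made up for illustration.

```python
import re

candidates = ["", "a", "aa", "aaa", "aaaa"]  # made-up test strings
patterns = [r'a+', r'a{2}', r'a{2,3}', r'a{3,}', r'a*', r'a?']

results = {}
for pattern in patterns:
    # Keep only the candidates that the pattern matches in full
    results[pattern] = [s for s in candidates if re.fullmatch(pattern, s)]
    print(pattern, results[pattern])

# a+      ['a', 'aa', 'aaa', 'aaaa']
# a{2}    ['aa']
# a{2,3}  ['aa', 'aaa']
# a{3,}   ['aaa', 'aaaa']
# a*      ['', 'a', 'aa', 'aaa', 'aaaa']
# a?      ['', 'a']
```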
python
text = "The agent's phone number is 407-444-1234. Call soon!"
phone = re.search(r'\d{3}-\d{3}-\d{4}', text)
print(phone.group()) # 407-444-1234

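# Grouping: parentheses capture parts of the match
phone_pattern = re.compile(r'(\d{3})-(\d{3})-(\d{4})')
phone = phone_pattern.search(text)    # A compiled pattern has its own search()
print(phone.group(1)) # 407 - groups are numbered from 1; group(0) is the whole match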

# OR, pipe operator
print(re.search(r'cat|dog', 'The cat is here').group()) # cat

# Wildcard, .
print(re.findall(r'.at', 'The cat in the hat sat there.')) # ['cat', 'hat', 'sat']

# Starts with, ^
print(re.findall(r'^T.*', 'The cat in the hat sat there.')) # ['The cat in the hat sat there.']

# Ends with, $
print(re.findall(r'\d$', 'The number is 2')) # ['2']
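
The grouping example above reads one group at a time. As a minimal sketch, all captured groups can also be pulled out at once with groups(), and group(0) returns the whole match. The variable names area, prefix, and line are assumptions chosen for this illustration.

```python
import re

text = "The agent's phone number is 407-444-1234. Call soon!"
phone_pattern = re.compile(r'(\d{3})-(\d{3})-(\d{4})')

phone = phone_pattern.search(text)
if phone is not None:                     # search() returns None when nothing matches
    print(phone.group(0))                 # 407-444-1234 - the whole match
    print(phone.groups())                 # ('407', '444', '1234') - all captured groups
    area, prefix, line = phone.groups()   # illustrative names for the three parts
    print(area)                           # 407
```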