Python

Constant dripping wears away a stone

Regular Expressions (RE)


A regular expression is a sequence of characters that defines a search pattern. The module re is a built-in package in Python used to work with regular expressions: searching a string for a specific pattern, splitting a string, substituting a substring, and so on. Before a program can use the functions in the module, it must import the package with the statement import re.

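  • Raw String: when r or R is put in front of a string literal, it marks a raw string, in which backslashes are kept as ordinary characters. For example, '\n' represents a new line, whereas r'\n' represents a string with two characters: \ followed by n. Regular expressions are usually written as raw strings so that sequences such as \d reach the re module unchanged.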
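The difference between a normal and a raw string literal can be checked directly. The snippet below is a minimal sketch; the variable names are chosen only for illustration.

```python
import re

newline = '\n'   # one character: a line feed
raw = r'\n'      # two characters: a backslash followed by the letter n

print(len(newline))  # 1
print(len(raw))      # 2

# In a raw pattern, \d reaches the re module as backslash + d.
digits = re.findall(r'\d', 'a1b2')
print(digits)  # ['1', '2']
```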
  • Metacharacters:
    • [characters] Match any one of the characters inside the square brackets.
    • [^characters] Match any one character that is not inside the square brackets.
    • [character1-character2] Match any one character in the range from character1 to character2.
      • [b-f] is the same as [bcdef].
      • [0-9] is the same as [0123456789].
      • [0-29] is the same as [0129].
    • \ is followed by a character to signal a special sequence. See next section "Special Sequences" for details.
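    • . match any character except a newline.
    • ^ is put in front of a string; match that string only at the beginning of the searched string, that is, as a prefix.
    • $ is put after a string; match that string only at the end of the searched string (or just before a trailing newline), that is, as a suffix.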
    • * is put after a character in a string; match zero or more occurrences of the character.
    • + is put after a character in a string; match one or more occurrences of the character.
    • ? is put after a character in a string; match zero or one occurrence of the character.
    • {} is put after a character in a string; match the character the number of times specified inside the curly brackets. Write the bounds without spaces: Python treats a form such as {m, n} as literal text, not as a repetition count.
      • {m} match exactly m occurrences of the character.
      • {m,n} match from m to n occurrences of the character; match as many occurrences in the range [m, n] as possible.
      • {m,} match at least m occurrences of the character; match as many occurrences as possible.
      • {,n} match at most n occurrences of the character; match as many occurrences as possible.
      • {m,n}? match from m to n occurrences of the character; match as few occurrences as possible.
    • | means or. For example, A|B matches either A or B.
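The metacharacters above can be tried out with re.findall, which returns every match as a list. This is a small sketch with made-up sample strings, not an exhaustive test of each rule.

```python
import re

# Character classes
in_range = re.findall(r'[b-f]', 'abcdefg')
mixed = re.findall(r'[0-29]', '0123456789')
negated = re.findall(r'[^0-9]', 'a1b2')
print(in_range)  # ['b', 'c', 'd', 'e', 'f']
print(mixed)     # ['0', '1', '2', '9']
print(negated)   # ['a', 'b']

# The dot matches any character except a newline
dot = re.findall(r'c.t', 'cat cut c\nt')
print(dot)  # ['cat', 'cut']

# Quantifiers: *, +, ?
star = re.findall(r'ab*', 'a ab abbb')
plus = re.findall(r'ab+', 'a ab abbb')
optional = re.findall(r'colou?r', 'color colour colr')
print(star)      # ['a', 'ab', 'abbb']
print(plus)      # ['ab', 'abbb']
print(optional)  # ['color', 'colour']

# Greedy {m,n} versus lazy {m,n}?
greedy = re.findall(r'a{2,3}', 'aaaaa')
lazy = re.findall(r'a{2,3}?', 'aaaaa')
print(greedy)  # ['aaa', 'aa']
print(lazy)    # ['aa', 'aa']

# Anchors and alternation
at_start = re.findall(r'^cat', 'catalog cat')
at_end = re.findall(r'cat$', 'cat bobcat')
either = re.findall(r'cat|dog', 'dog and cat')
print(at_start)  # ['cat']
print(at_end)    # ['cat']
print(either)    # ['dog', 'cat']
```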
  • Special Sequences:
    • \A is followed by a string; match that string only at the very beginning of the searched string, that is, as a prefix. A prefix of a string is a substring that contains consecutive characters from the beginning of the string.
    • \b matches a word boundary: the empty position between a word character (\w) and a non-word character, or between a word character and the start or end of the string. If it is followed by a string, match that string only where it begins a word. If it follows a string, match that string only where it ends a word. For example, r'\bcat' matches the cat in 'cats' but not the cat in 'concat'.
    • \B is the opposite of \b: it matches a position that is not a word boundary. If it is followed by a string, match that string only where it does not begin a word. If it follows a string, match that string only where it does not end a word. For example, r'\Bcat' matches the cat in 'concat' but not the cat in 'cats'.
    • \d match digit characters.
    • \D match non-digit characters.
    • \s match whitespace characters.
    • \S match non-whitespace characters.
    • \w match word characters (a-z, A-Z, 0-9, and the underscore character _; for str patterns, other Unicode letters and digits as well, unless the ASCII flag is set).
    • \W match non-word characters.
    • \Z follows a string; match that string only at the very end of the searched string, that is, as a suffix. A suffix of a string is a substring that contains consecutive characters to the end of the string.
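The special sequences can be compared on one sample string. The sketch below uses an invented string and records the start index of each match, so that the effect of \b and \B is visible.

```python
import re

text = 'cat concat cats 42'

# \b before the pattern: cat must begin a word
word_start = [m.start() for m in re.finditer(r'\bcat', text)]
# \b after the pattern: cat must end a word
word_end = [m.start() for m in re.finditer(r'cat\b', text)]
# \B before the pattern: cat must not begin a word
inside = [m.start() for m in re.finditer(r'\Bcat', text)]
print(word_start)  # [0, 11]
print(word_end)    # [0, 7]
print(inside)      # [7]

# Character-type sequences
numbers = re.findall(r'\d+', text)
words = re.findall(r'\w+', text)
spaces = re.findall(r'\s', text)
print(numbers)      # ['42']
print(words)        # ['cat', 'concat', 'cats', '42']
print(len(spaces))  # 3

# \A and \Z anchor to the very start and end of the string
starts_with_cat = re.search(r'\Acat', text) is not None
starts_with_concat = re.search(r'\Aconcat', text) is not None
ends_with_42 = re.search(r'42\Z', text) is not None
print(starts_with_cat, starts_with_concat, ends_with_42)  # True False True
```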
  • Regular Expression Functions:
    • findall(pattern, string[, flags=0]): Find all the occurrences of pattern in string and return a list of all matches in the order they are found. If no matches are found, return an empty list.
    • search(pattern, string[, flags=0]): Search for the first occurrence of pattern in string and return the corresponding match object. If the pattern is not found, return None.
    • split(pattern, string[, maxsplit=0, flags=0]): Split string by the occurrences of pattern and return a list of the split strings. If maxsplit is given and nonzero, split the string at no more than maxsplit occurrences of the pattern; the rest of the string is returned as the last element of the list.
    • sub(pattern, replacement, string[, count=0, flags=0]): Replace every occurrence of pattern in string with replacement and return the changed string. If the pattern is not found in the string, return the original string. If count is given and nonzero, replace only the first count occurrences of pattern with replacement.
    • subn(pattern, replacement, string[, count=0, flags=0]): Works the same as sub(pattern, replacement, string[, count=0, flags=0]), except that it returns a tuple containing the new string and the number of replacements made.
    • match(pattern, string[, flags=0]): If pattern matches a prefix of string, return the corresponding match object. If there is no such match, return None.
    • fullmatch(pattern, string[, flags=0]): If pattern matches the entire string, return the corresponding match object; otherwise, return None.
    • finditer(pattern, string[, flags=0]): Match pattern against substrings of string and return an iterator that yields the match objects of the non-overlapping matched substrings. If there are no matches, return an empty iterator.
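Each function above can be called on a short sample string to see what it returns. The following is an illustrative sketch; the sample strings are invented, and maxsplit and count are passed as keyword arguments for readability.

```python
import re

text = 'one, two;three  four'

found = re.findall(r'[a-z]+', text)
print(found)  # ['one', 'two', 'three', 'four']

first = re.search(r't[a-z]+', text)
print(first.group())  # two

parts = re.split(r'[,;\s]+', text)
parts_once = re.split(r'[,;\s]+', text, maxsplit=1)
print(parts)       # ['one', 'two', 'three', 'four']
print(parts_once)  # ['one', 'two;three  four']

masked = re.sub(r'\d', '#', 'a1b2c3')
masked_two = re.sub(r'\d', '#', 'a1b2c3', count=2)
masked_n = re.subn(r'\d', '#', 'a1b2c3')
print(masked)      # a#b#c#
print(masked_two)  # a#b#c3
print(masked_n)    # ('a#b#c#', 3)

# match only looks at the beginning of the string
prefix = re.match(r'[a-z]+', text)
no_prefix = re.match(r'two', text)
print(prefix.group())  # one
print(no_prefix)       # None

# fullmatch requires the whole string to match
whole = re.fullmatch(r'[a-z]+', 'one')
not_whole = re.fullmatch(r'[a-z]+', text)
print(whole is not None)  # True
print(not_whole)          # None

spans = [m.span() for m in re.finditer(r'[a-z]+', 'ab cd')]
print(spans)  # [(0, 2), (3, 5)]
```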
  • Match Object: the functions above return None when nothing matches, so check that the result is not None before using the following methods and attributes.
    • span(): Return a tuple (start, end) of the positions of the matched substring.
    • string: The string that was passed to the matching function.
    • group(): Return the matched substring.
    • start(): Return the index of the first character of the matched substring.
    • end(): Return the index just past the last character of the matched substring, so that string[start():end()] is the matched substring.
    • pos: The index in the string at which the search started; 0 for the module-level functions listed above.
    • endpos: The index in the string beyond which the search did not look; the length of the string for the module-level functions listed above.
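The methods and attributes of a match object can be inspected on a single search result. This is a minimal sketch on an invented string. The second part uses re.compile and the optional start position of a compiled pattern's search method, which are part of the standard re module but are not covered in the list above; they are included only to show pos taking a value other than 0.

```python
import re

text = 'id: 2024-05'

m = re.search(r'\d+', text)
if m is not None:  # search returns None when nothing matches
    print(m.group())   # 2024
    print(m.span())    # (4, 8)
    print(m.start())   # 4
    print(m.end())     # 8, one past the last matched character
    print(m.string)    # id: 2024-05
    print(m.pos)       # 0
    print(m.endpos)    # 11, the length of the string

# The matched substring is the slice from start() to end()
matched_slice = text[m.start():m.end()]
print(matched_slice)  # 2024

# A compiled pattern can start searching at a given index
compiled = re.compile(r'\d+')
m2 = compiled.search(text, 9)
print(m2.group())  # 05
print(m2.pos)      # 9
print(m2.span())   # (9, 11)
```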