notes · tools · 9/23/2025

Regex

Basic regex patterns and rules for pattern matching and text processing

0. Regular Expression

Match, Find or Manage Text

Rules

  1. By default, a pattern matches anywhere in the text; it does not have to match the whole text
  2. . -> matches any single character (in most engines, any character except a line break)
  3. [...] = Character Sets
    • [abc] -> matches exactly one character from the set inside [], in this case a, b, or c
    • [^abc] -> the opposite: matches exactly one character that is not in the set, in this case anything except a, b, or c
    • [a-c] -> range: matches exactly one character between a and c (inclusive)
  4. Repetitions -> + * ?
    • e* -> matches '', 'e', 'ee', 'eee', ... -> # of 'e' => 0~
    • e+ -> matches 'e', 'ee', 'eee', ... -> # of 'e' => 1~
    • e? -> 'e' is optional -> # of 'e' => 0 or 1
  5. {n} => n is an integer; the character in front of {n} must be repeated exactly n times
    • e{n,} -> 'e' must be repeated at least n times
    • e{n,n+m} -> 'e' must be repeated between n and n+m times (inclusive)
  6. Capture Group -> ( ... )
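    • \n -> backreference: matches the same text that the nth group captured (\1, \2, ...)
    • (?:...) -> non-capturing group; it gets no number, so \n can't reference it
    • (c|r) -> captures c or r
    • | -> alternation (either side may match); it doesn't have to be inside ()
  7. Line starts, ends -> ^ (start), $ (end); without the multiline flag they anchor to the start and end of the whole text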
  8. Word Character -> \w
    • letter, number, and underscore
    • no dots, colons, special marks (something like ?!)
    • \W -> the opposite: matches any character that is not a word character
  9. Number Character -> \d (a single digit)
  10. Space Character -> \s (space, tab, line break)
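  11. Lookahead -> checks what comes right after the match, without making it part of the match
    • positive: \d+(?=ab) -> matches digits only if 'ab' comes right after them
    • negative: \d+(?!ab) -> matches digits only if 'ab' does not come right after them
  12. Lookbehind -> checks what comes right before the match, without making it part of the match
    • positive: (?<=\$)\d+ -> matches digits that come right after '$' (the $ must be escaped as \$, otherwise it means 'end')
    • negative: (?<!\$)\d+ -> matches digits that do not come right after '$'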
  13. Flags
    • global flag: select all matches -> //g
    • multiline flag: ^ and $ match at the start and end of each line instead of the whole text -> //m
    • case-insensitive flag -> //i
    • all of the above can be combined -> //gm, //gmi
  14. Greedy & lazy matching
    • .*r -> greedy: matches as much text as possible, so it runs up to the last r on the line
    • .*?r -> lazy: matches as little text as possible, so it stops at the first r
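A minimal sketch of the greedy vs. lazy difference in rule 14, assuming Python's built-in `re` module (the notes don't name a language, and the sample text is made up for illustration):

```python
import re

text = "bar ber bir"  # made-up sample text

# Greedy: .* grabs as much as it can, then backs off to the LAST 'r'.
greedy = re.search(r".*r", text).group()

# Lazy: .*? grabs as little as it can, so it stops at the FIRST 'r'.
lazy = re.search(r".*?r", text).group()

print(greedy)  # bar ber bir
print(lazy)    # bar
```

Both patterns start at the same position; only how far the `.*` part stretches is different.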
# 3.
b[aei]r # this will match 'bar', 'ber', 'bir'
b[^aei]r # this won't match 'bar', 'ber', 'bir', but will match 'bor', 'bur'
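
The character-set examples above can be checked with a short sketch, assuming Python's `re` module; the word list is made up, and `fullmatch` is used so that each word has to match from start to end:

```python
import re

words = ["bar", "ber", "bir", "bor", "bur"]  # made-up word list

# [aei] accepts exactly one of a, e, i between 'b' and 'r'.
in_set = [w for w in words if re.fullmatch(r"b[aei]r", w)]

# [^aei] accepts any single character EXCEPT a, e, i in that position.
not_in_set = [w for w in words if re.fullmatch(r"b[^aei]r", w)]

print(in_set)      # ['bar', 'ber', 'bir']
print(not_in_set)  # ['bor', 'bur']
```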

# 4.
be*r # 'br', 'ber', 'beer'
be+r # 'ber', 'beer'
colou?r # 'color', 'colour'

# 5.
[0-9]{4} # any run of exactly 4 digits: 1111, 1234, 5021, ...
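
The repetition rules in 4. and 5. can be tried out with a small sketch, assuming Python's `re` module. The helper `matching` and the candidate strings are hypothetical, written only for this illustration:

```python
import re

def matching(pattern, candidates):
    """Return the candidates that the pattern matches from start to end."""
    return [c for c in candidates if re.fullmatch(pattern, c)]

print(matching(r"be*r", ["br", "ber", "beer"]))               # ['br', 'ber', 'beer']
print(matching(r"be+r", ["br", "ber", "beer"]))               # ['ber', 'beer']
print(matching(r"colou?r", ["color", "colour", "colouur"]))   # ['color', 'colour']
print(matching(r"[0-9]{4}", ["123", "1234", "12345"]))        # ['1234']
print(matching(r"[0-9]{2,}", ["1", "12", "123"]))             # ['12', '123']
print(matching(r"[0-9]{2,3}", ["1", "12", "123", "1234"]))    # ['12', '123']
```

Without `fullmatch`, `[0-9]{4}` would also be found inside a longer number such as `12345`, because by default a pattern can match anywhere in the text.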

# 6.
(ha)-\1, (haa)-\2 # 'ha-ha, haa-haa'
(?:ha)-ha, (haa)-\1 # 'ha-ha, haa-haa', (?:ha) group will not be captured
(c|r)at|dog # 'cat', 'rat', 'dog'
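
A sketch of how the group examples above behave, assuming Python's `re` module (the variable names and the failing test string are made up for illustration):

```python
import re

# \1 must repeat exactly what group 1 captured, \2 what group 2 captured.
backref = re.compile(r"(ha)-\1, (haa)-\2")
ok = bool(backref.fullmatch("ha-ha, haa-haa"))
bad = bool(backref.fullmatch("ha-ha, haa-ha"))
print(ok, bad)  # True False

# (?:ha) gets no number, so (haa) becomes group 1.
non_capturing = re.compile(r"(?:ha)-ha, (haa)-\1")
groups = non_capturing.fullmatch("ha-ha, haa-haa").groups()
print(groups)  # ('haa',)

# | splits the whole pattern: "(c|r)at" OR "dog".
alternation = re.compile(r"(c|r)at|dog")
animals = [w for w in ["cat", "rat", "dog", "bat"] if alternation.fullmatch(w)]
print(animals)  # ['cat', 'rat', 'dog']
```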

# 7.
^[0-9] # any line that starts with a digit (0-9)
html$ # any line that ends with 'html'
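
A sketch of the anchors together with the flags from rule 13, assuming Python's `re` module. Python has no `//g` syntax: `re.findall` plays the role of the global flag here, and `re.M` / `re.I` stand in for `//m` and `//i`. The three-line sample text is made up:

```python
import re

text = "1 index.html\nAbout.HTML\n2 notes.txt"  # made-up three-line text

# Without the multiline flag, ^ only anchors at the very start of the text.
start_only = re.findall(r"^[0-9]", text)
print(start_only)  # ['1']

# With re.M (multiline), ^ and $ anchor at the start and end of every line.
every_line = re.findall(r"^[0-9]", text, flags=re.M)
print(every_line)  # ['1', '2']

# Flags can be combined: multiline plus case-insensitive.
html_lines = re.findall(r"html$", text, flags=re.M | re.I)
print(html_lines)  # ['html', 'HTML']
```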

# 11. 12.
\d+(?=PM) # matches '3' in '3PM', '4' in '4PM' (no space: PM must come right after the digits)
(?<=\$)\d+ # matches '1' in '$1', '100' in '$100'
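
A sketch of lookahead and lookbehind, assuming Python's `re` module and made-up sample strings. The lookaround part is only checked, never included in the result, and it is checked directly next to the digits, so a space between the digits and `PM` (or between `$` and the digits) would prevent a match:

```python
import re

times = "3PM, 4PM, 5AM"   # made-up sample text
prices = "$1, $100, 42"   # made-up sample text

# Positive lookahead: digits directly followed by 'PM'.
pm_hours = re.findall(r"\d+(?=PM)", times)
print(pm_hours)  # ['3', '4']

# Negative lookahead: digits NOT directly followed by 'PM'.
other_hours = re.findall(r"\d+(?!PM)", times)
print(other_hours)  # ['5']

# Positive lookbehind: digits directly preceded by '$' (escaped as \$).
dollar_amounts = re.findall(r"(?<=\$)\d+", prices)
print(dollar_amounts)  # ['1', '100']

# Negative lookbehind: digits NOT directly preceded by '$'.
plain_numbers = re.findall(r"(?<!\$)\d+", "$1, 42")
print(plain_numbers)  # ['42']
```

One caveat with the negative forms: on multi-digit input the engine may settle for part of a number. For example, `(?<!\$)\d+` finds `00` inside `$100`, because the first `0` is preceded by `1`, not by `$`.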