Regex
Basic regex patterns and rules for pattern matching and text processing
0. Regular Expression
Match, Find or Manage Text
Rules
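- By default, a pattern matches anywhere in the text; without the global flag, only the first match is found
- . -> matches any single character (except a line break, by default)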
- [...] = Character Sets
- [abc] -> matches one of the characters between [], in this case a, b, or c
- [^abc] -> the opposite of the above: matches any single character that is NOT a, b, or c
- [a-c] -> a range: matches any character from a to c (inclusive)
- Repetitions -> + * ?
- e* -> matches '', 'e', 'ee', 'eee', ... (0 or more 'e')
- e+ -> matches 'e', 'ee', 'eee', ... (1 or more 'e')
- e? -> 'e' is optional (0 or 1 'e')
- e{n} -> n is an integer; the character (or group) in front of {n} must repeat exactly n times
- e{n,} -> 'e' must repeat at least n times
- e{n,m} -> 'e' must repeat between n and m times (inclusive)
- Capture Group -> ( ... )
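- \1, \2, ... -> backreference: matches the same text that the nth group captured
- (?:...) -> non-capturing group: groups without capturing, so it can't be referenced with \1, \2, ...
- (c|r) -> alternation inside a group: matches and captures c or r
- | -> doesn't have to be inside (), e.g. cat|dog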
- Line starts, ends -> ^ (start), $ (end)
- Word Character -> \w
- \w matches a letter, a digit, or an underscore
- \w does not match dots, colons, or other special marks (something like ?!)
- \W -> the opposite: matches any character that is not a word character
- Digit Character -> \d (0-9)
- Whitespace Character -> \s (space, tab, line break)
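- Lookahead -> checks what comes right after the current position without including it in the match
- positive: \d+(?=ab) -> matches digits only if 'ab' follows them
- negative: \d+(?!ab) -> matches digits only if 'ab' does not follow them
- Lookbehind -> checks what comes right before the current position without including it in the match
- positive: (?<=\$)\d+ -> matches digits that come right after '$' ('$' must be escaped, since a bare $ means end of line)
- negative: (?<!\$)\d+ -> matches digits that do not come right after '$'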
- Flags
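- global flag: find all matches instead of stopping at the first -> //g
- multiline flag: ^ and $ match at the start and end of each line, not only of the whole text -> //m
- case-insensitive flag: letters match regardless of case -> //i
- all of the above can be combined -> //gm, //gmi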
- Greedy & lazy matching
- .*r -> greedy (the default): matches as much text as possible, so it runs up to the last 'r'
- .*?r -> lazy: a ? after a quantifier matches as little text as possible, so it stops at the first 'r'
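The greedy vs. lazy rule is easiest to see side by side. Below is a minimal sketch using Python's `re` module (the language and the sample text are assumptions; these notes do not name a language):

```python
import re

text = "bar ber bir"  # sample text, chosen for illustration

# Greedy: .* takes as much as it can, so the match ends at the LAST 'r'.
greedy = re.search(r".*r", text).group()

# Lazy: .*? takes as little as it can, so the match ends at the FIRST 'r'.
lazy = re.search(r".*?r", text).group()

print(greedy)  # bar ber bir
print(lazy)    # bar
```

Both patterns start at the same position; the only difference is how far the quantifier is allowed to run before the final `r` has to match.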
# 3.
b[aei]r # this will match 'bar', 'ber', 'bir'
b[^aei]r # this won't match 'bar', 'ber', 'bir', but will match e.g. 'bor', 'bur'
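To check the character-set examples, here is a small Python sketch (the `re` module and the word list are assumptions made for illustration):

```python
import re

words = ["bar", "ber", "bir", "bor", "bur"]  # sample words

# fullmatch requires the whole word to match the pattern.
in_set = [w for w in words if re.fullmatch(r"b[aei]r", w)]
not_in_set = [w for w in words if re.fullmatch(r"b[^aei]r", w)]

print(in_set)      # ['bar', 'ber', 'bir']
print(not_in_set)  # ['bor', 'bur']
```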
# 4.
be*r # 'br', 'ber', 'beer'
be+r # 'ber', 'beer'
colou?r # 'color', 'colour'
# 5.
[0-9]{4} # any number with 4 digits, 1111, 1234, 5021, ...
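The repetition examples in 4 and 5 can be tried out with a short Python sketch. The helper `matching` and the sample strings are hypothetical, introduced only for this illustration:

```python
import re

def matching(pattern, words):
    """Return the words that the pattern matches in full."""
    return [w for w in words if re.fullmatch(pattern, w)]

words = ["br", "ber", "beer"]

print(matching(r"be*r", words))    # ['br', 'ber', 'beer']  (0 or more 'e')
print(matching(r"be+r", words))    # ['ber', 'beer']        (1 or more 'e')
print(matching(r"be?r", words))    # ['br', 'ber']          (0 or 1 'e')
print(matching(r"be{2}r", words))  # ['beer']               (exactly 2 'e')

# findall returns every non-overlapping match in the text.
years = re.findall(r"[0-9]{4}", "Released 2003, updated 2021")
print(years)  # ['2003', '2021']
```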
# 6.
(ha)-\1, (haa)-\2 # 'ha-ha, haa-haa'
(?:ha)-ha, (haa)-\1 # 'ha-ha, haa-haa', (?:ha) group will not be captured
(c|r)at|dog # 'cat', 'rat', 'dog'
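A quick way to see which groups are captured is to print them. This is a sketch in Python (an assumed language choice; the variable names are illustrative):

```python
import re

text = "ha-ha, haa-haa"

# Two capturing groups: \1 repeats group 1, \2 repeats group 2.
m = re.fullmatch(r"(ha)-\1, (haa)-\2", text)
print(m.groups())   # ('ha', 'haa')

# (?:ha) is not captured, so (haa) becomes group 1.
m2 = re.fullmatch(r"(?:ha)-ha, (haa)-\1", text)
print(m2.groups())  # ('haa',)

# Alternation: either '(c|r)at' or 'dog'.
alt = [w for w in ["cat", "rat", "dog", "bat"] if re.fullmatch(r"(c|r)at|dog", w)]
print(alt)          # ['cat', 'rat', 'dog']
```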
# 7.
^[0-9] # any line that starts with a number(0-9)
html$ # any line that ends with 'html'
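The anchors only work per line when the multiline flag is on. The sketch below assumes Python, where `re.MULTILINE` and `re.IGNORECASE` play the role of //m and //i, and `re.findall` returns all matches much like //g; the sample text is made up:

```python
import re

text = "1 index.html\nabout.HTML\n2 notes.txt"  # three sample lines

# Without the multiline flag, ^ only matches at the very start of the text.
first_only = re.findall(r"^[0-9]", text)
# With re.MULTILINE, ^ and $ apply to every line.
per_line = re.findall(r"^[0-9]", text, re.MULTILINE)
ends_html = re.findall(r"html$", text, re.MULTILINE)
# Flags are combined with |, similar to writing //mi.
ends_html_any_case = re.findall(r"html$", text, re.MULTILINE | re.IGNORECASE)

print(first_only)          # ['1']
print(per_line)            # ['1', '2']
print(ends_html)           # ['html']
print(ends_html_any_case)  # ['html', 'HTML']
```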
# 11. 12.
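\d+(?=PM) # the digits in '3PM', '4PM' (the 'PM' itself is not part of the match)
(?<=\$)\d+ # the digits in '$1', '$100' (the '$' itself is not part of the match)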
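All four lookarounds can be compared on one line of text. This is a sketch in Python with a made-up sample string. The `\b` in the last pattern is an addition, not part of the notes: without it, the negative lookbehind would still match the '0' in '$20', because '0' is preceded by '2' rather than '$'.

```python
import re

text = "Open 9AM to 5PM, entry $20, parking 10 EUR"  # sample text

before_pm = re.findall(r"\d+(?=PM)", text)         # digits followed by 'PM'
not_before_pm = re.findall(r"\d+(?!PM)", text)     # digits not followed by 'PM'
after_dollar = re.findall(r"(?<=\$)\d+", text)     # digits right after '$'
# \b makes the number start at a word boundary, so '$20' is skipped entirely.
not_after_dollar = re.findall(r"\b(?<!\$)\d+", text)

print(before_pm)         # ['5']
print(not_before_pm)     # ['9', '20', '10']
print(after_dollar)      # ['20']
print(not_after_dollar)  # ['9', '5', '10']
```

In every case only the digits are returned; the text checked by the lookahead or lookbehind is never part of the match.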