Regular Expression

What is a Regular Expression

A regular expression (regex) is a sequence of characters that defines a search pattern.

What a regular expression can match

  • Characters
  • Positions (for example the start of a line or a word boundary)

Metacharacters

symbol  description                                                  example
\d      matches a digit                                              1
\s      matches a whitespace character (space, tab, newline, ...)
\w      matches a word character: a letter, a digit, or "_"          1 2 w b _
\n      matches a newline
\r      matches a carriage return
\t      matches a tab
.       matches any single character except a line break            1234a 0-!@#$%^&*()_
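To try the table, the sketch below runs each metacharacter against a made-up string using JavaScript's built-in regex engine. It assumes TypeScript on a modern runtime such as Node.js; other engines can differ in details (for example, whether \d also matches non-ASCII digits).

```typescript
// Made-up sample: letters, digits, an underscore, a space and a tab.
const sample = "a1 b2_\tc";

// With the g flag, match() returns every match as an array (or null if there is none).
const digits = sample.match(/\d/g);    // → ["1", "2"]
const spaces = sample.match(/\s/g);    // → [" ", "\t"]
const wordChars = sample.match(/\w/g); // → ["a", "1", "b", "2", "_", "c"]

// . stops at line breaks by default, so the "\n" is skipped.
const anyChars = "a\nb".match(/./g);   // → ["a", "b"]

console.log(digits, spaces, wordChars, anyChars);
```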

Escape Character

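\ (backslash): put it in front of a reserved character to match that character literally, e.g. \. matches a literal dot and \* a literal asterisk.

Reserved characters
* . ? + $ ^ [ ] ( ) { } | \ /

/ only needs escaping when it delimits the pattern, as in JavaScript's /.../ regex literals.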
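When a pattern is built from arbitrary text, every reserved character in that text has to be escaped. A minimal sketch, assuming JavaScript's RegExp; escapeRegExp is a hypothetical helper name, not a built-in function.

```typescript
// Hypothetical helper: prefix every reserved character with a backslash.
// "$&" in the replacement string stands for the character that was matched.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\\/]/g, "\\$&");
}

const escaped = escapeRegExp("1+1=2?");                    // the pattern text 1\+1=2\?
const literalHit = new RegExp(escaped).test("is 1+1=2?");  // → true

// Unescaped, + and ? act as quantifiers, so the literal text is not found.
const unescapedHit = /1+1=2?/.test("is 1+1=2?");           // → false

console.log(escaped, literalHit, unescapedHit);
```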

NOT AND OR

OR Operator

symbol        description                                                      example
[]            matches any one of the characters inside the square brackets    c[aoe]n matches can, con, cen
[a-z0-9A-Z]   matches one character in a range: a to z, 0 to 9, or A to Z      c[a-e]n matches can, cbn, ccn, cdn, cen
|             matches either the expression before or the one after the bar   a(bc|de) matches abc, ade
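The three forms of "OR" can be checked side by side. A short sketch assuming JavaScript's regex engine; the word list is invented for illustration.

```typescript
// Made-up words to test each form of "OR" against.
const words = "can con cen cbn cin";

const anyOf = words.match(/c[aoe]n/g);             // → ["can", "con", "cen"]
const inRange = words.match(/c[a-e]n/g);           // → ["can", "cen", "cbn"]

// With g, match() returns whole matches, not the contents of the (bc|de) group.
const eitherOr = "abc ade afg".match(/a(bc|de)/g); // → ["abc", "ade"]

console.log(anyOf, inRange, eitherOr);
```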

Negation Operator

symbol  description                                                    example
\W      matches any non-word character (the opposite of \w)            !@#$%^^&**(
\S      matches any non-whitespace character (the opposite of \s)
\D      matches any non-digit character (the opposite of \d)           abcd
[^...]  matches any one character that is not listed in the brackets   [^a] matches b, c, d
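Each negated class is the complement of its lowercase counterpart, which a quick run makes visible. A sketch assuming JavaScript's regex engine, with a made-up input string.

```typescript
const mixed = "ab 12!";

const notWord = mixed.match(/\W/g);  // → [" ", "!"]
const notSpace = mixed.match(/\S/g); // → ["a", "b", "1", "2", "!"]
const notDigit = mixed.match(/\D/g); // → ["a", "b", " ", "!"]
const notA = "abcd".match(/[^a]/g);  // → ["b", "c", "d"]

console.log(notWord, notSpace, notDigit, notA);
```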

Quantifiers

symbol  description                        example
*       matches 0 or more times            abc* matches ab, abc, abcc
+       matches 1 or more times            abc+ matches abc, abcc
?       matches 0 or 1 time                abc? matches ab, abc
{2}     matches exactly 2 times            abc{2} matches abcc
{2,}    matches 2 or more times            abc{2,} matches abcc, abccc
{2,5}   matches 2 to 5 times, inclusive    abc{2,5} matches abcc, abcccc

A quantifier applies only to the item directly before it: in abc*, only the c repeats. To repeat a longer sequence, wrap it in a group, as in (abc)*.
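To see exactly how many repetitions each quantifier accepts, the sketch below anchors every pattern with ^ and $ (from the Match Position section) so the whole string must match. It assumes JavaScript's regex engine; the candidate strings and the accepted helper are made up for illustration.

```typescript
const candidates = ["ab", "abc", "abcc", "abccc", "abcccccc"];

// Hypothetical helper: keep the candidates the pattern matches in full.
// No g flag is used, so test() carries no lastIndex state between calls.
function accepted(pattern: RegExp): string[] {
  return candidates.filter((s) => pattern.test(s));
}

console.log(accepted(/^abc*$/));     // → all five candidates
console.log(accepted(/^abc+$/));     // → ["abc", "abcc", "abccc", "abcccccc"]
console.log(accepted(/^abc?$/));     // → ["ab", "abc"]
console.log(accepted(/^abc{2}$/));   // → ["abcc"]
console.log(accepted(/^abc{2,}$/));  // → ["abcc", "abccc", "abcccccc"]
console.log(accepted(/^abc{2,5}$/)); // → ["abcc", "abccc"]
```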

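Greedy or Lazy Match

Greedy match (the default): a quantifier matches as much as possible.
Lazy match: a quantifier matches as little as possible. Make a quantifier lazy by adding a ? after it: *?, +?, ??, {2,5}?

<.*?>

This is a <div> simple div</div> test

On this text, the greedy <.*> matches "<div> simple div</div>" in one piece, while the lazy <.*?> matches "<div>" and "</div>" separately.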
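Running the greedy and the lazy version of the pattern on the sample sentence makes the difference concrete. A minimal sketch, assuming JavaScript's regex engine:

```typescript
const html = "This is a <div> simple div</div> test";

// Greedy: .* runs to the end of the line, then backs off to the LAST ">".
const greedy = html.match(/<.*>/g); // → ["<div> simple div</div>"]

// Lazy: .*? grows one character at a time and stops at the FIRST ">".
const lazy = html.match(/<.*?>/g);  // → ["<div>", "</div>"]

console.log(greedy, lazy);
```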

Match Position

symbol  description                                                      example
\b      matches a word boundary: the start or end position of a word    abc\b matches in abc, but not in abcd
\B      matches any position that is not a word boundary                 abc\B matches in abcd, but not in abc
^       matches the start of the string (of each line with the m flag)
$       matches the end of the string (of each line with the m flag)
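Position assertions match a location rather than a character, so they are easiest to check with test(). A sketch assuming JavaScript's regex engine; each pair shows one input that matches and one that does not.

```typescript
const wordEnd = [/abc\b/.test("abc"), /abc\b/.test("abcd")];      // → [true, false]
const notWordEnd = [/abc\B/.test("abcd"), /abc\B/.test("abc")];   // → [true, false]
const atStart = [/^abc/.test("abc def"), /^abc/.test("x abc")];   // → [true, false]
const atEnd = [/abc$/.test("x abc"), /abc$/.test("abc x")];       // → [true, false]

console.log(wordEnd, notWordEnd, atStart, atEnd);
```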

Capture Groups

symbol        description                                                            example
(...)         captures the text matched inside the parentheses as group 1, 2, ...   tel: (\d{8}) on "tel: 12345678" captures 12345678
(?<name>...)  captures the text matched inside the parentheses under a name          (?<phoneNumber>\d{8}) on "tel:12345678" captures phoneNumber = 12345678
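In JavaScript, numbered groups appear as array indices on the match result and named groups also appear under match.groups. A sketch assuming a runtime with ES2018 named-group support; the phone numbers are made up.

```typescript
// Numbered group: index 0 is the whole match, index 1 the first (...) group.
const numbered = "tel: 12345678".match(/tel: (\d{8})/);
const wholeMatch = numbered?.[0]; // → "tel: 12345678"
const groupOne = numbered?.[1];   // → "12345678"

// Named group: the captured text is available by name.
const named = "tel:12345678".match(/(?<phoneNumber>\d{8})/);
const phone = named?.groups?.phoneNumber; // → "12345678"

console.log(wholeMatch, groupOne, phone);
```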

Look-ahead & Look-behind Zero-Length Assertions

symbol                 description                                                 example
(?<=pattern1)pattern2  matches pattern2 only when it is preceded by pattern1       (?<=tel:)\d{8} on "tel:12345678" matches 12345678
(?<!pattern1)pattern2  matches pattern2 only when it is not preceded by pattern1   (?<!tel:)\d{8} on "abc:12345678" matches 12345678
pattern2(?=pattern1)   matches pattern2 only when it is followed by pattern1       .*(?= is awesome) on "regex is awesome" matches regex
pattern2(?!pattern1)   matches pattern2 only when it is not followed by pattern1   java(?! is awesome) on "java is great" matches java

In every case pattern1 is only checked, never consumed, so it is not part of the match.
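The four assertions can be verified against the table's examples, plus one input each where the negative forms reject the match. A sketch assuming a JavaScript engine with ES2018 look-behind support (e.g. a recent Node.js).

```typescript
const behind = "tel:12345678".match(/(?<=tel:)\d{8}/)?.[0];        // → "12345678"
const notBehind = "abc:12345678".match(/(?<!tel:)\d{8}/)?.[0];     // → "12345678"
const notBehindMiss = "tel:12345678".match(/(?<!tel:)\d{8}/);      // → null
const ahead = "regex is awesome".match(/.*(?= is awesome)/)?.[0];  // → "regex"
const notAhead = "java is great".match(/java(?! is awesome)/)?.[0]; // → "java"
const notAheadMiss = /java(?! is awesome)/.test("java is awesome"); // → false

console.log(behind, notBehind, notBehindMiss, ahead, notAhead, notAheadMiss);
```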

Flags

flag  description
i     makes the whole expression case-insensitive
g     global: returns all matches instead of stopping after the first; each search resumes where the previous match ended
m     multiline: ^ and $ match the start and end of each line, instead of only the start and end of the whole string
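In JavaScript the flags are written after the closing slash of a regex literal. A minimal sketch with made-up inputs showing each flag's effect:

```typescript
const animals = "Cat cat CAT";

const firstOnly = animals.match(/cat/)?.[0]; // no g: stops at the first match → "cat"
const allAnyCase = animals.match(/cat/gi);   // g + i → ["Cat", "cat", "CAT"]

const lines = "one\ntwo";
const withoutM = lines.match(/^\w+$/g);      // → null: ^ and $ only match at the string's edges
const withM = lines.match(/^\w+$/gm);        // → ["one", "two"]

console.log(firstOnly, allAnyCase, withoutM, withM);
```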

Back-references

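symbol         description                                                              example
\number        refers back to the text matched by the capturing group with that number   <(\w+)>[^<]+<\/\1> matches <h1>hello world</h1>
\k<groupName>  refers back to the text matched by the named group                        <(?<tag>\w+)>[^<]+<\/\k<tag>> matches <h1>hello world</h1>

Because the closing tag must repeat the captured name, neither pattern matches <h1>hello world</h2>.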
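A back-reference matches the same text the group captured, not the same pattern, which is what makes mismatched tags fail. A sketch assuming JavaScript's regex engine; the closing > is written without a backslash because it needs no escaping.

```typescript
const byNumber = /<(\w+)>[^<]+<\/\1>/;
const byName = /<(?<tag>\w+)>[^<]+<\/\k<tag>>/;

const numberedOk = byNumber.test("<h1>hello world</h1>");    // → true
const numberedBad = byNumber.test("<h1>hello world</h2>");   // → false: closing tag differs
const namedOk = byName.test("<h1>hello world</h1>");         // → true
const namedBad = byName.test("<h1>hello world</h2>");        // → false

console.log(numberedOk, numberedBad, namedOk, namedBad);
```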

Puzzle

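How can we match the content of a tag without including the tag itself in the match?

<(?<tag>\w+)>[^<]+<\/\k<tag>>

<h1>hello world</h1>

The pattern above matches the whole element, tags included. Moving the opening tag into a look-behind and the closing tag into a look-ahead leaves only the content, hello world, in the match:

(?<=<(?<tag>\w+)>)[^<]+(?=<\/\k<tag>>)

This relies on a variable-length look-behind (\w+ inside (?<=...)), which JavaScript and .NET support but not every engine does. Python's re module, for example, only accepts fixed-width look-behinds.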
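The puzzle's answer can be checked directly. A sketch assuming a JavaScript engine with ES2018 look-behind and named groups; note that a group inside a look-behind still captures, so the tag name is available even though it is not part of the match.

```typescript
const element = "<h1>hello world</h1>";

// Whole element, tags included.
const wholeElement = element.match(/<(?<tag>\w+)>[^<]+<\/\k<tag>>/)?.[0]; // → "<h1>hello world</h1>"

// Content only: the tags are checked by the look-behind and look-ahead but not consumed.
const contentOnly = element.match(/(?<=<(?<tag>\w+)>)[^<]+(?=<\/\k<tag>>)/);
const tagContent = contentOnly?.[0];     // → "hello world"
const tagName = contentOnly?.groups?.tag; // → "h1"

// Mismatched closing tag: the look-ahead fails, so nothing matches.
const mismatched = "<h1>hello world</h2>".match(/(?<=<(?<tag>\w+)>)[^<]+(?=<\/\k<tag>>)/); // → null

console.log(wholeElement, tagContent, tagName, mismatched);
```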

Q & A

Thank You