RegEx Reference

 
Syntax
Description
Example
Characters
Any character except [\^$.|?*+()
All characters except the listed special characters match a single instance of themselves
a matches a
{ and }
{ and } are literal characters, unless they’re part of a valid regular expression token such as a quantifier {3}
{ matches {
\ followed by any of [\^$.|?*+(){}
A backslash escapes special characters to suppress their special meaning
\* matches *
\Q...\E
Matches the characters between \Q and \E literally, suppressing the meaning of special characters
\Q+-*/\E matches +-*/
\xFF where FF are 2 hexadecimal digits
Matches the character at the specified position in the code page
\xA9 matches © when using the Latin-1 code page
\n, \r and \t
Match an LF character, CR character and a tab character respectively
\r\n matches a Windows CRLF line break
\R
Matches any line break, including CRLF as a pair, CR only, LF only, form feed, vertical tab, and any Unicode line break
 
\R
Matches the next line control character U+0085
 
\R
CRLF line breaks are indivisible
\R{2} and \R\R cannot match \r\n
\a
Match the “alert” or “bell” control character (ASCII 0x07)
 
\e
Match the “escape” control character (ASCII 0x1B)
 
\f
Match the “form feed” control character (ASCII 0x0C)
 
\cA through \cZ
Match an ASCII character Control+A through Control+Z, equivalent to \x01 through \x1A
\cM\cJ matches a Windows CRLF line break
\ca through \cz
Match an ASCII character Control+A through Control+Z, equivalent to \x01 through \x1A
\cm\cj matches a Windows CRLF line break
\0
Match the NULL character
 
\o{7777} where 7777 is any octal number
Matches the character at the specified position in the active code page
\o{20254} matches € when using Unicode
\10 through \77
Matches the character at the specified position in the ASCII table
\77 matches ?
\100 through \177
Matches the character at the specified position in the ASCII table
\100 matches @
\200 through \377
Matches the character at the specified position in the active code page
\377 matches ÿ when using the Latin-1 code page
\400 through \777
Matches the character at the specified position in the active code page
\777 matches ǿ when using Unicode
\01 through \07
Matches the character at the specified position in the ASCII table
\07 matches the “bell” character
\010 through \077
Matches the character at the specified position in the ASCII table
\077 matches ?
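A few of the character escapes above can be tried out with a short sketch. It assumes Python's standard re module, which is not the flavor this table documents and supports only a subset of it (for example, \Q...\E, \R and \o{...} are not available there). The variable names and sample strings are illustrative only.

```python
import re

# A backslash suppresses the special meaning of a metacharacter.
star = re.findall(r"\*", "2*3*4")

# \xFF form: two hexadecimal digits select the character (0x41 is "A").
hex_a = re.findall(r"\x41", "ABBA")

# \r\n matches a Windows CRLF line break as two characters.
crlf = re.split(r"\r\n", "one\r\ntwo\r\nthree")

# Octal escape with a leading zero: \077 is "?" in the ASCII table.
qmark = re.findall(r"\077", "why? how?")

print(star, hex_a, crlf, qmark)
```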
Basic
. (dot)
Matches any single character except line break characters. Most regex flavors have an option to make the dot match line break characters too.
. matches x or (almost) any other character
\N
Matches any single character except line break characters, like the dot, but is not affected by any options that make the dot match all characters including line breaks.
\N matches x or any other character that is not a line break
| (pipe)
Causes the regex engine to match either the part on the left side, or the part on the right side. Can be strung together into a series of alternatives.
abc|def|xyz matches abc, def or xyz
Alternation returns the first alternative that matches.
a|ab matches a in ab
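The dot and alternation rows above can be checked with a minimal sketch, assuming Python's re module (a different flavor with broadly the same behavior for these two tokens). The re.DOTALL flag stands in for the "dot matches line breaks" option mentioned above.

```python
import re

# By default the dot does not match a line feed.
dot_default = re.findall(r".", "a\nb")

# With the DOTALL option the dot matches the line feed too.
dot_all = re.findall(r".", "a\nb", flags=re.DOTALL)

# A series of alternatives: each one is tried from left to right.
alternatives = re.findall(r"abc|def|xyz", "abc def xyz")

# The first alternative that matches wins, even if a later one is longer.
first_wins = re.search(r"a|ab", "ab").group()

print(dot_default, dot_all, alternatives, first_wins)
```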
Character Classes
[
When used outside a character class, [ begins a character class. Inside a character class, different rules apply. Unless otherwise noted, the syntax is only valid inside character classes, while the syntax on all other sections is not valid inside character classes.
 
Any character except ^-]\
All characters except the listed special characters are literal characters that add themselves to the character class.
[abc] matches a, b or c
\ (backslash) followed by any of ^-]\
A backslash escapes special characters to suppress their special meaning.
[\^\]] matches ^ or ]
- (hyphen) between two tokens that each specify a single character.
Adds a range of characters to the character class.
[a-zA-Z0-9] matches any ASCII letter or digit
^ (caret) immediately after the opening [
Negates the character class, causing it to match a single character not listed in the character class.
[^a-d] matches x (any character except a, b, c or d)
[
An opening square bracket is a literal character that adds an opening square bracket to the character class.
[ab[cd]ef] matches aef], bef], [ef], cef], and def]
\n, \r and \t
Add an LF character, a CR character, or a tab character to the character class, respectively.
[\n\r\t] matches a line feed, a carriage return, or a tab.
\a
Add the “alert” or “bell” control character (ASCII 0x07) to the character class.
[\a\t] matches a bell or a tab character.
\b
Add the “backspace” control character (ASCII 0x08) to the character class.
[\b\t] matches a backspace or a tab character.
\e
Add the “escape” control character (ASCII 0x1B) to the character class.
[\e\t] matches an escape or a tab character.
\f
Add the “form feed” control character (ASCII 0x0C) to the character class.
[\f\t] matches a form feed or a tab character.
[:alpha:]
Matches one character from a POSIX character class.
[[:digit:][:lower:]] matches one of 0 through 9 or a through z
[:^alpha:]
Matches one character that is not part of a specific POSIX character class.
[5[:^digit:]] matches the digit 5 or any other character that is not a digit.
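The basic character class rules above (literals, escapes, ranges, negation, and \b as backspace) can be sketched as follows. This assumes Python's re module, which does not support POSIX classes such as [:alpha:], so those rows are left out. The sample strings are made up for illustration.

```python
import re

# Ordinary characters add themselves to the class.
literal = re.findall(r"[abc]", "cabbage")

# A backslash escapes ^ and ] inside the class.
escaped = re.findall(r"[\^\]]", "a^b]c")

# A hyphen between two characters adds a range.
ranged = re.findall(r"[a-zA-Z0-9]", "A-z_9!")

# A caret right after the opening bracket negates the class.
negated = re.findall(r"[^a-d]", "abcdxe")

# Inside a class, \b is the backspace character, not a word boundary.
backspace = re.findall(r"[\b\t]", "a\bb\tc")

print(literal, escaped, ranged, negated, backspace)
```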
Shorthand
\d
Adds all digits to the character class. Matches a single digit if used outside character classes.
[\d] and/or \d match a character that is a digit
\w
Adds all word characters to the character class. Matches a single word character if used outside character classes.
[\w] and/or \w match any single word character
\s
Adds all whitespace to the character class. Matches a single whitespace character if used outside character classes.
[\s] and/or \s match any single whitespace character
\v
Adds all vertical whitespace to the character class. Matches a single vertical whitespace character if used outside character classes.
[\v] and/or \v match any single vertical whitespace character
\h
Adds all horizontal whitespace to the character class. Matches a single horizontal whitespace character if used outside character classes.
[\h] and/or \h match any single horizontal whitespace character
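As a rough illustration of the shorthands above, the following sketch uses Python's re module. That is an assumption of this example: Python supports \d, \w and \s inside and outside character classes, but not the \h and \v shorthands described here.

```python
import re

# \d outside a class matches a single digit.
digits = re.findall(r"\d", "a1b22")

# \w inside a class adds all word characters (letters, digits, underscore).
words = re.findall(r"[\w]+", "snake_case, two")

# \s matches any whitespace character, including tabs and line feeds.
fields = re.split(r"\s+", "a \t b\nc")

print(digits, words, fields)
```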
Anchors
^ (caret)
Matches at the start of the string the regex pattern is applied to.
^. matches a in abc\ndef
$ (dollar)
Matches at the end of the string the regex pattern is applied to.
.$ matches f in abc\ndef
$ (dollar)
Matches before the final line break in the string (if any) in addition to matching at the very end of the string.
.$ matches f in abc\ndef\n
^ (caret)
Matches after each line break in addition to matching at the start of the string, thus matching at the start of each line in the string.
^. matches a and d in abc\ndef
$ (dollar)
Matches before each line break in addition to matching at the end of the string, thus matching at the end of each line in the string.
.$ matches c and f in abc\ndef
\A
Matches at the start of the string the regex pattern is applied to.
\A\w matches only a in abc
\G
Matches at the start of the match attempt.
\G\w matches a, b, and c when iterating over all matches in abc def
\z
Matches at the end of the string the regex pattern is applied to.
\w\z matches f in abc\ndef but fails to match abc\ndef\n
\Z
Matches at the end of the string as well as before the final line break in the string (if any).
.\Z matches f in abc\ndef and in abc\ndef\n but fails to match abc\ndef\n\n
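The difference between string anchors and line anchors described above can be sketched in Python's re module, where re.MULTILINE plays the role of the "match at each line" option. This is only an approximation of the flavor documented here: Python has no \G, and its \Z means the very end of the string only, so those rows are not reproduced.

```python
import re

text = "abc\ndef"

# Without any option, ^ and $ anchor to the whole string.
string_start = re.findall(r"^.", text)
string_end = re.findall(r".$", text)

# $ also matches before a final line break.
before_final_break = re.findall(r".$", "abc\ndef\n")

# With the multiline option, ^ and $ match at every line.
line_starts = re.findall(r"^.", text, flags=re.MULTILINE)
line_ends = re.findall(r".$", text, flags=re.MULTILINE)

# \A always means the start of the string, whatever the options.
abs_start = re.findall(r"\A\w", text, flags=re.MULTILINE)

print(string_start, string_end, before_final_break)
print(line_starts, line_ends, abs_start)
```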
Word Boundaries
\b
Matches at a position that is followed by a word character but not preceded by a word character, or that is preceded by a word character but not followed by a word character.
\b. matches a, the space, and d in abc def
\B
Matches at a position that is preceded and followed by a word character, or that is not preceded and not followed by a word character.
\B. matches b, c, e, and f in abc def
[[:<:]]
Matches at a position that is followed by a word character but not preceded by a word character.
[[:<:]]. matches a and d in abc def
[[:>:]]
Matches at a position that is preceded by a word character but not followed by a word character.
.[[:>:]] matches c and f in abc def
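The \b and \B examples above can be reproduced with a short sketch, assuming Python's re module. It does not support the [[:<:]] and [[:>:]] syntax, so only \b and \B are shown. The whole-word search at the end is an added illustration, not an example from the table.

```python
import re

text = "abc def"

# \b. matches the character right after each word boundary.
after_boundary = re.findall(r"\b.", text)

# \B. matches the character at every position that is not a boundary.
after_non_boundary = re.findall(r"\B.", text)

# A common use: match a word only when it stands on its own.
whole_words = re.findall(r"\bcat\b", "cat concat cats cat.")

print(after_boundary, after_non_boundary, whole_words)
```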
Quantifiers
? (question mark)
Makes the preceding item optional. Greedy, so the optional item is included in the match if possible.
abc? matches abc or ab
??
Makes the preceding item optional. Lazy, so the optional item is excluded from the match if possible.
abc?? matches ab or abc
?+
Makes the preceding item optional. Possessive, so if the optional item can be matched, then the quantifier won’t give up its match even if the remainder of the regex fails.
abc?+c matches abcc but not abc
* (star)
Repeats the previous item zero or more times. Greedy, so as many items as possible will be matched before trying permutations with fewer matches of the preceding item, up to the point where the preceding item is not matched at all.
".*" matches "def" "ghi" in abc "def" "ghi" jkl
*?
Repeats the previous item zero or more times. Lazy, so the engine first attempts to skip the previous item, before trying permutations with ever increasing matches of the preceding item.
".*?" matches "def" and "ghi" in abc "def" "ghi" jkl
*+
Repeats the previous item zero or more times. Possessive, so as many items as possible will be matched, without trying any permutations with fewer matches even if the remainder of the regex fails.
".*+" can never match anything
+ (plus)
Repeats the previous item once or more. Greedy, so as many items as possible will be matched before trying permutations with fewer matches of the preceding item, up to the point where the preceding item is matched only once.
".+" matches "def" "ghi" in abc "def" "ghi" jkl
+?
Repeats the previous item once or more. Lazy, so the engine first matches the previous item only once, before trying permutations with ever increasing matches of the preceding item.
".+?" matches "def" and "ghi" in abc "def" "ghi" jkl
++
Repeats the previous item once or more. Possessive, so as many items as possible will be matched, without trying any permutations with fewer matches even if the remainder of the regex fails.
".++" can never match anything
{n} where n is an integer >= 1
Repeats the previous item exactly n times.
a{3} matches aaa
{n,m} where n >= 0 and m >= n
Repeats the previous item between n and m times. Greedy, so repeating m times is tried before reducing the repetition to n times.
a{2,4} matches aaaa, aaa or aa
{n,} where n >= 0
Repeats the previous item at least n times. Greedy, so as many items as possible will be matched before trying permutations with fewer matches of the preceding item, up to the point where the preceding item is matched only n times.
a{2,} matches aaaaa in aaaaa
{n,m}? where n >= 0 and m >= n
Repeats the previous item between n and m times. Lazy, so repeating n times is tried before increasing the repetition to m times.
a{2,4}? matches aa, aaa or aaaa
{n,}? where n >= 0
Repeats the previous item n or more times. Lazy, so the engine first matches the previous item n times, before trying permutations with ever increasing matches of the preceding item.
a{2,}? matches aa in aaaaa
{n,m}+ where n >= 0 and m >= n
Repeats the previous item between n and m times. Possessive, so as many items as possible up to m will be matched, without trying any permutations with fewer matches even if the remainder of the regex fails.
a{2,4}+a matches aaaaa but not aaaa
{n,}+ where n >= 0
Repeats the previous item n or more times. Possessive, so as many items as possible will be matched, without trying any permutations with fewer matches even if the remainder of the regex fails.
a{2,}+a never matches anything
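The greedy and lazy behavior described above can be compared side by side in a small sketch. It assumes Python's re module. Possessive quantifiers are left out because only newer Python versions accept them.

```python
import re

text = 'abc "def" "ghi" jkl'

# Greedy star: runs to the last quote, then backtracks as little as needed.
greedy = re.findall(r'".*"', text)

# Lazy star: stops at the first closing quote it can reach.
lazy = re.findall(r'".*?"', text)

# Greedy vs lazy optional item.
opt_greedy = re.match(r"abc?", "abc").group()
opt_lazy = re.match(r"abc??", "abc").group()

# Bounded repetition: greedy tries the maximum first, lazy the minimum.
bounded_greedy = re.match(r"a{2,4}", "aaaaa").group()
bounded_lazy = re.match(r"a{2,4}?", "aaaaa").group()

print(greedy, lazy)
print(opt_greedy, opt_lazy, bounded_greedy, bounded_lazy)
```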
Unicode
\X
Matches a single Unicode grapheme, whether encoded as a single code point or multiple code points using combining marks. A grapheme most closely resembles the everyday concept of a “character”.
\X matches à encoded as U+0061 U+0300, à encoded as U+00E0, ©, etc.
\x{FFFF} where FFFF are 1 to 4 hexadecimal digits
Matches a specific Unicode code point.
\x{E0} matches à encoded as U+00E0 only. \x{A9} matches ©
\pL where L is a Unicode category
Matches a single Unicode code point in the specified Unicode category.
\pL matches à encoded as U+00E0; \pS matches ©
\PL where L is a Unicode category
Matches a single Unicode code point that is not in the specified Unicode category.
\PS matches à encoded as U+00E0; \PL matches ©
\p{L} where L is a Unicode category
Matches a single Unicode code point in the specified Unicode category.
\p{L} matches à encoded as U+00E0; \p{S} matches ©
\p{Script}
Matches a single Unicode code point that is part of the specified Unicode script. Each Unicode code point is part of exactly one script. Scripts never contain unassigned code points.
\p{Greek} matches Ω
\P{Property}
Matches a single Unicode code point that does not have the specified property (category, script, or block).
\P{L} matches ©
\p{^Property}
Matches a single Unicode code point that does not have the specified property (category, script, or block).
\p{^L} matches ©
\P{^Property}
Matches a single Unicode code point that does have the specified property (category, script, or block). Double negative is taken as positive.
\P{^L} matches q
Capturing Groups and Backreferences
(regex)
Parentheses group the regex between them. They capture the text matched by the regex inside them into a numbered group that can be reused with a numbered backreference. They allow you to apply regex operators to the entire grouped regex.
(abc){3} matches abcabcabc. First group matches abc.
(?:regex)
Non-capturing parentheses group the regex so you can apply regex operators, but do not capture anything.
(?:abc){3} matches abcabcabc. No groups.
\1 through \99
Substituted with the text matched by the 1st through 99th numbered capturing group.
(abc|def)=\1 matches abc=abc or def=def, but not abc=def or def=abc.
\g1 through \g99
Substituted with the text matched by the 1st through 99th numbered capturing group.
(abc|def)=\g1 matches abc=abc or def=def, but not abc=def or def=abc.
\g{1} through \g{99}
Substituted with the text matched by the 1st through 99th numbered capturing group.
(abc|def)=\g{1} matches abc=abc or def=def, but not abc=def or def=abc.
\g-1, \g-2, etc.
Substituted with the text matched by the capturing group that can be found by counting as many opening parentheses of named or numbered capturing groups as specified by the number from right to left starting at the backreference.
(a)(b)(c)(d)\g-3 matches abcdb.
\g{-1}, \g{-2}, etc.
Substituted with the text matched by the capturing group that can be found by counting as many opening parentheses of named or numbered capturing groups as specified by the number from right to left starting at the backreference.
(a)(b)(c)(d)\g{-3} matches abcdb.
Any numbered backreference
Backreferences to groups that did not participate in the match attempt fail to match.
(a)?\1 matches aa but fails to match b.
Any numbered backreference
Backreferences can be used inside the group they reference.
(a\1?){3} matches aaaaaa.
Any numbered backreference
Backreferences can be used before the group they reference.
(\2?(a)){3} matches aaaaaa.
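Capturing, non-capturing groups and the \1 backreference from the rows above can be sketched as follows, assuming Python's re module. It supports \1 through \99 but not the \g1, \g{1} or relative \g-1 forms, and it rejects references that appear inside or before their own group, so those rows are not shown.

```python
import re

# A quantified capturing group keeps the text of its last iteration.
captured = re.fullmatch(r"(abc){3}", "abcabcabc").group(1)

# A non-capturing group groups without creating a numbered group.
no_groups = re.fullmatch(r"(?:abc){3}", "abcabcabc").groups()

# \1 must repeat exactly what group 1 matched.
same = re.fullmatch(r"(abc|def)=\1", "abc=abc") is not None
different = re.fullmatch(r"(abc|def)=\1", "abc=def") is not None

# A backreference to a group that did not participate fails to match.
participating = re.search(r"(a)?\1", "aa").group()
non_participating = re.search(r"(a)?\1", "b")

print(captured, no_groups, same, different, participating, non_participating)
```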
Named Groups and Backreferences
(?<name>regex)
Captures the text matched by “regex” into the group “name”. The name can contain letters and numbers but must start with a letter.
(?<x>abc){3} matches abcabcabc. The group x matches abc.
(?'name'regex)
Captures the text matched by “regex” into the group “name”. The name can contain letters and numbers but must start with a letter.
(?'x'abc){3} matches abcabcabc. The group x matches abc.
(?P<name>regex)
Captures the text matched by “regex” into the group “name”. The name can contain letters and numbers but must start with a letter.
(?P<x>abc){3} matches abcabcabc. The group x matches abc.
Any named group
Two named groups can share the same name.
(?<x>a)|(?<x>b) matches a or b.
Any named group
If a regex has multiple groups with the same name, backreferences using that name point to the leftmost group in the regex with that name.
 
Any named group
If a regex has multiple groups with the same name, backreferences using that name point to the leftmost group with that name that has actually participated in the match attempt when the backreference is evaluated.
 
\k<name>
Substituted with the text matched by the named group “name”.
(?<x>abc|def)=\k<x> matches abc=abc or def=def, but not abc=def or def=abc.
\k'name'
Substituted with the text matched by the named group “name”.
(?'x'abc|def)=\k'x' matches abc=abc or def=def, but not abc=def or def=abc.
\k{name}
Substituted with the text matched by the named group “name”.
(?'x'abc|def)=\k{x} matches abc=abc or def=def, but not abc=def or def=abc.
\g{name}
Substituted with the text matched by the named group “name”.
(?'x'abc|def)=\g{x} matches abc=abc or def=def, but not abc=def or def=abc.
(?P=name)
Substituted with the text matched by the named group “name”.
(?P<x>abc|def)=(?P=x) matches abc=abc or def=def, but not abc=def or def=abc.
Any named backreference
Backreferences to groups that did not participate in the match attempt fail to match.
(?<x>a)?\k<x> matches aa but fails to match b.
Any named backreference
Backreferences can be used inside the group they reference.
(?<x>a\k<x>?){3} matches aaaaaa.
Any named backreference
Backreferences can be used before the group they reference.
(\k<x>?(?<x>a)){3} matches aaaaaa.
Any named capturing group
A number is a valid name for a capturing group.
(?<17>abc){3} matches abcabcabc. The group named “17” matches abc.
Any named backreference
A number is a valid name for a backreference which then points to a group with that number as its name.
(?<17>abc|def)=\k<17> matches abc=abc or def=def, but not abc=def or def=abc.
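Named groups and named backreferences can be illustrated with the (?P<name>regex) and (?P=name) forms listed above. The sketch assumes Python's re module, which accepts only these P-style forms and not (?<name>...), (?'name'...), \k<name> or duplicate group names.

```python
import re

pattern = re.compile(r"(?P<x>abc|def)=(?P=x)")

# The named backreference must repeat the text captured by group "x".
match = pattern.fullmatch("def=def")
captured = match.group("x")
as_dict = match.groupdict()

# Different text on both sides of "=" is rejected.
mismatch = pattern.fullmatch("abc=def")

print(captured, as_dict, mismatch)
```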
Special Groups
(?#comment)
Everything between (?# and ) is ignored by the regex engine.
a(?#foobar)b matches ab
(?|regex)
If the regex inside the branch reset group has multiple alternatives with capturing groups, then the capturing group numbers are the same in all the alternatives.
(x)(?|(a)|(bc)|(def))\2 matches xaa, xbcbc, or xdefdef with the first group capturing x and the second group capturing a, bc, or def
(?>regex)
Atomic groups prevent the regex engine from backtracking back into the group after a match has been found for the group. If the remainder of the regex fails, the engine may backtrack over the group if a quantifier or alternation makes it optional. But it will not backtrack into the group to try other permutations of the group.
a(?>bc|b)c matches abcc but not abc
(?=regex)
Matches at a position where the pattern inside the lookahead can be matched. Matches only the position. It does not consume any characters or expand the match. In a pattern like one(?=two)three, both two and three have to match at the position where the match of one ends.
t(?=s) matches the second t in streets.
(?!regex)
Similar to positive lookahead, except that negative lookahead only succeeds if the regex inside the lookahead fails to match.
t(?!s) matches the first t in streets.
(?<=regex)
Matches at a position if the pattern inside the lookbehind can be matched ending at that position.
(?<=s)t matches the first t in streets.
(?<!regex)
Matches at a position if the pattern inside the lookbehind cannot be matched ending at that position.
(?<!s)t matches the second t in streets.
(?<=regex|longer regex)
Alternatives inside lookbehind can differ in length.
(?<=is|e)t matches the second and fourth t in twisty streets.
\K
The text matched by the part of the regex to the left of the \K is omitted from the overall regex match. Other than that the regex is matched normally from left to right. Capturing groups to the left of the \K capture as usual.
s\Kt matches only the first t in streets.
(?(?=regex)then|else) where (?=regex) is any valid lookaround and then and else are any valid regexes
If the lookaround succeeds, the “then” part must match for the overall regex to match. If the lookaround fails, the “else” part must match for the overall regex to match. The lookaround is zero-length. The “then” and “else” parts consume their matches like normal regexes.
(?(?<=a)b|c) matches the second b and the first c in babxcac
(?(name)then|else) where name is the name of a capturing group and then and else are any valid regexes
If the capturing group with the given name took part in the match attempt thus far, the “then” part must match for the overall regex to match. If the capturing group did not take part in the match thus far, the “else” part must match for the overall regex to match.
(?<one>a)?(?(one)b|c) matches ab, the first c, and the second c in babxcac
(?(<name>)then|else) where name is the name of a capturing group and then and else are any valid regexes
If the capturing group with the given name took part in the match attempt thus far, the “then” part must match for the overall regex to match. If the capturing group did not take part in the match thus far, the “else” part must match for the overall regex to match.
(?<one>a)?(?(<one>)b|c) matches ab, the first c, and the second c in babxcac
(?('name')then|else) where name is the name of a capturing group and then and else are any valid regexes
If the capturing group with the given name took part in the match attempt thus far, the “then” part must match for the overall regex to match. If the capturing group did not take part in the match thus far, the “else” part must match for the overall regex to match.
(?'one'a)?(?('one')b|c) matches ab, the first c, and the second c in babxcac
(?(1)then|else) where 1 is the number of a capturing group and then and else are any valid regexes
If the referenced capturing group took part in the match attempt thus far, the “then” part must match for the overall regex to match. If the capturing group did not take part in the match thus far, the “else” part must match for the overall regex to match.
(a)?(?(1)b|c) matches ab, the first c, and the second c in babxcac
(?(-1)then|else) where -1 is a negative integer and then and else are any valid regexes
Conditional that tests the capturing group that can be found by counting as many opening parentheses of named or numbered capturing groups as specified by the number from right to left starting immediately before the conditional. If the referenced capturing group took part in the match attempt thus far, the “then” part must match for the overall regex to match. If the capturing group did not take part in the match thus far, the “else” part must match for the overall regex to match.
(a)?(?(-1)b|c) matches ab, the first c, and the second c in babxcac
(?(+1)then|else) where +1 is a positive integer and then and else are any valid regexes
Conditional that tests the capturing group that can be found by counting as many opening parentheses of named or numbered capturing groups as specified by the number from left to right starting at the “then” part of the conditional. If the referenced capturing group took part in the match attempt thus far, the “then” part must match for the overall regex to match. If the capturing group did not take part in the match thus far, the “else” part must match for the overall regex to match.
((?(+1)b|c)(d)?){2} matches cc and cdb in bdbdccxcdcxcdb
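The lookaround examples on the word streets and the numbered conditional above can be verified with a sketch that records match positions. It assumes Python's re module, which supports lookahead, fixed-length lookbehind and (?(1)then|else), but not \K, branch reset groups or lookbehind alternatives of different lengths.

```python
import re

word = "streets"  # the two t's sit at indexes 1 and 5

def positions(pattern, text):
    """Return the start index of every match of pattern in text."""
    return [m.start() for m in re.finditer(pattern, text)]

lookahead = positions(r"t(?=s)", word)        # t followed by s
neg_lookahead = positions(r"t(?!s)", word)    # t not followed by s
lookbehind = positions(r"(?<=s)t", word)      # t preceded by s
neg_lookbehind = positions(r"(?<!s)t", word)  # t not preceded by s

# Conditional: if group 1 took part, "b" must follow, otherwise "c".
conditional = [m.group() for m in re.finditer(r"(a)?(?(1)b|c)", "babxcac")]

print(lookahead, neg_lookahead, lookbehind, neg_lookbehind, conditional)
```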
Mode Modifiers
(?letters) at the start of the regex
A mode modifier at the start of the regex affects the whole regex and overrides any options set outside the regex.
(?i)a matches a and A.
(?letters) in the middle of the regex
A mode modifier in the middle of the regex affects only the part of the regex to the right of the modifier. If the modifier is used inside a group, it only affects the part of the regex inside that group to the right of the modifier. If the regex or group uses alternation, all alternatives to the right of the modifier are affected.
te(?i)st matches test and teST but not TEst or TEST.
(?letters:regex)
Non-capturing group with modifiers that affect only the part of the regex inside the group.
te(?i:st) matches test and teST but not TEst or TEST.
(?on-off) and (?on-off:regex)
Modifier letters (if any) before the hyphen are turned on, while modifier letters after the hyphen are turned off.
(?i)te(?-i)st matches test and TEst but not teST or TEST.
(?i)
Turn on case insensitivity.
(?i)a matches a and A.
(?x)
Turn on free-spacing mode to ignore whitespace between regex tokens and allow # comments.
(?x)a#b matches a
(?s)
Make the dot match all characters including line break characters.
(?s).* matches ab\n\ndef in ab\n\ndef
(?m)
Make ^ and $ match at the start and end of each line.
(?m)^. matches a and d in ab\n\ndef
(?J)
Allow multiple named capturing groups to share the same name.
(?J)(?:(?'x'a)|(?'x'b))\k'x' matches aa or bb
(?U)
Switches the syntax for greedy and lazy quantifiers. Its use is strongly discouraged because it confuses the meaning of the standard quantifier syntax.
(?U)a* is lazy and (?U)a*? is greedy
(?X)
Treat letters that are escaped with a backslash and that don’t form a regex token as an error instead of as a literal.
(?X)\q is an error while (?-X)\q matches q
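Several of the mode modifiers above can be exercised with a short sketch, assuming Python's re module. It accepts (?i), (?s), (?m) and (?x) at the start of the regex and the scoped (?i:regex) and (?-i:regex) groups, but not modifiers in the middle of the regex, nor (?J), (?U) or (?X). The helper name accepted is hypothetical.

```python
import re

candidates = ["test", "teST", "TEst", "TEST"]

def accepted(pattern):
    """Return the candidates that the pattern matches in full."""
    return [w for w in candidates if re.fullmatch(pattern, w)]

# (?i) at the start makes the whole regex case insensitive.
case_insensitive = re.findall(r"(?i)a", "aA")

# A modifier group affects only the regex inside the group.
scoped_on = accepted(r"te(?i:st)")
scoped_off = accepted(r"(?i)te(?-i:st)")

# (?s) lets the dot match line breaks, (?m) makes ^ match at each line.
dot_all = re.match(r"(?s).*", "ab\n\ndef").group()
line_starts = re.findall(r"(?m)^.", "ab\n\ndef")

# (?x) ignores whitespace and treats # as the start of a comment.
free_spacing = re.findall(r"(?x)a#b", "ab")

print(case_insensitive, scoped_on, scoped_off)
print(repr(dot_all), line_starts, free_spacing)
```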
Recursion and Balancing Groups
(?R)
Recursion of the entire regular expression.
a(?R)?z matches az, aazz, aaazzz, etc.
(?0)
Recursion of the entire regular expression.
a(?0)?z matches az, aazz, aaazzz, etc.
\g<0>
Recursion of the entire regular expression.
a\g<0>?z matches az, aazz, aaazzz, etc.
\g'0'
Recursion of the entire regular expression.
a\g'0'?z matches az, aazz, aaazzz, etc.
(?1) where 1 is the number of a capturing group
Recursion of a capturing group or subroutine call to a capturing group.
a(b(?1)?y)z matches abyz, abbyyz, abbbyyyz, etc.
\g<1> where 1 is the number of a capturing group
Recursion of a capturing group or subroutine call to a capturing group.
a(b\g<1>?y)z matches abyz, abbyyz, abbbyyyz, etc.
\g'1' where 1 is the number of a capturing group
Recursion of a capturing group or subroutine call to a capturing group.
a(b\g'1'?y)z matches abyz, abbyyz, abbbyyyz, etc.
(?-1) where -1 is a negative integer
Recursion of or subroutine call to a capturing group that can be found by counting as many opening parentheses of named or numbered capturing groups as specified by the number from right to left starting at the subroutine call.
a(b(?-1)?y)z matches abyz, abbyyz, abbbyyyz, etc.
\g<-1> where -1 is a negative integer
Recursion of or subroutine call to a capturing group that can be found by counting as many opening parentheses of named or numbered capturing groups as specified by the number from right to left starting at the subroutine call.
a(b\g<-1>?y)z matches abyz, abbyyz, abbbyyyz, etc.
\g'-1' where -1 is a negative integer
Recursion of or subroutine call to a capturing group that can be found by counting as many opening parentheses of named or numbered capturing groups as specified by the number from right to left starting at the subroutine call.
a(b\g'-1'?y)z matches abyz, abbyyz, abbbyyyz, etc.
(?+1) where +1 is a positive integer
Recursion of or subroutine call to a capturing group that can be found by counting as many opening parentheses of named or numbered capturing groups as specified by the number from left to right starting at the subroutine call.
(?+1)x([ab]) matches axa, axb, bxa, and bxb
\g<+1> where +1 is a positive integer
Recursion of or subroutine call to a capturing group that can be found by counting as many opening parentheses of named or numbered capturing groups as specified by the number from left to right starting at the subroutine call.
\g<+1>x([ab]) matches axa, axb, bxa, and bxb
\g'+1' where +1 is a positive integer
Recursion of or subroutine call to a capturing group that can be found by counting as many opening parentheses of named or numbered capturing groups as specified by the number from left to right starting at the subroutine call.
\g'+1'x([ab]) matches axa, axb, bxa, and bxb
(?&name) where “name” is the name of a capturing group
Recursion of a capturing group or subroutine call to a capturing group.
a(?<x>b(?&x)?y)z matches abyz, abbyyz, abbbyyyz, etc.
(?P>name) where “name” is the name of a capturing group
Recursion of a capturing group or subroutine call to a capturing group.
a(?P<x>b(?P>x)?y)z matches abyz, abbyyz, abbbyyyz, etc.
\g<name> where “name” is the name of a capturing group
Recursion of a capturing group or subroutine call to a capturing group.
a(?<x>b\g<x>?y)z matches abyz, abbyyz, abbbyyyz, etc.
\g'name' where “name” is the name of a capturing group
Recursion of a capturing group or subroutine call to a capturing group.
a(?'x'b\g'x'?y)z matches abyz, abbyyz, abbbyyyz, etc.
(?(DEFINE)regex) where “regex” is any regex
The DEFINE group does not take part in the matching process. Subroutine calls can be made to capturing groups inside the DEFINE group.
(?(DEFINE)([ab]))x(?1)y(?1)z matches xayaz, xaybz, xbyaz, and xbybz
Recursion or subroutine call using Ruby-style \g syntax
When the regex engine exits from recursion or a subroutine call, it reverts all capturing groups to the text they had matched prior to entering the recursion or subroutine call.
When (a)(([bc])\1)\g'2' matches abaca the third group stores b after the match
Recursion or subroutine call using syntax other than \g
When the regex engine exits from recursion or a subroutine call, it reverts all capturing groups to the text they had matched prior to entering the recursion or subroutine call.
When (a)(([bc])\1)(?2) matches abaca the third group stores b after the match
Recursion or subroutine call using (?P>…)
Recursion and subroutine calls are atomic. Once the regex engine exits from them, it will not backtrack into them to try different permutations of the recursion or subroutine call.
(a+)(?P>1)(?P>1) can never match anything because the first (?P>1) matches all remaining a’s and the regex engine won’t backtrack into the first (?P>1) when the second one fails
Recursion of the whole regex using syntax other than (?P>0)
Recursion of the whole regex is atomic. Once the regex engine exits from recursion, it will not backtrack into it to try different permutations of the recursion.
aa$|a(?R)a|a matches a in aaa when recursion is atomic; otherwise it would match the whole string.
Subroutine call using syntax other than (?P>…)
Subroutine calls are atomic. Once the regex engine exits from them, it will not backtrack into them to try different permutations of the subroutine call.
(a+)(?1)(?1) can never match anything because the first (?1) matches all remaining a’s and the regex engine won’t backtrack into the first (?1) when the second one fails