README
A parser for a simplified, regular subset of JavaScript regular expressions that doesn’t support capturing.
Because it’s regular, the subset doesn’t support:
- backreferences
Because it doesn’t support capturing, it doesn’t support:
- capturing groups (`(…)` unless `(?:…)`)
- greediness modifiers
Because it’s simplified, it doesn’t support:
- assertions (anchors, word boundaries, `(?=…)`, and `(?!…)`)
- escapes with easy alternatives that are obscure (`\cX`), uncommon (`\f`, `\v`), or syntactically awkward (`\0`)
- escapes that aren’t necessary in any context
- `\s` and `\S` (what they match is not obvious)
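Each of the dropped control-character escapes can be written with a `\xHH` escape, which the subset keeps. The sketch below checks that claim with the built-in `RegExp`, not with this parser (whose API is not shown here). The `alternatives` table and the `equivalent` variable are names made up for this example.

```javascript
// Hypothetical table: unsupported escape -> equivalent \xHH spelling
// that stays within the supported subset.
const alternatives = [
    ['\\cX', '\\x18'], // control-X is U+0018
    ['\\f', '\\x0c'],  // form feed
    ['\\v', '\\x0b'],  // vertical tab
    ['\\0', '\\x00'],  // NUL
];

// Use the built-in RegExp to confirm that both spellings accept exactly
// the same single characters in the range U+0000..U+00FF.
const equivalent = alternatives.every(([unsupported, supported]) => {
    const a = new RegExp(unsupported);
    const b = new RegExp(supported);
    for (let code = 0; code < 0x100; code++) {
        const ch = String.fromCharCode(code);
        if (a.test(ch) !== b.test(ch)) return false;
    }
    return true;
});

console.log(equivalent); // true
```

For `\s`, an explicit class such as `[ \t\n\r]` states exactly which characters are meant, although it is not equivalent to `\s`, which also matches other Unicode whitespace.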
Syntax
When syntactically valid, a pattern has the same meaning as it does in JavaScript (i.e. when passed to the `RegExp` constructor) with no flags.
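As a rough illustration of what "no flags" implies, the following sketch uses the built-in `RegExp` constructor directly rather than this parser. The helper `matchesWhole` is a name invented for the example, and the `^` and `$` anchors it adds belong to the test harness only; they are not part of the supported subset.

```javascript
// Hypothetical helper: true when `source`, compiled with no flags,
// matches the entire input string.
function matchesWhole(source, input) {
    return new RegExp('^(?:' + source + ')$').test(input);
}

// No `i` flag: matching is case-sensitive.
console.log(matchesWhole('a', 'A'));            // false

// No `s` flag: "." excludes CR, LF, U+2028, and U+2029.
console.log(matchesWhole('.', 'x'));            // true
console.log(matchesWhole('.', '\n'));           // false
console.log(matchesWhole('.', '\u2028'));       // false

// Non-capturing groups and counted quantifiers behave as in JavaScript.
console.log(matchesWhole('(?:ab){2,3}', 'abab'));     // true
console.log(matchesWhole('(?:ab){2,3}', 'abababab')); // false
```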
pattern = disjunction
disjunction = alternative [ "|" disjunction ]
alternative = *term
term = atom [ quantifier ]
quantifier =
    "*" / ; zero or more
    "+" / ; one or more
    "?" / ; zero or one
    "{" *DIGIT "}" / ; exactly count. counts are at most Number.MAX_SAFE_INTEGER.
    "{" *DIGIT ",}" / ; at least count
    "{" *DIGIT "," *DIGIT "}" ; at least first count and at most second. must be a non-empty range.
atom =
    pattern-character / ; the character itself
    "." / ; any character except CR, LF, U+2028, and U+2029
    "\" atom-escape /
    character-class /
    "(?:" pattern ")"
character-class =
    "["
    [ "^" ] ; indicates a negated character class
    *range
    "]"
range =
    range-character "-" range-character / ; must be a non-empty range
    range-character /
    "\" predefined-range
range-character =
    range-plain-character /
    "\" range-escape
character-escape =
    "n" / ; LF
    "r" / ; CR
    "t" / ; tab
    "x" 2hex-digit /
    "u" 4hex-digit
predefined-range =
    "d" / "D" / ; [0-9], [^0-9]
    "w" / "W" ; [0-9A-Za-z_], [^…]
atom-escape =
    character-escape /
    predefined-range /
    pattern-metacharacter /
    "/"
range-escape =
    character-escape /
    range-metacharacter /
    "/" /
    "["
range-metacharacter =
    "^" / "\" / "-" / "]"
pattern-metacharacter =
    "^" / "