Special Characters:
Pattern | Meaning |
---|---|
Dot `.` | Matches any single character, including letters, digits, and spaces. By default, the only character it does not match is the newline character `\n`. |
Question mark `?` | Makes the preceding character optional: it may be present or absent. |
Backslash `\` | The escape character. Placed before a special character, it makes that character match literally; for example, `\.` matches a literal dot. |
Set of chars `[abc]` | A set matches exactly one character of the string; which characters are allowed is defined by the content of the set, written in square brackets `[]`. For example, `[abc]` matches a single "a", "b", or "c". |
Range of chars `[0-9]` | A range inside a set is written with the dash symbol `-`. The character before the dash is the first character of the range, and the character after it is the last one. `[0-9]` is the set of all digits from 0 to 9; any single digit matches it. |
Excluding characters in sets `[^abc]` | A caret `^` right after the opening bracket defines the set of characters you do not want. `[^abc]` matches any single character except "a", "b", and "c". |
Excluding characters in ranges `[^a-c]` | The same negation applied to a range. `[^a-c]` matches any single character except "a", "b", and "c". |
Alternation `\|` | The vertical bar separates alternatives: the pattern matches either the expression before the bar or the expression after it. |
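
The special characters above can be tried out in a few lines. This is a minimal sketch that assumes Python's standard `re` module, since the text does not name a regex engine; the sample strings and variable names are made up for illustration.

```python
import re

# Dot: any single character except a newline (by default).
dot_hits = re.findall(r"c.t", "cat c t c\nt cut")

# Question mark: the preceding character is optional.
optional_hits = re.findall(r"colou?r", "color colour colouur")

# Backslash: escape the dot so that it matches only a literal ".".
literal_dot_hits = re.findall(r"3\.14", "3.14 3x14")

# Set and range: each matches exactly one character from the brackets.
set_hits = re.findall(r"[abc]", "a1b2z")
range_hits = re.findall(r"[0-9]", "a1b2z")

# Negated set and negated range: any single character NOT listed.
negated_set_hits = re.findall(r"[^abc]", "abcd1")
negated_range_hits = re.findall(r"[^a-c]", "abcd1")

# Alternation: either the left or the right expression.
alternation_hits = re.findall(r"cat|dog", "cat dog bird")

print(dot_hits)            # ['cat', 'c t', 'cut']
print(optional_hits)       # ['color', 'colour']
print(literal_dot_hits)    # ['3.14']
print(set_hits)            # ['a', 'b']
print(range_hits)          # ['1', '2']
print(negated_set_hits)    # ['d', '1']
print(negated_range_hits)  # ['d', '1']
print(alternation_hits)    # ['cat', 'dog']
```

Note that `c.t` skips the `c\nt` sequence, because the dot does not match the newline unless the engine is told otherwise (in Python, with the `re.DOTALL` flag).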
Shorthands - predefined abbreviations for commonly used character sets:
Shorthand | Meaning |
---|---|
`\d` = `[0-9]` | A digit. |
`\s` = `[ \t\n\x0B\f\r]` | A whitespace character (including space, tab, and newline). |
`\w` = `[a-zA-Z_0-9]` | A word character: an alphanumeric character or an underscore. |
`\b` | A word boundary. This one is a bit trickier: it does not match any specific character. Instead, it matches the position between a word character and a non-word character (for example, a space), or the start or end of the string when a word character is next to it. This way, `\ba` matches an "a" that starts a word, `a\b` matches an "a" that ends a word, and `\ba\b` matches a standalone "a" with no word character on either side. |
`\D` = `[^0-9]` | A non-digit. |
`\S` = `[^ \t\n\x0B\f\r]` | A non-whitespace character. |
`\W` = `[^a-zA-Z_0-9]` | A non-word character. |
`\B` | A non-word boundary, the opposite of `\b`: it matches every position that is not a word boundary, such as the position between two word characters. For example, `a\B` matches an "a" that is immediately followed by another word character, that is, an "a" that is not the last character of a word. |
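
The shorthands, and especially the two boundary matchers, are easier to follow on a concrete string. The sketch below assumes Python's standard `re` module (the text does not name an engine) and passes `re.ASCII` so that `\d`, `\s`, and `\w` cover the ASCII sets listed above. `match_starts` is a hypothetical helper written for this illustration only.

```python
import re

# Assumption of this sketch: re.ASCII restricts the shorthands to the
# ASCII sets given in the table (Python's default for str is Unicode-aware).
FLAGS = re.ASCII

digits = re.findall(r"\d", "room 42", FLAGS)
non_digits = re.findall(r"\D", "a1 b", FLAGS)
whitespace = re.findall(r"\s", "a b\tc\n", FLAGS)
non_whitespace = re.findall(r"\S", "a b\tc\n", FLAGS)
word_chars = re.findall(r"\w", "x_1-y!", FLAGS)
non_word_chars = re.findall(r"\W", "x_1-y!", FLAGS)


def match_starts(pattern, text):
    """Hypothetical helper: start index of every match of pattern in text."""
    return [m.start() for m in re.finditer(pattern, text, FLAGS)]


# Indices of "a": 0, 3, 5, 7 (banana), 9 (and), 13, 18 (sofa)
sentence = "a banana and a sofa"

starts_word = match_starts(r"\ba", sentence)      # "a" that begins a word
ends_word = match_starts(r"a\b", sentence)        # "a" that ends a word
standalone = match_starts(r"\ba\b", sentence)     # "a" that is a whole word
not_word_end = match_starts(r"a\B", sentence)     # "a" followed by a word char

print(starts_word)   # [0, 9, 13]
print(ends_word)     # [0, 7, 13, 18]
print(standalone)    # [0, 13]
print(not_word_end)  # [3, 5, 9]
```

The last two results together cover every "a" in the sentence: each one is either followed by a boundary (`a\b`) or by a non-boundary (`a\B`), never both.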
Quantifiers - define how many times the preceding character may occur in a regex pattern:
Quantifier | Meaning |
---|---|
`+` | Matches one or more repetitions of the preceding character. |
`*` | Matches zero or more repetitions of the preceding character. |
`{n}` | Matches exactly n repetitions of the preceding character. |
`{n,m}` | Matches at least n but not more than m repetitions of the preceding character. Do not put a space after the comma; otherwise it will not work as a quantifier. |
`{n,}` | Matches at least n repetitions of the preceding character. |
`{0,m}` | Matches no more than m repetitions of the preceding character. |
`?` or `{0,1}` | Makes the preceding character optional. |
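
To see how the quantifiers differ, each one can be applied to the same list of candidate strings. This is a minimal sketch assuming Python's standard `re` module, which the text itself does not prescribe; `full_matches` and the candidate list are hypothetical names introduced for the example.

```python
import re


def full_matches(pattern, candidates):
    """Hypothetical helper: keep the candidates that match pattern entirely."""
    return [c for c in candidates if re.fullmatch(pattern, c)]


# Zero to four "a" characters followed by a single "b".
candidates = ["b", "ab", "aab", "aaab", "aaaab"]

one_or_more = full_matches(r"a+b", candidates)
zero_or_more = full_matches(r"a*b", candidates)
exactly_two = full_matches(r"a{2}b", candidates)
two_to_three = full_matches(r"a{2,3}b", candidates)
at_least_two = full_matches(r"a{2,}b", candidates)
at_most_two = full_matches(r"a{0,2}b", candidates)
optional_mark = full_matches(r"a?b", candidates)
optional_braces = full_matches(r"a{0,1}b", candidates)

# A space after the comma breaks the quantifier. In Python's re module the
# braces are then read as literal text; other engines may reject the pattern.
with_space = full_matches(r"a{2, 3}b", candidates)
literal_braces = re.fullmatch(r"a{2, 3}b", "a{2, 3}b") is not None

print(one_or_more)      # ['ab', 'aab', 'aaab', 'aaaab']
print(zero_or_more)     # ['b', 'ab', 'aab', 'aaab', 'aaaab']
print(exactly_two)      # ['aab']
print(two_to_three)     # ['aab', 'aaab']
print(at_least_two)     # ['aab', 'aaab', 'aaaab']
print(at_most_two)      # ['b', 'ab', 'aab']
print(optional_mark)    # ['b', 'ab']
print(optional_braces)  # ['b', 'ab']
print(with_space)       # []
print(literal_braces)   # True
```

`re.fullmatch` is used instead of `re.search` so that each quantifier has to account for the whole string; with a plain search, `a{2}b` would also be found inside "aaab".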