
JavaScript regular expressions and character replacement

In JavaScript, a regular expression (RegExp) is an object type used to match sequences of characters in strings.

Creating the first regular expression

There are two ways to create a regular expression: using a regular expression literal or using the RegExp constructor. Each of the examples below represents the same pattern: the character "c", followed by "a" and then "t".

// A regular expression literal is enclosed in slashes (/)
var option1 = /cat/;
// Regular expression constructor
var option2 = new RegExp("cat");

As a general rule, if the regular expression is going to be constant, meaning it won't change, it's better to use a regular expression literal. If it will change or depend on other variables, it is better to use the constructor.
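The rule of thumb above can be illustrated with a minimal sketch. The variable animal is a hypothetical name introduced only for this illustration; it stands for any value that is not known until the program runs.

```javascript
// Constant pattern: a literal is the simplest choice.
var fixedPattern = /cat/;

// Pattern that depends on a variable: build it with the constructor.
// "animal" is a hypothetical variable used only for this illustration.
var animal = "dog";
var dynamicPattern = new RegExp(animal);

var literalResult = fixedPattern.test("the cat says meow");  // true
var dynamicHit = dynamicPattern.test("the dog says bark");   // true
var dynamicMiss = dynamicPattern.test("the cat says meow");  // false
console.log(literalResult, dynamicHit, dynamicMiss);
```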

RegExp.prototype.test() method

Remember when I said that regular expressions are objects? This means they have a number of methods. The simplest is test(), which returns a boolean value:

true: the string contains a match for the regular expression pattern.

false: no match was found.

console.log(/cat/.test("the cat says meow")); // true
console.log(/cat/.test("the dog says bark")); // false

Regular Expression Basics Cheat Sheet

The secret of regular expressions is to remember common characters and groups. I highly recommend spending a few hours on the chart below and then coming back and studying further.

Symbols

  • . – (dot) matches any single character with the exception of line breaks;
  • *  –  matches the previous expression, which is repeated 0 or more times;
  • +  –  matches a previous expression that is repeated 1 or more times;
  • ? – the previous expression is optional ( matches 0 or 1 time);
  • ^ – corresponds to the beginning of the line;
  • $ – matches the end of the line.

Character groups

  • \d – matches any single digit.
  • \w – matches any single word character (letter, digit or underscore).
  • [XYZ] – a set of characters. Matches any single character from the set specified in the brackets. You can also specify character ranges, for example, [A-Z].
  • [XYZ]+ – matches a character from the set, repeated one or more times.
  • [^A-Z] – within a character set, "^" is used as a negation sign. In this example, the pattern matches anything that is not an uppercase letter.
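As a quick check of the character groups listed above, here is a small sketch. The sample strings are made up for illustration.

```javascript
var samples = {
  digit: /\d/.test("room 7"),               // true: "7" is a digit
  word: /\w/.test("!?_"),                   // true: "_" counts as a word character
  set: /[XYZ]/.test("taXi"),                // true: "X" is in the set
  repeatedSet: "aXYZb".match(/[XYZ]+/)[0],  // "XYZ": the whole run is matched
  negated: /[^A-Z]/.test("ABC")             // false: every character is uppercase
};
console.log(samples);
```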

Flags:

JavaScript regular expressions support several optional flags. They can be used separately or together, and are placed after the closing slash. For example: /[A-Z]/g. Here I will show only two of them.

g– global search.

i– case-insensitive search.
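A brief sketch of how the two flags change the result of a search, using the string method match() on a made-up sample string:

```javascript
var sample = "Cat, cat, CAT";

var firstOnly = sample.match(/cat/);        // no flags: first exact-case match only
var allExact = sample.match(/cat/g);        // g: every exact-case match
var firstAnyCase = sample.match(/cat/i);    // i: first match, ignoring case
var allAnyCase = sample.match(/cat/gi);     // g and i combined

console.log(firstOnly[0], firstOnly.index); // "cat" 5
console.log(allExact);                      // ["cat"]
console.log(firstAnyCase[0]);               // "Cat"
console.log(allAnyCase);                    // ["Cat", "cat", "CAT"]
```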

Additional constructs

(x)–  capturing parentheses. This expression matches x and remembers that match so you can use it later.

(?:x)– non-capturing parentheses. The expression matches x but does not remember the match.

x(?=y) – lookahead. Matches x only if it is followed by y.
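The three constructs can be compared side by side. This is only a sketch; the sample strings are assumptions chosen for illustration.

```javascript
// Capturing parentheses: each group is remembered in the result array.
var capturing = "2024-05".match(/(\d+)-(\d+)/);
console.log(capturing[1], capturing[2]);   // "2024" "05"

// Non-capturing parentheses: the first group is matched but not remembered,
// so the second group becomes number 1.
var nonCapturing = "2024-05".match(/(?:\d+)-(\d+)/);
console.log(nonCapturing[1]);              // "05"

// Lookahead: digits are matched only when "px" follows,
// and "px" itself is not part of the match.
var lookahead = "100px 200em".match(/\d+(?=px)/);
console.log(lookahead[0]);                 // "100"
```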

Let's test the material we've studied

First, let's test all of the above. Let's say we want to check a string for any digits. To do this, you can use the "\d" construct.

console.log(/\d/.test("12-34")); // true

The above code returns true if there is at least one digit in the string. What if you need to check that a string follows a particular format? You can use multiple "\d" characters to define the format:

console.log(/\d\d-\d\d/.test("12-34")); // true
console.log(/\d\d-\d\d/.test("1234")); // false

If you don't care how many digits come before and after the "-" sign, you can use the "+" symbol to show that the "\d" pattern occurs one or more times:

console.log(/\d+-\d+/.test("12-34")); // true
console.log(/\d+-\d+/.test("1-234")); // true
console.log(/\d+-\d+/.test("-34")); // false

You can use parentheses to group expressions. Let's say we have a cat meowing and we want to match the pattern "meow":

console.log(/me+(ow)+w/.test("meeeeowowoww")); // true

Now let's figure it out.

m => matches one letter "m";

e+ => matches the letter "e" one or more times;

(ow)+ => matches the letters "ow" one or more times;

w => matches the letter "w";

"m" + "eeee" + "owowow" + "w".

When operators like "+" are used immediately after parentheses, they affect the entire contents of the parentheses.

The "?" operator indicates that the previous character is optional. As you'll see below, both test cases return true because the "s" characters are marked as optional.

console.log(/cats? says?/i.test("the Cat says meow")); // true
console.log(/cats? says?/i.test("the Cats say meow")); // true

If you want to find a slash character, you need to escape it using a backslash. The same is true for other characters that have special meaning, such as the question mark. Here's a JavaScript regexp example of how to look for them:

var slashSearch = /\//;
var questionSearch = /\?/;

  • \d is the same as [0-9]: each construction matches a single digit.
  • \w is the same as [A-Za-z0-9_]: both expressions match any single alphanumeric character or underscore.

Example: Adding Spaces to Camel Strings

In this example, we're really tired of the camel style of writing and we need a way to add spaces between words. Here's an example:

removeCc("camelCase") // => should return "camel Case"

There is a simple solution using a regular expression. First, we need to find all capital letters. This can be done using a character set and the global modifier:

/[A-Z]/g

This matches the character "C" in "camelCase".

Now, how do we add a space before "C"?

We need to use capturing parentheses! They let you find a match and remember it for later use. Use capturing parentheses to remember the capital letter you find:

/([A-Z])/

You can access the captured value later like this:

$1

Above we use $1 to access the captured value. By the way, if we had two sets of capturing parentheses, we would use $1 and $2 to refer to the captured values, and similarly for more capturing parentheses.

If you need to use parentheses but don't need to capture that value, you can use non-capturing parentheses: (?:x). In this case, a match to x is found, but it is not remembered.
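To make the $1 and $2 references concrete, here is a small sketch with two capturing groups, plus a variant with a non-capturing group. The names and strings are made up for illustration.

```javascript
// Two capturing groups: $1 is the first word, $2 is the second.
var swapped = "Doe John".replace(/(\w+) (\w+)/, "$2 $1");
console.log(swapped);   // "John Doe"

// The title is grouped with (?:...) so it is not remembered;
// the only captured value, $1, is the surname.
var surname = "Mr Smith".replace(/(?:Mr|Ms) (\w+)/, "$1");
console.log(surname);   // "Smith"
```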

Let's return to the current task. How do we make use of the capturing parentheses? With the replace method! We pass "$1" as the second argument. It is important to use quotation marks here.

function removeCc(str) {
  return str.replace(/([A-Z])/g, "$1");
}

Let's look at the code again. We capture the uppercase letter and then replace it with the same letter. Now, inside the quotes, insert a space before $1. As a result, we get a space before each capital letter.

function removeCc(str) {
  return str.replace(/([A-Z])/g, " $1");
}
removeCc("camelCase") // "camel Case"
removeCc("helloWorldItIsMe") // "hello World It Is Me"

Example: removing capital letters

Now we have a string with a bunch of unnecessary capital letters. Have you figured out how to remove them? First, we need to select all capital letters, again using a character set and the global modifier:

/[A-Z]/g

We'll use the replace method again, but how do we make the character lowercase this time?

function lowerCase(str) {
  return str.replace(/[A-Z]/g, ???);
}

Hint: In the replace() method, you can specify a function as the second parameter.

We will use an arrow function that receives the matched text as its argument. When a function is passed to replace, it is called after a match is found, and its return value is used as the replacement string. Even better, if the pattern is global and multiple matches are found, the function is called for each match.

function lowerCase(str) {
  return str.replace(/[A-Z]/g, (u) => u.toLowerCase());
}
lowerCase("camel Case") // "camel case"
lowerCase("hello World It Is Me") // "hello world it is me"

Regular Expressions

A regular expression is an object that describes a character pattern. The RegExp class in JavaScript represents regular expressions, and both String and RegExp objects define methods that use regular expressions to perform pattern matching and search-and-replace operations on text. Regular expression grammar in JavaScript contains a fairly complete subset of the regular expression syntax used in Perl 5, so if you have experience with Perl, you can easily describe patterns in JavaScript programs.

Features of Perl regular expressions that are not supported in the ECMAScript grammar described here include the s (single-line mode) and x (extended syntax) flags (an s flag was added to the language later, in ES2018); the escape sequences \a, \e, \l, \u, \L, \U, \E, \Q, \A, \Z, \z and \G; and other extended constructs starting with (?.

Defining Regular Expressions

In JavaScript, regular expressions are represented by objects RegExp. RegExp objects can be created using the RegExp() constructor, but more often they are created using a special literal syntax. Just as string literals are specified as characters surrounded by quotation marks, regular expression literals are specified as characters surrounded by a pair of slash characters (/). So your JavaScript code might contain lines like this:

var pattern = /s$/;

This line creates a new RegExp object and assigns it to the pattern variable. This RegExp object matches any string ending with an "s" character. The same regular expression can be defined using the RegExp() constructor:

var pattern = new RegExp("s$");

A regular expression pattern specification consists of a sequence of characters. Most characters, including all alphanumeric ones, literally describe the characters that must be present. That is, the regular expression /java/ matches all strings containing the substring "java".

Other characters in regular expressions are not intended to be used to find their exact equivalents, but rather have special meanings. For example, the regular expression /s$/ contains two characters. The first character s denotes a search for a literal character. Second, $ is a special metacharacter that marks the end of a line. So this regular expression matches any string ending with the character s.

The following sections describe the various characters and metacharacters used in regular expressions in JavaScript.

Literal characters

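As noted earlier, all alphabetic characters and numbers in regular expressions match themselves. Regular expression syntax in JavaScript also supports the ability to specify certain non-alphabetic characters using escape sequences starting with a backslash (\) character. For example, the sequence \n matches the newline character. These symbols are listed in the table below:

Regular expression literal characters
Symbol    Correspondence
Alphanumeric characters    Match themselves
\0        The NUL character (\u0000)
\t        Tab (\u0009)
\n        Newline (\u000A)
\v        Vertical tab (\u000B)
\f        Form feed (\u000C)
\r        Carriage return (\u000D)
\xnn      The Latin character specified by the hexadecimal number nn; for example, \x0A is the same as \n
\uxxxx    The Unicode character specified by the hexadecimal number xxxx; for example, \u0009 is the same as \t
\cX       The control character ^X; for example, \cJ is equivalent to \n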

Some punctuation marks have special meanings in regular expressions:

^ $ . * + ? = ! : | \ / ( ) [ ] { } -

The meaning of these symbols is explained in the following sections. Some of them have special meaning only in certain regular expression contexts, while in other contexts they are interpreted literally. However, in general, to literally include any of these characters in a regular expression, you must precede it with a backslash character. Other characters, such as quotes and @, have no special meaning and simply match themselves in regular expressions.

If you can't remember exactly which punctuation characters should be preceded by a \, you can safely put a backslash in front of any of them. However, keep in mind that many letters and numbers take on special meanings when preceded by a backslash, so the letters and numbers you are literally looking for should not be preceded by a \ character. To include the backslash character itself in a regular expression, you must precede it with another backslash. For example, the following regular expression matches any string that contains a backslash character: /\\/.
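The advice about escaping punctuation can be sketched as a small helper. The function escapeForRegExp is a hypothetical name, not a built-in, and the sketch assumes patterns without the u flag, where escaping any punctuation character is allowed.

```javascript
// Hypothetical helper: put a backslash in front of every punctuation
// character that may have a special meaning in a regular expression.
function escapeForRegExp(text) {
  // In the replacement string, $& stands for the matched character.
  return text.replace(/[\^$.*+?=!:|\\\/()\[\]{}-]/g, "\\$&");
}

var question = "1+1=2?";
var escaped = escapeForRegExp(question);
console.log(escaped);   // 1\+1\=2\?

// Unescaped, "+" and "?" act as repetition characters, so the text
// is not found literally; escaped, it is.
var unescapedFound = new RegExp(question).test("is 1+1=2? yes");
var escapedFound = new RegExp(escaped).test("is 1+1=2? yes");
console.log(unescapedFound, escapedFound);   // false true

// A literal backslash is matched with /\\/.
var hasBackslash = /\\/.test("C:\\temp");
console.log(hasBackslash);   // true
```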

Character classes

Individual literal characters can be combined into character classes by enclosing them in square brackets. A character class matches any one character contained in that class. Therefore, the regular expression /[abc]/ matches any one of the characters a, b, or c.

Negated character classes can also be defined to match any character except those specified in the brackets. A negated character class is specified by placing the ^ character first after the left bracket. The regular expression /[^abc]/ matches any character other than a, b, or c. In character classes, a range of characters can be specified using a hyphen. Any lowercase Latin letter is matched by the /[a-z]/ expression, and any letter or digit from the Latin character set is matched by the /[a-zA-Z0-9]/ expression.

Certain character classes are particularly common, so regular expression syntax in JavaScript includes special characters and escape sequences to represent them. Thus, \s matches space, tab, and any other Unicode whitespace character, and \S matches any character that is not Unicode whitespace.

The table below provides a list of these special characters and the syntax of the character classes. (Note that some of the character class escape sequences match only ASCII characters and are not extended to work with Unicode characters. You can explicitly define your own Unicode character classes; for example, /[\u0400-\u04FF]/ matches any Cyrillic character.)

JavaScript Regular Expression Character Classes
Symbol    Correspondence
[...]     Any one of the characters between the brackets
[^...]    Any one character not between the brackets
.         Any character other than a newline or other Unicode line terminator
\w        Any ASCII word character. Equivalent to [a-zA-Z0-9_]
\W        Any character that is not an ASCII word character. Equivalent to [^a-zA-Z0-9_]
\s        Any Unicode whitespace character
\S        Any character that is not Unicode whitespace. Note that \w and \S are not the same thing
\d        Any ASCII digit. Equivalent to [0-9]
\D        Any character other than an ASCII digit. Equivalent to [^0-9]
[\b]      A literal backspace character

Note that the character class escape sequences can themselves be used inside square brackets. \s matches any whitespace character and \d matches any digit, hence /[\s\d]/ matches any one whitespace character or digit.
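A short sketch of the character classes discussed above. The sample strings, including the Cyrillic word, are assumptions chosen for illustration.

```javascript
var inClass = /[abc]/.test("xbz");                 // true: "b" is in the class
var negatedClass = /[^abc]/.test("abc");           // false: nothing outside a, b, c
var spaceOrDigit = "a1 b2".match(/[\s\d]/g);       // ["1", " ", "2"]
var cyrillic = /[\u0400-\u04FF]/.test("Привет");   // true: Cyrillic range
var asciiOnlyWord = /\w/.test("Привет");           // false: \w is ASCII-only

console.log(inClass, negatedClass, spaceOrDigit, cyrillic, asciiOnlyWord);
```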

Repetition

Given the knowledge of regular expression syntax gained so far, we can describe a two-digit number as /\d\d/ or a four-digit number as /\d\d\d\d/, but we cannot, for example, describe a number consisting of any number of digits, or a string of three letters followed by an optional digit. These more complex patterns use regular expression syntax, which specifies how many times a given regular expression element can be repeated.

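Repetition characters always follow the pattern to which they are applied. Some types of repetition are used quite often, and special symbols are available for these cases. For example, + matches one or more instances of the previous pattern. The following table provides a summary of the repetition syntax:

Regular expression repetition characters
Symbol    Meaning
{n,m}     Matches the previous item at least n times but no more than m times
{n,}      Matches the previous item n or more times
{n}       Matches exactly n occurrences of the previous item
?         Matches zero or one occurrence of the previous item. Equivalent to {0,1}
+         Matches one or more occurrences of the previous item. Equivalent to {1,}
*         Matches zero or more occurrences of the previous item. Equivalent to {0,}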

The following lines show several examples:

var pattern = /\d{2,4}/;  // Matches a number containing two to four digits
pattern = /\w{3}\d?/;     // Matches exactly three word characters and one optional digit
pattern = /\s+java\s+/;   // Matches the word "java" with one or more spaces
                          // before and after it
pattern = /[^(]*/;        // Matches zero or more characters other than the opening parenthesis

Be careful when using the repetition characters * and ?. They can match zero instances of the pattern specified before them and therefore can match nothing at all. For example, the regular expression /a*/ matches the string "bbbb" because the string contains zero occurrences of the character a.
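The repetition syntax, including the empty match produced by *, can be checked with a few made-up strings. This is only a sketch.

```javascript
// Greedy {2,4}: takes as many digits as allowed, here four.
var twoToFour = "abc 12345".match(/\d{2,4}/)[0];           // "1234"

// Three word characters followed by an optional digit.
var threeAndDigit = "abc1".match(/\w{3}\d?/)[0];           // "abc1"

// One or more whitespace characters on each side of "java".
var spacedJava = /\s+java\s+/.test("I like java a lot");   // true

// /a*/ succeeds on "bbbb" by matching zero characters at position 0.
var emptyMatch = "bbbb".match(/a*/);
console.log(twoToFour, threeAndDigit, spacedJava);
console.log(emptyMatch[0] === "", emptyMatch.index);       // true 0
```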

The repetition characters listed in the table match as many times as possible while still allowing subsequent parts of the regular expression to match. We say this is greedy repetition. It is also possible to specify that repetition be performed in a non-greedy manner. Simply follow the repetition character (or characters) with a question mark: ??, +?, *? or even {1,5}?.

For example, the regular expression /a+/ matches one or more instances of the letter a. Applied to the string "aaa", it matches all three letters. On the other hand, the expression /a+?/ matches one or more instances of the letter a and selects the smallest possible number of characters. Applied to the same string, this pattern matches only the first letter a.

Non-greedy repetition does not always give the expected result. Consider the pattern /a+b/, which matches one or more a's followed by the letter b. When applied to the string "aaab", it matches the entire string.

Now let's check the "non-greedy" version of /a+?b/. One might think that it would match a b preceded by only one a. If applied to the same string, "aaab" would be expected to match the single character a and the last character b. However, this pattern actually matches the entire string, just like the greedy version. The fact is that a regular expression pattern search is performed by finding the first position in the string, starting from which a match becomes possible. Since a match is possible starting from the first character of the string, shorter matches starting from subsequent characters are not even considered.
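The greedy and non-greedy behaviour described above can be verified directly:

```javascript
var greedy = "aaa".match(/a+/)[0];           // "aaa": as many a's as possible
var lazy = "aaa".match(/a+?/)[0];            // "a": as few a's as possible

// Both versions match the whole string, because matching starts at the
// first position where a match is possible.
var greedyWithB = "aaab".match(/a+b/)[0];    // "aaab"
var lazyWithB = "aaab".match(/a+?b/)[0];     // "aaab"

console.log(greedy, lazy, greedyWithB, lazyWithB);
```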

Alternatives, Grouping and References

Regular expression grammar includes special characters for defining alternatives, grouping subexpressions, and referring to previous subexpressions. The pipe symbol | separates alternatives. For example, /ab|cd|ef/ matches either the string "ab", or the string "cd", or the string "ef", and the pattern /\d{3}|[a-z]{4}/ matches either three digits or four lowercase letters.

Note that alternatives are processed from left to right until a match is found. If a match is found with the left alternative, the right one is ignored, even if a “better” match can be achieved. Therefore, when the pattern /a|ab/ is applied to the string "ab", it will only match the first character.
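A minimal sketch of the left-to-right rule: swapping the order of the alternatives changes what is matched.

```javascript
var shortFirst = "ab".match(/a|ab/)[0];   // "a": the left alternative wins
var longFirst = "ab".match(/ab|a/)[0];    // "ab": the longer alternative is tried first
console.log(shortFirst, longFirst);
```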

Parentheses have multiple meanings in regular expressions. One of them is to group individual elements into one subexpression, so that the elements are treated as a single unit by the special characters |, *, +, ? and others. For example, the pattern /java(script)?/ matches the word "java" followed by the optional word "script", and /(ab|cd)+|ef/ matches either the string "ef" or one or more repetitions of either of the strings "ab" or "cd".

Another use of parentheses in regular expressions is to define subpatterns within a pattern. When a regular expression match is found in the target string, the portion of the target string that matches any specific subpattern enclosed in parentheses can be extracted.

Suppose you want to find one or more lowercase letters followed by one or more digits. To do this, you can use the pattern /[a-z]+\d+/. But let's also assume that we only want the digits at the end of each match. If we put this part of the pattern in parentheses (/[a-z]+(\d+)/), we can extract the digits from any matches we find. How this is done will be described below.

A related use of parenthesized subexpressions is to refer back to a subexpression from a previous part of the same regular expression. This is achieved by specifying one or more digits after the \ character. The digits refer to the position of the parenthesized subexpression within the regular expression. For example, \1 refers to the first subexpression, and \3 refers to the third. Note that subexpressions can be nested within each other, so the position of the left parenthesis is used when counting. For example, in the following regular expression, a reference to the nested subexpression ([Ss]cript) would look like \2:

/([Jj]ava([Ss]cript)?)\sis\s(fun\w*)/

A reference to a previous subexpression does not refer to the pattern of that subexpression, but to the text that matched that pattern. Therefore, references can be used to impose a constraint that separate parts of a string contain exactly the same characters. For example, the following regular expression matches zero or more characters inside single or double quotes. However, it does not require that the opening and closing quotes match each other (that is, that both quotes be single or double):

/['"][^'"]*['"]/

We can require the quotation marks to match using a reference like this:

/(['"])[^'"]*\1/

Here \1 matches whatever the first subexpression matched. In this example, the reference imposes a constraint that requires the closing quotation mark to match the opening quotation mark. This regular expression does not allow single quotes inside double quotes, and vice versa.
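A sketch comparing the two quote patterns described above. The variable names loose and strict and the test strings are assumptions made for illustration.

```javascript
var loose = /['"][^'"]*['"]/;     // any opening quote, any closing quote
var strict = /(['"])[^'"]*\1/;    // closing quote must equal the opening one

var mismatched = "'mismatched\"";   // opens with ' and closes with "
var matched = "\"matched\"";        // opens and closes with "

console.log(loose.test(mismatched));    // true
console.log(strict.test(mismatched));   // false
console.log(strict.test(matched));      // true
```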

It is also possible to group elements in a regular expression without creating a numbered reference to those elements. Instead of simply grouping the elements between ( and ), start the group with (?: and end it with ). Consider, for example, the following pattern:

/([Jj]ava(?:[Ss]cript)?)\sis\s(fun\w*)/

Here the subexpression (?:[Ss]cript) is only needed for grouping so that the repetition character ? can be applied to the group. These modified parentheses do not create a reference, so in this regular expression, \2 refers to the text that matches the pattern (fun\w*).
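The effect on group numbering can be seen by running both forms of the pattern against the same string. The sample sentence is an assumption chosen for illustration.

```javascript
var withCapture = /([Jj]ava([Ss]cript)?)\sis\s(fun\w*)/;
var withoutCapture = /([Jj]ava(?:[Ss]cript)?)\sis\s(fun\w*)/;
var sentence = "JavaScript is fun";

var a = sentence.match(withCapture);
var b = sentence.match(withoutCapture);

console.log(a[1], a[2], a[3]);   // "JavaScript" "Script" "fun"
console.log(b[1], b[2]);         // "JavaScript" "fun": group 2 is now (fun\w*)
```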

The following table lists the alternation, grouping, and reference operators in regular expressions:

JavaScript regular expression alternation, grouping, and reference characters
Symbol     Meaning
|          Alternative. Matches either the subexpression on the left or the subexpression on the right.
(...)      Grouping. Groups elements into a single unit that can be used with the characters *, +, ?, | and so on. Also remembers the characters matching this group for use in subsequent references.
(?:...)    Grouping only. Groups elements into a single unit, but does not remember the characters matching this group.
\number    Matches the same characters that were found when matching group number number. Groups are subexpressions inside (possibly nested) parentheses. Group numbers are assigned by counting left parentheses from left to right. Groups formed using (?: are not numbered.

Specifying a Match Position

As described earlier, many elements of a regular expression match a single character in a string. For example, \s matches a single whitespace character. Other regular expression elements match the positions between characters rather than the characters themselves. For example, \b matches a word boundary—the boundary between \w (an ASCII text character) and \W (a non-text character), or the boundary between an ASCII text character and the beginning or end of a line.

Elements such as \b do not specify any characters that must be present in the matched string, but they do specify valid positions for matching. These elements are sometimes called regular expression anchors because they anchor the pattern to a specific position in the string. The most commonly used anchors are ^ and $, which anchor patterns to the beginning and end of a line, respectively.

For example, the word "JavaScript" on its own line can be found using the regular expression /^JavaScript$/. To find the single word "Java" (rather than a prefix like "JavaScript"), you can try using the pattern /\sJava\s/, which requires a space before and after the word.

But such a solution raises two problems. First, it will only find the word "Java" if it is surrounded by spaces on both sides, and will not be able to find it at the beginning or end of the line. Secondly, when this pattern does match, the string it returns will contain leading and trailing spaces, which is not exactly what we want. So instead of using a pattern that matches whitespace characters \s, we'll use a pattern (or anchor) that matches word boundaries \b. The result will be the following expression: /\bJava\b/.

The anchor element \B matches a position that is not a word boundary. That is, the pattern /\B[Ss]cript/ will match the words "JavaScript" and "postscript" and will not match the words "script" or "Scripting".
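The difference between \s, \b and \B can be sketched with a couple of made-up sentences:

```javascript
var sentence = "I use JavaScript and Java";

// \s requires whitespace on both sides, so "Java" at the end is missed.
var withSpaces = sentence.match(/\sJava\s/);            // null
// When \s does match, the spaces become part of the result.
var padded = "learn Java today".match(/\sJava\s/)[0];   // " Java "

// \b matches positions, not characters, so only "Java" itself is returned.
var withBoundary = sentence.match(/\bJava\b/);
console.log(withSpaces, padded);
console.log(withBoundary[0], withBoundary.index);       // "Java" 21

// \B requires that there is no word boundary before "script".
var inside = /\B[Ss]cript/.test("JavaScript");          // true
var atStart = /\B[Ss]cript/.test("Scripting");          // false
console.log(inside, atStart);
```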

Arbitrary regular expressions can also serve as anchor conditions. If you place an expression between (?= and ), it becomes a lookahead assertion on the subsequent characters: those characters must match the specified pattern but are not included in the match.

For example, to match the name of a common programming language followed by a colon, you can use the expression /[Jj]ava([Ss]cript)?(?=\:)/. This pattern matches the word "JavaScript" in the string "JavaScript: The Definitive Guide", but it will not match the word "Java" in the string "Java in a Nutshell" because it is not followed by a colon.

If you instead begin the condition with (?!, it becomes a negative lookahead assertion, which requires that the following characters do not match the specified pattern. For example, the pattern /Java(?!Script)([A-Z]\w*)/ matches the substring "Java" followed by a capital letter and any number of ASCII word characters, provided that "Java" is not followed by "Script". It will match the string "JavaBeans" but not the string "Javanese"; it will match the string "JavaScrip", but will not match the strings "JavaScript" or "JavaScripter".
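Both kinds of lookahead can be checked against the strings mentioned above. The variable names are hypothetical; this is a sketch, not a definitive implementation.

```javascript
// Positive lookahead: the colon must follow but is not part of the match.
var followedByColon = /[Jj]ava([Ss]cript)?(?=\:)/;
var title = "JavaScript: The Definitive Guide".match(followedByColon)[0];
var noColon = followedByColon.test("Java in a Nutshell");
console.log(title, noColon);   // "JavaScript" false

// Negative lookahead: "Java" must not be followed by "Script".
var notScript = /Java(?!Script)([A-Z]\w*)/;
var results = {
  JavaBeans: notScript.test("JavaBeans"),        // true
  Javanese: notScript.test("Javanese"),          // false: no capital letter follows
  JavaScrip: notScript.test("JavaScrip"),        // true
  JavaScript: notScript.test("JavaScript"),      // false
  JavaScripter: notScript.test("JavaScripter")   // false
};
console.log(results);
```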

The table below provides a list of regular expression anchor characters:

Regular expression anchor characters
Symbol    Meaning
^         Matches the beginning of the string or, in a multiline search, the beginning of a line.
$         Matches the end of the string or, in a multiline search, the end of a line.
\b        Matches a word boundary, i.e. the position between a \w character and a \W character, or between a \w character and the start or end of the string. (Note, however, that [\b] matches the backspace character.)
\B        Matches a position that is not a word boundary.
(?=p)     Positive lookahead assertion. Requires the subsequent characters to match the pattern p, but does not include those characters in the match.
(?!p)     Negative lookahead assertion. Requires that the following characters do not match the pattern p.

Flags

There is one last element of regular expression grammar. Regular expression flags specify high-level pattern matching rules. Unlike the rest of regular expression grammar, flags are specified not between the slash characters, but after the second one. Three flags are covered here: i, g and m.

Flag i specifies that pattern matching should be case insensitive, and flag g specifies that the search should be global, i.e. all matches in the string must be found. Flag m performs the pattern search in multiline mode. If the string being searched contains newlines, then in this mode the anchor characters ^ and $, in addition to matching the beginning and end of the entire string, also match the beginning and end of each line of text. For example, the pattern /java$/im matches both "java" and "Java\nis fun".

These flags can be combined in any combination. For example, to search for the first occurrence of the word "java" (or "Java", "JAVA", etc.) in a case-insensitive manner, you can use the case-insensitive regular expression /\bjava\b/i. And to find all occurrences of this word in a string, you can add the g flag: /\bjava\b/gi.
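The behaviour of the three flags can be sketched as follows. The sample strings are assumptions chosen for illustration.

```javascript
// m: $ also matches at the end of each line.
var multiline = /java$/im.test("Java\nis fun");    // true
var singleLine = /java$/i.test("Java\nis fun");    // false without m

// i alone finds the first occurrence; adding g finds them all.
var words = "Java, java, JAVA";
var first = words.match(/\bjava\b/i)[0];           // "Java"
var all = words.match(/\bjava\b/gi);               // ["Java", "java", "JAVA"]

console.log(multiline, singleLine, first, all);
```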

Methods of the String class for searching by pattern

Up to this point, we've discussed the grammar of the regular expressions we create, but we haven't looked at how those regular expressions can actually be used in JavaScript scripts. In this section, we will discuss methods of the String object that use regular expressions for pattern matching and search with replacement. And then we'll continue our conversation about pattern matching with regular expressions by looking at the RegExp object and its methods and properties.

Strings support four methods that use regular expressions. The simplest of them is the search() method. It takes a regular expression as an argument and returns either the position of the first character of the matched substring, or -1 if no match is found. For example, the following call will return 4:

var result = "JavaScript".search(/script/i); // 4

If the argument to the search() method is not a regular expression, it is first converted by passing it to the RegExp constructor. The search() method does not support global search and ignores the g flag in its argument.

The replace() method performs a search-and-replace operation. It takes a regular expression as its first argument and a replacement string as its second. The method searches the string on which it is called for a match to the specified pattern.

If the regular expression contains the g flag, the replace() method replaces all matches found with the replacement string. Otherwise, it replaces only the first match found. If the replace() method's first argument is a string rather than a regular expression, then the method performs a literal search for the string rather than converting it to a regular expression using the RegExp() constructor as the search() method does.
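The differences described above between replace() with and without the g flag, and between string arguments to replace() and search(), can be sketched like this:

```javascript
var dotted = "a.b.c";

var firstOnly = dotted.replace(/\./, "-");    // "a-b.c": only the first match
var everyDot = dotted.replace(/\./g, "-");    // "a-b-c": all matches

// A string first argument is searched for literally by replace() ...
var literalDot = dotted.replace(".", "-");    // "a-b.c"
// ... but search() converts it to a regular expression, where "."
// matches any character, so the match is found at position 0.
var anyChar = dotted.search(".");             // 0

console.log(firstOnly, everyDot, literalDot, anyChar);
```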

As an example, we can use the replace() method to capitalize the word "JavaScript" consistently across an entire string of text:

// Regardless of the case of the characters, replace them with the word in the required case
var result = "javascript".replace(/JavaScript/ig, "JavaScript");

The replace() method is more powerful than this example would suggest. Recall that the subexpressions in parentheses within a regular expression are numbered from left to right, and that the regular expression remembers the text matched by each of the subexpressions. If the replacement string contains a $ sign followed by a digit, the replace() method replaces those two characters with the text that matched the specified subexpression. This is a very useful feature. We can use it, for example, to replace straight quotes in a string with typographic (curly) quotes:

// A quotation is a quote mark, followed by any number of characters
// other than quote marks (which we remember), followed by another quote mark
var quote = /"([^"]*)"/g;
// Replace the straight quotes with typographic ones and leave unchanged
// the contents of the quotation stored in $1
var text = '"JavaScript" is an interpreted programming language.';
var result = text.replace(quote, "“$1”"); // “JavaScript” is an interpreted programming language.

An important thing to note is that the second argument to replace() can be a function that dynamically calculates the replacement string.

Method match() is the most general of the String class methods that use regular expressions. It takes a regular expression as its only argument (or converts its argument to a regular expression by passing it to the RegExp() constructor) and returns an array containing the search results. If the g flag is set in the regular expression, the method returns an array of all matches present in the string. For example:

// Returns ["1", "2", "3"]
var result = "1 plus 2 equals 3".match(/\d+/g);

If the regular expression does not contain the g flag, the match() method does not perform a global search; it just looks for the first match. However, match() returns an array even when the method does not perform a global search. In this case, the first element of the array is the substring found, and all remaining elements are the parts matched by the subexpressions of the regular expression. Therefore, if match() returns an array arr, then arr[0] will contain the entire string found, arr[1] the substring corresponding to the first subexpression, etc. Drawing a parallel with the replace() method, we can say that the contents of $n are placed in arr[n].

For example, take a look at the following code that parses a URL:

var url = /(\w+):\/\/([\w.]+)\/(\S*)/;
var text = "Visit our website http://www..php";
var result = text.match(url);
if (result != null) {
  var fullurl = result[0];  // The entire matched URL
  var protocol = result[1]; // The protocol, for example "http"
  var host = result[2];     // The host name
  var path = result[3];     // The path after the host
}

It should be noted that for a regular expression that does not have the global search flag g set, the match() method returns the same value as the regular expression's exec() method: the returned array has the index and input properties, as described in the discussion of the exec() method below.
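As a worked sketch of the URL-parsing pattern, the following uses a hypothetical address (www.example.com is a placeholder, not taken from the text) and also shows the index and input properties of the returned array.

```javascript
var urlPattern = /(\w+):\/\/([\w.]+)\/(\S*)/;
// Hypothetical sample text; the URL is a placeholder for illustration.
var sampleText = "Visit http://www.example.com/index.html today";
var parts = sampleText.match(urlPattern);

if (parts != null) {
  console.log(parts[0]);      // "http://www.example.com/index.html"
  console.log(parts[1]);      // "http"
  console.log(parts[2]);      // "www.example.com"
  console.log(parts[3]);      // "index.html"
  console.log(parts.index);   // 6: position where the match starts
  console.log(parts.input === sampleText);   // true: the searched string
}
```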

The last of the String object methods that uses regular expressions is split(). This method splits the string on which it is called into an array of substrings, using the argument as a delimiter. For example:

"123,456,789".split(","); // Return ["123","456","789"]

The split() method can also take a regular expression as an argument. This makes the method more powerful. For example, you can specify a delimiter that allows an arbitrary number of whitespace characters on both sides:

"1, 2, 3 , 4 , 5".split(/\s*,\s*/); // Return ["1","2","3","4","5"]

RegExp object

As mentioned, regular expressions are represented as RegExp objects. In addition to the RegExp() constructor, RegExp objects support three methods and several properties.

The RegExp() constructor takes one or two string arguments and creates a new RegExp object. The first argument to the constructor is a string containing the body of the regular expression, i.e. text that must appear between slash characters in a regular expression literal. Note that string literals and regular expressions use the \ character to represent escape sequences, so when passing the regular expression as a string literal to the RegExp() constructor, you must replace each \ character with a pair of \\ characters.

The second argument to RegExp() may be missing. If specified, it defines the regular expression flags. It must be one of the characters g, i, m, or a combination of these characters. For example:

// Finds all five-digit numbers in a string.
// Note the use of \\ in this example.
var zipcode = new RegExp("\\d{5}", "g");

The RegExp() constructor is useful when the regular expression is generated dynamically and therefore cannot be represented using regular expression literal syntax. For example, to find a string entered by the user, you need to create a regular expression at runtime using RegExp().
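The sketch below builds a pattern from user-supplied text at runtime. The helper name escapeRegExp and the sample input are assumptions made for illustration; escaping is needed because characters such as + would otherwise be interpreted as metacharacters.

```javascript
// Hypothetical helper: put a backslash before every regex metacharacter.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const userInput = "1+1"; // assumed to come from the user
const haystack = "1+1=2, and 1+1 again";

// Unescaped, "+" is a quantifier, so the pattern means "one or more 1s, then 1".
const naive = haystack.match(new RegExp(userInput, "g"));
// Escaped, the pattern matches the literal text "1+1".
const literal = haystack.match(new RegExp(escapeRegExp(userInput), "g"));

console.log(naive);   // null
console.log(literal); // ["1+1", "1+1"]
```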

RegExp Properties

Each RegExp object has five properties. The source property is a read-only string containing the text of the regular expression. The global property is a read-only boolean value that specifies whether the g flag is present in the regular expression. The ignoreCase property is a read-only boolean value that specifies whether the i flag is present. The multiline property is a read-only boolean value that specifies whether the m flag is present. The last property, lastIndex, is an integer that can be read and written. For patterns with the g flag, this property contains the position in the string at which the next search should begin. As described below, it is used by the exec() and test() methods.
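These properties can be inspected directly. A small sketch with an arbitrary pattern and sample string:

```javascript
const probe = /ab+c/gi;
console.log(probe.source);     // "ab+c"
console.log(probe.global);     // true
console.log(probe.ignoreCase); // true
console.log(probe.multiline);  // false
console.log(probe.lastIndex);  // 0 before any search

probe.exec("xabbc abc");       // matches "abbc" at positions 1..4
console.log(probe.lastIndex);  // 5, where the next search would begin
```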

RegExp Methods

RegExp objects define two methods that perform pattern matching; they behave similarly to the String class methods described above. The main method of the RegExp class used for pattern matching is exec(). It is similar to the previously mentioned match() method of the String class, except that it is a RegExp class method that takes a string as an argument, rather than a String class method that takes a RegExp argument.

The exec() method executes the regular expression against the specified string, i.e. it looks for a match in the string. If no match is found, the method returns null. If a match is found, it returns the same array that the match() method returns for a search without the g flag. Element zero of the array contains the string that matches the regular expression, and all subsequent elements contain the substrings that match the subexpressions. In addition, the index property contains the position of the character at which the match begins, and the input property refers to the string that was searched.

Unlike match(), the exec() method returns an array whose structure does not depend on the presence of the g flag in the regular expression. Let me remind you that when passing a global regular expression, the match() method returns an array of matches found. And exec() always returns one match, but provides complete information about it. When exec() is called on a regular expression containing the g flag, the method sets the lastIndex property of the regular expression object to the position number of the character immediately following the found substring.

When exec() is called a second time on the same regular expression, it begins the search at the character position specified by the lastIndex property. If exec() does not find a match, lastIndex is set to 0. (You can also set lastIndex to zero at any time, and you should do so whenever you stop searching a string before its last match has been found and then start searching another string with the same RegExp object.) This special behavior allows exec() to be called repeatedly to iterate over all matches of the regular expression in a string. For example:

var pattern = /Java/g;
var text = "JavaScript is more fun than Java!";
var result;
while ((result = pattern.exec(text)) != null) {
    console.log("Found '" + result[0] + "'" +
                " at position " + result.index +
                "; next search will start at " + pattern.lastIndex);
}

Another method of the RegExp object is test(), which is much simpler than the exec() method. It takes a string and returns true if the string matches the regular expression:

var pattern = /java/i;
pattern.test("JavaScript"); // Returns true

Calling test() is equivalent to calling exec() and returning true if exec() returns something other than null. For this reason, test() behaves in the same way as exec() when called on a global regular expression: it begins searching the specified string at the position given by the lastIndex property, and if it finds a match, it sets lastIndex to the position of the character immediately following the match. Therefore, test() can be used to step through a string in the same way as exec().
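A small sketch of stepping through a string with test() on a global pattern (the word "banana" is just a sample). Each successful call advances lastIndex, and the failing call resets it to 0:

```javascript
const letterA = /a/g;
const word = "banana";
const ends = [];
while (letterA.test(word)) {
  ends.push(letterA.lastIndex); // position right after each matched "a"
}
console.log(ends);              // [2, 4, 6]
console.log(letterA.lastIndex); // 0, reset after the final failed search
```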

The RegExp constructor creates a regular expression object for matching text with a pattern.

For an introduction to regular expressions, read the Regular Expressions chapter in the JavaScript Guide.


Syntax

Literal, constructor, and factory notations are possible:

/pattern/flags
new RegExp(pattern[, flags])
RegExp(pattern[, flags])

Parameters

  • pattern: The text of the regular expression; or, as of ES5, another RegExp object or literal (for the two RegExp constructor notations only). Patterns can include special characters, so they can match a wider range of values than a literal string would.
  • flags: If specified, flags is a string that contains the flags to add; or, if an object is supplied for the pattern, the flags string will replace any of that object's flags (and lastIndex will be reset to 0) (as of ES2015). If flags is not specified and a regular expression object is supplied, that object's flags (and lastIndex value) will be copied over.

flags may contain any combination of the following characters:

  • g: global match; find all matches rather than stopping after the first match.
  • i: ignore case; if the u flag is also enabled, use Unicode case folding.
  • m: multiline; treat beginning and end characters (^ and $) as working over multiple lines (i.e., match the beginning or end of each line (delimited by \n or \r), not only the very beginning or end of the whole input string).
  • s: "dotAll"; allows . to match newlines.
  • u: Unicode; treat the pattern as a sequence of Unicode code points.
  • y: sticky; matches only from the index indicated by the lastIndex property of this regular expression in the target string (and does not attempt to match from any later indexes).
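The effect of a few of these flags can be seen in a short sketch; the two-line sample string is an assumption chosen for illustration:

```javascript
const twoLines = "one\ntwo";

// m: ^ and $ also match at line breaks.
const withoutM = /^two$/.test(twoLines);  // false
const withM = /^two$/m.test(twoLines);    // true

// s: the dot also matches the newline between "one" and "two".
const withoutS = /one.two/.test(twoLines);  // false
const withS = /one.two/s.test(twoLines);    // true

// g and i combined: find every "a" regardless of case.
const allA = "Aa".match(/a/gi);             // ["A", "a"]

console.log(withoutM, withM, withoutS, withS, allA);
```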

Description

There are two ways to create a RegExp object: a literal notation and a constructor.

  • The literal notation's parameters are enclosed between slashes and do not use quotation marks.
  • The constructor function's parameters are not enclosed between slashes, but do use quotation marks.

The following expressions create the same regular expression:

/ab+c/i
new RegExp(/ab+c/, "i") // literal notation
new RegExp("ab+c", "i") // constructor

The literal notation provides a compilation of the regular expression when the expression is evaluated. Use literal notation when the regular expression will remain constant. For example, if you use literal notation to construct a regular expression used in a loop, the regular expression won't be recompiled on each iteration.

The constructor of the regular expression object, for example, new RegExp("ab+c"), provides runtime compilation of the regular expression. Use the constructor function when you know the regular expression pattern will be changing, or you don't know the pattern and are getting it from another source, such as user input.

Starting with ECMAScript 6, new RegExp(/ab+c/, "i") no longer throws a TypeError ("can't supply flags when constructing one RegExp from another") when the first argument is a RegExp and the second flags argument is present. A new RegExp from the arguments is created instead.
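A brief sketch of this behavior in an ES2015+ engine (variable names are illustrative): supplying flags replaces the flags of the source object, while omitting them copies the flags over.

```javascript
const globalRe = /ab+c/g;

// Flags given: they replace the original object's flags.
const caseless = new RegExp(globalRe, "i");
console.log(caseless.source); // "ab+c"
console.log(caseless.flags);  // "i"
console.log(caseless.global); // false

// Flags omitted: the original flags are copied.
const cloned = new RegExp(globalRe);
console.log(cloned.flags);        // "g"
console.log(cloned === globalRe); // false, a new object is created
```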

When using the constructor function, the normal string escape rules (preceding special characters with \ when included in a string) are necessary. For example, the following are equivalent:

let re1 = /\w+/
let re2 = new RegExp("\\w+") // same pattern as re1

Properties

  • RegExp.prototype: Allows the addition of properties to all objects.
  • RegExp.length: The value of RegExp.length is 2.
  • get RegExp[@@species]: The constructor function that is used to create derived objects.
  • lastIndex: The index at which to start the next match. This is a property of each RegExp instance rather than of the constructor itself.

Methods

The global RegExp object has no methods of its own. However, it does inherit some methods through the prototype chain.


Examples

Using a regular expression with the sticky flag

let str = "#foo#"
let regex = /foo/y

regex.lastIndex = 1
regex.test(str)      // true
regex.lastIndex = 5
regex.test(str)      // false (lastIndex is taken into account with sticky flag)
regex.lastIndex      // 0 (reset after match failure)

Regular expression and Unicode characters

As mentioned above, \w or \W only matches ASCII-based characters; for example, a to z, A to Z, 0 to 9, and _.

To match characters from other languages such as Cyrillic or Hebrew, use \uhhhh, where hhhh is the character's Unicode value in hexadecimal. This example demonstrates how one can separate out Unicode characters from a word.

let text = "Образец text на русском языке"
let regex = /[\u0400-\u04FF]+/g

let match = regex.exec(text)
console.log(match[0])        // logs "Образец"
console.log(regex.lastIndex) // logs "7"

let match2 = regex.exec(text)
console.log(match2[0])       // logs "на"
console.log(regex.lastIndex) // logs "15"

// and so on

Extracting sub-domain name from URL

let url = "http://xxx.domain.com"
console.log(/[^.]+/.exec(url)[0].substr(7)) // logs "xxx"

Specifications

Specification | Status | Comment
ECMAScript 3rd Edition (ECMA-262) | Standard | Initial definition. Implemented in JavaScript 1.1.
ECMAScript 5.1 (ECMA-262) | Standard |
ECMAScript 2015 (6th Edition, ECMA-262) | Standard | The RegExp constructor no longer throws when the first argument is a RegExp and the second argument is present. Introduces Unicode and sticky flags.
ECMAScript Latest Draft (ECMA-262) | Draft |


Regular expressions are a language for describing string patterns using metacharacters. A metacharacter is a character in a regular expression that describes some class of characters in a string, indicates the position of a substring, indicates the number of repetitions, or groups characters into a substring. For example, the metacharacter \d describes digits, and $ denotes the end of a line. A regular expression can also contain ordinary characters that describe themselves. The set and meaning of metacharacters in JS is defined by the ECMAScript specification and is modeled on Perl's regular expressions, so most of the features familiar from PCRE are supported in JS.

Scope of regular expressions

Regular expressions are typically used for the following tasks:

  • Comparison. The goal of this task will be to find out whether a certain text matches a given regular expression.
  • Search. Using regular expressions, it is convenient to find the corresponding substrings and extract them from the text.
  • Replacement. Regular expressions often help not only to find, but also to replace a substring in the text that matches the regular expression.

Ultimately, using regular expressions you can, for example:

  • Check that the user data in the form is filled out correctly.
  • Find a link to an image in the text entered by the user so that it can be automatically attached to the message.
  • Remove html tags from the text.
  • Check code before compilation for simple syntax errors.

Features of regular expressions in JS. Regular Expression Literals

The main feature of regular expressions in JS is that there is a separate type of literal for them. Just as string literals are surrounded by quotation marks, regular expression literals are surrounded by slashes (/). Thus, JS code can contain expressions like:

console.log(typeof /tcoder/); // object

In fact, a regular expression literal creates a RegExp object, so the literal /tcoder/ above is equivalent to the following definition:

var pattern = new RegExp("tcoder");

This creation method is usually used when you need to use variables in a regular expression, or create a regular expression dynamically. In all other cases, regular expression literals are used due to the shorter syntax and the absence of the need to additionally escape some characters.

Characters in regular expressions

Alphanumeric characters in regular expressions are not metacharacters and describe themselves. This means that the regular expression /tcoder/ will match the substring tcoder. In regular expressions, you can also specify non-printing characters, such as line feed (\n), tab (\t) and so on. These symbols also correspond to themselves. Preceding an alphabetic character with a backslash (\) turns it into a metacharacter, if one is defined for that letter. For example, the alphabetic character "d" becomes a metacharacter describing digits when it is preceded by a backslash (\d).

Character classes

Single characters in regular expressions can be grouped into classes using square brackets. A class created in this way matches any one of the characters included in it. For example, the regular expression /[tcoder]/ matches any one of the letters “t”, “c”, “o”, “d”, “e”, “r”.

In classes you can also specify a range of characters using a hyphen. For example, the class [a-d] corresponds to the class [abcd]. Note that some metacharacters in regular expressions already describe character classes. For example, the \d metacharacter is equivalent to the class [0-9]. Metacharacters describing character classes can also be included in classes. For example, the class [\da-f] corresponds to the digits and the letters “a”, “b”, “c”, “d”, “e”, “f”, that is, any hexadecimal character.

It is also possible to describe a character class by specifying characters that should not be included in it. This is done using the metacharacter ^ placed right after the opening bracket. For example, the class [^\d] will match any character other than a digit.
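A quick sketch of the three kinds of class described here (a plain set such as [tcoder], a set with a range such as [\da-f], and a negated set such as [^\d]), with made-up sample strings:

```javascript
// Plain set: any one of the listed letters.
const hasListedLetter = /[tcoder]/.test("red");   // true
const hasNone = /[tcoder]/.test("xyz");           // false

// Range plus metacharacter: runs of lowercase hexadecimal characters.
const hexRuns = "0x1f".match(/[\da-f]+/g);        // ["0", "1f"]

// Negated class: every character that is not a digit.
const nonDigits = "a1b2".match(/[^\d]/g);         // ["a", "b"]

console.log(hasListedLetter, hasNone, hexRuns, nonDigits);
```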

Repetitions

Now we can describe, say, a decimal number of any given length, simply by writing as many \d metacharacters in a row as there are digits in the number. This approach is clearly not very convenient. In addition, it cannot describe a range of repetitions: for example, we cannot describe a number with one or two digits. Fortunately, regular expressions provide a way to describe repetition ranges. To do this, follow the symbol with the range of repetitions in curly braces. For example, the regular expression /tco{1,3}der/ matches the strings "tcoder", "tcooder" and "tcoooder". If you omit the maximum number of repetitions, leaving the minimum and the comma, the symbol must repeat at least the specified number of times. For example, the regular expression /bo{2,}bs/ will match the strings “boobs”, “booobs”, “boooobs” and so on, with any number of “o” letters that is at least two.

If you omit the comma in the curly braces and give just one number, it specifies the exact number of repetitions. For example, the regular expression /\d{5}/ matches five-digit numbers.

Some repetition ranges are used quite often and have their own metacharacters: ? is equivalent to {0,1}, * to {0,}, and + to {1,}.
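The curly-brace ranges {1,3}, {2,} and {5} can be checked with a small sketch. The ^ and $ anchors (covered later in the text) are added here only so that each test applies to the whole word; the word list is an assumption for illustration.

```javascript
const candidates = ["tcder", "tcoder", "tcooder", "tcoooder", "tcooooder"];

// {1,3}: between one and three "o" letters.
const oneToThree = candidates.filter((w) => /^tco{1,3}der$/.test(w));
// {2,}: at least two "o" letters.
const atLeastTwo = candidates.filter((w) => /^tco{2,}der$/.test(w));
// {5}: exactly five digits.
const fiveDigits = /^\d{5}$/.test("12345");
const fourDigits = /^\d{5}$/.test("1234");

console.log(oneToThree); // ["tcoder", "tcooder", "tcoooder"]
console.log(atLeastTwo); // ["tcooder", "tcoooder", "tcooooder"]
console.log(fiveDigits, fourDigits); // true false
```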

Greedy repetitions

The above syntax describes the maximum number of repetitions, that is, of all possible numbers of repetitions within the specified range, the largest one is chosen. Such repetitions are called greedy. This means that the regular expression /\d+/ in the string “yeah!!111” will match the substring “111”, not “11” or “1”, although the metacharacter “+” describes one or more repetitions.

If you want non-greedy repetition, that is, the minimum possible number of repetitions from the specified range, simply put “?” after the repetition range. For example, the regular expression /\d+?/ in the string “yeah!!111” will match the substring “1”, and the regular expression /\d{2,}?/ in the same string will match the substring “11”.

It is worth paying attention to an important feature of non-greedy repetition. Consider the regular expression /bo{2,}?bs/. In the string “i like big boooobs” it will match, just as with greedy repetition, the substring “boooobs”, and not “boobs”, as one might think. The reason is that a single match must be one contiguous substring; it cannot be assembled from pieces located in different places in the string. That is, our regular expression cannot match the substrings “boo” and “bs” merged together.
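The difference between greedy and non-greedy repetition can be sketched as follows. The last line uses a neutral sample, "xaaab", chosen here to show that a non-greedy range still has to grow until the rest of the pattern can match:

```javascript
const sampleText = "yeah!!111";

const greedy = sampleText.match(/\d+/)[0];       // "111"
const lazy = sampleText.match(/\d+?/)[0];        // "1"
const lazyRange = sampleText.match(/\d{2,}?/)[0]; // "11"

// Non-greedy a{2,}? starts with two "a"s, but "b" must follow,
// so the match is extended to cover all three.
const forcedToGrow = "xaaab".match(/a{2,}?b/)[0]; // "aaab"

console.log(greedy, lazy, lazyRange, forcedToGrow);
```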

Alternatives

In regular expressions, you can also use alternatives to describe a set of strings that match either one or another part of the regular expression. Such parts are called alternatives and are separated by a vertical bar. For example, the regular expression /two|twice|2/ can match the substring “two”, the substring “twice”, or the substring “2”. The chain of alternatives is tried from left to right until the first one matches, and a match is always described by exactly one alternative. For example, the regular expression /java|script/ in the string “I like javascript” will match only the substring “java”.
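A short sketch of alternation. The pattern /tw|twice/ is an extra illustration added here to show that the leftmost alternative wins even when a later one would match more text:

```javascript
const phrase = "I like javascript";

// Without g, only the first match is reported.
const first = phrase.match(/java|script/)[0];  // "java"
// With g, each match is still described by a single alternative.
const every = phrase.match(/java|script/g);    // ["java", "script"]

// Alternatives are tried left to right at each position.
const leftmostWins = "twice".match(/tw|twice/)[0]; // "tw"

console.log(first, every, leftmostWins);
```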

Groups

To treat multiple characters as a single unit when applying repetition ranges, alternatives and other constructs, simply put them in parentheses. For example, the regular expression /true(coder)?/ matches the strings "truecoder" and "true".

Backreferences

Besides combining characters in a regular expression into a single unit, parentheses make it possible to refer back to the substring they matched: write a backslash followed by the number of the left parenthesis of the pair. Parentheses are numbered from left to right by their opening bracket, starting with one. For example, in the regular expression /(one(two)(three))(four)/, \1 refers to "onetwothree" (the whole outer group), \2 to "two", \3 to "three", and \4 to "four". As an example of using such references, the regular expression /(\d)\1/ matches two-digit numbers with identical digits. An important limitation is that backreferences cannot be used inside character classes; for example, a two-digit number with different digits cannot be described by the regular expression /(\d)[^\1]/.
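Group numbering and a backreference can be sketched like this (sample strings are assumptions; ^ and $ are used only to test the whole string):

```javascript
// Groups are numbered by the position of their opening parenthesis.
const groups = "onetwothreefour".match(/(one(two)(three))(four)/);
const captured = groups.slice(1);
console.log(captured); // ["onetwothree", "two", "three", "four"]

// \1 must repeat exactly what group 1 matched.
const sameDigits = /^(\d)\1$/.test("77");      // true
const differentDigits = /^(\d)\1$/.test("78"); // false
console.log(sameDigits, differentDigits);
```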

Non-capturing parentheses

Often you just want to group symbols without creating a reference. In this case, write ?: immediately after the left grouping parenthesis. For example, in the regular expression /(one)(?:two)(three)/, \2 refers to "three".

Such parentheses are sometimes called non-remembering. They have another important feature, which we will talk about in the next lesson.
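A minimal sketch showing that a (?:...) group is skipped when groups are numbered (the sample strings are assumptions):

```javascript
const parts = Array.from("onetwothree".match(/(one)(?:two)(three)/));
console.log(parts); // ["onetwothree", "one", "three"]

// Inside the pattern, \2 therefore refers to "three", not "two".
const repeatsThree = /(one)(?:two)(three)\2/.test("onetwothreethree"); // true
const repeatsTwo = /(one)(?:two)(three)\2/.test("onetwothreetwo");     // false
console.log(repeatsThree, repeatsTwo);
```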

Specifying a position

In regular expressions, there are also metacharacters that indicate a certain position in the string. The most commonly used are ^ and $, which indicate the beginning and the end of a line. For example, the regular expression /\..+$/ matches the extension in a file name, and the regular expression /^\d/ matches the first character of a line if it is a digit.
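A sketch of both anchors with made-up file names. Note that /\..+$/ matches from the first dot to the end of the string; the stricter pattern /\.[^.]+$/ is an addition made here, not part of the original text, for the case where only the last extension is wanted.

```javascript
const fromFirstDot = "archive.tar.gz".match(/\..+$/)[0];     // ".tar.gz"
const lastExtension = "archive.tar.gz".match(/\.[^.]+$/)[0]; // ".gz"

const leadingDigit = /^\d/.exec("7 days")[0]; // "7"
const noLeadingDigit = /^\d/.test("day 7");   // false

console.log(fromFirstDot, lastExtension, leadingDigit, noLeadingDigit);
```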

Positive and negative lookahead

Using regular expressions, you can also describe a substring that is followed, or not followed, by a substring described by another pattern. For example, suppose we need to find the word “java” only if it is followed by “script”. This problem can be solved with the regular expression /java(?=script)/. If we need to describe the substring “java” that is not followed by “script”, we can use the regular expression /java(?!script)/.
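The two checks can be sketched as follows. The text after "java" is tested but does not become part of the match; the sample strings are assumptions.

```javascript
// (?=script): "java" only when "script" follows; "script" is not consumed.
const positive = "javascript".match(/java(?=script)/)[0];   // "java"
const positiveMiss = /java(?=script)/.test("java beans");    // false

// (?!script): "java" only when "script" does not follow.
const negativeHit = /java(?!script)/.test("java beans");     // true
const negativeMiss = /java(?!script)/.test("javascript");    // false

console.log(positive, positiveMiss, negativeHit, negativeMiss);
```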

Let's collect everything we talked about above into one table.

Symbol Meaning
a|b Matches either a or b.
(…) Grouping parentheses. You can also refer to the substring matched by the pattern in parentheses.
(?:…) Grouping only, without the ability to refer to the match.
\n Reference to the substring matched by the nth group.
^ The beginning of the input data or the beginning of the line.
$ End of input or end of line.
a(?=b) Matches the substring described by pattern a only if it is followed by the substring described by pattern b.
a(?!b) Matches the substring described by pattern a only if it is not followed by the substring described by pattern b.

Flags

And finally, the last element of regular expression syntax. Flags specify matching rules that apply to the entire regular expression. Unlike all other elements of regular expression syntax, they are written immediately after the regular expression literal, or passed as a string in the second argument to the RegExp constructor.

The three classic regular expression flags in JavaScript are described below (newer versions of the language add others, such as s, u and y):

i – when this flag is set, case is not taken into account; for example, the regular expression /javascript/i will match the strings "javascript", "JavaScript", "JAVASCRIPT", "jAvAScript", etc.

m – this flag enables multi-line search. This means that if the text contains line feed characters and this flag is set, then the symbols ^ and $, in addition to the beginning and end of the entire text, will also match the beginning and end of each line in the text. For example, the regular expression /line$/m matches the substring “line” both in the string “first line” and in the string “one\nsecond line\ntwo”.

g – enables a global search; with this flag, the regular expression will match all substrings that fit it, and not just the first one, as is the case when the flag is absent.

Flags can be combined in any order: /tcoder/mig, /tcoder/gim, /tcoder/gmi etc. all mean the same thing. The order of the flags also does not matter when they are passed as a string in the second argument to the RegExp constructor, that is, new RegExp("tcoder", "im") and new RegExp("tcoder", "mi") are the same thing.

P.S.

Regular expressions are a very powerful and convenient tool for working with strings, allowing you to reduce hundreds of lines of code into a single expression. Unfortunately, their syntax is sometimes too complex and difficult to read, and even the most experienced developer can forget what a rather complex regular expression he wrote a couple of days ago meant if he did not comment on it. For these reasons, sometimes it is still worth abandoning regular expressions in favor of regular methods for working with strings.

Regular Expressions allow you to perform a flexible search for words and expressions in texts in order to delete, extract or replace them.

Syntax:

//First option for creating a regular expression
var regexp = new RegExp(pattern, modifiers);
//Second option for creating a regular expression
var regexp = /pattern/modifiers;

pattern specifies the character pattern to search for.

modifiers allow you to customize search behavior:

  • i – search without taking into account the case of letters;
  • g – global search (all matches in the text will be found, not just the first);
  • m – multi-line search.

Search for words and expressions

The simplest use of regular expressions is to search for words and expressions in various texts.

Here is an example of using search using modifiers:

//Set the regular expression rv1
var rv1 = /Russia/;
//Specify the regular expression rv2
var rv2 = /Russia/g;
//Specify the regular expression rv3
var rv3 = /Russia/ig;
var text = "Russia is the largest state in the world. Russia borders on 18 countries. RUSSIA is a successor state of the USSR.";
//rv1 matches only the first "Russia" (first sentence).
//rv2 matches "Russia" in the first and second sentences, but not "RUSSIA".
//rv3 matches all three: "Russia", "Russia" and "RUSSIA".

Special symbols

In addition to regular characters, regular expression patterns can use special symbols (metacharacters). Special characters with descriptions are shown in the table below:

Special character Description
. Matches any character except the end of line character.
\w Matches any word character: a letter, a digit or an underscore.
\W Matches any character that is not a word character.
\d Matches any digit.
\D Matches any character that is not a digit.
\s Matches whitespace characters.
\S Matches non-whitespace characters.
\b Matches only at a word boundary (the beginning or end of a word).
\B Matches only at positions that are not word boundaries.
\n Matches the newline character.
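A few of the special characters from the table can be tried out in a short sketch (the sample strings are assumptions made for illustration):

```javascript
// \b: "cat" only as a whole word, not inside "concat" or "cats".
const wholeWords = "cat concat cats cat".match(/\bcat\b/g); // ["cat", "cat"]

// \w: letters, digits and the underscore; the hyphen is not included.
const wordRuns = "a_1 b-2".match(/\w+/g);                   // ["a_1", "b", "2"]

// \s: split on any single whitespace character (space or tab here).
const pieces = "a b\tc".split(/\s/);                        // ["a", "b", "c"]

console.log(wholeWords, wordRuns, pieces);
```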

/* The reg1 expression finds all words that start with two arbitrary characters and end with "vet". Since the words in the sentence are separated by spaces, the special character \s is added at the beginning and at the end. The sample words are transliterated from Russian. */
var reg1 = /\s..vet\s/g;
var txt = " privet zavet velvet klozet ";
document.write(txt.match(reg1) + "<br>");
/* The reg2 expression finds all words that start with three arbitrary characters and end with "vet" */
var reg2 = /\s...vet\s/g;
document.write(txt.match(reg2) + "<br>");
var txt1 = " pri2vet privet pri1vet ";
/* The reg3 expression finds all words that start with "pri", followed by one digit, and end with "vet" */
var reg3 = /pri\dvet/g;
document.write(txt1.match(reg3) + "<br>");
// The expression reg4 finds all the digits in the text
var reg4 = /\d/g;
var txt2 = "5 years of study, 3 years of sailing, 9 years of shooting.";
document.write(txt2.match(reg4) + "<br>");


Symbols in square brackets

Using square brackets, as in [keyu], you can specify a group of characters to search for.

The ^ character at the start of a group of characters in square brackets, as in [^kwg], means that any character except the specified ones will match.

Using a dash (-) between characters in square brackets, as in [a-z], you can specify a range of characters to search for.

You can also search for digits using square brackets.

//Set the regular expression reg1
var reg1 = /\sko[tdm]\s/g;
//Set a text string txt1 (sample words are transliterated from Russian)
var txt1 = " kot kosa kod komod kom kover ";
//Using the regular expression reg1, search the string txt1
document.write(txt1.match(reg1) + "<br>");
var reg2 = /\sslo[^tg]/g;
var txt2 = " slot slon slog ";
document.write(txt2.match(reg2) + "<br>");
var reg3 = /[0-9]/g;
var txt3 = "5 years of study, 3 years of swimming, 9 years of shooting";
document.write(txt3.match(reg3));


Quantifiers

A quantifier is a construction that allows you to specify how many times the preceding character or group of characters should appear in a match.

Syntax:

//The preceding character must occur exactly x times
{x}
//The preceding character must occur from x to y times inclusive
{x,y}
//The preceding character must occur at least x times
{x,}
//The preceding character must occur 0 or more times
*
//The preceding character must occur 1 or more times
+
//The preceding character must occur 0 or 1 time
?


//Specify the regular expression rv1
var rv1 = /ko{5}shka/g;
//Specify the regular expression rv2
var rv2 = /ko{3,}shka/g;
//Specify the regular expression rv3
var rv3 = /ko+shka/g;
//Specify the regular expression rv4
var rv4 = /ko?shka/g;
//Specify the regular expression rv5
var rv5 = /ko*shka/g;
//Sample text: the same word with zero to seven letters "o"
var text = "kshka koshka kooshka koooshka kooooshka koooooshka kooooooshka koooooooshka";
//rv1 matches only koooooshka (exactly five "o").
//rv2 matches koooshka and every longer word (three or more "o").
//rv3 matches every word except kshka (one or more "o").
//rv4 matches only kshka and koshka (zero or one "o").
//rv5 matches every word (zero or more "o").

Note: if you want to use a special character (such as . * + ? or ()) as an ordinary character, put a backslash (\) in front of it.

Using parentheses

By enclosing part of a regular expression pattern in parentheses, you tell the expression to remember the match found by that part of the pattern. The saved match can be used later in your code.

For example, the regular expression /(Dmitry)\sVasiliev/ will find the string “Dmitry Vasiliev” and remember the substring “Dmitry”.

In the example below, we use the replace() method to change the order of words in the text. We use $1 and $2 to access stored matches.

var regexp = /(Dmitry)\s(Vasiliev)/;
var text = "Dmitry Vasiliev";
var newtext = text.replace(regexp, "$2 $1");
document.write(newtext);


Parentheses can be used to group characters before quantifiers.
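A short sketch of the difference grouping makes for a quantifier (sample strings are assumptions): without parentheses, + applies only to the single preceding character; with them, it applies to the whole group.

```javascript
// + applies to the group "ab" as a whole.
const groupRepeated = "ababab!".match(/(ab)+/)[0]; // "ababab"

// + applies only to "b".
const charRepeated = "ababab!".match(/ab+/)[0];    // "ab"
const manyB = "abbb".match(/ab+/)[0];              // "abbb"

console.log(groupRepeated, charRepeated, manyB);
```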
