From 278ab91df569d338973bae957974425dcdb698fe Mon Sep 17 00:00:00 2001
From: Matthias Lang
-This module contains functions for regular expression
- matching for strings and binaries.
+This module contains regular expression matching functions for
+ strings and binaries. The regular expression syntax and semantics resemble that of
- Perl. This library in many ways replaces the old regexp library
- written purely in Erlang, as it has a richer syntax as well as
- many more options. The library is also faster than the
- older regexp implementation. Although the library's matching algorithms are currently based
- on the PCRE library, it is not to be viewed as an Erlang to PCRE
- mapping. Only parts of the PCRE library is interfaced and the re
- library in some ways extend PCRE. The PCRE documentation contains
- many parts of no interest to the Erlang programmer, why only the
- relevant part of the documentation is included here. There should
- bee no need to go directly to the PCRE library documentation.
+The library's matching algorithms are currently based on the
+ PCRE library, but not all of the PCRE library is interfaced and
+ some parts of the library go beyond what PCRE offers. The sections of
+ the PCRE documentation which are relevant to this module are included
+ here.
-The Erlang literal syntax for strings give special
- meaning to the "\" (backslash) character. To literally write
- a regular expression or a replacement string containing a
- backslash in your code or in the shell, two backslashes have to be written:
- "\\".
+The Erlang literal syntax for strings uses the "\"
+ (backslash) character as an escape code. You need to escape
+ backslashes in literal strings, both in your code and in the shell,
+ with an additional backslash, i.e.: "\\".
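The following minimal sketch makes the escaping rule concrete. The module and function names are hypothetical.

- The subject literal "a\\b" holds three characters.
- The pattern literal "\\\\" holds two backslashes, which PCRE reads as one escaped, literal backslash.
- A pattern literal of "\\" reaches PCRE as a lone trailing backslash, which is not a valid regular expression.

```erlang
-module(re_backslash_example).
-export([find_backslash/0, lone_backslash/0]).

%% Subject: $a, $\\, $b (three characters).
%% Pattern: two backslash characters, which PCRE interprets as one
%% literal backslash.
find_backslash() ->
    re:run("a\\b", "\\\\").

%% Pattern: a single backslash character with nothing after it,
%% which PCRE rejects, so re:compile/1 returns an error tuple.
lone_backslash() ->
    re:compile("\\").
```

`find_backslash()` returns `{match,[{1,1}]}`, that is, one character matched at offset 1. `lone_backslash()` returns `{error, ...}`.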
unicode_binary() = binary() with characters encoded in UTF-8 coding standard
- unicode_char() = integer() representing valid unicode codepoint
+ unicode_char() = integer() representing a valid Unicode code point
chardata() = charlist() | unicode_binary()
@@ -82,9 +77,9 @@
mp() = Opaque datatype containing a compiled regular expression.
The mp() is guaranteed to be a tuple() having the atom
- 're_pattern' as it's first element, to allow for matching in
+ 're_pattern' as its first element, to allow for matching in
guards. The arity of the tuple() or the content of the other fields
- is however not to be trusted.
+ may change in future releases.
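Only the first element of the mp() tuple is guaranteed, so a guard can safely test that element and nothing else. The sketch below uses this to tell a compiled pattern apart from raw pattern data. The module and its `match/2` helper are hypothetical.

```erlang
-module(re_mp_guard_example).
-export([match/2]).

%% Hypothetical helper: accepts either a compiled mp() or raw pattern data.
%% The guard relies only on the documented 're_pattern' tag in element 1,
%% never on the tuple's arity or its other fields.
match(Subject, MP) when is_tuple(MP), element(1, MP) =:= re_pattern ->
    %% {capture, none} makes re:run/3 return just match | nomatch.
    re:run(Subject, MP, [{capture, none}]);
match(Subject, Pattern) ->
    %% Raw pattern data: compile it, then retry with the compiled form.
    {ok, MP} = re:compile(Pattern),
    match(Subject, MP).
```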
When compilation is involved, the exception
If the regular expression is previously compiled, the option
list can only contain the options
If the capture options describe that no substring capturing
at all is to be done (
-A description of all the options relevant for execution follows:
+The options relevant for execution are:
-Implements global (repetitive) search as the
+Implements global (repetitive) search (the
-When the regular expression matches an empty string, the
- behaviour might seem non-intuitive, why the behaviour requites
- some clarifying. With the global option,
+The interaction of the global option with a regular
+ expression that matches an empty string surprises some users.
+ When the global option is given,
re:run("cat","(|at)",[global]).
- The matching will be performed as following:
+The matching is performed as follows:
a?b?
is applied to a string not beginning with "a" or "b", it
- matches the empty string at the start of the subject. With
-
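The global search on "cat" discussed above can be reproduced with the sketch below. The module name is hypothetical. The second function requests only the whole match of each iteration, as a list, which makes the empty matches and the single "at" match easier to see.

```erlang
-module(re_global_example).
-export([cat_index/0, cat_first_list/0]).

%% Default capture: all subpatterns, as {Offset, Length} pairs.
cat_index() ->
    re:run("cat", "(|at)", [global]).

%% Only the whole match of each iteration, returned as a string.
cat_first_list() ->
    re:run("cat", "(|at)", [global, {capture, first, list}]).
```

`cat_index()` returns `{match,[[{0,0},{0,0}],[{1,0},{1,0}],[{1,2},{1,2}],[{3,0},{3,0}]]}`. This is an empty match at offsets 0, 1 and 3, plus the non-empty "at" found when the search is retried at offset 1 after the empty match there.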
Perl has no direct equivalent of
The value list is a list of indexes for the subpatterns to return, where index 0 is for all of the pattern, and 1 is for the first explicit capturing subpattern in the regular expression, and so forth. When using named captured subpatterns (see below) in the regular expression, one can use
".*(abcd).*"
matched against the string "ABCabcdABC", capturing only the "abcd" part (the first explicit subpattern):
re:run("ABCabcdABC",".*(abcd).*",[{capture,[1]}]).
@@ -455,7 +450,7 @@ This option makes it possible to include comments inside complicated patterns. N
".*(?<FOO>abcd).*"
With this expression, we could still give the index of the subpattern with the following call:
re:run("ABCabcdABC",".*(?<FOO>abcd).*",[{capture,[1]}]).
- giving the same result as before. But as the subpattern is named, we can also give its name in the value list:
+giving the same result as before. But, since the subpattern is named, we can also specify its name in the value list:
re:run("ABCabcdABC",".*(?<FOO>abcd).*",[{capture,['FOO']}]).
which would yield the same result as the earlier examples, namely:
{match,[{3,4}]}
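The sketch below gathers the capture forms shown above into one hypothetical module. It selects the subpattern by index, then by name, and then by name with `list` as the return type, which yields the substring instead of an `{Offset, Length}` pair.

```erlang
-module(re_capture_example).
-export([by_index/0, by_name/0, by_name_as_list/0]).

%% Select the first explicit subpattern by its index.
by_index() ->
    re:run("ABCabcdABC", ".*(?<FOO>abcd).*", [{capture, [1]}]).

%% Select the same subpattern by its name (an atom).
by_name() ->
    re:run("ABCabcdABC", ".*(?<FOO>abcd).*", [{capture, ['FOO']}]).

%% Same selection, but return the captured text as a string.
by_name_as_list() ->
    re:run("ABCabcdABC", ".*(?<FOO>abcd).*", [{capture, ['FOO'], list}]).
```

The first two calls return `{match,[{3,4}]}`. The third returns `{match,["abcd"]}`.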
@@ -473,15 +468,15 @@ This option makes it possible to include comments inside complicated patterns. N
Optionally specifies how captured substrings are to be returned. If omitted, the default of
-In general, subpatterns that got assigned no value in the match are returned as the tuple
+In general, subpatterns that were not assigned a value in the match are returned as the tuple
".*((?<FOO>abdd)|a(..d)).*"
There are three explicitly capturing subpatterns, where the opening parenthesis position determines the order in the result, hence
"ABCabcdABC"
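Against "ABCabcdABC", the pattern above leaves the named subpattern FOO unset, because it is the second alternative, a(..d), that matches. The sketch below shows how that unset subpattern is reported in index form and in list form. The module name is hypothetical.

```erlang
-module(re_unset_example).
-export([as_index/0, as_list/0]).

-define(RE, ".*((?<FOO>abdd)|a(..d)).*").

%% Unset subpatterns appear as {-1,0} in index form.
as_index() ->
    re:run("ABCabcdABC", ?RE, [{capture, all}]).

%% Unset subpatterns appear as the empty string in list form.
as_list() ->
    re:run("ABCabcdABC", ?RE, [{capture, all, list}]).
```

`as_index()` returns `{match,[{0,10},{3,4},{-1,0},{4,3}]}`. `as_list()` returns `{match,["ABCabcdABC","abcd",[],"bcd"]}`.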
@@ -533,8 +528,8 @@ This option makes it possible to include comments inside complicated patterns. N
Replaces the matched part of the
Options are given as to the
Replaces the matched part of the
The permissible options are the same as for
The replacement string can contain the special character
To insert an
re:replace("abcd","c","[&]",[{return,list}]).
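The replacement-string escapes are shown side by side in the sketch below. The module name is hypothetical.

- `&` inserts the whole match.
- `\&` inserts a literal ampersand.
- `\N` inserts subexpression N.

Each backslash is doubled because of the Erlang string-literal rule described at the start of this page.

```erlang
-module(re_replace_example).
-export([wrap/0, literal_amp/0, swap/0]).

%% & stands for the matched text ("c").
wrap() ->
    re:replace("abcd", "c", "[&]", [{return, list}]).

%% \& (written "\\&" in Erlang source) is a literal ampersand.
literal_amp() ->
    re:replace("abcd", "c", "[\\&]", [{return, list}]).

%% \2\1 re-inserts the two subexpressions in reverse order.
swap() ->
    re:replace("abcd", "(b)(c)", "\\2\\1", [{return, list}]).
```

These return "ab[c]d", "ab[&]d" and "acbd", respectively.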
@@ -611,7 +606,7 @@ This option makes it possible to include comments inside complicated patterns. N
a Unicode The result is given as a list of "strings", the
preferred datatype given in the
Here the regular expression matched first the "l", causing "Er" to be the first part in the result. When the regular expression matched, the (only) subexpression was
- bound to the "l", why the "l" is inserted
+ bound to the "l", so the "l" is inserted
in the group together with "Er". The next match is of the "n", making "a" the next part to be
- returned. As the subexpression is bound to the substring
+ returned. Since the subexpression is bound to the substring
"n" in this case, the "n" is inserted into this group. The last group consists of the rest of the string, as no more matches are found.
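The grouping described above can be compared with a plain split in the sketch below. The module name is hypothetical. Without `group`, the bound subexpressions are simply interleaved with the parts. With `group`, each part is paired with the subexpressions bound by the match that ended it.

```erlang
-module(re_split_group_example).
-export([flat/0, grouped/0]).

%% Parts and bound subexpressions in one flat list.
flat() ->
    re:split("Erlang", "([ln])", [{return, list}]).

%% Each part grouped with the subexpression that ended it.
grouped() ->
    re:split("Erlang", "([ln])", [{return, list}, group]).
```

`flat()` returns `["Er","l","a","n","g"]`. `grouped()` returns `[["Er","l"],["a","n"],["g"]]`.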
By default, all parts of the string, including the empty
- strings are returned from the function. As an example:
+ strings, are returned from the function. For example:
re:split("Erlang","[lg]",[{return,list}]).
- The result will be:
+will return:
["Er","an",[]]
- as the matching of the "g" in the end of the string
+ since the matching of the "g" at the end of the string
leaves an empty rest which is also returned. This behaviour differs from the default behaviour of the split function in Perl, where empty strings at the end are by default removed. To
@@ -701,10 +696,10 @@ This option makes it possible to include comments inside complicated patterns. N
Note that the last part is "ang", not
"an", as we only specified splitting into two parts,
- and the splitting stops when enough parts are given, why the
- result differs from that of
-More than three parts are not possible with this indata, why
+More than three parts are not possible with this input, so
re:split("Erlang","[lg]",[{return,list},{parts,4}]).
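The sketch below collects the `parts` and `trim` variants for "Erlang" split on "[lg]". The module name is hypothetical.

```erlang
-module(re_split_parts_example).
-export([all_parts/0, two_parts/0, four_parts/0, trimmed/0]).

%% Default: every part is returned, including the trailing empty string.
all_parts() ->
    re:split("Erlang", "[lg]", [{return, list}]).

%% Stop after two parts; the rest of the subject ("ang") is left unsplit.
two_parts() ->
    re:split("Erlang", "[lg]", [{return, list}, {parts, 2}]).

%% Asking for more parts than the subject can yield returns all three.
four_parts() ->
    re:split("Erlang", "[lg]", [{return, list}, {parts, 4}]).

%% trim removes empty parts at the end of the result.
trimmed() ->
    re:split("Erlang", "[lg]", [{return, list}, trim]).
```

`two_parts()` returns `["Er","ang"]`, and `four_parts()` returns `["Er","an",[]]`, the same as the default. `trimmed()` returns `["Er","an"]`.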
@@ -745,7 +740,7 @@ This option makes it possible to include comments inside complicated patterns. N
the parts of the string matching the subexpressions of the
regexp.
The return value from the function will in this case be a
-
The following sections contain reference material for the
regular expressions used by this module. The regular expression
- reference is taken from the PCRE documentation, but converted as
- needed.
-The documentation is altered where appropriate and where the re
- module behaves differently than the PCRE library.
+ reference is based on the PCRE documentation, with changes in
+ cases where the re module behaves differently from the PCRE library.