Age | Commit message | Author |
|
Fix bug with unrecognised 'unicode' option in re:split/2,3 &
re:replace/3,4 when using pre-compiled regex.
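A minimal sketch of the kind of call this fix concerns, assuming the pattern is compiled once with the unicode option and then reused in re:split/3 on a charlist containing codepoints above 255. The module and function names are hypothetical, and that the compiled pattern carries the unicode setting is my reading of the commit message.

```erlang
-module(split_precompiled_demo).
-export([split_on_comma/1]).

%% Hypothetical helper: split a Unicode charlist on commas using a
%% pre-compiled pattern. The 'unicode' option is given only at
%% compile time; re:split/3 is expected to honour it.
split_on_comma(Subject) ->
    {ok, MP} = re:compile(",", [unicode]),
    re:split(Subject, MP, [{return, list}]).
```

For example, split_precompiled_demo:split_on_comma([1072, $,, 1073]) should return the two one-character lists on either side of the comma.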
|
The line
%-opaque mp() :: {re_pattern, _, _, _, _}.
has been removed.
The mp() tuple is called 'opaque' in re(3), but it is not an opaque
type. The commented-out -opaque declaration was confusing.
|
|
Added to re:run and sets the corresponding fields in 'extra' struct
for the PCRE match engine. The result can be viewed by also
setting 'report_errors' when matching.
Some housekeeping was also done...
The offset option also did not properly check that the offset is >= 0.
Change nomatch to BADARG when pre-compiled mp() is faked:
By constructing a 5-tuple with faked content but the right data types,
you could do a re:run which returned nomatch when in fact the mp() was
bad. The cheapest solution is to check the return from pcre_exec
better.
Remove unreachable code in erts_bif_re.c:
Replaced tests for things that logically simply
cannot happen with ASSERT.
|
The following compile options are documented:
no_start_optimize
ucp
never_utf
The following run options are documented:
notempty_atstart
{capture, all_names}
The following new functions are documented:
re:inspect/2
|
|
Add notempty_atstart, no_start_optimize, ucp and
never_utf options from new PCRE version.
Use the new notempty_atstart in global matching.
Add inspect/2 function
Correctly handle dupnames when capturing a name: as
in Perl, get the leftmost matching occurrence.
Also added all_names, to get all the names in the pattern
in alphabetical (name) order.
To be able to use this in global matching, an inspect
function that can dig out a namelist was added.
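A short sketch of how re:inspect/2 and the all_names capture option fit together. The pattern, module name and function names are illustrative only.

```erlang
-module(re_names_demo).
-export([names/0, date_parts/1]).

%% Illustrative pattern with two named subpatterns.
pattern() ->
    {ok, MP} = re:compile("(?<year>\\d{4})-(?<month>\\d{2})"),
    MP.

%% Dig the name list out of the compiled pattern.
%% The names come back as binaries in alphabetical order.
names() ->
    {namelist, Names} = re:inspect(pattern(), namelist),
    Names.

%% Capture every named subpattern, ordered alphabetically by name.
date_parts(Subject) ->
    re:run(Subject, pattern(), [{capture, all_names, list}]).
```

Because the captures are ordered by name rather than by position in the pattern, date_parts("2012-10") yields the month before the year.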
|
* vs/re_back_reference:
extend re back reference syntax with \g escape sequence
OTP-10455
|
|
Add the \gN and \g{N} syntax for back references in re:replace/3,4
to allow use with numeric replacement strings.
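A small sketch of why the braced form is useful: with \g{N} a literal digit can follow the back reference without being read as part of the group number. The module name, function name and pattern are illustrative.

```erlang
-module(g_backref_demo).
-export([swap/1]).

%% Swap the two captured groups and put a literal "0" between them.
%% Writing "\\20" instead would be read as a reference to group 20.
swap(Subject) ->
    re:replace(Subject, "(b)(c)", "\\g{2}0\\g{1}", [{return, list}]).
```

With this definition, g_backref_demo:swap("abcd") returns "ac0bd".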
|
|
This commit is a preparation for introducing location information
(filename/line number) in stacktraces in exceptions. Currently
a stack trace looks like:
[{Mod1,Function1,Arity1},
.
.
.
{ModN,FunctionN,ArityN}]
Add a fourth element to each tuple that can be used to indicate
the filename and line number of the source file:
[{Mod1,Function1,Arity1,Location1},
.
.
.
{ModN,FunctionN,ArityN,LocationN}]
In this commit, the fourth element will just be an empty list,
and we will change all code that looks at or manipulates stacktraces.
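Code that inspects stacktraces can be written to accept both tuple shapes. The sketch below is one way to do that. It assumes the location element will be a property list with a `line` key, which this commit does not state (here it is only an empty list), and the module and function names are hypothetical.

```erlang
-module(stack_frame_demo).
-export([frame_line/1]).

%% Old-style frame: no location information available.
frame_line({_Mod, _Fun, _Arity}) ->
    undefined;
%% New-style frame: Location is assumed to be a property list.
%% An empty list (as in this commit) simply yields 'undefined'.
frame_line({_Mod, _Fun, _Arity, Location}) when is_list(Location) ->
    proplists:get_value(line, Location, undefined).
```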
|
* rb/stdlib_re_unicode_fixes:
Fix lost unicode option in re:compile()
Refactor out repeated block in re module
Fix re:replace/4 to handle unicode charlist Replacement argument
Fix re:replace/4 to handle unicode charlist RE argument
Fix re:replace/4 to handle binary unicode output when nothing replaced
OTP-8394 A number of bugs concerning re and unicode are corrected:
- re:compile no longer loses unicode option, which also fixes bug
in re:split.
- re:replace now handles unicode charlist replacement argument
- re:replace now handles unicode RE charlist argument correctly
- re:replace now handles binary unicode output correctly when
nothing is replaced.
Most code, testcases and error isolation done by Rory Byrne.
|
|
Noticed-by: Rory Byrne
|
A bug in re:replace/4 causes a badarg exception to be thrown when the
Replacement argument is a charlist containing non-ASCII codepoints.
The problem is that the code incorrectly assumes that the Replacement
text is iodata() and calls iolist_to_binary/1 on it. This patch fixes
it to obey the 'unicode' option and handle charlist() Replacement
arguments correctly.
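A minimal sketch of a call that exercises this case, assuming a release that contains the fix. The module and function names are hypothetical, and the codepoint 1100 is just an example of a value that cannot appear in iodata().

```erlang
-module(replace_unicode_demo).
-export([replace_x/1]).

%% Replace the first "x" with a single character whose codepoint (1100)
%% is above 255, so the Replacement is a charlist but not iodata().
replace_x(Subject) ->
    re:replace(Subject, "x", [1100], [unicode, {return, list}]).
```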
|
|
A bug with re:replace/4 causes an exception when: (a) it's given a
unicode charlist as input; (b) it's set to {return,binary}; and
(c) it finds nothing to replace.
The problem is that when re:replace/4 does not find anything to replace
in its Subject input, it calls iolist_to_binary/1 on this data. This
fails if the original input is a charlist with non-ASCII codepoints.
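The three conditions can be reproduced with a call like the one sketched below, assuming a release that contains the fix. The module name, function name and the sample codepoints are illustrative.

```erlang
-module(replace_nomatch_demo).
-export([no_replacement/0]).

%% (a) Unicode charlist as Subject, (b) {return, binary}, and
%% (c) a pattern that matches nothing in the Subject.
no_replacement() ->
    Subject = [1072, 1073],
    re:replace(Subject, "x", "y", [unicode, {return, binary}]).
```

With the fix applied, the result should be the UTF-8 encoding of the unchanged Subject rather than a badarg exception.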
|
|
|