Age | Commit message (Collapse) | Author |
|
|
|
|
|
Added to re:run and sets the corresponding fields in 'extra' struct
for the PCRE match engine. The result can be viewed by also
setting 'report_errors' when matching.
Some housekeeping was also done...
The offset option also did not properly check for offset's >= 0.
Change nomatch to BADARG when pre-compiled mp() is faked:
By constructing a 5-tuple with faked content but the right data types,
you could do a re:run which returned nomatch when in fact the mp() was
bad. The cheapest solution is to check the return from pcre_exec
better.
Remove unreachable code in erts_bif_re.c:
Replaced tests for things that logically simply
cannot happen with ASSERT.
|
|
|
|
Add notempty_atstart, no_start_optimize, ucp and
never_utf options from new PCRE version.
Use the new notempty_atstart in global matching.
Add inspect/2 function
Correctly handle dupnames when capturing a name, as
in Perl, get the leftmost matching occurence.
Also added all_names, to get all the names in the pattern
in alphabetical (name) order.
To be able to use this in global matching, an inspect
function that can dig out a namelist was added.
|
|
The relevant testoutputNN files were copied from the
PCRE distribution and some corrections were done to
run_pcre_tests.erl.
Also made test generator be more compiler friendly
The re_testoutput1_replacement_test and
re_testoutput1_split_test modules that are
generated by run_pcre_tests.erl (offline,
when a new version of PCRE is integrated in the VM)
took forever to compile, as one single huge function
contained all the tests. The autogenerated tests are now
split into ~50 functions, which reduces compile time to
approximately a third.
New automatic test suites are also generated from the
new testoutputNN files, and checked in.
|
|
|
|
|
|
Add the \gN and \g{N} syntax for back references in re:replace/3,4
to allow use with numeric replacement strings.
|
|
|
|
|
|
This commit is a preparation for introducing location information
(filename/line number) in stacktraces in exceptions. Currently
a stack trace looks like:
[{Mod1,Function1,Arity1},
.
.
.
{ModN,FunctionN,ArityN}]
Add a forth element to each tuple that can be used indication
the filename and line number of the source file:
[{Mod1,Function1,Arity1,Location1},
.
.
.
{ModN,FunctionN,ArityN,LocationN}]
In this commit, the fourth element will just be an empty list,
and we will change all code that look at or manipulate stacktraces.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
The patch is from:
http://vcs.pcre.org/viewvc?revision=360&view=revision
Test case:
re:compile(<<"(?i)[\xc3\xa9\xc3\xbd]|[\xc3\xa9\xc3\xbdA]">>, [unicode]).
An option change at the start of a pattern that had top-level
alternatives could cause overwriting and/or a crash.
This potential security problem was recorded as CVE-2008-2371.
|
|
* rb/stdlib_re_unicode_fixes:
Fix lost unicode option in re:compile()
Refactor out repeated block in re module
Fix re:replace/4 to handle unicode charlist Replacement argument
Fix re:replace/4 to handle unicode charlist RE argument
Fix re:replace/4 to handle binary unicode output when nothing replaced
OTP-8394 A number of bugs concerning re and unicode are corrected:
- re:compile no longer loses unicode option, which also fixes bug
in re:split.
- re:replace now handles unicode charlist replacement argument
- re:replace now handles unicode RE charlist argument correctly
- re:replace now handles binary unicode output correctly when
nothing is replaced.
Most code, testcases and error isolation done by Rory Byrne.
|
|
A bug in re:replace/4 causes a badarg exception to be thrown when the
Replacement argument is a charlist containing non-ascii codepoints.
The problem is that the code incorrectly assumes that the Replacement
text is iodata() and calls iolist_to_binary/1 on it. This patch fixes
it to obey the 'unicode' option and handle charlist() Replacement
arguments correctly.
|
|
The real problem is in the re:run/3 BIF.
Noticed-by: Rory Byrne
Tests-by: Rory Byrne
|
|
A bug with re:replace/4 causes an exception when: (a) it's given a
unicode charlist as input; (b) it's set to {return,binary}; and
(c) it finds nothing to replace.
The problem is: when re:replace/4 does not find anything to replace
in its Subject input, it calls iolist_to_binary on this data. This
fails if the original input is a charlist with non-ascii codepoints.
|
|
|