Age | Commit message (Collapse) | Author |
|
Avoid unicode_util module call for ASCII strings
|
|
Do not invoke the internal string:lenght/1 function
when the list length is wanted.
Fixes backwards compatibility for old string functions.
|
|
Prior to this patch, the normalization functions in the
unicode module would raise a function clause error for
non-utf8 binaries.
This patch changes it so it returns {error, SoFar, Invalid}
as characters_to_binary and characters_to_list does in
the unicode module.
Note string:next_codepoint/1 and string:next_grapheme had
to be changed accordingly and also return an error tuple.
|
|
|
|
|
|
Works with unicode:chardata() as input as was decided on OTP board
meeting as response to EEP-35 a long time ago.
Works on graphemes clusters as base, with a few exceptions, does not
handle classic (nor nfd'ified) Hangul nor the extended grapheme
clusters such as the prepend class. That would make handling binaries
as input/output very slow.
List input => list output, binary input => binary output and
mixed input => mixed output for all find/split functions.
So that results can be post-processed without the need to invoke
unicode:characters_to_list|binary for intermediate data.
pad functions return lists of unicode:chardata() for performance.
|
|
|
|
|
|
We can save some time by reversing the original string
before starting the tokenization.
When there is only one separator, we can save even more time
by treating that case specially so that we don't have to call
lists:member/2 for each character.
|
|
|
|
files as delimiters.
While working on a tool that processes Erlang code and testing it against this repo,
I found out about those little sneaky 0xff. I thought it may be of help to other
people build such tools to remove non-conforming-to-standard characters.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Using a float for the number of copies results in an infinite loop.
Check that the argument is an integer.
Reported-By: Eric Pailleau
|
|
|