Age | Commit message | Author |
Works with unicode:chardata() as input, as was decided at an OTP board
meeting in response to EEP-35 a long time ago.
Works on grapheme clusters as the base unit, with a few exceptions: it
does not handle classic (or NFD-normalized) Hangul, nor extended
grapheme clusters such as the prepend class. Handling those would make
binary input/output very slow.
List input => list output, binary input => binary output, and
mixed input => mixed output for all find/split functions, so that
results can be post-processed without calling
unicode:characters_to_list or unicode:characters_to_binary on
intermediate data.
The pad functions return lists of unicode:chardata() for performance.
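
A minimal sketch of the input/output rule described above, assuming the
functions in question are string:split/2, string:find/2 and string:pad/2
as shipped in OTP 20 and later. The module name chardata_demo is
hypothetical and exists only for this illustration.

```erlang
-module(chardata_demo).
-export([run/0]).

%% Each match below acts as an assertion: run/0 crashes with badmatch
%% if the result has a different shape than expected.
run() ->
    %% List input gives list output.
    ["ab", "cd"] = string:split("ab,cd", ","),
    ",cd" = string:find("ab,cd", ","),
    %% Binary input gives binary output.
    [<<"ab">>, <<"cd">>] = string:split(<<"ab,cd">>, <<",">>),
    <<",cd">> = string:find(<<"ab,cd">>, <<",">>),
    %% pad/2 returns chardata (possibly a nested list), so convert it
    %% explicitly when a flat binary is needed.
    Padded = string:pad(<<"ab">>, 4),
    <<"ab  ">> = unicode:characters_to_binary(Padded),
    ok.
```

Only the final pad result is flattened here; the split and find results
can be passed on to further string functions as they are.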
|
We can save some time by reversing the original string
before starting the tokenization.
When there is only one separator, we can save even more time
by treating that case specially so that we don't have to call
lists:member/2 for each character.
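
The single-separator case could be sketched roughly as follows. This is
an illustration of the idea, not the code from the commit: the module
name tokens_sketch and the helpers skip/3 and collect/4 are hypothetical.
Walking the reversed string lets each token and the token list be built
by prepending, so no final lists:reverse/1 is needed, and comparing
against one separator in the function head avoids lists:member/2.

```erlang
-module(tokens_sketch).
-export([tokens/2]).

%% tokens(String, Sep) splits String on the single character Sep,
%% dropping empty tokens.
tokens(S, Sep) when is_list(S), is_integer(Sep) ->
    skip(lists:reverse(S), Sep, []).

%% Skip separators until a token starts.
skip([Sep | S], Sep, Toks) ->
    skip(S, Sep, Toks);
skip([C | S], Sep, Toks) ->
    collect(S, Sep, Toks, [C]);
skip([], _Sep, Toks) ->
    Toks.

%% Collect characters of the current token. The input is reversed,
%% so prepending each character restores the original order.
collect([Sep | S], Sep, Toks, Tok) ->
    skip(S, Sep, [Tok | Toks]);
collect([C | S], Sep, Toks, Tok) ->
    collect(S, Sep, Toks, [C | Tok]);
collect([], _Sep, Toks, Tok) ->
    [Tok | Toks].
```

For example, tokens_sketch:tokens("ab,cd,,e", $,) returns
["ab", "cd", "e"].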
|
files as delimiters.
While working on a tool that processes Erlang code and testing it against this repo,
I came across those sneaky little 0xff bytes. I thought that removing these
non-standard characters might help other people who build such tools.
|
Using a float for the number of copies results in an infinite loop.
Check that the argument is an integer.
Reported-By: Eric Pailleau
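
A minimal sketch of why a float loops forever and how an integer guard
prevents it, assuming the function behaves like string:copies/2. The
module name copies_sketch is hypothetical, and this is not the code from
the commit. Without the guard, a count such as 2.0 steps through 1.0 and
0.0 and onward into negative values, because the float 0.0 never matches
the integer pattern 0.

```erlang
-module(copies_sketch).
-export([copies/2]).

%% The is_integer/1 guard rejects floats up front with function_clause
%% instead of letting the countdown miss the base case.
copies(Chars, Num) when is_list(Chars), is_integer(Num), Num >= 0 ->
    copies(Chars, Num, []).

copies(_Chars, 0, Acc) ->
    Acc;
copies(Chars, Num, Acc) ->
    copies(Chars, Num - 1, Chars ++ Acc).
```

With the guard in place, copies_sketch:copies("ab", 3) returns "ababab",
while copies_sketch:copies("ab", 2.0) fails immediately with a
function_clause error.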