This module provides functions for string processing.
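A string in this module is represented by unicode:chardata(), that is, a list of codepoints, binaries with UTF-8-encoded codepoints (UTF-8 binaries), or a mix of the two.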
"abcd" is a valid string
<<"abcd">> is a valid string
["abcd"] is a valid string
<<"abc..åäö"/utf8>> is a valid string
<<"abc..åäö">> is NOT a valid string,
but a binary with Latin-1-encoded codepoints
[<<"abc">>, "..åäö"] is a valid string
[atom] is NOT a valid string
This module operates on grapheme clusters. A grapheme cluster is a user-perceived character, which can be represented by several codepoints.
"å" [229] or [97, 778]
"e̊" [101, 778]
The string length of "ß↑e̊" is 3, even though it is represented by the
codepoints [223,8593,101,778] or the UTF-8 binary <<195,159,226,134,145,101,204,138>>.
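The difference between grapheme clusters and codepoints can be sketched as follows. The module name grapheme_count_sketch and the function counts/1 are hypothetical and only serve as illustration; they are not part of this module.

```erlang
-module(grapheme_count_sketch).
-export([counts/1]).

%% Returns {GraphemeClusters, Codepoints} for a chardata string.
%% string:length/1 counts grapheme clusters, while the length of the
%% flattened codepoint list counts individual codepoints.
counts(String) ->
    Graphemes = string:length(String),
    Codepoints = erlang:length(unicode:characters_to_list(String)),
    {Graphemes, Codepoints}.
```

For the codepoint list [223,8593,101,778] this returns {3,4}: three user-perceived characters stored as four codepoints.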
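Grapheme clusters for codepoints of class prepend and non-modern (or decomposed) Hangul are not handled for performance reasons in find/3, replace/3, split/2, split/3, and trim/3.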
Splitting and appending strings is to be done on grapheme clusters borders. There is no verification that the results of appending strings are valid or normalized.
Most of the functions expect all input to be normalized to one form,
see for example unicode:characters_to_nfc_list/1.
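A minimal sketch of why normalization matters, assuming the caller normalizes to NFC before comparing. The module name normalize_sketch and the function same_text/2 are hypothetical.

```erlang
-module(normalize_sketch).
-export([same_text/2]).

%% Normalizes both inputs to NFC codepoint lists before comparing them.
%% Without this step, a precomposed codepoint and its decomposed sequence
%% are different codepoint lists and do not compare equal.
same_text(A, B) ->
    unicode:characters_to_nfc_list(A) =:= unicode:characters_to_nfc_list(B).
```

For example, [229] and [97,778] both render as "å"; string:equal/2 reports them as different, while same_text/2 treats them as the same text.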
Language or locale specific handling of input is not considered in any function.
The functions can crash for non-valid input strings. For example, the functions expect UTF-8 binaries but not all functions verify that all binaries are encoded correctly.
Unless otherwise specified the return value type is the same as the input type. That is, binary input returns binary output, list input returns a list output, and mixed input can return a mixed output.
1> string:trim(" sarah ").
"sarah"
2> string:trim(<<" sarah ">>).
<<"sarah">>
3> string:lexemes("foo bar", " ").
["foo","bar"]
4> string:lexemes(<<"foo bar">>, " ").
[<<"foo">>,<<"bar">>]
This module has been reworked in Erlang/OTP 20 to
handle unicode:chardata() and operate on grapheme clusters.
A user-perceived character, consisting of one or more codepoints.
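Converts String to a case-agnostic comparable string. Function casefold/1 is preferred over lowercase/1 when two strings are to be compared for equality. See also equal/4.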
Example:
1> string:casefold("Ω and ẞ SHARP S").
"ω and ss sharp s"
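Case folding matters mainly when strings are compared. The following sketch contrasts a comparison after casefold/1 with one after lowercase/1. The module name casefold_sketch and both function names are hypothetical, and the sample word is an assumption chosen because codepoint 223 (ß) folds to "ss".

```erlang
-module(casefold_sketch).
-export([same_caseless/2, same_lowercase/2]).

%% Compares two strings after case folding.
same_caseless(A, B) ->
    string:equal(string:casefold(A), string:casefold(B)).

%% Compares two strings after lowercasing, shown only for contrast.
same_lowercase(A, B) ->
    string:equal(string:lowercase(A), string:lowercase(B)).
```

With "STRASSE" and [83,116,114,97,223,101] ("Straße"), same_caseless/2 returns true, whereas same_lowercase/2 returns false because lowercasing leaves codepoint 223 unchanged.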
Returns a string where any trailing \n or \r\n have been removed from String.
Example:
182> string:chomp(<<"\nHello\n\n">>).
<<"\nHello">>
183> string:chomp("\nHello\r\r\n").
"\nHello\r"
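Returns true if A and B are equal, otherwise false.
If IgnoreCase is true, the function does casefolding on the fly before the equality test.
If Norm is not none, the function applies normalization on the fly before the equality test. There are four available normalization forms: nfc, nfd, nfkc, and nfkd.
By default, IgnoreCase is false and Norm is none.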
Example:
1> string:equal("åäö", <<"åäö"/utf8>>).
true
2> string:equal("åäö", unicode:characters_to_nfd_binary("åäö")).
false
3> string:equal("åäö", unicode:characters_to_nfd_binary("ÅÄÖ"), true, nfc).
true
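Removes anything before SearchPattern in String and returns the remainder of the string, or nomatch if SearchPattern is not found. Dir, which can be leading or trailing, indicates from which direction characters are to be searched.
By default, Dir is leading.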
Example:
1> string:find("ab..cd..ef", ".").
"..cd..ef"
2> string:find(<<"ab..cd..ef">>, "..", trailing).
<<"..ef">>
3> string:find(<<"ab..cd..ef">>, "x", leading).
nomatch
4> string:find("ab..cd..ef", "x", trailing).
nomatch
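Because the result is either the remainder of the string or the atom nomatch, a membership test can be built on top of find/2. This is a minimal sketch; the module name find_sketch and the function contains/2 are hypothetical.

```erlang
-module(find_sketch).
-export([contains/2]).

%% Returns true if Pattern occurs anywhere in String.
%% string:find/2 returns the atom nomatch when the pattern is absent.
contains(String, Pattern) ->
    string:find(String, Pattern) =/= nomatch.
```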
Returns true if String is the empty string, otherwise false.
Example:
1> string:is_empty("foo").
false
2> string:is_empty(["",<<>>]).
true
Returns the number of grapheme clusters in String.
Example:
1> string:length("ß↑e̊").
3
2> string:length(<<195,159,226,134,145,101,204,138>>).
3
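Returns a list of lexemes in String, separated by the grapheme clusters in SeparatorList.
Notice that, as shown in this example, two or more
adjacent separator grapheme clusters in String are treated as one. That is, there are no empty strings in the resulting list of lexemes. See also split/3, which returns empty strings.
Notice that [$\r,$\n] is one grapheme cluster.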
Example:
1> string:lexemes("abc de̊fxxghix jkl\r\nfoo", "x e" ++ [[$\r,$\n]]).
["abc","de̊f","ghi","jkl","foo"]
2> string:lexemes(<<"abc de̊fxxghix jkl\r\nfoo"/utf8>>, "x e" ++ [$\r,$\n]).
[<<"abc">>,<<"de̊f"/utf8>>,<<"ghi">>,<<"jkl\r\nfoo">>]
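The handling of adjacent separators is the main practical difference between lexemes/2 and split/3. The following sketch puts the two side by side; the module name lexemes_vs_split_sketch, the function names, and the comma separator are assumptions made for illustration.

```erlang
-module(lexemes_vs_split_sketch).
-export([fields/1, words/1]).

%% Keeps empty fields: every separator occurrence splits the string,
%% so adjacent separators yield empty strings.
fields(Line) ->
    string:split(Line, ",", all).

%% Drops empty fields: adjacent separators are treated as one.
words(Line) ->
    string:lexemes(Line, ",").
```

For the input "a,,b", fields/1 returns three elements (the middle one empty), while words/1 returns only "a" and "b".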
Converts String to lowercase.
Notice that function casefold/1 should be used when converting a string to be tested for equality.
Example:
2> string:lowercase(string:uppercase("Michał")).
"michał"
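Returns the first codepoint in String and the rest of String in the tail. Returns an empty list if String is empty or an {error, String} tuple if the next byte is invalid.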
Example:
1> string:next_codepoint(unicode:characters_to_binary("e̊fg")).
[101|<<"̊fg"/utf8>>]
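Returns the first grapheme cluster in String and the rest of String in the tail. Returns an empty list if String is empty or an {error, String} tuple if the next byte is invalid.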
Example:
1> string:next_grapheme(unicode:characters_to_binary("e̊fg")).
["e̊"|<<"fg">>]
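Calling next_grapheme/1 repeatedly is one way to walk a string cluster by cluster. The sketch below counts clusters this way, assuming the three result shapes are a [Cluster|Rest] list, an empty list at the end of input, and an {error, Rest} tuple for invalid input. The module name grapheme_walk_sketch and the function count/1 are hypothetical.

```erlang
-module(grapheme_walk_sketch).
-export([count/1]).

%% Counts grapheme clusters by consuming one cluster at a time.
count(String) ->
    count(String, 0).

count(String, N) ->
    case string:next_grapheme(String) of
        [] ->
            %% End of input reached.
            {ok, N};
        {error, Rest} ->
            %% Invalid data; report how far the walk got.
            {error, N, Rest};
        [_Cluster | Rest] ->
            count(Rest, N + 1)
    end.
```

For valid input the count agrees with length/1; the explicit loop is only useful when each cluster needs individual processing.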
Returns lexeme number N in String, where lexemes are separated by the grapheme clusters in SeparatorList.
Example:
1> string:nth_lexeme("abc.de̊f.ghiejkl", 3, ".e").
"ghi"
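Pads String to Length with grapheme cluster Char. Dir, which can be leading, trailing, or both, indicates where the padding should be added.
By default, Char is $\s and Dir is trailing.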
Example:
1> string:pad(<<"He̊llö"/utf8>>, 8).
[<<72,101,204,138,108,108,195,182>>,32,32,32]
2> io:format("'~ts'~n",[string:pad("He̊llö", 8, leading)]).
'   He̊llö'
3> io:format("'~ts'~n",[string:pad("He̊llö", 8, both)]).
' He̊llö  '
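As the first example shows, the padded result can be a nested mix of binaries and codepoints. When a flat list is needed, the result can be flattened afterwards. This is a minimal sketch; the module name pad_sketch and the function right_align/2 are hypothetical.

```erlang
-module(pad_sketch).
-export([right_align/2]).

%% Right-aligns String in a field of Width grapheme clusters and
%% flattens the padded chardata into a plain codepoint list.
%% Strings longer than Width are returned unpadded, not truncated.
right_align(String, Width) ->
    unicode:characters_to_list(string:pad(String, Width, leading)).
```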
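If Prefix is the prefix of String, removes it and returns the remainder of String, otherwise returns nomatch.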
Example:
1> string:prefix(<<"prefix of string">>, "pre").
<<"fix of string">>
2> string:prefix("pre", "prefix").
nomatch
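Callers that want the original string back when the prefix does not match need to handle the nomatch result themselves. This is a minimal sketch; the module name prefix_sketch and the function strip_prefix/2 are hypothetical.

```erlang
-module(prefix_sketch).
-export([strip_prefix/2]).

%% Removes Prefix from String when present; otherwise returns String
%% unchanged instead of the atom nomatch.
strip_prefix(String, Prefix) ->
    case string:prefix(String, Prefix) of
        nomatch -> String;
        Rest -> Rest
    end.
```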
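Replaces SearchPattern in String with Replacement. Where, default leading, indicates whether the leading, the trailing, or all encounters of SearchPattern are to be replaced.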
Can be implemented as:
lists:join(Replacement, split(String, SearchPattern, Where)).
Example:
1> string:replace(<<"ab..cd..ef">>, "..", "*").
[<<"ab">>,"*",<<"cd..ef">>]
2> string:replace(<<"ab..cd..ef">>, "..", "*", all).
[<<"ab">>,"*",<<"cd">>,"*",<<"ef">>]
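The lists:join/2 formulation given above can be written out as a standalone function. The module name replace_sketch and the function replace_via_split/4 are hypothetical; the body follows the expression quoted in the text, with the call to split qualified by its module name.

```erlang
-module(replace_sketch).
-export([replace_via_split/4]).

%% Splits String at SearchPattern and joins the pieces with Replacement.
%% The result is chardata (a list of pieces), not a flattened string.
replace_via_split(String, SearchPattern, Replacement, Where) ->
    lists:join(Replacement, string:split(String, SearchPattern, Where)).
```

This also explains the shape of the results in the examples: each piece keeps its input type, and the replacement is inserted between the pieces as given.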
Returns the reverse list of the grapheme clusters in String.
Example:
1> Reverse = string:reverse(unicode:characters_to_nfd_binary("ÅÄÖ")).
[[79,776],[65,776],[65,778]]
2> io:format("~ts~n",[Reverse]).
ÖÄÅ
Returns a substring of String of at most Length grapheme clusters, starting at position Start.
By default, Length is infinity.
Example:
1> string:slice(<<"He̊llö Wörld"/utf8>>, 4).
<<"ö Wörld"/utf8>>
2> string:slice(["He̊llö ", <<"Wörld"/utf8>>], 4,4).
"ö Wö"
3> string:slice(["He̊llö ", <<"Wörld"/utf8>>], 4,50).
"ö Wörld"
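Since slicing counts grapheme clusters and not bytes or codepoints, it can be used to shorten text without cutting a user-perceived character in half. This is a minimal sketch; the module name slice_sketch and the function truncate/2 are hypothetical.

```erlang
-module(slice_sketch).
-export([truncate/2]).

%% Keeps at most Max grapheme clusters from the start of String.
%% Shorter strings are returned in full.
truncate(String, Max) when is_integer(Max), Max >= 0 ->
    string:slice(String, 0, Max).
```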
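Splits String where SearchPattern is encountered and returns the remaining parts. Where, default leading, indicates whether the leading, the trailing, or all encounters of SearchPattern will split String.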
Example:
0> string:split("ab..bc..cd", "..").
["ab","bc..cd"]
1> string:split(<<"ab..bc..cd">>, "..", trailing).
[<<"ab..bc">>,<<"cd">>]
2> string:split(<<"ab..bc....cd">>, "..", all).
[<<"ab">>,<<"bc">>,<<>>,<<"cd">>]
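Takes characters from String as long as the characters are members of set Characters or the complement of set Characters. Dir, which can be leading or trailing, indicates from which direction characters are to be taken.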
Example:
5> string:take("abc0z123", lists:seq($a,$z)).
{"abc","0z123"}
6> string:take(<<"abc0z123">>, lists:seq($0,$9), true, leading).
{<<"abc">>,<<"0z123">>}
7> string:take("abc0z123", lists:seq($0,$9), false, trailing).
{"abc0z","123"}
8> string:take(<<"abc0z123">>, lists:seq($a,$z), true, trailing).
{<<"abc0z">>,<<"123">>}
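A common use is to separate a run of characters from one class from whatever follows. The sketch below splits off leading ASCII digits; the module name take_sketch and the function leading_digits/1 are hypothetical.

```erlang
-module(take_sketch).
-export([leading_digits/1]).

%% Returns {Digits, Rest}, where Digits is the longest leading run of
%% ASCII digits in String and Rest is the remainder.
leading_digits(String) ->
    string:take(String, lists:seq($0, $9)).
```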
Converts String to titlecase.
Example:
1> string:titlecase("ß is a SHARP s").
"Ss is a SHARP s"
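Argument String is expected to start with a valid text-represented float (the digits are ASCII values). Remaining characters in the string after the float are returned in Rest.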
Example:
> {F1,Fs} = string:to_float("1.0-1.0e-1"),
> {F2,[]} = string:to_float(Fs),
> F1+F2.
0.9
> string:to_float("3/2=1.5").
{error,no_float}
> string:to_float("-1.5eX").
{-1.5,"eX"}
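Argument String is expected to start with a valid text-represented integer (the digits are ASCII values). Remaining characters in the string after the integer are returned in Rest.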
Example:
> {I1,Is} = string:to_integer("33+22"),
> {I2,[]} = string:to_integer(Is),
> I1-I2.
11
> string:to_integer("0.5").
{0,".5"}
> string:to_integer("x=2").
{error,no_integer}
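Because unparsed characters are returned and not rejected, strict parsing requires checking that nothing is left over. This is a minimal sketch; the module name to_integer_sketch, the function parse_int/1, and the {trailing, Rest} error shape are hypothetical.

```erlang
-module(to_integer_sketch).
-export([parse_int/1]).

%% Parses String as an integer and rejects input with trailing characters.
parse_int(String) ->
    case string:to_integer(String) of
        {error, Reason} ->
            %% Matched first, since {Int, Rest} would also match this tuple.
            {error, Reason};
        {Int, Rest} ->
            case string:is_empty(Rest) of
                true -> {ok, Int};
                false -> {error, {trailing, Rest}}
            end
    end.
```

With this wrapper, "0.5" is reported as an error with ".5" as the unparsed remainder and is not silently read as 0.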
Converts String to a list of grapheme clusters.
Example:
1> string:to_graphemes("ß↑e̊").
[223,8593,[101,778]]
2> string:to_graphemes(<<"ß↑e̊"/utf8>>).
[223,8593,[101,778]]
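Returns a string, where leading or trailing, or both, Separators have been removed. Dir, which can be leading, trailing, or both, indicates from which direction blanks are to be removed.
Default Separators is the set of nonbreakable whitespace codepoints defined in Unicode Standard Annex #31. By default, Dir is both.
Notice that [$\r,$\n] is one grapheme cluster according to the Unicode Standard.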
Example:
1> string:trim("\t Hello \n").
"Hello"
2> string:trim(<<"\t Hello \n">>, leading).
<<"Hello \n">>
3> string:trim(<<".Hello.\n">>, trailing, "\n.").
<<".Hello">>
Converts String to uppercase.
See also titlecase/1.
Example:
1> string:uppercase("Michał").
"MICHAŁ"