This module provides functions for string processing.
A string in this module is represented by
"abcd" is a valid string
<<"abcd">> is a valid string
["abcd"] is a valid string
<<"abc..åäö"/utf8>> is a valid string
<<"abc..åäö">> is NOT a valid string,
but a binary with Latin-1-encoded codepoints
[<<"abc">>, "..åäö"] is a valid string
[atom] is NOT a valid string
This module operates on grapheme clusters. A grapheme cluster is a user-perceived character, which can be represented by several codepoints.
"å" [229] or [97, 778]
"e̊" [101, 778]
The string length of "ß↑e̊" is 3, even though it is represented by the
codepoints
Grapheme clusters for codepoints of class
Splitting and appending strings must be done on grapheme cluster borders. There is no verification that the results of appending strings are valid or normalized.
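As a rough illustration of the difference between codepoints and grapheme clusters, the sketch below compares a plain codepoint count with `string:length/1`, and uses `string:slice/3` to cut on a grapheme cluster border. The module name `gc_demo` and its function names are hypothetical, chosen only for this example.

```erlang
-module(gc_demo).
-export([codepoint_count/1, grapheme_count/1, first_graphemes/2]).

%% Counts codepoints after flattening any chardata to a list of codepoints.
codepoint_count(String) ->
    length(unicode:characters_to_list(String)).

%% Counts user-perceived characters (grapheme clusters).
grapheme_count(String) ->
    string:length(String).

%% Returns the first N grapheme clusters. string:slice/3 cuts on grapheme
%% cluster borders, so a base letter and its combining mark stay together.
first_graphemes(String, N) ->
    string:slice(String, 0, N).
```

For the codepoints `[101,778]` (the letter "e" followed by a combining ring), `gc_demo:codepoint_count/1` gives 2 while `gc_demo:grapheme_count/1` gives 1.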
Most of the functions expect all input to be normalized to one form,
see for example
Language or locale specific handling of input is not considered in any function.
The functions can crash for non-valid input strings. For example, the functions expect UTF-8 binaries but not all functions verify that all binaries are encoded correctly.
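Because not every function verifies its input, a caller may want to check binaries before passing them on. The following is a minimal sketch of one way to do that with `unicode:characters_to_binary/1,2`; the module name `utf8_check` and both function names are hypothetical and not part of this module.

```erlang
-module(utf8_check).
-export([ensure_utf8/1, from_latin1/1]).

%% Returns {ok, Bin} if Bin is valid UTF-8, otherwise the atom invalid.
%% unicode:characters_to_binary/1 returns a binary on success and an
%% {error, ...} or {incomplete, ...} tuple for badly encoded input.
ensure_utf8(Bin) when is_binary(Bin) ->
    case unicode:characters_to_binary(Bin) of
        Valid when is_binary(Valid) -> {ok, Valid};
        _ -> invalid
    end.

%% Re-encodes a binary of Latin-1 bytes as a UTF-8 binary.
from_latin1(Bin) when is_binary(Bin) ->
    unicode:characters_to_binary(Bin, latin1).
```

With this sketch, the Latin-1 bytes `<<229,228,246>>` are rejected by `ensure_utf8/1` but accepted once they have been passed through `from_latin1/1`.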
Unless otherwise specified the return value type is the same as the input type. That is, binary input returns binary output, list input returns a list output, and mixed input can return a mixed output.
1> string:trim(" sarah ").
"sarah"
2> string:trim(<<" sarah ">>).
<<"sarah">>
3> string:lexemes("foo bar", " ").
["foo","bar"]
4> string:lexemes(<<"foo bar">>, " ").
[<<"foo">>,<<"bar">>]
This module has been reworked in Erlang/OTP 20 to
handle
A user-perceived character, consisting of one or more codepoints.
Converts
Example:
1> string:casefold("Ω and ẞ SHARP S").
"ω and ss sharp s"
Returns a string where any trailing
Example:
182> string:chomp(<<"\nHello\n\n">>).
<<"\nHello">>
183> string:chomp("\nHello\r\r\n").
"\nHello\r"
Returns
If
If
By default,
Example:
1> string:equal("åäö", <<"åäö"/utf8>>).
true
2> string:equal("åäö", unicode:characters_to_nfd_binary("åäö")).
false
3> string:equal("åäö", unicode:characters_to_nfd_binary("ÅÄÖ"), true, nfc).
true
Removes anything before
By default,
Example:
1> string:find("ab..cd..ef", ".").
"..cd..ef"
2> string:find(<<"ab..cd..ef">>, "..", trailing).
<<"..ef">>
3> string:find(<<"ab..cd..ef">>, "x", leading).
nomatch
4> string:find("ab..cd..ef", "x", trailing).
nomatch
Returns
Example:
1> string:is_empty("foo").
false
2> string:is_empty(["",<<>>]).
true
Returns the number of grapheme clusters in
Example:
1> string:length("ß↑e̊").
3
2> string:length(<<195,159,226,134,145,101,204,138>>).
3
Returns a list of lexemes in
Notice that, as shown in this example, two or more
adjacent separator grapheme clusters in
Notice that
Example:
1> string:lexemes("abc de̊fxxghix jkl\r\nfoo", "x e" ++ [[$\r,$\n]]).
["abc","de̊f","ghi","jkl","foo"]
2> string:lexemes(<<"abc de̊fxxghix jkl\r\nfoo"/utf8>>, "x e" ++ [$\r,$\n]).
[<<"abc">>,<<"de̊f"/utf8>>,<<"ghi">>,<<"jkl\r\nfoo">>]
Converts
Notice that function
Example:
2> string:lowercase(string:uppercase("Michał")).
"michał"
Returns the first codepoint in
Example:
1> string:next_codepoint(unicode:characters_to_binary("e̊fg")).
[101|<<"̊fg"/utf8>>]
Returns the first grapheme cluster in
Example:
1> string:next_grapheme(unicode:characters_to_binary("e̊fg")).
["e̊"|<<"fg">>]
Returns lexeme number
Example:
1> string:nth_lexeme("abc.de̊f.ghiejkl", 3, ".e").
"ghi"
Pads
By default,
Example:
1> string:pad(<<"He̊llö"/utf8>>, 8).
[<<72,101,204,138,108,108,195,182>>,32,32,32]
2> io:format("'~ts'~n",[string:pad("He̊llö", 8, leading)]).
'   He̊llö'
3> io:format("'~ts'~n",[string:pad("He̊llö", 8, both)]).
' He̊llö  '
If
Example:
1> string:prefix(<<"prefix of string">>, "pre").
<<"fix of string">>
2> string:prefix("pre", "prefix").
nomatch
Replaces
Can be implemented as:
lists:join(Replacement, split(String, SearchPattern, Where)).
Example:
1> string:replace(<<"ab..cd..ef">>, "..", "*").
[<<"ab">>,"*",<<"cd..ef">>]
2> string:replace(<<"ab..cd..ef">>, "..", "*", all).
[<<"ab">>,"*",<<"cd">>,"*",<<"ef">>]
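The equivalence with `lists:join/2` and `string:split/3` stated above can be checked with a small helper. This is only a sketch: the module name `replace_demo` and the function `replace_via_split/4` are hypothetical names introduced for illustration.

```erlang
-module(replace_demo).
-export([replace_via_split/4]).

%% Builds the same kind of result as string:replace/4 from string:split/3
%% and lists:join/2, following the description in the text.
replace_via_split(String, SearchPattern, Replacement, Where) ->
    lists:join(Replacement, string:split(String, SearchPattern, Where)).
```

Note that the result is chardata rather than a flat string: the untouched pieces keep the type of the input, and the replacement is inserted between them as is.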
Returns the reverse list of the grapheme clusters in
Example:
1> Reverse = string:reverse(unicode:characters_to_nfd_binary("ÅÄÖ")).
[[79,776],[65,776],[65,778]]
2> io:format("~ts~n",[Reverse]).
ÖÄÅ
Returns a substring of
By default,
Example:
1> string:slice(<<"He̊llö Wörld"/utf8>>, 4).
<<"ö Wörld"/utf8>>
2> string:slice(["He̊llö ", <<"Wörld"/utf8>>], 4,4).
"ö Wö"
3> string:slice(["He̊llö ", <<"Wörld"/utf8>>], 4,50).
"ö Wörld"
Splits
Example:
0> string:split("ab..bc..cd", "..").
["ab","bc..cd"]
1> string:split(<<"ab..bc..cd">>, "..", trailing).
[<<"ab..bc">>,<<"cd">>]
2> string:split(<<"ab..bc....cd">>, "..", all).
[<<"ab">>,<<"bc">>,<<>>,<<"cd">>]
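Splitting with `all` keeps empty strings between adjacent separators, whereas `string:lexemes/2` treats a run of separators as one. The sketch below places the two side by side; the module name `split_demo` and the function names `fields/2` and `words/2` are hypothetical.

```erlang
-module(split_demo).
-export([fields/2, words/2]).

%% Keeps empty fields: every occurrence of Separator is a boundary.
fields(String, Separator) ->
    string:split(String, Separator, all).

%% Drops empty fields: adjacent separator grapheme clusters count as one.
words(String, SeparatorList) ->
    string:lexemes(String, SeparatorList).
```

For the input `"a,,b"` with separator `","`, `fields/2` returns three elements, the middle one empty, while `words/2` returns only `"a"` and `"b"`.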
Takes characters from
Example:
5> string:take("abc0z123", lists:seq($a,$z)).
{"abc","0z123"}
6> string:take(<<"abc0z123">>, lists:seq($0,$9), true, leading).
{<<"abc">>,<<"0z123">>}
7> string:take("abc0z123", lists:seq($0,$9), false, trailing).
{"abc0z","123"}
8> string:take(<<"abc0z123">>, lists:seq($a,$z), true, trailing).
{<<"abc0z">>,<<"123">>}
Converts
Example:
1> string:titlecase("ß is a SHARP s").
"Ss is a SHARP s"
Argument
Example:
> {F1,Fs} = string:to_float("1.0-1.0e-1"),
> {F2,[]} = string:to_float(Fs),
> F1+F2.
0.9
> string:to_float("3/2=1.5").
{error,no_float}
> string:to_float("-1.5eX").
{-1.5,"eX"}
Argument
Example:
> {I1,Is} = string:to_integer("33+22"),
> {I2,[]} = string:to_integer(Is),
> I1-I2.
11
> string:to_integer("0.5").
{0,".5"}
> string:to_integer("x=2").
{error,no_integer}
Converts
Example:
1> string:to_graphemes("ß↑e̊").
[223,8593,[101,778]]
2> string:to_graphemes(<<"ß↑e̊"/utf8>>).
[223,8593,[101,778]]
Returns a string, where leading or trailing, or both,
Default
Notice that
Example:
1> string:trim("\t Hello \n").
"Hello"
2> string:trim(<<"\t Hello \n">>, leading).
<<"Hello \n">>
3> string:trim(<<".Hello.\n">>, trailing, "\n.").
<<".Hello">>
Converts
See also
Example:
1> string:uppercase("Michał").
"MICHAŁ"
Here follow the functions of the old API. These functions only work on a list of Latin-1 characters.
The functions are kept for backward compatibility, but are not recommended. They will be deprecated in Erlang/OTP 21.
Any undocumented functions in
Returns a string, where
This function is
Returns a string consisting of
This function is
Returns the index of the first occurrence of
This function is
Concatenates
This function is
Returns a string containing
This function is
Returns the length of the maximum initial segment of
This function is
Example:
> string:cspan("\t abcdef", " \t").
0
Returns a string with the elements of
This function is
Example:
> join(["one", "two", "three"], ", ").
"one, two, three"
Returns
This function is
Example:
> string:left("Hello",10,$.).
"Hello....."
Returns the number of characters in
This function is
Returns the index of the last occurrence of
This function is
Returns
This function is
Example:
> string:right("Hello", 10, $.).
".....Hello"
Returns the position where the last occurrence of
This function is
Example:
> string:rstr(" Hello Hello World World ", "Hello World").
8
Returns the length of the maximum initial segment of
This function is
Example:
> string:span("\t    abcdef", " \t").
5
Returns the position where the first occurrence of
This function is
Example:
> string:str(" Hello Hello World World ", "Hello World").
8
Returns a string, where leading or trailing, or both, blanks or a
number of
This function is
Example:
> string:strip("...Hello.....", both, $.).
"Hello"
Returns a substring of
This function is
Example:
> sub_string("Hello World", 4, 8).
"lo Wo"
Returns a substring of
This function is
Example:
> substr("Hello World", 4, 5).
"lo Wo"
Returns the word in position
This function is
Example:
> string:sub_word(" Hello old boy !",3,$o).
"ld b"
The specified string or character is case-converted. Notice that the supported character set is ISO/IEC 8859-1 (also called Latin 1); all values outside this set are unchanged.
This function is
Returns a list of tokens in
Example:
> tokens("abc defxxghix jkl", "x ").
["abc", "def", "ghi", "jkl"]
Notice that, as shown in this example, two or more
adjacent separator characters in
This function is
Returns the number of words in
This function is
Example:
> words(" Hello old boy!", $o).
4
Some of the general string functions can seem to overlap each other. The reason is that this string package is the combination of two earlier packages and all functions of both packages have been retained.