This module contains functions for converting between different character representations. It converts between ISO Latin-1 characters and Unicode characters, but it can also convert between different Unicode encodings (like UTF-8, UTF-16, and UTF-32).
In binaries, the default Unicode encoding in Erlang is UTF-8, which is also the format in which built-in functions and libraries in OTP expect to find binary Unicode data. In lists, Unicode data is encoded as integers, where each integer represents one character and is simply the Unicode code point of that character.
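The two default representations can be illustrated with a minimal sketch. The module name `unicode_repr_sketch` and the wrapper names `as_list/1` and `as_binary/1` are hypothetical and exist only for illustration; the calls they wrap are `unicode:characters_to_list/1` and `unicode:characters_to_binary/1`.

```erlang
-module(unicode_repr_sketch).
-export([as_list/1, as_binary/1]).

%% Hypothetical wrapper: returns the data as a list of Unicode code points.
%% Binaries in the input are assumed to be UTF-8 encoded.
as_list(Data) ->
    unicode:characters_to_list(Data).

%% Hypothetical wrapper: returns the data as a UTF-8 encoded binary.
as_binary(Data) ->
    unicode:characters_to_binary(Data).
```

For example, code point 229 (U+00E5) is the single integer `229` in a list, and `as_binary([229])` gives the two-byte UTF-8 binary `<<195,165>>`; `as_list(<<195,165>>)` gives `[229]` back.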
Unicode encodings other than integers representing code points (in lists) or UTF-8 (in binaries) are referred to as "external encodings". The ISO Latin-1 encoding, in both binaries and lists, is referred to as latin1-encoding.
It is recommended to only use external encodings for communication with external entities where this is required. When working inside the Erlang/OTP environment, it is recommended to keep binaries in UTF-8 when representing Unicode characters. ISO Latin-1 encoding is supported both for backward compatibility and for communication with external entities not supporting Unicode character sets.
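As a rough sketch of this recommendation, data can be converted to UTF-8 when it enters the system and back to ISO Latin-1 only when an external entity requires it. The module name `border_sketch` and the function names `from_latin1/1` and `to_latin1/1` are hypothetical; the conversion itself uses `unicode:characters_to_binary/3`.

```erlang
-module(border_sketch).
-export([from_latin1/1, to_latin1/1]).

%% Hypothetical helper: incoming ISO Latin-1 bytes are converted to
%% UTF-8 once, at the system border.
from_latin1(Bin) when is_binary(Bin) ->
    unicode:characters_to_binary(Bin, latin1, utf8).

%% Hypothetical helper: outgoing UTF-8 data is converted back to
%% ISO Latin-1. Characters above 255 cannot be represented, in which
%% case an error tuple is returned instead of a binary.
to_latin1(Utf8) when is_binary(Utf8) ->
    unicode:characters_to_binary(Utf8, utf8, latin1).
```

Here `from_latin1(<<229>>)` gives `<<195,165>>` and `to_latin1(<<195,165>>)` gives `<<229>>`, so callers of `to_latin1/1` should be prepared to handle a non-binary result for characters outside the Latin-1 range.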
Programs should always operate on a normalized form and compare canonically equivalent Unicode characters as equal. All characters should therefore be normalized to one form once, at the system borders. One of the following functions can convert characters to their normalized forms 
A 
A 
A 
An 
Same as 
Same as 
Checks for a UTF Byte Order Mark (BOM) in the beginning of a binary. If the supplied binary 
If no BOM is found, the function returns 
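A minimal sketch of BOM detection, assuming `unicode:bom_to_encoding/1` returns a tuple of the detected encoding and the BOM length in bytes. The module name `bom_sketch` and the helper `strip_bom/1` are hypothetical.

```erlang
-module(bom_sketch).
-export([strip_bom/1]).

%% Hypothetical helper: detects a leading BOM and returns the detected
%% encoding together with the binary that follows the BOM.
strip_bom(Bin) when is_binary(Bin) ->
    {Encoding, Length} = unicode:bom_to_encoding(Bin),
    %% Skip exactly Length bytes; Length is 0 when no BOM is present.
    <<_Bom:Length/binary, Rest/binary>> = Bin,
    {Encoding, Rest}.
```

With this sketch, `strip_bom(<<239,187,191,"abc">>)` gives `{utf8,<<"abc">>}`, and a binary starting with the bytes 254,255 is reported as `{utf16,big}`.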
Same as 
Same as 
Behaves as 
Options:
An alias for 
An alias for 
An alias for 
The atoms 
Errors and exceptions occur as in
          
Same as 
Converts a possibly deep list of integers and binaries into a list of integers representing Unicode characters. The binaries in the input can have characters encoded as one of the following:
ISO Latin-1 (0-255, one character per byte). Here,
              case parameter 
One of the UTF-encodings, which is specified as parameter 
              
Only when 
If 
The purpose of the function is mainly to convert combinations of Unicode characters into a pure Unicode string in list representation for further processing. For writing the data to an external entity, the reverse function 
Option 
If the data cannot be converted, either because of illegal Unicode/ISO Latin-1 characters in the list, or because of invalid UTF encoding in any binaries, an error tuple is returned. The error tuple contains the tag 
However, if the input 
Errors occur for the following reasons:
Integers out of range.
If 
If 
An integer > 16#10FFFF (the maximum Unicode character)
An integer in the range 16#D800 to 16#DFFF (invalid range reserved for UTF-16 surrogate pairs)
Incorrect UTF encoding.
If 
Errors can occur for various reasons, including the following:
"Pure" decoding errors (like the upper bits of the bytes being wrong).
The bytes decode to a number that is too large.
The bytes decode to a code point in the invalid Unicode range.
The encoding is "overlong", meaning that the number should have been encoded in fewer bytes.
The case of a truncated UTF sequence is handled specially; see the paragraph about incomplete binaries below.
If 
A special type of error is when no actual invalid integers or
          bytes are found, but a trailing 
If one UTF character is split over two consecutive binaries in
          the 
Example:
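%% get_some_more_data/0 and handle_error/2 are application-defined.
decode_data(Data) ->
    case unicode:characters_to_list(Data, unicode) of
        {incomplete, Encoded, Rest} ->
            %% The input ends in a truncated character: fetch more
            %% input and retry with the leftover bytes prepended.
            More = get_some_more_data(),
            Encoded ++ decode_data([Rest, More]);
        {error, Encoded, Rest} ->
            handle_error(Encoded, Rest);
        List ->
            List
    end.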
However, bit strings that are not whole bytes are not allowed, so a UTF character must be split along 8-bit boundaries to ever be decoded.
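The difference between invalid and truncated input can be made concrete with a small sketch. The module name `decode_sketch`, the function `classify/1`, and the result tags `ok`, `invalid`, and `truncated` are hypothetical; only `unicode:characters_to_list/2` and its `error` and `incomplete` tuples come from this module.

```erlang
-module(decode_sketch).
-export([classify/1]).

%% Hypothetical helper: decodes UTF-8 chardata and labels the outcome.
classify(Data) ->
    case unicode:characters_to_list(Data, utf8) of
        {error, Decoded, Rest} ->
            %% Invalid integer or invalid UTF-8 bytes found in the input.
            {invalid, Decoded, Rest};
        {incomplete, Decoded, Rest} ->
            %% Input ended in the middle of a multi-byte character.
            {truncated, Decoded, Rest};
        List when is_list(List) ->
            {ok, List}
    end.
```

For instance, `classify(<<"a",195>>)` gives `{truncated,"a",<<195>>}` because byte 195 starts a two-byte sequence that is never finished, whereas the overlong sequence `<<192,128>>` and a surrogate code point such as `16#D800` in a list are both labelled `invalid`.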
A 
Converts a possibly deep list of characters and binaries into a normalized form of canonically equivalent composed characters (NFC), according to the Unicode standard.
Any binaries in the input must be utf8-encoded.
The result is a list of characters.
3> unicode:characters_to_nfc_list([<<"abc..a">>,[778],$a,[776],$o,[776]]).
"abc..åäö"
Converts a possibly deep list of characters and binaries into a normalized form of canonically equivalent composed characters (NFC), according to the Unicode standard.
Any binaries in the input must be utf8-encoded.
The result is a utf8-encoded binary.
4> unicode:characters_to_nfc_binary([<<"abc..a">>,[778],$a,[776],$o,[776]]).
<<"abc..åäö"/utf8>>
Converts a possibly deep list of characters and binaries into a normalized form of canonically equivalent decomposed characters (NFD), according to the Unicode standard.
Any binaries in the input must be utf8-encoded.
The result is a list of characters.
1> unicode:characters_to_nfd_list("abc..åäö").
[97,98,99,46,46,97,778,97,776,111,776]
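Normalization is what makes it possible to compare canonically equivalent strings as equal. The following is a minimal sketch of such a comparison; the module name `nfd_compare_sketch` and the function `equivalent/2` are hypothetical, and both arguments are assumed to be valid chardata.

```erlang
-module(nfd_compare_sketch).
-export([equivalent/2]).

%% Hypothetical helper: two strings are treated as equal when their
%% NFD forms are identical. Both inputs are assumed to be valid chardata.
equivalent(A, B) ->
    unicode:characters_to_nfd_list(A) =:= unicode:characters_to_nfd_list(B).
```

With this sketch, the precomposed character `[229]` and the sequence `[$a,778]` (the letter a followed by a combining ring above) compare as equal, although the two lists differ as plain Erlang terms.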
Converts a possibly deep list of characters and binaries into a normalized form of canonically equivalent decomposed characters (NFD), according to the Unicode standard.
Any binaries in the input must be utf8-encoded.
The result is a utf8-encoded binary.
2> unicode:characters_to_nfd_binary("abc..åäö").
<<97,98,99,46,46,97,204,138,97,204,136,111,204,136>>
Converts a possibly deep list of characters and binaries into a normalized form of compatibility-equivalent composed characters (NFKC), according to the Unicode standard.
Any binaries in the input must be utf8-encoded.
The result is a list of characters.
3> unicode:characters_to_nfkc_list([<<"abc..a">>,[778],$a,[776],$o,[776],[65299,65298]]).
"abc..åäö32"
Converts a possibly deep list of characters and binaries into a normalized form of compatibility-equivalent composed characters (NFKC), according to the Unicode standard.
Any binaries in the input must be utf8-encoded.
The result is a utf8-encoded binary.
4> unicode:characters_to_nfkc_binary([<<"abc..a">>,[778],$a,[776],$o,[776],[65299,65298]]).
<<"abc..åäö32"/utf8>>
Converts a possibly deep list of characters and binaries into a normalized form of compatibility-equivalent decomposed characters (NFKD), according to the Unicode standard.
Any binaries in the input must be utf8-encoded.
The result is a list of characters.
1> unicode:characters_to_nfkd_list(["abc..åäö",[65299,65298]]).
[97,98,99,46,46,97,778,97,776,111,776,51,50]
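The difference between canonical and compatibility normalization can be seen by running both on the same input. This is only a sketch; the module name `nfkd_sketch` and the function `forms/1` are hypothetical.

```erlang
-module(nfkd_sketch).
-export([forms/1]).

%% Hypothetical helper: returns the canonical (NFD) and the
%% compatibility (NFKD) decomposition of the same input.
forms(Data) ->
    {unicode:characters_to_nfd_list(Data),
     unicode:characters_to_nfkd_list(Data)}.
```

For the fullwidth digits `[65299,65298]`, `forms/1` gives `{[65299,65298],"32"}`: canonical decomposition leaves them untouched, while compatibility decomposition maps them to the ASCII digits.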
Converts a possibly deep list of characters and binaries into a normalized form of compatibility-equivalent decomposed characters (NFKD), according to the Unicode standard.
Any binaries in the input must be utf8-encoded.
The result is a utf8-encoded binary.
2> unicode:characters_to_nfkd_binary(["abc..åäö",[65299,65298]]).
<<97,98,99,46,46,97,204,138,97,204,136,111,204,136,51,50>>
Creates a UTF Byte Order Mark (BOM) as a binary from the supplied 
The function returns 
Notice that the BOM for UTF-8 is seldom used and is not really a byte order mark. UTF-8 has no byte order issues, so the BOM only serves to distinguish UTF-8 encoding from other UTF formats.
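A minimal sketch of writing data with a leading BOM, combining `unicode:encoding_to_bom/1` with `unicode:characters_to_binary/3`. The module name `bom_write_sketch` and the function `with_bom/2` are hypothetical.

```erlang
-module(bom_write_sketch).
-export([with_bom/2]).

%% Hypothetical helper: encodes Chars in the given encoding and
%% prepends the matching BOM.
with_bom(Chars, Encoding) ->
    Bom = unicode:encoding_to_bom(Encoding),
    case unicode:characters_to_binary(Chars, unicode, Encoding) of
        Bin when is_binary(Bin) ->
            {ok, <<Bom/binary, Bin/binary>>};
        Other ->
            %% An error or incomplete tuple is passed through unchanged.
            Other
    end.
```

With this sketch, `with_bom("a", {utf16,big})` gives `{ok,<<254,255,0,97>>}` and `with_bom("a", utf8)` gives `{ok,<<239,187,191,97>>}`. For `latin1` the BOM is the empty binary, so the encoded data is returned without a prefix.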