diff options
author | Hans Bolinder <[email protected]> | 2014-03-17 12:05:18 +0100 |
---|---|---|
committer | Hans Bolinder <[email protected]> | 2014-03-17 12:19:03 +0100 |
commit | 6be3e658d9999274d5c0c702bf799b90a110c745 (patch) | |
tree | f5768b4cf40ca1f02a80b2b5f7ee5fd94bf6be96 /system/doc/reference_manual/character_set.xml | |
parent | 237264bc018b0cc17afeac5d3f6030073f314f9d (diff) | |
download | otp-6be3e658d9999274d5c0c702bf799b90a110c745.tar.gz otp-6be3e658d9999274d5c0c702bf799b90a110c745.tar.bz2 otp-6be3e658d9999274d5c0c702bf799b90a110c745.zip |
Clarify the reference manual regarding source file encoding
Diffstat (limited to 'system/doc/reference_manual/character_set.xml')
-rw-r--r-- | system/doc/reference_manual/character_set.xml | 132 |
1 files changed, 132 insertions, 0 deletions
diff --git a/system/doc/reference_manual/character_set.xml b/system/doc/reference_manual/character_set.xml new file mode 100644 index 0000000000..884898eb34 --- /dev/null +++ b/system/doc/reference_manual/character_set.xml @@ -0,0 +1,132 @@ +<?xml version="1.0" encoding="utf-8" ?> +<!DOCTYPE chapter SYSTEM "chapter.dtd"> + +<chapter> + <header> + <copyright> + <year>2014</year><year>2014</year> + <holder>Ericsson AB. All Rights Reserved.</holder> + </copyright> + <legalnotice> + The contents of this file are subject to the Erlang Public License, + Version 1.1, (the "License"); you may not use this file except in + compliance with the License. You should have received a copy of the + Erlang Public License along with this software. If not, it can be + retrieved online at http://www.erlang.org/. + + Software distributed under the License is distributed on an "AS IS" + basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See + the License for the specific language governing rights and limitations + under the License. + + </legalnotice> + + <title>Character Set and Source File Encoding</title> + <prepared></prepared> + <docno></docno> + <date></date> + <rev></rev> + <file>character_set.xml</file> + </header> + + <section> + <title>Character Set</title> + <p>In Erlang 4.8/OTP R5A the syntax of Erlang tokens was extended to + allow the use of the full ISO-8859-1 (Latin-1) character set. This + is noticeable in the following ways:</p> + <list type="bulleted"> + <item> + <p>All the Latin-1 printable characters can be used and are + shown without the escape backslash convention.</p> + </item> + <item> + <p>Atoms and variables can use all Latin-1 letters.</p> + </item> + </list> + <table> + <row> + <cell align="left" valign="middle"><em>Octal</em></cell> + <cell align="left" valign="middle"><em>Decimal</em></cell> + <cell align="left" valign="middle"> </cell> + <cell align="left" valign="middle"><em>Class</em></cell> + </row> + <row> + <cell align="left" valign="middle">200 - 237</cell> + <cell align="left" valign="middle">128 - 159</cell> + <cell align="left" valign="middle"> </cell> + <cell align="left" valign="middle">Control characters</cell> + </row> + <row> + <cell align="left" valign="middle">240 - 277</cell> + <cell align="left" valign="middle">160 - 191</cell> + <cell align="right" valign="middle">- ¿</cell> + <cell align="left" valign="middle">Punctuation characters</cell> + </row> + <row> + <cell align="left" valign="middle">300 - 326</cell> + <cell align="left" valign="middle">192 - 214</cell> + <cell align="center" valign="middle">À - Ö</cell> + <cell align="left" valign="middle">Uppercase letters</cell> + </row> + <row> + <cell align="center" valign="middle">327</cell> + <cell align="center" valign="middle">215</cell> + <cell align="center" valign="middle">×</cell> + <cell align="left" valign="middle">Punctuation character</cell> + </row> + <row> + <cell align="left" valign="middle">330 - 336</cell> + <cell align="left" valign="middle">216 - 222</cell> + <cell align="center" valign="middle">Ø - Þ</cell> + <cell align="left" valign="middle">Uppercase letters</cell> + </row> + <row> + <cell align="left" valign="middle">337 - 366</cell> + <cell align="left" valign="middle">223 - 246</cell> + <cell align="center" valign="middle">ß - ö</cell> + <cell align="left" valign="middle">Lowercase letters</cell> + </row> + <row> + <cell align="center" valign="middle">367</cell> + <cell align="center" valign="middle">247</cell> + <cell align="center" valign="middle">÷</cell> + <cell align="left" valign="middle">Punctuation character</cell> + </row> + <row> + <cell align="left" valign="middle">370 - 377</cell> + <cell align="left" valign="middle">248 - 255</cell> + <cell align="center" valign="middle">ø - ÿ</cell> + <cell align="left" valign="middle">Lowercase letters</cell> + </row> + <tcaption>Character Classes.</tcaption> + </table> + <p>In Erlang/OTP R16B the syntax of Erlang tokens was extended to + handle Unicode. To begin with the support is limited to + strings, but Erlang/OTP 18 is expected to handle Unicode atoms + as well. More about the usage of Unicode in Erlang source files + can be found in <seealso + marker="stdlib:unicode_usage#unicode_in_erlang">STDLIB's User's + Guide</seealso>.</p> + </section> + <section> + <title>Source File Encoding</title> + <p>The Erlang source file <marker + id="encoding">encoding</marker> is selected by a + comment in one of the first two lines of the source file. The + first string that matches the regular expression + <c>coding\s*[:=]\s*([-a-zA-Z0-9])+</c> selects the encoding. If + the matching string is not a valid encoding it is ignored. The + valid encodings are <c>Latin-1</c> and <c>UTF-8</c> where the + case of the characters can be chosen freely.</p> + <p>The following example selects UTF-8 as default encoding:</p> + <pre> +%% coding: utf-8</pre> + <p>Two more examples, both selecting Latin-1 as default encoding:</p> + <pre> +%% For this file we have chosen encoding = Latin-1</pre> + <pre> +%% -*- coding: latin-1 -*-</pre> + <p>The default encoding for Erlang source files was changed from + Latin-1 to UTF-8 in Erlang OTP 17.0.</p> + </section> +</chapter> |