aboutsummaryrefslogtreecommitdiffstats
path: root/system/doc/reference_manual/character_set.xml
diff options
context:
space:
mode:
authorHans Bolinder <[email protected]>2014-03-17 12:05:18 +0100
committerHans Bolinder <[email protected]>2014-03-17 12:19:03 +0100
commit6be3e658d9999274d5c0c702bf799b90a110c745 (patch)
treef5768b4cf40ca1f02a80b2b5f7ee5fd94bf6be96 /system/doc/reference_manual/character_set.xml
parent237264bc018b0cc17afeac5d3f6030073f314f9d (diff)
downloadotp-6be3e658d9999274d5c0c702bf799b90a110c745.tar.gz
otp-6be3e658d9999274d5c0c702bf799b90a110c745.tar.bz2
otp-6be3e658d9999274d5c0c702bf799b90a110c745.zip
Clarify the reference manual regarding source file encoding
Diffstat (limited to 'system/doc/reference_manual/character_set.xml')
-rw-r--r--system/doc/reference_manual/character_set.xml132
1 files changed, 132 insertions, 0 deletions
diff --git a/system/doc/reference_manual/character_set.xml b/system/doc/reference_manual/character_set.xml
new file mode 100644
index 0000000000..884898eb34
--- /dev/null
+++ b/system/doc/reference_manual/character_set.xml
@@ -0,0 +1,132 @@
+<?xml version="1.0" encoding="utf-8" ?>
+<!DOCTYPE chapter SYSTEM "chapter.dtd">
+
+<chapter>
+ <header>
+ <copyright>
+ <year>2014</year><year>2014</year>
+ <holder>Ericsson AB. All Rights Reserved.</holder>
+ </copyright>
+ <legalnotice>
+ The contents of this file are subject to the Erlang Public License,
+ Version 1.1, (the "License"); you may not use this file except in
+ compliance with the License. You should have received a copy of the
+ Erlang Public License along with this software. If not, it can be
+ retrieved online at http://www.erlang.org/.
+
+ Software distributed under the License is distributed on an "AS IS"
+ basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See
+ the License for the specific language governing rights and limitations
+ under the License.
+
+ </legalnotice>
+
+ <title>Character Set and Source File Encoding</title>
+ <prepared></prepared>
+ <docno></docno>
+ <date></date>
+ <rev></rev>
+ <file>character_set.xml</file>
+ </header>
+
+ <section>
+ <title>Character Set</title>
+ <p>In Erlang 4.8/OTP R5A the syntax of Erlang tokens was extended to
+ allow the use of the full ISO-8859-1 (Latin-1) character set. This
+ is noticeable in the following ways:</p>
+ <list type="bulleted">
+ <item>
+ <p>All the Latin-1 printable characters can be used and are
+ shown without the escape backslash convention.</p>
+ </item>
+ <item>
+ <p>Atoms and variables can use all Latin-1 letters.</p>
+ </item>
+ </list>
+ <table>
+ <row>
+ <cell align="left" valign="middle"><em>Octal</em></cell>
+ <cell align="left" valign="middle"><em>Decimal</em></cell>
+ <cell align="left" valign="middle">&nbsp;</cell>
+ <cell align="left" valign="middle"><em>Class</em></cell>
+ </row>
+ <row>
+ <cell align="left" valign="middle">200 - 237</cell>
+ <cell align="left" valign="middle">128 - 159</cell>
+ <cell align="left" valign="middle">&nbsp;</cell>
+ <cell align="left" valign="middle">Control characters</cell>
+ </row>
+ <row>
+ <cell align="left" valign="middle">240 - 277</cell>
+ <cell align="left" valign="middle">160 - 191</cell>
+ <cell align="right" valign="middle">- &iquest;</cell>
+ <cell align="left" valign="middle">Punctuation characters</cell>
+ </row>
+ <row>
+ <cell align="left" valign="middle">300 - 326</cell>
+ <cell align="left" valign="middle">192 - 214</cell>
+ <cell align="center" valign="middle">&Agrave; - &Ouml;</cell>
+ <cell align="left" valign="middle">Uppercase letters</cell>
+ </row>
+ <row>
+ <cell align="center" valign="middle">327</cell>
+ <cell align="center" valign="middle">215</cell>
+ <cell align="center" valign="middle">&times;</cell>
+ <cell align="left" valign="middle">Punctuation character</cell>
+ </row>
+ <row>
+ <cell align="left" valign="middle">330 - 336</cell>
+ <cell align="left" valign="middle">216 - 222</cell>
+ <cell align="center" valign="middle">&Oslash; - &THORN;</cell>
+ <cell align="left" valign="middle">Uppercase letters</cell>
+ </row>
+ <row>
+ <cell align="left" valign="middle">337 - 366</cell>
+ <cell align="left" valign="middle">223 - 246</cell>
+ <cell align="center" valign="middle">&szlig; - &ouml;</cell>
+ <cell align="left" valign="middle">Lowercase letters</cell>
+ </row>
+ <row>
+ <cell align="center" valign="middle">367</cell>
+ <cell align="center" valign="middle">247</cell>
+ <cell align="center" valign="middle">&divide;</cell>
+ <cell align="left" valign="middle">Punctuation character</cell>
+ </row>
+ <row>
+ <cell align="left" valign="middle">370 - 377</cell>
+ <cell align="left" valign="middle">248 - 255</cell>
+ <cell align="center" valign="middle">&oslash; - &yuml;</cell>
+ <cell align="left" valign="middle">Lowercase letters</cell>
+ </row>
+ <tcaption>Character Classes.</tcaption>
+ </table>
+ <p>In Erlang/OTP R16B the syntax of Erlang tokens was extended to
+ handle Unicode. To begin with the support is limited to
+ strings, but Erlang/OTP 18 is expected to handle Unicode atoms
+ as well. More about the usage of Unicode in Erlang source files
+ can be found in <seealso
+ marker="stdlib:unicode_usage#unicode_in_erlang">STDLIB's User's
+ Guide</seealso>.</p>
+ </section>
+ <section>
+ <title>Source File Encoding</title>
+ <p>The Erlang source file <marker
+ id="encoding">encoding</marker> is selected by a
+ comment in one of the first two lines of the source file. The
+ first string that matches the regular expression
+ <c>coding\s*[:=]\s*([-a-zA-Z0-9])+</c> selects the encoding. If
+ the matching string is not a valid encoding it is ignored. The
+ valid encodings are <c>Latin-1</c> and <c>UTF-8</c> where the
+ case of the characters can be chosen freely.</p>
+ <p>The following example selects UTF-8 as default encoding:</p>
+ <pre>
+%% coding: utf-8</pre>
+ <p>Two more examples, both selecting Latin-1 as default encoding:</p>
+ <pre>
+%% For this file we have chosen encoding = Latin-1</pre>
+ <pre>
+%% -*- coding: latin-1 -*-</pre>
+ <p>The default encoding for Erlang source files was changed from
+ Latin-1 to UTF-8 in Erlang OTP 17.0.</p>
+ </section>
+</chapter>