1 files changed, 403 insertions, 0 deletions
diff --git a/system/doc/extensions/bit_syntax.xml b/system/doc/extensions/bit_syntax.xml
new file mode 100644
index 0000000000..d86f73cd9a
--- /dev/null
+++ b/system/doc/extensions/bit_syntax.xml
@@ -0,0 +1,403 @@
+<?xml version="1.0" encoding="latin1" ?>
+<!DOCTYPE chapter SYSTEM "chapter.dtd">
+
+<chapter>
+  <header>
+    <copyright>
+      <year>2000</year><year>2009</year>
+      <holder>Ericsson AB. All Rights Reserved.</holder>
+    </copyright>
+    <legalnotice>
+      The contents of this file are subject to the Erlang Public License,
+      Version 1.1, (the "License"); you may not use this file except in
+      compliance with the License. You should have received a copy of the
+      Erlang Public License along with this software. If not, it can be
+      retrieved online at http://www.erlang.org/.
+    
+      Software distributed under the License is distributed on an "AS IS"
+      basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See
+      the License for the specific language governing rights and limitations
+      under the License.
+    
+    </legalnotice>
+
+    <title>The Bit Syntax</title>
+    <prepared>Bj&ouml;rn Gustavsson</prepared>
+    <responsible>Bjarne D&auml;cker</responsible>
+    <docno>1</docno>
+    <approved>Bjarne D&auml;Ker</approved>
+    <checked></checked>
+    <date>00-06-21</date>
+    <rev>PA1</rev>
+    <file>bit_syntax.sgml</file>
+  </header>
+  <p>This section describes the "bit syntax" which was added to
+    the Erlang language in release 5.0 (R7).
+    Compared to the original bit syntax prototype
+    by Claes Wikstr&ouml;m and Tony Rogvall (presented on the
+    Erlang User's Conference 1999), this implementation differs
+    primarily in the following respects,
+    </p>
+  <list type="ordered">
+    <item>
+      <p>the character pairs '&lt;&lt;' and '&gt;&gt;' are used to delimit
+        a binary patterns and constructor (not '&lt;' and '&gt;' as in
+        the prototype),
+        </p>
+    </item>
+    <item>
+      <p>the tail syntax ('|Variable') has been eliminated,
+        </p>
+    </item>
+    <item>
+      <p>all size expressions must be bound,
+        </p>
+    </item>
+    <item>
+      <p>a type <c>unit:U</c> has been added,
+        </p>
+    </item>
+    <item>
+      <p>lists and tuples cannot be generated
+        </p>
+    </item>
+    <item>
+      <p>there are no paddings whatsoever.
+        </p>
+    </item>
+  </list>
+
+  <section>
+    <title>Introduction</title>
+    <p>In Erlang a Bin is used for constructing binaries and 
+      matching binary patterns. A Bin is written with the
+      following syntax:</p>
+    <code type="none"><![CDATA[
+      <<E1, E2, ... En>>
+    ]]></code>
+    <p>A Bin is a low-level sequence of bytes. The purpose of a
+      Bin is to be able to, from a high level,
+      <em>construct</em> a binary,
+      </p>
+    <code type="none"><![CDATA[
+      Bin = <<E1, E2, ... En>>
+    ]]></code>
+    <p>in which case all elements must be bound, or to
+      <em>match</em> a binary, 
+      </p>
+    <code type="none"><![CDATA[
+      <<E1, E2, ... En>> = Bin 
+    ]]></code>
+    <p>where <c>Bin</c> is bound, and where the elements are bound or unbound,
+      as in any match.
+      </p>
+    <p>Each element specifies a certain <em>segment</em> of the binary.
+      A segment is is a set of contiguous bits of the binary (not
+      necessarily on a byte boundary). The first element specifies
+      the initial segment, the second element specifies the following
+      segment etc.
+      </p>
+    <p>The following examples illustrate how binaries are constructed
+      or matched, and how elements and tails are specified.
+      </p>
+
+    <section>
+      <title>Examples</title>
+      <p><em>Example 1: </em>A binary can be constructed from a set of
+        constants or a string literal:</p>
+      <code type="none"><![CDATA[
+        Bin11 = <<1, 17, 42>>,
+        Bin12 = <<"abc">>
+      ]]></code>
+      <p>yields binaries of size 3; <c>binary_to_list(Bin11)</c>
+        evaluates to <c>[1, 17, 42]</c>, and
+        <c>binary_to_list(Bin12)</c> evaluates to <c>[97, 98, 99]</c>.
+        </p>
+      <p><em>Example 2: </em>Similarly, a binary can be constructed
+        from a set of bound variables:</p>
+      <code type="none"><![CDATA[
+        A = 1, B = 17, C = 42,
+        Bin2 = <<A, B, C:16>>
+      ]]></code>
+      <p>yields a binary of size 4, and <c>binary_to_list(Bin2)</c>
+        evaluates to <c>[1, 17, 00, 42]</c> too. Here we used a
+        <em>size expression</em> for the variable <c>C</c> in order to
+        specify a 16-bits segment of <c>Bin2</c>.
+        </p>
+      <p><em>Example 3: </em>A Bin can also be used for matching: if
+        <c>D</c>, <c>E</c>, and <c>F</c> are unbound variables, and
+        <c>Bin2</c> is bound as in the former example,</p>
+      <code type="none"><![CDATA[
+        <<D:16, E, F/binary>> = Bin2
+      ]]></code>
+      <p>yields <c>D = 273</c>, <c>E = 00</c>, and F binds to a binary
+        of size 1: <c>binary_to_list(F) = [42]</c>.
+        </p>
+      <p><em>Example 4: </em>The following is a more elaborate example
+        of matching, where <c>Dgram</c> is bound to the consecutive
+        bytes of an IP datagram of IP protocol version 4, and where we
+        want to extract the header and the data of the datagram:</p>
+      <code type="none"><![CDATA[
+        -define(IP_VERSION, 4).
+        -define(IP_MIN_HDR_LEN, 5).
+
+        DgramSize = byte_size(Dgram),
+        case Dgram of 
+        <<?IP_VERSION:4, HLen:4, SrvcType:8, TotLen:16, 
+        ID:16, Flgs:3, FragOff:13,
+        TTL:8, Proto:8, HdrChkSum:16,
+        SrcIP:32,
+        DestIP:32, RestDgram/binary>> when HLen >= 5, 4*HLen =< DgramSize ->
+        OptsLen = 4*(HLen - ?IP_MIN_HDR_LEN),
+        <<Opts:OptsLen/binary,Data/binary>> = RestDgram,
+        ...
+        end.
+      ]]></code>
+      <p>Here the segment corresponding to the <c>Opts</c> variable
+        has a <em>type modifier</em> specifying that <c>Opts</c> should
+        bind to a binary. All other variables have the default type
+        equal to unsigned integer.
+        </p>
+      <p>An IP datagram header is of variable length, and its length -
+        measured in the number of 32-bit words - is given in the segment
+        corresponding to <c>HLen</c>, the minimum value of which is
+        5. It is the segment corresponding to <c>Opts</c> that is
+        variable: if <c>HLen</c> is equal to 5, <c>Opts</c> will be an
+        empty binary.
+        </p>
+      <p>The tail variables <c>RestDgram</c> and <c>Data</c> bind to
+        binaries, as all tail variables do. Both may bind to empty
+        binaries.
+        </p>
+      <p>If the first 4-bits segment of <c>Dgram</c> is not equal to
+        4, or if <c>HLen</c> is less than 5, or if the size of
+        <c>Dgram</c> is less than <c>4*HLen</c>, the match of
+        <c>Dgram</c> fails.
+        </p>
+    </section>
+  </section>
+
+  <section>
+    <title>A Lexical Note</title>
+    <p>Note that "<c><![CDATA[B=<<1>>]]></c>" will be interpreted as
+      "<c><![CDATA[B =< ;<1>>]]></c>", which is a syntax error.
+      The correct way to write the expression is "<c><![CDATA[B = <<1>>]]></c>".</p>
+  </section>
+
+  <section>
+    <title>Segments</title>
+    <p>Each segment has the following general syntax:</p>
+    <p><c>Value:Size/TypeSpecifierList</c></p>
+    <p>Both the <c>Size</c> and the <c>TypeSpecifier</c> or both may be
+      omitted; thus the following variations are allowed:
+      </p>
+    <p><c>Value</c></p>
+    <p><c>Value:Size</c></p>
+    <p><c>Value/TypeSpecifierList</c></p>
+    <p>Default values will be used for missing specifications. The default
+      values are described in the section "Defaults" below.
+      </p>
+    <p>Used in binary construction, the <c>Value</c> part is any expression.
+      Used in binary matching, the <c>Value</c> part must be a literal or
+      variable. You can read more about the <c>Value</c> part in the
+      sections about constructing binaries and matching binaries.
+      </p>
+    <p>The <c>Size</c> part of the segment multiplied by the unit in the
+      <c>TypeSpecifierList</c> (described below) gives the number of bits
+      for the segment. In construction, <c>Size</c> is any expression that
+      evaluates to an integer. In matching, <c>Size</c> must be a constant
+      expression or a variable.
+      </p>
+    <p>The <c>TypeSpecifierList</c> is a list of type specifiers separated by
+      hyphens.
+      </p>
+    <taglist>
+      <tag>Type</tag>
+      <item>The type can be <c>integer</c>, <c>float</c>, or <c>binary</c>.</item>
+      <tag>Signedness</tag>
+      <item>The signedness specification can be either <c>signed</c>
+       or <c>unsigned</c>. Note that signedness only matters for matching.</item>
+      <tag>Endianness</tag>
+      <item>The endianness specification can be either <c>big</c>,
+      <c>little</c>, or <c>native</c>. Native-endian means that
+       the endian will be resolved at load time to be either
+       big-endian or little-endian, depending on what is "native"
+       for the CPU that the Erlang machine is run on.</item>
+      <tag>Unit</tag>
+      <item>The unit size is given as <c>unit:IntegerLiteral</c>.
+       The allowed range is 1-256. It will be multiplied by the <c>Size</c>
+       specifier to give the effective size of the segment.</item>
+    </taglist>
+    <p>Example:
+      </p>
+    <code type="none">
+X:4/little-signed-integer-unit:8
+    </code>
+    <p>This element has a total size of 4*8 = 32 bits, and it contains a
+      signed integer in little-endian order.</p>
+  </section>
+
+  <section>
+    <title>Defaults</title>
+    <p>The default type for a segment is <c>integer</c>. The default type
+      does <em>not</em> depend on the value, even if the value is a literal.
+      For instance, the default type in '<c><![CDATA[<<3.14>>]]></c>' is <c>integer</c>,
+      not <c>float</c>.
+      </p>
+    <p>The default <c>Size</c> depends on the type.
+      For <c>integer</c> it is 8. For <c>float</c> it is 64.
+      For <c>binary</c> it is all of the binary. In matching, this default
+      value is only valid for the very last element. All other binary elements
+      in matching must have a size specification.
+      </p>
+    <p>The default unit depends on the the type. 
+      For <c>integer</c> and <c>float</c> it is 1.
+      For <c>binary</c> it is 8.
+      </p>
+    <p>The default signedness is <c>unsigned</c>.
+      </p>
+    <p>The default endianness is <c>big</c>.</p>
+  </section>
+
+  <section>
+    <title>Constructing Binaries</title>
+    <p>This section describes the rules for constructing binaries using
+      the bit syntax. Unlike when constructing lists or tuples, the construction
+      of a binary can fail with a <c>badarg</c> exception.
+      </p>
+    <p>There can be zero or more segments in a binary to be constructed.
+      The expression '<c><![CDATA[<<>>]]></c>' constructs a zero length binary.
+      </p>
+    <p>Each segment in a binary can consist of zero or more bits.
+      There are no alignment rules for individual segments, but the total
+      number of bits in all segments must be evenly divisible by 8,
+      or in other words, the resulting binary must consist of a whole number
+      of bytes. An <c>badarg</c> exception will be thrown if the resulting
+      binary is not byte-aligned. Example:
+      </p>
+    <code type="none"><![CDATA[
+<<X:1,Y:6>>
+    ]]></code>
+    <p>The total number of bits is 7, which is not evenly divisible by 8;
+      thus, there will be <c>badarg</c> exception (and a compiler warning
+      as well). The following example
+      </p>
+    <code type="none"><![CDATA[
+<<X:1,Y:6,Z:1>>
+    ]]></code>
+    <p>will successfully construct a binary of 8 bits, or one byte. (Provided
+      that all of X, Y and Z are integers.)
+      </p>
+    <p>As noted earlier, segments have the following general syntax:
+      </p>
+    <p><c>Value:Size/TypeSpecifierList</c></p>
+    <p>When constructing binaries, <c>Value</c> and <c>Size</c> can be
+      any Erlang expression. However, for syntactical reasons,
+      both <c>Value</c> and <c>Size</c> must be enclosed in parenthesis
+      if the expression consists of anything more than a single literal
+      or variable. The following gives a compiler syntax error:
+      </p>
+    <code type="none"><![CDATA[
+<<X+1:8>>
+    ]]></code>
+    <p>This expression must be rewritten to
+      </p>
+    <code type="none"><![CDATA[
+<<(X+1):8>>
+    ]]></code>
+    <p>in order to be accepted by the compiler.
+      </p>
+
+    <section>
+      <title>Including Literal Strings</title>
+      <p>As syntactic sugar, an literal string may be written instead of a
+        element.</p>
+      <code type="none"><![CDATA[
+<<"hello">>      ]]></code>
+      <p>which is syntactic sugar for</p>
+      <code type="none"><![CDATA[
+<<$h,$e,$l,$l,$o>>      ]]></code>
+    </section>
+  </section>
+
+  <section>
+    <title>Matching Binaries</title>
+    <p>This section describes the rules for matching binaries using the
+      bit syntax.
+      </p>
+    <p>There can be zero or more segments in a binary binary pattern.
+      A binary pattern can occur in every place patterns are allowed, also
+      inside other patterns. Binary patterns cannot be nested.
+      </p>
+    <p>The pattern '<c><![CDATA[<<>>]]></c>' matches a zero length binary.
+      </p>
+    <p>Each segment in a binary can consist of zero or more bits.
+      </p>
+    <p>A segment of type <c>binary</c> must have a size evenly divisible by 8.
+      </p>
+    <p>This means that the following head will never match:</p>
+    <code type="none"><![CDATA[
+foo(<<X:7/binary,Y:1/binary>>) ->    ]]></code>
+    <p>As noted earlier, segments have the following general syntax:
+      </p>
+    <p><c>Value:Size/TypeSpecifierList</c></p>
+    <p>When matching <c>Value</c> value must be either a variable or an integer
+      or floating point literal. Expressions are not allowed.
+      </p>
+    <p><c>Size</c> must be an integer literal, or a previously bound variable.
+      Note that the following is not allowed:</p>
+    <code type="none"><![CDATA[
+foo(N, <<X:N,T/binary>>) ->
+   {X,T}.    ]]></code>
+    <p>The two occurrences of <c>N</c> are not related. The compiler
+      will complain that the <c>N</c> in the size field is unbound.
+      </p>
+    <p>The correct way to write this example is like this:</p>
+    <code type="none"><![CDATA[
+foo(N, Bin) ->
+   <<X:N,T/binary>> = Bin,
+   {X,T}.    ]]></code>
+
+    <section>
+      <title>Getting the Rest of the Binary</title>
+      <p>To match out the rest of binary, specify a binary field without size:</p>
+      <code type="none"><![CDATA[
+foo(<<A:8,Rest/binary>>) ->      ]]></code>
+      <p>As always, the size of the tail must be evenly divisible by 8.
+        </p>
+    </section>
+  </section>
+
+  <section>
+    <title>Traps and Pitfalls</title>
+    <p>Assume that we need a function that creates a binary out of a
+      list of triples of integers. A first (inefficient) version of such
+      a function could look like this:</p>
+    <code type="none"><![CDATA[
+triples_to_bin(T) ->
+    triples_to_bin(T, <<>>).
+
+triples_to_bin([{X,Y,Z} | T], Acc) ->
+    triples_to_bin(T, <<Acc/binary, X:32, Y:32, Z:32>>);   % inefficient
+triples_to_bin([], Acc) -> 
+    Acc.    ]]></code>
+    <p>The reason for the inefficiency of this function is that for
+      each triple, the binary constructed so far (<c>Acc</c>) is copied.
+      (Note: The original bit syntax prototype avoided the copy operation
+      by using segmented binaries, which are not implemented in R7.)
+      </p>
+    <p>The efficient way to write this function in R7 is:</p>
+    <code type="none"><![CDATA[
+triples_to_bin(T) ->
+    triples_to_bin(T, []).
+
+triples_to_bin([{X,Y,Z} | T], Acc) ->
+    triples_to_bin(T, [<<X:32, Y:32, Z:32>> | Acc]);
+triples_to_bin([], Acc) -> 
+    list_to_binary(lists:reverse(Acc)).    ]]></code>
+    <p>Note that <c>list_to_binary/1</c> handles deep lists of binaries
+      and small integers. (This fact was previously undocumented.)
+      </p>
+  </section>
+</chapter>
+