diff options
Diffstat (limited to 'system/doc/extensions/bit_syntax.xml')
-rw-r--r-- | system/doc/extensions/bit_syntax.xml | 403 |
1 files changed, 0 insertions, 403 deletions
diff --git a/system/doc/extensions/bit_syntax.xml b/system/doc/extensions/bit_syntax.xml deleted file mode 100644 index d86f73cd9a..0000000000 --- a/system/doc/extensions/bit_syntax.xml +++ /dev/null @@ -1,403 +0,0 @@ -<?xml version="1.0" encoding="latin1" ?> -<!DOCTYPE chapter SYSTEM "chapter.dtd"> - -<chapter> - <header> - <copyright> - <year>2000</year><year>2009</year> - <holder>Ericsson AB. All Rights Reserved.</holder> - </copyright> - <legalnotice> - The contents of this file are subject to the Erlang Public License, - Version 1.1, (the "License"); you may not use this file except in - compliance with the License. You should have received a copy of the - Erlang Public License along with this software. If not, it can be - retrieved online at http://www.erlang.org/. - - Software distributed under the License is distributed on an "AS IS" - basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See - the License for the specific language governing rights and limitations - under the License. - - </legalnotice> - - <title>The Bit Syntax</title> - <prepared>Björn Gustavsson</prepared> - <responsible>Bjarne Däcker</responsible> - <docno>1</docno> - <approved>Bjarne DäKer</approved> - <checked></checked> - <date>00-06-21</date> - <rev>PA1</rev> - <file>bit_syntax.sgml</file> - </header> - <p>This section describes the "bit syntax" which was added to - the Erlang language in release 5.0 (R7). - Compared to the original bit syntax prototype - by Claes Wikström and Tony Rogvall (presented on the - Erlang User's Conference 1999), this implementation differs - primarily in the following respects, - </p> - <list type="ordered"> - <item> - <p>the character pairs '<<' and '>>' are used to delimit - a binary patterns and constructor (not '<' and '>' as in - the prototype), - </p> - </item> - <item> - <p>the tail syntax ('|Variable') has been eliminated, - </p> - </item> - <item> - <p>all size expressions must be bound, - </p> - </item> - <item> - <p>a type <c>unit:U</c> has been added, - </p> - </item> - <item> - <p>lists and tuples cannot be generated - </p> - </item> - <item> - <p>there are no paddings whatsoever. - </p> - </item> - </list> - - <section> - <title>Introduction</title> - <p>In Erlang a Bin is used for constructing binaries and - matching binary patterns. A Bin is written with the - following syntax:</p> - <code type="none"><![CDATA[ - <<E1, E2, ... En>> - ]]></code> - <p>A Bin is a low-level sequence of bytes. The purpose of a - Bin is to be able to, from a high level, - <em>construct</em> a binary, - </p> - <code type="none"><![CDATA[ - Bin = <<E1, E2, ... En>> - ]]></code> - <p>in which case all elements must be bound, or to - <em>match</em> a binary, - </p> - <code type="none"><![CDATA[ - <<E1, E2, ... En>> = Bin - ]]></code> - <p>where <c>Bin</c> is bound, and where the elements are bound or unbound, - as in any match. - </p> - <p>Each element specifies a certain <em>segment</em> of the binary. - A segment is is a set of contiguous bits of the binary (not - necessarily on a byte boundary). The first element specifies - the initial segment, the second element specifies the following - segment etc. - </p> - <p>The following examples illustrate how binaries are constructed - or matched, and how elements and tails are specified. - </p> - - <section> - <title>Examples</title> - <p><em>Example 1: </em>A binary can be constructed from a set of - constants or a string literal:</p> - <code type="none"><![CDATA[ - Bin11 = <<1, 17, 42>>, - Bin12 = <<"abc">> - ]]></code> - <p>yields binaries of size 3; <c>binary_to_list(Bin11)</c> - evaluates to <c>[1, 17, 42]</c>, and - <c>binary_to_list(Bin12)</c> evaluates to <c>[97, 98, 99]</c>. - </p> - <p><em>Example 2: </em>Similarly, a binary can be constructed - from a set of bound variables:</p> - <code type="none"><![CDATA[ - A = 1, B = 17, C = 42, - Bin2 = <<A, B, C:16>> - ]]></code> - <p>yields a binary of size 4, and <c>binary_to_list(Bin2)</c> - evaluates to <c>[1, 17, 00, 42]</c> too. Here we used a - <em>size expression</em> for the variable <c>C</c> in order to - specify a 16-bits segment of <c>Bin2</c>. - </p> - <p><em>Example 3: </em>A Bin can also be used for matching: if - <c>D</c>, <c>E</c>, and <c>F</c> are unbound variables, and - <c>Bin2</c> is bound as in the former example,</p> - <code type="none"><![CDATA[ - <<D:16, E, F/binary>> = Bin2 - ]]></code> - <p>yields <c>D = 273</c>, <c>E = 00</c>, and F binds to a binary - of size 1: <c>binary_to_list(F) = [42]</c>. - </p> - <p><em>Example 4: </em>The following is a more elaborate example - of matching, where <c>Dgram</c> is bound to the consecutive - bytes of an IP datagram of IP protocol version 4, and where we - want to extract the header and the data of the datagram:</p> - <code type="none"><![CDATA[ - -define(IP_VERSION, 4). - -define(IP_MIN_HDR_LEN, 5). - - DgramSize = byte_size(Dgram), - case Dgram of - <<?IP_VERSION:4, HLen:4, SrvcType:8, TotLen:16, - ID:16, Flgs:3, FragOff:13, - TTL:8, Proto:8, HdrChkSum:16, - SrcIP:32, - DestIP:32, RestDgram/binary>> when HLen >= 5, 4*HLen =< DgramSize -> - OptsLen = 4*(HLen - ?IP_MIN_HDR_LEN), - <<Opts:OptsLen/binary,Data/binary>> = RestDgram, - ... - end. - ]]></code> - <p>Here the segment corresponding to the <c>Opts</c> variable - has a <em>type modifier</em> specifying that <c>Opts</c> should - bind to a binary. All other variables have the default type - equal to unsigned integer. - </p> - <p>An IP datagram header is of variable length, and its length - - measured in the number of 32-bit words - is given in the segment - corresponding to <c>HLen</c>, the minimum value of which is - 5. It is the segment corresponding to <c>Opts</c> that is - variable: if <c>HLen</c> is equal to 5, <c>Opts</c> will be an - empty binary. - </p> - <p>The tail variables <c>RestDgram</c> and <c>Data</c> bind to - binaries, as all tail variables do. Both may bind to empty - binaries. - </p> - <p>If the first 4-bits segment of <c>Dgram</c> is not equal to - 4, or if <c>HLen</c> is less than 5, or if the size of - <c>Dgram</c> is less than <c>4*HLen</c>, the match of - <c>Dgram</c> fails. - </p> - </section> - </section> - - <section> - <title>A Lexical Note</title> - <p>Note that "<c><![CDATA[B=<<1>>]]></c>" will be interpreted as - "<c><![CDATA[B =< ;<1>>]]></c>", which is a syntax error. - The correct way to write the expression is "<c><![CDATA[B = <<1>>]]></c>".</p> - </section> - - <section> - <title>Segments</title> - <p>Each segment has the following general syntax:</p> - <p><c>Value:Size/TypeSpecifierList</c></p> - <p>Both the <c>Size</c> and the <c>TypeSpecifier</c> or both may be - omitted; thus the following variations are allowed: - </p> - <p><c>Value</c></p> - <p><c>Value:Size</c></p> - <p><c>Value/TypeSpecifierList</c></p> - <p>Default values will be used for missing specifications. The default - values are described in the section "Defaults" below. - </p> - <p>Used in binary construction, the <c>Value</c> part is any expression. - Used in binary matching, the <c>Value</c> part must be a literal or - variable. You can read more about the <c>Value</c> part in the - sections about constructing binaries and matching binaries. - </p> - <p>The <c>Size</c> part of the segment multiplied by the unit in the - <c>TypeSpecifierList</c> (described below) gives the number of bits - for the segment. In construction, <c>Size</c> is any expression that - evaluates to an integer. In matching, <c>Size</c> must be a constant - expression or a variable. - </p> - <p>The <c>TypeSpecifierList</c> is a list of type specifiers separated by - hyphens. - </p> - <taglist> - <tag>Type</tag> - <item>The type can be <c>integer</c>, <c>float</c>, or <c>binary</c>.</item> - <tag>Signedness</tag> - <item>The signedness specification can be either <c>signed</c> - or <c>unsigned</c>. Note that signedness only matters for matching.</item> - <tag>Endianness</tag> - <item>The endianness specification can be either <c>big</c>, - <c>little</c>, or <c>native</c>. Native-endian means that - the endian will be resolved at load time to be either - big-endian or little-endian, depending on what is "native" - for the CPU that the Erlang machine is run on.</item> - <tag>Unit</tag> - <item>The unit size is given as <c>unit:IntegerLiteral</c>. - The allowed range is 1-256. It will be multiplied by the <c>Size</c> - specifier to give the effective size of the segment.</item> - </taglist> - <p>Example: - </p> - <code type="none"> -X:4/little-signed-integer-unit:8 - </code> - <p>This element has a total size of 4*8 = 32 bits, and it contains a - signed integer in little-endian order.</p> - </section> - - <section> - <title>Defaults</title> - <p>The default type for a segment is <c>integer</c>. The default type - does <em>not</em> depend on the value, even if the value is a literal. - For instance, the default type in '<c><![CDATA[<<3.14>>]]></c>' is <c>integer</c>, - not <c>float</c>. - </p> - <p>The default <c>Size</c> depends on the type. - For <c>integer</c> it is 8. For <c>float</c> it is 64. - For <c>binary</c> it is all of the binary. In matching, this default - value is only valid for the very last element. All other binary elements - in matching must have a size specification. - </p> - <p>The default unit depends on the the type. - For <c>integer</c> and <c>float</c> it is 1. - For <c>binary</c> it is 8. - </p> - <p>The default signedness is <c>unsigned</c>. - </p> - <p>The default endianness is <c>big</c>.</p> - </section> - - <section> - <title>Constructing Binaries</title> - <p>This section describes the rules for constructing binaries using - the bit syntax. Unlike when constructing lists or tuples, the construction - of a binary can fail with a <c>badarg</c> exception. - </p> - <p>There can be zero or more segments in a binary to be constructed. - The expression '<c><![CDATA[<<>>]]></c>' constructs a zero length binary. - </p> - <p>Each segment in a binary can consist of zero or more bits. - There are no alignment rules for individual segments, but the total - number of bits in all segments must be evenly divisible by 8, - or in other words, the resulting binary must consist of a whole number - of bytes. An <c>badarg</c> exception will be thrown if the resulting - binary is not byte-aligned. Example: - </p> - <code type="none"><![CDATA[ -<<X:1,Y:6>> - ]]></code> - <p>The total number of bits is 7, which is not evenly divisible by 8; - thus, there will be <c>badarg</c> exception (and a compiler warning - as well). The following example - </p> - <code type="none"><![CDATA[ -<<X:1,Y:6,Z:1>> - ]]></code> - <p>will successfully construct a binary of 8 bits, or one byte. (Provided - that all of X, Y and Z are integers.) - </p> - <p>As noted earlier, segments have the following general syntax: - </p> - <p><c>Value:Size/TypeSpecifierList</c></p> - <p>When constructing binaries, <c>Value</c> and <c>Size</c> can be - any Erlang expression. However, for syntactical reasons, - both <c>Value</c> and <c>Size</c> must be enclosed in parenthesis - if the expression consists of anything more than a single literal - or variable. The following gives a compiler syntax error: - </p> - <code type="none"><![CDATA[ -<<X+1:8>> - ]]></code> - <p>This expression must be rewritten to - </p> - <code type="none"><![CDATA[ -<<(X+1):8>> - ]]></code> - <p>in order to be accepted by the compiler. - </p> - - <section> - <title>Including Literal Strings</title> - <p>As syntactic sugar, an literal string may be written instead of a - element.</p> - <code type="none"><![CDATA[ -<<"hello">> ]]></code> - <p>which is syntactic sugar for</p> - <code type="none"><![CDATA[ -<<$h,$e,$l,$l,$o>> ]]></code> - </section> - </section> - - <section> - <title>Matching Binaries</title> - <p>This section describes the rules for matching binaries using the - bit syntax. - </p> - <p>There can be zero or more segments in a binary binary pattern. - A binary pattern can occur in every place patterns are allowed, also - inside other patterns. Binary patterns cannot be nested. - </p> - <p>The pattern '<c><![CDATA[<<>>]]></c>' matches a zero length binary. - </p> - <p>Each segment in a binary can consist of zero or more bits. - </p> - <p>A segment of type <c>binary</c> must have a size evenly divisible by 8. - </p> - <p>This means that the following head will never match:</p> - <code type="none"><![CDATA[ -foo(<<X:7/binary,Y:1/binary>>) -> ]]></code> - <p>As noted earlier, segments have the following general syntax: - </p> - <p><c>Value:Size/TypeSpecifierList</c></p> - <p>When matching <c>Value</c> value must be either a variable or an integer - or floating point literal. Expressions are not allowed. - </p> - <p><c>Size</c> must be an integer literal, or a previously bound variable. - Note that the following is not allowed:</p> - <code type="none"><![CDATA[ -foo(N, <<X:N,T/binary>>) -> - {X,T}. ]]></code> - <p>The two occurrences of <c>N</c> are not related. The compiler - will complain that the <c>N</c> in the size field is unbound. - </p> - <p>The correct way to write this example is like this:</p> - <code type="none"><![CDATA[ -foo(N, Bin) -> - <<X:N,T/binary>> = Bin, - {X,T}. ]]></code> - - <section> - <title>Getting the Rest of the Binary</title> - <p>To match out the rest of binary, specify a binary field without size:</p> - <code type="none"><![CDATA[ -foo(<<A:8,Rest/binary>>) -> ]]></code> - <p>As always, the size of the tail must be evenly divisible by 8. - </p> - </section> - </section> - - <section> - <title>Traps and Pitfalls</title> - <p>Assume that we need a function that creates a binary out of a - list of triples of integers. A first (inefficient) version of such - a function could look like this:</p> - <code type="none"><![CDATA[ -triples_to_bin(T) -> - triples_to_bin(T, <<>>). - -triples_to_bin([{X,Y,Z} | T], Acc) -> - triples_to_bin(T, <<Acc/binary, X:32, Y:32, Z:32>>); % inefficient -triples_to_bin([], Acc) -> - Acc. ]]></code> - <p>The reason for the inefficiency of this function is that for - each triple, the binary constructed so far (<c>Acc</c>) is copied. - (Note: The original bit syntax prototype avoided the copy operation - by using segmented binaries, which are not implemented in R7.) - </p> - <p>The efficient way to write this function in R7 is:</p> - <code type="none"><![CDATA[ -triples_to_bin(T) -> - triples_to_bin(T, []). - -triples_to_bin([{X,Y,Z} | T], Acc) -> - triples_to_bin(T, [<<X:32, Y:32, Z:32>> | Acc]); -triples_to_bin([], Acc) -> - list_to_binary(lists:reverse(Acc)). ]]></code> - <p>Note that <c>list_to_binary/1</c> handles deep lists of binaries - and small integers. (This fact was previously undocumented.) - </p> - </section> -</chapter> - |