diff options
Diffstat (limited to 'system/doc/programming_examples/bit_syntax.xml')
-rw-r--r-- | system/doc/programming_examples/bit_syntax.xml | 327 |
1 files changed, 327 insertions, 0 deletions
diff --git a/system/doc/programming_examples/bit_syntax.xml b/system/doc/programming_examples/bit_syntax.xml new file mode 100644 index 0000000000..3306365c0e --- /dev/null +++ b/system/doc/programming_examples/bit_syntax.xml @@ -0,0 +1,327 @@ +<?xml version="1.0" encoding="latin1" ?> +<!DOCTYPE chapter SYSTEM "chapter.dtd"> + +<chapter> + <header> + <copyright> + <year>2003</year><year>2009</year> + <holder>Ericsson AB. All Rights Reserved.</holder> + </copyright> + <legalnotice> + The contents of this file are subject to the Erlang Public License, + Version 1.1, (the "License"); you may not use this file except in + compliance with the License. You should have received a copy of the + Erlang Public License along with this software. If not, it can be + retrieved online at http://www.erlang.org/. + + Software distributed under the License is distributed on an "AS IS" + basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See + the License for the specific language governing rights and limitations + under the License. + + </legalnotice> + + <title>Bit Syntax</title> + <prepared></prepared> + <docno></docno> + <date></date> + <rev></rev> + <file>bit_syntax.xml</file> + </header> + + <section> + <title>Introduction</title> + <p>In Erlang a Bin is used for constructing binaries and matching + binary patterns. A Bin is written with the following syntax:</p> + <code type="none"><![CDATA[ + <<E1, E2, ... En>>]]></code> + <p>A Bin is a low-level sequence of bits or bytes. The purpose of a Bin is + to be able to, from a high level, construct a binary,</p> + <code type="none"><![CDATA[ +Bin = <<E1, E2, ... En>>]]></code> + <p>in which case all elements must be bound, or to match a binary,</p> + <code type="none"><![CDATA[ +<<E1, E2, ... En>> = Bin ]]></code> + <p>where <c>Bin</c> is bound, and where the elements are bound or + unbound, as in any match.</p> + <p>In R12B, a Bin need not consist of a whole number of bytes.</p> + + <p>A <em>bitstring</em> is a sequence of zero or more bits, where + the number of bits doesn't need to be divisible by 8. If the number + of bits is divisible by 8, the bitstring is also a binary.</p> + <p>Each element specifies a certain <em>segment</em> of the bitstring. + A segment is a set of contiguous bits of the binary (not + necessarily on a byte boundary). The first element specifies + the initial segment, the second element specifies the following + segment etc.</p> + <p>The following examples illustrate how binaries are constructed + or matched, and how elements and tails are specified.</p> + + <section> + <title>Examples</title> + <p><em>Example 1: </em>A binary can be constructed from a set of + constants or a string literal:</p> + <code type="none"><![CDATA[ +Bin11 = <<1, 17, 42>>, +Bin12 = <<"abc">>]]></code> + <p>yields binaries of size 3; <c>binary_to_list(Bin11)</c> + evaluates to <c>[1, 17, 42]</c>, and + <c>binary_to_list(Bin12)</c> evaluates to <c>[97, 98, 99]</c>.</p> + <p><em>Example 2: </em>Similarly, a binary can be constructed + from a set of bound variables:</p> + <code type="none"><![CDATA[ +A = 1, B = 17, C = 42, +Bin2 = <<A, B, C:16>>]]></code> + <p>yields a binary of size 4, and <c>binary_to_list(Bin2)</c> + evaluates to <c>[1, 17, 00, 42]</c> too. Here we used a + <em>size expression</em> for the variable <c>C</c> in order to + specify a 16-bits segment of <c>Bin2</c>.</p> + <p><em>Example 3: </em>A Bin can also be used for matching: if + <c>D</c>, <c>E</c>, and <c>F</c> are unbound variables, and + <c>Bin2</c> is bound as in the former example,</p> + <code type="none"><![CDATA[ +<<D:16, E, F/binary>> = Bin2]]></code> + <p>yields <c>D = 273</c>, <c>E = 00</c>, and F binds to a binary + of size 1: <c>binary_to_list(F) = [42]</c>.</p> + <p><em>Example 4:</em> The following is a more elaborate example + of matching, where <c>Dgram</c> is bound to the consecutive + bytes of an IP datagram of IP protocol version 4, and where we + want to extract the header and the data of the datagram:</p> + <code type="none"><![CDATA[ +-define(IP_VERSION, 4). +-define(IP_MIN_HDR_LEN, 5). + +DgramSize = byte_size(Dgram), +case Dgram of + <<?IP_VERSION:4, HLen:4, SrvcType:8, TotLen:16, + ID:16, Flgs:3, FragOff:13, + TTL:8, Proto:8, HdrChkSum:16, + SrcIP:32, + DestIP:32, RestDgram/binary>> when HLen>=5, 4*HLen=<DgramSize -> + OptsLen = 4*(HLen - ?IP_MIN_HDR_LEN), + <<Opts:OptsLen/binary,Data/binary>> = RestDgram, + ... +end.]]></code> + <p>Here the segment corresponding to the <c>Opts</c> variable + has a <em>type modifier</em> specifying that <c>Opts</c> should + bind to a binary. All other variables have the default type + equal to unsigned integer.</p> + <p>An IP datagram header is of variable length, and its length - + measured in the number of 32-bit words - is given in + the segment corresponding to <c>HLen</c>, the minimum value of + which is 5. It is the segment corresponding to <c>Opts</c> + that is variable: if <c>HLen</c> is equal to 5, <c>Opts</c> + will be an empty binary.</p> + <p>The tail variables <c>RestDgram</c> and <c>Data</c> bind to + binaries, as all tail variables do. Both may bind to empty + binaries.</p> + <p>If the first 4-bits segment of <c>Dgram</c> is not equal to + 4, or if <c>HLen</c> is less than 5, or if the size of + <c>Dgram</c> is less than <c>4*HLen</c>, the match of + <c>Dgram</c> fails.</p> + </section> + </section> + + <section> + <title>A Lexical Note</title> + <p>Note that "<c><![CDATA[B=<<1>>]]></c>" will be interpreted as + "<c><![CDATA[B =< <1>>]]></c>", which is a syntax error. + The correct way to write the expression is + "<c><![CDATA[B = <<1>>]]></c>".</p> + </section> + + <section> + <title>Segments</title> + <p>Each segment has the following general syntax:</p> + <p><c>Value:Size/TypeSpecifierList</c></p> + <p>Both the <c>Size</c> and the <c>TypeSpecifier</c> or both may be + omitted; thus the following variations are allowed:</p> + <p><c>Value</c></p> + <p><c>Value:Size</c></p> + <p><c>Value/TypeSpecifierList</c></p> + <p>Default values will be used for missing specifications. + The default values are described in the section + <seealso marker="#Defaults">Defaults</seealso>.</p> + <p>Used in binary construction, the <c>Value</c> part is any + expression. Used in binary matching, the <c>Value</c> part must + be a literal or variable. You can read more about + the <c>Value</c> part in the section about constructing + binaries and matching binaries.</p> + <p>The <c>Size</c> part of the segment multiplied by the unit in + the <c>TypeSpecifierList</c> (described below) gives the number + of bits for the segment. In construction, <c>Size</c> is any + expression that evaluates to an integer. In matching, + <c>Size</c> must be a constant expression or a variable.</p> + <p>The <c>TypeSpecifierList</c> is a list of type specifiers + separated by hyphens.</p> + <taglist> + <tag>Type</tag> + <item>The type can be <c>integer</c>, <c>float</c>, or + <c>binary</c>.</item> + <tag>Signedness</tag> + <item>The signedness specification can be either <c>signed</c> + or <c>unsigned</c>. Note that signedness only matters for + matching.</item> + <tag>Endianness</tag> + <item>The endianness specification can be either <c>big</c>, + <c>little</c>, or <c>native</c>. Native-endian means that + the endian will be resolved at load time to be either + big-endian or little-endian, depending on what is "native" + for the CPU that the Erlang machine is run on.</item> + <tag>Unit</tag> + <item>The unit size is given as <c>unit:IntegerLiteral</c>. + The allowed range is 1-256. It will be multiplied by + the <c>Size</c> specifier to give the effective size of + the segment. In R12B, the unit size specifies the alignment + for binary segments without size (examples will follow).</item> + </taglist> + <p>Example:</p> + <code type="none"> +X:4/little-signed-integer-unit:8</code> + <p>This element has a total size of 4*8 = 32 bits, and it contains + a signed integer in little-endian order.</p> + </section> + + <section> + <title>Defaults</title> + <p><marker id="Defaults"></marker>The default type for a segment is integer. The default + type does not depend on the value, even if the value is a + literal. For instance, the default type in '<c><![CDATA[<<3.14>>]]></c>' is + integer, not float.</p> + <p>The default <c>Size</c> depends on the type. For integer it is + 8. For float it is 64. For binary it is all of the binary. In + matching, this default value is only valid for the very last + element. All other binary elements in matching must have a size + specification.</p> + <p>The default unit depends on the the type. For <c>integer</c>, + <c>float</c>, and <c>bitstring</c> it is 1. For binary it is 8.</p> + <p>The default signedness is <c>unsigned</c>.</p> + <p>The default endianness is <c>big</c>.</p> + </section> + + <section> + <title>Constructing Binaries and Bitstrings</title> + <p>This section describes the rules for constructing binaries using + the bit syntax. Unlike when constructing lists or tuples, + the construction of a binary can fail with a <c>badarg</c> + exception.</p> + <p>There can be zero or more segments in a binary to be + constructed. The expression '<c><![CDATA[<<>>]]></c>' constructs a zero + length binary.</p> + <p>Each segment in a binary can consist of zero or more bits. + There are no alignment rules for individual segments of type + <c>integer</c> and <c>float</c>. For binaries and bitstrings + without size, the unit specifies the alignment. Since the default + alignment for the <c>binary</c> type is 8, the size of a binary + segment must be a multiple of 8 bits (i.e. only whole bytes). + Example:</p> + <code type="none"><![CDATA[ +<<Bin/binary,Bitstring/bitstring>>]]></code> + <p>The variable <c>Bin</c> must contain a whole number of bytes, + because the <c>binary</c> type defaults to <c>unit:8</c>. + A <c>badarg</c> exception will be generated if <c>Bin</c> would + consist of (for instance) 17 bits.</p> + + <p>On the other hand, the variable <c>Bitstring</c> may consist of + any number of bits, for instance 0, 1, 8, 11, 17, 42, and so on, + because the default <c>unit</c> for bitstrings is 1.</p> + + <warning><p>For clarity, it is recommended not to change the unit + size for binaries, but to use <c>binary</c> when you need byte + alignment, and <c>bitstring</c> when you need bit alignment.</p></warning> + + <p>The following example</p> + <code type="none"><![CDATA[ +<<X:1,Y:6>>]]></code> + <p>will successfully construct a bitstring of 7 bits. + (Provided that all of X and Y are integers.)</p> + <p>As noted earlier, segments have the following general syntax:</p> + <p><c>Value:Size/TypeSpecifierList</c></p> + <p>When constructing binaries, <c>Value</c> and <c>Size</c> can be + any Erlang expression. However, for syntactical reasons, both + <c>Value</c> and <c>Size</c> must be enclosed in parenthesis if + the expression consists of anything more than a single literal + or variable. The following gives a compiler syntax error:</p> + <code type="none"><![CDATA[ +<<X+1:8>>]]></code> + <p>This expression must be rewritten to</p> + <code type="none"><![CDATA[ +<<(X+1):8>>]]></code> + <p>in order to be accepted by the compiler.</p> + + <section> + <title>Including Literal Strings</title> + <p>As syntactic sugar, an literal string may be written instead + of a element.</p> + <code type="none"><![CDATA[ +<<"hello">>]]></code> + <p>which is syntactic sugar for</p> + <code type="none"><![CDATA[ +<<$h,$e,$l,$l,$o>>]]></code> + </section> + </section> + + <section> + <title>Matching Binaries</title> + <p>This section describes the rules for matching binaries using + the bit syntax.</p> + <p>There can be zero or more segments in a binary pattern. + A binary pattern can occur in every place patterns are allowed, + also inside other patterns. Binary patterns cannot be nested.</p> + <p>The pattern '<c><![CDATA[<<>>]]></c>' matches a zero length binary.</p> + <p>Each segment in a binary can consist of zero or more bits.</p> + <p>A segment of type <c>binary</c> must have a size evenly + divisible by 8 (or divisible by the unit size, if the unit size has been changed).</p> + <p>A segment of type <c>bitstring</c> has no restrictions on the size.</p> + <p>As noted earlier, segments have the following general syntax:</p> + <p><c>Value:Size/TypeSpecifierList</c></p> + <p>When matching <c>Value</c> value must be either a variable or + an integer or floating point literal. Expressions are not + allowed.</p> + <p><c>Size</c> must be an integer literal, or a previously bound + variable. Note that the following is not allowed:</p> + <code type="none"><![CDATA[ +foo(N, <<X:N,T/binary>>) -> + {X,T}.]]></code> + <p>The two occurrences of <c>N</c> are not related. The compiler + will complain that the <c>N</c> in the size field is unbound.</p> + <p>The correct way to write this example is like this:</p> + <code type="none"><![CDATA[ +foo(N, Bin) -> + <<X:N,T/binary>> = Bin, + {X,T}.]]></code> + + <section> + <title>Getting the Rest of the Binary or Bitstring</title> + <p>To match out the rest of a binary, specify a binary field + without size:</p> + <code type="none"><![CDATA[ +foo(<<A:8,Rest/binary>>) ->]]></code> + <p>The size of the tail must be evenly divisible by 8.</p> + + <p>To match out the rest of a bitstring, specify a field + without size:</p> + <code type="none"><![CDATA[ +foo(<<A:8,Rest/bitstring>>) ->]]></code> + <p>There is no restriction on the number of bits in the tail.</p> + </section> + </section> + + <section> + <title>Appending to a Binary</title> + <p>In R12B, the following function for creating a binary out of + a list of triples of integers is now efficient:</p> + <code type="none"><![CDATA[ +triples_to_bin(T) -> + triples_to_bin(T, <<>>). + +triples_to_bin([{X,Y,Z} | T], Acc) -> + triples_to_bin(T, <<Acc/binary,X:32,Y:32,Z:32>>); % inefficient before R12B +triples_to_bin([], Acc) -> + Acc.]]></code> + <p>In previous releases, this function was highly inefficient, because + the binary constructed so far (<c>Acc</c>) was copied in each recursion step. + That is no longer the case. See the Efficiency Guide for more information.</p> + </section> +</chapter> + |