The Abstract Format

Copyright 2001-2015 Ericsson AB. All Rights Reserved. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Prepared by: Arndt Jonasson. Responsible: Kenneth Lundin. Document number: 1. Approved by: Jultomten. Date: 00-12-01. Revision: A. File: absform.xml

This document describes the standard representation of parse trees for Erlang programs as Erlang terms. This representation is known as the abstract format. Functions dealing with such parse trees are compile:forms/[1,2] and functions in the modules epp, erl_eval, erl_lint, erl_pp, erl_parse, and io. Parse trees are also used as input and output for parse transforms (see the module compile).

We use the function Rep to denote the mapping from an Erlang source construct C to its abstract format representation R, and write R = Rep(C).

The word LINE below represents an integer, and denotes the number of the line in the source file where the construction occurred. Several instances of LINE in the same construction may denote different lines.
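As a rough illustration of the Rep mapping, the sketch below computes the abstract format of source text with erl_scan and erl_parse. The module name absform_rep and its function names are hypothetical, chosen only for this example. It assumes a release in which erl_scan:string/1 annotates tokens with plain integer line numbers.

```erlang
-module(absform_rep).
-export([expr/1, form/1]).

%% Rep(E) for a single expression. Source must end with a full stop,
%% for example "1 + 2.".
expr(Source) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(Source),
    {ok, [Expr]} = erl_parse:parse_exprs(Tokens),
    Expr.

%% Rep(F) for a single form (an attribute or a function declaration).
form(Source) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(Source),
    {ok, Form} = erl_parse:parse_form(Tokens),
    Form.
```

With these helpers, absform_rep:expr("a +\nb.") should return a single op tuple in which the operator and its left operand carry line 1 while the right operand carries line 2, which is one way that several instances of LINE in the same construction can differ.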

Since operators are not terms in their own right, when operators are mentioned below, the representation of an operator should be taken to be the atom with a printname consisting of the same characters as the operator.

Module Declarations and Forms

A module declaration consists of a sequence of forms that are either function declarations or attributes.

If D is a module declaration consisting of the forms F_1, ..., F_k, then Rep(D) = [Rep(F_1), ..., Rep(F_k)].
If F is an attribute -module(Mod), then Rep(F) = {attribute,LINE,module,Mod}.
If F is an attribute -behavior(Behavior), then Rep(F) = {attribute,LINE,behavior,Behavior}.
If F is an attribute -behaviour(Behaviour), then Rep(F) = {attribute,LINE,behaviour,Behaviour}.
If F is an attribute -export([Fun_1/A_1, ..., Fun_k/A_k]), then Rep(F) = {attribute,LINE,export,[{Fun_1,A_1}, ..., {Fun_k,A_k}]}.
If F is an attribute -import(Mod,[Fun_1/A_1, ..., Fun_k/A_k]), then Rep(F) = {attribute,LINE,import,{Mod,[{Fun_1,A_1}, ..., {Fun_k,A_k}]}}.
If F is an attribute -export_type([Type_1/A_1, ..., Type_k/A_k]), then Rep(F) = {attribute,LINE,export_type,[{Type_1,A_1}, ..., {Type_k,A_k}]}.
If F is an attribute -compile(Options), then Rep(F) = {attribute,LINE,compile,Options}.
If F is an attribute -file(File,Line), then Rep(F) = {attribute,LINE,file,{File,Line}}.
If F is a record declaration -record(Name,{V_1, ..., V_k}), then Rep(F) = {attribute,LINE,record,{Name,[Rep(V_1), ..., Rep(V_k)]}}. For Rep(V), see below.
If F is a type declaration -Type Name(V_1, ..., V_k) :: T, where Type is either the atom type or the atom opaque, each V_i is a variable, and T is a type, then Rep(F) = {attribute,LINE,Type,{Name,Rep(T),[Rep(V_1), ..., Rep(V_k)]}}.
If F is a function specification -Spec Name Ft_1; ...; Ft_k, where Spec is either the atom spec or the atom callback, and each Ft_i is a possibly constrained function type with an argument sequence of the same length Arity, then Rep(F) = {attribute,LINE,Spec,{{Name,Arity},[Rep(Ft_1), ..., Rep(Ft_k)]}}.
If F is a function specification -spec Mod:Name Ft_1; ...; Ft_k, where each Ft_i is a possibly constrained function type with an argument sequence of the same length Arity, then Rep(F) = {attribute,LINE,spec,{{Mod,Name,Arity},[Rep(Ft_1), ..., Rep(Ft_k)]}}.
If F is a wild attribute -A(T), then Rep(F) = {attribute,LINE,A,T}.

If F is a function declaration Name Fc_1 ; ... ; Name Fc_k, where each Fc_i is a function clause with a pattern sequence of the same length Arity, then Rep(F) = {function,LINE,Name,Arity,[Rep(Fc_1), ...,Rep(Fc_k)]}.
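A module declaration can be pictured as a plain list of form representations. The sketch below builds such a list for a three-form module and is only an illustration: the module name absform_forms and the example module m are hypothetical, and integer line annotations from erl_scan:string/1 are assumed.

```erlang
-module(absform_forms).
-export([rep/1, demo/0]).

%% Rep(F) for one form given as source text ending with a full stop.
rep(Source) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(Source),
    {ok, Form} = erl_parse:parse_form(Tokens),
    Form.

%% Rep(D) for a small module declaration: one list element per form.
demo() ->
    [rep("-module(m)."),
     rep("-export([inc/1])."),
     rep("inc(X) -> X + 1.")].
```

The list returned by absform_forms:demo() can be passed directly to compile:forms/1, which is one of the functions that accept the abstract format as input.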

Record Fields

Each field in a record declaration may have an optional explicit default initializer expression, as well as an optional type.

If V is A, then Rep(V) = {record_field,LINE,Rep(A)}.
If V is A = E, where E is an expression, then Rep(V) = {record_field,LINE,Rep(A),Rep(E)}.
If V is A :: T, where T is a type that does not contain undefined syntactically, then Rep(V) = {typed_record_field,{record_field,LINE,Rep(A)},Rep(undefined | T)}.
If V is A :: T, where T is any other type, then Rep(V) = {typed_record_field,{record_field,LINE,Rep(A)},Rep(T)}.
If V is A = E :: T, where E is an expression and T is a type, then Rep(V) = {typed_record_field,{record_field,LINE,Rep(A),Rep(E)},Rep(T)}.
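The following sketch shows the two untyped field shapes, a field with no initializer and a field with a default initializer. The module name absform_record and the record point are hypothetical. Typed fields are left out because their representation differs between releases.

```erlang
-module(absform_record).
-export([fields/1]).

%% Returns [Rep(V_1), ..., Rep(V_k)] for a -record declaration given as text.
fields(Source) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(Source),
    {ok, {attribute, _Line, record, {_Name, Fields}}} =
        erl_parse:parse_form(Tokens),
    Fields.
```

For example, absform_record:fields("-record(point, {x, y = 0}).") should yield one two-element record_field tuple for x and one three-element record_field tuple for y that carries the initializer.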

Representation of Parse Errors and End-of-file

In addition to the representations of forms, the list that represents a module declaration (as returned by functions in erl_parse and epp) may contain tuples {error,E} and {warning,W}, denoting syntactically incorrect forms and warnings, and {eof,LINE}, denoting an end-of-stream encountered before a complete form had been parsed.

Atomic Literals

There are five kinds of atomic literals, which are represented in the same way in patterns, expressions and guards:

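If L is an integer literal, then Rep(L) = {integer,LINE,L}.
If L is a character literal, then Rep(L) = {char,LINE,L}.
If L is a float literal, then Rep(L) = {float,LINE,L}.
If L is a string literal consisting of the characters C_1, ..., C_k, then Rep(L) = {string,LINE,[C_1, ..., C_k]}.
If L is an atom literal, then Rep(L) = {atom,LINE,L}.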

Note that negative integer and float literals do not occur as such; they are parsed as an application of the unary negation operator.
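The literal forms, and the treatment of a negative number as an application of unary minus, can be checked with a small sketch. The module name absform_literals is hypothetical. The results listed in the comment are what erl_parse is expected to produce with integer line annotations; in particular it tags character literals with char.

```erlang
-module(absform_literals).
-export([rep/1, demo/0]).

%% Rep(E) for a single expression given as text ending with a full stop.
rep(Source) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(Source),
    {ok, [Expr]} = erl_parse:parse_exprs(Tokens),
    Expr.

%% One example per kind of atomic literal, followed by a negative integer.
%% Expected: integer, char, float, string and atom tuples, then an op tuple.
demo() ->
    [rep("42."), rep("$a."), rep("1.5."), rep("\"hi\"."), rep("ok."),
     rep("-1.")].
```

The last element is {op,1,'-',{integer,1,1}} rather than an integer tuple holding -1.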

Patterns

If Ps is a sequence of patterns P_1, ..., P_k, then Rep(Ps) = [Rep(P_1), ..., Rep(P_k)]. Such sequences occur as the list of arguments to a function or fun.

Individual patterns are represented as follows:

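If P is an atomic literal L, then Rep(P) = Rep(L).
If P is a compound pattern P_1 = P_2, then Rep(P) = {match,LINE,Rep(P_1),Rep(P_2)}.
If P is a variable pattern V, then Rep(P) = {var,LINE,A}, where A is an atom with a printname consisting of the same characters as V.
If P is a universal pattern _, then Rep(P) = {var,LINE,'_'}.
If P is a tuple pattern {P_1, ..., P_k}, then Rep(P) = {tuple,LINE,[Rep(P_1), ..., Rep(P_k)]}.
If P is a nil pattern [], then Rep(P) = {nil,LINE}.
If P is a cons pattern [P_h | P_t], then Rep(P) = {cons,LINE,Rep(P_h),Rep(P_t)}.
If P is a binary pattern <<P_1:Size_1/TSL_1, ..., P_k:Size_k/TSL_k>>, then Rep(P) = {bin,LINE,[{bin_element,LINE,Rep(P_1),Rep(Size_1),Rep(TSL_1)}, ..., {bin_element,LINE,Rep(P_k),Rep(Size_k),Rep(TSL_k)}]}. For Rep(TSL), see below. An omitted Size is represented by default. An omitted TSL (type specifier list) is represented by default.
If P is P_1 Op P_2, where Op is a binary operator (this is either an occurrence of ++ applied to a literal string or character list, or an occurrence of an expression that can be evaluated to a number at compile time), then Rep(P) = {op,LINE,Op,Rep(P_1),Rep(P_2)}.
If P is Op P_0, where Op is a unary operator (this is an occurrence of an expression that can be evaluated to a number at compile time), then Rep(P) = {op,LINE,Op,Rep(P_0)}.
If P is a record pattern #Name{Field_1=P_1, ..., Field_k=P_k}, then Rep(P) = {record,LINE,Name,[{record_field,LINE,Rep(Field_1),Rep(P_1)}, ..., {record_field,LINE,Rep(Field_k),Rep(P_k)}]}.
If P is #Name.Field, then Rep(P) = {record_index,LINE,Name,Rep(Field)}.
If P is ( P_0 ), then Rep(P) = Rep(P_0), that is, parenthesized patterns cannot be distinguished from their bodies.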

Note that every pattern has the same source form as some expression, and is represented the same way as the corresponding expression.
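To see that a pattern is represented like the expression with the same source form, the sketch below parses a pattern on the left-hand side of a match and extracts its representation. The module name absform_pattern and its helpers are hypothetical.

```erlang
-module(absform_pattern).
-export([expr/1, pattern/1]).

%% Rep(E) for a single expression given as text ending with a full stop.
expr(Source) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(Source),
    {ok, [Expr]} = erl_parse:parse_exprs(Tokens),
    Expr.

%% Rep(P) for a pattern given as text without a full stop. The pattern is
%% parsed as the left-hand side of a dummy match "P = x.".
pattern(Source) ->
    {match, _Line, Pattern, _Rhs} = expr(Source ++ " = x."),
    Pattern.
```

For instance, absform_pattern:pattern("{X, _}") and absform_pattern:expr("{X, _}.") are expected to return the same tuple term.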

Expressions

A body B is a sequence of expressions E_1, ..., E_k, and Rep(B) = [Rep(E_1), ..., Rep(E_k)].

An expression E is one of the following alternatives:

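If E is an atomic literal L, then Rep(E) = Rep(L).
If E is P = E_0, then Rep(E) = {match,LINE,Rep(P),Rep(E_0)}.
If E is a variable V, then Rep(E) = {var,LINE,A}, where A is an atom with a printname consisting of the same characters as V.
If E is a tuple skeleton {E_1, ..., E_k}, then Rep(E) = {tuple,LINE,[Rep(E_1), ..., Rep(E_k)]}.
If E is [], then Rep(E) = {nil,LINE}.
If E is a cons skeleton [E_h | E_t], then Rep(E) = {cons,LINE,Rep(E_h),Rep(E_t)}.
If E is a binary constructor <<V_1:Size_1/TSL_1, ..., V_k:Size_k/TSL_k>>, then Rep(E) = {bin,LINE,[{bin_element,LINE,Rep(V_1),Rep(Size_1),Rep(TSL_1)}, ..., {bin_element,LINE,Rep(V_k),Rep(Size_k),Rep(TSL_k)}]}. For Rep(TSL), see below. An omitted Size is represented by default. An omitted TSL (type specifier list) is represented by default.
If E is E_1 Op E_2, where Op is a binary operator, then Rep(E) = {op,LINE,Op,Rep(E_1),Rep(E_2)}.
If E is Op E_0, where Op is a unary operator, then Rep(E) = {op,LINE,Op,Rep(E_0)}.
If E is #Name{Field_1=E_1, ..., Field_k=E_k}, then Rep(E) = {record,LINE,Name,[{record_field,LINE,Rep(Field_1),Rep(E_1)}, ..., {record_field,LINE,Rep(Field_k),Rep(E_k)}]}.
If E is E_0#Name{Field_1=E_1, ..., Field_k=E_k}, then Rep(E) = {record,LINE,Rep(E_0),Name,[{record_field,LINE,Rep(Field_1),Rep(E_1)}, ..., {record_field,LINE,Rep(Field_k),Rep(E_k)}]}.
If E is #Name.Field, then Rep(E) = {record_index,LINE,Name,Rep(Field)}.
If E is E_0#Name.Field, then Rep(E) = {record_field,LINE,Rep(E_0),Name,Rep(Field)}.
If E is #{W_1, ..., W_k}, where each W_i is a map assoc or exact field, then Rep(E) = {map,LINE,[Rep(W_1), ..., Rep(W_k)]}. For Rep(W), see below.
If E is E_0#{W_1, ..., W_k}, where each W_i is a map assoc or exact field, then Rep(E) = {map,LINE,Rep(E_0),[Rep(W_1), ..., Rep(W_k)]}. For Rep(W), see below.
If E is catch E_0, then Rep(E) = {'catch',LINE,Rep(E_0)}.
If E is E_0(E_1, ..., E_k), then Rep(E) = {call,LINE,Rep(E_0),[Rep(E_1), ..., Rep(E_k)]}.
If E is E_m:E_0(E_1, ..., E_k), then Rep(E) = {call,LINE,{remote,LINE,Rep(E_m),Rep(E_0)},[Rep(E_1), ..., Rep(E_k)]}.
If E is a list comprehension [E_0 || W_1, ..., W_k], where each W_i is a generator or a filter, then Rep(E) = {lc,LINE,Rep(E_0),[Rep(W_1), ..., Rep(W_k)]}. For Rep(W), see below.
If E is a binary comprehension <<E_0 || W_1, ..., W_k>>, where each W_i is a generator or a filter, then Rep(E) = {bc,LINE,Rep(E_0),[Rep(W_1), ..., Rep(W_k)]}. For Rep(W), see below.
If E is begin B end, where B is a body, then Rep(E) = {block,LINE,Rep(B)}.
If E is if Ic_1 ; ... ; Ic_k end, where each Ic_i is an if clause, then Rep(E) = {'if',LINE,[Rep(Ic_1), ..., Rep(Ic_k)]}.
If E is case E_0 of Cc_1 ; ... ; Cc_k end, where E_0 is an expression and each Cc_i is a case clause, then Rep(E) = {'case',LINE,Rep(E_0),[Rep(Cc_1), ..., Rep(Cc_k)]}.
If E is try B catch Tc_1 ; ... ; Tc_k end, where B is a body and each Tc_i is a catch clause, then Rep(E) = {'try',LINE,Rep(B),[],[Rep(Tc_1), ..., Rep(Tc_k)],[]}.
If E is try B of Cc_1 ; ... ; Cc_k catch Tc_1 ; ... ; Tc_n end, where B is a body, each Cc_i is a case clause and each Tc_j is a catch clause, then Rep(E) = {'try',LINE,Rep(B),[Rep(Cc_1), ..., Rep(Cc_k)],[Rep(Tc_1), ..., Rep(Tc_n)],[]}.
If E is try B after A end, where B and A are bodies, then Rep(E) = {'try',LINE,Rep(B),[],[],Rep(A)}.
If E is try B of Cc_1 ; ... ; Cc_k after A end, where B and A are bodies and each Cc_i is a case clause, then Rep(E) = {'try',LINE,Rep(B),[Rep(Cc_1), ..., Rep(Cc_k)],[],Rep(A)}.
If E is try B catch Tc_1 ; ... ; Tc_k after A end, where B and A are bodies and each Tc_i is a catch clause, then Rep(E) = {'try',LINE,Rep(B),[],[Rep(Tc_1), ..., Rep(Tc_k)],Rep(A)}.
If E is try B of Cc_1 ; ... ; Cc_k catch Tc_1 ; ... ; Tc_n after A end, where B and A are bodies, each Cc_i is a case clause and each Tc_j is a catch clause, then Rep(E) = {'try',LINE,Rep(B),[Rep(Cc_1), ..., Rep(Cc_k)],[Rep(Tc_1), ..., Rep(Tc_n)],Rep(A)}.
If E is receive Cc_1 ; ... ; Cc_k end, where each Cc_i is a case clause, then Rep(E) = {'receive',LINE,[Rep(Cc_1), ..., Rep(Cc_k)]}.
If E is receive Cc_1 ; ... ; Cc_k after E_0 -> B_t end, where each Cc_i is a case clause, E_0 is an expression and B_t is a body, then Rep(E) = {'receive',LINE,[Rep(Cc_1), ..., Rep(Cc_k)],Rep(E_0),Rep(B_t)}.
If E is fun Name / Arity, then Rep(E) = {'fun',LINE,{function,Name,Arity}}.
If E is fun Module:Name/Arity, then Rep(E) = {'fun',LINE,{function,Rep(Module),Rep(Name),Rep(Arity)}}. (Before the R15 release: Rep(E) = {'fun',LINE,{function,Module,Name,Arity}}.)
If E is fun Fc_1 ; ... ; Fc_k end, where each Fc_i is a function clause, then Rep(E) = {'fun',LINE,{clauses,[Rep(Fc_1), ..., Rep(Fc_k)]}}.
If E is fun Name Fc_1 ; ... ; Name Fc_k end, where Name is a variable and each Fc_i is a function clause, then Rep(E) = {named_fun,LINE,Name,[Rep(Fc_1), ..., Rep(Fc_k)]}.
If E is ( E_0 ), then Rep(E) = Rep(E_0), that is, parenthesized expressions cannot be distinguished from their bodies.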
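Because the abstract format is the input accepted by erl_eval, a parsed body can be evaluated directly. The sketch below is a minimal illustration; the module name absform_eval is hypothetical.

```erlang
-module(absform_eval).
-export([rep/1, eval/1]).

%% Rep(B) for a body given as text ending with a full stop.
rep(Source) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(Source),
    {ok, Exprs} = erl_parse:parse_exprs(Tokens),
    Exprs.

%% Evaluates the body with no initial variable bindings and returns the
%% value of its last expression.
eval(Source) ->
    {value, Value, _Bindings} =
        erl_eval:exprs(rep(Source), erl_eval:new_bindings()),
    Value.
```

A remote call such as lists:reverse([1]) is parsed into a call tuple whose second element is a remote tuple, and evaluating the body "X = 2 + 3, X * 2." should give 10.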

Generators and Filters

When W is a generator or a filter (in the body of a list or binary comprehension), then:

If W is a generator P <- E, where P is a pattern and E is an expression, then Rep(W) = {generate,LINE,Rep(P),Rep(E)}.
If W is a generator P <= E, where P is a pattern and E is an expression, then Rep(W) = {b_generate,LINE,Rep(P),Rep(E)}.
If W is a filter E, which is an expression, then Rep(W) = Rep(E).
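The qualifier list of a comprehension can be inspected with a sketch like the following, which returns the representations of the generators and filters. The module name absform_comp is hypothetical.

```erlang
-module(absform_comp).
-export([qualifiers/1]).

%% Returns [Rep(W_1), ..., Rep(W_k)] for a list or binary comprehension
%% given as text ending with a full stop.
qualifiers(Source) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(Source),
    {ok, [Expr]} = erl_parse:parse_exprs(Tokens),
    case Expr of
        {lc, _Line, _Template, Qualifiers} -> Qualifiers;
        {bc, _Line, _Template, Qualifiers} -> Qualifiers
    end.
```

A filter appears in the list as an ordinary expression, with no wrapper tuple around it.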

Binary Element Type Specifiers

A type specifier list TSL for a binary element is a sequence of type specifiers TS_1 - ... - TS_k. Rep(TSL) = [Rep(TS_1), ..., Rep(TS_k)].

When TS is a type specifier for a binary element, then:

If TS is an atom A, then Rep(TS) = A.
If TS is a couple A:Value, where A is an atom and Value is an integer, then Rep(TS) = {A,Value}.
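As an illustration of how sizes and type specifier lists appear in practice, the sketch below returns the bin_element tuples of a binary constructor. The module name absform_bin is hypothetical.

```erlang
-module(absform_bin).
-export([elements/1]).

%% Returns the list of bin_element tuples for a binary constructor given
%% as text ending with a full stop.
elements(Source) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(Source),
    {ok, [{bin, _Line, Elements}]} = erl_parse:parse_exprs(Tokens),
    Elements.
```

In the result for "<<X:8/integer-unit:1, Y/binary, Z>>.", the element for X carries both a size and a two-item specifier list, the element for Y has the size default, and the element for Z has default in both positions.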

Map Assoc and Exact Fields

When W is an assoc or exact field (in the body of a map), then:

If W is an assoc field K => V, where K and V are both expressions, then Rep(W) = {map_field_assoc,LINE,Rep(K),Rep(V)}.
If W is an exact field K := V, where K and V are both expressions, then Rep(W) = {map_field_exact,LINE,Rep(K),Rep(V)}.
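A short sketch of the two map field kinds follows, covering one map construction and one map update. The module name absform_map is hypothetical.

```erlang
-module(absform_map).
-export([rep/1]).

%% Rep(E) for a single expression given as text ending with a full stop.
rep(Source) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(Source),
    {ok, [Expr]} = erl_parse:parse_exprs(Tokens),
    Expr.
```

A map construction gives a three-element map tuple, while an update gives a four-element map tuple whose third element is the representation of the map being updated.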

Clauses

There are function clauses, if clauses, case clauses and catch clauses.

A clause is one of the following alternatives:

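If C is a function clause ( Ps ) -> B, where Ps is a pattern sequence and B is a body, then Rep(C) = {clause,LINE,Rep(Ps),[],Rep(B)}.
If C is a function clause ( Ps ) when Gs -> B, where Ps is a pattern sequence, Gs is a guard sequence and B is a body, then Rep(C) = {clause,LINE,Rep(Ps),Rep(Gs),Rep(B)}.
If C is an if clause Gs -> B, where Gs is a guard sequence and B is a body, then Rep(C) = {clause,LINE,[],Rep(Gs),Rep(B)}.
If C is a case clause P -> B, where P is a pattern and B is a body, then Rep(C) = {clause,LINE,[Rep(P)],[],Rep(B)}.
If C is a case clause P when Gs -> B, where P is a pattern, Gs is a guard sequence and B is a body, then Rep(C) = {clause,LINE,[Rep(P)],Rep(Gs),Rep(B)}.
If C is a catch clause P -> B, where P is a pattern and B is a body, then Rep(C) = {clause,LINE,[Rep({throw,P,_})],[],Rep(B)}.
If C is a catch clause X : P -> B, where X is an atomic literal or a variable pattern, P is a pattern and B is a body, then Rep(C) = {clause,LINE,[Rep({X,P,_})],[],Rep(B)}.
If C is a catch clause P when Gs -> B, where P is a pattern, Gs is a guard sequence and B is a body, then Rep(C) = {clause,LINE,[Rep({throw,P,_})],Rep(Gs),Rep(B)}.
If C is a catch clause X : P when Gs -> B, where X is an atomic literal or a variable pattern, P is a pattern, Gs is a guard sequence and B is a body, then Rep(C) = {clause,LINE,[Rep({X,P,_})],Rep(Gs),Rep(B)}.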
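The shared clause shape can be observed by extracting the clause lists of a case expression and an if expression, as in this sketch. The module name absform_clause is hypothetical. Catch clauses are not covered here.

```erlang
-module(absform_clause).
-export([clauses/1]).

%% Returns the clause representations of a case or if expression given
%% as text ending with a full stop.
clauses(Source) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(Source),
    {ok, [Expr]} = erl_parse:parse_exprs(Tokens),
    case Expr of
        {'case', _Line, _Arg, Clauses} -> Clauses;
        {'if', _Line, Clauses} -> Clauses
    end.
```

A case clause holds a one-element pattern list, whereas an if clause holds an empty pattern list and a nonempty guard sequence.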

Guards

A guard sequence Gs is a sequence of guards G_1; ...; G_k, and Rep(Gs) = [Rep(G_1), ..., Rep(G_k)]. If the guard sequence is empty, Rep(Gs) = [].

A guard G is a nonempty sequence of guard tests Gt_1, ..., Gt_k, and Rep(G) = [Rep(Gt_1), ..., Rep(Gt_k)].

A guard test is one of the following alternatives:

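If Gt is an atomic literal L, then Rep(Gt) = Rep(L).
If Gt is a variable pattern V, then Rep(Gt) = {var,LINE,A}, where A is an atom with a printname consisting of the same characters as V.
If Gt is a tuple skeleton {Gt_1, ..., Gt_k}, then Rep(Gt) = {tuple,LINE,[Rep(Gt_1), ..., Rep(Gt_k)]}.
If Gt is [], then Rep(Gt) = {nil,LINE}.
If Gt is a cons skeleton [Gt_h | Gt_t], then Rep(Gt) = {cons,LINE,Rep(Gt_h),Rep(Gt_t)}.
If Gt is a binary constructor <<Gt_1:Size_1/TSL_1, ..., Gt_k:Size_k/TSL_k>>, then Rep(Gt) = {bin,LINE,[{bin_element,LINE,Rep(Gt_1),Rep(Size_1),Rep(TSL_1)}, ..., {bin_element,LINE,Rep(Gt_k),Rep(Size_k),Rep(TSL_k)}]}. For Rep(TSL), see above. An omitted Size is represented by default. An omitted TSL (type specifier list) is represented by default.
If Gt is Gt_1 Op Gt_2, where Op is a binary operator, then Rep(Gt) = {op,LINE,Op,Rep(Gt_1),Rep(Gt_2)}.
If Gt is Op Gt_0, where Op is a unary operator, then Rep(Gt) = {op,LINE,Op,Rep(Gt_0)}.
If Gt is #Name{Field_1=Gt_1, ..., Field_k=Gt_k}, then Rep(Gt) = {record,LINE,Name,[{record_field,LINE,Rep(Field_1),Rep(Gt_1)}, ..., {record_field,LINE,Rep(Field_k),Rep(Gt_k)}]}.
If Gt is #Name.Field, then Rep(Gt) = {record_index,LINE,Name,Rep(Field)}.
If Gt is Gt_0#Name.Field, then Rep(Gt) = {record_field,LINE,Rep(Gt_0),Name,Rep(Field)}.
If Gt is A(Gt_1, ..., Gt_k), where A is an atom, then Rep(Gt) = {call,LINE,Rep(A),[Rep(Gt_1), ..., Rep(Gt_k)]}.
If Gt is A_m:A(Gt_1, ..., Gt_k), where A_m is the atom erlang and A is an atom or an operator, then Rep(Gt) = {call,LINE,{remote,LINE,Rep(A_m),Rep(A)},[Rep(Gt_1), ..., Rep(Gt_k)]}.
If Gt is {A_m,A}(Gt_1, ..., Gt_k), where A_m is the atom erlang and A is an atom or an operator, then Rep(Gt) = {call,LINE,Rep({A_m,A}),[Rep(Gt_1), ..., Rep(Gt_k)]}.
If Gt is ( Gt_0 ), then Rep(Gt) = Rep(Gt_0), that is, parenthesized guard tests cannot be distinguished from their bodies.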

Note that every guard test has the same source form as some expression, and is represented the same way as the corresponding expression.
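The nesting of a guard sequence (a list of guards, each a list of guard tests) is easiest to see on a concrete clause. The sketch below extracts Rep(Gs) from a one-clause function declaration; the module name absform_guard is hypothetical.

```erlang
-module(absform_guard).
-export([guards/1]).

%% Returns Rep(Gs) for the single clause of a function declaration given
%% as text ending with a full stop.
guards(Source) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(Source),
    {ok, {function, _Line, _Name, _Arity, [Clause]}} =
        erl_parse:parse_form(Tokens),
    {clause, _ClauseLine, _Patterns, GuardSeq, _Body} = Clause,
    GuardSeq.
```

Guards separated by semicolons become separate inner lists, and guard tests separated by commas share one inner list. A clause without a guard gives the empty list.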

Types

If T is an annotated type Anno :: Type, where Anno is a variable and Type is a type, then Rep(T) = {ann_type,LINE,[Rep(Anno),Rep(Type)]}.
If T is an atom or integer literal L, then Rep(T) = Rep(L).
If T is L Op R, where Op is a binary operator and L and R are types (this is an occurrence of an expression that can be evaluated to an integer at compile time), then Rep(T) = {op,LINE,Op,Rep(L),Rep(R)}.
If T is Op A, where Op is a unary operator and A is a type (this is an occurrence of an expression that can be evaluated to an integer at compile time), then Rep(T) = {op,LINE,Op,Rep(A)}.
If T is a bitstring type <<_:M,_:_*N>>, where M and N are singleton integer types, then Rep(T) = {type,LINE,binary,[Rep(M),Rep(N)]}.
If T is the empty list type [], then Rep(T) = {type,LINE,nil,[]}.
If T is a fun type fun(), then Rep(T) = {type,LINE,'fun',[]}.
If T is a fun type fun((...) -> B), where B is a type, then Rep(T) = {type,LINE,'fun',[{type,LINE,any},Rep(B)]}.
If T is a fun type fun(Ft), where Ft is a function type, then Rep(T) = Rep(Ft).
If T is an integer range type L .. H, where L and H are singleton integer types, then Rep(T) = {type,LINE,range,[Rep(L),Rep(H)]}.
If T is a map type map(), then Rep(T) = {type,LINE,map,any}.
If T is a map type #{P_1, ..., P_k}, where each P_i is a map pair type, then Rep(T) = {type,LINE,map,[Rep(P_1), ..., Rep(P_k)]}.
If T is a map pair type K => V, where K and V are types, then Rep(T) = {type,LINE,map_field_assoc,[Rep(K),Rep(V)]}.
If T is a predefined (or built-in) type N(A_1, ..., A_k), where each A_i is a type, then Rep(T) = {type,LINE,N,[Rep(A_1), ..., Rep(A_k)]}.
If T is a record type #Name{F_1, ..., F_k}, where each F_i is a record field type, then Rep(T) = {type,LINE,record,[Rep(Name),Rep(F_1), ..., Rep(F_k)]}.
If T is a record field type Name :: Type, where Type is a type, then Rep(T) = {type,LINE,field_type,[Rep(Name),Rep(Type)]}.
If T is a remote type M:N(A_1, ..., A_k), where each A_i is a type, then Rep(T) = {remote_type,LINE,[Rep(M),Rep(N),[Rep(A_1), ..., Rep(A_k)]]}.
If T is a tuple type tuple(), then Rep(T) = {type,LINE,tuple,any}.
If T is a tuple type {A_1, ..., A_k}, where each A_i is a type, then Rep(T) = {type,LINE,tuple,[Rep(A_1), ..., Rep(A_k)]}.
If T is a type union T_1 | ... | T_k, where each T_i is a type, then Rep(T) = {type,LINE,union,[Rep(T_1), ..., Rep(T_k)]}.
If T is a type variable V, then Rep(T) = {var,LINE,A}, where A is an atom with a printname consisting of the same characters as V. A type variable is any variable except underscore (_).
If T is a user-defined type N(A_1, ..., A_k), where each A_i is a type, then Rep(T) = {user_type,LINE,N,[Rep(A_1), ..., Rep(A_k)]}.
If T is ( T_0 ), then Rep(T) = Rep(T_0), that is, parenthesized types cannot be distinguished from their bodies.

Function Types

If Ft is a constrained function type Ft_1 when Fc, where Ft_1 is a function type and Fc is a function constraint, then Rep(Ft) = {type,LINE,bounded_fun,[Rep(Ft_1),Rep(Fc)]}.
If Ft is a function type (A_1, ..., A_n) -> B, where each A_i and B are types, then Rep(Ft) = {type,LINE,'fun',[{type,LINE,product,[Rep(A_1), ..., Rep(A_n)]},Rep(B)]}.

Function Constraints

A function constraint Fc is a nonempty sequence of constraints C_1, ..., C_k, and Rep(Fc) = [Rep(C_1), ..., Rep(C_k)].

If C is a constraint is_subtype(V, T) or V :: T, where V is a type variable and T is a type, then Rep(C) = {type,LINE,constraint,[Rep(F),[Rep(V),Rep(T)]]}, where F is the atom is_subtype, so that Rep(F) = {atom,LINE,is_subtype}.
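The type, function type and constraint representations can be seen together by parsing a few attributes, as in the sketch below. The module name absform_spec and the names f, id and t in the usage are hypothetical; the results are what erl_parse is expected to produce with integer line annotations.

```erlang
-module(absform_spec).
-export([rep/1]).

%% Rep(F) for a -spec or -type attribute given as text ending with a
%% full stop.
rep(Source) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(Source),
    {ok, Form} = erl_parse:parse_form(Tokens),
    Form.
```

For "-spec id(X) -> X when X :: term()." the single function type is a bounded_fun whose second element is a one-element constraint list, and the constraint itself begins with {atom,1,is_subtype} even though the source uses the V :: T notation.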

The Abstract Format After Preprocessing

The compilation option debug_info can be given to the compiler to have the abstract code stored in the abstract_code chunk in the BEAM file (for debugging purposes).

In OTP R9C and later, the abstract_code chunk will contain

{raw_abstract_v1,AbstractCode}

where AbstractCode is the abstract code as described in this document.

In releases of OTP prior to R9C, the abstract code after some more processing was stored in the BEAM file. The first element of the tuple would be either abstract_v1 (R7B) or abstract_v2 (R8B).
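A minimal sketch of reading the stored abstract code back follows. It compiles a tiny module in memory with debug_info and asks beam_lib for the abstract_code chunk. The module name absform_chunk and the example module m are hypothetical. The layout of the debug information inside the BEAM file may differ between releases, so the code relies only on what beam_lib:chunks/2 returns.

```erlang
-module(absform_chunk).
-export([abstract_code/0]).

%% Rep(F) for one form given as text ending with a full stop.
form(Source) ->
    {ok, Tokens, _EndLocation} = erl_scan:string(Source),
    {ok, Form} = erl_parse:parse_form(Tokens),
    Form.

%% Compiles a three-form module with debug_info and returns the contents
%% of its abstract_code chunk.
abstract_code() ->
    Forms = [form("-module(m)."),
             form("-export([f/0])."),
             form("f() -> ok.")],
    {ok, m, Beam} = compile:forms(Forms, [debug_info]),
    {ok, {m, [{abstract_code, Chunk}]}} =
        beam_lib:chunks(Beam, [abstract_code]),
    Chunk.
```

The returned value is expected to be a tuple tagged raw_abstract_v1 whose second element is a list of forms in the format described in this document.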