Match Specifications in Erlang

Copyright 1999-2018 Ericsson AB. All Rights Reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Author: Patrik Nyblom. Date: 1999-06-01.

A "match specification" (match_spec) is an Erlang term describing a small "program" that tries to match something. It can be used either to control tracing with erlang:trace_pattern/3 or to search for objects in an ETS table with, for example, ets:select/2. In many ways a match specification works like a small Erlang function, but the Erlang runtime system interprets/compiles it into something much more efficient than a call to an Erlang function. A match specification is also very limited compared to the expressiveness of real Erlang functions.

The most notable difference between a match specification and an Erlang fun is the syntax. Match specifications are Erlang terms, not Erlang code. Also, a match specification has a strange concept of exceptions:

An exception (such as badarg) in the MatchCondition part, which resembles an Erlang guard, generates immediate failure.

An exception in the MatchBody part, which resembles the body of an Erlang function, is implicitly caught and results in the single atom 'EXIT'.
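The two rules can be observed with ets:test_ms/2. The following is a small sketch meant to be pasted into the Erlang shell; the tuple {a, b} and the use of hd on an atom are only illustrative ways of provoking a badarg.

```erlang
%% Guard part: hd(a) raises badarg, so the clause simply does not match.
{ok, false} = ets:test_ms({a, b}, [{{'$1', '_'}, [{'>', {hd, '$1'}, 0}], ['$_']}]),
%% Body part: the same badarg is caught and the result is the atom 'EXIT'.
{ok, 'EXIT'} = ets:test_ms({a, b}, [{{'$1', '_'}, [], [{hd, '$1'}]}]).
```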

Grammar

A match specification used in tracing can be described in the following informal grammar:

MatchExpression ::= [ MatchFunction, ... ]
MatchFunction ::= { MatchHead, MatchConditions, MatchBody }
MatchHead ::= MatchVariable | '_' | [ MatchHeadPart, ... ]
MatchHeadPart ::= term() | MatchVariable | '_'
MatchVariable ::= '$<number>'
MatchConditions ::= [ MatchCondition, ...] | []
MatchCondition ::= { GuardFunction } | { GuardFunction, ConditionExpression, ... }
BoolFunction ::= is_atom | is_float | is_integer | is_list | is_number | is_pid | is_port | is_reference | is_tuple | is_map | is_map_key | is_binary | is_function | is_record | is_seq_trace | 'and' | 'or' | 'not' | 'xor' | 'andalso' | 'orelse'
ConditionExpression ::= ExprMatchVariable | { GuardFunction } | { GuardFunction, ConditionExpression, ... } | TermConstruct
ExprMatchVariable ::= MatchVariable (bound in the MatchHead) | '$_' | '$$'
TermConstruct ::= {{}} | {{ ConditionExpression, ... }} | [] | [ConditionExpression, ...] | #{} | #{term() => ConditionExpression, ...} | NonCompositeTerm | Constant
NonCompositeTerm ::= term() (not list or tuple or map)
Constant ::= {const, term()}
GuardFunction ::= BoolFunction | abs | element | hd | length | map_get | map_size | node | round | size | tl | trunc | '+' | '-' | '*' | 'div' | 'rem' | 'band' | 'bor' | 'bxor' | 'bnot' | 'bsl' | 'bsr' | '>' | '>=' | '<' | '=<' | '=:=' | '==' | '=/=' | '/=' | self | get_tcw
MatchBody ::= [ ActionTerm ]
ActionTerm ::= ConditionExpression | ActionCall
ActionCall ::= {ActionFunction} | {ActionFunction, ActionTerm, ...}
ActionFunction ::= set_seq_token | get_seq_token | message | return_trace | exception_trace | process_dump | enable_trace | disable_trace | trace | display | caller | set_tcw | silent

A match specification used in ets(3) can be described in the following informal grammar:

MatchExpression ::= [ MatchFunction, ... ]
MatchFunction ::= { MatchHead, MatchConditions, MatchBody }
MatchHead ::= MatchVariable | '_' | { MatchHeadPart, ... }
MatchHeadPart ::= term() | MatchVariable | '_'
MatchVariable ::= '$<number>'
MatchConditions ::= [ MatchCondition, ...] | []
MatchCondition ::= { GuardFunction } | { GuardFunction, ConditionExpression, ... }
BoolFunction ::= is_atom | is_float | is_integer | is_list | is_number | is_pid | is_port | is_reference | is_tuple | is_map | is_map_key | is_binary | is_function | is_record | 'and' | 'or' | 'not' | 'xor' | 'andalso' | 'orelse'
ConditionExpression ::= ExprMatchVariable | { GuardFunction } | { GuardFunction, ConditionExpression, ... } | TermConstruct
ExprMatchVariable ::= MatchVariable (bound in the MatchHead) | '$_' | '$$'
TermConstruct ::= {{}} | {{ ConditionExpression, ... }} | [] | [ConditionExpression, ...] | #{} | #{term() => ConditionExpression, ...} | NonCompositeTerm | Constant
NonCompositeTerm ::= term() (not list or tuple or map)
Constant ::= {const, term()}
GuardFunction ::= BoolFunction | abs | element | hd | length | map_get | map_size | node | round | size | tl | trunc | '+' | '-' | '*' | 'div' | 'rem' | 'band' | 'bor' | 'bxor' | 'bnot' | 'bsl' | 'bsr' | '>' | '>=' | '<' | '=<' | '=:=' | '==' | '=/=' | '/=' | self
MatchBody ::= [ ConditionExpression, ... ]

Function Descriptions

Functions Allowed in All Types of Match Specifications

The functions allowed in match_spec work as follows:

is_atom, is_float, is_integer, is_list, is_number, is_pid, is_port, is_reference, is_tuple, is_map, is_binary, is_function

Same as the corresponding guard tests in Erlang; return true or false.

is_record

Takes an additional parameter, which must be the result of record_info(size, <record_type>), like in {is_record, '$1', rectype, record_info(size, rectype)}.

'not'

Negates its single argument (anything other than false gives false).

'and'

Returns true if all its arguments (variable length argument list) evaluate to true, otherwise false. Evaluation order is undefined.

'or'

Returns true if any of its arguments evaluates to true. Variable length argument list. Evaluation order is undefined.

'andalso'

Works as 'and', but quits evaluating its arguments when one argument evaluates to something other than true. Arguments are evaluated left to right.

'orelse'

Works as 'or', but quits evaluating as soon as one of its arguments evaluates to true. Arguments are evaluated left to right.

'xor'

Only two arguments, of which one must be true and the other false to return true; otherwise returns false.

abs, element, hd, length, node, round, size, tl, trunc, '+', '-', '*', 'div', 'rem', 'band', 'bor', 'bxor', 'bnot', 'bsl', 'bsr', '>', '>=', '<', '=<', '=:=', '==', '=/=', '/=', self

Same as the corresponding Erlang BIFs (or operators). In case of bad arguments, the result depends on the context. In the MatchConditions part of the expression, the test fails immediately (like in an Erlang guard). In the MatchBody part, exceptions are implicitly caught and the call results in the atom 'EXIT'.

Functions Allowed Only for Tracing

The functions allowed only for tracing work as follows:

is_seq_trace

Returns true if a sequential trace token is set for the current process, otherwise false.

set_seq_token

Works as seq_trace:set_token/2, but returns true on success, and 'EXIT' on error or bad argument. Only allowed in the MatchBody part and only allowed when tracing.

get_seq_token

Same as seq_trace:get_token/0 and only allowed in the MatchBody part when tracing.

message

Sets an additional message appended to the trace message sent. One can only set one additional message in the body. Later calls replace the appended message.

As a special case, {message, false} disables sending of trace messages ('call' and 'return_to') for this function call, just as if the match specification had not matched. This can be useful if only the side effects of the MatchBody part are desired.

Another special case is {message, true}, which sets the default behavior, as if the function had no match specification; the trace message is sent with no extra information (if no other calls to message are placed before {message, true}, it is in fact a "noop").

Takes one argument: the message. Returns true and can only be used in the MatchBody part and when tracing.
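As a sketch of how message behaves, the BIF erlang:match_spec_test/3 can evaluate a trace match specification against an argument list without any tracing being set up. The variable name MsgMS and the argument values are illustrative only.

```erlang
%% Append {first_arg, Arg1} to the trace message for any two-argument call.
MsgMS = [{['$1', '_'], [], [{message, {{first_arg, '$1'}}}]}],
{ok, {first_arg, 42}, [], []} = erlang:match_spec_test([42, x], MsgMS, trace),
%% {message, false} suppresses the trace message even though the head matched.
{ok, false, [], []} =
    erlang:match_spec_test([42, x], [{'_', [], [{message, false}]}], trace).
```

The second element of the result is true or false when no message is set, and otherwise the term that would be appended to the trace message.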

return_trace

Causes a return_from trace message to be sent upon return from the current function. Takes no arguments, returns true and can only be used in the MatchBody part when tracing. If the process trace flag silent is active, the return_from trace message is inhibited.

Warning: If the traced function is tail-recursive, this match specification function destroys that property. Hence, if a match specification executing this function is used on a perpetual server process, it can only be active for a limited period of time, or the emulator will eventually use all memory in the host machine and crash. If this match specification function is inhibited using process trace flag silent, tail-recursiveness still remains.

exception_trace

Works as return_trace, with one addition: if the traced function exits because of an exception, an exception_from trace message is generated, regardless of whether the exception is caught or not.

process_dump

Returns some textual information about the current process as a binary. Takes no arguments and is only allowed in the MatchBody part when tracing.

enable_trace

With one parameter this function turns on tracing like the Erlang call erlang:trace(self(), true, [P2]), where P2 is the parameter to enable_trace.

With two parameters, the first parameter is to be either a process identifier or the registered name of a process. In this case tracing is turned on for the designated process in the same way as in the Erlang call erlang:trace(P1, true, [P2]), where P1 is the first and P2 is the second argument. The process P1 gets its trace messages sent to the same tracer as the process executing the statement uses. P1 cannot be one of the atoms all, new, or existing (unless they are registered names). P2 cannot be cpu_timestamp or tracer.

Returns true and can only be used in the MatchBody part when tracing.

disable_trace

With one parameter this function disables tracing like the Erlang call erlang:trace(self(), false, [P2]), where P2 is the parameter to disable_trace.

With two parameters this function works as the Erlang call erlang:trace(P1, false, [P2]), where P1 can be either a process identifier or a registered name and is specified as the first argument to the match specification function. P2 cannot be cpu_timestamp or tracer.

Returns true and can only be used in the MatchBody part when tracing.

trace

With two parameters this function takes a list of trace flags to disable as first parameter and a list of trace flags to enable as second parameter. Logically, the disable list is applied first, but effectively all changes are applied atomically. The trace flags are the same as for erlang:trace/3, not including cpu_timestamp, but including tracer.

If a tracer is specified in both lists, the tracer in the enable list takes precedence. If no tracer is specified, the same tracer as the process executing the match specification is used (not the meta tracer). If that process does not have a tracer either, the trace flags are ignored.

When using a tracer module, the module must be loaded before the match specification is executed. If it is not loaded, the match fails.

With three parameters to this function, the first is either a process identifier or the registered name of a process to set trace flags on, the second is the disable list, and the third is the enable list.

Returns true if any trace property was changed for the trace target process, otherwise false. Can only be used in the MatchBody part when tracing.

caller

Returns the calling function as a tuple {Module, Function, Arity} or the atom undefined if the calling function cannot be determined. Can only be used in the MatchBody part when tracing.

Notice that if a "technically built in function" (that is, a function not written in Erlang) is traced, the caller function sometimes returns the atom undefined. The calling Erlang function is not available during such calls.

display

For debugging purposes only. Displays the single argument as an Erlang term on stdout, which is seldom what is wanted. Returns true and can only be used in the MatchBody part when tracing.

get_tcw

Takes no argument and returns the value of the node's trace control word. The same is done by erlang:system_info(trace_control_word).

The trace control word is a 32-bit unsigned integer intended for generic trace control. The trace control word can be tested and set both from within trace match specifications and with BIFs. This call is only allowed when tracing.

set_tcw

Takes one unsigned integer argument, sets the value of the node's trace control word to the value of the argument, and returns the previous value. The same is done by erlang:system_flag(trace_control_word, Value). It is only allowed to use set_tcw in the MatchBody part when tracing.

silent

Takes one argument. If the argument is true, the call trace message mode for the current process is set to silent for this call and all later calls, that is, call trace messages are inhibited even if {message, true} is called in the MatchBody part for a traced function.

This mode can also be activated with flag silent to erlang:trace/3.

If the argument is false, the call trace message mode for the current process is set to normal (non-silent) for this call and all later calls.

If the argument is not true or false, the call trace message mode is unaffected.

All "function calls" must be tuples, even if they take no arguments. The value of self is the atom() self, but the value of {self} is the pid() of the current process.
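The difference between a bare atom and a "function call" tuple can be checked with ets:test_ms/2. This is a minimal sketch for the Erlang shell; the tuple {x} and the variable name Me are arbitrary.

```erlang
%% The bare atom is an ordinary literal and is returned as is.
{ok, self} = ets:test_ms({x}, [{'_', [], [self]}]),
%% The one-element tuple is a call and yields the pid of the calling process.
Me = self(),
{ok, Me} = ets:test_ms({x}, [{'_', [], [{self}]}]).
```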

Match target

Each execution of a match specification is done against a match target term. The format and content of the target term depends on the context in which the match is done. The match target for ETS is always a full table tuple. The match target for call trace is always a list of all function arguments. The match target for event trace depends on the event type, see table below.

Context   Type        Match target                  Description
ETS                   {Key, Value1, Value2, ...}    A table object
Trace     call        [Arg1, Arg2, ...]             Function arguments
Trace     send        [Receiver, Message]           Receiving process/port and message term
Trace     'receive'   [Node, Sender, Message]       Sending node, process/port and message term

Table: Match target depending on context

Variables and Literals

Variables take the form '$<number>', where <number> is an integer between 0 and 100,000,000 (1e+8). The behavior if the number is outside these limits is undefined. In the MatchHead part, the special variable '_' matches anything, and never gets bound (like _ in Erlang).

In the MatchCondition/MatchBody parts, no unbound variables are allowed, so '_' is interpreted as itself (an atom). Variables can only be bound in the MatchHead part.

In the MatchBody and MatchCondition parts, only variables bound previously can be used.

As a special case, the following apply in the MatchCondition/MatchBody parts:

The variable '$_' expands to the whole match target term.

The variable '$$' expands to a list of the values of all bound variables in order (that is, ['$1','$2', ...]).
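A short sketch of the two special variables, using ets:test_ms/2 on an arbitrary three-element tuple:

```erlang
%% '$_' is the whole object; '$$' lists the bound variables in numeric order.
{ok, {a, b, c}} = ets:test_ms({a, b, c}, [{{'$1', '_', '$2'}, [], ['$_']}]),
{ok, [a, c]}    = ets:test_ms({a, b, c}, [{{'$1', '_', '$2'}, [], ['$$']}]).
```

The element matched by '_' is not bound and therefore does not appear in the '$$' list.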

In the MatchHead part, all literals (except the variables above) are interpreted "as is".

In the MatchCondition/MatchBody parts, the interpretation is in some ways different. Literals in these parts can either be written "as is", which works for all literals except tuples, or by using the special form {const, T}, where T is any Erlang term.

For tuple literals in the match specification, double tuple parentheses can also be used, that is, construct them as a tuple of arity one containing a single tuple, which is the one to be constructed. The "double tuple parenthesis" syntax is useful to construct tuples from already bound variables, like in {{'$1', [a,b,'$2']}}. Examples:

Expression               Variable Bindings       Result
{{'$1','$2'}}            '$1' = a, '$2' = b      {a,b}
{const, {'$1', '$2'}}    Irrelevant              {'$1', '$2'}
a                        Irrelevant              a
'$1'                     '$1' = []               []
['$1']                   '$1' = []               [[]]
[{{a}}]                  Irrelevant              [{a}]
42                       Irrelevant              42
"hello"                  Irrelevant              "hello"
$1                       Irrelevant              49 (the ASCII value for character '1')

Table: Literals in MatchCondition/MatchBody Parts of a Match Specification
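Some of these rows can be reproduced in the shell. The sketch below uses ets:test_ms/2 and arbitrary test tuples:

```erlang
%% Double tuple parentheses build a tuple from bound variables.
{ok, {b, a}} = ets:test_ms({a, b}, [{{'$1', '$2'}, [], [{{'$2', '$1'}}]}]),
%% {const, T} returns T untouched, so the variable names stay plain atoms.
{ok, {'$1', '$2'}} = ets:test_ms({a, b}, [{{'$1', '$2'}, [], [{const, {'$1', '$2'}}]}]),
%% Lists can be written as is; variables inside them are still expanded.
{ok, [[]]} = ets:test_ms({[], b}, [{{'$1', '_'}, [], [['$1']]}]).
```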

Execution of the Match

The execution of the match expression, when the runtime system decides whether a trace message is to be sent, is as follows:

For each tuple in the MatchExpression list and while no match has succeeded:

Match the MatchHead part against the match target term, binding the '$<number>' variables (much like in ets:match/2). If the MatchHead part cannot match the arguments, the match fails.

Evaluate each MatchCondition (where only '$<number>' variables previously bound in the MatchHead part can occur) and expect it to return the atom true. When a condition does not evaluate to true, the match fails. If any BIF call generates an exception, the match also fails.

Two cases can occur:

If the match specification is executing when tracing:

Evaluate each ActionTerm in the same way as the MatchConditions, but ignore the return values. Regardless of what happens in this part, the match has succeeded.

If the match specification is executed when selecting objects from an ETS table:

Evaluate the expressions in order and return the value of the last expression (typically there is only one expression in this context).
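For the ETS case, the steps can be followed with a two-clause specification. This is a sketch for the shell; StepMS and the atoms big, ignored, and small are made-up names.

```erlang
StepMS = [{{'$1'}, [{'>', '$1', 5}], [big]},
          {{'$1'}, [], [ignored, {{small, '$1'}}]}],
%% The first clause matches, so the second is never tried.
{ok, big} = ets:test_ms({7}, StepMS),
%% The guard of the first clause fails; the second clause matches and the
%% value of its last body expression is returned.
{ok, {small, 1}} = ets:test_ms({1}, StepMS),
%% No head matches a two-element tuple, so the whole match fails.
{ok, false} = ets:test_ms({a, b}, StepMS).
```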

Differences between Match Specifications in ETS and Tracing

ETS match specifications produce a return value. Usually the MatchBody contains one single ConditionExpression that defines the return value without any side effects. Calls with side effects are not allowed in the ETS context.

When tracing, there is no return value to produce; the match specification either matches or does not. The effect when the expression matches is a trace message rather than a returned term. The ActionTerms are executed as in an imperative language, that is, for their side effects. Functions with side effects are also allowed when tracing.

Tracing Examples

Match an argument list of three, where the first and third arguments are equal:
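A specification of this shape can be written as follows. It is a sketch, checked here with erlang:match_spec_test/3 rather than with real tracing; EqMS is an illustrative variable name.

```erlang
%% Using '$1' twice in the head requires both positions to hold equal terms.
EqMS = [{['$1', '_', '$1'], [], []}],
{ok, true, [], []}  = erlang:match_spec_test([a, b, a], EqMS, trace),
{ok, false, [], []} = erlang:match_spec_test([a, b, c], EqMS, trace).
```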

Match an argument list of three, where the second argument is a number > 3:

[{['_', '$1', '_'],
  [{'>', '$1', 3}],
  []}]

Match an argument list of three, where the third argument is either a tuple containing argument one and two, or a list beginning with argument one and two (that is, [a,b,{a,b}] or [a,b,[a,b,c]]):
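One possible formulation puts the whole test in the conditions. In this sketch (OrMS is an illustrative name), 'orelse' ensures that hd and tl are only applied when the third argument is not the expected tuple.

```erlang
OrMS = [{['$1', '$2', '$3'],
         [{'orelse',
           {'=:=', '$3', {{'$1', '$2'}}},
           {'and',
            {'=:=', '$1', {hd, '$3'}},
            {'=:=', '$2', {hd, {tl, '$3'}}}}}],
         []}],
{ok, true, [], []}  = erlang:match_spec_test([a, b, {a, b}], OrMS, trace),
{ok, true, [], []}  = erlang:match_spec_test([a, b, [a, b, c]], OrMS, trace),
%% hd(c) raises badarg in the guard, which counts as a failed match.
{ok, false, [], []} = erlang:match_spec_test([a, b, c], OrMS, trace).
```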

The above problem can also be solved as follows:
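A sketch of the alternative: two clauses whose heads do all the work, with no conditions at all (HeadMS is an illustrative name).

```erlang
HeadMS = [{['$1', '$2', {'$1', '$2'}], [], []},
          {['$1', '$2', ['$1', '$2' | '_']], [], []}],
{ok, true, [], []}  = erlang:match_spec_test([a, b, {a, b}], HeadMS, trace),
{ok, true, [], []}  = erlang:match_spec_test([a, b, [a, b, c]], HeadMS, trace),
{ok, false, [], []} = erlang:match_spec_test([a, b, c], HeadMS, trace).
```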

Match two arguments, where the first is a tuple beginning with a list that in turn begins with the second argument times two (that is, [{[4,x],y},2] or [{[8], y, z},4]):
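A sketch of such a specification, with the arithmetic done in the conditions (TwiceMS is an illustrative name):

```erlang
TwiceMS = [{['$1', '$2'],
            [{'=:=', {'*', 2, '$2'}, {hd, {element, 1, '$1'}}}],
            []}],
{ok, true, [], []}  = erlang:match_spec_test([{[4, x], y}, 2], TwiceMS, trace),
{ok, true, [], []}  = erlang:match_spec_test([{[8], y, z}, 4], TwiceMS, trace),
{ok, false, [], []} = erlang:match_spec_test([{[5], y}, 2], TwiceMS, trace).
```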

Match three arguments. When all three are equal and are numbers, append the process dump to the trace message, otherwise let the trace message be "as is", but set the sequential trace token label to 4711:
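This could be expressed with two clauses, as sketched below. The term is only bound to a variable (DumpMS, an illustrative name) and not evaluated here, because its body has side effects; it is meant to be passed to erlang:trace_pattern/3.

```erlang
DumpMS = [{['$1', '$1', '$1'],
           [{is_number, '$1'}],
           [{message, {process_dump}}]},
          %% Fallback clause: any other argument list only sets the label.
          {'_', [], [{set_seq_token, label, 4711}]}].
```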

As can be noted above, the parameter list can be matched against a single MatchVariable or an '_'. To replace the whole parameter list with a single variable is a special case. In all other cases the MatchHead must be a proper list.

Generate a trace message only if the trace control word is set to 1:
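A sketch of this test, exercised by changing the node-wide trace control word around calls to erlang:match_spec_test/3. TcwMS, OldTcw, and any_arg are illustrative names; note that the control word is global to the node, so the previous value is restored at the end.

```erlang
TcwMS = [{'_', [{'==', {get_tcw}, {const, 1}}], []}],
OldTcw = erlang:system_flag(trace_control_word, 1),  % returns the previous value
{ok, true, [], []}  = erlang:match_spec_test([any_arg], TcwMS, trace),
_ = erlang:system_flag(trace_control_word, 0),
{ok, false, [], []} = erlang:match_spec_test([any_arg], TcwMS, trace),
_ = erlang:system_flag(trace_control_word, OldTcw).  % restore
```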

Generate a trace message only if there is a seq_trace token:
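Since is_seq_trace already returns true or false, it can serve as the condition on its own. The sketch below assumes that no sequential trace token is set in the calling process, which is the normal state of a shell process; SeqMS and any_arg are illustrative names.

```erlang
SeqMS = [{'_', [{is_seq_trace}], []}],
%% Without a token in the calling process, no trace message would be sent.
{ok, false, [], []} = erlang:match_spec_test([any_arg], SeqMS, trace).
```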

Remove the 'silent' trace flag when the first argument is 'verbose', and add it when it is 'silent':
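A sketch for a traced function of arity one, using the two-parameter form of trace (disable list first, enable list second). The term is only bound here (SilentMS is an illustrative name) because evaluating it changes trace flags.

```erlang
SilentMS = [{['$1'], [{'==', '$1', verbose}], [{trace, [silent], []}]},
            {['$1'], [{'==', '$1', silent}],  [{trace, [], [silent]}]}].
```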

Add a return_trace message if the function is of arity 3:
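Here the whole argument list is bound to one variable so that its length can be tested. In this sketch (ArityMS is an illustrative name) the third element of the erlang:match_spec_test/3 result shows whether return_trace was requested.

```erlang
ArityMS = [{'$1', [{'==', {length, '$1'}, 3}], [{return_trace}]},
           {'_', [], []}],
{ok, true, [return_trace], []} = erlang:match_spec_test([a, b, c], ArityMS, trace),
%% Other arities still produce a call trace message, but no return trace.
{ok, true, [], []} = erlang:match_spec_test([a], ArityMS, trace).
```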

Generate a trace message only if the function is of arity 3 and the first argument is 'trace':
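A sketch that combines a literal in the head with a catch-all clause that suppresses the message (OnlyMS is an illustrative name):

```erlang
OnlyMS = [{[trace, '$2', '$3'], [], []},
          {'_', [], [{message, false}]}],
{ok, true, [], []}  = erlang:match_spec_test([trace, a, b], OnlyMS, trace),
{ok, false, [], []} = erlang:match_spec_test([other, a, b], OnlyMS, trace),
{ok, false, [], []} = erlang:match_spec_test([trace, a], OnlyMS, trace).
```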

ETS Examples

Match all objects in an ETS table, where the first element is the atom 'strider' and the tuple arity is 3, and return the whole object:
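A sketch with ets:select/2 on a small throwaway table. The table name, its contents, and the variable Tab1 are made up for the illustration; a bag is used so that two objects can share the key strider.

```erlang
Tab1 = ets:new(fellowship, [bag]),
true = ets:insert(Tab1, [{strider, ranger, human},
                         {strider, king},
                         {frodo, ringbearer, hobbit}]),
%% Only the three-element tuple whose first element is strider is returned.
[{strider, ranger, human}] =
    ets:select(Tab1, [{{strider, '_', '_'}, [], ['$_']}]),
true = ets:delete(Tab1).
```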

Match all objects in an ETS table with arity > 1 and the first element is 'gandalf', and return element 2:

[{'$1',
  [{'==', gandalf, {element, 1, '$1'}},{'>=',{size, '$1'},2}],
  [{element,2,'$1'}]}]

In this example, if the first element had been the key, it would be much more efficient to match that key in the MatchHead part than in the MatchConditions part. The search space of the tables is restricted with regard to the MatchHead so that only objects with the matching key are searched.

Match tuples of three elements, where the second element is either 'merry' or 'pippin', and return the whole objects:
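A sketch using 'orelse' in the conditions, run against a throwaway table. The table contents and the names Tab2 and PipMS are illustrative; the result is sorted because a set table gives no ordering guarantee.

```erlang
Tab2 = ets:new(hobbits, [set]),
true = ets:insert(Tab2, [{1, merry, brandybuck},
                         {2, pippin, took},
                         {3, sam, gamgee}]),
PipMS = [{{'_', '$1', '_'},
          [{'orelse', {'==', '$1', merry}, {'==', '$1', pippin}}],
          ['$_']}],
[{1, merry, brandybuck}, {2, pippin, took}] = lists:sort(ets:select(Tab2, PipMS)),
true = ets:delete(Tab2).
```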

Function ets:test_ms/2 can be useful for testing complicated ETS matches.
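As a brief sketch, a specification can be tried against single tuples without creating a table. GandalfMS is an illustrative name; the result is {ok, Result}, where Result is false when the tuple does not match.

```erlang
GandalfMS = [{'$1',
              [{'==', gandalf, {element, 1, '$1'}}, {'>=', {size, '$1'}, 2}],
              [{element, 2, '$1'}]}],
{ok, grey}  = ets:test_ms({gandalf, grey, wizard}, GandalfMS),
{ok, false} = ets:test_ms({frodo, hobbit}, GandalfMS),
%% size({gandalf}) is 1, so the second condition fails.
{ok, false} = ets:test_ms({gandalf}, GandalfMS).
```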