Xref is a cross reference tool that can be used for finding
dependencies between functions, modules, applications and
releases.
Calls between functions are either local calls like f(), or
external calls like m:f().
Module data,
which are extracted from BEAM files, include local functions,
exported functions, local calls and external calls. By default,
calls to built-in functions (BIFs) are ignored, but
if the option builtins, accepted by some of this
module's functions, is set to true, calls to BIFs
are included as well. It is the analyzing OTP version that
decides what functions are BIFs. Functional objects are assumed
to be called where they are created (and nowhere else).
Unresolved calls are calls to
apply or spawn with variable module, variable
function, or variable arguments. Examples are M:F(a),
apply(M, f, [a]), and
spawn(m, f(), Args). Unresolved calls are
represented by calls where variable modules have been replaced
with the atom '$M_EXPR', variable functions have been
replaced with the atom '$F_EXPR', and a variable number of
arguments has been replaced with the number -1. The
above mentioned examples are represented by calls to
'$M_EXPR':'$F_EXPR'/1, '$M_EXPR':f/1, and
m:'$F_EXPR'/-1. The unresolved calls are a subset of the
external calls.
Unresolved calls make module data incomplete, which
implies that the results of analyses may be invalid.
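As a rough illustration of local, external and unresolved calls, the sketch below compiles a small throwaway module and asks an Xref server for its unresolved calls. The module name xq_unres, the server name xq_unres_srv and the files written to the current directory are invented for this example.

```erlang
-module(xq_unres_demo).
-export([run/0]).

%% Compiles a hypothetical module xq_unres with debug_info, adds it to a
%% fresh Xref server, and returns the result of the query "UC".
run() ->
    Src = "-module(xq_unres).\n"
          "-export([a/1]).\n"
          %% b() is a local call, lists:sort/1 an external call, and
          %% M:f(a) an unresolved call (variable module).
          "a(M) -> b(), lists:sort([]), M:f(a).\n"
          "b() -> ok.\n",
    ok = file:write_file("xq_unres.erl", Src),
    {ok, xq_unres} = compile:file("xq_unres", [debug_info]),
    {ok, _Pid} = xref:start(xq_unres_srv),
    try
        {ok, xq_unres} = xref:add_module(xq_unres_srv, "xq_unres.beam"),
        xref:q(xq_unres_srv, "UC")
    after
        xref:stop(xq_unres_srv)
    end.
```

Calling xq_unres_demo:run() should return {ok, Calls}, where Calls holds the call from xq_unres:a/1 to the placeholder function described above.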
Applications are collections of modules. The
modules' BEAM files are located in the ebin
subdirectory of the application directory. The name of the
application directory determines the name and version of the
application.
Releases are collections of applications
located in the lib subdirectory of the release directory.
There is more to read about applications and releases in the
Design Principles book.
Xref servers are identified
by names, supplied when creating new servers. Each Xref server
holds a set of releases, a set of applications, and a set of
modules with module data. Xref servers are independent of each
other, and all analyses are evaluated in the context of one
single Xref server (exceptions are the functions m/1 and
d/1 which do not use servers at all). The
mode of an Xref server determines what module
data are extracted from BEAM files as modules are added to the
server. Starting with R7, BEAM files compiled with the option
debug_info contain so-called
debug information, which is an abstract
representation of the code. In functions mode, which is
the default mode, function calls and line numbers are extracted
from debug information. In modules mode, debug
information is ignored if present, but dependencies between
modules are extracted from other parts of the BEAM files. The
modules mode is significantly less time and space
consuming than the functions mode, but the analyses that
can be done are limited.
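A minimal sketch of a server running in modules mode follows. It assumes a made-up module xq_mode and server name xq_mode_srv. The module is compiled without debug_info, since that information is ignored in this mode.

```erlang
-module(xq_mode_demo).
-export([run/0]).

%% Starts an Xref server in modules mode and returns the module edges
%% (predefined variable ME) of one hypothetical module.
run() ->
    Src = "-module(xq_mode).\n"
          "-export([a/1]).\n"
          "a(L) -> lists:sort(L).\n",
    ok = file:write_file("xq_mode.erl", Src),
    {ok, xq_mode} = compile:file("xq_mode", []),
    {ok, _Pid} = xref:start(xq_mode_srv, [{xref_mode, modules}]),
    try
        {ok, xq_mode} = xref:add_module(xq_mode_srv, "xq_mode.beam"),
        xref:q(xq_mode_srv, "ME")
    after
        xref:stop(xq_mode_srv)
    end.
```

The result is expected to contain the module call {xq_mode, lists}. Function-level variables such as E are not available on a server started this way.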
An
analyzed module is a
module that has been added to an Xref server together with its
module data.
A
library module is a
module located in some directory mentioned in the
library path.
A library module is said to be used if some of its exported
functions are used by some analyzed module.
An
unknown module is a
module that is neither an analyzed module nor a library module,
but whose exported functions are used by some analyzed module.
An
unknown function is a
used function that is neither local or exported by any
analyzed module nor exported by any library module.
An
undefined function is an externally used function that
is not exported by any analyzed module or library module. With
this notion, a local function can be an undefined function, namely
if it is externally used from some module. All unknown functions
are also undefined functions; there is a figure in the
User's Guide that illustrates this relationship.
Starting with R9C, the module attribute tag deprecated
can be used to inform Xref about
deprecated functions and
optionally when functions are planned to be removed. A few
examples show the idea:
-deprecated({f,1}).
- The exported function f/1 is deprecated. Nothing is
said whether f/1 will be removed or not.
-deprecated({f,'_'}).
- All exported functions f/0, f/1 and so on are
deprecated.
-deprecated(module).
- All exported functions in the module are deprecated.
Equivalent to -deprecated({'_','_'}).
-deprecated([{g,1,next_version}]).
- The function g/1 is deprecated and will be
removed in next version.
-deprecated([{g,2,next_major_release}]).
- The function g/2 is deprecated and will be
removed in next major release.
-deprecated([{g,3,eventually}]).
- The function g/3 is deprecated and will
eventually be removed.
-deprecated({'_','_',eventually}).
- All exported functions in the module are deprecated and
will eventually be removed.
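The following sketch shows one way the attribute might be picked up by a query. The module xq_dep, its functions old/1 and new/1, and the server name xq_dep_srv are hypothetical.

```erlang
-module(xq_dep_demo).
-export([run/0]).

%% Compiles a hypothetical module that marks old/1 as deprecated and
%% returns the result of the query "DF_1" (deprecated functions to be
%% removed in the next version).
run() ->
    Src = "-module(xq_dep).\n"
          "-export([old/1, new/1]).\n"
          "-deprecated([{old,1,next_version}]).\n"
          "old(X) -> new(X).\n"
          "new(X) -> X.\n",
    ok = file:write_file("xq_dep.erl", Src),
    {ok, xq_dep} = compile:file("xq_dep", [debug_info]),
    {ok, _Pid} = xref:start(xq_dep_srv),
    try
        {ok, xq_dep} = xref:add_module(xq_dep_srv, "xq_dep.beam"),
        xref:q(xq_dep_srv, "DF_1")
    after
        xref:stop(xq_dep_srv)
    end.
```

Replacing "DF_1" with "DF", "DF_2" or "DF_3" selects the wider sets of deprecated functions.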
Before any analysis can take place, module data must be set up. For instance, the cross reference and the unknown
functions are computed when all module data are known. The
functions that need complete data (analyze, q,
variables) take care of setting up data automatically.
Module data need to be set up (again) after calls to any of the
add, replace, remove,
set_library_path or update functions.
The result of setting up module data is the
Call Graph. A (directed) graph
consists of a set of vertices and a set of (directed) edges. The
edges represent
calls (From, To)
between functions, modules, applications or releases. From is
said to call To, and To is said to be used by From. The vertices
of the Call Graph are the functions of all module data: local
and exported functions of analyzed modules; used BIFs; used
exported functions of library modules; and unknown functions.
The functions module_info/0,1 added by the compiler are
included among the exported functions, but only when called from
some module. The edges are the function calls of all module
data. A consequence of the edges being a set is that there is
only one edge if a function is locally or externally used
several times on one and the same line of code.
The Call Graph is
represented by
Erlang terms (the sets are lists), which is suitable for many
analyses. But for analyses that look at chains of calls, a list
representation is much too
slow. Instead the representation offered by the digraph
module is used. The translation of the list representation of
the Call Graph - or a subgraph thereof - to the digraph
representation does not
come for free, so the language used for expressing queries to be
described below has a special operator for this task and a
possibility to save the digraph representation for
subsequent analyses.
In addition to the Call Graph there is a graph called the
Inter Call Graph. This is
a graph of calls (From, To) such that there is a chain of
calls from From to To in the Call Graph, and every From and To
is an exported function or an unused local function.
The vertices are the same as for the Call Graph.
Calls between modules, applications and releases are also
directed graphs. The
types
of the vertices and edges of these graphs are (ranging from the
most special to the most general):
Fun for functions; Mod for modules;
App for applications; and Rel for releases.
The following paragraphs will describe the different constructs
of the language used for selecting and analyzing parts of the
graphs, beginning with the
constants:
- Expression ::= Constants
- Constants ::= Consts | Consts : Type | RegExpr
- Consts ::= Constant | [Constant, ...]
| {Constant, ...}
- Constant ::= Call | Const
- Call ::= FunSpec -> FunSpec
| {MFA, MFA}
| AtomConst -> AtomConst
| {AtomConst, AtomConst}
- Const ::= AtomConst | FunSpec | MFA
- AtomConst ::= Application | Module | Release
- FunSpec ::= Module : Function / Arity
- MFA ::=
{Module, Function, Arity}
- RegExpr ::= RegString : Type
| RegFunc
| RegFunc : Type
- RegFunc ::= RegModule : RegFunction / RegArity
- RegModule ::= RegAtom
- RegFunction ::= RegAtom
- RegArity ::= RegString | Number | _ | -1
- RegAtom ::= RegString | Atom | _
- RegString ::= - a regular expression, as described in the
re module, enclosed in double quotes -
- Type ::= Fun | Mod | App | Rel
- Function ::= Atom
- Application ::= Atom
- Module ::= Atom
- Release ::= Atom
- Arity ::= Number | -1
- Atom ::= - same as Erlang atoms -
- Number ::= - same as non-negative Erlang integers -
Examples of constants are: kernel, kernel->stdlib,
[kernel, sasl], [pg -> mnesia, {tv, mnesia}] : Mod.
It is an error if an instance of Const does not match any
vertex of any graph.
If there is more than one vertex matching an untyped instance
of AtomConst, then the one of the most general type is
chosen.
A list of constants is interpreted as a set of constants, all of
the same type.
A tuple of constants constitutes a chain of calls (which may,
but does not have to, correspond to an actual chain of calls of
some graph).
Assigning a type to a list or tuple of Constant is
equivalent to assigning the type to each Constant.
Regular expressions are used as a
means to select some of the vertices of a graph.
A RegExpr consisting of a RegString and a type -
an example is "xref_.*" : Mod - is interpreted as those
modules (or applications or releases, depending on the type)
that match the expression.
Similarly, a RegFunc is interpreted as those vertices
of the Call Graph that match the expression.
An example is "xref_.*":"add_.*"/"(2|3)", which matches
all add functions of arity two or three of any of the
xref modules.
Another example, one that matches all functions of arity 10 or
more: _:_/"[1-9].+". Here _ is an abbreviation for
".*", that is, the regular expression that matches
anything.
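Since a query is passed to Xref as an Erlang string, the double quotes around a RegString have to be escaped. The sketch below, which uses an invented module xq_re and server name xq_re_srv, selects the add functions of arity two or three in the same way as the example above.

```erlang
-module(xq_re_demo).
-export([run/0]).

%% Returns the functions of the hypothetical module xq_re that match
%% the RegFunc "xq_.*":"add_.*"/"(2|3)".
run() ->
    Src = "-module(xq_re).\n"
          "-export([add_a/2, add_b/3, add_c/1, other/2]).\n"
          "add_a(X, Y) -> {X, Y}.\n"
          "add_b(X, Y, Z) -> {X, Y, Z}.\n"
          "add_c(X) -> X.\n"
          "other(X, Y) -> {X, Y}.\n",
    ok = file:write_file("xq_re.erl", Src),
    {ok, xq_re} = compile:file("xq_re", [debug_info]),
    {ok, _Pid} = xref:start(xq_re_srv),
    try
        {ok, xq_re} = xref:add_module(xq_re_srv, "xq_re.beam"),
        %% Note the escaped quotes inside the query string.
        xref:q(xq_re_srv, "\"xq_.*\":\"add_.*\"/\"(2|3)\"")
    after
        xref:stop(xq_re_srv)
    end.
```

The functions add_c/1 and other/2 are left out because the arity and the function name, respectively, do not match.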
The syntax of
variables is
simple:
- Expression ::= Variable
- Variable ::= - same as Erlang variables -
There are two kinds of variables: predefined variables and user
variables.
Predefined variables
hold set up module data, and cannot be assigned to but only used
in queries.
User variables on the other
hand can be assigned to, and are typically used for
temporary results while evaluating a query, and for keeping
results of queries for use in subsequent queries.
The predefined variables are (variables marked with (*) are
available in functions mode only):
E
- Call Graph Edges (*).
V
- Call Graph Vertices (*).
M
- Modules. All modules: analyzed modules, used library
modules, and unknown modules.
A
- Applications.
R
- Releases.
ME
- Module Edges. All module calls.
AE
- Application Edges. All application calls.
RE
- Release Edges. All release calls.
L
- Local Functions (*). All local functions of analyzed modules.
X
- Exported Functions. All exported functions of analyzed
modules and all used exported functions of library modules.
F
- Functions (*).
B
- Used BIFs. B is empty if builtins is
false for all analyzed modules.
U
- Unknown Functions.
UU
- Unused Functions (*). All local and exported functions of
analyzed modules that have not been used.
XU
- Externally Used Functions. Functions of all modules -
including local functions - that have been used in some
external call.
LU
- Locally Used Functions (*). Functions of all modules that have
been used in some local call.
LC
- Local Calls (*).
XC
- External Calls (*).
AM
- Analyzed Modules.
UM
- Unknown Modules.
LM
- Used Library Modules.
UC
- Unresolved Calls. Empty in modules mode.
EE
- Inter Call Graph Edges (*).
DF
- Deprecated Functions. All deprecated exported
functions and all used deprecated BIFs.
DF_1
- Deprecated Functions. All deprecated functions
to be removed in next version.
DF_2
- Deprecated Functions. All deprecated functions
to be removed in next version or next major release.
DF_3
- Deprecated Functions. All deprecated functions to be
removed in next version, next major release, or later.
These are a few
facts about the
predefined variables (the set operators + (union) and
- (difference) as well as the cast operator
(Type) are described below):
- F is equal to L + X.
- V is equal to X + L + B + U, where X,
L, B and U are pairwise disjoint (that
is, have no elements in common).
- UU is equal to V - (XU + LU), where
LU and XU may have elements in common. Put in
another way:
- V is equal to UU + XU + LU.
- E is equal to LC + XC. Note that LC
and XC may have elements in common, namely if some
function is locally and externally used from one and the same
function.
- U is a subset of XU.
- B is a subset of XU.
- LU is equal to range LC.
- XU is equal to range XC.
- LU is a subset of F.
- UU is a subset of F.
- range UC is a subset of U.
- M is equal to AM + LM + UM, where AM,
LM and UM are pairwise disjoint.
- ME is equal to (Mod) E.
- AE is equal to (App) E.
- RE is equal to (Rel) E.
- (Mod) V is a subset of M. Equality holds
if all analyzed modules have some local, exported, or unknown
function.
- (App) M is a subset of A. Equality holds
if all applications have some module.
- (Rel) A is a subset of R. Equality holds
if all releases have some application.
- DF_1 is a subset of DF_2.
- DF_2 is a subset of DF_3.
- DF_3 is a subset of DF.
- DF is a subset of X + B.
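Some of the facts listed above can be checked with queries whose result should be the empty set. This is only a sketch: the module xq_facts and the server name xq_facts_srv are made up, and only three of the facts are checked.

```erlang
-module(xq_facts_demo).
-export([run/0]).

%% Evaluates three queries on a hypothetical module. Each query computes
%% a set difference that is expected to be empty if the corresponding
%% fact about the predefined variables holds.
run() ->
    Src = "-module(xq_facts).\n"
          "-export([a/1]).\n"
          "a(L) -> b(lists:sort(L)).\n"
          "b(L) -> L.\n",
    ok = file:write_file("xq_facts.erl", Src),
    {ok, xq_facts} = compile:file("xq_facts", [debug_info]),
    {ok, _Pid} = xref:start(xq_facts_srv),
    try
        {ok, xq_facts} = xref:add_module(xq_facts_srv, "xq_facts.beam"),
        Queries = ["F - (L + X)",            % F is equal to L + X
                   "V - (X + L + B + U)",    % V is equal to X + L + B + U
                   "ME - (Mod) E"],          % ME is equal to (Mod) E
        [xref:q(xq_facts_srv, Q) || Q <- Queries]
    after
        xref:stop(xq_facts_srv)
    end.
```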
An important notion is that of
conversion of expressions. The syntax of
a cast expression is:
- Expression ::= ( Type ) Expression
The interpretation of the cast operator depends on the named
type Type, the type of Expression, and the
structure of the elements of the interpretation of Expression.
If the named type is equal to the
expression type, no conversion is done. Otherwise, the
conversion is done one step at a time;
(Fun) (App) RE, for instance, is equivalent to
(Fun) (Mod) (App) RE. Now assume that the
interpretation of Expression is a set of constants
(functions, modules, applications or releases). If the named
type is more general than the expression type, say Mod
and Fun respectively, then the interpretation of the cast
expression is the set of modules that have at least one
of their functions mentioned in the interpretation of the
expression. If the named
type is more special than the expression type, say Fun
and Mod, then the interpretation is the set of all the
functions of the modules (in modules mode, the conversion
is partial since the local functions are not known).
The conversions to and from applications and releases
work analogously. For instance, (App) "xref_.*" : Mod
returns all applications containing at least one module
such that xref_ is a prefix of the module name.
Now assume that the interpretation of Expression is a
set of calls. If the named type is more general than the
expression type, say Mod and Fun respectively,
then the interpretation of the cast expression is the set of
calls (M1, M2) such that the interpretation of the
expression contains a call from some function
of M1 to some function of M2. If the named type is more special
than the expression type, say Fun and Mod, then
the interpretation is the set of all function calls
(F1, F2) such that the interpretation of the expression
contains a call (M1, M2) and F1 is
a function of M1 and F2 is a function of M2 (in modules
mode, there are no function calls, so a cast to Fun
always yields an empty set). Again, the conversions to and from
applications and releases work analogously.
The interpretations of constants and variables are sets, and
those sets can be used as the basis for forming new sets by the
application of
set operators.
The syntax:
- Expression ::= Expression BinarySetOp Expression
- BinarySetOp ::= + | * | -
+, * and - are interpreted as union,
intersection and difference respectively: the union of two sets
contains the elements of both sets; the intersection of two sets
contains the elements common to both sets; and the difference of
two sets contains the elements of the first set that are not
members of the second set. The elements of the two sets must be
of the same structure; for instance, a function call cannot be
combined with a function. But if a cast operator can make the
elements compatible, then the more general elements are
converted to the less general element type. For instance,
M + F is equivalent to
(Fun) M + F, and E - AE
is equivalent to E - (Fun) AE. One more
example: X * xref : Mod is interpreted as the set of
functions exported by the module xref; xref : Mod
is converted to the more special type of X (Fun,
that is) yielding all functions of xref, and the
intersection with X (all functions exported by analyzed
modules and library modules) is interpreted as those functions
that are both exported by some module and functions of
xref.
There are also unary set operators:
- Expression ::= UnarySetOp Expression
- UnarySetOp ::= domain | range | strict
Recall that a call is a pair (From, To). domain
applied to a set of calls is interpreted as the set of all
vertices From, and range as the set of all vertices To.
The interpretation of the strict operator is the operand
with all calls of the form (A, A) removed.
The interpretation of the
restriction operators is a
subset of the first operand, a set of calls. The second operand,
a set of vertices, is converted to the type of the first operand.
The syntax of the restriction operators:
- Expression ::= Expression RestrOp Expression
- RestrOp ::= |
- RestrOp ::= ||
- RestrOp ::= |||
The interpretation in some detail for the three operators:
|
- The subset of calls from any of the vertices.
||
- The subset of calls to any of the vertices.
|||
- The subset of calls to and from any of the vertices.
For all sets of calls CS and all sets of vertices
VS, CS ||| VS is equivalent to
CS | VS * CS || VS.
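The three restriction operators can be compared on a small call chain. In the sketch below, the module xq_restr (where a/0 calls b/0, which calls c/0) and the server name xq_restr_srv are assumptions made for the example.

```erlang
-module(xq_restr_demo).
-export([run/0]).

%% Restricts the Call Graph edges of a hypothetical module to the
%% single vertex xq_restr:b/0 using |, || and |||.
run() ->
    Src = "-module(xq_restr).\n"
          "-export([a/0, c/0]).\n"
          "a() -> b().\n"
          "b() -> c().\n"
          "c() -> ok.\n",
    ok = file:write_file("xq_restr.erl", Src),
    {ok, xq_restr} = compile:file("xq_restr", [debug_info]),
    {ok, _Pid} = xref:start(xq_restr_srv),
    try
        {ok, xq_restr} = xref:add_module(xq_restr_srv, "xq_restr.beam"),
        From = xref:q(xq_restr_srv, "E | xq_restr:b/0"),    % calls from b/0
        To   = xref:q(xq_restr_srv, "E || xq_restr:b/0"),   % calls to b/0
        Both = xref:q(xq_restr_srv, "E ||| xq_restr:b/0"),  % from and to b/0
        {From, To, Both}
    after
        xref:stop(xq_restr_srv)
    end.
```

With a single vertex, ||| keeps only calls from b/0 to itself. There are none here, so the third result is empty, which agrees with the equivalence stated above.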
Two functions (modules,
applications, releases) belong to the same strongly connected
component if they call each other (in)directly. The
interpretation of the components operator is the set of
strongly connected components of a set of calls. The
condensation of a set of calls is a new set of calls
between the strongly connected components such that there is an
edge between two components if there is some constant of the first
component that calls some constant of the second component.
The interpretation of the of operator is a chain of
calls of the second operand (a set of calls) that passes through
all of the vertices of the first operand (a tuple of
constants), in the given order. The second operand
is converted to the type of the first operand.
For instance, the of operator can be used for finding out
whether a function calls another function indirectly, and the
chain of calls demonstrates how. The syntax of the graph
analyzing operators:
- Expression ::= Expression GraphOp Expression
- GraphOp ::= components | condensation | of
As was mentioned before, the graph analyses operate on
the digraph representation of graphs.
By default, the digraph representation is created when
needed (and deleted when no longer used), but it can also be
created explicitly by use of the closure operator:
- Expression ::= ClosureOp Expression
- ClosureOp ::= closure
The interpretation of the closure operator is the
transitive closure of the operand.
The restriction operators are defined for closures as well;
closure E | xref : Mod is
interpreted as the direct or indirect function calls from the
xref module, while the interpretation of
E | xref : Mod is the set of direct
calls from xref.
If some graph is to be used in several graph analyses, it saves
time to assign the digraph representation of the graph
to a user variable,
and then make sure that every graph analysis operates on that
variable instead of the list representation of the graph.
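The difference between direct and indirect calls can be sketched as follows. The module xq_clos (a/0 calls b/0, which calls c/0) and the server name xq_clos_srv are hypothetical.

```erlang
-module(xq_clos_demo).
-export([run/0]).

%% Returns the direct calls from xq_clos:a/0 and the direct or indirect
%% calls from the same function, the latter computed with closure.
run() ->
    Src = "-module(xq_clos).\n"
          "-export([a/0, c/0]).\n"
          "a() -> b().\n"
          "b() -> c().\n"
          "c() -> ok.\n",
    ok = file:write_file("xq_clos.erl", Src),
    {ok, xq_clos} = compile:file("xq_clos", [debug_info]),
    {ok, _Pid} = xref:start(xq_clos_srv),
    try
        {ok, xq_clos} = xref:add_module(xq_clos_srv, "xq_clos.beam"),
        Direct   = xref:q(xq_clos_srv, "E | xq_clos:a/0"),
        Indirect = xref:q(xq_clos_srv, "closure E | xq_clos:a/0"),
        {Direct, Indirect}
    after
        xref:stop(xq_clos_srv)
    end.
```

The second result is expected to contain the call from a/0 to c/0, which is not an edge of the Call Graph itself.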
The lines where functions are defined (more precisely: where
the first clause begins) and the lines where functions are used
are available in functions mode. The line numbers refer
to the files where the functions are defined. This holds also for
files included with the -include and -include_lib
directives, which may result in functions defined apparently in
the same line. The line operators are used for assigning
line numbers to functions and for assigning sets of line numbers
to function calls.
The syntax is similar to the one of the cast operator:
- Expression ::= ( LineOp ) Expression
- Expression ::= ( XLineOp ) Expression
- LineOp ::= Lin | ELin | LLin | XLin
- XLineOp ::= XXL
The interpretation of the Lin operator applied to a set
of functions assigns to each function the line number where the
function is defined. Unknown functions and functions of library
modules are assigned the number 0.
The interpretation of some LineOp operator applied to a
set of function calls assigns to each call the set of line
numbers where the first function calls the second function. Not
all calls are assigned line numbers by all operators:
- the Lin operator is defined for Call Graph Edges;
- the LLin operator is defined for Local Calls;
- the XLin operator is defined for External Calls;
- the ELin operator is defined for Inter Call Graph Edges.
The Lin (LLin, XLin) operator assigns
the lines where calls (local calls, external calls) are made.
The ELin operator assigns to each call (From, To),
for which it is defined, every line L such that there is
a chain of calls from From to To beginning with a call on line
L.
The XXL operator is defined for the interpretation of
any of the LineOp operators applied to a set of function
calls. The result is that of replacing the function call with
a line numbered function call, that is, each of the two
functions of the call is replaced by a pair of the function and
the line where the function is defined. The effect of the
XXL operator can be undone by the LineOp operators. For
instance, (Lin) (XXL) (Lin) E is
equivalent to (Lin) E.
The +, -, * and # operators are
defined for line number expressions, provided the operands are
compatible. The LineOp operators are also defined for
modules, applications, and releases; the operand is implicitly
converted to functions. Similarly, the cast operator is defined
for the interpretation of the LineOp operators.
The interpretation of the
counting operator is the number of elements of a set. The operator
is undefined for closures. The +, - and *
operators are interpreted as the obvious arithmetical operators
when applied to numbers. The syntax of the counting operator:
- Expression ::= CountOp Expression
- CountOp ::= #
All binary operators are left associative; for instance,
A | B || C is equivalent to
(A | B) || C. The following is a list
of all operators, in increasing order of
precedence:
- +, -
- *
- #
- |, ||, |||
- of
- (Type)
- closure, components, condensation,
domain, range, strict
Parentheses are used for grouping, either to make an expression
more readable or to override the default precedence of operators:
- Expression ::= ( Expression )
A
query is a non-empty sequence of
statements. A statement is either an assignment of a user
variable or an expression. The value of an assignment is the
value of the right-hand side expression. It makes no sense to
put a plain expression anywhere else but last in queries. The
syntax of queries is summarized by these productions:
- Query ::= Statement, ...
- Statement ::= Assignment | Expression
- Assignment ::= Variable := Expression
| Variable = Expression
A variable cannot be assigned a new value unless first removed.
Variables assigned to by the = operator are removed at
the end of the query, while variables assigned to by the
:= operator can only be removed by calls to
forget. There are no user variables when module data
need to be set up again; if any of the functions that make it
necessary to set up module data again is called, all user
variables are forgotten.
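A small sketch of the two assignment operators follows. The module xq_query, the server name xq_query_srv and the variable names T and Saved are chosen for the example only.

```erlang
-module(xq_query_demo).
-export([run/0]).

%% Uses a temporary variable (=) inside one query and a variable that
%% outlives the query (:=), which is removed again with forget.
run() ->
    Src = "-module(xq_query).\n"
          "-export([a/0, c/0]).\n"
          "a() -> b().\n"
          "b() -> c().\n"
          "c() -> ok.\n",
    ok = file:write_file("xq_query.erl", Src),
    {ok, xq_query} = compile:file("xq_query", [debug_info]),
    {ok, _Pid} = xref:start(xq_query_srv),
    try
        {ok, xq_query} = xref:add_module(xq_query_srv, "xq_query.beam"),
        %% T exists only while this query is evaluated.
        {ok, N1} = xref:q(xq_query_srv, "T = E | xq_query:a/0, # T"),
        %% Saved is kept on the server until it is forgotten.
        {ok, _Edges} = xref:q(xq_query_srv, "Saved := E"),
        {ok, N2} = xref:q(xq_query_srv, "# Saved"),
        ok = xref:forget(xq_query_srv, 'Saved'),
        {N1, N2}
    after
        xref:stop(xq_query_srv)
    end.
```

N1 counts the calls made from a/0 and N2 counts all edges of the Call Graph of the example module.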
Types
application() = atom()
arity() = int() | -1
bool() = true | false
call() = {atom(), atom()} | funcall()
constant() = mfa() | module() | application() | release()
directory() = string()
file() = string()
funcall() = {mfa(), mfa()}
function() = atom()
int() = integer() >= 0
library() = atom()
library_path() = path() | code_path
mfa() = {module(), function(), arity()}
mode() = functions | modules
module() = atom()
release() = atom()
string_position() = int() | at_end
variable() = atom()
xref() = atom() | pid()