bjorng/bjorn/compiler/fix-atom-leak/ERL-563/OTP-14968
Stop the compiler from overflowing the atom table
|
|
Use integer variable names instead of atoms in v3_core, sys_core_fold,
and v3_kernel to avoid overflowing the atom table.
It is a deliberate design decision to calculate the first free integer
variable name (in sys_core_fold and v3_kernel) instead of somehow
passing it from one pass to another. I don't want that kind of
dependency between compiler passes. Also note that the next free
variable name is not easily available after running the inliner.
|
|
When a generator in a list comprehension was given a term
other than a list, the wrong line could be pointed out
in the exception. Here is an example:
bad_generator() ->
    [I ||    %% This line would be pointed out.
        I <- not_a_list].
https://bugs.erlang.org/browse/ERL-572
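To observe the exception and the line it is attributed to, a minimal sketch like the following can be used. It assumes OTP 21 or later (for the `Class:Reason:Stacktrace` catch syntax). The module name `bad_gen_demo` is hypothetical, and the exact line number reported depends on the source file.

```erlang
-module(bad_gen_demo).
-export([bad_generator/1]).

%% Runs a list comprehension over Term. If Term is not a list, the
%% comprehension raises {bad_generator, Term}; we return the reason
%% together with the line recorded in the top stack frame.
bad_generator(Term) ->
    try [I || I <- Term] of
        List -> {ok, List}
    catch
        error:{bad_generator, Bad}:Stk ->
            %% The location list of the top frame carries the line
            %% that the exception is attributed to.
            [{_M, _F, _ArgsOrArity, Location} | _] = Stk,
            {bad_generator, Bad, proplists:get_value(line, Location)}
    end.
```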
|
|
Add syntax in try/catch to retrieve the stacktrace directly
|
|
We used to not care about the number of values returned from the
'after infinity' clause in a receive (because it could never be
executed). It is time to start caring, because this will cause problems
when we soon start to do some more aggressive optimizations.
|
|
This commit adds a new syntax for retrieving the stacktrace
without calling erlang:get_stacktrace/0. That allows us to
deprecate erlang:get_stacktrace/0 and ultimately remove it.
The problem with erlang:get_stacktrace/0 is that it can keep huge
terms in a process for an indefinite time after an exception. The
stacktrace can be huge after a 'function_clause' exception or a failed
call to a BIF or operator, because the arguments for the call will be
included in the stacktrace. For example:
1> catch abs(lists:seq(1, 1000)).
{'EXIT',{badarg,[{erlang,abs,
[[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20|...]],
[]},
{erl_eval,do_apply,6,[{file,"erl_eval.erl"},{line,674}]},
{erl_eval,expr,5,[{file,"erl_eval.erl"},{line,431}]},
{shell,exprs,7,[{file,"shell.erl"},{line,687}]},
{shell,eval_exprs,7,[{file,"shell.erl"},{line,642}]},
{shell,eval_loop,3,[{file,"shell.erl"},{line,627}]}]}}
2> erlang:get_stacktrace().
[{erlang,abs,
[[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,
23,24|...]],
[]},
{erl_eval,do_apply,6,[{file,"erl_eval.erl"},{line,674}]},
{erl_eval,expr,5,[{file,"erl_eval.erl"},{line,431}]},
{shell,exprs,7,[{file,"shell.erl"},{line,687}]},
{shell,eval_exprs,7,[{file,"shell.erl"},{line,642}]},
{shell,eval_loop,3,[{file,"shell.erl"},{line,627}]}]
3>
We can extend the syntax for clauses in try/catch to optionally bind
the stacktrace to a variable.
Here is an example using the current syntax:
try
    Expr
catch C:E ->
    Stk = erlang:get_stacktrace(),
    .
    .
    .
In the new syntax, it would look like:
try
    Expr
catch
    C:E:Stk ->
        .
        .
        .
Only a variable (not a pattern) is allowed in the stacktrace position,
to discourage matching of the stacktrace. (Matching would also be
expensive, because the raw format of the stacktrace would have to be
converted to the cooked form before matching.)
Note that:
try
    Expr
catch E ->
    .
    .
    .
is a shorthand for:
try
    Expr
catch throw:E ->
    .
    .
    .
If the stacktrace is to be retrieved for a throw, the 'throw:'
prefix must be explicitly included:
try
    Expr
catch throw:E:Stk ->
    .
    .
    .
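Here is a minimal sketch of the new syntax in a complete module, assuming OTP 21 or later. The module and function names (`stk_demo`, `safe_abs/1`, `thrown/0`) are hypothetical. It keeps only the module and function of the top stack frame, so that no large stacktrace is held on to.

```erlang
-module(stk_demo).
-export([safe_abs/1, thrown/0]).

%% Binds class, reason and stacktrace in one catch clause, so
%% erlang:get_stacktrace/0 is never called.
safe_abs(X) ->
    try abs(X) of
        Result -> {ok, Result}
    catch
        Class:Reason:Stk ->
            %% Keep only module and function of the top frame instead of
            %% the whole stacktrace, which may contain huge arguments.
            [{M, F, _ArgsOrArity, _Location} | _] = Stk,
            {Class, Reason, {M, F}}
    end.

%% For a throw, the 'throw:' prefix must be written out explicitly
%% to be able to bind the stacktrace.
thrown() ->
    try throw(oops)
    catch
        throw:Term:Stk -> {caught, Term, is_list(Stk)}
    end.
```

Note that the stacktrace variable is only a variable, not a pattern, and it is not allowed in the clause guard.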
|
|
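Tuple calls are the ability to call a function with a tuple in place
of the module name; the function is looked up in the module named by
the tuple's first element, and the tuple is passed as an extra argument: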
1> Var = dict:new().
{dict,0,16,16,8,80,48,
{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}}
2> Var:size().
0
This behaviour is considered by most to be undesired and confusing,
especially when it comes to errors. For example, imagine you invoke
"Mod:new()", where Mod should be an atom, but you accidentally pass
{ok, dict}. It raises:
{undef,[{ok,new,[{ok,dict}],[]},...]}
because it attempts to invoke ok:new/1. That is really hard to debug,
as there is no call to new/1 in the source code.
Furthermore, this behaviour is implemented at the VM level, which
imposes such semantics on all languages running on BEAM.
Since we cannot remove the behaviour above, this proposal makes the
behaviour opt-in with a compiler flag:
-compile(tuple_calls).
This means that, if a codebase relies on this functionality, it
can keep compatibility by configuring its build tool to always use
the 'tuple_calls' flag, or by adding the flag explicitly to each module.
As long as the compile attribute above is listed, the codebase will
work on old and new Erlang versions alike. The only downside of the
current implementation is that modules compiled on OTP 20 that rely
on 'tuple_calls' will have to be recompiled to run with 'tuple_calls'
on OTP 21+.
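As a sketch of the opt-in behaviour, the module below enables `tuple_calls` and calls `size/0` on a dict. It relies on `dict:new()` returning a tuple tagged with the atom `dict`, which is an internal implementation detail, and the module name `tc_demo` is hypothetical. The explicit call `dict:size(Var)` gives the same result without any flag.

```erlang
-module(tc_demo).
%% Opt in to tuple calls for this module only.
-compile(tuple_calls).
-export([size_via_tuple_call/0, size_explicit/0]).

%% With tuple_calls, Var:size() is executed as dict:size(Var),
%% because the first element of the tuple Var is the atom 'dict'.
size_via_tuple_call() ->
    Var = dict:new(),
    Var:size().

%% The equivalent explicit call, which needs no compiler flag.
size_explicit() ->
    Var = dict:new(),
    dict:size(Var).
```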
|
|
The new Dbgi chunk stores data in the following format:
{debug_info_v1, Backend, Data}
This allows compilers to store the debug info in different
formats. In order to retrieve a particular format, for
instance, Erlang Abstract Format, one may invoke:
Backend:debug_info(erlang_v1, Module, Data, Opts)
Besides introducing the chunk above, this commit also:
* Changes beam_lib:chunks(Beam, [abstract_code]) to
  read from the new Dbgi chunk while keeping backwards
  compatibility with old .beam files
* Adds the {debug_info, {Backend, Data}} option to
  compile:file/2 and friends; the option is stored in the
  Dbgi chunk. This allows the debug info encryption
  mechanism to work across compilers
* Improves Dialyzer to work directly on Core Erlang,
  allowing languages that do not have the Erlang
  Abstract Format to be analyzed by Dialyzer as long as
  they emit the new chunk and their backend implementation
  is available
Backwards compatibility is kept across the board except
for those calling beam_lib:chunks(Beam, ["Abst"]), as the
old chunk is no longer available. Note, however, that the "Abst"
chunk has always been optional.
Future OTP versions may remove parsing of the "Abst" chunk
from beam_lib altogether once Erlang 19 and earlier are no
longer supported.
The current Dialyzer implementation still supports earlier
.beam files, and that support may also be removed in future versions.
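Here is a minimal sketch of reading the chunk back, assuming OTP 20 or later. It compiles a tiny module from abstract forms with `debug_info` (the module name `tiny` and function `hello/0` are hypothetical). It then fetches the `debug_info` chunk with `beam_lib:chunks/2` and asks the backend for the Erlang Abstract Format. For code compiled from Erlang we expect the backend to be `erl_abstract_code`, but treat that as an observation rather than a guarantee.

```erlang
-module(dbgi_demo).
-export([abstract_code/0]).

abstract_code() ->
    %% Abstract forms for a hypothetical module 'tiny' with hello/0.
    Forms = [{attribute, 1, module, tiny},
             {attribute, 2, export, [{hello, 0}]},
             {function, 3, hello, 0,
              [{clause, 3, [], [], [{atom, 3, world}]}]}],
    {ok, tiny, Beam} = compile:forms(Forms, [binary, debug_info]),
    %% The Dbgi chunk holds {debug_info_v1, Backend, Data}.
    {ok, {tiny, [{debug_info, {debug_info_v1, Backend, Data}}]}} =
        beam_lib:chunks(Beam, [debug_info]),
    %% Ask the backend to produce the Erlang Abstract Format.
    {ok, Abstract} = Backend:debug_info(erlang_v1, tiny, Data, []),
    {Backend, Abstract}.
```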
|
|
Binary construction that mixes long literal strings with variables
will make Dialyzer slow. Example:
<<"long string (thousands of characters)",T/binary>>
The string literals in binary construction are translated to one binary
segment per character; all those segments will slow down Dialyzer.
We can speed up Dialyzer if we combine several characters (up to 256)
into a single segment in the binary. It will also slightly speed up the
compiler.
This optimization will make Core Erlang listing files with binary strings
harder to read, but they were not that easy to read before this
change.
ERL-308
|
|
A map expression such as,
#{'a' => 1, 'b' => 2, 'a' => 3}
will produce a warning for the repeated key 'a'.
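The warning does not change the run-time semantics: in a map construction, the later association for a key takes precedence. The following is a minimal sketch, and the module name `dup_key_demo` is hypothetical. Compiling it should produce the warning about 'a'.

```erlang
-module(dup_key_demo).
-export([build/0]).

%% The compiler warns about the repeated key 'a'; at run time the
%% later association ('a' => 3) wins.
build() ->
    #{'a' => 1, 'b' => 2, 'a' => 3}.
```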
|
|
The previous variable names can be generated by
projects like LFE and Elixir, leading to possible
conflicts. Our first choice to solve such conflicts
was to use $, but that is not a valid character in a Core Erlang
variable name. Therefore we picked @, which is currently supported and
still reduces the chance of conflicts.
|
|
The filters in a list comprehension can be guard expressions or
ordinary expressions.
If a guard expression is used as a filter, an exception will basically
mean the same as 'false':
t() ->
    L = [{some_tag,42},an_atom],
    [X || X <- L, element(1, X) =:= some_tag].
%% Returns [{some_tag,42}]
On the other hand, if an ordinary expression is used as a filter, there
will be an exception:
my_element(N, T) -> element(N, T).
t() ->
    L = [{some_tag,42},an_atom],
    [X || X <- L, my_element(1, X) =:= some_tag].
%% Causes a 'badarg' exception when element(1, an_atom) is evaluated
For several releases, it has been allowed to override a BIF with
a local function. Thus, if we define a function called element/2,
it will be called instead of the BIF element/2 within the module.
We must use the "erlang:" prefix to call the BIF.
Therefore, the following code is expected to work the same way as in
our second example above:
-compile({no_auto_import,[element/2]}).
element(N, T) ->
    erlang:element(N, T).
t() ->
    L = [{some_tag,42},an_atom],
    [X || X <- L, element(1, X) =:= some_tag].
%% Causes a 'badarg' exception when element(1, an_atom) is evaluated
But the compiler refuses to compile the code with the following
diagnostic:
call to local/imported function element/2 is illegal in guard
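The difference between the first two examples can be checked with a small module. This is a sketch, and the module name `lc_filter_demo` and its function names are hypothetical. It catches the `badarg` from the ordinary filter so that both results can be compared.

```erlang
-module(lc_filter_demo).
-export([guard_filter/0, expr_filter/0]).

my_element(N, T) -> element(N, T).

%% A guard expression as filter: element(1, an_atom) fails, which
%% simply counts as 'false', so an_atom is skipped.
guard_filter() ->
    L = [{some_tag, 42}, an_atom],
    [X || X <- L, element(1, X) =:= some_tag].

%% An ordinary expression as filter: the call fails with badarg and
%% the exception propagates out of the comprehension.
expr_filter() ->
    L = [{some_tag, 42}, an_atom],
    try [X || X <- L, my_element(1, X) =:= some_tag]
    catch error:badarg -> badarg
    end.
```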
|
|
sys_pre_expand previously did a lot more work, for example,
translating records and funs, but now is merely a grab bag
of small transformations. Move those transformations to
v3_core.
|
|
This speeds up the compilation of binary literals
with string values in them. For example, compiling
a file with a ~340kB binary would yield the following
compiler timings:
Compiling "foo"
parse_module : 0.130 s 5327.6 kB
transform_module : 0.000 s 5327.6 kB
lint_module : 0.011 s 5327.8 kB
expand_module : 0.508 s 71881.2 kB
v3_core : 0.463 s 11.5 kB
Notice the increase in memory and processing time
in expand_module and v3_core. This happened because
expand_module would expand the strings in binaries
into chars. For example, the binary <<"foo">>, which
is represented as
{bin, 1, [
    {bin_element, 1, {string, 1, "foo"}, default, default}
]}
would be converted to
{bin, 1, [
    {bin_element, 1, {char, 1, $f}, default, default},
    {bin_element, 1, {char, 1, $o}, default, default},
    {bin_element, 1, {char, 1, $o}, default, default}
]}
However, v3_core would then traverse all of those
characters and convert them into an actual binary, as the
result is a literal value.
This patch addresses this issue by moving the expansion
of strings into chars to v3_core, and doing it only if a literal
value cannot be built. This reduces the compilation
time of the file mentioned above to the values below:
time of the file mentioned above to the values below:
Compiling "bar"
parse_module : 0.134 s 5327.6 kB
transform_module : 0.000 s 5327.6 kB
lint_module : 0.005 s 5327.8 kB
expand_module : 0.000 s 5328.7 kB
v3_core : 0.013 s 11.2 kB
|
|
Add more filename/line number annotations while translating to
Core Erlang in v3_core, and ensure that sys_core_fold retains
existing annotations. The goal is to prevent sys_core_fold from
generating warnings with "no_file" instead of a filename.
|
|
a3ec2644f5 attempted to teach v3_core not to generate code with
unbound variables. The approach taken in that commit is to
discard all expressions following a badmatch. That does not
work if the badmatch is nested:
{[V] = [] = foo,V},
V
That would be rewritten to:
{error({badmatch,foo})},
V
where V is unbound.
If we were to follow the same approach, the tuple construction
code would have to look out for a badmatch, as would list construction,
begin...end, and so on.
Therefore, as it is impractical to discard all expressions that
follow a badmatch, the only other solution is to ensure that the
variables that the pattern binds will somehow be bound. That can
be arranged by rewriting the pattern to a pattern that binds the
same variables. Thus:
error({badmatch,foo}),
E = foo,
case E of
    {[V],[]} ->
        V;
    Other ->
        error({badmatch,Other})
end
|
|
v3_core would generate unsafe code for the following example:
f() ->
    {ok={error,E}} = foo(),
    E.
Internally, the code would look similar to:
f() ->
    Var = foo(),
    error({badmatch,Var}),
    E.
That is, there would remain a reference to an unbound variable.
Normally, sys_core_fold would remove the reference to 'E', but
if optimization was disabled the compiler would crash.
|
|
The expression in a bit string comprehension is limited to a
literal bit string expression. That is, the following code
is legal:
<< <<X>> || X <- List >>
but not this code:
<< foo(X) || X <- List >>
The limitation is annoying. For one thing, tools that transform
the abstract format must be careful not to produce code such as:
<< begin
       %% Some instrumentation code.
       <<X>>
   end || X <- List >>
One reason for the limitation could be that we'll get
reduce/reduce conflicts if we try to allow an arbitrary
expression in a bit string comprehension:
binary_comprehension -> '<<' expr '||' lc_exprs '>>' :
    {bc,?anno('$1'),'$2','$4'}.
Unfortunately, there does not seem to be an easy way to work
around that problem. The best we can do is to allow 'expr_max'
expressions (as in the binary syntax):
binary_comprehension -> '<<' expr_max '||' lc_exprs '>>' :
    {bc,?anno('$1'),'$2','$4'}.
That will work, but function calls must be enclosed in
parentheses:
<< (foo(X)) || X <- List >>
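Here is a minimal sketch of the parenthesized form, assuming a release that includes this grammar change. The helper `foo/1` and the module name `bc_demo` are hypothetical. `foo/1` must return a bit string, since each result is appended to the comprehension's result.

```erlang
-module(bc_demo).
-export([duplicate_bytes/1]).

%% Hypothetical helper: returns a two-byte binary for each element.
foo(X) -> <<X, X>>.

%% The call must be parenthesized, because only an expr_max
%% expression is allowed before '||' in a bit string comprehension.
duplicate_bytes(List) ->
    << (foo(X)) || X <- List >>.
```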
|
|
We will need them when we start to produce warnings for patterns
that can't match.
|
|
Internally in the v3_core pass, an #imatch{} record represents
a match expression:
Pattern = Expression
If Pattern is a single, unbound variable, #imatch{} will be
rewritten to #iset{}; otherwise it will be rewritten to #icase{}.
To determine how #imatch{} should be translated, the pattern is
processed using upattern/3. The return value from upattern/3 is thrown
away (after having been used for determining how the #imatch{} record
should be translated).
That means that every pattern in an #imatch{} is processed twice,
which is wasteful.
We can easily avoid the double processing of patterns by
introducing a new helper function that determines whether the
pattern is a new variable.
|
|
* maint:
Fix crash when attempting to update a fun as if it were a map
|
|
The following example would cause an internal consistency
failure in the compiler:
f() -> ok.
update() -> (fun f/0)#{u => 42}.
The reason is that internally, v3_core will (incorrectly)
rewrite update/0 to code similar to this:
update() ->
    if
        is_map(fun f/0) ->
            maps:update(u, 42, fun f/0)
    end.
Since funs are not allowed to be created in guards, incorrect and
unsafe code would be generated.
It is easy to fix the bug. There already is an is_valid_map_src/1
function in v3_core that tests whether the argument for the map update
operation can possibly be a valid map. A fun is represented as a
variable with a special name in Core Erlang, so it would not be
recognized as unsafe. All we'll need to do to fix the bug is to look
closer at variables to ensure they don't represent funs. That will
ensure that the code is rewritten in the correct way:
update() ->
    error({badmap,fun f/0}).
Reported-by: Thomas Arts
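With the fix, the update fails at run time with a `badmap` exception whose term is the fun itself. Here is a minimal sketch to check that. The module name `fun_update_demo` is hypothetical, and the compiler may warn that the expression always fails.

```erlang
-module(fun_update_demo).
-export([update/0]).

f() -> ok.

%% A fun is not a map, so the update raises {badmap, Fun}
%% at run time; we match on the fun to confirm it.
update() ->
    try (fun f/0)#{u => 42}
    catch
        error:{badmap, F} when is_function(F, 0) -> badmap_fun
    end.
```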
|
|
When translating guards to Core Erlang, it is sometimes necessary
to add an is_boolean/1 guard test. Here is an example where it is
necessary:
o(A, B) when A or B ->
    ok.
That would be translated to something like:
o(A, B) when ((A =:= true) or (B =:= true)) and
             is_boolean(A) and is_boolean(B) ->
    ok.
The is_boolean/1 tests are necessary to ensure that the guard
fails for calls such as:
o(true, not_boolean)
However, because of a bug in v3_core, is_boolean/1 tests were
added when they were not necessary. Here is an example:
f(B) when not B -> ok.
That would be translated to:
f(B) when (B =:= false) and is_boolean(B) -> ok.
The following translation will work just as well.
f(B) when B =:= false -> ok.
Correct the bug to suppress those unnecessary is_boolean/1 tests.
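The guard semantics that the translation must preserve can be checked directly. In this sketch, a non-boolean operand to `or` or `not` makes the guard fail, so the catch-all clause is selected. The module name `guard_bool_demo` is hypothetical.

```erlang
-module(guard_bool_demo).
-export([o/2, f/1]).

%% 'or' in a guard: a non-boolean operand makes the guard fail,
%% so the second clause is selected.
o(A, B) when A or B -> ok;
o(_, _) -> no_match.

%% 'not' in a guard: only the atom 'false' selects the first clause.
f(B) when not B -> ok;
f(_) -> no_match.
```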
|
|
According to EEP-43 for maps, a 'badmap' exception should be
generated when an attempt is made to update a non-map term such as:
<<>>#{a=>42}
That was not implemented in OTP 17.
José Valim suggested that we should take the opportunity to
improve the errors coming from map operations:
http://erlang.org/pipermail/erlang-questions/2015-February/083588.html
This commit implements better errors from map operations similar
to his suggestion.
When a map update operation (Map#{...}) or a BIF that expects a map
is given a non-map term, the exception will be:
{badmap,Term}
This kind of exception is similar to the {badfun,Term} exception
from operations that expect a fun.
When a map operation requires a key that is not present in a map,
the following exception will be raised:
{badkey,Key}
José Valim suggested that the exception should be
{badkey,Key,Map}. We decided not to do that because the map
could potentially be huge and cause problems if the error
propagated through links to other processes.
For BIFs, it could be argued that the exceptions could simply be
'badmap' and 'badkey', because the bad map and bad key can be found in
the argument list for the BIF in the stack backtrace. However, for the
map update operation (Map#{...}), the bad map or bad key will not be
included in the stack backtrace, so that information must be included
in the exception reason itself. For consistency, the BIFs should raise
the same exceptions as the update operation.
If more than one key is missing, it is undefined which of the
keys will be reported in the {badkey,Key} exception.
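Here is a minimal sketch of the exceptions described above, assuming OTP 18 or later. The module and function names (`map_err_demo`, `update_existing/2`, `lookup/2`) are hypothetical. Each function returns the exception reason instead of letting it propagate.

```erlang
-module(map_err_demo).
-export([update_existing/2, lookup/2]).

%% Map update with ':=' requires both a map and an existing key.
%% On failure, the exception reason is returned.
update_existing(M, V) ->
    try M#{a := V}
    catch error:Reason -> Reason
    end.

%% maps:get/2 raises the same kinds of exceptions.
lookup(Key, M) ->
    try maps:get(Key, M)
    catch error:Reason -> Reason
    end.
```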
|
|
Duplicated variables as aliases in patterns, such as:
f({_,_}=Dup=Dup) -> ...
will work, but produce sub-optimal code similar to:
f({_,_}=Dup=NewVar) when Dup =:= NewVar -> ...
with one extra guard test for each duplicated variable.
Rewrite pat_alias/2 to eliminate all duplicated variables. While
we are at it, also simplify handling of tuples, conses, and literals
by using the data functions in the cerl module.
|
|
get_ianno/1 would retrieve either a bare annotation or an
annotation wrapped in an #a{} record. In both cases, it would
return a wrapped annotation.
We can replace the calls to get_ianno/1 with calls to get_anno/1,
because the argument is always an #iclause{} and all iclause records
are always initialized with a wrapped annotation.
|
|
If we have a sequence of put_map_* instructions operating on the
same map, it will be more efficient to have a single is_map/2
instruction before the put_map_* instructions, so that each put_map_*
instruction does not need to test whether the argument is a map.
|
|
There is no need to always introduce a new variable to hold a map.
Maps are novars (constructs that don't export variables).
|
|
Compiling the following function:
f(V) when not (bar and V) -> true; %Line 4
f(_) -> false.
would produce the following warnings:
no_file: Warning: the call to is_boolean/1 has no effect
t.erl:4: Warning: the guard for this clause evaluates to 'false'
t.erl:4: Warning: use of operator '=:=' has no effect
Two of the warnings refer to calls to is_boolean/1 and '=:='/2 which
v3_core added when translating the code to Core Erlang. The only
relevant warning is:
t.erl:4: Warning: the guard for this clause evaluates to 'false'
Suppress the other two warnings by marking the compiler-generated
calls with a 'compiler_generated' annotation.
|
|
Core Erlang annotations are supposed to be a list of terms. v3_core
could temporarily stuff a record in the 'anno' field of a Core Erlang
record. That would cause Dialyzer warnings if we were to tighten the
type specs for annotations. (We want to tighten the type specs in order
to catch more real problems.)
Avoid abusing the annotation by wrapping the entire Core Erlang
record in a #isimple{} record.
Reported-by: Kostis Sagonas
|
|
In c34ad2d5, the compiler learned to silence some warnings for
expressions that were explicitly assigned to the '_' variable,
as in this example:
_ = list_to_integer(S),
ok
That commit intentionally only made it possible to silence warnings
for BIFs that could cause an exception. Warnings would still be
produced for:
_ = date(),
ok
because date/0 can never fail, which makes the call completely
useless. The reasoning was that such warnings can always be
eliminated by eliminating the offending code.
While that is true, there is the question of rules and their
consistency. It is surprising that '_' can be used to silence
some warnings, but has no effect on other warnings.
Therefore, we will teach the compiler to silence warnings for
the following constructs:
* Calls to safe BIFs such as date/0
* Expressions that will cause an exception such as 'X/0'
* Terms that are built but not used, such as '{x,X}'
|
|
When translating a function with map construction:
f(A) ->
    B = b,
    C = c,
    #{A=>1,B=>2,C=>3}.
v3_core would break apart the map construction into three
parts because of the way the map instructions in BEAM work:
variable keys need to be in their own instruction.
In the example, constant propagation will turn two of the
keys into literal keys. But the initial breaking apart will
not be undone, so there will still be three map constructions:
'f'/1 =
    fun (_cor0) ->
        let <_cor3> = ~{::<_cor0,1>}~
        in let <_cor4> = ~{::<'b',2>|_cor3}~
           in ~{::<'c',3>|_cor4}~
It would be possible to complicate the sys_core_fold pass
to regroup map operations so that we would get:
'f'/1 =
    fun (_cor0) ->
        let <_cor3> = ~{::<_cor0,1>}~
        in ~{::<'b',2>,::<'c',3>|_cor3}~
A simpler approach, which also simplifies the translation, is
to skip the grouping in v3_core and translate the function
to:
'f'/1 =
    fun (_cor0) ->
        ~{::<_cor0,1>,::<'b',2>,::<'c',3>}~
We will then let v3_kernel do the grouping while translating
from Core Erlang to Kernel Erlang.
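Whatever grouping v3_kernel chooses, the result must match a left-to-right evaluation in which a later association wins. This sketch shows why the order matters when the variable key happens to equal one of the literal keys. The module name `map_build_demo` is hypothetical.

```erlang
-module(map_build_demo).
-export([f/1]).

%% One variable key (A) mixed with keys that constant propagation
%% turns into literals (b and c). If A =:= b, the later b => 2 wins.
f(A) ->
    B = b,
    C = c,
    #{A => 1, B => 2, C => 3}.
```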
|
|
Maps have certain invariants that must be preserved:
(1) A map as a pattern must be represented as #c_map{} record,
never as a literal. The reason is that the pattern '#{}' will
match any map, not just the empty map. The literal '#{}' will
only match the empty map.
(2) In a map pattern, the key must be a literal, a variable, or
data (list or tuple). Keys that are binaries or maps *must* be
represented as literals.
(3) Maps in expressions should be represented as literals if possible.
Nothing breaks if this invariant is violated, but the generated
code will be less efficient.
To preserve invariant (1), cerl:update_c_map/3 must never collapse
a map to a literal. To preserve invariant (3), cerl:update_c_map/3
must collapse a map to a literal if possible.
To preserve both invariants, we need a way for cerl:update_c_map/3 to
know whether the map is used as a pattern or as an expression. The
simplest way is to have an 'is_pat' boolean in the #c_map{} record
which is set when a #c_map{} record is initially created.
We also need to update core_parse.yrl to establish the invariants
in the same way as v3_core, to ensure that compiling from a
.core file will work even if all optimizations on Core Erlang are
disabled.
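Invariant (1) reflects a language rule that is easy to check: the pattern `#{}` matches any map, whereas comparing with the literal `#{}` only succeeds for the empty map. Here is a minimal sketch; the module name `map_pat_demo` is hypothetical.

```erlang
-module(map_pat_demo).
-export([classify/1]).

%% The pattern #{} matches every map, empty or not; the guard is
%% needed to single out the empty map.
classify(M) ->
    case M of
        #{} when map_size(M) =:= 0 -> empty_map;
        #{} -> non_empty_map;
        _ -> not_a_map
    end.
```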
|
|
The translation of a list comprehension with a map pattern
with a big literal binary as key, such as:
lc(L) ->
    [V || #{<<2:301>> := V} <- L].
would generate Core Erlang code where an unbound variable
was referenced:
'lc'/1 =
fun (L) ->
letrec
'lc$^0'/1 = fun (_cor4) ->
case _cor4 of
<[~{~<_cor1,V>}~|_cor3]> when 'true' ->
let <_cor5> = apply 'lc$^0'/1(_cor3)
in [V|_cor5]
<[_cor2|_cor3]> when 'true' ->
apply 'lc$^0'/1(_cor3)
<[]> when 'true' ->
[]
end
in let <_cor1> = #{#<2>(301,1,'integer',['unsigned'|['big']])}#
in apply 'lc$^0'/1(L)
In the map pattern in the 'case' in the 'letrec', the key is the
variable '_cor1' which should be bound in the enclosing environment.
It is not.
There is a binding of '_cor1', but in the wrong place (at the end of
the function). Because of the way v3_kernel translates letrecs,
the code *happens* to work.
The code would break if Core Erlang optimizations were strengthened
to more aggressively eliminate variable bindings that are not used,
or if the translation from Core Erlang to Kernel Erlang were changed.
Correct the translation so that '_cor1' is bound in the environment
enclosing the 'letrec':
'lc'/1 =
fun (L) ->
let <_cor1> = #{#<2>(301,1,'integer',['unsigned'|['big']])}#
in letrec
'lc$^0'/1 = fun (_cor4) ->
case _cor4 of
<[~{~<_cor1,V>}~|_cor3]> when 'true' ->
let <_cor5> = apply 'lc$^0'/1(_cor3)
in [V|_cor5]
<[_cor2|_cor3]> when 'true' ->
apply 'lc$^0'/1(_cor3)
<[]> when 'true' ->
[]
end
in apply 'lc$^0'/1(L)
Unfortunately I was not able to come up with a test case that
demonstrates the bug.
|
|
* maint:
Fix miscompilation when module contains multiple named funs
Fix locations of shadowing warnings in ms_transform
|
|
Commit 78ce8917d started to use get_anno/1 to extract the line
annotation from filter qualifiers in comprehensions, but this does not
respect the spec of this function and results in a Dialyzer warning.
To make the code more type-friendly, introduce a get_qual_anno/1
function.
Kostis Sagonas suggested that the function should be implemented
similar to this to also ensure that the qualifiers are of the
appropriate form:
get_qual_anno({call,Line,_,_}) -> Line;
get_qual_anno({op,Line,_,_,_}) -> Line;
.
.
.
get_qual_anno({var,Line,_}) -> Line.
The problem is that it is difficult to know exactly which forms
may occur, and the function would need to be updated if new
abstract forms are added. Thus this implementation would complicate
maintenance without any real payoff.
Reported-by: Kostis Sagonas
|
|
A module containing two named funs bearing the same name and arity could be
miscompiled.
Reported-by: Sam Chapin
|
|
A match such as:
#{K := V1} = #{K := V2} = M,
will be aliased (coalesced) to:
#{K := V1 = V2} = M.
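Both forms bind V1 and V2 to the same value. Here is a minimal sketch comparing them. The module and function names are hypothetical, and the key variable K must already be bound, as it is here.

```erlang
-module(map_alias_demo).
-export([two_patterns/2, coalesced/2]).

%% Two map patterns matched against the same map...
two_patterns(K, M) ->
    #{K := V1} = #{K := V2} = M,
    {V1, V2}.

%% ...behave like one map pattern with an alias on the value.
coalesced(K, M) ->
    #{K := V1 = V2} = M,
    {V1, V2}.
```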
|
|
Check for literals instead of variables when constructing chains.
|
|
Two patterns, binary_segment size and map_pair key, are expressions
even in matching. If only bound variables are used, we are fine, but
some expressions that appear as literals need to be lifted.
Currently only map keys that are binaries will use this.
|