diff options
author | Björn Gustavsson <[email protected]> | 2019-03-07 10:00:39 +0100 |
---|---|---|
committer | Björn Gustavsson <[email protected]> | 2019-03-09 09:31:08 +0100 |
commit | 2d2e78ad6e667655560a67e848153dbb218914f7 (patch) | |
tree | f14140ca46e4eeaacf2895d59a12772f6f1dac33 | |
parent | 3066a5bf51467d1f8f0b05cf7b7bab0ef6a17578 (diff) | |
download | otp-2d2e78ad6e667655560a67e848153dbb218914f7.tar.gz otp-2d2e78ad6e667655560a67e848153dbb218914f7.tar.bz2 otp-2d2e78ad6e667655560a67e848153dbb218914f7.zip |
Optimize tail-recursive calls of BIFs
BEAM currently does not call BIFs at the end of a function in a
tail-recursive way. That is, when calling a BIF at the end of a
function, the BIF is first called, and then the stack frame
is deallocated, and then control is transferred to the caller.
If there is no stack frame when a BIF is called in the tail position,
the loader will emit a sequence of three instructions: first an
instruction that allocates a stack frame and saves the continuation
pointer (`allocate`), then an instruction that calls the BIF
(`call_bif`), and lastly an instruction that deallocates the stack
frame and returns to the caller (`deallocate_return`).
The old compiler would essentially allocate a stack frame for each
clause in a function, so it would not be that common that a BIF was
called in the tail position when there was no stack frame, so the
three-instruction sequence was deemed acceptable.
The new compiler only allocates stack frames when truly needed, so
the three-instruction BIF call sequence has become much more common.
This commit introduces a new `call_bif_only` instruction so that only
one instruction will be needed when calling a BIF in the tail position
when there is no stack frame. This instruction is also used when there
is a stack frame to make it possible to deallocate the stack frame
**before** calling the BIF, which may make a subsequent garbage
collection at the end of the BIF call cheaper (copying less garbage).
The one downside of this change is that the function that called the
BIF will not be included in the stack backtrace (similar to how a
tail-recursive call to an Erlang function will not be included in the
backtrace).
That was the quick summary of the commit. Here comes a detailed look
at how BIF calls are translated by the loader. The first example is a
function that calls `setelement/3` in the tail position:
update_no_stackframe(X) ->
setelement(5, X, new_value).
Here is the BEAM code:
{function, update_no_stackframe, 1, 12}.
{label,11}.
{line,[...]}.
{func_info,{atom,t},{atom,update_no_stackframe},1}.
{label,12}.
{move,{x,0},{x,1}}.
{move,{atom,new_value},{x,2}}.
{move,{integer,5},{x,0}}.
{line,[...]}.
{call_ext_only,3,{extfunc,erlang,setelement,3}}.
Because there is no stack frame, the `call_ext_only` instruction will
be used to call `setelement/3`:
{call_ext_only,3,{extfunc,erlang,setelement,3}}.
The loader will transform this instruction to a three-instruction
sequence:
0000000020BD8130: allocate_tt 0 3
0000000020BD8138: call_bif_e erlang:setelement/3
0000000020BD8148: deallocate_return_Q 0
Using the `call_bif_only` instruction introduced in this commit,
only one instruction is needed:
000000005DC377F0: call_bif_only_e erlang:setelement/3
`call_bif_only` calls the BIF and returns to the caller.
Now let's look at a function that already has a stack frame when
`setelement/3` is called:
update_with_stackframe(X) ->
foobar(X),
setelement(5, X, new_value).
Here is the BEAM code:
{function, update_with_stackframe, 1, 14}.
{label,13}.
{line,[...]}.
{func_info,{atom,t},{atom,update_with_stackframe},1}.
{label,14}.
{allocate,1,1}.
{move,{x,0},{y,0}}.
{line,[...]}.
{call,1,{f,16}}.
{move,{y,0},{x,1}}.
{move,{atom,new_value},{x,2}}.
{move,{integer,5},{x,0}}.
{line,[...]}.
{call_ext_last,3,{extfunc,erlang,setelement,3},1}.
Since there is a stack frame, the `call_ext_last` instruction will be used
to deallocate the stack frame and call the function:
{call_ext_last,3,{extfunc,erlang,setelement,3},1}.
Before this commit, the loader would translate this instruction to:
0000000020BD81B8: call_bif_e erlang:setelement/3
0000000020BD81C8: deallocate_return_Q 1
That is, the BIF is called before deallocating the stack frame and returning
to the calling function.
After this commit, the loader will translate the `call_ext_last` like this:
000000005DC37868: deallocate_Q 1
000000005DC37870: call_bif_only_e erlang:setelement/3
There are still two instructions, but now the stack frame will be
deallocated before calling the BIF, which could make the potential
garbage collection after the BIF call slightly more efficient (copying
less garbage).
We could have introduced a `call_bif_last` instruction, but the code
for calling a BIF is relatively large and there does not seem be a
practical way to share the code between `call_bif` and `call_bif_only`
(since the difference is at the end, after the BIF call). Therefore,
we did not want to clone the BIF calling code yet another time to
make a `call_bif_last` instruction.
-rw-r--r-- | erts/emulator/beam/bif.c | 6 | ||||
-rw-r--r-- | erts/emulator/beam/bif_instrs.tab | 114 | ||||
-rw-r--r-- | erts/emulator/beam/ops.tab | 30 | ||||
-rw-r--r-- | erts/emulator/test/trace_local_SUITE.erl | 4 | ||||
-rw-r--r-- | lib/compiler/test/beam_except_SUITE.erl | 13 |
5 files changed, 139 insertions, 28 deletions
diff --git a/erts/emulator/beam/bif.c b/erts/emulator/beam/bif.c index 7faba35e1c..c102ddbee6 100644 --- a/erts/emulator/beam/bif.c +++ b/erts/emulator/beam/bif.c @@ -5165,6 +5165,12 @@ erts_schedule_bif(Process *proc, pc = i; mfa = &exp->info.mfa; } + else if (BeamIsOpCode(*i, op_call_bif_only_e)) { + /* Pointer to bif export in i+1 */ + exp = (Export *) i[1]; + pc = i; + mfa = &exp->info.mfa; + } else if (BeamIsOpCode(*i, op_apply_bif)) { /* Pointer to bif in i+1, and mfa in i-3 */ pc = c_p->cp; diff --git a/erts/emulator/beam/bif_instrs.tab b/erts/emulator/beam/bif_instrs.tab index 8499f61114..8e0caa38a3 100644 --- a/erts/emulator/beam/bif_instrs.tab +++ b/erts/emulator/beam/bif_instrs.tab @@ -209,8 +209,8 @@ i_length.execute(Fail, Live, Dst) { } // -// The most general BIF call. The BIF may build any amount of data -// on the heap. The result is always returned in r(0). +// Call a BIF, store the result in x(0) and transfer control to the +// next instruction. // call_bif(Exp) { ErtsBifFunc bf; @@ -219,8 +219,10 @@ call_bif(Exp) { Export *export = (Export*) $Exp; if (!((FCALLS - 1) > 0 || (FCALLS-1) > neg_o_reds)) { - /* If we have run out of reductions, we do a context - switch before calling the bif */ + /* + * If we have run out of reductions, do a context + * switch before calling the BIF. + */ c_p->arity = GET_BIF_ARITY(export); c_p->current = &export->info.mfa; goto context_switch3; @@ -257,9 +259,12 @@ call_bif(Exp) { HTOP = HEAP_TOP(c_p); FCALLS = c_p->fcalls; ERTS_DBG_CHK_REDS(c_p, FCALLS); - /* We have to update the cache if we are enabled in order - to make sure no book keeping is done after we disabled - msacc. We don't always do this as it is quite expensive. */ + + /* + * We have to update the cache if we are enabled in order + * to make sure no bookkeeping is done after we disabled + * msacc. We don't always do this as it is quite expensive. + */ if (ERTS_MSACC_IS_ENABLED_CACHED_X()) { ERTS_MSACC_UPDATE_CACHE_X(); } @@ -269,6 +274,12 @@ call_bif(Exp) { CHECK_TERM(r(0)); $NEXT0(); } else if (c_p->freason == TRAP) { + /* + * Set the continuation pointer to return to next + * instruction after the trap (either by a return from + * erlang code or by nif_bif.epilogue() when the BIF + * is done). + */ SET_CP(c_p, $NEXT_INSTRUCTION); SET_I(c_p->i); SWAPIN; @@ -281,6 +292,95 @@ call_bif(Exp) { ASSERT(c_p->stop == E); I = handle_error(c_p, I, reg, &export->info.mfa); goto post_error_handling; + //| -no_next +} + +// +// Call a BIF tail-recursively, storing the result in x(0) and doing +// a return to the continuation poiner (c_p->cp). +// + +call_bif_only(Exp) { + ErtsBifFunc bf; + Eterm result; + ErlHeapFragment *live_hf_end; + Export *export = (Export*) $Exp; + + if (!((FCALLS - 1) > 0 || (FCALLS-1) > neg_o_reds)) { + /* + * If we have run out of reductions, do a context + * switch before calling the BIF. + */ + c_p->arity = GET_BIF_ARITY(export); + c_p->current = &export->info.mfa; + goto context_switch3; + } + + ERTS_MSACC_SET_BIF_STATE_CACHED_X(GET_BIF_MODULE(export), + GET_BIF_ADDRESS(export)); + + bf = GET_BIF_ADDRESS(export); + + PRE_BIF_SWAPOUT(c_p); + ERTS_DBG_CHK_REDS(c_p, FCALLS); + c_p->fcalls = FCALLS - 1; + if (FCALLS <= 0) { + save_calls(c_p, export); + } + ASSERT(!ERTS_PROC_IS_EXITING(c_p)); + ERTS_VERIFY_UNUSED_TEMP_ALLOC(c_p); + live_hf_end = c_p->mbuf; + ERTS_CHK_MBUF_SZ(c_p); + result = (*bf)(c_p, reg, I); + ERTS_CHK_MBUF_SZ(c_p); + ASSERT(!ERTS_PROC_IS_EXITING(c_p) || is_non_value(result)); + ERTS_VERIFY_UNUSED_TEMP_ALLOC(c_p); + ERTS_HOLE_CHECK(c_p); + ERTS_REQ_PROC_MAIN_LOCK(c_p); + if (ERTS_IS_GC_DESIRED(c_p)) { + Uint arity = GET_BIF_ARITY(export); + result = erts_gc_after_bif_call_lhf(c_p, live_hf_end, result, + reg, arity); + E = c_p->stop; + } + PROCESS_MAIN_CHK_LOCKS(c_p); + HTOP = HEAP_TOP(c_p); + FCALLS = c_p->fcalls; + ERTS_DBG_CHK_REDS(c_p, FCALLS); + + /* + * We have to update the cache if we are enabled in order + * to make sure no bookkeeping is done after we disabled + * msacc. We don't always do this as it is quite expensive. + */ + if (ERTS_MSACC_IS_ENABLED_CACHED_X()) { + ERTS_MSACC_UPDATE_CACHE_X(); + } + ERTS_MSACC_SET_STATE_CACHED_M_X(ERTS_MSACC_STATE_EMULATOR); + if (ERTS_LIKELY(is_value(result))) { + /* + * Success. Store the result and return to the caller. + */ + r(0) = result; + CHECK_TERM(r(0)); + $return(); + } else if (c_p->freason == TRAP) { + /* + * Dispatch to a trap. When the trap is done, a jump + * to the continuation pointer (c_p->cp) will be done. + */ + SET_I(c_p->i); + SWAPIN; + Dispatch(); + } + + /* + * Error handling. SWAPOUT is not needed because it was done above. + */ + ASSERT(c_p->stop == E); + I = handle_error(c_p, I, reg, &export->info.mfa); + goto post_error_handling; + //| -no_next } // diff --git a/erts/emulator/beam/ops.tab b/erts/emulator/beam/ops.tab index e688c6996b..da5364183c 100644 --- a/erts/emulator/beam/ops.tab +++ b/erts/emulator/beam/ops.tab @@ -74,23 +74,19 @@ trace_jump W return +# To ensure that a "move Src x(0)" instruction can be combined with +# the following call instruction, we need to make sure that there is +# no line/1 instruction between the move and the call. # -# To ensure that a "move Src x(0)" instruction can be combined -# with the following call instruction, we need to make sure that -# there is no line/1 instruction between the move and the call. -# -# A tail-recursive call to an external function (non-BIF) will -# never be saved on the stack, so there is no reason to keep -# the line instruction. (The compiler did not remove the line -# instruction because it cannot tell the difference between -# BIFs and ordinary Erlang functions.) -# +# A tail-recursive call to an external function (BIF or non-BIF) will +# never be saved on the stack, so there is no reason to keep the line +# instruction. move S X0=x==0 | line Loc | call_ext Ar Func => \ line Loc | move S X0 | call_ext Ar Func -move S X0=x==0 | line Loc | call_ext_last Ar Func=u$is_not_bif D => \ +move S X0=x==0 | line Loc | call_ext_last Ar Func D => \ move S X0 | call_ext_last Ar Func D -move S X0=x==0 | line Loc | call_ext_only Ar Func=u$is_not_bif => \ +move S X0=x==0 | line Loc | call_ext_only Ar Func => \ move S X0 | call_ext_only Ar Func move S X0=x==0 | line Loc | call Ar Func => \ line Loc | move S X0 | call Ar Func @@ -102,9 +98,9 @@ line I allocate t t? allocate_heap t I t? -%cold +# This instruction when a BIF is called tail-recursively when +# ther is stack frame. deallocate Q -%hot init y allocate_zero t t? @@ -985,10 +981,9 @@ call_ext_only u==0 u$func:os:perf_counter/0 => \ call_ext u Bif=u$is_bif => call_bif Bif -call_ext_last u Bif=u$is_bif D => call_bif Bif | deallocate_return D +call_ext_last u Bif=u$is_bif D => deallocate D | call_bif_only Bif -call_ext_only Ar=u Bif=u$is_bif => \ - allocate u Ar | call_bif Bif | deallocate_return u +call_ext_only Ar=u Bif=u$is_bif => call_bif_only Bif # # Any remaining calls are calls to Erlang functions, not BIFs. @@ -1020,6 +1015,7 @@ i_perf_counter %hot call_bif e +call_bif_only e # # Calls to non-building and guard BIFs. diff --git a/erts/emulator/test/trace_local_SUITE.erl b/erts/emulator/test/trace_local_SUITE.erl index 253d5fed23..ad802352b9 100644 --- a/erts/emulator/test/trace_local_SUITE.erl +++ b/erts/emulator/test/trace_local_SUITE.erl @@ -1181,7 +1181,9 @@ undef(X) -> ?MODULE:undef(X, X). % undef lists_reverse(A, B) -> - lists:reverse(A, B). + Res = lists:reverse(A, B), + _ = (catch abs(A)), + Res. diff --git a/lib/compiler/test/beam_except_SUITE.erl b/lib/compiler/test/beam_except_SUITE.erl index 9380fe06c8..8e3b373d29 100644 --- a/lib/compiler/test/beam_except_SUITE.erl +++ b/lib/compiler/test/beam_except_SUITE.erl @@ -84,9 +84,16 @@ coverage(_) -> {'EXIT',{function_clause, [{?MODULE,fc,[y],[File,{line,2}]}|_]}} = (catch fc(y)), - {'EXIT',{function_clause, - [{?MODULE,fc,[[a,b,c]],[File,{line,6}]}|_]}} = - (catch fc([a,b,c])), + case ?MODULE of + beam_except_no_opt_SUITE -> + %% There will be a different stack fram in + %% unoptimized code. + ok; + _ -> + {'EXIT',{function_clause, + [{?MODULE,fc,[[a,b,c]],[File,{line,6}]}|_]}} = + (catch fc([a,b,c])) + end, {'EXIT',{undef,[{erlang,error,[a,b,c],_}|_]}} = (catch erlang:error(a, b, c)), |