Age | Commit message (Collapse) | Author |
|
|
|
When a simple_one_for_one supervisor is shutting down, and a child
exits with an exit reason of the form {shutdown, Term}, handle it as
if the exit reason were 'shutdown', without printing an error report.
This makes the behaviour of wait_dynamic_children match that of
do_restart.
This fixes ERL-163.
|
|
* josevalim/supervisor-get-callback-module/PR-1000/OTP-13619:
Return callback module in supervisor format_status
|
|
The previous implementation of supervisor:get_callback_module/1
used sys:get_status/1 to get the supervisor inner state and
retrieve the callback module. Such implementation forbids any
other supervisor implementation that has an internal state
different than the #state{} record in supervisor.erl.
This patch allows supervisors to return the callback module
as part of the sys:get_status/1 data, no longer coupling the
callback module implementation with the inner #state{} record.
Notice we have kept the clause matching the previous
sys:get_status/1 reply for backwards compatibility purposes.
|
|
* josevalim/supervisor-try-again-restart/PR-1001/OTP-13618:
Avoid potential timer bottleneck on supervisor restart
|
|
Using the new type syntax, we can specify which keys are required, and
which are optional in a way Dialyzer could use.
|
|
Prior to this patch, when the supervisor was unable to
restart a child, it used the timer process to execute
a function later on, allowing other messages in the
supervisor queue to be consumed.
This patch keeps the same idea but bypasses the timer
process by directly sending a message to the supervisor
process itself.
|
|
|
|
Conflicts:
lib/stdlib/src/supervisor.erl
|
|
Speed up supervisor:count_children/1 for simple_one_for_one
supervisors. This is achieved by avoiding looping through all the
child process and verifying that each one is alive.
For a supervisor with 100,000 'temporary' children the count-time will
drop from approx 25ms to about 0.005ms.
For a supervisor with 100,000 'permanent' or 'transient' children the
count-time will drop from approx 30ms to about 0.005ms.
This avoids having the supervisor block for an extended period while
the count takes place. Under normal circumstances the accuracy of the
result should also improve since the duration is too short for many
processes to die during the count.
|
|
This function is used by release_handler during upgrade. This was
earlier implemented in the release_handler, but it required a copy og
the definition of the supervisor's internal state, which caused
problems when this state was updated.
|
|
The field 'dynamics' in #state{} is a union of two opaque types, which
is possibly problematic. Tagging the types should make the code safe
for warnings from future versions of Dialyzer.
|
|
Record field types have been modified due to commit 8ce35b2:
"Take out automatic insertion of 'undefined' from typed record fields".
|
|
* nybek/supervisor_reporting_error:
Fix supervisor reporting error
|
|
|
|
A bug in supervisor:get_childspec/2 results in
{error, simple_one_for_one} being returned on every call when the
supervisor strategy is simple_one_for_one.
This commit includes a small refactoring which brings together the
two 'start_child' clauses to make the code easier to follow.
|
|
When a simple_one_for_one supervisor is shutting down, if one or more
of its children fail to terminate within the 'shutdown' timeout, then
the following reporting error will occur:
* If there is only one child that fails to exit on time, then no
supervisor report will be generated.
* If more than one child fails to exit on time, then the reporting
of 'nb_children' in the supervisor report will be 1 less than it
should be.
|
|
If a child of a simple_one_for_one returns ignore from its start
function no longer store the child for any restart type. It is not
possible to restart or delete the child because the supervisor is a
simple_one_for_one.
Previously a simple_one_for_one would crash, potentially without
shutting down all of its children, when it tried to shutdown a child
with undefined pid.
Previous only one undefined pid child was stored in a simple_one_for_one
supervisor no matter how many times the child start function returned
ignore.
|
|
fbaa0bec replaced the use of now/0 with erlang:monotonic_time/1 but at
the same time introduced a bug in inPeriod/3 so that it would always
return 'true' (the subtraction Time - Now would always result in
a non-positive number that would always be less than Period).
The symptoms of the bug is that when a child has been restarted the
maximum number of times allowed, the supervisor will terminate,
regardless of how much time that elapses between the restarts.
There was no test case that detected this problem. Add the missing
test case to ensure that this bug stays killed.
Reported-by: Rafał Studnicki
|
|
* rickard/time_api/OTP-11997: (22 commits)
Update primary bootstrap
inets: Suppress deprecated warning on erlang:now/0
inets: Cleanup of multiple copies of functions Add inets_lib with common functions used by multiple modules
inets: Update comments
Suppress deprecated warning on erlang:now/0
Use new time API and be back-compatible in inets Remove unused functions and removed redundant test
asn1 test SUITE: Eliminate use of now/0
Disable deprecated warning on erlang:now/0 in diameter_lib
Use new time API and be back-compatible in ssh
Replace all calls to now/0 in CT with new time API functions
test_server: Replace usage of erlang:now() with usage of new API
Replace usage of erlang:now() with usage of new API
Replace usage of erlang:now() with usage of new API
Replace usage of erlang:now() with usage of new API
Replace usage of erlang:now() with usage of new API
otp_SUITE: Warn for calls to erlang:now/0
Replace usage of erlang:now() with usage of new API
Multiple timer wheels
Erlang based BIF timer implementation for scalability
Implement ethread events with timeout
...
Conflicts:
bootstrap/bin/start.boot
bootstrap/bin/start_clean.boot
bootstrap/lib/compiler/ebin/beam_asm.beam
bootstrap/lib/compiler/ebin/compile.beam
bootstrap/lib/kernel/ebin/auth.beam
bootstrap/lib/kernel/ebin/dist_util.beam
bootstrap/lib/kernel/ebin/global.beam
bootstrap/lib/kernel/ebin/hipe_unified_loader.beam
bootstrap/lib/kernel/ebin/inet_db.beam
bootstrap/lib/kernel/ebin/inet_dns.beam
bootstrap/lib/kernel/ebin/inet_res.beam
bootstrap/lib/kernel/ebin/os.beam
bootstrap/lib/kernel/ebin/pg2.beam
bootstrap/lib/stdlib/ebin/dets.beam
bootstrap/lib/stdlib/ebin/dets_utils.beam
bootstrap/lib/stdlib/ebin/erl_tar.beam
bootstrap/lib/stdlib/ebin/escript.beam
bootstrap/lib/stdlib/ebin/file_sorter.beam
bootstrap/lib/stdlib/ebin/otp_internal.beam
bootstrap/lib/stdlib/ebin/qlc.beam
bootstrap/lib/stdlib/ebin/random.beam
bootstrap/lib/stdlib/ebin/supervisor.beam
bootstrap/lib/stdlib/ebin/timer.beam
erts/aclocal.m4
erts/emulator/beam/bif.c
erts/emulator/beam/erl_bif_info.c
erts/emulator/beam/erl_db_hash.c
erts/emulator/beam/erl_init.c
erts/emulator/beam/erl_process.h
erts/emulator/beam/erl_thr_progress.c
erts/emulator/beam/utils.c
erts/emulator/sys/unix/sys.c
erts/preloaded/ebin/erlang.beam
erts/preloaded/ebin/erts_internal.beam
erts/preloaded/ebin/init.beam
erts/preloaded/src/erts_internal.erl
lib/common_test/test/ct_hooks_SUITE_data/cth/tests/empty_cth.erl
lib/diameter/src/base/diameter_lib.erl
lib/kernel/src/os.erl
lib/ssh/test/ssh_basic_SUITE.erl
system/doc/efficiency_guide/advanced.xml
|
|
|
|
Takes the name of the child (or Pid, in the case of a
simple_one_for_one supervisor) and returns the map which specifies the
child.
|
|
Earlier, supervisor flags and child specs were given as tuples. While
this is kept for backwards compatibility, it is now also allowed to
give these parameters as maps:
-type sup_flags() :: #{strategy => strategy(), % optional
intensity => non_neg_integer(), % optional
period => pos_integer()} % optional
-type child_spec() :: #{id => child_id(), % mandatory
start => mfargs(), % mandatory
restart => restart(), % optional
shutdown => shutdown(), % optional
type => worker(), % optional
modules => modules()} % optional
Default values are as follows:
Supervisor flags:
strategy: one_for_one
intensity: 1
period: 5
Child specs:
restart: permanent
type: worker
shutdown: 5000 for workers, 'infinity' for supervisors
modules: [M], where M comes from the child's start {M,F,A}
Some of these default values are quite hard to decide on, since there
really is no "most common way". It always depends on the use case. So
we decided that the most important reason for having default values is
to lower the start barrier and get "something" running. For production
use, most systems must be fine tuned in this respect anyway.
This is how we reasoned about it:
Strategy: just pick one - and 'one_for_one' was most used in OTP.
Max restart frequency (intensity/period): by allowing one restart only
we keep the important supervisor feature of restarting children, but
we also avoid the possibility of scaling to a huge amount of restarts
if the supervisor tree is deep.
Restart: just pick one - and 'permanent' is fairly common.
Shutdown for workers: to avoid the confusion of why the terminate
function is not executed, we decided not to use 'brutal_kill'. Which
number to use is probably not that important, so we chose 5000.
Shutdown for supervisors: the recommended shutdown value for
supervisors is 'infinity', so we decied to use that. Having the same
(integer) value as for workers can give very strange results.
Type: just pick one - and we believe that 'worker' is most common
|
|
Remove duplicated code related to checking of supervisor flags.
|
|
The types array(), dict(), digraph(), gb_set(), gb_tree(), queue(),
set(), and tid() have been deprecated. They will be removed in OTP 18.0.
Instead the types array:array(), dict:dict(), digraph:graph(),
gb_set:set(), gb_tree:tree(), queue:queue(), sets:set(), and ets:tid()
can be used. (Note: it has always been necessary to use ets:tid().)
It is allowed in OTP 17.0 to locally re-define the types array(), dict(),
and so on.
New types array:array/1, dict:dict/2, gb_sets:set/1, gb_trees:tree/2,
queue:queue/1, and sets:set/1 have been added.
|
|
* calebh/fix-regestry-type-annotation:
Fix alternative registry type annotations in supervisor
OTP-11707
|
|
The type annotations for alternative registries using
the {via,Module,Name} syntax was not given for
sup_name() and sup_ref() in the supervisor module.
The type annotation was inconsistent with the
documentation and causes Dialyzer to complain about
valid supervisor:start_link and supervisor:start_child
calls.
|
|
|
|
|
|
|
|
In rest_for_one and one_for_all supervisors one child dying can cause
multiple children to be restarted. Previously if the child that caused
the restart is started successfully but another child fails to start,
the supervisor would not terminate this child with the other
successfully restarted children as no record of the pid was kept. Thus
the supervisor would try to start this child again. This could lead to
multiples of the same child or if the child is registered cause repeated
attempts at starting this child - until the max restart threshold was
reached.
Now the child that failed to start becomes the restarting child, instead
of staying with the same child, for the next restart attempt. This has
the following side effects:
1) In one_for_all the new version of the child that original died is
terminated before a restart attempt is made.
2) In rest_for_one all succesfully restarted children are not terminated
and restarting continues from the child that failed to start.
|
|
|
|
If a child fails to start, supervisor relies upon error_logger which does not
work when IO is inhibited. Instead pass the error up the chain and let someone
else use a proper Reason for any possible printouts.
|
|
|
|
* rj/fix-supervisor-shutdown-doc:
Fix small typo in kernel app doc
Cosmetic: split very long lines from supervisor doc
Fix supervisor doc: Shutdown, MaxR and MaxT type specs
Add the type restrictions in the code comments
Remove trailing spaces
OTP-9987
|
|
When an attempt to restart a child failed, supervisor would earlier
keep the execution flow and try to restart the child over and over
again until it either succeeded or the restart frequency limit was
reached. If none of these happened, supervisor would hang forever in
this loop.
This commit adds a timer of 0 ms where the control is left back to the
gen_server which implements the supervisor. This way any incoming
request to the supervisor will be handled - which could help breaking
the infinite loop - e.g. shutdown request for the supervisor or for
the problematic child.
This introduces some incompatibilities in stdlib due to new return
values from supervisor:
* restart_child/2 can now return {error,restarting}
* delete_child/2 can now return {error,restarting}
* which_children/1 returns a list of {Id,Child,Type,Mods},
where Child, in addition to the old pid() or 'undefined',
now also can be 'restarting'.
|
|
|
|
|
|
Supervisor should never keep child specs for dead temporary children.
|
|
Dialyzer complained over a mismatch between the callback spec of
Mod:code_change in gen_server and the spec of supervisor:code_change
(which is the implementation of a gen_server Mod:code_change).
This commit changes the callback spec to allow {error,Reason} as
return value. Also, release_handler is updated to handle this return
value.
|
|
|
|
* cf/simple_one_for_one_shutdown:
Explain how dynamic child processes are stopped
Stack errors when dynamic children are stopped
Explicitly kill dynamic children in supervisors
Conflicts:
lib/stdlib/doc/src/supervisor.xml
OTP-9647
|
|
Now, in child specification, the shutdown value can also be set to infinity
for worker children. This restriction was removed because this is not always
possible to predict the shutdown time for a worker. This is highly
application-dependent.
|
|
Replace the behaviour_info(callbacks) export in stdlib's behaviours with
-callback' attributes for all the callbacks.
|
|
Because a simple_one_for_one supervisor can have many workers, we stack
errors during its shutdown to report only one message for each encountered
error type. Instead of reporting the child's pid, we use the number of
concerned children.
|
|
According to the supervisor's documentation:
"Important note on simple-one-for-one supervisors: The dynamically
created child processes of a simple-one-for-one supervisor are not
explicitly killed, regardless of shutdown strategy, but are expected
to terminate when the supervisor does (that is, when an exit signal
from the parent process is received)."
All is fine as long as we stop simple_one_for_one supervisor manually.
Dynamic children catch the exit signal from the supervisor and leave.
But, if this happens when we stop an application, after the top
supervisor has stopped, the application master kills all remaining
processes associated to this application. So, dynamic children that trap
exit signals can be killed during their cleanup (here we mean inside
terminate/2). This is unpredictable and highly time-dependent.
In this commit, supervisor module is patched to explicitly terminate
dynamic children accordingly to the shutdown strategy.
NOTE: Order in which dynamic children are stopped is not defined. In
fact, this is "almost" done at the same time.
|
|
|
|
In the current implementation of supervisors, temporary children
should never be restarted. However, when a temporary child is
restarted as part of a one_for_all or rest_for_one strategy where
the failing process is not the temporary child, the supervisor
still tries to restart it.
Because the supervisor doesn't keep some of the MFA information
of temporary children, this causes the supervisor to hit its
restart limit and crash.
This patch fixes the behaviour by inserting a clause in
terminate_children/2-3 (private function) that will omit temporary
children when building a list of killed processes, to avoid having
the supervisor trying to restart them again.
Only supervisors in need of restarting children used the list, so
the change should be of no impact for the functions that called
terminate_children/2-3 only to kill all children.
The documentation has been modified to make this behaviour
more explicit.
|
|
In R13B proc_lib, gen_server and gen_fsm were all changed to handle
exit reason {shutdown,Term} in the same way as exit reason 'shutdown',
i.e. no crash reports are generated.
This is an update of supervisor to do the same, i.e. handle these two
exit reasons in the same way. This means that for children with
restart type 'transient' there will be no attempt to restart the
process if it terminates with reason {shutdown,Term}, and there will
be no supervisor report.
|
|
|