Age | Commit message | Author |
|
In rest_for_one and one_for_all supervisors, one child dying can cause
multiple children to be restarted. Previously, if the child that caused
the restart was restarted successfully but another child failed to start,
the supervisor would not terminate this child along with the other
successfully restarted children, as no record of its pid was kept. Thus
the supervisor would try to start this child again. This could lead to
multiple instances of the same child or, if the child is registered, to
repeated attempts at starting this child until the max restart threshold
was reached.
Now the child that failed to start becomes the restarting child for the
next restart attempt, instead of staying with the same child. This has
the following side effects:
1) In one_for_all the new version of the child that originally died is
terminated before a restart attempt is made.
2) In rest_for_one the successfully restarted children are not terminated
and restarting continues from the child that failed to start.
|
If a child fails to start, supervisor relies upon error_logger which does not
work when IO is inhibited. Instead pass the error up the chain and let someone
else use a proper Reason for any possible printouts.
|
* rj/fix-supervisor-shutdown-doc:
Fix small typo in kernel app doc
Cosmetic: split very long lines from supervisor doc
Fix supervisor doc: Shutdown, MaxR and MaxT type specs
Add the type restrictions in the code comments
Remove trailing spaces
OTP-9987
|
When an attempt to restart a child failed, supervisor would earlier
keep the execution flow and try to restart the child over and over
again until it either succeeded or the restart frequency limit was
reached. If none of these happened, supervisor would hang forever in
this loop.
This commit adds a timer of 0 ms so that control is handed back to the
gen_server which implements the supervisor. This way any incoming
request to the supervisor will be handled, which could help break
the infinite loop, e.g. a shutdown request for the supervisor or for
the problematic child.
This introduces some incompatibilities in stdlib due to new return
values from supervisor:
* restart_child/2 can now return {error,restarting}
* delete_child/2 can now return {error,restarting}
* which_children/1 returns a list of {Id,Child,Type,Mods},
where Child, in addition to the old pid() or 'undefined',
now also can be 'restarting'.
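Callers that pattern-match on the old return values may need an extra clause. A minimal sketch of how client code could cope with the new values; the module and function names are hypothetical and not part of stdlib:

```erlang
-module(sup_restarting_example).
-export([live_pids/1, restarting_ids/1, restart_or_wait/2]).

%% Pids of children that are currently running. Entries whose Child
%% element is 'undefined' or 'restarting' are skipped.
live_pids(Children) ->
    [Pid || {_Id, Pid, _Type, _Mods} <- Children, is_pid(Pid)].

%% Ids of children that the supervisor is still trying to restart.
restarting_ids(Children) ->
    [Id || {Id, restarting, _Type, _Mods} <- Children].

%% Treat {error,restarting} as "not done yet" rather than as a failure.
%% The atom try_again_later is an illustrative choice, not an OTP value.
restart_or_wait(Sup, Id) ->
    case supervisor:restart_child(Sup, Id) of
        {error, restarting} -> try_again_later;
        Other -> Other
    end.
```

Typical use would be sup_restarting_example:live_pids(supervisor:which_children(Sup)).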
|
Supervisor should never keep child specs for dead temporary children.
|
Dialyzer complained over a mismatch between the callback spec of
Mod:code_change in gen_server and the spec of supervisor:code_change
(which is the implementation of a gen_server Mod:code_change).
This commit changes the callback spec to allow {error,Reason} as
return value. Also, release_handler is updated to handle this return
value.
|
* cf/simple_one_for_one_shutdown:
Explain how dynamic child processes are stopped
Stack errors when dynamic children are stopped
Explicitly kill dynamic children in supervisors
Conflicts:
lib/stdlib/doc/src/supervisor.xml
OTP-9647
|
Now, in a child specification, the shutdown value can also be set to infinity
for worker children. This restriction was removed because it is not always
possible to predict the shutdown time for a worker. This is highly
application-dependent.
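As an illustration, a tuple-form child specification for a worker with an unbounded shutdown time might look like the sketch below. slow_worker is a hypothetical module name used only for this example:

```erlang
-module(infinity_shutdown_example).
-export([child_spec/0]).

%% Tuple-form child specification:
%%   {Id, {Module, Function, Args}, Restart, Shutdown, Type, Modules}
%% slow_worker is a hypothetical worker module.
child_spec() ->
    {slow_worker,
     {slow_worker, start_link, []},
     permanent,
     infinity,          % previously accepted only when Type was supervisor
     worker,
     [slow_worker]}.
```

With infinity the supervisor waits for the worker without a timeout, so a worker that never terminates will also keep its supervisor from shutting down.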
|
Replace the behaviour_info(callbacks) export in stdlib's behaviours with
-callback attributes for all the callbacks.
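For comparison, the two styles could look roughly as follows in a user-defined behaviour. The module name and its callbacks are made up for this sketch and are not taken from stdlib:

```erlang
-module(my_behaviour).

%% Old style, shown only as a comment:
%%   -export([behaviour_info/1]).
%%   behaviour_info(callbacks) -> [{init, 1}, {handle, 2}];
%%   behaviour_info(_Other)    -> undefined.

%% New style: one typed -callback attribute per callback function.
-callback init(Args :: term()) ->
    {ok, State :: term()} | {error, Reason :: term()}.
-callback handle(Request :: term(), State :: term()) ->
    {reply, Reply :: term(), NewState :: term()}.
```

Besides listing the callbacks, the attributes carry type information that tools such as Dialyzer can check against implementing modules.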
|
Because a simple_one_for_one supervisor can have many workers, we stack
errors during its shutdown to report only one message for each encountered
error type. Instead of reporting the child's pid, we report the number of
affected children.
|
According to the supervisor's documentation:
"Important note on simple-one-for-one supervisors: The dynamically
created child processes of a simple-one-for-one supervisor are not
explicitly killed, regardless of shutdown strategy, but are expected
to terminate when the supervisor does (that is, when an exit signal
from the parent process is received)."
All is fine as long as we stop the simple_one_for_one supervisor manually.
Dynamic children catch the exit signal from the supervisor and leave.
But if this happens when we stop an application, then after the top
supervisor has stopped, the application master kills all remaining
processes associated with this application. So dynamic children that trap
exit signals can be killed during their cleanup (here we mean inside
terminate/2). This is unpredictable and highly time-dependent.
In this commit, the supervisor module is patched to explicitly terminate
dynamic children according to the shutdown strategy.
NOTE: The order in which dynamic children are stopped is not defined. In
fact, they are stopped at "almost" the same time.
|
In the current implementation of supervisors, temporary children
should never be restarted. However, when a temporary child is
restarted as part of a one_for_all or rest_for_one strategy where
the failing process is not the temporary child, the supervisor
still tries to restart it.
Because the supervisor doesn't keep some of the MFA information
of temporary children, this causes the supervisor to hit its
restart limit and crash.
This patch fixes the behaviour by inserting a clause in
terminate_children/2-3 (private function) that will omit temporary
children when building a list of killed processes, to avoid having
the supervisor trying to restart them again.
Only supervisors in need of restarting children used the list, so
the change should be of no impact for the functions that called
terminate_children/2-3 only to kill all children.
The documentation has been modified to make this behaviour
more explicit.
|
In R13B proc_lib, gen_server and gen_fsm were all changed to handle
exit reason {shutdown,Term} in the same way as exit reason 'shutdown',
i.e. no crash reports are generated.
This is an update of supervisor to do the same, i.e. handle these two
exit reasons in the same way. This means that for children with
restart type 'transient' there will be no attempt to restart the
process if it terminates with reason {shutdown,Term}, and there will
be no supervisor report.
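The resulting rule can be summarised as a small predicate. This is only a sketch of the documented restart semantics, not the supervisor's actual code, and the module and function names are hypothetical:

```erlang
-module(transient_exit_example).
-export([needs_restart/2]).

%% needs_restart(RestartType, ExitReason) -> boolean()
%% permanent children are always restarted, temporary children never.
%% transient children are restarted only after an abnormal exit, and
%% {shutdown,Term} now counts as a normal exit, like 'shutdown'.
needs_restart(permanent, _Reason)           -> true;
needs_restart(temporary, _Reason)           -> false;
needs_restart(transient, normal)            -> false;
needs_restart(transient, shutdown)          -> false;
needs_restart(transient, {shutdown, _Term}) -> false;
needs_restart(transient, _Reason)           -> true.
```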
|
Since initial arguments of temporary children under simple_one_for_one
supervisors are not saved, only a list of pids was stored in such
supervisors. When adding/deleting many children, this would scale
badly. To avoid this the list is now changed to a set.
|
supervisor:terminate_child/2 was not allowed if the supervisor used
restart strategy simple_one_for_one. This is now changed so that
children of this type of supervisor can be terminated by specifying
the child's Pid.
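A minimal sketch of the new usage, assuming a trivial hypothetical worker that is just a linked process waiting for a message. The module name and the demo/0 helper are made up for the example:

```erlang
-module(sofo_terminate_example).
-behaviour(supervisor).
-export([demo/0, start_worker/0, init/1]).

%% Hypothetical worker: a linked process that idles until told to stop.
%% It runs in the supervisor's context, so spawn_link links it to the
%% supervisor as required.
start_worker() ->
    Pid = spawn_link(fun() -> receive stop -> ok end end),
    {ok, Pid}.

%% Supervisor callback: a single child spec acts as the template for
%% all dynamically started children.
init([]) ->
    ChildSpec = {worker, {?MODULE, start_worker, []},
                 temporary, 1000, worker, [?MODULE]},
    {ok, {{simple_one_for_one, 1, 5}, [ChildSpec]}}.

%% Start one dynamic child and terminate it by pid.
demo() ->
    {ok, Sup} = supervisor:start_link(?MODULE, []),
    {ok, Pid} = supervisor:start_child(Sup, []),
    %% The child is identified by its pid, not by the Id in the spec.
    ok = supervisor:terminate_child(Sup, Pid),
    is_process_alive(Pid).
```

Calling sofo_terminate_example:demo() should return false, since terminate_child/2 returns only after the child has exited. The supervisor stays linked to the caller and is not stopped by this sketch.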
|
terminates" and improved test suite
The bug fix supplied by Filipe David Manana
did not cover all possible ways that a process may be terminated,
for instance with supervisor:terminate_child. There was also
a bug in the base case of the patch, which returned a list of a list instead
of only the list.
Added a timeout for the test cases, eliminated unnecessary sleeps,
improved code.
|
The temporary child specs are never removed from the supervisor's state, and
have their MFA component set to {M, F, undefined} instead of the MFA passed
in the supervisor:start_child/2 call. Subsequent calls to supervisor:restart_child/2
may crash. Stack trace example:
{badarg,[{erlang,apply,[gen_server,start_link,undefined]},
{supervisor,do_start_child,2},{supervisor,handle_call,3},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}]}
|
dialyzer spec.
|
The previous commit changed the supervisor to not save
parameter lists for temporary processes supervised by simple-one-for-one
supervisors. But it is unnecessary to save them for any
temporary processes as they should not be restarted. Probably the
biggest gain is in the simple-one-for-one case.
Also changed the test case count_children_memory so it does
not test that which_children will produce garbage that must
be reclaimed later. This is a strange thing to test and it is no
longer true for all invocations of which_children.
|
- Export two more types so that they can be used in other modules
- Correct some types and specs
- Add spec for behaviour_info/1
|
* jn/supervisor_child_count_only:
Add count_children/1 to supervisor.erl to determine the number of
OTP-8436 Added supervisor:count_children/1 to count the number of children
being managed without the memory impact of which_children/1.
(Thanks to Jay Nelson.)
|
children being managed without the memory impact of which_children/1
The function which_children/1 returns a list of the child processes
currently being supervised, but it has the penalty of creating a new
list thereby consuming more memory. In low memory situations it is
often desirable to know which supervisor may have generated many
processes, but the act of discovering the culprit should not cause the
node to crash (or worse a different node if the kernel kills one
randomly). The new function count_children/1 can give an indication
of which supervisor is taxing resources the most without adding to the
burden. Rather than creating a new list, it walks the supervisor's
internal children structure using an accumulator function so that any
used memory can be incrementally collected yet the resulting count can
still be obtained.
The return result of count_children/1 is a property list of counts
containing:
- {specs, Total_Num_Child_Specs}
- {active, Num_Active_Child_Processes_Of_Supervisor_Or_Worker_Type}
- {supervisors, Num_Supervisor_Type_Children_Including_Dead_Processes}
- {workers, Num_Worker_Type_Children_Including_Dead_Processes}
This patch was made in response to mailing list discussions of the
problem of diagnosing heavily taxed production systems. I cannot find
the original request, but http://www.erlang.org/cgi-bin/ezmlm-cgi/4/35060
is my original post of the patch.
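As a sketch of the diagnostic use described above, the helper below picks the supervisor with the most active children from a non-empty list of candidates. The module and function names are hypothetical:

```erlang
-module(count_children_example).
-export([busiest/1]).

%% Given a non-empty list of supervisor pids or registered names,
%% return {ActiveCount, Sup} for the one with the most active children.
%% Only count_children/1 is called, so no child list is built.
busiest(Sups) ->
    Counts = [{proplists:get_value(active, supervisor:count_children(Sup)), Sup}
              || Sup <- Sups],
    lists:max(Counts).
```

For example, count_children_example:busiest([kernel_sup]) returns the active count for the kernel supervisor paired with its name.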
|