Age | Commit message | Author |
|
Add additional details on the behavior of supervisors when
reaching maximum restart intensity, as stated by @rvirding
at [Medium](https://goo.gl/XhwpSL)
|
|
Language cleaned up by the technical writers xsipewe and tmanevik
from Combitech. Proofreading and corrections by Björn Gustavsson
and Hans Bolinder.
|
|
* raimo/new-gen-state-machine/OTP-13065: (52 commits)
Add section on state filtering
Promote gen_statem over gen_fsm
Modify code_change/4 to return CallbackMode
Use ?NAME macro in examples
Introduce Fred Hebert suggested additions
Introduce corrections from Fred Hebert and Ingela
Use .png pictures instead of .gif
Write Design Principles chapter
Fix missing short forms for event timeout
Do more intricate Fred Hebert doc changes
Change Caller -> From as suggested by Fred Hebert
Do documentation improvements from Fred Hebert
Fix broken documentation reference
Rename state_timeout -> event_timeout
Fix most of the system docs and emacs mode
Change code_change/4 to {ok,State,Data}
Fixup sharpened test suite
Sharpen test suite
Remove the remove_event action and all alike
Relax caller() type check and cleanup
...
Conflicts:
lib/stdlib/src/gen.erl
lib/stdlib/src/gen_event.erl
lib/stdlib/src/gen_fsm.erl
lib/stdlib/src/gen_server.erl
lib/stdlib/test/error_logger_forwarder.erl
|
|
|
|
|
|
Speed up supervisor:count_children/1 for simple_one_for_one
supervisors. This is achieved by no longer looping through all the
child processes to verify that each one is alive.
For a supervisor with 100,000 'temporary' children the count-time will
drop from approx 25ms to about 0.005ms.
For a supervisor with 100,000 'permanent' or 'transient' children the
count-time will drop from approx 30ms to about 0.005ms.
This avoids having the supervisor block for an extended period while
the count takes place. Under normal circumstances the accuracy of the
result should also improve since the duration is too short for many
processes to die during the count.
|
|
Fix mistakes found by 'xmllint'.
|
|
|
|
If a child of a simple_one_for_one supervisor returns ignore from its
start function, no longer store the child, whatever its restart type.
It is not possible to restart or delete the child because the
supervisor is a simple_one_for_one.
Previously, a simple_one_for_one supervisor would crash, potentially
without shutting down all of its children, when it tried to shut down
a child with an undefined pid.
Previously, only one child with an undefined pid was stored in a
simple_one_for_one supervisor, no matter how many times the child
start function returned ignore.
|
|
|
|
Most of the updates have already been made in
'Fix alternative registry type annotations in supervisor',
a5412706f4185fddbac29216a49affd1e9f11da0.
Thanks to MaximMinin.
|
|
|
|
`I` should be `If`
|
|
|
|
|
|
Since commit 47759479146ca11ad81eca0bb3236b265e20601d,
simple-one-for-one supervisors _do_ kill their children explicitly on
shutdown. That commit also removed this note, but it seems like the
merge commit 45b4d5309e0686cc5fa28506de76f75b598bbd95 incorrectly
reinstated it.
|
|
|
|
If a child fails to start, the supervisor relies on error_logger, which does
not work when IO is inhibited. Instead, pass the error up the chain and let
someone else use a proper Reason for any possible printouts.
|
|
|
|
* rj/fix-supervisor-shutdown-doc:
Fix small typo in kernel app doc
Cosmetic: split very long lines from supervisor doc
Fix supervisor doc: Shutdown, MaxR and MaxT type specs
Add the type restrictions in the code comments
Remove trailing spaces
OTP-9987
|
|
When an attempt to restart a child failed, the supervisor would previously
keep the execution flow and try to restart the child over and over
again until it either succeeded or the restart frequency limit was
reached. If neither of these happened, the supervisor would hang forever
in this loop.
This commit adds a timer of 0 ms, after which control is handed back to
the gen_server that implements the supervisor. This way any incoming
request to the supervisor is handled, which could help break the
infinite loop, e.g. a shutdown request for the supervisor or for
the problematic child.
This introduces some incompatibilities in stdlib due to new return
values from supervisor:
* restart_child/2 can now return {error,restarting}
* delete_child/2 can now return {error,restarting}
* which_children/1 returns a list of {Id,Child,Type,Mods},
where Child, in addition to the old pid() or 'undefined',
now also can be 'restarting'.
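A minimal sketch of how calling code might account for the new 'restarting' value in the which_children/1 result. The module name child_states and both function names are hypothetical, chosen for illustration only; they are not part of stdlib.

```erlang
-module(child_states).
-export([classify/1, of_supervisor/1]).

%% Classify one which_children/1 entry by the shape of its Child field.
%% Child is a pid(), 'undefined', or (after this change) 'restarting'.
classify({Id, Pid, _Type, _Mods}) when is_pid(Pid) -> {running, Id};
classify({Id, restarting, _Type, _Mods})           -> {restarting, Id};
classify({Id, undefined, _Type, _Mods})            -> {stopped, Id}.

%% Classify every child of the supervisor Sup.
of_supervisor(Sup) ->
    [classify(Child) || Child <- supervisor:which_children(Sup)].
```

Code that previously matched only on pid() or 'undefined' would fail with a function_clause or case_clause error on the new atom, which is the incompatibility described above.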
|
|
|
|
* uw/extending_gen:
Add plugin support for alternative name lookup
OTP-9945
|
|
The following code snippets from supervisor.erl show that Shutdown in a
child specification must be greater than zero, and that the same applies
to MaxT.
--- supervisor.erl ----------------------------------------------------------
validShutdown(Shutdown, _)
  when is_integer(Shutdown), Shutdown > 0 -> true;
validShutdown(infinity, _)                -> true;
validShutdown(brutal_kill, _)             -> true;
validShutdown(Shutdown, _)                -> throw({invalid_shutdown, Shutdown}).

validIntensity(Max) when is_integer(Max),
                         Max >= 0 -> true;
validIntensity(What)              -> throw({invalid_intensity, What}).

validPeriod(Period) when is_integer(Period),
                         Period > 0 -> true;
validPeriod(What)                   -> throw({invalid_period, What}).
-----------------------------------------------------------------------------
|
|
Supervisor should never keep child specs for dead temporary children.
|
|
OTP behaviour instances (gen_server, gen_fsm, gen_event) can currently
register themselves either locally or globally, and the behaviour
libraries (including gen.erl) support both addressing methods, as well
as the normal Pid and {Name, Node}.
However, there are alternative registry implementations - e.g. gproc -
and one can well imagine other ways of locating a behaviour instance,
e.g. on a node connected only via a TCP tunnel, rather than via
Distributed Erlang. In all these cases, one needs to write extra code
to identify the behaviour instance, even though the instance itself
need not be aware of how it is located.
This patch introduces a new way of locating a behaviour instance:
{via, Module, Name}.
Module is expected to export a subset of the functions in global.erl,
namely:
register_name(Name, Pid) -> yes | no
whereis_name(Name) -> pid() | undefined
unregister_name(Name) -> ok
send(Name, Msg) -> Pid
Semantics are expected to be the same as for global.erl.
This can be used in all places where {global, Name} is accepted.
faulty export in gen_fsm_SUITE.erl
await process death in dummy_via:reset()
fix error in gen_[server|fsm]:enter_loop()
fix documentation
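A minimal sketch of a module that could be used as Module in {via, Module, Name}, assuming it is enough to delegate to global under a namespaced key. The module name ns_via is hypothetical; a real alternative registry such as gproc would implement the four functions itself rather than wrap global.

```erlang
-module(ns_via).
-export([register_name/2, unregister_name/1, whereis_name/1, send/2]).

%% Each callback delegates to global, wrapping the name in a
%% {ns_via, Name} tuple so it cannot clash with other global names.

%% Returns yes | no, as global:register_name/2 does.
register_name(Name, Pid) ->
    global:register_name({?MODULE, Name}, Pid).

unregister_name(Name) ->
    global:unregister_name({?MODULE, Name}).

%% Returns pid() | undefined.
whereis_name(Name) ->
    global:whereis_name({?MODULE, Name}).

%% Returns the Pid the message was sent to; exits if Name is not registered.
send(Name, Msg) ->
    global:send({?MODULE, Name}, Msg).
```

With such a module in the code path, a behaviour instance could be started with gen_server:start_link({via, ns_via, my_name}, my_server, [], []) and addressed with gen_server:call({via, ns_via, my_name}, Request), where my_name and my_server are placeholder names.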
|
|
|
|
* cf/simple_one_for_one_shutdown:
Explain how dynamic child processes are stopped
Stack errors when dynamic children are stopped
Explicitly kill dynamic children in supervisors
Conflicts:
lib/stdlib/doc/src/supervisor.xml
OTP-9647
|
|
|
|
In a child specification, the shutdown value can now also be set to infinity
for worker children. The restriction was removed because it is not always
possible to predict the shutdown time for a worker, which is highly
application-dependent.
|
|
|
|
|
|
According to the supervisor's documentation:
"Important note on simple-one-for-one supervisors: The dynamically
created child processes of a simple-one-for-one supervisor are not
explicitly killed, regardless of shutdown strategy, but are expected
to terminate when the supervisor does (that is, when an exit signal
from the parent process is received)."
All is fine as long as we stop a simple_one_for_one supervisor manually.
Dynamic children catch the exit signal from the supervisor and leave.
But if this happens when we stop an application, then after the top
supervisor has stopped, the application master kills all remaining
processes associated with this application. So dynamic children that trap
exit signals can be killed during their cleanup (that is, inside
terminate/2). This is unpredictable and highly time-dependent.
In this commit, the supervisor module is patched to explicitly terminate
dynamic children according to the shutdown strategy.
NOTE: The order in which dynamic children are stopped is not defined. In
fact, they are stopped at "almost" the same time.
|
|
In the current implementation of supervisors, temporary children
should never be restarted. However, when a temporary child is
restarted as part of a one_for_all or rest_for_one strategy where
the failing process is not the temporary child, the supervisor
still tries to restart it.
Because the supervisor doesn't keep some of the MFA information
of temporary children, this causes the supervisor to hit its
restart limit and crash.
This patch fixes the behaviour by inserting a clause in
terminate_children/2-3 (private function) that omits temporary
children when building the list of killed processes, so that
the supervisor does not try to restart them again.
Only supervisors in need of restarting children used the list, so
the change should be of no impact for the functions that called
terminate_children/2-3 only to kill all children.
The documentation has been modified to make this behaviour
more explicit.
|
|
Use Erlang specs and types for documentation
|
|
supervisor:terminate_child/2 was not allowed if the supervisor used
restart strategy simple_one_for_one. This is now changed so that
children of this type of supervisor can be terminated by specifying
the child's Pid.
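A minimal sketch of terminating a dynamic child by Pid, assuming a simple_one_for_one supervisor whose children are plain linked processes. The module name dyn_sup and its worker are hypothetical and kept deliberately small; a real worker would normally be a gen_server or another OTP behaviour.

```erlang
-module(dyn_sup).
-behaviour(supervisor).
-export([start_link/0, init/1, start_worker/0]).

start_link() ->
    supervisor:start_link(?MODULE, []).

%% One child specification serves as the template for all dynamic children.
init([]) ->
    Child = {worker, {?MODULE, start_worker, []},
             temporary, 1000, worker, [?MODULE]},
    {ok, {{simple_one_for_one, 1, 5}, [Child]}}.

%% The start function must link the new process to the supervisor.
start_worker() ->
    {ok, spawn_link(fun worker_loop/0)}.

%% Idle until told to stop; does not trap exits, so it dies on shutdown.
worker_loop() ->
    receive stop -> ok end.
```

After {ok, Sup} = dyn_sup:start_link() and {ok, Pid} = supervisor:start_child(Sup, []), the call supervisor:terminate_child(Sup, Pid) stops that one child, identified by its Pid because dynamic children share a single child specification.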
|
|
Make it explicit that the shutdown timeout is to be specified in
milliseconds.
|
|
* jn/supervisor_child_count_only:
Add count_children/1 to supervisor.erl to determine the number of
OTP-8436 Added supervisor:count_children/1 to count the number of children
being managed without the memory impact of which_children/1.
(Thanks to Jay Nelson.)
|
|
Add count_children/1 to supervisor.erl to determine the number of
children being managed without the memory impact of which_children/1.
The function which_children/1 returns a list of the child processes
currently being supervised, but it has the penalty of creating a new
list thereby consuming more memory. In low memory situations it is
often desirable to know which supervisor may have generated many
processes, but the act of discovering the culprit should not cause the
node to crash (or worse a different node if the kernel kills one
randomly). The new function count_children/1 can give an indication
of which supervisor is taxing resources the most without adding to the
burden. Rather than creating a new list, it walks the supervisor's
internal children structure using an accumulator function so that any
used memory can be incrementally collected yet the resulting count can
still be obtained.
The return result of count_children/1 is a property list of counts
containing:
- {specs, Total_Num_Child_Specs}
- {active, Num_Active_Child_Processes_Of_Supervisor_Or_Worker_Type}
- {supervisors, Num_Supervisor_Type_Children_Including_Dead_Processes}
- {workers, Num_Worker_Type_Children_Including_Dead_Processes}
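A minimal sketch of how the returned property list might be consumed, assuming the four keys listed above. The module name sup_load and its functions are hypothetical helpers, not part of the patch.

```erlang
-module(sup_load).
-export([dead_children/1, rank_by_active/1]).

%% Counts is the property list returned by supervisor:count_children/1.
%% 'supervisors' and 'workers' include dead processes while 'active'
%% does not, so the difference estimates the number of dead children.
dead_children(Counts) ->
    Workers = proplists:get_value(workers, Counts, 0),
    Supervisors = proplists:get_value(supervisors, Counts, 0),
    Active = proplists:get_value(active, Counts, 0),
    Workers + Supervisors - Active.

%% Order a list of supervisors by active child count, busiest first.
rank_by_active(Sups) ->
    Counted = [{proplists:get_value(active,
                                    supervisor:count_children(S), 0), S}
               || S <- Sups],
    lists:reverse(lists:keysort(1, Counted)).
```

For example, sup_load:rank_by_active([kernel_sup, sasl_sup]) would return {ActiveCount, Supervisor} pairs without building the full child list for either supervisor, assuming both are running.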
This patch was made in response to mailing list discussions of the
problem of diagnosing heavily taxed production systems. I cannot find
the original request, but http://www.erlang.org/cgi-bin/ezmlm-cgi/4/35060
is my original post of the patch.
|
|
|