This section should be read with the supervisor(3) manual page in STDLIB, where all details about the supervisor behaviour are given.
A supervisor is responsible for starting, stopping, and monitoring its child processes. The basic idea of a supervisor is that it must keep its child processes alive by restarting them when necessary.
Which child processes to start and monitor is specified by a list of child specifications. The child processes are started in the order specified by this list, and terminated in the reversed order.
The callback module for a supervisor starting the server ch3 from the gen_server Behaviour chapter can look as follows:
-module(ch_sup).
-behaviour(supervisor).
-export([start_link/0]).
-export([init/1]).
start_link() ->
supervisor:start_link(ch_sup, []).
init(_Args) ->
SupFlags = #{strategy => one_for_one, intensity => 1, period => 5},
ChildSpecs = [#{id => ch3,
start => {ch3, start_link, []},
restart => permanent,
shutdown => brutal_kill,
type => worker,
modules => [ch3]}],
{ok, {SupFlags, ChildSpecs}}.
The SupFlags variable in the return value from init/1 represents the supervisor flags.
The ChildSpecs variable in the return value from init/1 is a list of child specifications.
This is the type definition for the supervisor flags:
sup_flags() = #{strategy => strategy(),         % optional
                intensity => non_neg_integer(), % optional
                period => pos_integer()}        % optional
strategy() = one_for_all
           | one_for_one
           | rest_for_one
           | simple_one_for_one
The restart strategy is specified by the strategy key in the supervisor flags map returned by the callback function init:
SupFlags = #{strategy => Strategy, ...}
The strategy key is optional in this map. If it is not given, it defaults to one_for_one.
one_for_one - If a child process terminates, only that process is restarted.
one_for_all - If a child process terminates, all other child processes are terminated, and then all child processes, including the terminated one, are restarted.
rest_for_one - If a child process terminates, the rest of the child processes (that is, the child processes after the terminated process in start order) are terminated. Then the terminated child process and the rest of the child processes are restarted. A sketch illustrating this with two dependent children follows this list.
simple_one_for_one - See Simplified one_for_one Supervisors below.
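The rest_for_one strategy is useful when later children depend on earlier ones. The following is a minimal sketch of such an init/1, assuming two hypothetical modules db_conn and db_worker where db_worker depends on db_conn: if db_conn terminates, db_worker is also terminated and both are restarted, while a crash in db_worker restarts only db_worker.

init(_Args) ->
    SupFlags = #{strategy => rest_for_one, intensity => 1, period => 5},
    ChildSpecs = [#{id => db_conn,                       % started first
                    start => {db_conn, start_link, []}},
                  #{id => db_worker,                     % depends on db_conn
                    start => {db_worker, start_link, []}}],
    {ok, {SupFlags, ChildSpecs}}.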
The supervisors have a built-in mechanism to limit the number of
restarts which can occur in a given time interval. This is
specified by the two keys intensity and period in the supervisor flags map returned by the callback function init:
SupFlags = #{intensity => MaxR, period => MaxT, ...}
If more than MaxR restarts occur within MaxT seconds, the supervisor terminates all the child processes and then itself. The termination reason for the supervisor itself in that case will be shutdown.
When the supervisor terminates, the next higher-level supervisor takes some action: it either restarts the terminated supervisor or terminates itself.
The intention of the restart mechanism is to prevent a situation where a process repeatedly dies for the same reason, only to be restarted again.
The keys intensity and period are optional. If they are not given, they default to 1 and 5, respectively.
The default values are 1 restart per 5 seconds. This was chosen to be safe for most systems, even with deep supervision hierarchies, but you will probably want to tune the settings for your particular use case.
First, the intensity decides how big bursts of restarts you want to tolerate. For example, you might want to accept a burst of at most 5 or 10 attempts, even within the same second, if it results in a successful restart.
Second, you need to consider the sustained failure rate, if crashes keep happening but not often enough to make the supervisor give up. If you set intensity to 10 and set the period as low as 1, the supervisor will allow child processes to keep restarting up to 10 times per second, forever, filling your logs with crash reports until someone intervenes manually.
You should therefore set the period to be long enough that you can accept that the supervisor keeps going at that rate. For example, if you have picked an intensity value of 5, then setting the period to 30 seconds will give you at most one restart per 6 seconds for any longer period of time, which means that your logs won't fill up too quickly, and you will have a chance to observe the failures and apply a fix.
These choices depend a lot on your problem domain. If you don't have real time monitoring and ability to fix problems quickly, for example in an embedded system, you might want to accept at most one restart per minute before the supervisor should give up and escalate to the next level to try to clear the error automatically. On the other hand, if it is more important that you keep trying even at a high failure rate, you might want a sustained rate of as much as 1-2 restarts per second.
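As an illustration of the two scenarios above (the numbers are assumptions, not recommendations), the corresponding supervisor flags could look like:

%% Embedded system without live monitoring: give up after a single
%% restart within a minute and escalate to the next level
SupFlags = #{intensity => 1, period => 60, ...}

%% System where it is important to keep trying: up to 10 restarts per
%% 5 seconds, a sustained rate of roughly 2 restarts per second
SupFlags = #{intensity => 10, period => 5, ...}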
Avoiding common mistakes:
Do not forget to consider the burst rate. If you set intensity to 1 and period to 6, it gives the same sustained error rate as 5/30 or 10/60, but will not allow even 2 restart attempts in quick succession. This is probably not what you wanted.
Do not set the period to a very high value if you want to tolerate bursts. If you set intensity to 5 and period to 3600 (one hour), the supervisor will allow a short burst of 5 restarts, but then gives up if it sees another single restart almost an hour later. You probably want to regard those crashes as separate incidents, so setting the period to 5 or 10 minutes will be more reasonable.
If your application has multiple levels of supervision, then do not simply set the restart intensities to the same values on all levels. Keep in mind that the total number of restarts (before the top level supervisor gives up and terminates the application) will be the product of the intensity values of all the supervisors above the failing child process.
For example, if the top level allows 10 restarts, and the next level also allows 10, a crashing child below that level will be restarted 100 times, which is probably excessive. Allowing at most 3 restarts for the top level supervisor might be a better choice in this case.
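A minimal sketch of two supervision levels with different intensities, following the reasoning above (the module names top_sup and worker_sup and the numbers are illustrative only):

%% In top_sup:init/1 -- allow only a few restarts at the top level
SupFlags = #{strategy => one_for_one, intensity => 3, period => 60}

%% In worker_sup:init/1 -- allow more restarts close to the workers.
%% A repeatedly failing worker below worker_sup can then be restarted
%% roughly 3 * 10 = 30 times before the top-level supervisor gives up.
SupFlags = #{strategy => one_for_one, intensity => 10, period => 30}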
The type definition for a child specification is as follows:
child_spec() = #{id => child_id(),       % mandatory
                 start => mfargs(),      % mandatory
                 restart => restart(),   % optional
                 shutdown => shutdown(), % optional
                 type => worker(),       % optional
                 modules => modules()}   % optional
child_id() = term()
mfargs() = {M :: module(), F :: atom(), A :: [term()]}
modules() = [module()] | dynamic
restart() = permanent | transient | temporary
shutdown() = brutal_kill | timeout()
worker() = worker | supervisor
The id key is used to identify the child specification internally by the supervisor.
Note that this identifier has occasionally been called "name". As far as possible, the terms "identifier" or "id" are used now, but to keep backward compatibility, some occurrences of "name" can still be found, for example in error messages.
The start key defines the function call used to start the child process. It is a module-function-arguments tuple used as apply(M, F, A). It is to be (or result in) a call to any of the following:
supervisor:start_link
gen_server:start_link
gen_statem:start_link
gen_event:start_link
A function compliant with these functions; for details, see the supervisor(3) manual page.
The restart key defines when a terminated child process must be restarted. A permanent child process is always restarted. A temporary child process is never restarted. A transient child process is restarted only if it terminates abnormally, that is, with an exit reason other than normal, shutdown, or {shutdown,Term}.
The shutdown key defines how a child process must be terminated. brutal_kill means that the child process is unconditionally terminated using exit(Child, kill). An integer time-out value means that the supervisor tells the child process to terminate by calling exit(Child, shutdown) and then waits for an exit signal back; if no exit signal is received within the specified number of milliseconds, the child process is unconditionally terminated using exit(Child, kill). If the child process is another supervisor, shutdown is to be set to infinity to give the subtree enough time to shut down; infinity is also allowed for a worker.
Be careful when setting the shutdown time to infinity when the child process is a worker. In this situation, the termination of the supervision tree depends on the child process; it must be implemented in a safe way and its cleanup procedure must always return.
The type key specifies whether the child process is a supervisor or a worker.
The modules key is to be a list with one element, [Module], where Module is the name of the callback module if the child process is a supervisor, gen_server, or gen_statem. If the child process is a gen_event, the value must be dynamic. This information is used by the release handler during upgrades and downgrades, see Release Handling.
The keys restart, shutdown, type, and modules are optional. If they are not given, they default to permanent for restart, 5000 for shutdown if the child is a worker (infinity if it is a supervisor), worker for type, and a list containing the module name from the start key for modules.
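Regarding the shutdown key above, note that a worker only gets a chance to clean up if it traps exits; otherwise the exit(Child, shutdown) signal terminates it immediately without calling terminate/2. A minimal sketch of such a worker (ch4 is a hypothetical module, not part of this chapter's examples) could look like:

-module(ch4).
-behaviour(gen_server).
-export([start_link/0]).
-export([init/1, handle_call/3, handle_cast/2, terminate/2]).

start_link() ->
    gen_server:start_link(ch4, [], []).

init(_Args) ->
    %% Trap exits so that terminate/2 is called when the supervisor
    %% shuts this worker down with exit(Child, shutdown)
    process_flag(trap_exit, true),
    {ok, #{}}.

handle_call(_Request, _From, State) ->
    {reply, ok, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

terminate(shutdown, _State) ->
    %% Cleanup must finish within the shutdown time given in the child
    %% specification, or the process is killed unconditionally
    ok;
terminate(_Reason, _State) ->
    ok.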
Example: The child specification to start the server ch3 from the previous example looks as follows:
#{id => ch3,
start => {ch3, start_link, []},
restart => permanent,
shutdown => brutal_kill,
type => worker,
modules => [ch3]}
or simplified, relying on the default values:
#{id => ch3,
start => {ch3, start_link, []},
shutdown => brutal_kill}
Example: A child specification to start the event manager from the chapter about gen_event Behaviour:
#{id => error_man,
start => {gen_event, start_link, [{local, error_man}]},
modules => dynamic}
Both server and event manager are registered processes which
can be expected to be always accessible. Thus they are
specified to be permanent.
Example: A child specification to start another supervisor:
#{id => sup,
start => {sup, start_link, []},
restart => transient,
type => supervisor} % will cause default shutdown=>infinity
In the previous example, the supervisor is started by calling ch_sup:start_link():
start_link() ->
supervisor:start_link(ch_sup, []).
In this case, the supervisor is not registered. Instead its pid
must be used. A name can be specified by calling supervisor:start_link({local, Name}, Module, Args) or supervisor:start_link({global, Name}, Module, Args).
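For example, a sketch of start_link/0 that registers the supervisor locally under the name ch_sup (whether to register at all is an application choice):

start_link() ->
    supervisor:start_link({local, ch_sup}, ch_sup, []).

%% The registered name can then be used instead of the pid, for example:
%% supervisor:which_children(ch_sup)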
The new supervisor process calls the callback function ch_sup:init([]). init is expected to return {ok, {SupFlags, ChildSpecs}}:
init(_Args) ->
SupFlags = #{},
ChildSpecs = [#{id => ch3,
start => {ch3, start_link, []},
shutdown => brutal_kill}],
{ok, {SupFlags, ChildSpecs}}.
The supervisor then starts all its child processes according to
the child specifications in the start specification. In this case
there is one child process, ch3.
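Once started, the supervisor can be inspected with supervisor:which_children/1 and supervisor:count_children/1. A sketch of such a session (the exact return values are illustrative):

{ok, Pid} = ch_sup:start_link(),
supervisor:which_children(Pid),
%% returns something like [{ch3,ChildPid,worker,[ch3]}]
supervisor:count_children(Pid)
%% returns something like [{specs,1},{active,1},{supervisors,0},{workers,1}]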
In addition to the static supervision tree, dynamic child processes can be added to an existing supervisor with the following call:
supervisor:start_child(Sup, ChildSpec)
Child processes added using start_child/2 behave in the same way as the other child processes, with the following important exception: if the supervisor dies and is restarted, all child processes that were dynamically added to the supervisor are lost.
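A dynamically added child is specified with the same kind of child specification map as in the static case. A sketch (ch4 is a hypothetical worker module, and Sup is the pid or registered name of the supervisor):

%% The returned pid can be used until the supervisor itself restarts
{ok, Child} = supervisor:start_child(Sup,
                                     #{id => ch4,
                                       start => {ch4, start_link, []},
                                       restart => transient})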
Any child process, static or dynamic, can be stopped in accordance with the shutdown specification:
supervisor:terminate_child(Sup, Id)
The child specification for a stopped child process is deleted with the following call:
supervisor:delete_child(Sup, Id)
As with dynamically added child processes, the effects of deleting a static child process are lost if the supervisor itself restarts.
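A sketch of the full stop-and-delete sequence for the static ch3 child; supervisor:restart_child/2 can be used instead of delete_child/2 if the child is to be started again:

%% Stop the child according to its shutdown specification
ok = supervisor:terminate_child(Sup, ch3),
%% ... then either restart it from the retained child specification:
%% {ok, _Pid} = supervisor:restart_child(Sup, ch3),
%% ... or remove the child specification entirely:
ok = supervisor:delete_child(Sup, ch3)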
A supervisor with restart strategy simple_one_for_one is a simplified one_for_one supervisor, where all child processes are dynamically added instances of the same process.
The following is an example of a callback module for a simple_one_for_one supervisor:
-module(simple_sup).
-behaviour(supervisor).
-export([start_link/0]).
-export([init/1]).
start_link() ->
supervisor:start_link(simple_sup, []).
init(_Args) ->
SupFlags = #{strategy => simple_one_for_one,
intensity => 0,
period => 1},
ChildSpecs = [#{id => call,
start => {call, start_link, []},
shutdown => brutal_kill}],
{ok, {SupFlags, ChildSpecs}}.
When started, the supervisor does not start any child processes. Instead, all child processes are added dynamically by calling:
supervisor:start_child(Sup, List)
For example, adding a child to simple_sup above:
supervisor:start_child(Pid, [id1])
The result is that the child process is started by calling apply(call, start_link, [id1]), or actually:
call:start_link(id1)
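The call module itself is not shown in this chapter; a minimal sketch of what it could look like as a gen_server taking the dynamically supplied argument:

-module(call).
-behaviour(gen_server).
-export([start_link/1]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Id is the extra argument supplied to supervisor:start_child/2;
%% it is appended to the argument list from the child specification
start_link(Id) ->
    gen_server:start_link(call, Id, []).

init(Id) ->
    {ok, #{id => Id}}.

handle_call(_Request, _From, State) ->
    {reply, ok, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.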
A child under a simple_one_for_one supervisor can be terminated with the following:
supervisor:terminate_child(Sup, Pid)
Because a simple_one_for_one supervisor can have many children, it shuts them all down asynchronously. This means that the children do their cleanup in parallel, and therefore the order in which they are stopped is not defined.
Since the supervisor is part of a supervision tree, it is automatically terminated by its supervisor. When asked to shut down, it terminates all child processes in reversed start order according to the respective shutdown specifications, and then terminates itself.