diff options
Diffstat (limited to 'system/doc/design_principles')
-rw-r--r-- | system/doc/design_principles/fsm.xml | 338 | ||||
-rw-r--r-- | system/doc/design_principles/part.xml | 1 | ||||
-rw-r--r-- | system/doc/design_principles/spec_proc.xml | 86 | ||||
-rw-r--r-- | system/doc/design_principles/statem.xml | 113 | ||||
-rw-r--r-- | system/doc/design_principles/sup_princ.xml | 70 | ||||
-rw-r--r-- | system/doc/design_principles/xmlfiles.mk | 1 |
6 files changed, 203 insertions, 406 deletions
diff --git a/system/doc/design_principles/fsm.xml b/system/doc/design_principles/fsm.xml deleted file mode 100644 index 4f2b75e6e8..0000000000 --- a/system/doc/design_principles/fsm.xml +++ /dev/null @@ -1,338 +0,0 @@ -<?xml version="1.0" encoding="utf-8" ?> -<!DOCTYPE chapter SYSTEM "chapter.dtd"> - -<chapter> - <header> - <copyright> - <year>1997</year><year>2016</year> - <holder>Ericsson AB. All Rights Reserved.</holder> - </copyright> - <legalnotice> - Licensed under the Apache License, Version 2.0 (the "License"); - you may not use this file except in compliance with the License. - You may obtain a copy of the License at - - http://www.apache.org/licenses/LICENSE-2.0 - - Unless required by applicable law or agreed to in writing, software - distributed under the License is distributed on an "AS IS" BASIS, - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - See the License for the specific language governing permissions and - limitations under the License. - - </legalnotice> - - <title>gen_fsm Behaviour</title> - <prepared></prepared> - <docno></docno> - <date></date> - <rev></rev> - <file>fsm.xml</file> - </header> - <marker id="gen_fsm behaviour"></marker> - <note> - <p> - There is a new behaviour - <seealso marker="statem"><c>gen_statem</c></seealso> - that is intended to replace <c>gen_fsm</c> for new code. - It has the same features and add some really useful. - This module will not be removed for the foreseeable future - to keep old state machine implementations running. - </p> - </note> - <p>This section is to be read with the <c>gen_fsm(3)</c> manual page - in STDLIB, where all interface functions and callback - functions are described in detail.</p> - - <section> - <title>Finite-State Machines</title> - <p>A Finite-State Machine (FSM) can be described as a set of - relations of the form:</p> - <pre> -State(S) x Event(E) -> Actions(A), State(S')</pre> - <p>These relations are interpreted as meaning:</p> - <quote> - <p>If we are in state <c>S</c> and event <c>E</c> occurs, we - are to perform actions <c>A</c> and make a transition to - state <c>S'</c>.</p> - </quote> - <p>For an FSM implemented using the <c>gen_fsm</c> behaviour, - the state transition rules are written as a number of Erlang - functions, which conform to the following convention:</p> - <pre> -StateName(Event, StateData) -> - .. code for actions here ... - {next_state, StateName', StateData'}</pre> - </section> - - <section> - <title>Example</title> - <p>A door with a code lock can be viewed as an FSM. Initially, - the door is locked. Anytime someone presses a button, this - generates an event. Depending on what buttons have been pressed - before, the sequence so far can be correct, incomplete, or wrong.</p> - <p>If it is correct, the door is unlocked for 30 seconds (30,000 ms). - If it is incomplete, we wait for another button to be pressed. If - it is is wrong, we start all over, waiting for a new button - sequence.</p> - <p>Implementing the code lock FSM using <c>gen_fsm</c> results in - the following callback module:</p> - <marker id="ex"></marker> - <code type="none"><![CDATA[ --module(code_lock). --behaviour(gen_fsm). - --export([start_link/1]). --export([button/1]). --export([init/1, locked/2, open/2]). - -start_link(Code) -> - gen_fsm:start_link({local, code_lock}, code_lock, lists:reverse(Code), []). - -button(Digit) -> - gen_fsm:send_event(code_lock, {button, Digit}). - -init(Code) -> - {ok, locked, {[], Code}}. - -locked({button, Digit}, {SoFar, Code}) -> - case [Digit|SoFar] of - Code -> - do_unlock(), - {next_state, open, {[], Code}, 30000}; - Incomplete when length(Incomplete)<length(Code) -> - {next_state, locked, {Incomplete, Code}}; - _Wrong -> - {next_state, locked, {[], Code}} - end. - -open(timeout, State) -> - do_lock(), - {next_state, locked, State}.]]></code> - <p>The code is explained in the next sections.</p> - </section> - - <section> - <title>Starting gen_fsm</title> - <p>In the example in the previous section, the <c>gen_fsm</c> is - started by calling <c>code_lock:start_link(Code)</c>:</p> - <code type="none"> -start_link(Code) -> - gen_fsm:start_link({local, code_lock}, code_lock, lists:reverse(Code), []). - </code> - <p><c>start_link</c> calls the function <c>gen_fsm:start_link/4</c>, - which spawns and links to a new process, a <c>gen_fsm</c>.</p> - <list type="bulleted"> - <item> - <p>The first argument, <c>{local, code_lock}</c>, specifies - the name. In this case, the <c>gen_fsm</c> is locally - registered as <c>code_lock</c>.</p> - <p>If the name is omitted, the <c>gen_fsm</c> is not registered. - Instead its pid must be used. The name can also be given - as <c>{global, Name}</c>, in which case the <c>gen_fsm</c> is - registered using <c>global:register_name/2</c>.</p> - </item> - <item> - <p>The second argument, <c>code_lock</c>, is the name of - the callback module, that is, the module where the callback - functions are located.</p> - <p>The interface functions (<c>start_link</c> and <c>button</c>) - are then located in the same module as the callback - functions (<c>init</c>, <c>locked</c>, and <c>open</c>). This - is normally good programming practice, to have the code - corresponding to one process contained in one module.</p> - </item> - <item> - <p>The third argument, <c>Code</c>, is a list of digits that - which is passed reversed to the callback function <c>init</c>. - Here, <c>init</c> - gets the correct code for the lock as indata.</p> - </item> - <item> - <p>The fourth argument, <c>[]</c>, is a list of options. See - the <c>gen_fsm(3)</c> manual page for available options.</p> - </item> - </list> - <p>If name registration succeeds, the new <c>gen_fsm</c> process calls - the callback function <c>code_lock:init(Code)</c>. This function - is expected to return <c>{ok, StateName, StateData}</c>, where - <c>StateName</c> is the name of the initial state of the - <c>gen_fsm</c>. In this case <c>locked</c>, assuming the door is - locked to begin with. <c>StateData</c> is the internal state of - the <c>gen_fsm</c>. (For <c>gen_fsm</c>, the internal state is - often referred to 'state data' to - distinguish it from the state as in states of a state machine.) - In this case, the state data is the button sequence so far (empty - to begin with) and the correct code of the lock.</p> - <code type="none"> -init(Code) -> - {ok, locked, {[], Code}}.</code> - <p><c>gen_fsm:start_link</c> is synchronous. It does not return until - the <c>gen_fsm</c> has been initialized and is ready to - receive notifications.</p> - <p><c>gen_fsm:start_link</c> must be used if the <c>gen_fsm</c> is - part of a supervision tree, that is, started by a supervisor. There - is another function, <c>gen_fsm:start</c>, to start a standalone - <c>gen_fsm</c>, that is, a <c>gen_fsm</c> that is not part of a - supervision tree.</p> - </section> - - <section> - <title>Notifying about Events</title> - <p>The function notifying the code lock about a button event is - implemented using <c>gen_fsm:send_event/2</c>:</p> - <code type="none"> -button(Digit) -> - gen_fsm:send_event(code_lock, {button, Digit}).</code> - <p><c>code_lock</c> is the name of the <c>gen_fsm</c> and must - agree with the name used to start it. - <c>{button, Digit}</c> is the actual event.</p> - <p>The event is made into a message and sent to the <c>gen_fsm</c>. - When the event is received, the <c>gen_fsm</c> calls - <c>StateName(Event, StateData)</c>, which is expected to return a - tuple <c>{next_state,StateName1,StateData1}</c>. - <c>StateName</c> is the name of the current state and - <c>StateName1</c> is the name of the next state to go to. - <c>StateData1</c> is a new value for the state data of - the <c>gen_fsm</c>.</p> - <code type="none"><![CDATA[ -locked({button, Digit}, {SoFar, Code}) -> - case [Digit|SoFar] of - Code -> - do_unlock(), - {next_state, open, {[], Code}, 30000}; - Incomplete when length(Incomplete)<length(Code) -> - {next_state, locked, {Incomplete, Code}}; - _Wrong -> - {next_state, locked, {[], Code}}; - end. - -open(timeout, State) -> - do_lock(), - {next_state, locked, State}.]]></code> - <p>If the door is locked and a button is pressed, the complete - button sequence so far is compared with the correct code for - the lock and, depending on the result, the door is either unlocked - and the <c>gen_fsm</c> goes to state <c>open</c>, or the door - remains in state <c>locked</c>.</p> - </section> - - <section> - <title>Time-Outs</title> - <p>When a correct code has been given, the door is unlocked and - the following tuple is returned from <c>locked/2</c>:</p> - <code type="none"> -{next_state, open, {[], Code}, 30000};</code> - <p>30,000 is a time-out value in milliseconds. After this time, - that is, 30 seconds, a time-out occurs. Then, - <c>StateName(timeout, StateData)</c> is called. The time-out - then occurs when the door has been in state <c>open</c> for 30 - seconds. After that the door is locked again:</p> - <code type="none"> -open(timeout, State) -> - do_lock(), - {next_state, locked, State}.</code> - </section> - - <section> - <title>All State Events</title> - <p>Sometimes an event can arrive at any state of the <c>gen_fsm</c>. - Instead of sending the message with <c>gen_fsm:send_event/2</c> - and writing one clause handling the event for each state function, - the message can be sent with <c>gen_fsm:send_all_state_event/2</c> - and handled with <c>Module:handle_event/3</c>:</p> - <code type="none"> --module(code_lock). -... --export([stop/0]). -... - -stop() -> - gen_fsm:send_all_state_event(code_lock, stop). - -... - -handle_event(stop, _StateName, StateData) -> - {stop, normal, StateData}.</code> - </section> - - <section> - <title>Stopping</title> - - <section> - <title>In a Supervision Tree</title> - <p>If the <c>gen_fsm</c> is part of a supervision tree, no stop - function is needed. The <c>gen_fsm</c> is automatically - terminated by its supervisor. Exactly how this is done is - defined by a - <seealso marker="sup_princ#shutdown">shutdown strategy</seealso> - set in the supervisor.</p> - <p>If it is necessary to clean up before termination, the shutdown - strategy must be a time-out value and the <c>gen_fsm</c> must be - set to trap exit signals in the <c>init</c> function. When ordered - to shutdown, the <c>gen_fsm</c> then calls the callback function - <c>terminate(shutdown, StateName, StateData)</c>:</p> - <code type="none"> -init(Args) -> - ..., - process_flag(trap_exit, true), - ..., - {ok, StateName, StateData}. - -... - -terminate(shutdown, StateName, StateData) -> - ..code for cleaning up here.. - ok.</code> - </section> - - <section> - <title>Standalone gen_fsm</title> - <p>If the <c>gen_fsm</c> is not part of a supervision tree, a stop - function can be useful, for example:</p> - <code type="none"> -... --export([stop/0]). -... - -stop() -> - gen_fsm:send_all_state_event(code_lock, stop). -... - -handle_event(stop, _StateName, StateData) -> - {stop, normal, StateData}. - -... - -terminate(normal, _StateName, _StateData) -> - ok.</code> - <p>The callback function handling the <c>stop</c> event returns a - tuple, <c>{stop,normal,StateData1}</c>, where <c>normal</c> - specifies that it is a normal termination and <c>StateData1</c> - is a new value for the state data of the <c>gen_fsm</c>. This - causes the <c>gen_fsm</c> to call - <c>terminate(normal,StateName,StateData1)</c> and then - it terminates gracefully:</p> - </section> - </section> - - <section> - <title>Handling Other Messages</title> - <p>If the <c>gen_fsm</c> is to be able to receive other messages - than events, the callback function - <c>handle_info(Info, StateName, StateData)</c> must be implemented - to handle them. Examples of - other messages are exit messages, if the <c>gen_fsm</c> is linked to - other processes (than the supervisor) and trapping exit signals.</p> - <code type="none"> -handle_info({'EXIT', Pid, Reason}, StateName, StateData) -> - ..code to handle exits here.. - {next_state, StateName1, StateData1}.</code> - <p>The code_change method must also be implemented.</p> - <code type="none"> -code_change(OldVsn, StateName, StateData, Extra) -> - ..code to convert state (and more) during code change - {ok, NextStateName, NewStateData}</code> - </section> -</chapter> - diff --git a/system/doc/design_principles/part.xml b/system/doc/design_principles/part.xml index 6495211e04..d52070a674 100644 --- a/system/doc/design_principles/part.xml +++ b/system/doc/design_principles/part.xml @@ -30,7 +30,6 @@ </header> <xi:include href="des_princ.xml"/> <xi:include href="gen_server_concepts.xml"/> - <xi:include href="fsm.xml"/> <xi:include href="statem.xml"/> <xi:include href="events.xml"/> <xi:include href="sup_princ.xml"/> diff --git a/system/doc/design_principles/spec_proc.xml b/system/doc/design_principles/spec_proc.xml index 5b156ac263..d663c5df79 100644 --- a/system/doc/design_principles/spec_proc.xml +++ b/system/doc/design_principles/spec_proc.xml @@ -45,61 +45,63 @@ <p>The <c>sys</c> module has functions for simple debugging of processes implemented using behaviours. The <c>code_lock</c> example from - <seealso marker="fsm#ex">gen_fsm Behaviour</seealso> + <seealso marker="statem#Example">gen_statem Behaviour</seealso> is used to illustrate this:</p> <pre> -% <input>erl</input> -Erlang (BEAM) emulator version 5.2.3.6 [hipe] [threads:0] +Erlang/OTP 20 [DEVELOPMENT] [erts-9.0] [source-5ace45e] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:10] [hipe] [kernel-poll:false] -Eshell V5.2.3.6 (abort with ^G) -1> <input>code_lock:start_link([1,2,3,4]).</input> -{ok,<0.32.0>} -2> <input>sys:statistics(code_lock, true).</input> +Eshell V9.0 (abort with ^G) +1> code_lock:start_link([1,2,3,4]). +Lock +{ok,<0.63.0>} +2> sys:statistics(code_lock, true). ok -3> <input>sys:trace(code_lock, true).</input> +3> sys:trace(code_lock, true). ok -4> <input>code_lock:button(4).</input> -*DBG* code_lock got event {button,4} in state closed +4> code_lock:button(1). +*DBG* code_lock receive cast {button,1} in state locked ok -*DBG* code_lock switched to state closed -5> <input>code_lock:button(3).</input> -*DBG* code_lock got event {button,3} in state closed +*DBG* code_lock consume cast {button,1} in state locked +5> code_lock:button(2). +*DBG* code_lock receive cast {button,2} in state locked ok -*DBG* code_lock switched to state closed -6> <input>code_lock:button(2).</input> -*DBG* code_lock got event {button,2} in state closed +*DBG* code_lock consume cast {button,2} in state locked +6> code_lock:button(3). +*DBG* code_lock receive cast {button,3} in state locked ok -*DBG* code_lock switched to state closed -7> <input>code_lock:button(1).</input> -*DBG* code_lock got event {button,1} in state closed +*DBG* code_lock consume cast {button,3} in state locked +7> code_lock:button(4). +*DBG* code_lock receive cast {button,4} in state locked ok -OPEN DOOR -*DBG* code_lock switched to state open -*DBG* code_lock got event timeout in state open -CLOSE DOOR -*DBG* code_lock switched to state closed -8> <input>sys:statistics(code_lock, get).</input> -{ok,[{start_time,{{2003,6,12},{14,11,40}}}, - {current_time,{{2003,6,12},{14,12,14}}}, - {reductions,333}, +Unlock +*DBG* code_lock consume cast {button,4} in state locked +*DBG* code_lock receive state_timeout lock in state open +Lock +*DBG* code_lock consume state_timeout lock in state open +8> sys:statistics(code_lock, get). +{ok,[{start_time,{{2017,4,21},{16,8,7}}}, + {current_time,{{2017,4,21},{16,9,42}}}, + {reductions,2973}, {messages_in,5}, {messages_out,0}]} -9> <input>sys:statistics(code_lock, false).</input> +9> sys:statistics(code_lock, false). ok -10> <input>sys:trace(code_lock, false).</input> +10> sys:trace(code_lock, false). ok -11> <input>sys:get_status(code_lock).</input> -{status,<0.32.0>, - {module,gen_fsm}, - [[{'$ancestors',[<0.30.0>]}, - {'$initial_call',{gen,init_it, - [gen_fsm,<0.30.0>,<0.30.0>, - {local,code_lock}, - code_lock, - [1,2,3,4], - []]}}], - running,<0.30.0>,[], - [code_lock,closed,{[],[1,2,3,4]},code_lock,infinity]]}</pre> +11> sys:get_status(code_lock). +{status,<0.63.0>, + {module,gen_statem}, + [[{'$initial_call',{code_lock,init,1}}, + {'$ancestors',[<0.61.0>]}], + running,<0.61.0>,[], + [{header,"Status for state machine code_lock"}, + {data,[{"Status",running}, + {"Parent",<0.61.0>}, + {"Logged Events",[]}, + {"Postponed",[]}]}, + {data,[{"State", + {locked,#{code => [1,2,3,4],remaining => [1,2,3,4]}}}]}]]} + </pre> </section> <section> diff --git a/system/doc/design_principles/statem.xml b/system/doc/design_principles/statem.xml index 22b622ec5f..0667af7868 100644 --- a/system/doc/design_principles/statem.xml +++ b/system/doc/design_principles/statem.xml @@ -293,6 +293,13 @@ StateName(EventType, EventContent, Data) -> <seealso marker="#State Time-Outs">State Time-Outs</seealso> </item> <item> + Start a + <seealso marker="stdlib:gen_statem#type-generic_timeout"> + generic time-out</seealso>, + read more in section + <seealso marker="#Generic Time-Outs">Generic Time-Outs</seealso> + </item> + <item> Start an <seealso marker="stdlib:gen_statem#type-event_timeout">event time-out</seealso>, see more in section @@ -320,8 +327,9 @@ StateName(EventType, EventContent, Data) -> <c>gen_statem(3)</c> </seealso> manual page. - You can, for example, reply to many callers - and generate multiple next events to handle. + You can, for example, reply to many callers, + generate multiple next events, + and set time-outs to relative or absolute times. </p> </section> @@ -369,6 +377,14 @@ StateName(EventType, EventContent, Data) -> </seealso> state timer timing out. </item> + <tag><c>{timeout,Name}</c></tag> + <item> + Generated by state transition action + <seealso marker="stdlib:gen_statem#type-generic_timeout"> + <c>{{timeout,Name},Time,EventContent}</c> + </seealso> + generic timer timing out. + </item> <tag><c>timeout</c></tag> <item> Generated by state transition action @@ -450,7 +466,7 @@ locked( [Digit] -> do_unlock(), {next_state, open, Data#{remaining := Code}, - [{state_timeout,10000,lock}]; + [{state_timeout,10000,lock}]}; [Digit|Rest] -> % Incomplete {next_state, locked, Data#{remaining := Rest}}; _Wrong -> @@ -779,7 +795,7 @@ handle_event(cast, {button,Digit}, State, #{code := Code} = Data) -> [Digit] -> % Complete do_unlock(), {next_state, open, Data#{remaining := Code}, - [{state_timeout,10000,lock}}; + [{state_timeout,10000,lock}]}; [Digit|Rest] -> % Incomplete {keep_state, Data#{remaining := Rest}}; [_|_] -> % Wrong @@ -873,7 +889,7 @@ stop() -> <marker id="Event Time-Outs" /> <title>Event Time-Outs</title> <p> - A timeout feature inherited from <c>gen_statem</c>'s predecessor + A time-out feature inherited from <c>gen_statem</c>'s predecessor <seealso marker="stdlib:gen_fsm"><c>gen_fsm</c></seealso>, is an event time-out, that is, if an event arrives the timer is cancelled. @@ -906,24 +922,24 @@ locked( ... ]]></code> <p> - Whenever we receive a button event we start an event timeout + Whenever we receive a button event we start an event time-out of 30 seconds, and if we get an event type <c>timeout</c> we reset the remaining code sequence. </p> <p> - An event timeout is cancelled by any other event so you either - get some other event or the timeout event. It is therefore - not possible nor needed to cancel or restart an event timeout. + An event time-out is cancelled by any other event so you either + get some other event or the time-out event. It is therefore + not possible nor needed to cancel or restart an event time-out. Whatever event you act on has already cancelled - the event timeout... + the event time-out... </p> </section> <!-- =================================================================== --> <section> - <marker id="Erlang Timers" /> - <title>Erlang Timers</title> + <marker id="Generic Time-Outs" /> + <title>Generic Time-Outs</title> <p> The previous example of state time-outs only work if the state machine stays in the same state during the @@ -934,13 +950,68 @@ locked( You may want to start a timer in one state and respond to the time-out in another, maybe cancel the time-out without changing states, or perhaps run multiple - time-outs in parallel. All this can be accomplished - with Erlang Timers: + time-outs in parallel. All this can be accomplished with + <seealso marker="stdlib:gen_statem#type-generic_timeout">generic time-outs</seealso>. + They may look a little bit like + <seealso marker="stdlib:gen_statem#type-event_timeout">event time-outs</seealso> + but contain a name to allow for any number of them simultaneously + and they are not automatically cancelled. + </p> + <p> + Here is how to accomplish the state time-out + in the previous example by instead using a generic time-out + named <c>open_tm</c>: + </p> + <code type="erl"><![CDATA[ +... +locked( + cast, {button,Digit}, + #{code := Code, remaining := Remaining} = Data) -> + case Remaining of + [Digit] -> + do_unlock(), + {next_state, open, Data#{remaining := Code}, + [{{timeout,open_tm},10000,lock}]}; +... + +open({timeout,open_tm}, lock, Data) -> + do_lock(), + {next_state,locked,Data}; +open(cast, {button,_}, Data) -> + {keep_state,Data}; +... + ]]></code> + <p> + Just as + <seealso marker="#State Time-Outs">state time-outs</seealso> + you can restart or cancel a specific generic time-out + by setting it to a new time or <c>infinity</c>. + </p> + <p> + Another way to handle a late time-out can be to not cancel it, + but to ignore it if it arrives in a state + where it is known to be late. + </p> + </section> + +<!-- =================================================================== --> + + <section> + <marker id="Erlang Timers" /> + <title>Erlang Timers</title> + <p> + The most versatile way to handle time-outs is to use + Erlang Timers; see <seealso marker="erts:erlang#start_timer/4"><c>erlang:start_timer3,4</c></seealso>. + Most time-out tasks can be performed with the + time-out features in <c>gen_statem</c>, + but an example of one that can not is if you should need + the return value from + <seealso marker="erts:erlang#cancel_timer/2"><c>erlang:cancel_timer(Tref)</c></seealso>, that is; the remaining time of the timer. </p> <p> Here is how to accomplish the state time-out - in the previous example by insted using an Erlang Timer: + in the previous example by instead using an Erlang Timer: </p> <code type="erl"><![CDATA[ ... @@ -1596,7 +1667,7 @@ handle_event( {call,From}, code_length, {_StateName,_LockButton}, #{code := Code}) -> {keep_state_and_data, - [{reply,From,length(Code)}]}; + [{reply,From,length(Code)}]}; %% %% State: locked handle_event( @@ -1636,7 +1707,7 @@ handle_event( if Digit =:= LockButton -> {next_state, {locked,LockButton}, Data, - [{reply,From,locked}]); + [{reply,From,locked}]}; true -> {keep_state_and_data, [postpone]} @@ -1710,10 +1781,10 @@ handle_event( EventType, EventContent, {open,LockButton}, Data) -> case {EventType, EventContent} of - {enter, _OldState} -> - do_unlock(), - {keep_state_and_data, - [{state_timeout,10000,lock},hibernate]}; + {enter, _OldState} -> + do_unlock(), + {keep_state_and_data, + [{state_timeout,10000,lock},hibernate]}; ... ]]></code> <p> diff --git a/system/doc/design_principles/sup_princ.xml b/system/doc/design_principles/sup_princ.xml index 0a24e97950..48b1905e94 100644 --- a/system/doc/design_principles/sup_princ.xml +++ b/system/doc/design_principles/sup_princ.xml @@ -163,7 +163,9 @@ SupFlags = #{strategy => Strategy, ...}</code> SupFlags = #{intensity => MaxR, period => MaxT, ...}</code> <p>If more than <c>MaxR</c> number of restarts occur in the last <c>MaxT</c> seconds, the supervisor terminates all the child - processes and then itself.</p> + processes and then itself. + The termination reason for the supervisor itself in that case will be + <c>shutdown</c>.</p> <p>When the supervisor terminates, then the next higher-level supervisor takes some action. It either restarts the terminated supervisor or terminates itself.</p> @@ -173,6 +175,69 @@ SupFlags = #{intensity => MaxR, period => MaxT, ...}</code> <p>The keys <c>intensity</c> and <c>period</c> are optional in the supervisor flags map. If they are not given, they default to <c>1</c> and <c>5</c>, respectively.</p> + <section> + <title>Tuning the intensity and period</title> + <p>The default values are 1 restart per 5 seconds. This was chosen to + be safe for most systems, even with deep supervision hierarchies, + but you will probably want to tune the settings for your particular + use case.</p> + <p>First, the intensity decides how big bursts of restarts you want + to tolerate. For example, you might want to accept a burst of at + most 5 or 10 attempts, even within the same second, if it results + in a successful restart.</p> + <p>Second, you need to consider the sustained failure rate, if + crashes keep happening but not often enough to make the supervisor + give up. If you set intensity to 10 and set the period as low as 1, + the supervisor will allow child processes to keep restarting up to + 10 times per second, forever, filling your logs with crash reports + until someone intervenes manually.</p> + <p>You should therefore set the period to be long enough that you can + accept that the supervisor keeps going at that rate. For example, + if you have picked an intensity value of 5, then setting the period + to 30 seconds will give you at most one restart per 6 seconds for + any longer period of time, which means that your logs won't fill up + too quickly, and you will have a chance to observe the failures and + apply a fix.</p> + <p>These choices depend a lot on your problem domain. If you don't + have real time monitoring and ability to fix problems quickly, for + example in an embedded system, you might want to accept at most + one restart per minute before the supervisor should give up and + escalate to the next level to try to clear the error automatically. + On the other hand, if it is more important that you keep trying + even at a high failure rate, you might want a sustained rate of as + much as 1-2 restarts per second.</p> + <p>Avoiding common mistakes:</p> + <list type="bulleted"> + <item> + <p>Do not forget to consider the burst rate. If you set intensity + to 1 and period to 6, it gives the same sustained error rate as + 5/30 or 10/60, but will not allow even 2 restart attempts in + quick succession. This is probably not what you wanted.</p> + </item> + <item> + <p>Do not set the period to a very high value if you want to + tolerate bursts. If you set intensity to 5 and period to 3600 + (one hour), the supervisor will allow a short burst of 5 + restarts, but then gives up if it sees another single restart + almost an hour later. You probably want to regard those crashes + as separate incidents, so setting the period to 5 or 10 minutes + will be more reasonable.</p> + </item> + <item> + <p>If your application has multiple levels of supervision, then + do not simply set the restart intensities to the same values on + all levels. Keep in mind that the total number of restarts + (before the top level supervisor gives up and terminates the + application) will be the product of the intensity values of all + the supervisors above the failing child process.</p> + <p>For example, if the top level allows 10 restarts, and the next + level also allows 10, a crashing child below that level will be + restarted 100 times, which is probably excessive. Allowing at + most 3 restarts for the top level supervisor might be a better + choice in this case.</p> + </item> + </list> + </section> </section> <section> @@ -211,7 +276,6 @@ child_spec() = #{id => child_id(), % mandatory <list type="bulleted"> <item><c>supervisor:start_link</c></item> <item><c>gen_server:start_link</c></item> - <item><c>gen_fsm:start_link</c></item> <item><c>gen_statem:start_link</c></item> <item><c>gen_event:start_link</c></item> <item>A function compliant with these functions. For details, @@ -276,7 +340,7 @@ child_spec() = #{id => child_id(), % mandatory <p><c>modules</c> are to be a list with one element <c>[Module]</c>, where <c>Module</c> is the name of the callback module, if the child process is a supervisor, - gen_server, gen_fsm or gen_statem. + gen_server, gen_statem. If the child process is a gen_event, the value shall be <c>dynamic</c>.</p> <p>This information is used by the release handler during diff --git a/system/doc/design_principles/xmlfiles.mk b/system/doc/design_principles/xmlfiles.mk index e476255d62..8877e94f39 100644 --- a/system/doc/design_principles/xmlfiles.mk +++ b/system/doc/design_principles/xmlfiles.mk @@ -24,7 +24,6 @@ DESIGN_PRINCIPLES_CHAPTER_FILES = \ des_princ.xml \ distributed_applications.xml \ events.xml \ - fsm.xml \ statem.xml \ gen_server_concepts.xml \ included_applications.xml \ |