Diffstat (limited to 'system/doc/getting_started/robustness.xml')
-rw-r--r-- | system/doc/getting_started/robustness.xml | 483 |
1 files changed, 483 insertions, 0 deletions
diff --git a/system/doc/getting_started/robustness.xml b/system/doc/getting_started/robustness.xml
new file mode 100644
index 0000000000..227da4c027
--- /dev/null
+++ b/system/doc/getting_started/robustness.xml
@@ -0,0 +1,483 @@
+<?xml version="1.0" encoding="latin1" ?>
+<!DOCTYPE chapter SYSTEM "chapter.dtd">
+
+<chapter>
+  <header>
+    <copyright>
+      <year>2003</year><year>2009</year>
+      <holder>Ericsson AB. All Rights Reserved.</holder>
+    </copyright>
+    <legalnotice>
+      The contents of this file are subject to the Erlang Public License,
+      Version 1.1, (the "License"); you may not use this file except in
+      compliance with the License. You should have received a copy of the
+      Erlang Public License along with this software. If not, it can be
+      retrieved online at http://www.erlang.org/.
+
+      Software distributed under the License is distributed on an "AS IS"
+      basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See
+      the License for the specific language governing rights and limitations
+      under the License.
+
+    </legalnotice>
+
+    <title>Robustness</title>
+    <prepared></prepared>
+    <docno></docno>
+    <date></date>
+    <rev></rev>
+    <file>robustness.xml</file>
+  </header>
+  <p>There are several things that are wrong with
+    the <seealso marker="conc_prog#ex">messenger example</seealso> from
+    the previous chapter. For example, if a node where a user is logged
+    on goes down without doing a log off, the user remains in
+    the server's <c>User_List</c>, but the client disappears. This
+    makes it impossible for the user to log on again, as the server
+    thinks the user is already logged on.</p>
+  <p>Or what happens if the server goes down in the middle of sending a
+    message, leaving the sending client hanging forever in
+    the <c>await_result</c> function?</p>
+
+  <section>
+    <title>Timeouts</title>
+    <p>Before improving the messenger program, we will look into some
+      general principles, using the ping pong program as an example.
+      Recall that when "ping" finishes, it tells "pong" that it has
+      done so by sending the atom <c>finished</c> as a message to "pong",
+      so that "pong" can also finish. Another way to let "pong"
+      finish is to make "pong" exit if it does not receive a message
+      from "ping" within a certain time. This can be done by adding a
+      <em>timeout</em> to <c>pong</c>, as shown in the following example:</p>
+    <code type="none">
+-module(tut19).
+
+-export([start_ping/1, start_pong/0, ping/2, pong/0]).
+
+ping(0, _) ->
+    io:format("ping finished~n", []);
+
+ping(N, Pong_Node) ->
+    {pong, Pong_Node} ! {ping, self()},
+    receive
+        pong ->
+            io:format("Ping received pong~n", [])
+    end,
+    ping(N - 1, Pong_Node).
+
+pong() ->
+    receive
+        {ping, Ping_PID} ->
+            io:format("Pong received ping~n", []),
+            Ping_PID ! pong,
+            pong()
+    after 5000 ->
+            io:format("Pong timed out~n", [])
+    end.
+
+start_pong() ->
+    register(pong, spawn(tut19, pong, [])).
+
+start_ping(Pong_Node) ->
+    spawn(tut19, ping, [3, Pong_Node]).</code>
+    <p>After we have compiled this and copied the <c>tut19.beam</c>
+      file to the necessary directories, we get the following output.</p>
+    <p>On (pong@kosken):</p>
+    <pre>
+(pong@kosken)1> <input>tut19:start_pong().</input>
+true
+Pong received ping
+Pong received ping
+Pong received ping
+Pong timed out</pre>
+    <p>On (ping@gollum):</p>
+    <pre>
+(ping@gollum)1> <input>tut19:start_ping(pong@kosken).</input>
+<0.36.0>
+Ping received pong
+Ping received pong
+Ping received pong
+ping finished</pre>
+    <p>The timeout is set in:</p>
+    <code type="none">
+pong() ->
+    receive
+        {ping, Ping_PID} ->
+            io:format("Pong received ping~n", []),
+            Ping_PID ! pong,
+            pong()
+    after 5000 ->
+            io:format("Pong timed out~n", [])
+    end.</code>
+    <p>The timeout (<c>after 5000</c>) is started when <c>receive</c>
+      is entered. It is canceled if <c>{ping,Ping_PID}</c> is received.
+      If <c>{ping,Ping_PID}</c> is not received, the actions following
+      the timeout are done after 5000 milliseconds. Because
+      <c>pong()</c> calls itself after each ping, every new
+      <c>receive</c> starts a fresh 5000 ms timeout, so "pong" times
+      out five seconds after it has handled the last ping.
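+    </p>
+    <p>A <c>receive</c> with an <c>after</c> clause does not have to
+      end the process. As a minimal sketch, the hypothetical function
+      <c>await_pong/1</c> (not part of <c>tut19</c>) takes the timeout as
+      an argument and returns <c>ok</c> if <c>pong</c> arrives in time,
+      or <c>timeout</c> otherwise:</p>
+    <code type="none">
+%% Hypothetical helper, not part of tut19.
+%% Waits at most Timeout milliseconds for the atom pong.
+await_pong(Timeout) ->
+    receive
+        pong ->
+            ok
+    after Timeout ->
+            %% No pong arrived in time: report it instead of hanging.
+            timeout
+    end.</code>
+    <p>Called as <c>await_pong(5000)</c>, it waits up to five seconds.
+      The timeout can also be the atom <c>infinity</c>, which makes
+      the <c>receive</c> wait indefinitely, as if it had no <c>after</c>
+      clause.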
+      The <c>after</c> clause must be last in the <c>receive</c>,
+      that is, preceded by all other message reception specifications in
+      the <c>receive</c>. We could also call a function that returns an
+      integer for the timeout:</p>
+    <code type="none">
+after pong_timeout() -></code>
+    <p>In general, there are better ways than using timeouts to
+      supervise parts of a distributed Erlang system. Timeouts are
+      usually appropriate for supervising external events, for example,
+      when a message from some external system is expected within a
+      specified time. We could, for instance, use a timeout to log a user
+      out of the messenger system if they have not accessed it for ten
+      minutes.</p>
+  </section>
+
+  <section>
+    <title>Error Handling</title>
+    <p>Before we go into details of the supervision and error handling
+      in an Erlang system, we need to see how Erlang processes terminate,
+      or in Erlang terminology, <em>exit</em>.</p>
+    <p>A process that executes <c>exit(normal)</c> or simply runs out
+      of things to do has a <em>normal</em> exit.</p>
+    <p>A process that encounters a runtime error (e.g. division by zero,
+      a bad match, or a call to a function that does not exist) exits
+      with an error, i.e. has an <em>abnormal</em> exit. A
+      process that executes
+      <seealso marker="erts:erlang#exit/1">exit(Reason)</seealso>,
+      where <c>Reason</c> is any Erlang term except the atom
+      <c>normal</c>, also has an abnormal exit.</p>
+    <p>An Erlang process can set up links to other Erlang processes. If
+      a process calls
+      <seealso marker="erts:erlang#link/1">link(Other_Pid)</seealso>,
+      it sets up a bidirectional link between itself and the process
+      called <c>Other_Pid</c>.
+      When a process terminates, it sends
+      something called a <em>signal</em> to all the processes it has
+      links to.</p>
+    <p>The signal carries information about the pid it was sent from and
+      the exit reason.</p>
+    <p>The default behaviour of a process that receives a normal exit
+      is to ignore the signal.</p>
+    <p>In the two other cases above (i.e. abnormal exit), the default
+      behaviour is to bypass the message queue of the receiving process,
+      kill the process, and propagate the same error signal to the links
+      of the killed process. In this way you can connect all processes in
+      a transaction together using links, and if one of the processes
+      exits abnormally, all the processes in the transaction are
+      killed. As we often want to create a process and link to it at
+      the same time, there is a special BIF,
+      <seealso marker="erts:erlang#spawn_link/1">spawn_link</seealso>,
+      which does the same as <c>spawn</c>, but also creates a link to
+      the spawned process.</p>
+    <p>Here is the ping pong example modified to use links to terminate
+      "pong":</p>
+    <code type="none">
+-module(tut20).
+
+-export([start/1, ping/2, pong/0]).
+
+ping(N, Pong_Pid) ->
+    link(Pong_Pid),
+    ping1(N, Pong_Pid).
+
+ping1(0, _) ->
+    exit(ping);
+
+ping1(N, Pong_Pid) ->
+    Pong_Pid ! {ping, self()},
+    receive
+        pong ->
+            io:format("Ping received pong~n", [])
+    end,
+    ping1(N - 1, Pong_Pid).
+
+pong() ->
+    receive
+        {ping, Ping_PID} ->
+            io:format("Pong received ping~n", []),
+            Ping_PID ! pong,
+            pong()
+    end.
+
+start(Ping_Node) ->
+    PongPID = spawn(tut20, pong, []),
+    spawn(Ping_Node, tut20, ping, [3, PongPID]).</code>
+    <pre>
+(s1@bill)3> <input>tut20:start(s2@kosken).</input>
+Pong received ping
+<3820.41.0>
+Ping received pong
+Pong received ping
+Ping received pong
+Pong received ping
+Ping received pong</pre>
+    <p>This is a slight modification of the ping pong program where both
+      processes are spawned from the same <c>start/1</c> function,
+      and the "ping" process can be spawned on a separate node. Note
+      the use of the <c>link</c> BIF. "Ping" calls
+      <c>exit(ping)</c> when it finishes, which causes an exit
+      signal to be sent to "pong", which then also terminates.</p>
+    <p>It is possible to modify the default behaviour of a process so
+      that it does not get killed when it receives abnormal exit
+      signals. Instead, all exit signals are turned into normal messages
+      of the form <c>{'EXIT',FromPID,Reason}</c> and added to the end of
+      the receiving process's message queue. This behaviour is set by:</p>
+    <code type="none">
+process_flag(trap_exit, true)</code>
+    <p>There are several other process flags; see
+      <seealso marker="erts:erlang#process_flag/2">erlang(3)</seealso>.
+      Changing the default behaviour of a process in this way is
+      usually not done in standard user programs, but is left to
+      the supervisory programs in OTP (but that is another tutorial).
+      However, we will modify the ping pong program to illustrate exit
+      trapping.</p>
+    <code type="none">
+-module(tut21).
+
+-export([start/1, ping/2, pong/0]).
+
+ping(N, Pong_Pid) ->
+    link(Pong_Pid),
+    ping1(N, Pong_Pid).
+
+ping1(0, _) ->
+    exit(ping);
+
+ping1(N, Pong_Pid) ->
+    Pong_Pid ! {ping, self()},
+    receive
+        pong ->
+            io:format("Ping received pong~n", [])
+    end,
+    ping1(N - 1, Pong_Pid).
+
+pong() ->
+    process_flag(trap_exit, true),
+    pong1().
+
+pong1() ->
+    receive
+        {ping, Ping_PID} ->
+            io:format("Pong received ping~n", []),
+            Ping_PID ! pong,
+            pong1();
+        {'EXIT', From, Reason} ->
+            io:format("pong exiting, got ~p~n", [{'EXIT', From, Reason}])
+    end.
+
+start(Ping_Node) ->
+    PongPID = spawn(tut21, pong, []),
+    spawn(Ping_Node, tut21, ping, [3, PongPID]).</code>
+    <pre>
+(s1@bill)1> <input>tut21:start(s2@gollum).</input>
+<3820.39.0>
+Pong received ping
+Ping received pong
+Pong received ping
+Ping received pong
+Pong received ping
+Ping received pong
+pong exiting, got {'EXIT',<3820.39.0>,ping}</pre>
+  </section>
+
+  <section>
+    <title>The Larger Example with Robustness Added</title>
+    <p>Now we return to the messenger program and add changes that
+      make it more robust:</p>
+    <code type="none">
+%%% Message passing utility.
+%%% User interface:
+%%% logon(Name)
+%%%     One user at a time can log in from each Erlang node in the
+%%%     system messenger and choose a suitable Name. If the Name
+%%%     is already logged in at another node or if someone else is
+%%%     already logged in at the same node, logon will be rejected
+%%%     with a suitable error message.
+%%% logoff()
+%%%     Logs off anybody at that node
+%%% message(ToName, Message)
+%%%     sends Message to ToName. Error messages if the user of this
+%%%     function is not logged on or if ToName is not logged on at
+%%%     any node.
+%%%
+%%% One node in the network of Erlang nodes runs a server which maintains
+%%% data about the logged on users. The server is registered as "messenger".
+%%% Each node where there is a user logged on runs a client process registered
+%%% as "mess_client".
+%%%
+%%% Protocol between the client processes and the server
+%%% ----------------------------------------------------
+%%%
+%%% To server: {ClientPid, logon, UserName}
+%%% Reply {messenger, stop, user_exists_at_other_node} stops the client
+%%% Reply {messenger, logged_on} logon was successful
+%%%
+%%% When the client terminates for some reason
+%%% To server: {'EXIT', ClientPid, Reason}
+%%%
+%%% To server: {ClientPid, message_to, ToName, Message} send a message
+%%% Reply: {messenger, stop, you_are_not_logged_on} stops the client
+%%% Reply: {messenger, receiver_not_found} no user with this name logged on
+%%% Reply: {messenger, sent} Message has been sent (but no guarantee)
+%%%
+%%% To client: {message_from, Name, Message}
+%%%
+%%% Protocol between the "commands" and the client
+%%% ----------------------------------------------
+%%%
+%%% Started: messenger:client(Server_Node, Name)
+%%% To client: logoff
+%%% To client: {message_to, ToName, Message}
+%%%
+%%% Configuration: change the server_node() function to return the
+%%% name of the node where the messenger server runs
+
+-module(messenger).
+-export([start_server/0, server/0,
+         logon/1, logoff/0, message/2, client/2]).
+
+%%% Change the function below to return the name of the node where the
+%%% messenger server runs
+server_node() ->
+    messenger@super.
+
+%%% This is the server process for the "messenger"
+%%% the user list has the format [{ClientPid1, Name1},{ClientPid2, Name2},...]
+server() ->
+    process_flag(trap_exit, true),
+    server([]).
+
+server(User_List) ->
+    receive
+        {From, logon, Name} ->
+            New_User_List = server_logon(From, Name, User_List),
+            server(New_User_List);
+        {'EXIT', From, _} ->
+            New_User_List = server_logoff(From, User_List),
+            server(New_User_List);
+        {From, message_to, To, Message} ->
+            server_transfer(From, To, Message, User_List),
+            io:format("list is now: ~p~n", [User_List]),
+            server(User_List)
+    end.
+
+%%% Start the server
+start_server() ->
+    register(messenger, spawn(messenger, server, [])).
+
+%%% Server adds a new user to the user list
+server_logon(From, Name, User_List) ->
+    %% check if logged on anywhere else
+    case lists:keymember(Name, 2, User_List) of
+        true ->
+            From ! {messenger, stop, user_exists_at_other_node},  %reject logon
+            User_List;
+        false ->
+            From ! {messenger, logged_on},
+            link(From),
+            [{From, Name} | User_List]        %add user to the list
+    end.
+
+%%% Server deletes a user from the user list
+server_logoff(From, User_List) ->
+    lists:keydelete(From, 1, User_List).
+
+
+%%% Server transfers a message between users
+server_transfer(From, To, Message, User_List) ->
+    %% check that the user is logged on and who he is
+    case lists:keysearch(From, 1, User_List) of
+        false ->
+            From ! {messenger, stop, you_are_not_logged_on};
+        {value, {_, Name}} ->
+            server_transfer(From, Name, To, Message, User_List)
+    end.
+
+%%% If the user exists, send the message
+server_transfer(From, Name, To, Message, User_List) ->
+    %% Find the receiver and send the message
+    case lists:keysearch(To, 2, User_List) of
+        false ->
+            From ! {messenger, receiver_not_found};
+        {value, {ToPid, To}} ->
+            ToPid ! {message_from, Name, Message},
+            From ! {messenger, sent}
+    end.
+
+%%% User Commands
+logon(Name) ->
+    case whereis(mess_client) of
+        undefined ->
+            register(mess_client,
+                     spawn(messenger, client, [server_node(), Name]));
+        _ -> already_logged_on
+    end.
+
+logoff() ->
+    mess_client ! logoff.
+
+message(ToName, Message) ->
+    case whereis(mess_client) of % Test if the client is running
+        undefined ->
+            not_logged_on;
+        _ -> mess_client ! {message_to, ToName, Message},
+             ok
+    end.
+
+%%% The client process which runs on each user node
+client(Server_Node, Name) ->
+    {messenger, Server_Node} ! {self(), logon, Name},
+    await_result(),
+    client(Server_Node).
+
+client(Server_Node) ->
+    receive
+        logoff ->
+            exit(normal);
+        {message_to, ToName, Message} ->
+            {messenger, Server_Node} ! {self(), message_to, ToName, Message},
+            await_result();
+        {message_from, FromName, Message} ->
+            io:format("Message from ~p: ~p~n", [FromName, Message])
+    end,
+    client(Server_Node).
+
+%%% wait for a response from the server
+await_result() ->
+    receive
+        {messenger, stop, Why} -> % Stop the client
+            io:format("~p~n", [Why]),
+            exit(normal);
+        {messenger, What} ->  % Normal response
+            io:format("~p~n", [What])
+    after 5000 ->
+            io:format("No response from server~n", []),
+            exit(timeout)
+    end.</code>
+    <p>We have added the following changes:</p>
+    <p>The messenger server traps exits. If it receives an exit signal,
+      <c>{'EXIT',From,Reason}</c>, this means that a client process has
+      terminated or is unreachable because:</p>
+    <list type="bulleted">
+      <item>the user has logged off (the explicit "logoff" message to
+       the server has been removed),</item>
+      <item>the network connection to the client is broken,</item>
+      <item>the node on which the client process resides has gone down,
+       or</item>
+      <item>the client process has done some illegal operation.</item>
+    </list>
+    <p>If we receive an exit signal as above, we delete the tuple
+      <c>{From,Name}</c> from the server's <c>User_List</c> using
+      the <c>server_logoff</c> function.
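+    </p>
+    <p>As an illustration of how <c>server_logoff</c> finds the entry to
+      remove: <c>lists:keydelete(Key, N, TupleList)</c> returns a copy of
+      <c>TupleList</c> without the first tuple whose <c>N</c>th element
+      compares equal to <c>Key</c>, and the server uses position 1, the
+      client pid. In this minimal sketch, the atoms <c>pid_a</c> and
+      <c>pid_b</c> are assumed stand-ins for real client pids:</p>
+    <code type="none">
+%% Illustration only: pid_a and pid_b stand in for real client pids,
+%% peter and james are user names.
+lists:keydelete(pid_a, 1, [{pid_a, peter}, {pid_b, james}]).
+%% returns [{pid_b,james}]</code>
+    <p>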
+      If the node on which the server runs goes down, an exit signal,
+      <c>{'EXIT',MessengerPID,noconnection}</c>, is automatically
+      generated by the system and sent to all the client processes,
+      causing them to terminate.</p>
+    <p>We have also introduced a timeout of five seconds in
+      the <c>await_result</c> function. That is, if the server does not
+      reply within five seconds (5000 ms), the client terminates. This
+      is really only needed in the logon sequence, before the client and
+      server are linked.</p>
+    <p>An interesting case is if the client were to terminate before
+      the server links to it. This is taken care of because linking to a
+      non-existent process causes an exit signal,
+      <c>{'EXIT',From,noproc}</c>, to be generated automatically, as if
+      the process terminated immediately after the link operation.</p>
+  </section>
+</chapter>
+