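Several things are wrong with the messenger example. For example, if a node where a user is logged on goes down without doing a logoff, the user remains in the server's User_List but the client disappears. This makes it impossible for the user to log on again, as the server thinks the user is already logged on.
Or what happens if the server goes down in the middle of sending a message, leaving the sending client hanging forever in the await_result function?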
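Before improving the messenger program, we will look at some general principles, using the ping pong program as an example.
Recall that when "ping" finishes, it tells "pong" that it has done so by sending the atom finished as a message to "pong", so that "pong" can also finish. Another way to let "pong" finish is to make "pong" exit if it does not receive a message from "ping" within a certain time. This can be done by adding a timeout to pong, as shown in the following example: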
-module(tut19).
-export([start_ping/1, start_pong/0, ping/2, pong/0]).
ping(0, _Pong_Node) ->
    io:format("ping finished~n", []);
ping(N, Pong_Node) ->
    {pong, Pong_Node} ! {ping, self()},
    receive
        pong ->
            io:format("Ping received pong~n", [])
    end,
    ping(N - 1, Pong_Node).
pong() ->
    receive
        {ping, Ping_PID} ->
            io:format("Pong received ping~n", []),
            Ping_PID ! pong,
            pong()
    after 5000 ->
            io:format("Pong timed out~n", [])
    end.
start_pong() ->
    register(pong, spawn(tut19, pong, [])).
start_ping(Pong_Node) ->
    spawn(tut19, ping, [3, Pong_Node]).
After compiling this and copying the file tut19.beam to the necessary directories, we see the following:
On (pong@kosken):
(pong@kosken)1> tut19:start_pong().
true
Pong received ping
Pong received ping
Pong received ping
Pong timed out
On (ping@gollum):
(ping@gollum)1> tut19:start_ping(pong@kosken).
<0.36.0>
Ping received pong
Ping received pong
Ping received pong
ping finished
The timeout is set in:
pong() ->
    receive
        {ping, Ping_PID} ->
            io:format("Pong received ping~n", []),
            Ping_PID ! pong,
            pong()
    after 5000 ->
            io:format("Pong timed out~n", [])
    end.
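The timeout (after 5000) is started when receive is entered. It is cancelled if {ping, Ping_PID} is received. If {ping, Ping_PID} is not received, the actions following the timeout are done after 5000 milliseconds. after must be the last part of the receive, that is, preceded by all the other message patterns in the receive. It is also possible to call a function that returns an integer for the timeout:
after pong_timeout() ->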
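As a minimal sketch of a function-valued timeout, the module below computes its timeout with pong_timeout/0. The module name timeout_fun, the 200 ms value, and the return values got_ping and timed_out are illustrative choices made here, not part of the tutorial.

```erlang
-module(timeout_fun).
-export([wait_for_ping/0, pong_timeout/0]).

%% Hypothetical helper: returns the timeout in milliseconds.
%% An after expression must evaluate to a non-negative integer
%% or the atom infinity.
pong_timeout() ->
    200.

%% Waits once for {ping, Ping_PID}. The timeout starts when the
%% receive is entered and is cancelled if a matching message arrives.
wait_for_ping() ->
    receive
        {ping, Ping_PID} ->
            Ping_PID ! pong,
            got_ping
    after pong_timeout() ->
        timed_out
    end.
```

With an empty mailbox, timeout_fun:wait_for_ping() returns timed_out after about 200 ms.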
In general, there are better ways than timeouts to supervise parts of a distributed Erlang system. Timeouts are usually appropriate for supervising external events, for example when a message is expected from some external system within a specified time. We could, for instance, use a timeout to log a user out of the messenger system if they have not accessed it for ten minutes.
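The idle-logout idea can be sketched as follows. This assumes a client loop shaped like the messenger client. Every message restarts the timer, because each new receive starts a fresh timeout. The module name idle_logout, the IdleMs parameter, and the exit reason idle_timeout are hypothetical.

```erlang
-module(idle_logout).
-export([start/1, loop/1]).

%% Starts a client-like process that terminates itself if it sees
%% no activity for IdleMs milliseconds (hypothetical sketch).
start(IdleMs) ->
    spawn(idle_logout, loop, [IdleMs]).

loop(IdleMs) ->
    receive
        {message_to, _ToName, _Message} ->
            %% Activity: loop again, which starts a new timeout.
            loop(IdleMs)
    after IdleMs ->
        %% No activity within IdleMs: "log out" by exiting.
        exit(idle_timeout)
    end.
```

For the ten-minute case, idle_logout:start(600000) would be used. The parameter exists only so the behaviour can be tried quickly.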
Before going into the details of supervision and error handling in an Erlang system, we need to see how Erlang processes terminate, or in Erlang terminology, exit.
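A process which executes exit(normal), or simply runs out of things to do, has a normal exit.
A process which encounters a runtime error (for example divide by zero, a bad match, or a call to a function which doesn't exist) exits with an error, that is, it has an abnormal exit. A process which executes exit(Reason), where Reason is any Erlang term except the atom normal, also has an abnormal exit.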
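The sketch below shows these kinds of exit side by side. It runs a fun in a new process and reports that process's exit reason. To observe the reason, it uses a monitor (erlang:spawn_monitor/1). Monitors are not covered in this section and are used here only for observation. The module exit_kinds and its functions are hypothetical.

```erlang
-module(exit_kinds).
-export([exit_reason/1, divide/2]).

%% Runs Fun in a new process and returns the reason it exited with.
%% The monitor only observes the exit; it does not affect the process.
exit_reason(Fun) ->
    {Pid, Ref} = spawn_monitor(Fun),
    receive
        {'DOWN', Ref, process, Pid, Reason} -> Reason
    end.

%% Fails with a badarith runtime error when Y is 0.
divide(X, Y) ->
    X / Y.
```

Both exit_kinds:exit_reason(fun() -> ok end) and exit_kinds:exit_reason(fun() -> exit(normal) end) return normal. A fun calling exit(quit) gives quit. A fun calling exit_kinds:divide(1, 0) gives {badarith, Stacktrace}. The last two are abnormal exits.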
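An Erlang process can set up links to other Erlang processes. If a process calls link(Other_Pid), it sets up a bidirectional link between itself and the process Other_Pid. When a process terminates, it sends something called a signal to all the processes it has links to.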
The signal carries information about the pid it was sent from and the exit reason.
The default behaviour of a process which receives a normal exit signal is to ignore it.
The default behaviour in the other two cases above (that is, an abnormal exit) is to bypass all messages to the receiving process, kill it, and propagate the same error signal to the killed process's links. In this way you can connect all the processes in a transaction together using links: if one of the processes exits abnormally, all the processes in the transaction are killed. As we often want to create a process and link to it at the same time, there is a special BIF, spawn_link, which does the same as spawn but also creates a link to the spawned process.
Here is a version of the ping pong program that uses links to terminate "pong":
-module(tut20).
-export([start/1, ping/2, pong/0]).
ping(N, Pong_Pid) ->
    link(Pong_Pid),
    ping1(N, Pong_Pid).
ping1(0, _) ->
    exit(ping);
ping1(N, Pong_Pid) ->
    Pong_Pid ! {ping, self()},
    receive
        pong ->
            io:format("Ping received pong~n", [])
    end,
    ping1(N - 1, Pong_Pid).
pong() ->
    receive
        {ping, Ping_PID} ->
            io:format("Pong received ping~n", []),
            Ping_PID ! pong,
            pong()
    end.
start(Ping_Node) ->
    PongPID = spawn(tut20, pong, []),
    spawn(Ping_Node, tut20, ping, [3, PongPID]).
(s1@bill)3> tut20:start(s2@kosken).
Pong received ping
<3820.41.0>
Ping received pong
Pong received ping
Ping received pong
Pong received ping
Ping received pong
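This is a slight modification of the ping pong program where both processes are spawned from the same start/1 function, and the "ping" process can be spawned on a separate node. Note the use of the link BIF. "Ping" calls exit(ping) when it finishes, and this causes an exit signal to be sent to "pong", which also terminates.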
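It is possible to modify the default behaviour of a process so that it does not get killed when it receives abnormal exit signals. Instead, all signals are turned into normal messages in the format {'EXIT', FromPID, Reason} and added to the end of the receiving process's message queue. This behaviour is set by: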
process_flag(trap_exit, true)
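There are several other process flags; see the erlang(3) manual page. Changing the default behaviour of a process in this way is usually not done in standard user programs but is left to the supervisory programs in OTP. Here, however, the ping pong program is modified to illustrate exit trapping: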
-module(tut21).
-export([start/1, ping/2, pong/0]).
ping(N, Pong_Pid) ->
    link(Pong_Pid),
    ping1(N, Pong_Pid).
ping1(0, _) ->
    exit(ping);
ping1(N, Pong_Pid) ->
    Pong_Pid ! {ping, self()},
    receive
        pong ->
            io:format("Ping received pong~n", [])
    end,
    ping1(N - 1, Pong_Pid).
pong() ->
    process_flag(trap_exit, true),
    pong1().
pong1() ->
    receive
        {ping, Ping_PID} ->
            io:format("Pong received ping~n", []),
            Ping_PID ! pong,
            pong1();
        {'EXIT', From, Reason} ->
            io:format("pong exiting, got ~p~n", [{'EXIT', From, Reason}])
    end.
start(Ping_Node) ->
    PongPID = spawn(tut21, pong, []),
    spawn(Ping_Node, tut21, ping, [3, PongPID]).
(s1@bill)1> tut21:start(s2@gollum).
<3820.39.0>
Pong received ping
Ping received pong
Pong received ping
Ping received pong
Pong received ping
Ping received pong
pong exiting, got {'EXIT',<3820.39.0>,ping}
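tut21 only traps an abnormal exit. A process that traps exits also receives normal exit signals, as {'EXIT', From, normal} messages. This is what lets a trapping server notice a client that terminates with exit(normal). The sketch below uses a hypothetical module, trap_demo. It traps exits in a throwaway helper process so that the caller's own trap_exit flag is left unchanged.

```erlang
-module(trap_demo).
-export([trapped/1]).

%% Returns the message a trapping process receives when a linked
%% child exits with Reason, for example {'EXIT', ChildPid, normal}.
trapped(Reason) ->
    Parent = self(),
    Helper = spawn(fun() ->
                           process_flag(trap_exit, true),
                           Child = spawn_link(fun() -> exit(Reason) end),
                           receive
                               {'EXIT', Child, _} = Msg ->
                                   Parent ! {self(), Msg}
                           end
                   end),
    receive
        {Helper, ExitMsg} -> ExitMsg
    end.
```

trap_demo:trapped(normal) and trap_demo:trapped(ping) both return an {'EXIT', ChildPid, Reason} tuple instead of killing the helper.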
Now we return to the messenger program and add changes which make it more robust:
%%% Message passing utility.
%%% User interface:
%%%   logon(Name)
%%%     One user at a time can log in from each Erlang node in the
%%%     messenger system and choose a suitable Name. If the Name
%%%     is already logged in at another node or if someone else is
%%%     already logged in at the same node, logon will be rejected
%%%     with a suitable error message.
%%%   logoff()
%%%     Logs off anybody at that node.
%%%   message(ToName, Message)
%%%     Sends Message to ToName. Error messages if the user of this
%%%     function is not logged on or if ToName is not logged on at
%%%     any node.
%%%
%%% One node in the network of Erlang nodes runs a server which maintains
%%% data about the logged on users. The server is registered as "messenger".
%%% Each node where there is a user logged on runs a client process registered
%%% as "mess_client".
%%%
%%% Protocol between the client processes and the server
%%% ----------------------------------------------------
%%%
%%% To server: {ClientPid, logon, UserName}
%%% Reply {messenger, stop, user_exists_at_other_node} stops the client
%%% Reply {messenger, logged_on} logon was successful
%%%
%%% When the client terminates for some reason
%%% To server: {'EXIT', ClientPid, Reason}
%%%
%%% To server: {ClientPid, message_to, ToName, Message} send a message
%%% Reply: {messenger, stop, you_are_not_logged_on} stops the client
%%% Reply: {messenger, receiver_not_found} no user with this name logged on
%%% Reply: {messenger, sent} Message has been sent (but no guarantee)
%%%
%%% To client: {message_from, Name, Message},
%%%
%%% Protocol between the "commands" and the client
%%% ----------------------------------------------
%%%
%%% Started: messenger:client(Server_Node, Name)
%%% To client: logoff
%%% To client: {message_to, ToName, Message}
%%%
%%% Configuration: change the server_node() function to return the
%%% name of the node where the messenger server runs
-module(messenger).
-export([start_server/0, server/0,
         logon/1, logoff/0, message/2, client/2]).
%%% Change the function below to return the name of the node where the
%%% messenger server runs
server_node() ->
    messenger@super.
%%% This is the server process for the "messenger"
%%% the user list has the format [{ClientPid1, Name1},{ClientPid2, Name2},...]
server() ->
    process_flag(trap_exit, true),
    server([]).
server(User_List) ->
    receive
        {From, logon, Name} ->
            New_User_List = server_logon(From, Name, User_List),
            server(New_User_List);
        {'EXIT', From, _} ->
            New_User_List = server_logoff(From, User_List),
            server(New_User_List);
        {From, message_to, To, Message} ->
            server_transfer(From, To, Message, User_List),
            io:format("list is now: ~p~n", [User_List]),
            server(User_List)
    end.
%%% Start the server
start_server() ->
    register(messenger, spawn(messenger, server, [])).
%%% Server adds a new user to the user list
server_logon(From, Name, User_List) ->
    %% check if logged on anywhere else
    case lists:keymember(Name, 2, User_List) of
        true ->
            From ! {messenger, stop, user_exists_at_other_node},  %reject logon
            User_List;
        false ->
            From ! {messenger, logged_on},
            link(From),
            [{From, Name} | User_List]        %add user to the list
    end.
%%% Server deletes a user from the user list
server_logoff(From, User_List) ->
    lists:keydelete(From, 1, User_List).
%%% Server transfers a message between users
server_transfer(From, To, Message, User_List) ->
    %% check that the user is logged on and who he is
    case lists:keysearch(From, 1, User_List) of
        false ->
            From ! {messenger, stop, you_are_not_logged_on};
        {value, {_, Name}} ->
            server_transfer(From, Name, To, Message, User_List)
    end.
%%% If the user exists, send the message
server_transfer(From, Name, To, Message, User_List) ->
    %% Find the receiver and send the message
    case lists:keysearch(To, 2, User_List) of
        false ->
            From ! {messenger, receiver_not_found};
        {value, {ToPid, To}} ->
            ToPid ! {message_from, Name, Message},
            From ! {messenger, sent}
    end.
%%% User Commands
logon(Name) ->
    case whereis(mess_client) of
        undefined ->
            register(mess_client,
                     spawn(messenger, client, [server_node(), Name]));
        _ -> already_logged_on
    end.
logoff() ->
    mess_client ! logoff.
message(ToName, Message) ->
    case whereis(mess_client) of % Test if the client is running
        undefined ->
            not_logged_on;
        _ -> mess_client ! {message_to, ToName, Message},
             ok
    end.
%%% The client process which runs on each user node
client(Server_Node, Name) ->
    {messenger, Server_Node} ! {self(), logon, Name},
    await_result(),
    client(Server_Node).
client(Server_Node) ->
    receive
        logoff ->
            exit(normal);
        {message_to, ToName, Message} ->
            {messenger, Server_Node} ! {self(), message_to, ToName, Message},
            await_result();
        {message_from, FromName, Message} ->
            io:format("Message from ~p: ~p~n", [FromName, Message])
    end,
    client(Server_Node).
%%% wait for a response from the server
await_result() ->
    receive
        {messenger, stop, Why} -> % Stop the client
            io:format("~p~n", [Why]),
            exit(normal);
        {messenger, What} ->  % Normal response
            io:format("~p~n", [What])
    after 5000 ->
            io:format("No response from server~n", []),
            exit(timeout)
    end.
We have added the following changes:
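The messenger server traps exits. If it receives an exit signal, {'EXIT', From, Reason}, this means that a client process has terminated or is unreachable for one of the following reasons: the user has logged off (the "logoff" message to the server has been removed), the network connection to the client is broken, the node on which the client process resides has gone down, or the client process has done some illegal operation.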
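If we receive an exit signal as above, we delete the tuple {From, Name} from the server's User_List using the server_logoff function. If the node on which the server runs goes down, an exit signal, {'EXIT', MessengerPID, noconnection}, is automatically generated by the system and sent to all of the client processes, causing them all to terminate.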
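We have also introduced a timeout of five seconds in the await_result function. That is, if the server does not reply within five seconds (5000 ms), the client terminates. This is only needed in the logon sequence, before the client and the server are linked.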
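The timeout in await_result is easier to experiment with if the function returns a value instead of printing and exiting. The variant below is a hypothetical sketch: the module name await_demo and the TimeoutMs parameter are ours, and the messenger itself keeps the version shown above.

```erlang
-module(await_demo).
-export([await_result/1]).

%% Waits up to TimeoutMs for a reply from the messenger server.
%% Returns {stop, Why}, {ok, What}, or timeout, instead of calling
%% exit/1 as the messenger client does.
await_result(TimeoutMs) ->
    receive
        {messenger, stop, Why} -> {stop, Why};
        {messenger, What} -> {ok, What}
    after TimeoutMs ->
        timeout
    end.
```

With an empty mailbox, await_demo:await_result(50) returns timeout. If {messenger, sent} is already queued, it returns {ok, sent}.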
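An interesting case is if the client terminates before the server links to it. This is taken care of because linking to a non-existent process causes an exit signal, {'EXIT', From, noproc}, to be generated automatically. Since the server traps exits, this arrives as a message, just as if the client had terminated immediately after the link operation.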
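The sketch below checks this behaviour for local processes, using a hypothetical module, noproc_demo. A process that traps exits calls link/1 on a pid that has already terminated, and it receives {'EXIT', Pid, noproc}. As in the server, the trapping process is what turns this signal into a message. A throwaway helper process does the trapping so that the caller is unaffected.

```erlang
-module(noproc_demo).
-export([link_to_dead/0]).

%% Links to a process that has already terminated, from a process
%% that traps exits, and returns the exit reason that arrives.
link_to_dead() ->
    Parent = self(),
    Helper = spawn(fun() ->
                           process_flag(trap_exit, true),
                           %% Stands in for a client that dies before the
                           %% server links to it.
                           {Dead, Ref} = spawn_monitor(fun() -> ok end),
                           receive {'DOWN', Ref, process, Dead, _} -> ok end,
                           link(Dead),
                           Reason = receive
                                        {'EXIT', Dead, R} -> R
                                    after 1000 -> no_exit_signal
                                    end,
                           Parent ! {self(), Reason}
                   end),
    receive
        {Helper, Result} -> Result
    end.
```

noproc_demo:link_to_dead() should return noproc.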