This document describes how one can implement ones own carrier protocol for the Erlang distribution. The distribution is normally carried by the TCP/IP protocol. What's explained here is the method for replacing TCP/IP with another protocol.
The document is a step by step explanation of the
This document was written a long time ago. Most of it is still
valid, but some things have changed since it was first written.
Most notably the driver interface. There have been some updates
to the documentation of the driver presented in this documentation,
but more could be done and are planned for the future. The
reader is encouraged to also read the
To implement a new carrier for the Erlang distribution, one must first make the protocol available to the Erlang machine, which involves writing an Erlang driver. There is no way one can use a port program, there has to be an Erlang driver. Erlang drivers can either be statically linked to the emulator, which can be an alternative when using the open source distribution of Erlang, or dynamically loaded into the Erlang machines address space, which is the only alternative if a precompiled version of Erlang is to be used.
Writing an Erlang driver is by no means easy. The driver is written as a couple of call-back functions called by the Erlang emulator when data is sent to the driver or the driver has any data available on a file descriptor. As the driver call-back routines execute in the main thread of the Erlang machine, the call-back functions can perform no blocking activity whatsoever. The call-backs should only set up file descriptors for waiting and/or read/write available data. All I/O has to be non blocking. Driver call-backs are however executed in sequence, why a global state can safely be updated within the routines.
When the driver is implemented, one would preferably write an
Erlang interface for the driver to be able to test the
functionality of the driver separately. This interface can then
be used by the distribution module which will cover the details of
the protocol from the
When the protocol is available to Erlang through a driver and an
Erlang interface module, a distribution module can be
written. The distribution module is a module with well defined
call-backs, much like a
The last step is to create boot scripts to make the protocol
implementation available at boot time. The implementation can be
debugged by starting the distribution when all of the system is
running, but in a real system the distribution should start very
early, why a boot-script and some command line parameters are
necessary. This last step also implies that the Erlang code in the
interface and distribution modules is written in such a way that
it can be run in the startup phase. Most notably there can be no
calls to the
Although Erlang drivers in general may be beyond the scope of this document, a brief introduction seems to be in place.
An Erlang driver is a native code module written in C (or
assembler) which serves as an interface for some special operating
system service. This is a general mechanism that is used
throughout the Erlang emulator for all kinds of I/O. An Erlang
driver can be dynamically linked (or loaded) to the Erlang
emulator at runtime by using the
The driver data-types and the functions available to the driver
writer are defined in the header file
When writing a driver to make a communications protocol available to Erlang, one should know just about everything worth knowing about that particular protocol. All operation has to be non blocking and all possible situations should be accounted for in the driver. A non stable driver will affect and/or crash the whole Erlang runtime system, which is seldom what's wanted.
The emulator calls the driver in the following situations:
The driver used for Erlang distribution should implement a reliable, order maintaining, variable length packet oriented protocol. All error correction, re-sending and such need to be implemented in the driver or by the underlying communications protocol. If the protocol is stream oriented (as is the case with both TCP/IP and our streamed Unix domain sockets), some mechanism for packaging is needed. We will use the simple method of having a header of four bytes containing the length of the package in a big endian 32 bit integer (as Unix domain sockets only can be used between processes on the same machine, we actually don't need to code the integer in some special endianess, but I'll do it anyway because in most situation you do need to do it. Unix domain sockets are reliable and order maintaining, so we don't need to implement resends and such in our driver.
Lets start writing our example Unix domain sockets driver by declaring prototypes and filling in a static ErlDrvEntry structure.
( 2) #include
( 3) #include
( 4) #include
( 5) #include
( 6) #include
( 7) #include
( 8) #include
( 9) #include
(10) #include
(11) #define HAVE_UIO_H
(12) #include "erl_driver.h"
(13) /*
(14) ** Interface routines
(15) */
(16) static ErlDrvData uds_start(ErlDrvPort port, char *buff);
(17) static void uds_stop(ErlDrvData handle);
(18) static void uds_command(ErlDrvData handle, char *buff, int bufflen);
(19) static void uds_input(ErlDrvData handle, ErlDrvEvent event);
(20) static void uds_output(ErlDrvData handle, ErlDrvEvent event);
(21) static void uds_finish(void);
(22) static int uds_control(ErlDrvData handle, unsigned int command,
(23) char* buf, int count, char** res, int res_size);
(24) /* The driver entry */
(25) static ErlDrvEntry uds_driver_entry = {
(26) NULL, /* init, N/A */
(27) uds_start, /* start, called when port is opened */
(28) uds_stop, /* stop, called when port is closed */
(29) uds_command, /* output, called when erlang has sent */
(30) uds_input, /* ready_input, called when input
(31) descriptor ready */
(32) uds_output, /* ready_output, called when output
(33) descriptor ready */
(34) "uds_drv", /* char *driver_name, the argument
(35) to open_port */
(36) uds_finish, /* finish, called when unloaded */
(37) NULL, /* void * that is not used (BC) */
(38) uds_control, /* control, port_control callback */
(39) NULL, /* timeout, called on timeouts */
(40) NULL, /* outputv, vector output interface */
(41) NULL, /* ready_async callback */
(42) NULL, /* flush callback */
(43) NULL, /* call callback */
(44) NULL, /* event callback */
(45) ERL_DRV_EXTENDED_MARKER, /* Extended driver interface marker */
(46) ERL_DRV_EXTENDED_MAJOR_VERSION, /* Major version number */
(47) ERL_DRV_EXTENDED_MINOR_VERSION, /* Minor version number */
(48) ERL_DRV_FLAG_SOFT_BUSY, /* Driver flags. Soft busy flag is
(49) required for distribution drivers */
(50) NULL, /* Reserved for internal use */
(51) NULL, /* process_exit callback */
(52) NULL /* stop_select callback */
(53) };]]>
On line 1 to 10 we have included the OS headers needed for our
driver. As this driver is written for Solaris, we know that the
header
The different call-back functions are declared ("forward declarations") on line 16 to 23.
The driver structure is similar for statically linked in
drivers and dynamically loaded. However some of the fields
should be left empty (i.e. initialized to NULL) in the
different types of drivers. The first field (the
As of erts version 5.5.3 the driver interface was extended with
version control and the possibility to pass capability information.
Capability flags are present at line 48. As of erts version 5.7.4
the
This driver was written before the runtime system had SMP support.
The driver will still function in the runtime system with SMP support,
but performance will suffer from lock contention on the driver lock
used for the driver. This can be alleviated by reviewing and perhaps
rewriting the code so that each instance of the driver safely can
execute in parallel. When instances safely can execute in parallel it
is safe to enable instance specific locking on the driver. This is done
by passing
Our defined call-backs thus are:
The ports implemented by this driver will operate in two major modes, which i will call the command and data modes. In command mode, only passive reading and writing (like gen_tcp:recv/gen_tcp:send) can be done, and this is the mode the port will be in during the distribution handshake. When the connection is up, the port will be switched to data mode and all data will be immediately read and passed further to the Erlang emulator. In data mode, no data arriving to the uds_command will be interpreted, but just packaged and sent out on the socket. The uds_control call-back will do the switching between those two modes.
While the
Lets define an enum for the different types of ports we have:
Lets look at the different types:
Now lets look at the state we'll need for our ports. One can note that not all fields are used for all types of ports and that one could save some space by using unions, but that would clutter the code with multiple indirections, so i simply use one struct for all types of ports, for readability.
This structure is used for all types of ports although some fields are useless for some types. The least memory consuming solution would be to arrange this structure as a union of structures, but the multiple indirections in the code to access a field in such a structure will clutter the code to much for an example.
Let's look at the fields in our structure:
lockfd - If the socket is a listen socket, we use a separate (regular) file for two purposes:
We store the creation serial number in the file. The creation is a number that should change between different instances of different Erlang emulators with the same name, so that process identifiers from one emulator won't be valid when sent to a new emulator with the same distribution name. The creation can be between 0 and 3 (two bits) and is stored in every process identifier sent to another node.
In a system with TCP based distribution, this data is
kept in the Erlang port mapper daemon
(
The distribution drivers implementation is not completely covered in this text, details about buffering and other things unrelated to driver writing are not explained. Likewise are some peculiarities of the UDS protocol not explained in detail. The chosen protocol is not important.
Prototypes for the driver call-back routines can be found in
the
The driver initialization routine is (usually) declared with a
macro to make the driver easier to port between different
operating systems (and flavours of systems). This is the only
routine that has to have a well defined name. All other
call-backs are reached through the driver structure. The macro
to use is named
The routine initializes the single global data structure and
returns a pointer to the driver entry. The routine will be
called when
The
fd = -1;
( 7) ud->lockfd = -1;
( 8) ud->creation = 0;
( 9) ud->port = port;
(10) ud->type = portTypeUnknown;
(11) ud->name = NULL;
(12) ud->buffer_size = 0;
(13) ud->buffer_pos = 0;
(14) ud->header_pos = 0;
(15) ud->buffer = NULL;
(16) ud->sent = 0;
(17) ud->received = 0;
(18) ud->partner = NULL;
(19) ud->next = first_data;
(20) first_data = ud;
(21)
(22) return((ErlDrvData) ud);
(23) } ]]>
Every data item is initialized, so that no problems will arise
when a newly created port is closed (without there being any
corresponding socket). This routine is called when
The
type == portTypeData || ud->type == portTypeIntermediate) {
( 5) DEBUGF(("Passive do_send %d",bufflen));
( 6) do_send(ud, buff + 1, bufflen - 1); /* XXX */
( 7) return;
( 8) }
( 9) if (bufflen == 0) {
(10) return;
(11) }
(12) switch (*buff) {
(13) case 'L':
(14) if (ud->type != portTypeUnknown) {
(15) driver_failure_posix(ud->port, ENOTSUP);
(16) return;
(17) }
(18) uds_command_listen(ud,buff,bufflen);
(19) return;
(20) case 'A':
(21) if (ud->type != portTypeUnknown) {
(22) driver_failure_posix(ud->port, ENOTSUP);
(23) return;
(24) }
(25) uds_command_accept(ud,buff,bufflen);
(26) return;
(27) case 'C':
(28) if (ud->type != portTypeUnknown) {
(29) driver_failure_posix(ud->port, ENOTSUP);
(30) return;
(31) }
(32) uds_command_connect(ud,buff,bufflen);
(33) return;
(34) case 'S':
(35) if (ud->type != portTypeCommand) {
(36) driver_failure_posix(ud->port, ENOTSUP);
(37) return;
(38) }
(39) do_send(ud, buff + 1, bufflen - 1);
(40) return;
(41) case 'R':
(42) if (ud->type != portTypeCommand) {
(43) driver_failure_posix(ud->port, ENOTSUP);
(44) return;
(45) }
(46) do_recv(ud);
(47) return;
(48) default:
(49) return;
(50) }
(51) } ]]>
The command routine takes three parameters; the handle
returned for the port by
If Erlang sends i.e. the list
One may wonder what is meant by "one packet of data" in the 'R' command. This driver always sends data packeted with a 4 byte header containing a big endian 32 bit integer that represents the length of the data in the packet. There is no need for different packet sizes or some kind of streamed mode, as this driver is for the distribution only. One may wonder why the header word is coded explicitly in big endian when an UDS socket is local to the host. The answer simply is that I see it as a good practice when writing a distribution driver, as distribution in practice usually cross the host boundaries.
On line 4-8 we handle the case where the port is in data or
intermediate mode, the rest of the routine handles the
different commands. We see (first on line 15) that the routine
uses the
The uds_input routine gets called when data is available on a
file descriptor previously passed to the
port, (ErlDrvEvent) ud->fd, DO_READ, 1);
( 9) } else {
(10) driver_failure_eof(ud->port);
(11) }
(12) return;
(13) }
(14) /* Got a package */
(15) if (ud->type == portTypeCommand) {
(16) ibuf[-1] = 'R'; /* There is always room for a single byte
(17) opcode before the actual buffer
(18) (where the packet header was) */
(19) driver_output(ud->port,ibuf - 1, res + 1);
(20) driver_select(ud->port, (ErlDrvEvent) ud->fd, DO_READ,0);
(21) return;
(22) } else {
(23) ibuf[-1] = DIST_MAGIC_RECV_TAG; /* XXX */
(24) driver_output(ud->port,ibuf - 1, res + 1);
(25) driver_select(ud->port, (ErlDrvEvent) ud->fd, DO_READ,1);
(26) }
(27) }
(28) } ]]>
The routine tries to read data until a packet is read or the
When the port is in data mode, all data is sent to Erlang in a
format that suits the distribution, in fact the raw data will
never reach any Erlang process, but will be
translated/interpreted by the emulator itself and then
delivered in the correct format to the correct processes. In
the current emulator version, received data should be tagged
with a single byte of 100. Thats what the macro
The
type == portTypeListener) {
( 5) UdsData *ad = ud->partner;
( 6) struct sockaddr_un peer;
( 7) int pl = sizeof(struct sockaddr_un);
( 8) int fd;
( 9) if ((fd = accept(ud->fd, (struct sockaddr *) &peer, &pl)) < 0) {
(10) if (errno != EWOULDBLOCK) {
(11) driver_failure_posix(ud->port, errno);
(12) return;
(13) }
(14) return;
(15) }
(16) SET_NONBLOCKING(fd);
(17) ad->fd = fd;
(18) ad->partner = NULL;
(19) ad->type = portTypeCommand;
(20) ud->partner = NULL;
(21) driver_select(ud->port, (ErlDrvEvent) ud->fd, DO_READ, 0);
(22) driver_output(ad->port, "Aok",3);
(23) return;
(24) }
(25) do_recv(ud);
(26) } ]]>
The important line here is the last line in the function, the
The output mechanisms are similar to the input. Lets first
look at the
port) == 0) {
(19) if ((written = writev(ud->fd, iov, 2)) == eio.size) {
(20) ud->sent += written;
(21) if (ud->type == portTypeCommand) {
(22) driver_output(ud->port, "Sok", 3);
(23) }
(24) return;
(25) } else if (written < 0) {
(26) if (errno != EWOULDBLOCK) {
(27) driver_failure_eof(ud->port);
(28) return;
(29) } else {
(30) written = 0;
(31) }
(32) } else {
(33) ud->sent += written;
(34) }
(35) /* Enqueue remaining */
(36) }
(37) driver_enqv(ud->port, &eio, written);
(38) send_out_queue(ud);
(39) } ]]>
This driver uses the
The routine builds an I/O vector containing the header bytes
and the buffer (the opcode has been removed and the buffer
length decreased by the output routine). If the queue is
empty, we'll write the data directly to the socket (or at
least try to). If any data is left, it is stored in the queue
and then we try to send the queue (line 38). An ack is sent
when the message is delivered completely (line 22). The
A short look at the
port, &vlen);
( 6) int wrote;
( 7) if (tmp == NULL) {
( 8) driver_select(ud->port, (ErlDrvEvent) ud->fd, DO_WRITE, 0);
( 9) if (ud->type == portTypeCommand) {
(10) driver_output(ud->port, "Sok", 3);
(11) }
(12) return 0;
(13) }
(14) if (vlen > IO_VECTOR_MAX) {
(15) vlen = IO_VECTOR_MAX;
(16) }
(17) if ((wrote = writev(ud->fd, tmp, vlen)) < 0) {
(18) if (errno == EWOULDBLOCK) {
(19) driver_select(ud->port, (ErlDrvEvent) ud->fd,
(20) DO_WRITE, 1);
(21) return 0;
(22) } else {
(23) driver_failure_eof(ud->port);
(24) return -1;
(25) }
(26) }
(27) driver_deq(ud->port, wrote);
(28) ud->sent += wrote;
(29) }
(30) } ]]>
What we do is simply to pick out an I/O vector from the queue
(which is the whole queue as an SysIOVec). If the I/O
vector is to long (IO_VECTOR_MAX is defined to 16), the vector
length is decreased (line 15), otherwise the
We will continue trying to write until the queue is empty or the writing would block.
The routine above are called from the
type == portTypeConnector) {
( 5) ud->type = portTypeCommand;
( 6) driver_select(ud->port, (ErlDrvEvent) ud->fd, DO_WRITE, 0);
( 7) driver_output(ud->port, "Cok",3);
( 8) return;
( 9) }
(10) send_out_queue(ud);
(11) } ]]>
The routine is simple, it first handles the fact that the output select will concern a socket in the business of connecting (and the connecting blocked). If the socket is in a connected state it simply sends the output queue, this routine is called when there is possible to write to a socket where we have an output queue, so there is no question what to do.
The driver implements a control interface, which is a
synchronous interface called when Erlang calls
The control interface gets a buffer to return its value in,
but is free to allocate its own buffer is the provided one is
to small. Here is the code for
received);
(18) put_packet_length((*res) + 5, ud->sent);
(19) put_packet_length((*res) + 9, driver_sizeq(ud->port));
(20) return 13;
(21) }
(22) case 'C':
(23) if (ud->type < portTypeCommand) {
(24) return report_control_error(res, res_size, "einval");
(25) }
(26) ud->type = portTypeCommand;
(27) driver_select(ud->port, (ErlDrvEvent) ud->fd, DO_READ, 0);
(28) ENSURE(1);
(29) **res = 0;
(30) return 1;
(31) case 'I':
(32) if (ud->type < portTypeCommand) {
(33) return report_control_error(res, res_size, "einval");
(34) }
(35) ud->type = portTypeIntermediate;
(36) driver_select(ud->port, (ErlDrvEvent) ud->fd, DO_READ, 0);
(37) ENSURE(1);
(38) **res = 0;
(39) return 1;
(40) case 'D':
(41) if (ud->type < portTypeCommand) {
(42) return report_control_error(res, res_size, "einval");
(43) }
(44) ud->type = portTypeData;
(45) do_recv(ud);
(46) ENSURE(1);
(47) **res = 0;
(48) return 1;
(49) case 'N':
(50) if (ud->type != portTypeListener) {
(51) return report_control_error(res, res_size, "einval");
(52) }
(53) ENSURE(5);
(54) (*res)[0] = 0;
(55) put_packet_length((*res) + 1, ud->fd);
(56) return 5;
(57) case 'T': /* tick */
(58) if (ud->type != portTypeData) {
(59) return report_control_error(res, res_size, "einval");
(60) }
(61) do_send(ud,"",0);
(62) ENSURE(1);
(63) **res = 0;
(64) return 1;
(65) case 'R':
(66) if (ud->type != portTypeListener) {
(67) return report_control_error(res, res_size, "einval");
(68) }
(69) ENSURE(2);
(70) (*res)[0] = 0;
(71) (*res)[1] = ud->creation;
(72) return 2;
(73) default:
(74) return report_control_error(res, res_size, "einval");
(75) }
(76) #undef ENSURE
(77) } ]]>
The macro
The rest of the driver is more or less UDS specific and not of general interest.
To test the distribution, one can use the
For net kernel to find out which distribution module to use, the
command line argument
If no epmd (TCP port mapper daemon) is used, one should also
specify the command line option
The path to the directory where the distribution modules reside
must be known at boot, which can either be achieved by
specifying
The distribution will be started at boot if all the above is
specified and an
$ erl -pa $ERL_TOP/lib/kernel/examples/uds_dist/ebin -proto_dist uds -no_epmd Erlang (BEAM) emulator version 5.0 Eshell V5.0 (abort with ^G) 1> net_kernel:start([bing,shortnames]). {ok,<0.30.0>} (bing@hador)2>
...
$ erl -pa $ERL_TOP/lib/kernel/examples/uds_dist/ebin -proto_dist uds \ -no_epmd -sname bong Erlang (BEAM) emulator version 5.0 Eshell V5.0 (abort with ^G) (bong@hador)1>
One can utilize the ERL_FLAGS environment variable to store the complicated parameters in:
$ ERL_FLAGS=-pa $ERL_TOP/lib/kernel/examples/uds_dist/ebin \ -proto_dist uds -no_epmd $ export ERL_FLAGS $ erl -sname bang Erlang (BEAM) emulator version 5.0 Eshell V5.0 (abort with ^G) (bang@hador)1>
The