1 files changed, 217 insertions, 0 deletions
diff --git a/system/doc/design_principles/distributed_applications.xml b/system/doc/design_principles/distributed_applications.xml
new file mode 100644
index 0000000000..39a24b3598
--- /dev/null
+++ b/system/doc/design_principles/distributed_applications.xml
@@ -0,0 +1,217 @@
+<?xml version="1.0" encoding="latin1" ?>
+<!DOCTYPE chapter SYSTEM "chapter.dtd">
+
+<chapter>
+  <header>
+    <copyright>
+      <year>2003</year><year>2009</year>
+      <holder>Ericsson AB. All Rights Reserved.</holder>
+    </copyright>
+    <legalnotice>
+      The contents of this file are subject to the Erlang Public License,
+      Version 1.1, (the "License"); you may not use this file except in
+      compliance with the License. You should have received a copy of the
+      Erlang Public License along with this software. If not, it can be
+      retrieved online at http://www.erlang.org/.
+    
+      Software distributed under the License is distributed on an "AS IS"
+      basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See
+      the License for the specific language governing rights and limitations
+      under the License.
+    
+    </legalnotice>
+
+    <title>Distributed Applications</title>
+    <prepared></prepared>
+    <docno></docno>
+    <date></date>
+    <rev></rev>
+    <file>distributed_applications.xml</file>
+  </header>
+
+  <section>
+    <title>Definition</title>
+    <p>In a distributed system with several Erlang nodes, there may be
+      a need to control applications in a distributed manner. If
+      the node, where a certain application is running, goes down,
+      the application should be restarted at another node.</p>
+    <p>Such an application is called a <em>distributed application</em>.
+      Note that it is the control of the application which is
+      distributed, all applications can of course be distributed in
+      the sense that they, for example, use services on other nodes.</p>
+    <p>Because a distributed application may move between nodes, some
+      addressing mechanism is required to ensure that it can be
+      addressed by other applications, regardless on which node it
+      currently executes. This issue is not addressed here, but the
+      Kernel module <c>global</c> or STDLIB module <c>pg</c> can be
+      used for this purpose.</p>
+  </section>
+
+  <section>
+    <title>Specifying Distributed Applications</title>
+    <p>Distributed applications are controlled by both the application
+      controller and a distributed application controller process,
+      <c>dist_ac</c>. Both these processes are part of the <c>kernel</c>
+      application. Therefore, distributed applications are specified by
+      configuring the <c>kernel</c> application, using the following
+      configuration parameter (see also <c>kernel(6)</c>):</p>
+    <taglist>
+      <tag><c>distributed = [{Application, [Timeout,] NodeDesc}]</c></tag>
+      <item>
+        <p>Specifies where the application <c>Application = atom()</c>
+          may execute. <c>NodeDesc = [Node | {Node,...,Node}]</c> is
+          a list of node names in priority order. The order between
+          nodes in a tuple is undefined.</p>
+        <p><c>Timeout = integer()</c> specifies how many milliseconds to
+          wait before restarting the application at another node.
+          Defaults to 0.</p>
+      </item>
+    </taglist>
+    <p>For distribution of application control to work properly,
+      the nodes where a distributed application may run must contact
+      each other and negotiate where to start the application. This is
+      done using the following <c>kernel</c> configuration parameters:</p>
+    <taglist>
+      <tag><c>sync_nodes_mandatory = [Node]</c></tag>
+      <item>Specifies which other nodes must be started (within
+       the timeout specified by <c>sync_nodes_timeout</c>.</item>
+      <tag><c>sync_nodes_optional = [Node]</c></tag>
+      <item>Specifies which other nodes can be started (within
+       the timeout specified by <c>sync_nodes_timeout</c>.</item>
+      <tag><c>sync_nodes_timeout = integer() | infinity</c></tag>
+      <item>Specifies how many milliseconds to wait for the other nodes
+       to start.</item>
+    </taglist>
+    <p>When started, the node will wait for all nodes specified by
+      <c>sync_nodes_mandatory</c> and <c>sync_nodes_optional</c> to
+      come up. When all nodes have come up, or when all mandatory nodes
+      have come up and the time specified by <c>sync_nodes_timeout</c>
+      has elapsed, all applications will be started. If not all
+      mandatory nodes have come up, the node will terminate.</p>
+    <p>Example: An application <c>myapp</c> should run at the node
+      <c>cp1@cave</c>. If this node goes down, <c>myapp</c> should
+      be restarted at <c>cp2@cave</c> or <c>cp3@cave</c>. A system
+      configuration file <c>cp1.config</c> for <c>cp1@cave</c> could
+      look like:</p>
+    <code type="none">
+[{kernel,
+  [{distributed, [{myapp, 5000, [cp1@cave, {cp2@cave, cp3@cave}]}]},
+   {sync_nodes_mandatory, [cp2@cave, cp3@cave]},
+   {sync_nodes_timeout, 5000}
+  ]
+ }
+].</code>
+    <p>The system configuration files for <c>cp2@cave</c> and
+      <c>cp3@cave</c> are identical, except for the list of mandatory
+      nodes which should be <c>[cp1@cave, cp3@cave]</c> for
+      <c>cp2@cave</c> and <c>[cp1@cave, cp2@cave]</c> for
+      <c>cp3@cave</c>.</p>
+    <note>
+      <p>All involved nodes must have the same value for
+        <c>distributed</c> and <c>sync_nodes_timeout</c>, or
+        the behaviour of the system is undefined.</p>
+    </note>
+  </section>
+
+  <section>
+    <title>Starting and Stopping Distributed Applications</title>
+    <p>When all involved (mandatory) nodes have been started,
+      the distributed application can be started by calling
+      <c>application:start(Application)</c> at <em>all of these nodes.</em></p>
+    <p>It is of course also possible to use a boot script (see
+      <seealso marker="release_structure">Releases</seealso>) which
+      automatically starts the application.</p>
+    <p>The application will be started at the first node, specified
+      by the <c>distributed</c> configuration parameter, which is up
+      and running. The application is started as usual. That is, an
+      application master is created and calls the application callback
+      function:</p>
+    <code type="none">
+Module:start(normal, StartArgs)</code>
+    <p>Example: Continuing the example from the previous section,
+      the three nodes are started, specifying the system configuration
+      file:</p>
+    <pre>
+> <input>erl -sname cp1 -config cp1</input>
+> <input>erl -sname cp2 -config cp2</input>
+> <input>erl -sname cp3 -config cp3</input></pre>
+    <p>When all nodes are up and running, <c>myapp</c> can be started.
+      This is achieved by calling <c>application:start(myapp)</c> at
+      all three nodes. It is then started at <c>cp1</c>, as shown in
+      the figure below.</p>
+    <marker id="dist1"></marker>
+    <image file="../design_principles/dist1.gif">
+      <icaption>Application myapp - Situation 1</icaption>
+    </image>
+    <p>Similarly, the application must be stopped by calling
+      <c>application:stop(Application)</c> at all involved nodes.</p>
+  </section>
+
+  <section>
+    <title>Failover</title>
+    <p>If the node where the application is running goes down,
+      the application is restarted (after the specified timeout) at
+      the first node, specified by the <c>distributed</c> configuration
+      parameter, which is up and running. This is called a
+      <em>failover</em>.</p>
+    <p>The application is started the normal way at the new node,
+      that is, by the application master calling:</p>
+    <code type="none">
+Module:start(normal, StartArgs)</code>
+    <p>Exception: If the application has the <c>start_phases</c> key
+      defined (see <seealso marker="included_applications">Included Applications</seealso>), then the application is instead started
+      by calling:</p>
+    <code type="none">
+Module:start({failover, Node}, StartArgs)</code>
+    <p>where <c>Node</c> is the terminated node.</p>
+    <p>Example: If <c>cp1</c> goes down, the system checks which one of
+      the other nodes, <c>cp2</c> or <c>cp3</c>, has the least number of
+      running applications, but waits for 5 seconds for <c>cp1</c> to
+      restart. If <c>cp1</c> does not restart and <c>cp2</c> runs fewer
+      applications than <c>cp3,</c> then <c>myapp</c> is restarted on
+      <c>cp2</c>.</p>
+    <marker id="dist2"></marker>
+    <image file="../design_principles/dist2.gif">
+      <icaption>Application myapp - Situation 2</icaption>
+    </image>
+    <p>Suppose now that <c>cp2</c> goes down as well and does not
+      restart within 5 seconds. <c>myapp</c> is now restarted on
+      <c>cp3</c>.</p>
+    <marker id="dist3"></marker>
+    <image file="../design_principles/dist3.gif">
+      <icaption>Application myapp - Situation 3</icaption>
+    </image>
+  </section>
+
+  <section>
+    <title>Takeover</title>
+    <p>If a node is started, which has higher priority according
+      to <c>distributed</c>, than the node where a distributed
+      application is currently running, the application will be
+      restarted at the new node and stopped at the old node. This is
+      called a <em>takeover</em>.</p>
+    <p>The application is started by the application master calling:</p>
+    <code type="none">
+Module:start({takeover, Node}, StartArgs)</code>
+    <p>where <c>Node</c> is the old node.</p>
+    <p>Example: If <c>myapp</c> is running at <c>cp3</c>, and if
+      <c>cp2</c> now restarts, it will not restart <c>myapp</c>,
+      because the order between nodes <c>cp2</c> and <c>cp3</c> is
+      undefined.</p>
+    <marker id="dist4"></marker>
+    <image file="../design_principles/dist4.gif">
+      <icaption>Application myapp - Situation 4</icaption>
+    </image>
+    <p>However, if <c>cp1</c> restarts as well, the function
+      <c>application:takeover/2</c> moves <c>myapp</c> to <c>cp1</c>,
+      because <c>cp1</c> has a higher priority than <c>cp3</c> for this
+      application.  In this case,
+      <c>Module:start({takeover, cp3@cave}, StartArgs)</c> is executed
+      at <c>cp1</c> to start the application.</p>
+    <marker id="dist5"></marker>
+    <image file="../design_principles/dist5.gif">
+      <icaption>Application myapp - Situation 5</icaption>
+    </image>
+  </section>
+</chapter>
+