aboutsummaryrefslogtreecommitdiffstats
path: root/lib/kernel/doc/src/heart.xml
blob: 2826d3d00a57b854938db9b95d96c59fc1237417 (plain) (blame)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
<?xml version="1.0" encoding="latin1" ?>
<!DOCTYPE erlref SYSTEM "erlref.dtd">

<erlref>
  <header>
    <copyright>
      <year>1996</year><year>2011</year>
      <holder>Ericsson AB. All Rights Reserved.</holder>
    </copyright>
    <legalnotice>
      The contents of this file are subject to the Erlang Public License,
      Version 1.1, (the "License"); you may not use this file except in
      compliance with the License. You should have received a copy of the
      Erlang Public License along with this software. If not, it can be
      retrieved online at http://www.erlang.org/.
    
      Software distributed under the License is distributed on an "AS IS"
      basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See
      the License for the specific language governing rights and limitations
      under the License.
    
    </legalnotice>

    <title>heart</title>
    <prepared>Magnus Fr&ouml;berg</prepared>
    <docno></docno>
    <date>1998-01-28</date>
    <rev>A</rev>
  </header>
  <module>heart</module>
  <modulesummary>Heartbeat Monitoring of an Erlang Runtime System</modulesummary>
  <description>
    <p>This modules contains the interface to the <c>heart</c> process.
      <c>heart</c> sends periodic heartbeats to an external port
      program, which is also named <c>heart</c>. The purpose of
      the heart port program is to check that the Erlang runtime system
      it is supervising is still running. If the port program has not
      received any heartbeats within <c>HEART_BEAT_TIMEOUT</c> seconds
      (default is 60 seconds), the system can be rebooted. Also, if
      the system is equipped with a hardware watchdog timer and is
      running Solaris, the watchdog can be used to supervise the entire
      system.</p>
    <p>An Erlang runtime system to be monitored by a heart program,
      should be started with the command line flag <c>-heart</c> (see
      also <seealso marker="erts:erl">erl(1)</seealso>). The <c>heart</c>
      process is then started automatically:</p>
    <pre>
% <input>erl -heart ...</input></pre>
    <p>If the system should be rebooted because of missing heart-beats,
      or a terminated Erlang runtime system, the environment variable
      <c>HEART_COMMAND</c> has to be set before the system is started.
      If this variable is not set, a warning text will be printed but
      the system will not reboot. However, if the hardware watchdog is
      used, it will trigger a reboot <c>HEART_BEAT_BOOT_DELAY</c>
      seconds later nevertheless (default is 60).</p>
    <p>To reboot on the WINDOWS platform <c>HEART_COMMAND</c> can be
      set to <c>heart -shutdown</c> (included in the Erlang delivery)
      or of course to any other suitable program which can activate a
      reboot.</p>
    <p>The hardware watchdog will not be started under Solaris if
      the environment variable <c>HW_WD_DISABLE</c> is set.</p>
    <p>The <c>HEART_BEAT_TIMEOUT</c> and <c>HEART_BEAT_BOOT_DELAY</c>
      environment variables can be used to configure the heart timeouts,
      they can be set in the operating system shell before Erlang is
      started or be specified at the command line:</p>
    <pre>
% <input>erl -heart -env HEART_BEAT_TIMEOUT 30 ...</input></pre>
    <p>The value (in seconds) must be in the range 10 &lt; X &lt;= 65535.</p>
    <p>It should be noted that if the system clock is adjusted with
      more than <c>HEART_BEAT_TIMEOUT</c> seconds, <c>heart</c> will
      timeout and try to reboot the system. This can happen, for
      example, if the system clock is adjusted automatically by use of
      NTP (Network Time Protocol).</p>

  <p> If a crash occurs, an <c><![CDATA[erl_crash.dump]]></c> will <em>not</em> be written
	  unless the environment variable <c><![CDATA[ERL_CRASH_DUMP_SECONDS]]></c> is set.
  </p>

    <pre>
% <input>erl -heart -env ERL_CRASH_DUMP_SECONDS 10 ...</input></pre>
  <p>
	  Furthermore, <c><![CDATA[ERL_CRASH_DUMP_SECONDS]]></c> has the following behaviour on
	  <c>heart</c>:
  </p>
  <taglist>
	  <tag><c><![CDATA[ERL_CRASH_DUMP_SECONDS=0]]></c></tag>
	  <item><p>
			  Suppresses the writing a crash dump file entirely,
			  thus rebooting the runtime system immediately.
			  This is the same as not setting the environment variable.
		  </p>
	  </item>
	  <tag><c><![CDATA[ERL_CRASH_DUMP_SECONDS=-1]]></c></tag>
	  <item><p> Setting the environment variable to a negative value will not reboot
			  the runtime system until the crash dump file has been completly written.
		  </p>
	  </item>
	  <tag><c><![CDATA[ERL_CRASH_DUMP_SECONDS=S]]></c></tag>
	  <item><p>
			  Heart will wait for <c>S</c> seconds to let the crash dump file be written.
			  After <c>S</c> seconds <c>heart</c> will reboot the runtime system regardless of
			  the crash dump file has been written or not.
		  </p>
	  </item>
  </taglist>

    <p>In the following descriptions, all function fails with reason
      <c>badarg</c> if <c>heart</c> is not started.</p>
  </description>
  <funcs>
    <func>
      <name name="set_cmd" arity="1"/>
      <fsummary>Set a temporary reboot command</fsummary>
      <desc>
        <p>Sets a temporary reboot command. This command is used if 
          a <c>HEART_COMMAND</c> other than the one specified with
          the environment variable should be used in order to reboot
          the system. The new Erlang runtime system will (if it
          misbehaves) use the environment variable
          <c>HEART_COMMAND</c> to reboot.</p>
        <p>Limitations: The length of the <c><anno>Cmd</anno></c> command string
          must be less than 2047 characters.</p>
      </desc>
    </func>
    <func>
      <name name="clear_cmd" arity="0"/>
      <fsummary>Clear the temporary boot command</fsummary>
      <desc>
        <p>Clears the temporary boot command. If the system terminates,
          the normal <c>HEART_COMMAND</c> is used to reboot.</p>
      </desc>
    </func>
    <func>
      <name name="get_cmd" arity="0"/>
      <fsummary>Get the temporary reboot command</fsummary>
      <desc>
        <p>Get the temporary reboot command. If the command is cleared,
          the empty string will be returned.</p>
      </desc>
    </func>
  </funcs>
</erlref>