<?xml version="1.0" encoding="latin1" ?>
<!DOCTYPE chapter SYSTEM "chapter.dtd">

<chapter>
  <header>
    <copyright>
      <year>1999</year><year>2010</year>
      <holder>Ericsson AB. All Rights Reserved.</holder>
    </copyright>
    <legalnotice>
      The contents of this file are subject to the Erlang Public License,
      Version 1.1, (the "License"); you may not use this file except in
      compliance with the License. You should have received a copy of the
      Erlang Public License along with this software. If not, it can be
      retrieved online at http://www.erlang.org/.

      Software distributed under the License is distributed on an "AS IS"
      basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See
      the License for the specific language governing rights and limitations
      under the License.

    </legalnotice>

    <title>How to interpret the Erlang crash dumps</title>
    <prepared>Patrik Nyblom</prepared>
    <responsible></responsible>
    <docno></docno>
    <approved></approved>
    <checked></checked>
    <date>1999-11-11</date>
    <rev>PA1</rev>
    <file>crash_dump.xml</file>
  </header>
  <p>This document describes the <c><![CDATA[erl_crash.dump]]></c> file generated
    upon abnormal exit of the Erlang runtime system.</p>
  <p><em>Important:</em> For OTP release R9C the Erlang crash dump has
    had a major facelift. This means that the information in this
    document will not be directly applicable for older dumps. However,
    if you use the Crashdump Viewer tool on older dumps, the crash
    dumps are translated into a format similar to this.</p>
  <p>The system writes the crash dump to the current directory of
    the emulator, or to the file named by the environment variable
    ERL_CRASH_DUMP if that variable is set (how it is set depends on
    the operating system). For a crash dump to be written, a
    writable file system must be mounted.</p>
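  <p>As a rough illustration of the rule above, the following sketch
    computes where the current node would write a dump. The module and
    function names are hypothetical, and the sketch assumes the default
    file name <c><![CDATA[erl_crash.dump]]></c> in the current directory
    when the variable is unset:</p>
  <code type="none"><![CDATA[
-module(dump_location).
-export([crash_dump_path/0]).

%% Where this node would write its crash dump: the value of
%% ERL_CRASH_DUMP if it is set, otherwise erl_crash.dump in the
%% current working directory (assumed default).
crash_dump_path() ->
    case os:getenv("ERL_CRASH_DUMP") of
        false ->
            {ok, Cwd} = file:get_cwd(),
            filename:join(Cwd, "erl_crash.dump");
        Path ->
            Path
    end.
]]></code>
  <p>The result reflects the working directory at the time of the call,
    which may differ from the directory in effect when a crash occurs.</p>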
  <p>Crash dumps are written mainly for one of two reasons: either the
    built-in function <c><![CDATA[erlang:halt/1]]></c> is called explicitly with a
    string argument from running Erlang code, or the runtime
    system has detected an error that cannot be handled. Most often
    the system cannot handle the error because of an external
    limitation, such as running out of memory. A
    crash dump due to an internal error may be caused by the system
    reaching limits in the emulator itself (such as the number of atoms
    in the system, or too many simultaneous ETS tables). Usually the
    emulator or the operating system can be reconfigured to avoid the
    crash, which is why interpreting the crash dump correctly is
    important.</p>
  <p>The Erlang crash dump is a readable text file, but it can be
    hard to read. The Crashdump Viewer tool in the
    <c><![CDATA[observer]]></c> application simplifies the task. It is an
    HTML-based tool for browsing Erlang crash dumps.</p>

  <section>
    <marker id="general_info"></marker>
    <title>General information</title>
    <p>The first part of the dump shows the creation time for the dump,
      a slogan indicating the reason for the dump, the system version
      of the node from which the dump originates, the compile time of
      the emulator running the originating node, and the number of
      atoms in the atom table.
      </p>

    <section>
      <title>Reasons for crash dumps (slogan)</title>
      <p>The reason for the dump is noted in the beginning of the file
        as <em>Slogan: &lt;reason&gt;</em> (the word "slogan" has historical
        roots). If the system is halted by the BIF
        <c><![CDATA[erlang:halt/1]]></c>, the slogan is the string parameter
        passed to the BIF, otherwise it is a description generated by
        the emulator or the (Erlang) kernel. Normally the message
        should be enough to understand the problem, but nevertheless
        some messages are described here. Note however that the
        suggested reasons for the crash are <em>only suggestions</em>. The exact reasons for the errors may vary
        depending on the local applications and the underlying
        operating system.</p>
      <list type="bulleted">
        <item>"<em>&lt;A&gt;</em>: Cannot allocate <em>&lt;N&gt;</em>
         bytes of memory (of type "<em>&lt;T&gt;</em>")." - The system
         has run out of memory. &lt;A&gt; is the allocator that failed
         to allocate memory, &lt;N&gt; is the number of bytes that
         &lt;A&gt; tried to allocate, and &lt;T&gt; is the memory block
         type that the memory was needed for. The most common case is
         that a process stores huge amounts of data. In this case
         &lt;T&gt; is most often <c><![CDATA[heap]]></c>, <c><![CDATA[old_heap]]></c>,
        <c><![CDATA[heap_frag]]></c>, or <c><![CDATA[binary]]></c>. For more information on
         allocators see
        <seealso marker="erts_alloc">erts_alloc(3)</seealso>.</item>
        <item>"<em>&lt;A&gt;</em>: Cannot reallocate <em>&lt;N&gt;</em>
         bytes of memory (of type "<em>&lt;T&gt;</em>")." - Same as
         above with the exception that memory was being reallocated
         instead of being allocated when the system ran out of memory.</item>
        <item>"Unexpected op code <em>N</em>" - Error in compiled
         code, <c><![CDATA[beam]]></c> file damaged or error in the compiler.</item>
        <item>"Module <em>Name</em> undefined" <c><![CDATA[|]]></c> "Function
        <em>Name</em> undefined" <c><![CDATA[|]]></c> "No function
        <em>Name</em>:<em>Name</em>/1" <c><![CDATA[|]]></c> "No function
        <em>Name</em>:start/2" - The kernel/stdlib applications are
         damaged or the start script is damaged.</item>
        <item>"Driver_select called with too large file descriptor
        <c><![CDATA[N]]></c>" - The number of file descriptors for sockets
         exceeds 1024 (Unix only). The limit on file descriptors in
         some Unix flavors can be set to over 1024, but only 1024
         sockets/pipes can be used simultaneously by Erlang (due to
         limitations in the Unix <c><![CDATA[select]]></c> call). The number of
         open regular files is not affected by this.</item>
        <item>"Received SIGUSR1" - The SIGUSR1 signal was sent to the
         Erlang machine (Unix only).</item>
        <item>"Kernel pid terminated (<em>Who</em>)
         (<em>Exit-reason</em>)" - The kernel supervisor has detected
         a failure, usually that the <c><![CDATA[application_controller]]></c>
         has shut down (<c><![CDATA[Who]]></c> = <c><![CDATA[application_controller]]></c>,
        <c><![CDATA[Exit-reason]]></c> = <c><![CDATA[shutdown]]></c>).  The application controller
         may have shut down for a number of reasons, the most usual
         being that the node name of the distributed Erlang node is
         already in use. A complete supervisor tree "crash" (i.e.,
         the top supervisors have exited) will give about the same
         result. This message comes from the Erlang code and not from
         the virtual machine itself. It is always due to some kind of
         failure in an application, either within OTP or a
         "user-written" one. Looking at the error log for your
         application is probably the first step to take.</item>
        <item>"Init terminating in do_boot ()" - The primitive Erlang boot
         sequence was terminated, most probably because the boot
         script has errors or cannot be read. This is usually a
         configuration error - the system may have been started with
         a faulty <c><![CDATA[-boot]]></c> parameter or with a boot script from
         the wrong version of OTP.</item>
        <item>"Could not start kernel pid (<em>Who</em>) ()" - One of the
         kernel processes could not start. This is probably due to
         faulty arguments (like errors in a <c><![CDATA[-config]]></c> argument)
         or faulty configuration files. Check that all files are in
         their correct location and that the configuration files (if
         any) are not damaged. Usually there are also messages
         written to the controlling terminal and/or the error log
         explaining what's wrong.</item>
      </list>
      <p>Errors other than the ones mentioned above may occur, as the
        <c><![CDATA[erlang:halt/1]]></c> BIF may generate any message. If the
        message is not generated by the BIF and does not occur in the
        list above, it may be due to an error in the emulator. There
        may however be unusual messages, not mentioned here, that
        are still connected to an application failure. The dump contains
        much more information, so a more thorough reading of the
        crash dump may reveal the crash reason. The size of processes,
        the number of ETS tables, and the Erlang data on each process
        stack can be useful for tracking down the problem.</p>
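      <p>Since the dump is plain text, the slogan can also be extracted
        programmatically. The following is a minimal sketch, assuming the
        slogan appears on a line of its own beginning with
        <c><![CDATA[Slogan: ]]></c> as described above. The module and function
        names are hypothetical:</p>
      <code type="none"><![CDATA[
-module(dump_slogan).
-export([slogan/1, slogan_from_binary/1]).

%% Read a crash dump file and return {ok, Slogan} or not_found.
slogan(File) ->
    {ok, Bin} = file:read_file(File),
    slogan_from_binary(Bin).

%% Same as slogan/1, but works on dump contents already in memory.
slogan_from_binary(Bin) ->
    Lines = binary:split(Bin, <<"\n">>, [global]),
    find_slogan(Lines).

%% The first line starting with "Slogan: " wins.
find_slogan([<<"Slogan: ", Reason/binary>> | _]) -> {ok, Reason};
find_slogan([_ | Rest]) -> find_slogan(Rest);
find_slogan([]) -> not_found.
]]></code>
      <p>The sketch reads the whole file into memory, which may be
        impractical for very large dumps.</p>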
    </section>

    <section>
      <title>Number of atoms</title>
      <p>The number of atoms in the system at the time of the crash is
        shown as <em>Atoms: &lt;number&gt;</em>. A few tens of thousands of atoms is
        perfectly normal, but more could indicate that the BIF
        <c><![CDATA[erlang:list_to_atom/1]]></c> is used to dynamically generate a
        lot of <em>different</em> atoms, which is never a good idea.</p>
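      <p>One common way to avoid growing the atom table from external
        input is to convert only strings that already correspond to
        existing atoms. The following sketch uses the BIF
        <c><![CDATA[list_to_existing_atom/1]]></c>. The module and function names
        are hypothetical, and this is one possible approach rather than a
        prescribed one:</p>
      <code type="none"><![CDATA[
-module(atom_safety).
-export([to_known_atom/1]).

%% Convert a string to an atom only if that atom already exists,
%% so that arbitrary input cannot create new atoms.
to_known_atom(String) when is_list(String) ->
    try list_to_existing_atom(String) of
        Atom -> {ok, Atom}
    catch
        error:badarg -> error
    end.
]]></code>
      <p>For example, <c><![CDATA[atom_safety:to_known_atom("ok")]]></c> returns
        <c><![CDATA[{ok,ok}]]></c>, because the atom <c><![CDATA[ok]]></c> already
        exists.</p>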
    </section>
  </section>

  <section>
    <marker id="memory"></marker>
    <title>Memory information</title>
    <p>Under the tag <em>=memory</em> you will find information similar
      to what you can obtain on a living node with
      <seealso marker="erts:erlang#erlang:memory/0">erlang:memory()</seealso>.</p>
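    <p>For comparison with the figures in a dump, the corresponding
      values on a living node can be listed as follows. This is a small
      sketch with hypothetical module and function names. It converts the
      byte counts returned by <c><![CDATA[erlang:memory/0]]></c> to kilobytes:</p>
    <code type="none"><![CDATA[
-module(mem_summary).
-export([in_kilobytes/0]).

%% erlang:memory/0 returns a list of {Type, SizeInBytes} tuples.
%% Integer division rounds each value down to whole kilobytes.
in_kilobytes() ->
    [{Type, Bytes div 1024} || {Type, Bytes} <- erlang:memory()].
]]></code>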
  </section>

  <section>
    <marker id="internal_tables"></marker>
    <title>Internal table information</title>
    <p>The tags <em>=hash_table:&lt;table_name&gt;</em> and
      <em>=index_table:&lt;table_name&gt;</em> present internal
      tables. These are mostly of interest for runtime system
      developers.</p>
  </section>

  <section>
    <marker id="allocated_areas"></marker>
    <title>Allocated areas</title>
    <p>Under the tag <em>=allocated_areas</em> you will find information
      similar to what you can obtain on a living node with
      <seealso marker="erts:erlang#system_info_allocated_areas">erlang:system_info(allocated_areas)</seealso>.</p>
  </section>

  <section>
    <marker id="allocator"></marker>
    <title>Allocator</title>
    <p>Under the tag <em>=allocator:&lt;A&gt;</em> you will find
      various information about allocator &lt;A&gt;. The information
      is similar to what you can obtain on a living node with
      <seealso marker="erts:erlang#system_info_allocator_tuple">erlang:system_info({allocator, &lt;A&gt;})</seealso>.
      For more information see the documentation of
      <seealso marker="erts:erlang#system_info_allocator_tuple">erlang:system_info({allocator, &lt;A&gt;})</seealso>,
      and the
      <seealso marker="erts_alloc">erts_alloc(3)</seealso>
      documentation.</p>
  </section>

  <section>
    <marker id="processes"></marker>
    <title>Process information</title>
    <p>The Erlang crash dump contains a listing of each living Erlang
      process in the system.</p>
    <p>The following fields can exist for a process:</p>
    <taglist>
      <tag><em>=proc:&lt;pid&gt;</em></tag>
      <item>Heading, states the process identifier</item>
      <tag><em>State</em></tag>
      <item>
        <p>The state of the process. This can be one of the following:</p>
        <list type="bulleted">
          <item><em>Scheduled</em> - The process was scheduled to run
           but was not currently running ("in the run queue").</item>
          <item><em>Waiting</em> - The process was waiting for
           something (in <c><![CDATA[receive]]></c>).</item>
          <item><em>Running</em> - The process was currently
           running. If the BIF <c><![CDATA[erlang:halt/1]]></c> was called, this was
           the process calling it.</item>
          <item><em>Exiting</em> - The process was on its way to
           exit.</item>
          <item><em>Garbing</em> - The process was garbage collecting
           when the crash dump was written. In this unlucky case, the
           information available for this process is limited.</item>
          <item><em>Suspended</em> - The process was suspended, either
           by the BIF <c><![CDATA[erlang:suspend_process/1]]></c> or because it is
           trying to write to a busy port.</item>
        </list>
      </item>
      <tag><em>Registered name</em></tag>
      <item>The registered name of the process, if any.</item>
      <tag><em>Spawned as</em></tag>
      <item>The entry point of the process, i.e., what function was
       referenced in the <c><![CDATA[spawn]]></c> or <c><![CDATA[spawn_link]]></c> call that
       started the process.</item>
      <tag><em>Last scheduled in for | Current call</em></tag>
      <item>The current function of the process. These fields will not
       always exist.</item>
      <tag><em>Spawned by</em></tag>
      <item>The parent of the process, i.e. the process which executed
      <c><![CDATA[spawn]]></c> or <c><![CDATA[spawn_link]]></c>.</item>
      <tag><em>Started</em></tag>
      <item>The date and time when the process was started.</item>
      <tag><em>Message queue length</em></tag>
      <item>The number of messages in the process' message queue.</item>
      <tag><em>Number of heap fragments</em></tag>
      <item>The number of allocated heap fragments.</item>
      <tag><em>Heap fragment data</em></tag>
      <item>Size of fragmented heap data. This is data either created by
       messages being sent to the process or by the Erlang BIFs. This
       amount depends on so many things that this field is utterly
       uninteresting.</item>
      <tag><em>Link list</em></tag>
      <item>Process IDs of processes linked to this one. May also contain
       ports. If process monitoring is used, this field also tells in
       which direction the monitoring is in effect, i.e., a link
       being "to" a process tells you that the "current" process was
       monitoring the other and a link "from" a process tells you
       that the other process was monitoring the current one.</item>
      <tag><em>Reductions</em></tag>
      <item>The number of reductions consumed by the process.</item>
      <tag><em>Stack+heap</em></tag>
      <item>The size of the stack and heap (they share memory segment)</item>
      <tag><em>OldHeap</em></tag>
      <item>The size of the "old heap". The Erlang virtual machine uses
       generational garbage collection with two generations. There is
       one heap for new data items and one for the data that have
       survived two garbage collections. The assumption (which is
       almost always correct) is that data that survive two garbage
       collections will live for a long time, so they can be "tenured"
       to a heap that is garbage collected less often. This is a
       common technique in virtual machines. The sum of the
       heaps and stack constitutes most of the process's
       allocated memory.</item>
      <tag><em>Heap unused, OldHeap unused</em></tag>
      <item>The amount of unused memory on each heap. This information is
       usually useless.</item>
      <tag><em>Stack</em></tag>
      <item>If the system uses shared heap, the fields
      <em>Stack+heap</em>, <em>OldHeap</em>, <em>Heap unused</em>
       and <em>OldHeap unused</em> do not exist. Instead this field
       presents the size of the process' stack.</item>
      <tag><em>Program counter</em></tag>
      <item>The current instruction pointer. This is only interesting for
       runtime system developers. The function into which the program
       counter points is the current function of the process.</item>
      <tag><em>CP</em></tag>
      <item>The continuation pointer, i.e. the return address for the
       current call. Usually useless for other than runtime system
       developers. This may be followed by the function into which
       the CP points, which is the function calling the current
       function.</item>
      <tag><em>Arity</em></tag>
      <item>The number of live argument registers. The argument registers,
       if any are live, will follow. These may contain the arguments
       of the function if they are not yet moved to the stack.</item>
    </taglist>
    <p>See also the section about <seealso marker="#proc_data">process data</seealso>.</p>
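    <p>Long message queues are a frequent cause of memory problems, so
      it can be useful to list the queue length of every process in a
      dump. The following is a minimal sketch, not a complete parser. It
      assumes that each field appears on its own line as
      <c><![CDATA[Field name: value]]></c>, and that any line beginning with
      <c><![CDATA[=]]></c> starts a new section. The module and function
      names are hypothetical:</p>
    <code type="none"><![CDATA[
-module(dump_procs).
-export([queue_lengths/1]).

%% Returns [{Pid, Length}] for every =proc: section found in the
%% dump contents given as a binary.
queue_lengths(Bin) ->
    Lines = binary:split(Bin, <<"\n">>, [global]),
    scan(Lines, undefined, []).

%% A =proc:<pid> heading starts a new process section.
scan([<<"=proc:", Pid/binary>> | Rest], _Cur, Acc) ->
    scan(Rest, Pid, Acc);
%% Any other tag ends the current process section.
scan([<<"=", _/binary>> | Rest], _Cur, Acc) ->
    scan(Rest, undefined, Acc);
%% Record the queue length only while inside a process section.
scan([<<"Message queue length: ", N/binary>> | Rest], Pid, Acc)
  when Pid =/= undefined ->
    scan(Rest, Pid, [{Pid, binary_to_integer(N)} | Acc]);
scan([_ | Rest], Cur, Acc) ->
    scan(Rest, Cur, Acc);
scan([], _Cur, Acc) ->
    lists:reverse(Acc).
]]></code>
    <p>The sketch expects the value to be a bare integer. A dump with
      trailing whitespace or carriage returns on the line would need
      additional trimming.</p>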
  </section>

  <section>
    <marker id="ports"></marker>
    <title>Port information</title>
    <p>This section lists the open ports, their owners, any linked
      processes, and the name of their driver or external process.</p>
  </section>

  <section>
    <marker id="ets_tables"></marker>
    <title>ETS tables</title>
    <p>This section contains information about all the ETS tables in
      the system. The following fields are interesting for each table:</p>
    <taglist>
      <tag><em>=ets:&lt;owner&gt;</em></tag>
      <item>Heading, states the owner of the table (a process identifier)</item>
      <tag><em>Table</em></tag>
      <item>The identifier for the table. If the table is a
      <c><![CDATA[named_table]]></c>, this is the name.</item>
      <tag><em>Name</em></tag>
      <item>The name of the table, regardless of whether it is a
      <c><![CDATA[named_table]]></c> or not.</item>
      <tag><em>Buckets</em></tag>
      <item>This occurs if the table is a hash table, i.e. if it is not an
      <c><![CDATA[ordered_set]]></c>.</item>
      <tag><em>Ordered set (AVL tree), Elements</em></tag>
      <item>This occurs only if the table is an <c><![CDATA[ordered_set]]></c>. (The
       number of elements is the same as the number of objects in the
       table.)</item>
      <tag><em>Objects</em></tag>
      <item>The number of objects in the table</item>
      <tag><em>Words</em></tag>
      <item>The number of words allocated to data in the table (a word
       is 4 bytes on a 32-bit system and 8 bytes on a 64-bit system).</item>
    </taglist>
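    <p>To relate the <em>Words</em> field to bytes, the word count is
      multiplied by the word size. The following sketch does this for a
      table on a living node. The module and function names are
      hypothetical, and the word size used is that of the node running
      the code, which is assumed to match the node that produced the
      dump:</p>
    <code type="none"><![CDATA[
-module(ets_words).
-export([table_bytes/1]).

%% Approximate memory used by a live ETS table, in bytes.
%% ets:info(Tab, memory) reports the size in words, and
%% erlang:system_info(wordsize) gives the bytes per word.
table_bytes(Tab) ->
    ets:info(Tab, memory) * erlang:system_info(wordsize).
]]></code>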
  </section>

  <section>
    <marker id="timers"></marker>
    <title>Timers</title>
    <p>This section contains information about all the timers started
      with the BIFs <c><![CDATA[erlang:start_timer/3]]></c> and
      <c><![CDATA[erlang:send_after/3]]></c>. The following fields exist for each
      timer:</p>
    <taglist>
      <tag><em>=timer:&lt;owner&gt;</em></tag>
      <item>Heading, states the owner of the timer (a process identifier)
       i.e. the process to receive the message when the timer
       expires.</item>
      <tag><em>Message</em></tag>
      <item>The message to be sent.</item>
      <tag><em>Time left</em></tag>
      <item>Number of milliseconds left until the message would have been
       sent.</item>
    </taglist>
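    <p>As an illustration of how such a timer comes into existence, the
      following sketch starts a timer owned by the calling process and
      reads the time remaining, which corresponds to the <em>Time
      left</em> field. The module and function names are hypothetical:</p>
    <code type="none"><![CDATA[
-module(timer_demo).
-export([time_left/0]).

time_left() ->
    %% On expiry, the timer would send {timeout, Ref, tick} to self().
    Ref = erlang:start_timer(5000, self(), tick),
    %% Milliseconds until expiry, or false if the timer is gone.
    Left = erlang:read_timer(Ref),
    %% Cancel the timer so that no message is left behind.
    _ = erlang:cancel_timer(Ref),
    Left.
]]></code>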
  </section>

  <section>
    <marker id="distribution_info"></marker>
    <title>Distribution information</title>
    <p>If the Erlang node was alive, i.e., set up for communicating
      with other nodes, this section lists the connections that were
      active. The following fields can exist:</p>
    <taglist>
      <tag><em>=node:&lt;node_name&gt;</em></tag>
      <item>The name of the node</item>
      <tag><em>no_distribution</em></tag>
      <item>This will only occur if the node was not distributed.</item>
      <tag><em>=visible_node:&lt;channel&gt;</em></tag>
      <item>Heading for a visible node, i.e. an alive node with a
       connection to the node that crashed. States the channel number
       for the node.</item>
      <tag><em>=hidden_node:&lt;channel&gt;</em></tag>
      <item>Heading for a hidden node. A hidden node is the same as a
       visible node, except that it is started with the "-hidden"
       flag. States the channel number for the node.</item>
      <tag><em>=not_connected:&lt;channel&gt;</em></tag>
      <item>Heading for a node which has been connected to the crashed
       node earlier. References (i.e. process or port identifiers)
       to the not connected node existed at the time of the crash.
       States the channel number for the node.</item>
      <tag><em>Name</em></tag>
      <item>The name of the remote node.</item>
      <tag><em>Controller</em></tag>
      <item>The port which controls the communication with the remote node.</item>
      <tag><em>Creation</em></tag>
      <item>An integer (1-3) which together with the node name identifies
       a specific instance of the node.</item>
      <tag><em>Remote monitoring: &lt;local_proc&gt;  &lt;remote_proc&gt;</em></tag>
      <item>The local process was monitoring the remote process at the
       time of the crash.</item>
      <tag><em>Remotely monitored by: &lt;local_proc&gt;  &lt;remote_proc&gt;</em></tag>
      <item>The remote process was monitoring the local process at the
       time of the crash.</item>
      <tag><em>Remote link: &lt;local_proc&gt; &lt;remote_proc&gt;</em></tag>
      <item>A link existed between the local process and the remote
       process at the time of the crash.</item>
    </taglist>
  </section>

  <section>
    <marker id="loaded_modules"></marker>
    <title>Loaded module information</title>
    <p>This section contains information about all loaded modules.
      First, the memory usage by loaded code is summarized. There is
      one field for "Current code", which is the latest
      version of each module. There is also a field for "Old
      code", which is code for which a newer version exists in the
      system, but the old version is not yet purged. The memory usage
      is in bytes.</p>
    <p>All loaded modules are then listed. The following fields exist:</p>
    <taglist>
      <tag><em>=mod:&lt;module_name&gt;</em></tag>
      <item>Heading, and the name of the module.</item>
      <tag><em>Current size</em></tag>
      <item>Memory usage for the loaded code in bytes</item>
      <tag><em>Old size</em></tag>
      <item>Memory usage for the old code, if any.</item>
      <tag><em>Current attributes</em></tag>
      <item>Module attributes for the current code. This field is decoded
       when looked at by the Crashdump Viewer tool.</item>
      <tag><em>Old attributes</em></tag>
      <item>Module attributes for the old code, if any. This field is
       decoded when looked at by the Crashdump Viewer tool.</item>
      <tag><em>Current compilation info</em></tag>
      <item>Compilation information (options) for the current code. This
       field is decoded when looked at by the Crashdump Viewer tool.</item>
      <tag><em>Old compilation info</em></tag>
      <item>Compilation information (options) for the old code, if
       any. This field is decoded when looked at by the Crashdump
       Viewer tool.</item>
    </taglist>
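    <p>On a living node, the attributes and compilation information of
      the current code of a module can be inspected through its
      <c><![CDATA[module_info/1]]></c> function. The following sketch, with
      hypothetical module and function names, collects both:</p>
    <code type="none"><![CDATA[
-module(mod_details).
-export([details/1]).

%% Attributes and compilation information for the current code of
%% a loaded module, as reported by the module itself.
details(Mod) when is_atom(Mod) ->
    [{attributes, Mod:module_info(attributes)},
     {compile, Mod:module_info(compile)}].
]]></code>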
  </section>

  <section>
    <marker id="funs"></marker>
    <title>Fun information</title>
    <p>In this section, all funs are listed. The following fields exist
      for each fun:</p>
    <taglist>
      <tag><em>=fun</em></tag>
      <item>Heading</item>
      <tag><em>Module</em></tag>
      <item>The name of the module where the fun was defined.</item>
      <tag><em>Uniq, Index</em></tag>
      <item>Identifiers</item>
      <tag><em>Address</em></tag>
      <item>The address of the fun's code.</item>
      <tag><em>Native_address</em></tag>
      <item>The address of the fun's code when HiPE is enabled.</item>
      <tag><em>Refc</em></tag>
      <item>The number of references to the fun.</item>
    </taglist>
  </section>

  <section>
    <marker id="proc_data"></marker>
    <title>Process Data</title>
    <p>For each process there will be at least one <em>=proc_stack</em>
      and one <em>=proc_heap</em> tag followed by the raw memory
      information for the stack and heap of the process.</p>
    <p>For each process there will also be a <em>=proc_messages</em>
      tag if the process' message queue is non-empty and a
      <em>=proc_dictionary</em> tag if the process' dictionary (the
      data stored with <c><![CDATA[put/2]]></c> and read with <c><![CDATA[get/1]]></c>) is non-empty.</p>
    <p>The raw memory information can be decoded by the Crashdump
      Viewer tool. You will then be able to see the stack dump, the
      message queue (if any) and the dictionary (if any).</p>
    <p>The stack dump is a dump of the Erlang process stack. Most of
      the live data (i.e., variables currently in use) are placed on
      the stack; thus this can be quite interesting. One has to
      "guess" what's what, but as the information is symbolic,
      thorough reading of this information can be very useful. As an
      example we can find the state variable of the Erlang primitive
      loader on line <c><![CDATA[(5)]]></c> in the example below:</p>
    <code type="none"><![CDATA[
(1)  3cac44   Return addr 0x13BF58 (<terminate process normally>)
(2)  y(0)     ["/view/siri_r10_dev/clearcase/otp/erts/lib/kernel/ebin","/view/siri_r10_dev/
(3)  clearcase/otp/erts/lib/stdlib/ebin"]
(4)  y(1)     <0.1.0>
(5)  y(2)     {state,[],none,#Fun<erl_prim_loader.6.7085890>,undefined,#Fun<erl_prim_loader.7.9000327>,#Fun<erl_prim_loader.8.116480692>,#Port<0.2>,infinity,#Fun<erl_prim_loader.9.10708760>}
(6)  y(3)     infinity    ]]></code>
    <p>When interpreting the data for a process, it is helpful to know
      that anonymous function objects (funs) are given a name
      constructed from the name of the function in which they are
      created, and a number (starting with 0) indicating the number of
      that fun within that function.</p>
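    <p>The naming of funs can be observed on a living node with the BIF
      <c><![CDATA[erlang:fun_info/2]]></c>. The following sketch, with
      hypothetical module and function names, returns the module, name,
      index and uniq values of a fun created inside
      <c><![CDATA[describe/0]]></c>:</p>
    <code type="none"><![CDATA[
-module(fun_names).
-export([describe/0]).

%% Returns [{module, M}, {name, N}, {index, I}, {uniq, U}] for a fun
%% defined in this function. In compiled code the name is an atom
%% that embeds the name of the enclosing function.
describe() ->
    F = fun(X) -> X + 1 end,
    [erlang:fun_info(F, Item) || Item <- [module, name, index, uniq]].
]]></code>
    <p>The exact form of the name is generated by the compiler and
      should be treated as informational only.</p>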
  </section>

  <section>
    <marker id="atoms"></marker>
    <title>Atoms</title>
    <p>Now all the atoms in the system are written. This is only
      interesting if one suspects that dynamic generation of atoms could
      be a problem, otherwise this section can be ignored.</p>
    <p>Note that the last created atom is printed first.</p>
  </section>

  <section>
    <title>Disclaimer</title>
    <p>The format of the crash dump evolves between releases of
      OTP. Some information here may not apply to your
      version. A description such as this will never be complete; it is meant as
      an explanation of the crash dump in general and as a help
      when trying to find application errors, not as a complete
      specification.</p>
  </section>
</chapter>