Diffstat (limited to 'erts/doc/src/crash_dump.xml')
-rw-r--r-- | erts/doc/src/crash_dump.xml | 518 |
1 files changed, 518 insertions, 0 deletions
diff --git a/erts/doc/src/crash_dump.xml b/erts/doc/src/crash_dump.xml new file mode 100644 index 0000000000..5182929358 --- /dev/null +++ b/erts/doc/src/crash_dump.xml @@ -0,0 +1,518 @@ +<?xml version="1.0" encoding="latin1" ?> +<!DOCTYPE chapter SYSTEM "chapter.dtd"> + +<chapter> + <header> + <copyright> + <year>1999</year><year>2009</year> + <holder>Ericsson AB. All Rights Reserved.</holder> + </copyright> + <legalnotice> + The contents of this file are subject to the Erlang Public License, + Version 1.1, (the "License"); you may not use this file except in + compliance with the License. You should have received a copy of the + Erlang Public License along with this software. If not, it can be + retrieved online at http://www.erlang.org/. + + Software distributed under the License is distributed on an "AS IS" + basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See + the License for the specific language governing rights and limitations + under the License. + + </legalnotice> + + <title>How to interpret the Erlang crash dumps</title> + <prepared>Patrik Nyblom</prepared> + <responsible></responsible> + <docno></docno> + <approved></approved> + <checked></checked> + <date>1999-11-11</date> + <rev>PA1</rev> + <file>crash_dump.xml</file> + </header> + <p>This document describes the <c><![CDATA[erl_crash.dump]]></c> file generated + upon abnormal exit of the Erlang runtime system.</p> + <p><em>Important:</em> For OTP release R9C the Erlang crash dump has + had a major facelift. This means that the information in this + document will not be directly applicable for older dumps. However, + if you use the Crashdump Viewer tool on older dumps, the crash + dumps are translated into a format similar to this.</p> + <p>The system will write the crash dump in the current directory of + the emulator or in the file pointed out by the environment variable + (whatever that means on the current operating system) + ERL_CRASH_DUMP. 
For a crash dump to be written, there has to be a + writable file system mounted.</p> + <p>Crash dumps are written mainly for one of two reasons: either the + builtin function <c><![CDATA[erlang:halt/1]]></c> is called explicitly with a + string argument from running Erlang code, or else the runtime + system has detected an error that cannot be handled. The most + common reason that the system cannot handle the error is an + external limitation, such as running out of memory. A + crash dump due to an internal error may be caused by the system + reaching limits in the emulator itself (like the number of atoms + in the system, or too many simultaneous ets tables). Usually the + emulator or the operating system can be reconfigured to avoid the + crash, which is why interpreting the crash dump correctly is + important.</p> + <p>The Erlang crash dump is a readable text file, but it might not be + very easy to read. Using the Crashdump Viewer tool in the + <c><![CDATA[observer]]></c> application will simplify the task. This is an + HTML-based tool for browsing Erlang crash dumps.</p> + + <section> + <marker id="general_info"></marker> + <title>General information</title> + <p>The first part of the dump shows the creation time for the dump, + a slogan indicating the reason for the dump, the system version + of the node from which the dump originates, the compile time of + the emulator running the originating node and the number of + atoms in the atom table. + </p> + + <section> + <title>Reasons for crash dumps (slogan)</title> + <p>The reason for the dump is noted at the beginning of the file + as <em>Slogan: <reason></em> (the word "slogan" has historical + roots). If the system is halted by the BIF + <c><![CDATA[erlang:halt/1]]></c>, the slogan is the string parameter + passed to the BIF, otherwise it is a description generated by + the emulator or the (Erlang) kernel. 
Normally the message + should be enough to understand the problem, but nevertheless + some messages are described here. Note however that the + suggested reasons for the crash are <em>only suggestions</em>. The exact reasons for the errors may vary + depending on the local applications and the underlying + operating system.</p> + <list type="bulleted"> + <item>"<em><A></em>: Cannot allocate <em><N></em> + bytes of memory (of type "<em><T></em>")." - The system + has run out of memory. <A> is the allocator that failed + to allocate memory, <N> is the number of bytes that + <A> tried to allocate, and <T> is the memory block + type that the memory was needed for. The most common case is + that a process stores huge amounts of data. In this case + <T> is most often <c><![CDATA[heap]]></c>, <c><![CDATA[old_heap]]></c>, + <c><![CDATA[heap_frag]]></c>, or <c><![CDATA[binary]]></c>. For more information on + allocators see + <seealso marker="erts_alloc">erts_alloc(3)</seealso>.</item> + <item>"<em><A></em>: Cannot reallocate <em><N></em> + bytes of memory (of type "<em><T></em>")." - Same as + above with the exception that memory was being reallocated + instead of being allocated when the system ran out of memory.</item> + <item>"Unexpected op code <em>N</em>" - Error in compiled + code, <c><![CDATA[beam]]></c> file damaged or error in the compiler.</item> + <item>"Module <em>Name</em> undefined" <c><![CDATA[|]]></c> "Function + <em>Name</em> undefined" <c><![CDATA[|]]></c> "No function + <em>Name</em>:<em>Name</em>/1" <c><![CDATA[|]]></c> "No function + <em>Name</em>:start/2" - The kernel/stdlib applications are + damaged or the start script is damaged.</item> + <item>"Driver_select called with too large file descriptor + <c><![CDATA[N]]></c>" - The number of file descriptors for sockets + exceeds 1024 (Unix only). 
The limit on file descriptors in + some Unix flavors can be set to over 1024, but only 1024 + sockets/pipes can be used simultaneously by Erlang (due to + limitations in the Unix <c><![CDATA[select]]></c> call). The number of + open regular files is not affected by this.</item> + <item>"Received SIGUSR1" - The SIGUSR1 signal was sent to the + Erlang machine (Unix only).</item> + <item>"Kernel pid terminated (<em>Who</em>) + (<em>Exit-reason</em>)" - The kernel supervisor has detected + a failure, usually that the <c><![CDATA[application_controller]]></c> + has shut down (<c><![CDATA[Who]]></c> = <c><![CDATA[application_controller]]></c>, + <c><![CDATA[Exit-reason]]></c> = <c><![CDATA[shutdown]]></c>). The application controller + may have shut down for a number of reasons, the most common + being that the node name of the distributed Erlang node is + already in use. A complete supervisor tree "crash" (i.e., + the top supervisors have exited) will give about the same + result. This message comes from the Erlang code and not from + the virtual machine itself. It is always due to some kind of + failure in an application, either within OTP or a + "user-written" one. Looking at the error log for your + application is probably the first step to take.</item> + <item>"Init terminating in do_boot ()" - The primitive Erlang boot + sequence was terminated, most probably because the boot + script has errors or cannot be read. This is usually a + configuration error - the system may have been started with + a faulty <c><![CDATA[-boot]]></c> parameter or with a boot script from + the wrong version of OTP.</item> + <item>"Could not start kernel pid (<em>Who</em>) ()" - One of the + kernel processes could not start. This is probably due to + faulty arguments (like errors in a <c><![CDATA[-config]]></c> argument) + or faulty configuration files. Check that all files are in + their correct location and that the configuration files (if + any) are not damaged. 
Usually there are also messages + written to the controlling terminal and/or the error log + explaining what's wrong.</item> + </list> + <p>Errors other than the ones mentioned above may occur, as the + <c><![CDATA[erlang:halt/1]]></c> BIF may generate any message. If the + message is not generated by the BIF and does not occur in the + list above, it may be due to an error in the emulator. There + may however be unusual messages, not mentioned here, that + are still connected to an application failure. There is a lot + more information available, so more thorough reading of the + crash dump may reveal the crash reason. The size of processes, + the number of ets tables and the Erlang data on each process + stack can be useful for tracking down the problem.</p> + </section> + + <section> + <title>Number of atoms</title> + <p>The number of atoms in the system at the time of the crash is + shown as <em>Atoms: <number></em>. Some tens of thousands of atoms is + perfectly normal, but more could indicate that the BIF + <c><![CDATA[erlang:list_to_atom/1]]></c> is used to dynamically generate a + lot of <em>different</em> atoms, which is never a good idea + since atoms are never garbage collected.</p> + </section> + </section> + + <section> + <marker id="memory"></marker> + <title>Memory information</title> + <p>Under the tag <em>=memory</em> you will find information similar + to what you can obtain on a living node with + <seealso marker="erts:erlang#erlang:memory/0">erlang:memory()</seealso>.</p> + </section> + + <section> + <marker id="internal_tables"></marker> + <title>Internal table information</title> + <p>The tags <em>=hash_table:<table_name></em> and + <em>=index_table:<table_name></em> present internal + tables. 
These are mostly of interest for runtime system + developers.</p> + </section> + + <section> + <marker id="allocated_areas"></marker> + <title>Allocated areas</title> + <p>Under the tag <em>=allocated_areas</em> you will find information + similar to what you can obtain on a living node with + <seealso marker="erts:erlang#system_info_allocated_areas">erlang:system_info(allocated_areas)</seealso>.</p> + </section> + + <section> + <marker id="allocator"></marker> + <title>Allocator</title> + <p>Under the tag <em>=allocator:<A></em> you will find + various information about allocator <A>. The information + is similar to what you can obtain on a living node with + <seealso marker="erts:erlang#system_info_allocator_tuple">erlang:system_info({allocator, <A>})</seealso>. + For more information see the documentation of + <seealso marker="erts:erlang#system_info_allocator_tuple">erlang:system_info({allocator, <A>})</seealso>, + and the + <seealso marker="erts_alloc">erts_alloc(3)</seealso> + documentation.</p> + </section> + + <section> + <marker id="processes"></marker> + <title>Process information</title> + <p>The Erlang crash dump contains a listing of each living Erlang + process in the system. + </p> + <p>The following fields can exist for a process:</p> + <taglist> + <tag><em>=proc:<pid></em></tag> + <item>Heading, states the process identifier</item> + <tag><em>State</em></tag> + <item> + <p>The state of the process. This can be one of the following:</p> + <list type="bulleted"> + <item><em>Scheduled</em> - The process was scheduled to run + but was not currently running ("in the run queue").</item> + <item><em>Waiting</em> - The process was waiting for + something (in <c><![CDATA[receive]]></c>).</item> + <item><em>Running</em> - The process was currently + running. 
If the BIF <c><![CDATA[erlang:halt/1]]></c> was called, this was + the process calling it.</item> + <item><em>Exiting</em> - The process was on its way to + exit.</item> + <item><em>Garbing</em> - This is bad luck: the process was + garbage collecting when the crash dump was written, so the rest + of the information for this process is limited.</item> + <item><em>Suspended</em> - The process was suspended, either + by the BIF <c><![CDATA[erlang:suspend_process/1]]></c> or because it was + trying to write to a busy port.</item> + </list> + </item> + <tag><em>Registered name</em></tag> + <item>The registered name of the process, if any.</item> + <tag><em>Spawned as</em></tag> + <item>The entry point of the process, i.e., what function was + referenced in the <c><![CDATA[spawn]]></c> or <c><![CDATA[spawn_link]]></c> call that + started the process.</item> + <tag><em>Last scheduled in for | Current call</em></tag> + <item>The current function of the process. These fields will not + always exist.</item> + <tag><em>Spawned by</em></tag> + <item>The parent of the process, i.e. the process which executed + <c><![CDATA[spawn]]></c> or <c><![CDATA[spawn_link]]></c>.</item> + <tag><em>Started</em></tag> + <item>The date and time when the process was started.</item> + <tag><em>Message queue length</em></tag> + <item>The number of messages in the process' message queue.</item> + <tag><em>Number of heap fragments</em></tag> + <item>The number of allocated heap fragments.</item> + <tag><em>Heap fragment data</em></tag> + <item>Size of fragmented heap data. This is data created either by + messages being sent to the process or by the Erlang BIFs. This + amount depends on so many things that this field is utterly + uninteresting.</item> + <tag><em>Link list</em></tag> + <item>Process identifiers of processes linked to this one. May also contain + ports. 
If process monitoring is used, this field also shows in + which direction the monitoring is in effect, i.e., a link + being "to" a process tells you that the "current" process was + monitoring the other and a link "from" a process tells you + that the other process was monitoring the current one.</item> + <tag><em>Reductions</em></tag> + <item>The number of reductions consumed by the process.</item> + <tag><em>Stack+heap</em></tag> + <item>The size of the stack and heap (they share a memory segment).</item> + <tag><em>OldHeap</em></tag> + <item>The size of the "old heap". The Erlang virtual machine uses + generational garbage collection with two generations. There is + one heap for new data items and one for the data that have + survived two garbage collections. The assumption (which is + almost always correct) is that data that survive two garbage + collections can be "tenured" to a heap that is garbage + collected less often, as they will live for a long period. This is a + quite common technique in virtual machines. The + heaps and the stack together constitute most of the process' + allocated memory.</item> + <tag><em>Heap unused, OldHeap unused</em></tag> + <item>The amount of unused memory on each heap. This information is + usually useless.</item> + <tag><em>Stack</em></tag> + <item>If the system uses a shared heap, the fields + <em>Stack+heap</em>, <em>OldHeap</em>, <em>Heap unused</em> + and <em>OldHeap unused</em> do not exist. Instead this field + presents the size of the process' stack.</item> + <tag><em>Program counter</em></tag> + <item>The current instruction pointer. This is only interesting for + runtime system developers. The function into which the program + counter points is the current function of the process.</item> + <tag><em>CP</em></tag> + <item>The continuation pointer, i.e. the return address for the + current call. Usually useful only to runtime system + developers. 
This may be followed by the function into which + the CP points, which is the function calling the current + function.</item> + <tag><em>Arity</em></tag> + <item>The number of live argument registers. The argument registers, + if any are live, will follow. These may contain the arguments + of the function if they have not yet been moved to the stack.</item> + </taglist> + <p>See also the section about <seealso marker="#proc_data">process data</seealso>.</p> + </section> + + <section> + <marker id="ports"></marker> + <title>Port information</title> + <p>This section lists the open ports, their owners, any linked + processes, and the name of their driver or external process.</p> + </section> + + <section> + <marker id="ets_tables"></marker> + <title>ETS tables</title> + <p>This section contains information about all the ETS tables in + the system. The following fields are interesting for each table:</p> + <taglist> + <tag><em>=ets:<owner></em></tag> + <item>Heading, states the owner of the table (a process identifier)</item> + <tag><em>Table</em></tag> + <item>The identifier for the table. If the table is a + <c><![CDATA[named_table]]></c>, this is the name.</item> + <tag><em>Name</em></tag> + <item>The name of the table, regardless of whether it is a + <c><![CDATA[named_table]]></c> or not.</item> + <tag><em>Buckets</em></tag> + <item>This occurs if the table is a hash table, i.e. if it is not an + <c><![CDATA[ordered_set]]></c>.</item> + <tag><em>Ordered set (AVL tree), Elements</em></tag> + <item>This occurs only if the table is an <c><![CDATA[ordered_set]]></c>. 
(The + number of elements is the same as the number of objects in the + table.)</item> + <tag><em>Objects</em></tag> + <item>The number of objects in the table</item> + <tag><em>Words</em></tag> + <item>The number of words (4 bytes/word on a 32-bit emulator, + 8 bytes/word on a 64-bit emulator) allocated to data + in the table.</item> + </taglist> + </section> + + <section> + <marker id="timers"></marker> + <title>Timers</title> + <p>This section contains information about all the timers started + with the BIFs <c><![CDATA[erlang:start_timer/3]]></c> and + <c><![CDATA[erlang:send_after/3]]></c>. The following fields exist for each + timer:</p> + <taglist> + <tag><em>=timer:<owner></em></tag> + <item>Heading, states the owner of the timer (a process identifier), + i.e. the process to receive the message when the timer + expires.</item> + <tag><em>Message</em></tag> + <item>The message to be sent.</item> + <tag><em>Time left</em></tag> + <item>Number of milliseconds left until the message would have been + sent.</item> + </taglist> + </section> + + <section> + <marker id="distribution_info"></marker> + <title>Distribution information</title> + <p>If the Erlang node was alive, i.e., set up for communicating + with other nodes, this section lists the connections that were + active. The following fields can exist:</p> + <taglist> + <tag><em>=node:<node_name></em></tag> + <item>The name of the node</item> + <tag><em>no_distribution</em></tag> + <item>This will only occur if the node was not distributed.</item> + <tag><em>=visible_node:<channel></em></tag> + <item>Heading for a visible node, i.e. an alive node with a + connection to the node that crashed. States the channel number + for the node.</item> + <tag><em>=hidden_node:<channel></em></tag> + <item>Heading for a hidden node. A hidden node is the same as a + visible node, except that it is started with the "-hidden" + flag. 
States the channel number for the node.</item> + <tag><em>=not_connected:<channel></em></tag> + <item>Heading for a node which has been connected to the crashed + node earlier. References (i.e. process or port identifiers) + to the no longer connected node existed at the time of the crash. + States the channel number for the node.</item> + <tag><em>Name</em></tag> + <item>The name of the remote node.</item> + <tag><em>Controller</em></tag> + <item>The port which controls the communication with the remote node.</item> + <tag><em>Creation</em></tag> + <item>An integer (1-3) which together with the node name identifies + a specific instance of the node.</item> + <tag><em>Remote monitoring: <local_proc> <remote_proc></em></tag> + <item>The local process was monitoring the remote process at the + time of the crash.</item> + <tag><em>Remotely monitored by: <local_proc> <remote_proc></em></tag> + <item>The remote process was monitoring the local process at the + time of the crash.</item> + <tag><em>Remote link: <local_proc> <remote_proc></em></tag> + <item>A link existed between the local process and the remote + process at the time of the crash.</item> + </taglist> + </section> + + <section> + <marker id="loaded_modules"></marker> + <title>Loaded module information</title> + <p>This section contains information about all loaded modules. + First, the memory usage by loaded code is summarized. There is + one field for "Current code" which is code that is the + latest version of the modules. There is also a field for "Old + code" which is code where there exists a newer version in the + system, but the old version is not yet purged. The memory usage + is in bytes.</p> + <p>All loaded modules are then listed. 
The following fields exist:</p> + <taglist> + <tag><em>=mod:<module_name></em></tag> + <item>Heading, and the name of the module.</item> + <tag><em>Current size</em></tag> + <item>Memory usage for the loaded code in bytes</item> + <tag><em>Old size</em></tag> + <item>Memory usage for the old code, if any.</item> + <tag><em>Current attributes</em></tag> + <item>Module attributes for the current code. This field is decoded + when looked at by the Crashdump Viewer tool.</item> + <tag><em>Old attributes</em></tag> + <item>Module attributes for the old code, if any. This field is + decoded when looked at by the Crashdump Viewer tool.</item> + <tag><em>Current compilation info</em></tag> + <item>Compilation information (options) for the current code. This + field is decoded when looked at by the Crashdump Viewer tool.</item> + <tag><em>Old compilation info</em></tag> + <item>Compilation information (options) for the old code, if + any. This field is decoded when looked at by the Crashdump + Viewer tool.</item> + </taglist> + </section> + + <section> + <marker id="funs"></marker> + <title>Fun information</title> + <p>In this section, all funs are listed. 
The following fields exist + for each fun:</p> + <taglist> + <tag><em>=fun</em></tag> + <item>Heading</item> + <tag><em>Module</em></tag> + <item>The name of the module where the fun was defined.</item> + <tag><em>Uniq, Index</em></tag> + <item>Identifiers</item> + <tag><em>Address</em></tag> + <item>The address of the fun's code.</item> + <tag><em>Native_address</em></tag> + <item>The address of the fun's code when HiPE is enabled.</item> + <tag><em>Refc</em></tag> + <item>The number of references to the fun.</item> + </taglist> + </section> + + <section> + <marker id="proc_data"></marker> + <title>Process Data</title> + <p>For each process there will be at least one <em>=proc_stack</em> + and one <em>=proc_heap</em> tag followed by the raw memory + information for the stack and heap of the process.</p> + <p>For each process there will also be a <em>=proc_messages</em> + tag if the process' message queue is non-empty and a + <em>=proc_dictionary</em> tag if the process' dictionary (the one + accessed with <c><![CDATA[put/2]]></c> and <c><![CDATA[get/1]]></c>) is non-empty.</p> + <p>The raw memory information can be decoded by the Crashdump + Viewer tool. You will then be able to see the stack dump, the + message queue (if any) and the dictionary (if any).</p> + <p>The stack dump is a dump of the Erlang process stack. Most of + the live data (i.e., variables currently in use) is placed on + the stack; thus this can be quite interesting. One has to + "guess" what's what, but as the information is symbolic, + thorough reading of this information can be very useful. 
As an + example we can find the state variable of the Erlang primitive + loader on line <c><![CDATA[(5)]]></c> in the example below:</p> + <code type="none"><![CDATA[ +(1) 3cac44 Return addr 0x13BF58 (<terminate process normally>) +(2) y(0) ["/view/siri_r10_dev/clearcase/otp/erts/lib/kernel/ebin","/view/siri_r10_dev/ +(3) clearcase/otp/erts/lib/stdlib/ebin"] +(4) y(1) <0.1.0> +(5) y(2) {state,[],none,#Fun<erl_prim_loader.6.7085890>,undefined,#Fun<erl_prim_loader.7.9000327>,#Fun<erl_prim_loader.8.116480692>,#Port<0.2>,infinity,#Fun<erl_prim_loader.9.10708760>} +(6) y(3) infinity ]]></code> + <p>When interpreting the data for a process, it is helpful to know + that anonymous function objects (funs) are given a name + constructed from the name of the function in which they are + created, and a number (starting with 0) indicating the number of + that fun within that function.</p> + </section> + + <section> + <marker id="atoms"></marker> + <title>Atoms</title> + <p>This section lists all the atoms in the system. This is only + interesting if one suspects that dynamic generation of atoms could + be a problem, otherwise this section can be ignored.</p> + <p>Note that the last created atom is printed first.</p> + </section> + + <section> + <title>Disclaimer</title> + <p>The format of the crash dump evolves between releases of + OTP. Some information here may not apply to your + version. A description such as this will never be complete; it is meant as + an explanation of the crash dump in general and as an aid + when trying to find application errors, not as a complete + specification.</p> + </section> +</chapter> + |