<?xml version="1.0" encoding="utf-8" ?> <!DOCTYPE chapter SYSTEM "chapter.dtd"> <chapter> <header> <copyright> <year>1999</year><year>2013</year> <holder>Ericsson AB. All Rights Reserved.</holder> </copyright> <legalnotice> The contents of this file are subject to the Erlang Public License, Version 1.1, (the "License"); you may not use this file except in compliance with the License. You should have received a copy of the Erlang Public License along with this software. If not, it can be retrieved online at http://www.erlang.org/. Software distributed under the License is distributed on an "AS IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License for the specific language governing rights and limitations under the License. </legalnotice> <title>How to interpret the Erlang crash dumps</title> <prepared>Patrik Nyblom</prepared> <responsible></responsible> <docno></docno> <approved></approved> <checked></checked> <date>1999-11-11</date> <rev>PA1</rev> <file>crash_dump.xml</file> </header> <p>This document describes the <c><![CDATA[erl_crash.dump]]></c> file generated upon abnormal exit of the Erlang runtime system.</p> <p><em>Important:</em> The Erlang crash dump format was substantially reworked in OTP release R9C, so the information in this document is not directly applicable to older dumps. However, if you use the Crashdump Viewer tool on older dumps, the crash dumps are translated into a format similar to this.</p> <p>The system writes the crash dump to the current directory of the emulator, or to the file named by the environment variable <c><![CDATA[ERL_CRASH_DUMP]]></c> (how the variable is set depends on the operating system). 
For a crash dump to be written, there has to be a writable file system mounted.</p> <p>Crash dumps are written mainly for one of two reasons: either the built-in function <c><![CDATA[erlang:halt/1]]></c> is called explicitly with a string argument from running Erlang code, or the runtime system has detected an error that it cannot handle. Most often the error cannot be handled because of an external limitation, such as running out of memory. A crash dump due to an internal error may be caused by the system reaching limits in the emulator itself (like the number of atoms in the system, or too many simultaneous ets tables). Usually the emulator or the operating system can be reconfigured to avoid the crash, which is why interpreting the crash dump correctly is important.</p> <p>On systems that support OS signals, it is also possible to stop the runtime system and generate a crash dump by sending it the SIGUSR1 signal.</p> <p>The Erlang crash dump is a readable text file, but it might not be very easy to read. Using the Crashdump Viewer tool in the <c><![CDATA[observer]]></c> application will simplify the task. This is a wx-widget based tool for browsing Erlang crash dumps.</p> <section> <marker id="general_info"></marker> <title>General information</title> <p>The first part of the dump shows the creation time for the dump, a slogan indicating the reason for the dump, the system version of the node from which the dump originates, the compile time of the emulator running the originating node, the number of atoms in the atom table, and the runtime system thread that caused the crash dump to happen.</p> <section> <title>Reasons for crash dumps (slogan)</title> <p>The reason for the dump is noted in the beginning of the file as <em>Slogan: &lt;reason&gt;</em> (the word "slogan" has historical roots). 
If the system is halted by the BIF <c><![CDATA[erlang:halt/1]]></c>, the slogan is the string parameter passed to the BIF, otherwise it is a description generated by the emulator or the (Erlang) kernel. Normally the message should be enough to understand the problem, but nevertheless some messages are described here. Note however that the suggested reasons for the crash are <em>only suggestions</em>. The exact reasons for the errors may vary depending on the local applications and the underlying operating system.</p> <list type="bulleted"> <item>"<em>&lt;A&gt;</em>: Cannot allocate <em>&lt;N&gt;</em> bytes of memory (of type "<em>&lt;T&gt;</em>", thread <em>&lt;I&gt;</em>)." - The system has run out of memory. &lt;A&gt; is the allocator that failed to allocate memory, &lt;N&gt; is the number of bytes that &lt;A&gt; tried to allocate, &lt;T&gt; is the memory block type that the memory was needed for, and &lt;I&gt; is the thread identifier. The most common case is that a process stores huge amounts of data. In this case &lt;T&gt; is most often <c><![CDATA[heap]]></c>, <c><![CDATA[old_heap]]></c>, <c><![CDATA[heap_frag]]></c>, or <c><![CDATA[binary]]></c>. For more information on allocators see <seealso marker="erts_alloc">erts_alloc(3)</seealso>.</item> <item>"<em>&lt;A&gt;</em>: Cannot reallocate <em>&lt;N&gt;</em> bytes of memory (of type "<em>&lt;T&gt;</em>", thread <em>&lt;I&gt;</em>)." 
- Same as above with the exception that memory was being reallocated instead of being allocated when the system ran out of memory.</item> <item>"Unexpected op code <em>N</em>" - Error in compiled code, <c><![CDATA[beam]]></c> file damaged or error in the compiler.</item> <item>"Module <em>Name</em> undefined" <c><![CDATA[|]]></c> "Function <em>Name</em> undefined" <c><![CDATA[|]]></c> "No function <em>Name</em>:<em>Name</em>/1" <c><![CDATA[|]]></c> "No function <em>Name</em>:start/2" - The kernel/stdlib applications are damaged or the start script is damaged.</item> <item>"Driver_select called with too large file descriptor <c><![CDATA[N]]></c>" - The number of file descriptors for sockets exceeds 1024 (Unix only). The limit on file descriptors in some Unix flavors can be set to over 1024, but only 1024 sockets/pipes can be used simultaneously by Erlang (due to limitations in the Unix <c><![CDATA[select]]></c> call). The number of open regular files is not affected by this.</item> <item>"Received SIGUSR1" - Sending the SIGUSR1 signal to an Erlang machine (Unix only) forces a crash dump. This slogan reflects that the Erlang machine crash-dumped due to receiving that signal.</item> <item>"Kernel pid terminated (<em>Who</em>) (<em>Exit-reason</em>)" - The kernel supervisor has detected a failure, usually that the <c><![CDATA[application_controller]]></c> has shut down (<c><![CDATA[Who]]></c> = <c><![CDATA[application_controller]]></c>, <c><![CDATA[Exit-reason]]></c> = <c><![CDATA[shutdown]]></c>). The application controller may have shut down for a number of reasons, the most usual being that the node name of the distributed Erlang node is already in use. A complete supervisor tree "crash" (i.e., the top supervisors have exited) will give about the same result. This message comes from the Erlang code and not from the virtual machine itself. It is always due to some kind of failure in an application, either within OTP or a "user-written" one. 
Looking at the error log for your application is probably the first step to take.</item> <item>"Init terminating in do_boot ()" - The primitive Erlang boot sequence was terminated, most probably because the boot script has errors or cannot be read. This is usually a configuration error - the system may have been started with a faulty <c><![CDATA[-boot]]></c> parameter or with a boot script from the wrong version of OTP.</item> <item>"Could not start kernel pid (<em>Who</em>) ()" - One of the kernel processes could not start. This is probably due to faulty arguments (like errors in a <c><![CDATA[-config]]></c> argument) or faulty configuration files. Check that all files are in their correct location and that the configuration files (if any) are not damaged. Usually there are also messages written to the controlling terminal and/or the error log explaining what's wrong.</item> </list> <p>Errors other than the ones mentioned above may occur, as the <c><![CDATA[erlang:halt/1]]></c> BIF may generate any message. If the message is not generated by the BIF and does not occur in the list above, it may be due to an error in the emulator. There may however be unusual messages not listed here that are still connected to an application failure. There is a lot more information available, so more thorough reading of the crash dump may reveal the crash reason. The size of processes, the number of ets tables and the Erlang data on each process stack can be useful for tracking down the problem.</p> </section> <section> <title>Number of atoms</title> <p>The number of atoms in the system at the time of the crash is shown as <em>Atoms: &lt;number&gt;</em>. 
Some tens of thousands of atoms is perfectly normal, but more could indicate that the BIF <c><![CDATA[erlang:list_to_atom/1]]></c> is used to dynamically generate a lot of <em>different</em> atoms, which is never a good idea.</p> </section> </section> <section> <marker id="scheduler"></marker> <title>Scheduler information</title> <p>Under the tag <em>=scheduler</em>, information about the current state and statistics of the schedulers in the runtime system is displayed. On operating systems that allow instant suspension of other threads, the data within this section reflects what the runtime system looked like at the moment the crash happened.</p> <p>The following fields can exist for a scheduler:</p> <taglist> <tag><em>=scheduler:id</em></tag> <item>Header, states the scheduler identifier.</item> <tag><em>Scheduler Sleep Info Flags</em></tag> <item>If empty, the scheduler was doing some work. If not empty, the scheduler is either in some state of sleep or suspended. This entry is only present in an SMP-enabled emulator.</item> <tag><em>Scheduler Sleep Info Aux Work</em></tag> <item>If not empty, scheduler-internal auxiliary work is scheduled to be done.</item> <tag><em>Current Port</em></tag> <item>The port identifier of the port that is currently being executed by the scheduler.</item> <tag><em>Current Process</em></tag> <item>The process identifier of the process that is currently being executed by the scheduler. If there is such a process, this entry is followed by the <em>State</em>, <em>Internal State</em>, <em>Program Counter</em>, and <em>CP</em> of that same process. See <seealso marker="#processes">Process Information</seealso> for a description of what the different entries mean. Keep in mind that this is a snapshot of the entries at the exact moment crash dump generation starts. They will therefore most likely differ from (and be more telling than) the entries for the same process found in the <em>=proc</em> section. 
If there is no currently running process, only the <em>Current Process</em> entry will be printed. </item> <tag><em>Current Process Limited Stack Trace</em></tag> <item>This entry only shows up if there is a current process. It is very similar to <seealso marker="#proc_data"><em>=proc_stack</em></seealso>, except that only the function frames are printed (i.e. the stack variables are omitted). It is also limited to only print the top and bottom part of the stack. If the stack is small (less than 512 slots) then the entire stack will be printed. If not, an entry stating <code>skipping ## slots</code> will be printed where ## is replaced by the number of slots that have been skipped.</item> <tag><em>Run Queue</em></tag> <item>Displays statistics about how many processes and ports of different priorities are scheduled on this scheduler.</item> <tag><em>** crashed **</em></tag> <item>This entry is normally not printed. It signifies that getting the rest of the information about this scheduler failed for some reason. </item> </taglist> </section> <section> <marker id="memory"></marker> <title>Memory information</title> <p>Under the tag <em>=memory</em> you will find information similar to what you can obtain on a living node with <seealso marker="erts:erlang#erlang:memory/0">erlang:memory()</seealso>.</p> </section> <section> <marker id="internal_tables"></marker> <title>Internal table information</title> <p>The tags <em>=hash_table:&lt;table_name&gt;</em> and <em>=index_table:&lt;table_name&gt;</em> present internal tables. 
These are mostly of interest for runtime system developers.</p> </section> <section> <marker id="allocated_areas"></marker> <title>Allocated areas</title> <p>Under the tag <em>=allocated_areas</em> you will find information similar to what you can obtain on a living node with <seealso marker="erts:erlang#system_info_allocated_areas">erlang:system_info(allocated_areas)</seealso>.</p> </section> <section> <marker id="allocator"></marker> <title>Allocator</title> <p>Under the tag <em>=allocator:&lt;A&gt;</em> you will find various information about allocator &lt;A&gt;. The information is similar to what you can obtain on a living node with <seealso marker="erts:erlang#system_info_allocator_tuple">erlang:system_info({allocator, &lt;A&gt;})</seealso>. For more information see the documentation of <seealso marker="erts:erlang#system_info_allocator_tuple">erlang:system_info({allocator, &lt;A&gt;})</seealso>, and the <seealso marker="erts_alloc">erts_alloc(3)</seealso> documentation.</p> </section> <section> <marker id="processes"></marker> <title>Process information</title> <p>The Erlang crash dump contains a listing of each living Erlang process in the system.</p> <p>The following fields can exist for a process:</p> <taglist> <tag><em>=proc:&lt;pid&gt;</em></tag> <item>Heading, states the process identifier</item> <tag><em>State</em></tag> <item> <p>The state of the process. This can be one of the following:</p> <list type="bulleted"> <item><em>Scheduled</em> - The process was scheduled to run but was not currently running ("in the run queue").</item> <item><em>Waiting</em> - The process was waiting for something (in <c><![CDATA[receive]]></c>).</item> <item><em>Running</em> - The process was currently running. 
If the BIF <c><![CDATA[erlang:halt/1]]></c> was called, this was the process calling it.</item> <item><em>Exiting</em> - The process was on its way to exit.</item> <item><em>Garbing</em> - The process was garbage collecting when the crash dump was written. This is bad luck, since the rest of the information for this process is then limited.</item> <item><em>Suspended</em> - The process was suspended, either by the BIF <c><![CDATA[erlang:suspend_process/1]]></c> or because it was trying to write to a busy port.</item> </list> </item> <tag><em>Registered name</em></tag> <item>The registered name of the process, if any.</item> <tag><em>Spawned as</em></tag> <item>The entry point of the process, i.e., what function was referenced in the <c><![CDATA[spawn]]></c> or <c><![CDATA[spawn_link]]></c> call that started the process.</item> <tag><em>Last scheduled in for | Current call</em></tag> <item>The current function of the process. These fields will not always exist.</item> <tag><em>Run queue</em></tag> <item>The identifier of the scheduler run queue in which the process is running.</item> <tag><em>Spawned by</em></tag> <item>The parent of the process, i.e. the process which executed <c><![CDATA[spawn]]></c> or <c><![CDATA[spawn_link]]></c>.</item> <tag><em>Started</em></tag> <item>The date and time when the process was started.</item> <tag><em>Message queue length</em></tag> <item>The number of messages in the process's message queue.</item> <tag><em>Number of heap fragments</em></tag> <item>The number of allocated heap fragments.</item> <tag><em>Heap fragment data</em></tag> <item>Size of fragmented heap data. This is data either created by messages being sent to the process or by the Erlang BIFs. This amount depends on so many factors that the field is rarely useful for diagnosis.</item> <tag><em>Link list</em></tag> <item>Process identifiers of processes linked to this one. May also contain ports. 
If process monitoring is used, this field also tells in which direction the monitoring is in effect, i.e., a link being "to" a process tells you that the "current" process was monitoring the other and a link "from" a process tells you that the other process was monitoring the current one.</item> <tag><em>Reductions</em></tag> <item>The number of reductions consumed by the process.</item> <tag><em>Stack+heap</em></tag> <item>The size of the stack and heap (they share a memory segment).</item> <tag><em>OldHeap</em></tag> <item>The size of the "old heap". The Erlang virtual machine uses generational garbage collection with two generations. There is one heap for new data items and one for the data that have survived two garbage collections. The assumption (which is almost always correct) is that data that survive two garbage collections can be "tenured" to a heap that is garbage collected less often, as they will live for a long period. This is a common technique in virtual machines. The heaps and the stack together constitute most of the process's allocated memory.</item> <tag><em>Heap unused, OldHeap unused</em></tag> <item>The amount of unused memory on each heap. This information is usually useless.</item> <tag><em>Stack</em></tag> <item>If the system uses shared heap, the fields <em>Stack+heap</em>, <em>OldHeap</em>, <em>Heap unused</em> and <em>OldHeap unused</em> do not exist. Instead this field presents the size of the process's stack.</item> <tag><em>Memory</em></tag> <item>The total memory used by this process. This includes call stack, heap and internal structures. Same as <seealso marker="erlang#process_info-2">erlang:process_info(Pid,memory)</seealso>. </item> <tag><em>Program counter</em></tag> <item>The current instruction pointer. This is only interesting for runtime system developers. The function into which the program counter points is the current function of the process.</item> <tag><em>CP</em></tag> <item>The continuation pointer, i.e. 
the return address for the current call. Usually of interest only to runtime system developers. This may be followed by the function into which the CP points, which is the function calling the current function.</item> <tag><em>Arity</em></tag> <item>The number of live argument registers. The argument registers, if any are live, will follow. These may contain the arguments of the function if they are not yet moved to the stack.</item> <tag><em>Internal State</em></tag> <item>A more detailed internal representation of the state of this process.</item> </taglist> <p>See also the section about <seealso marker="#proc_data">process data</seealso>.</p> </section> <section> <marker id="ports"></marker> <title>Port information</title> <p>This section lists the open ports, their owners, any linked processes, and the name of their driver or external process.</p> </section> <section> <marker id="ets_tables"></marker> <title>ETS tables</title> <p>This section contains information about all the ETS tables in the system. The following fields are interesting for each table:</p> <taglist> <tag><em>=ets:&lt;owner&gt;</em></tag> <item>Heading, states the owner of the table (a process identifier)</item> <tag><em>Table</em></tag> <item>The identifier for the table. If the table is a <c><![CDATA[named_table]]></c>, this is the name.</item> <tag><em>Name</em></tag> <item>The name of the table, regardless of whether it is a <c><![CDATA[named_table]]></c> or not.</item> <tag><em>Hash table, Buckets</em></tag> <item>This occurs if the table is a hash table, i.e. if it is not an <c><![CDATA[ordered_set]]></c>.</item> <tag><em>Hash table, Chain Length</em></tag> <item>Only applicable for hash tables. Contains statistics about the hash table, such as the max, min and avg chain length. 
Having a max much larger than the avg, and a std dev much larger than the expected std dev, is a sign that the hashing of the terms is behaving badly for some reason.</item> <tag><em>Ordered set (AVL tree), Elements</em></tag> <item>This occurs only if the table is an <c><![CDATA[ordered_set]]></c>. (The number of elements is the same as the number of objects in the table.)</item> <tag><em>Fixed</em></tag> <item>If the table is fixed using <c><![CDATA[ets:safe_fixtable/2]]></c> or some internal mechanism.</item> <tag><em>Objects</em></tag> <item>The number of objects in the table</item> <tag><em>Words</em></tag> <item>The number of words (4 bytes/word on 32-bit systems, 8 bytes/word on 64-bit systems) allocated to data in the table.</item> <tag><em>Type</em></tag> <item>The type of the table, i.e. <c>set</c>, <c>bag</c>, <c>duplicate_bag</c> or <c>ordered_set</c>.</item> <tag><em>Compressed</em></tag> <item>If this table was compressed.</item> <tag><em>Protection</em></tag> <item>The protection of this table.</item> <tag><em>Write Concurrency</em></tag> <item>If write_concurrency was enabled for this table.</item> <tag><em>Read Concurrency</em></tag> <item>If read_concurrency was enabled for this table.</item> </taglist> </section> <section> <marker id="timers"></marker> <title>Timers</title> <p>This section contains information about all the timers started with the BIFs <c><![CDATA[erlang:start_timer/3]]></c> and <c><![CDATA[erlang:send_after/3]]></c>. The following fields exist for each timer:</p> <taglist> <tag><em>=timer:&lt;owner&gt;</em></tag> <item>Heading, states the owner of the timer (a process identifier) i.e. 
the process to receive the message when the timer expires.</item> <tag><em>Message</em></tag> <item>The message to be sent.</item> <tag><em>Time left</em></tag> <item>Number of milliseconds left until the message would have been sent.</item> </taglist> </section> <section> <marker id="distribution_info"></marker> <title>Distribution information</title> <p>If the Erlang node was alive, i.e., set up for communicating with other nodes, this section lists the connections that were active. The following fields can exist:</p> <taglist> <tag><em>=node:&lt;node_name&gt;</em></tag> <item>The name of the node</item> <tag><em>no_distribution</em></tag> <item>This will only occur if the node was not distributed.</item> <tag><em>=visible_node:&lt;channel&gt;</em></tag> <item>Heading for a visible node, i.e. an alive node with a connection to the node that crashed. States the channel number for the node.</item> <tag><em>=hidden_node:&lt;channel&gt;</em></tag> <item>Heading for a hidden node. A hidden node is the same as a visible node, except that it is started with the "-hidden" flag. States the channel number for the node.</item> <tag><em>=not_connected:&lt;channel&gt;</em></tag> <item>Heading for a node which has been connected to the crashed node earlier. References (i.e. process or port identifiers) to the not connected node existed at the time of the crash. 
States the channel number for the node.</item> <tag><em>Name</em></tag> <item>The name of the remote node.</item> <tag><em>Controller</em></tag> <item>The port which controls the communication with the remote node.</item> <tag><em>Creation</em></tag> <item>An integer (1-3) which together with the node name identifies a specific instance of the node.</item> <tag><em>Remote monitoring: &lt;local_proc&gt; &lt;remote_proc&gt;</em></tag> <item>The local process was monitoring the remote process at the time of the crash.</item> <tag><em>Remotely monitored by: &lt;local_proc&gt; &lt;remote_proc&gt;</em></tag> <item>The remote process was monitoring the local process at the time of the crash.</item> <tag><em>Remote link: &lt;local_proc&gt; &lt;remote_proc&gt;</em></tag> <item>A link existed between the local process and the remote process at the time of the crash.</item> </taglist> </section> <section> <marker id="loaded_modules"></marker> <title>Loaded module information</title> <p>This section contains information about all loaded modules. First, the memory usage by loaded code is summarized. There is one field for "Current code", which is the current (latest) version of the modules. There is also a field for "Old code", which is code for which a newer version exists in the system but which has not yet been purged. The memory usage is in bytes.</p> <p>All loaded modules are then listed. The following fields exist:</p> <taglist> <tag><em>=mod:&lt;module_name&gt;</em></tag> <item>Heading, and the name of the module.</item> <tag><em>Current size</em></tag> <item>Memory usage for the loaded code in bytes</item> <tag><em>Old size</em></tag> <item>Memory usage for the old code, if any.</item> <tag><em>Current attributes</em></tag> <item>Module attributes for the current code. This field is decoded when looked at by the Crashdump Viewer tool.</item> <tag><em>Old attributes</em></tag> <item>Module attributes for the old code, if any. 
This field is decoded when looked at by the Crashdump Viewer tool.</item> <tag><em>Current compilation info</em></tag> <item>Compilation information (options) for the current code. This field is decoded when looked at by the Crashdump Viewer tool.</item> <tag><em>Old compilation info</em></tag> <item>Compilation information (options) for the old code, if any. This field is decoded when looked at by the Crashdump Viewer tool.</item> </taglist> </section> <section> <marker id="funs"></marker> <title>Fun information</title> <p>In this section, all funs are listed. The following fields exist for each fun:</p> <taglist> <tag><em>=fun</em></tag> <item>Heading</item> <tag><em>Module</em></tag> <item>The name of the module where the fun was defined.</item> <tag><em>Uniq, Index</em></tag> <item>Identifiers</item> <tag><em>Address</em></tag> <item>The address of the fun's code.</item> <tag><em>Native_address</em></tag> <item>The address of the fun's code when HiPE is enabled.</item> <tag><em>Refc</em></tag> <item>The number of references to the fun.</item> </taglist> </section> <section> <marker id="proc_data"></marker> <title>Process Data</title> <p>For each process there will be at least one <em>=proc_stack</em> and one <em>=proc_heap</em> tag followed by the raw memory information for the stack and heap of the process.</p> <p>For each process there will also be a <em>=proc_messages</em> tag if the process's message queue is non-empty and a <em>=proc_dictionary</em> tag if the process dictionary (accessed with <c><![CDATA[put/2]]></c> and <c><![CDATA[get/1]]></c>) is non-empty.</p> <p>The raw memory information can be decoded by the Crashdump Viewer tool. You will then be able to see the stack dump, the message queue (if any) and the dictionary (if any).</p> <p>The stack dump is a dump of the Erlang process stack. Most of the live data (i.e., variables currently in use) are placed on the stack; thus this can be quite interesting. 
One has to "guess" what's what, but as the information is symbolic, thorough reading of this information can be very useful. As an example we can find the state variable of the Erlang primitive loader on line <c><![CDATA[(5)]]></c> in the example below:</p> <code type="none"><![CDATA[
(1)  3cac44   Return addr 0x13BF58 (<terminate process normally>)
(2)  y(0)     ["/view/siri_r10_dev/clearcase/otp/erts/lib/kernel/ebin","/view/siri_r10_dev/
(3)            clearcase/otp/erts/lib/stdlib/ebin"]
(4)  y(1)     <0.1.0>
(5)  y(2)     {state,[],none,#Fun<erl_prim_loader.6.7085890>,undefined,#Fun<erl_prim_loader.7.9000327>,#Fun<erl_prim_loader.8.116480692>,#Port<0.2>,infinity,#Fun<erl_prim_loader.9.10708760>}
(6)  y(3)     infinity
]]></code> <p>When interpreting the data for a process, it is helpful to know that anonymous function objects (funs) are given a name constructed from the name of the function in which they are created, and a number (starting with 0) indicating the number of that fun within that function.</p> </section> <section> <marker id="atoms"></marker> <title>Atoms</title> <p>This section lists all the atoms in the system. It is only interesting if one suspects that dynamic generation of atoms could be a problem; otherwise this section can be ignored.</p> <p>Note that the last created atom is printed first.</p> </section> <section> <title>Disclaimer</title> <p>The format of the crash dump evolves between releases of OTP. Some information here may not apply to your version. A description such as this will never be complete; it is meant as an explanation of the crash dump in general and as a help when trying to find application errors, not as a complete specification.</p> </section> </chapter>