From 84adefa331c4159d432d22840663c38f155cd4c1 Mon Sep 17 00:00:00 2001 From: Erlang/OTP Date: Fri, 20 Nov 2009 14:54:40 +0000 Subject: The R13B03 release. --- erts/doc/src/crash_dump.xml | 518 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 518 insertions(+) create mode 100644 erts/doc/src/crash_dump.xml (limited to 'erts/doc/src/crash_dump.xml') diff --git a/erts/doc/src/crash_dump.xml b/erts/doc/src/crash_dump.xml new file mode 100644 index 0000000000..5182929358 --- /dev/null +++ b/erts/doc/src/crash_dump.xml @@ -0,0 +1,518 @@ + + + + +
+ + 1999-2009 + Ericsson AB. All Rights Reserved. + + + The contents of this file are subject to the Erlang Public License, + Version 1.1, (the "License"); you may not use this file except in + compliance with the License. You should have received a copy of the + Erlang Public License along with this software. If not, it can be + retrieved online at http://www.erlang.org/. + + Software distributed under the License is distributed on an "AS IS" + basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See + the License for the specific language governing rights and limitations + under the License. + + + + How to interpret the Erlang crash dumps + Patrik Nyblom + + + + + 1999-11-11 + PA1 + crash_dump.xml
+

This document describes the file generated + upon abnormal exit of the Erlang runtime system.

+

Important: For OTP release R9C the Erlang crash dump has + had a major facelift. This means that the information in this + document will not be directly applicable for older dumps. However, + if you use the Crashdump Viewer tool on older dumps, the crash + dumps are translated into a format similar to this.

+

The system writes the crash dump to the current directory of the emulator, or to the file named by the environment variable ERL_CRASH_DUMP (however environment variables are set on the current operating system). For a crash dump to be written, a writable file system must be mounted.
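As a rough illustration of the lookup order described above, the following Python sketch resolves where a dump would be expected. The function name `crash_dump_path` is hypothetical, and the default file name `erl_crash.dump` is an assumption that this document does not state.

```python
import os


def crash_dump_path(env=None, cwd=None):
    """Return the path where a crash dump would be expected.

    Assumption: without ERL_CRASH_DUMP the dump is named
    'erl_crash.dump' in the emulator's current directory.
    """
    env = os.environ if env is None else env
    cwd = os.getcwd() if cwd is None else cwd
    configured = env.get("ERL_CRASH_DUMP")
    if configured:
        # The environment variable takes precedence when it is set.
        return configured
    return os.path.join(cwd, "erl_crash.dump")
```

Passing `env` and `cwd` explicitly makes the function easy to try without touching the real environment.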

+

Crash dumps are written mainly for one of two reasons: either the builtin function erlang:halt/1 is called explicitly with a string argument from running Erlang code, or the runtime system has detected an error that it cannot handle. The most common reason that the system cannot handle an error is an external limitation, such as running out of memory. A crash dump due to an internal error may be caused by the system reaching limits in the emulator itself (such as the number of atoms in the system, or too many simultaneous ets tables). Usually the emulator or the operating system can be reconfigured to avoid the crash, which is why interpreting the crash dump correctly is important.

+

The Erlang crash dump is a readable text file, but it might not be very easy to read. Using the Crashdump Viewer tool in the observer application will simplify the task. This is an HTML based tool for browsing Erlang crash dumps.

+ +
+ + General information +

The first part of the dump shows the creation time for the dump, a slogan indicating the reason for the dump, the system version of the node from which the dump originates, the compile time of the emulator running the originating node, and the number of atoms in the atom table.

+ +
+ Reasons for crash dumps (slogan) +

The reason for the dump is noted in the beginning of the file as Slogan: <reason> (the word "slogan" has historical roots). If the system is halted by the BIF erlang:halt/1, the slogan is the string parameter passed to the BIF; otherwise it is a description generated by the emulator or the (Erlang) kernel. Normally the message should be enough to understand the problem, but some messages are nevertheless described here. Note that the suggested reasons for the crash are only suggestions. The exact reasons for the errors may vary depending on the local applications and the underlying operating system.

+ "<A>: Cannot allocate <N> bytes of memory (of type "<T>")." - The system has run out of memory. <A> is the allocator that failed to allocate memory, <N> is the number of bytes that <A> tried to allocate, and <T> is the memory block type that the memory was needed for. The most common case is that a process stores huge amounts of data. In this case <T> is most often heap, old_heap, heap_frag, or binary. For more information on allocators see erts_alloc(3).
+ "<A>: Cannot reallocate <N> bytes of memory (of type "<T>")." - Same as above, except that memory was being reallocated instead of allocated when the system ran out of memory.
+ "Unexpected op code N" - Error in compiled code, file damaged, or error in the compiler.
+ "Module Name undefined", "Function Name undefined", "No function Name:Name/1", "No function Name:start/2" - The kernel/stdlib applications are damaged or the start script is damaged.
+ "Driver_select called with too large file descriptor N" - The number of file descriptors for sockets exceeds 1024 (Unix only). The limit on file descriptors in some Unix flavors can be set to over 1024, but only 1024 sockets/pipes can be used simultaneously by Erlang (due to limitations in the Unix select call). The number of open regular files is not affected by this.
+ "Received SIGUSR1" - The SIGUSR1 signal was sent to the Erlang machine (Unix only).
+ "Kernel pid terminated (Who) (Exit-reason)" - The kernel supervisor has detected a failure, usually that the application_controller has shut down (Who = application_controller, Why = shutdown). The application controller may have shut down for a number of reasons, the most common being that the node name of the distributed Erlang node is already in use. A complete supervisor tree "crash" (i.e., the top supervisors have exited) gives about the same result. This message comes from the Erlang code and not from the virtual machine itself. It is always due to some kind of failure in an application, either within OTP or a "user-written" one. Looking at the error log for your application is probably the first step to take.
+ "Init terminating in do_boot ()" - The primitive Erlang boot sequence was terminated, most probably because the boot script has errors or cannot be read. This is usually a configuration error: the system may have been started with a faulty -boot parameter or with a boot script from the wrong version of OTP.
+ "Could not start kernel pid (Who) ()" - One of the kernel processes could not start. This is probably due to faulty arguments (like errors in a -config argument) or faulty configuration files. Check that all files are in their correct location and that the configuration files (if any) are not damaged. Usually there are also messages written to the controlling terminal and/or the error log explaining what is wrong.
+

Errors other than the ones mentioned above may occur, as the erlang:halt/1 BIF may generate any message. If the message is not generated by the BIF and does not occur in the list above, it may be due to an error in the emulator. There may, however, be unusual messages not mentioned here that are still connected to an application failure. There is a lot more information available, so a more thorough reading of the crash dump may reveal the crash reason. The size of processes, the number of ets tables, and the Erlang data on each process stack can be useful for tracking down the problem.

+
+ +
+ Number of atoms +

The number of atoms in the system at the time of the crash is shown as Atoms: <number>. Some tens of thousands of atoms is perfectly normal, but more could indicate that the BIF erlang:list_to_atom/1 is used to dynamically generate a lot of different atoms, which is never a good idea.
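The header fields described above can be picked out with a few lines of Python. This is only a sketch: the function name `header_fields` and the sample text are made up for illustration, and it assumes each field is written as `Key: value` on its own line before the first section tag.

```python
def header_fields(dump_text):
    """Collect the Slogan and Atoms fields from the start of a dump."""
    fields = {}
    for line in dump_text.splitlines():
        if line.startswith("=") and fields:
            break  # first section tag after the header fields
        key, sep, value = line.partition(": ")
        if sep and key in ("Slogan", "Atoms"):
            fields[key] = value
    return fields


# Hypothetical header, not taken from a real dump.
SAMPLE_HEADER = """=erl_crash_dump:0.1
Slogan: Kernel pid terminated (application_controller) (shutdown)
Atoms: 5321
=memory
total: 1000
"""

header = header_fields(SAMPLE_HEADER)
atom_count = int(header["Atoms"])
```

The slogan is kept as a plain string so it can be compared against the messages listed earlier in this chapter.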

+
+
+ +
+ + Memory information +

Under the tag =memory you will find information similar + to what you can obtain on a living node with + erlang:memory().
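Since the dump is organized as tagged sections, a reader script can first split the text at lines starting with `=`. The sketch below assumes that each section starts with a line of the form `=tag` or `=tag:identifier`; the name `split_sections` and the sample text are hypothetical.

```python
def split_sections(dump_text):
    """Split dump text into sections introduced by '=tag[:id]' lines."""
    sections = []
    current = None
    for line in dump_text.splitlines():
        if line.startswith("="):
            # Everything after the first ':' is treated as the identifier.
            tag, _, ident = line[1:].partition(":")
            current = {"tag": tag, "id": ident, "lines": []}
            sections.append(current)
        elif current is not None:
            current["lines"].append(line)
    return sections


# Hypothetical excerpt, not taken from a real dump.
SAMPLE_SECTIONS = """=memory
total: 1000
processes: 400
=allocated_areas
static: 200
=proc:<0.1.0>
State: Waiting
"""

sections = split_sections(SAMPLE_SECTIONS)
tags = [s["tag"] for s in sections]
```

Keeping the raw lines per section leaves the interpretation of each tag to later, tag-specific code.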

+
+ +
+ + Internal table information +

The tags =hash_table:<table_name> and =index_table:<table_name> present internal tables. These are mostly of interest for runtime system developers.

+
+ +
+ + Allocated areas +

Under the tag =allocated_areas you will find information + similar to what you can obtain on a living node with + erlang:system_info(allocated_areas).

+
+ +
+ + Allocator +

Under the tag =allocator:<A> you will find + various information about allocator <A>. The information + is similar to what you can obtain on a living node with + erlang:system_info({allocator, <A>}). + For more information see the documentation of + erlang:system_info({allocator, <A>}), + and the + erts_alloc(3) + documentation.

+
+ +
+ + Process information +

The Erlang crashdump contains a listing of each living Erlang + process in the system. The process information for one process + may look like this (line numbers have been added): +

+

The following fields can exist for a process:

+ =proc:<pid> - Heading, states the process identifier.
+ State -

The state of the process. This can be one of the following:

+ Scheduled - The process was scheduled to run but was not currently running ("in the run queue").
+ Waiting - The process was waiting for something (in receive).
+ Running - The process was currently running. If the BIF erlang:halt/1 was called, this was the process calling it.
+ Exiting - The process was on its way to exit.
+ Garbing - The process was garbage collecting when the crash dump was written, so the rest of the information for this process is limited.
+ Suspended - The process is suspended, either by the BIF erlang:suspend_process/1 or because it is trying to write to a busy port.
+
+ Registered name - The registered name of the process, if any.
+ Spawned as - The entry point of the process, i.e., the function that was referenced in the spawn or spawn_link call that started the process.
+ Last scheduled in for | Current call - The current function of the process. These fields will not always exist.
+ Spawned by - The parent of the process, i.e., the process which executed spawn or spawn_link.
+ Started - The date and time when the process was started.
+ Message queue length - The number of messages in the process' message queue.
+ Number of heap fragments - The number of allocated heap fragments.
+ Heap fragment data - Size of fragmented heap data. This is data either created by messages being sent to the process or by the Erlang BIFs. This amount depends on so many things that this field is utterly uninteresting.
+ Link list - Process ids of processes linked to this one. May also contain ports. If process monitoring is used, this field also tells in which direction the monitoring is in effect, i.e., a link being "to" a process tells you that the "current" process was monitoring the other, and a link "from" a process tells you that the other process was monitoring the current one.
+ Reductions - The number of reductions consumed by the process.
+ Stack+heap - The size of the stack and heap (they share a memory segment).
+ OldHeap - The size of the "old heap". The Erlang virtual machine uses generational garbage collection with two generations. There is one heap for new data items and one for the data that have survived two garbage collections. The assumption (which is almost always correct) is that data that survive two garbage collections can be "tenured" to a heap that is garbage collected less often, as they will live for a long period. This is a common technique in virtual machines. The heaps and the stack together constitute most of the process's allocated memory.
+ Heap unused, OldHeap unused - The amount of unused memory on each heap. This information is usually useless.
+ Stack - If the system uses shared heap, the fields Stack+heap, OldHeap, Heap unused and OldHeap unused do not exist. Instead this field presents the size of the process' stack.
+ Program counter - The current instruction pointer. This is only interesting for runtime system developers. The function into which the program counter points is the current function of the process.
+ CP - The continuation pointer, i.e., the return address for the current call. Usually useless for anyone other than runtime system developers. This may be followed by the function into which the CP points, which is the function calling the current function.
+ Arity - The number of live argument registers. The argument registers, if any are live, will follow. These may contain the arguments of the function if they are not yet moved to the stack.
+
+

See also the section about process data.
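To make the process fields above easier to work with, the following Python sketch collects them per process and finds the process with the longest message queue. It assumes every field is written as `Field name: value` under a `=proc:<pid>` heading; the function name `parse_procs` and the sample text are hypothetical.

```python
def parse_procs(dump_text):
    """Map each process identifier to a dict of its 'Field: value' lines."""
    procs = {}
    current = None
    for line in dump_text.splitlines():
        if line.startswith("="):
            tag, _, ident = line[1:].partition(":")
            # Only =proc sections are collected; other tags end the section.
            current = procs.setdefault(ident, {}) if tag == "proc" else None
        elif current is not None:
            key, sep, value = line.partition(": ")
            if sep:
                current[key] = value
    return procs


# Hypothetical excerpt, not taken from a real dump.
SAMPLE_PROCS = """=proc:<0.1.0>
State: Waiting
Message queue length: 0
Reductions: 2694
=proc:<0.2.0>
State: Running
Message queue length: 12
Reductions: 500
=ets:<0.2.0>
Words: 10
"""

procs = parse_procs(SAMPLE_PROCS)
busiest = max(procs, key=lambda pid: int(procs[pid]["Message queue length"]))
```

A long message queue or a large Stack+heap value is often a useful starting point when looking for the process that exhausted memory.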

+
+ +
+ + Port information +

This section lists the open ports, their owners, any linked processes, and the name of their driver or external process.

+
+ +
+ + ETS tables +

This section contains information about all the ETS tables in + the system. The following fields are interesting for each table:

+ =ets:<owner> - Heading, states the owner of the table (a process identifier).
+ Table - The identifier for the table. If the table is a named_table, this is the name.
+ Name - The name of the table, regardless of whether it is a named_table or not.
+ Buckets - This occurs if the table is a hash table, i.e., if it is not an ordered_set.
+ Ordered set (AVL tree), Elements - This occurs only if the table is an ordered_set. (The number of elements is the same as the number of objects in the table.)
+ Objects - The number of objects in the table.
+ Words - The number of words (usually 4 bytes/word) allocated to data in the table.
+
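The Words field can be turned into an approximate byte count per table owner. The sketch below assumes `Field: value` lines under each `=ets:<owner>` heading and uses the 4 bytes/word figure mentioned above as a default; the function name `ets_bytes_per_owner` and the sample values are made up for illustration.

```python
def ets_bytes_per_owner(dump_text, word_size=4):
    """Sum the Words field of all ETS tables per owner, in bytes."""
    totals = {}
    owner = None
    for line in dump_text.splitlines():
        if line.startswith("="):
            tag, _, ident = line[1:].partition(":")
            owner = ident if tag == "ets" else None
        elif owner is not None:
            key, sep, value = line.partition(": ")
            if sep and key == "Words":
                totals[owner] = totals.get(owner, 0) + int(value) * word_size
    return totals


# Hypothetical excerpt, not taken from a real dump.
SAMPLE_ETS = """=ets:<0.10.0>
Table: 13
Name: ac_tab
Buckets: 256
Objects: 6
Words: 899
=ets:<0.10.0>
Table: my_tab
Name: my_tab
Objects: 2
Words: 101
"""

ets_totals = ets_bytes_per_owner(SAMPLE_ETS)
```

On a system with a different word size, pass it explicitly as `word_size`.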
+ +
+ + Timers +

This section contains information about all the timers started with the BIFs erlang:start_timer/3 and erlang:send_after/3. The following fields exist for each timer:

+ =timer:<owner> - Heading, states the owner of the timer (a process identifier), i.e., the process to receive the message when the timer expires.
+ Message - The message to be sent.
+ Time left - Number of milliseconds left until the message would have been sent.
+
+ +
+ + Distribution information +

If the Erlang node was alive, i.e., set up for communicating + with other nodes, this section lists the connections that were + active. The following fields can exist:

+ =node:<node_name> - The name of the node.
+ no_distribution - This will only occur if the node was not distributed.
+ =visible_node:<channel> - Heading for a visible node, i.e., an alive node with a connection to the node that crashed. States the channel number for the node.
+ =hidden_node:<channel> - Heading for a hidden node. A hidden node is the same as a visible node, except that it is started with the "-hidden" flag. States the channel number for the node.
+ =not_connected:<channel> - Heading for a node which has been connected to the crashed node earlier. References (i.e., process or port identifiers) to the not connected node existed at the time of the crash. States the channel number for the node.
+ Name - The name of the remote node.
+ Controller - The port which controls the communication with the remote node.
+ Creation - An integer (1-3) which together with the node name identifies a specific instance of the node.
+ Remote monitoring: <local_proc> <remote_proc> - The local process was monitoring the remote process at the time of the crash.
+ Remotely monitored by: <local_proc> <remote_proc> - The remote process was monitoring the local process at the time of the crash.
+ Remote link: <local_proc> <remote_proc> - A link existed between the local process and the remote process at the time of the crash.
+
+ +
+ + Loaded module information +

This section contains information about all loaded modules. + First, the memory usage by loaded code is summarized. There is + one field for "Current code" which is code that is the current + latest version of the modules. There is also a field for "Old + code" which is code where there exists a newer version in the + system, but the old version is not yet purged. The memory usage + is in bytes.

+

All loaded modules are then listed. The following fields exist:

+ =mod:<module_name> - Heading, and the name of the module.
+ Current size - Memory usage for the loaded code in bytes.
+ Old size - Memory usage for the old code, if any.
+ Current attributes - Module attributes for the current code. This field is decoded when looked at by the Crashdump Viewer tool.
+ Old attributes - Module attributes for the old code, if any. This field is decoded when looked at by the Crashdump Viewer tool.
+ Current compilation info - Compilation information (options) for the current code. This field is decoded when looked at by the Crashdump Viewer tool.
+ Old compilation info - Compilation information (options) for the old code, if any. This field is decoded when looked at by the Crashdump Viewer tool.
+
+ +
+ + Fun information +

In this section, all funs are listed. The following fields exist + for each fun:

+ =fun - Heading.
+ Module - The name of the module where the fun was defined.
+ Uniq, Index - Identifiers.
+ Address - The address of the fun's code.
+ Native_address - The address of the fun's code when HiPE is enabled.
+ Refc - The number of references to the fun.
+
+ +
+ + Process Data +

For each process there will be at least one =proc_stack + and one =proc_heap tag followed by the raw memory + information for the stack and heap of the process.

+

For each process there will also be a =proc_messages tag if the process' message queue is non-empty and a =proc_dictionary tag if the process' dictionary (the data stored with put and read with get) is non-empty.

+

The raw memory information can be decoded by the Crashdump + Viewer tool. You will then be able to see the stack dump, the + message queue (if any) and the dictionary (if any).

+

The stack dump is a dump of the Erlang process stack. Most of the live data (i.e., variables currently in use) are placed on the stack; thus this can be quite interesting. One has to "guess" what's what, but as the information is symbolic, thorough reading of this information can be very useful. As an example we can find the state variable of the Erlang primitive loader on line (5) in the example below:

+ )
+(2) y(0) ["/view/siri_r10_dev/clearcase/otp/erts/lib/kernel/ebin","/view/siri_r10_dev/
+(3) clearcase/otp/erts/lib/stdlib/ebin"]
+(4) y(1) <0.1.0>
+(5) y(2) {state,[],none,#Fun,undefined,#Fun,#Fun,#Port<0.2>,infinity,#Fun}
+(6) y(3) infinity
+

When interpreting the data for a process, it is helpful to know + that anonymous function objects (funs) are given a name + constructed from the name of the function in which they are + created, and a number (starting with 0) indicating the number of + that fun within that function.

+
+ +
+ + Atoms +

Now all the atoms in the system are written. This is only + interesting if one suspects that dynamic generation of atoms could + be a problem, otherwise this section can be ignored.

+

Note that the last created atom is printed first.

+
+ +
+ Disclaimer +

The format of the crash dump evolves between releases of OTP. Some information here may not apply to your version. A description such as this will never be complete; it is meant as an explanation of the crash dump in general and as a help when trying to find application errors, not as a complete specification.

+
+
+ -- cgit v1.2.3