diff options
Diffstat (limited to 'HOWTO/DTRACE.md')
-rw-r--r-- | HOWTO/DTRACE.md | 394 |
1 files changed, 394 insertions, 0 deletions
diff --git a/HOWTO/DTRACE.md b/HOWTO/DTRACE.md new file mode 100644 index 0000000000..1d495b0eaf --- /dev/null +++ b/HOWTO/DTRACE.md @@ -0,0 +1,394 @@ +DTrace and Erlang/OTP +===================== + +History +------- + +The first implementation of DTrace probes for the Erlang virtual +machine was presented at the [2008 Erlang User Conference] [1]. That +work, based on the Erlang/OTP R12 release, was discontinued due to +what appears to be miscommunication with the original developers. + +Several users have created Erlang port drivers, linked-in drivers, or +NIFs that allow Erlang code to try to activate a probe, +e.g. `foo_module:dtrace_probe("message goes here!")`. + +Goals +----- + +1. Annotate as much of the Erlang VM as is practical. + * The initial goal is to trace file I/O operations. +2. Support all platforms that implement DTrace: OS X, Solaris, + and (I hope) FreeBSD and NetBSD. +3. To the extent that it's practical, support SystemTap on Linux + via DTrace provider compatibility. +4. Allow Erlang code to supply annotations. + +Supported platforms +------------------- + +* OS X 10.6.x / Snow Leopard. It should also work for 10.7 / Lion, + but I haven't personally tested it. +* Solaris 10. I have done limited testing on Solaris 11 and + OpenIndiana release 151a, and both appear to work. +* FreeBSD 9.0, though please see the "FreeBSD 9.0 Release Notes" + section below! +* Linux via SystemTap compatibility. Please see + [$ERL_TOP/HOWTO/SYSTEMTAP.md][] for more details. + +Just add the `--with-dynamic-trace=dtrace` option to your command when you +run the `configure` script. If you are using systemtap, the configure option +is `--with-dynamic-trace=systemtap` + +Status +------ + +As of R15B01, the dynamic trace code is included in the main OTP distribution, +although it's considered experimental. The main development of the dtrace code +still happens outside of Ericsson, but there is no need to fetch a patched +version of OTP to get the basic funtionality. + +Implementation summary +---------------------- + +So far, most effort has been focused on the `efile_drv.c` code, +which implements most file I/O on behalf of the Erlang virtual +machine. This driver also presents a big challenge: its use of an I/O +worker pool (enabled by using the `erl +A 8` flag, for example) makes +it much more difficult to trace I/O activity because each of the +following may be executed in a different Pthread: + +* I/O initiation (Erlang code) +* I/O proxy process handling, e.g. read/write when file is not opened + in `raw` mode, operations executed by the code & file server processes. + (Erlang code) +* `efile_drv` command setup (C code) +* `efile_drv` command execution (C code) +* `efile_drv` status return (C code) + +**TODO: keep this description up-to-date.** + +Example output from `lib/dtrace/examples/efile_drv.d` while executing +`file:rename("old-name", "new-name")`: + + efile_drv enter tag={3,84} user tag some-user-tag | RENAME (12) | args: old-name new-name , 0 0 (port #Port<0.59>) + async I/O worker tag={3,83} | RENAME (12) | efile_drv-int_entry + async I/O worker tag={3,83} | RENAME (12) | efile_drv-int_return + efile_drv return tag={3,83} user tag | RENAME (12) | errno 2 + +... where the following key can help decipher the output: + +* `{3,83}` is the Erlang scheduler thread number (3) and operation + counter number (83) assigned to this I/O operation. Together, + these two numbers form a unique ID for the I/O operation. +* `12` is the command number for the rename operation. See the + definition for `FILE_RENAME` in the source code file `efile_drv.c` + or the `BEGIN` section of the D script `lib/dtrace/examples/efile_drv.d`. +* `old-name` and `new-name` are the two string arguments for the + source and destination of the `rename(2)` system call. + The two integer arguments are unused; the simple formatting code + prints the arguments anyway, 0 and 0. +* The worker pool code was called on behalf of Erlang port `#Port<0.59>`. +* The system call failed with a POSIX errno value of 2: `ENOENT`, + because the path `old-name` does not exist. +* The `efile_drv-int_entry` and `efile_drv_int_return` probes are + provided in case the user is + interested in measuring only the latency of code executed by + `efile_drv` asynchronous functions by I/O worker pool threads + and the OS system call that they encapsulate. + +So, where does the `some-user-tag` string come from? + +At the moment, the user tag comes from code like the following: + + put(dtrace_utag, "some-user-tag"), + file:rename("old-name", "new-name"). + +This method of tagging I/O at the Erlang level is subject to change. + +Example DTrace probe specification +---------------------------------- + + /** + * Fired when a message is sent from one local process to another. + * + * NOTE: The 'size' parameter is in machine-dependent words and + * that the actual size of any binary terms in the message + * are not included. + * + * @param sender the PID (string form) of the sender + * @param receiver the PID (string form) of the receiver + * @param size the size of the message being delivered (words) + * @param token_label for the sender's sequential trace token + * @param token_previous count for the sender's sequential trace token + * @param token_current count for the sender's sequential trace token + */ + probe message__send(char *sender, char *receiver, uint32_t size, + int token_label, int token_previous, int token_current); + + /** + * Fired when a message is sent from a local process to a remote process. + * + * NOTE: The 'size' parameter is in machine-dependent words and + * that the actual size of any binary terms in the message + * are not included. + * + * @param sender the PID (string form) of the sender + * @param node_name the Erlang node name (string form) of the receiver + * @param receiver the PID/name (string form) of the receiver + * @param size the size of the message being delivered (words) + * @param token_label for the sender's sequential trace token + * @param token_previous count for the sender's sequential trace token + * @param token_current count for the sender's sequential trace token + */ + probe message__send__remote(char *sender, char *node_name, char *receiver, + uint32_t size, + int token_label, int token_previous, int token_current); + + /** + * Fired when a message is queued to a local process. This probe + * will not fire if the sender's pid == receiver's pid. + * + * NOTE: The 'size' parameter is in machine-dependent words and + * that the actual size of any binary terms in the message + * are not included. + * + * @param receiver the PID (string form) of the receiver + * @param size the size of the message being delivered (words) + * @param queue_len length of the queue of the receiving process + * @param token_label for the sender's sequential trace token + * @param token_previous count for the sender's sequential trace token + * @param token_current count for the sender's sequential trace token + */ + probe message__queued(char *receiver, uint32_t size, uint32_t queue_len, + int token_label, int token_previous, int token_current); + + /** + * Fired when a message is 'receive'd by a local process and removed + * from its mailbox. + * + * NOTE: The 'size' parameter is in machine-dependent words and + * that the actual size of any binary terms in the message + * are not included. + * + * @param receiver the PID (string form) of the receiver + * @param size the size of the message being delivered (words) + * @param queue_len length of the queue of the receiving process + * @param token_label for the sender's sequential trace token + * @param token_previous count for the sender's sequential trace token + * @param token_current count for the sender's sequential trace token + */ + probe message__receive(char *receiver, uint32_t size, uint32_t queue_len, + int token_label, int token_previous, int token_current); + + /* ... */ + + /* Async driver pool */ + + /** + * Show the post-add length of the async driver thread pool member's queue. + * + * NOTE: The port name is not available: additional lock(s) must + * be acquired in order to get the port name safely in an SMP + * environment. The same is true for the aio__pool_get probe. + * + * @param port the Port (string form) + * @param new queue length + */ + probe aio_pool__add(char *, int); + + /** + * Show the post-get length of the async driver thread pool member's queue. + * + * @param port the Port (string form) + * @param new queue length + */ + probe aio_pool__get(char *, int); + + /* Probes for efile_drv.c */ + + /** + * Entry into the efile_drv.c file I/O driver + * + * For a list of command numbers used by this driver, see the section + * "Guide to probe arguments" in ../../../README.md. That section + * also contains explanation of the various integer and string + * arguments that may be present when any particular probe fires. + * + * TODO: Adding the port string, args[10], is a pain. Making that + * port string available to all the other efile_drv.c probes + * will be more pain. Is the pain worth it? If yes, then + * add them everywhere else and grit our teeth. If no, then + * rip it out. + * + * @param thread-id number of the scheduler Pthread arg0 + * @param tag number: {thread-id, tag} uniquely names a driver operation + * @param user-tag string arg2 + * @param command number arg3 + * @param string argument 1 arg4 + * @param string argument 2 arg5 + * @param integer argument 1 arg6 + * @param integer argument 2 arg7 + * @param integer argument 3 arg8 + * @param integer argument 4 arg9 + * @param port the port ID of the busy port args[10] + */ + probe efile_drv__entry(int, int, char *, int, char *, char *, + int64_t, int64_t, int64_t, int64_t, char *); + + /** + * Entry into the driver's internal work function. Computation here + * is performed by a async worker pool Pthread. + * + * @param thread-id number + * @param tag number + * @param command number + */ + probe efile_drv__int_entry(int, int, int); + + /** + * Return from the driver's internal work function. + * + * @param thread-id number + * @param tag number + * @param command number + */ + probe efile_drv__int_return(int, int, int); + + /** + * Return from the efile_drv.c file I/O driver + * + * @param thread-id number arg0 + * @param tag number arg1 + * @param user-tag string arg2 + * @param command number arg3 + * @param Success? 1 is success, 0 is failure arg4 + * @param If failure, the errno of the error. arg5 + */ + probe efile_drv__return(int, int, char *, int, int, int); + +Guide to efile_drv.c probe arguments +------------------------------------ + + /* Driver op code: used by efile_drv-entry arg3 */ + /* used by efile_drv-int_entry arg3 */ + /* used by efile_drv-int_return arg3 */ + /* used by efile_drv-return arg3 */ + + #define FILE_OPEN 1 (probe arg3) + probe arg6 = C driver dt_i1 = flags; + probe arg4 = C driver dt_s1 = path; + + #define FILE_READ 2 (probe arg3) + probe arg6 = C driver dt_i1 = fd; + probe arg7 = C driver dt_i2 = flags; + probe arg8 = C driver dt_i3 = size; + + #define FILE_LSEEK 3 (probe arg3) + probe arg6 = C driver dt_i1 = fd; + probe arg7 = C driver dt_i2 = offset; + probe arg8 = C driver dt_i3 = origin; + + #define FILE_WRITE 4 (probe arg3) + probe arg6 = C driver dt_i1 = fd; + probe arg7 = C driver dt_i2 = flags; + probe arg8 = C driver dt_i3 = size; + + #define FILE_FSTAT 5 (probe arg3) + probe arg6 = C driver dt_i1 = fd; + + #define FILE_PWD 6 (probe arg3) + none + + #define FILE_READDIR 7 (probe arg3) + probe arg4 = C driver dt_s1 = path; + + #define FILE_CHDIR 8 (probe arg3) + probe arg4 = C driver dt_s1 = path; + + #define FILE_FSYNC 9 (probe arg3) + probe arg6 = C driver dt_i1 = fd; + + #define FILE_MKDIR 10 (probe arg3) + probe arg4 = C driver dt_s1 = path; + + #define FILE_DELETE 11 (probe arg3) + probe arg4 = C driver dt_s1 = path; + + #define FILE_RENAME 12 (probe arg3) + probe arg4 = C driver dt_s1 = old_name; + probe arg5 = C driver dt_s2 = new_name; + + #define FILE_RMDIR 13 (probe arg3) + probe arg4 = C driver dt_s1 = path; + + #define FILE_TRUNCATE 14 (probe arg3) + probe arg6 = C driver dt_i1 = fd; + probe arg7 = C driver dt_i2 = flags; + + #define FILE_READ_FILE 15 (probe arg3) + probe arg4 = C driver dt_s1 = path; + + #define FILE_WRITE_INFO 16 (probe arg3) + probe arg6 = C driver dt_i1 = mode; + probe arg7 = C driver dt_i2 = uid; + probe arg8 = C driver dt_i3 = gid; + + #define FILE_LSTAT 19 (probe arg3) + probe arg4 = C driver dt_s1 = path; + + #define FILE_READLINK 20 (probe arg3) + probe arg4 = C driver dt_s1 = path; + + #define FILE_LINK 21 (probe arg3) + probe arg4 = C driver dt_s1 = existing_path; + probe arg5 = C driver dt_s2 = new_path; + + #define FILE_SYMLINK 22 (probe arg3) + probe arg4 = C driver dt_s1 = existing_path; + probe arg5 = C driver dt_s2 = new_path; + + #define FILE_CLOSE 23 (probe arg3) + probe arg6 = C driver dt_i1 = fd; + probe arg7 = C driver dt_i2 = flags; + + #define FILE_PWRITEV 24 (probe arg3) + probe arg6 = C driver dt_i1 = fd; + probe arg7 = C driver dt_i2 = flags; + probe arg8 = C driver dt_i3 = size; + + #define FILE_PREADV 25 (probe arg3) + probe arg6 = C driver dt_i1 = fd; + probe arg7 = C driver dt_i2 = flags; + probe arg8 = C driver dt_i3 = size; + + #define FILE_SETOPT 26 (probe arg3) + probe arg6 = C driver dt_i1 = opt_name; + probe arg7 = C driver dt_i2 = opt_specific_value; + + #define FILE_IPREAD 27 (probe arg3) + probe arg6 = C driver dt_i1 = fd; + probe arg7 = C driver dt_i2 = flags; + probe arg8 = C driver dt_i3 = offsets[0]; + probe arg9 = C driver dt_i4 = size; + + #define FILE_ALTNAME 28 (probe arg3) + probe arg4 = C driver dt_s1 = path; + + #define FILE_READ_LINE 29 (probe arg3) + probe arg6 = C driver dt_i1 = fd; + probe arg7 = C driver dt_i2 = flags; + probe arg8 = C driver dt_i3 = read_offset; + probe arg9 = C driver dt_i4 = read_ahead; + + #define FILE_FDATASYNC 30 (probe arg3) + probe arg6 = C driver dt_i1 = fd; + + #define FILE_FADVISE 31 (probe arg3) + probe arg6 = C driver dt_i1 = fd; + probe arg7 = C driver dt_i2 = offset; + probe arg8 = C driver dt_i3 = length; + probe arg9 = C driver dt_i4 = advise_type; + + [1]: http://www.erlang.org/euc/08/ + [$ERL_TOP/HOWTO/SYSTEMTAP.md]: SYSTEMTAP.md |