diff options
Diffstat (limited to 'README.dtrace.md')
-rw-r--r-- | README.dtrace.md | 422 |
1 files changed, 422 insertions, 0 deletions
diff --git a/README.dtrace.md b/README.dtrace.md new file mode 100644 index 0000000000..71023dfe69 --- /dev/null +++ b/README.dtrace.md @@ -0,0 +1,422 @@ +DTrace and Erlang/OTP +===================== + +History +------- + +The first implementation of DTrace probes for the Erlang virtual +machine was presented at the [2008 Erlang User Conference] [4]. That +work, based on the Erlang/OTP R12 release, was discontinued due to +what appears to be miscommunication with the original developers. + +Several users have created Erlang port drivers, linked-in drivers, or +NIFs that allow Erlang code to try to activate a probe, +e.g. `foo_module:dtrace_probe("message goes here!")`. + +Goals +----- + +1. Annotate as much of the Erlang VM as is practical. + * The initial goal is to trace file I/O operations. +2. Support all platforms that implement DTrace: OS X, Solaris, + and (I hope) FreeBSD and NetBSD. +3. To the extent that it's practical, support SystemTap on Linux + via DTrace provider compatibility. +4. Allow Erlang code to supply annotations. + +Supported platforms +------------------- + +The autoconf procedure is supported, I believe, for OS X/Snow Leopard +and OpenSolaris/64-bit. Just add the `--enable-dtrace` option your +command to run the `configure` script. + +The code has been only very lightly tested on OS X. It ought to +compile on a Solaris 10 or OpenSolaris machine, but no promises yet. + +The autoconf stuff is ugly right now. It could use some cleaning up. +For example: + +* After editing the `erlang_dtrace.d` file, you need to re-run the +* top-level "configure" script in order to update `erlang_dtrace.h`. +* `make clean` will remove `erlang_dtrace.h`. A build will fail + unless the top-level "configure" script is re-run to re-create that + file. +* The `erlang_dtrace.h` file's location should probably be moved to an + OTP platform-specific build dir, for example, + `path/to/somewhere/i386-apple-darwin10.8.0` +* There are probably some other build by-products that are also being + put into the "wrong" directory, for example, `erlang_dtrace.o` for + Solaris platforms. + +Contributions +------------- + +Code contributions are welcome! This is a side project for me (SLF), +so things would go faster if other people are willing to pitch in. +Please use the GitHub pull request mechanism or send me an email +message. + +To build from scratch, use this recipe. If you're an experienced Git +user and wish to add my repository as a remote repository, be my +guest. Just resume the recipe at command #4. + + % git clone git://github.com/slfritchie/otp.git + % cd otp + % git checkout -b dtrace-experiment origin/dtrace-experiment + % env ERL_TOP=`pwd` ./otp_build autoconf + % env ERL_TOP=`pwd` ./configure --enable-dtrace + whatever args you need + % env ERL_TOP=`pwd` make + +Then `make install` and then start an Erlang shell via +`/path/to/installed/bin/erl +A 8`. The Erlang shell's banner should +include `[dtrace]`. + +Try using this (ugly) DTrace command to watch file I/O probes in use +(tested on OS X only, sorry): + + dtrace -Z -n 'erlang*:::efile_drv-entry {printf("%d %d %s | %d | %s %s , %d %d %d", arg0, arg1, arg2 == NULL ? "" : copyinstr(arg2), arg3, arg4 == NULL ? "" : copyinstr(arg4), arg5 == NULL ? "" : copyinstr(arg5), arg6, arg7, arg8)} erlang*:::efile_drv-int* {printf("%d %d %d | %d", arg0, arg1, arg2, arg3);} erlang*:::efile_drv-return {printf("%d %d %s | %d | %d %d %d", arg0, arg1, arg2 == NULL ? "" : copyinstr(arg2), arg3, arg4, arg5, arg6 ) ; }' + +Implementation summary +---------------------- + +So far, most effort has been focused on the `efile_drv.erl` code, +which implements most file I/O on behalf of the Erlang virtual +machine. This driver also presents a big challenge: its use of an I/O +worker pool (enabled by using the `erl +A 8` flag, for example) makes +it much more difficult to trace I/O activity because each of the +following may be executed in a different Pthread: + +* I/O initiation (Erlang code) +* I/O proxy process handling, e.g. read/write when file is not opened + in `raw` mode, operations executed by the code & file server processes. + (Erlang code) +* `efile_drv` command setup (C code) +* `efile_drv` command execution (C code) +* `efile_drv` status return (C code) + +**TODO: keep this description up-to-date.** + +Example output from `lib/dtrace/examples/efile_drv.d` while executing +`file:rename("old-name", "new-name")`: + + efile_drv enter tag={3,84} user tag some-user-tag | RENAME (12) | args: old-name new-name , 0 0 (port #Port<0.59>) + async I/O worker tag={3,83} | RENAME (12) | efile_drv-int_entry + async I/O worker tag={3,83} | RENAME (12) | efile_drv-int_return + efile_drv return tag={3,83} user tag | RENAME (12) | errno 2 + +... where the following key can help decipher the output: + +* `{3,83}` is the Erlang scheduler thread number (3) and operation + counter number (83) assigned to this I/O operation. Together, + these two numbers form a unique ID for the I/O operation. +* `12` is the command number for the rename operation. See the + definition for `FILE_RENAME` in the source code file `efile_drv.c` + or the `BEGIN` section of the D script `lib/dtrace/examples/efile_drv.d`. +* `old-name` and `new-name` are the two string arguments for the + source and destination of the `rename(2)` system call. + The two integer arguments are unused; the simple formatting code + prints the arguments anyway, 0 and 0. +* The worker pool code was called on behalf of Erlang port `#Port<0.59>`. +* The system call failed with a POSIX errno value of 2: `ENOENT`, + because the path `old-name` does not exist. +* The `efile_drv-int_entry` and `efile_drv_int_return` probes are + provided in case the user is + interested in measuring only the latency of code executed by + `efile_drv` asynchronous functions by I/O worker pool threads + and the OS system call that they encapsulate. + +So, where does the `some-user-tag` string come from? + +At the moment, the user tag comes from code like the following: + + put(dtrace_utag, "some-user-tag"), + file:rename("old-name", "new-name"). + +This method of tagging I/O at the Erlang level is subject to change. + +Example DTrace probe specification +---------------------------------- + + /** + * Fired when a message is sent from one local process to another. + * + * NOTE: The 'size' parameter is in machine-dependent words and + * that the actual size of any binary terms in the message + * are not included. + * + * @param sender the PID (string form) of the sender + * @param receiver the PID (string form) of the receiver + * @param size the size of the message being delivered (words) + * @param token_label for the sender's sequential trace token + * @param token_previous count for the sender's sequential trace token + * @param token_current count for the sender's sequential trace token + */ + probe message__send(char *sender, char *receiver, uint32_t size, + int token_label, int token_previous, int token_current); + + /** + * Fired when a message is sent from a local process to a remote process. + * + * NOTE: The 'size' parameter is in machine-dependent words and + * that the actual size of any binary terms in the message + * are not included. + * + * @param sender the PID (string form) of the sender + * @param node_name the Erlang node name (string form) of the receiver + * @param receiver the PID/name (string form) of the receiver + * @param size the size of the message being delivered (words) + * @param token_label for the sender's sequential trace token + * @param token_previous count for the sender's sequential trace token + * @param token_current count for the sender's sequential trace token + */ + probe message__send__remote(char *sender, char *node_name, char *receiver, + uint32_t size, + int token_label, int token_previous, int token_current); + + /** + * Fired when a message is queued to a local process. This probe + * will not fire if the sender's pid == receiver's pid. + * + * NOTE: The 'size' parameter is in machine-dependent words and + * that the actual size of any binary terms in the message + * are not included. + * + * @param receiver the PID (string form) of the receiver + * @param size the size of the message being delivered (words) + * @param queue_len length of the queue of the receiving process + * @param token_label for the sender's sequential trace token + * @param token_previous count for the sender's sequential trace token + * @param token_current count for the sender's sequential trace token + */ + probe message__queued(char *receiver, uint32_t size, uint32_t queue_len, + int token_label, int token_previous, int token_current); + + /** + * Fired when a message is 'receive'd by a local process and removed + * from its mailbox. + * + * NOTE: The 'size' parameter is in machine-dependent words and + * that the actual size of any binary terms in the message + * are not included. + * + * @param receiver the PID (string form) of the receiver + * @param size the size of the message being delivered (words) + * @param queue_len length of the queue of the receiving process + * @param token_label for the sender's sequential trace token + * @param token_previous count for the sender's sequential trace token + * @param token_current count for the sender's sequential trace token + */ + probe message__receive(char *receiver, uint32_t size, uint32_t queue_len, + int token_label, int token_previous, int token_current); + + /* ... */ + + /* Async driver pool */ + + /** + * Show the post-add length of the async driver thread pool member's queue. + * + * NOTE: The port name is not available: additional lock(s) must + * be acquired in order to get the port name safely in an SMP + * environment. The same is true for the aio__pool_get probe. + * + * @param port the Port (string form) + * @param new queue length + */ + probe aio_pool__add(char *, int); + + /** + * Show the post-get length of the async driver thread pool member's queue. + * + * @param port the Port (string form) + * @param new queue length + */ + probe aio_pool__get(char *, int); + + /* Probes for efile_drv.c */ + + /** + * Entry into the efile_drv.c file I/O driver + * + * For a list of command numbers used by this driver, see the section + * "Guide to probe arguments" in ../../../README.md. That section + * also contains explanation of the various integer and string + * arguments that may be present when any particular probe fires. + * + * TODO: Adding the port string, args[10], is a pain. Making that + * port string available to all the other efile_drv.c probes + * will be more pain. Is the pain worth it? If yes, then + * add them everywhere else and grit our teeth. If no, then + * rip it out. + * + * @param thread-id number of the scheduler Pthread arg0 + * @param tag number: {thread-id, tag} uniquely names a driver operation + * @param user-tag string arg2 + * @param command number arg3 + * @param string argument 1 arg4 + * @param string argument 2 arg5 + * @param integer argument 1 arg6 + * @param integer argument 2 arg7 + * @param integer argument 3 arg8 + * @param integer argument 4 arg9 + * @param port the port ID of the busy port args[10] + */ + probe efile_drv__entry(int, int, char *, int, char *, char *, + int64_t, int64_t, int64_t, int64_t, char *); + + /** + * Entry into the driver's internal work function. Computation here + * is performed by a async worker pool Pthread. + * + * @param thread-id number + * @param tag number + * @param command number + */ + probe efile_drv__int_entry(int, int, int); + + /** + * Return from the driver's internal work function. + * + * @param thread-id number + * @param tag number + * @param command number + */ + probe efile_drv__int_return(int, int, int); + + /** + * Return from the efile_drv.c file I/O driver + * + * @param thread-id number arg0 + * @param tag number arg1 + * @param user-tag string arg2 + * @param command number arg3 + * @param Success? 1 is success, 0 is failure arg4 + * @param If failure, the errno of the error. arg5 + */ + probe efile_drv__return(int, int, char *, int, int, int); + +Guide to efile_drv.c probe arguments +------------------------------------ + + /* Driver op code: used by efile_drv-entry arg3 */ + /* used by efile_drv-int_entry arg3 */ + /* used by efile_drv-int_return arg3 */ + /* used by efile_drv-return arg3 */ + + #define FILE_OPEN 1 (probe arg3) + probe arg6 = C driver dt_i1 = flags; + probe arg4 = C driver dt_s1 = path; + + #define FILE_READ 2 (probe arg3) + probe arg6 = C driver dt_i1 = fd; + probe arg7 = C driver dt_i2 = flags; + probe arg8 = C driver dt_i3 = size; + + #define FILE_LSEEK 3 (probe arg3) + probe arg6 = C driver dt_i1 = fd; + probe arg7 = C driver dt_i2 = offset; + probe arg8 = C driver dt_i3 = origin; + + #define FILE_WRITE 4 (probe arg3) + probe arg6 = C driver dt_i1 = fd; + probe arg7 = C driver dt_i2 = flags; + probe arg8 = C driver dt_i3 = size; + + #define FILE_FSTAT 5 (probe arg3) + probe arg6 = C driver dt_i1 = fd; + + #define FILE_PWD 6 (probe arg3) + none + + #define FILE_READDIR 7 (probe arg3) + probe arg4 = C driver dt_s1 = path; + + #define FILE_CHDIR 8 (probe arg3) + probe arg4 = C driver dt_s1 = path; + + #define FILE_FSYNC 9 (probe arg3) + probe arg6 = C driver dt_i1 = fd; + + #define FILE_MKDIR 10 (probe arg3) + probe arg4 = C driver dt_s1 = path; + + #define FILE_DELETE 11 (probe arg3) + probe arg4 = C driver dt_s1 = path; + + #define FILE_RENAME 12 (probe arg3) + probe arg4 = C driver dt_s1 = old_name; + probe arg5 = C driver dt_s2 = new_name; + + #define FILE_RMDIR 13 (probe arg3) + probe arg4 = C driver dt_s1 = path; + + #define FILE_TRUNCATE 14 (probe arg3) + probe arg6 = C driver dt_i1 = fd; + probe arg7 = C driver dt_i2 = flags; + + #define FILE_READ_FILE 15 (probe arg3) + probe arg4 = C driver dt_s1 = path; + + #define FILE_WRITE_INFO 16 (probe arg3) + probe arg6 = C driver dt_i1 = mode; + probe arg7 = C driver dt_i2 = uid; + probe arg8 = C driver dt_i3 = gid; + + #define FILE_LSTAT 19 (probe arg3) + probe arg4 = C driver dt_s1 = path; + + #define FILE_READLINK 20 (probe arg3) + probe arg4 = C driver dt_s1 = path; + + #define FILE_LINK 21 (probe arg3) + probe arg4 = C driver dt_s1 = existing_path; + probe arg5 = C driver dt_s2 = new_path; + + #define FILE_SYMLINK 22 (probe arg3) + probe arg4 = C driver dt_s1 = existing_path; + probe arg5 = C driver dt_s2 = new_path; + + #define FILE_CLOSE 23 (probe arg3) + probe arg6 = C driver dt_i1 = fd; + probe arg7 = C driver dt_i2 = flags; + + #define FILE_PWRITEV 24 (probe arg3) + probe arg6 = C driver dt_i1 = fd; + probe arg7 = C driver dt_i2 = flags; + probe arg8 = C driver dt_i3 = size; + + #define FILE_PREADV 25 (probe arg3) + probe arg6 = C driver dt_i1 = fd; + probe arg7 = C driver dt_i2 = flags; + probe arg8 = C driver dt_i3 = size; + + #define FILE_SETOPT 26 (probe arg3) + probe arg6 = C driver dt_i1 = opt_name; + probe arg7 = C driver dt_i2 = opt_specific_value; + + #define FILE_IPREAD 27 (probe arg3) + probe arg6 = C driver dt_i1 = fd; + probe arg7 = C driver dt_i2 = flags; + probe arg8 = C driver dt_i3 = offsets[0]; + probe arg9 = C driver dt_i4 = size; + + #define FILE_ALTNAME 28 (probe arg3) + probe arg4 = C driver dt_s1 = path; + + #define FILE_READ_LINE 29 (probe arg3) + probe arg6 = C driver dt_i1 = fd; + probe arg7 = C driver dt_i2 = flags; + probe arg8 = C driver dt_i3 = read_offset; + probe arg9 = C driver dt_i4 = read_ahead; + + #define FILE_FDATASYNC 30 (probe arg3) + probe arg6 = C driver dt_i1 = fd; + + #define FILE_FADVISE 31 (probe arg3) + probe arg6 = C driver dt_i1 = fd; + probe arg7 = C driver dt_i2 = offset; + probe arg8 = C driver dt_i3 = length; + probe arg9 = C driver dt_i4 = advise_type; + + [1]: http://www.erlang.org/euc/08/ |