author Scott Lystig Fritchie 2011-11-17 00:42:55 -0600
committer Patrik Nyblom 2012-03-22 18:16:13 +0100
commit 0721ac40f91295bb3995f86966e5dd031028ca85
tree 74eccfe667b98b91d44b4be837fe7b5a0ce7b4c1 /README.dtrace.md
parent 1e13b92d5c6543c82219610aa1336dbdf1f4dc2d
Add DTrace support for OS X, Solaris, and Linux (via SystemTap), 1/4
Since it's been quite a while since I've written C code, *and* I haven't done any significant hacking on the VM itself in years, it's quite likely that I haven't done things in 100% proper style. Or my co-collaborators Dustin Sallings (CouchBase) or Michal Ptaszek (Erlang Solutions).

My intent for this patch is to start discussion and review of DTrace support for consideration for the R15 release.

For additional background on the motivation for this work, please see the slides for the presentation at the Erlang User Conference 2011 in Stockholm: https://www.erlang-factory.com/upload/presentations/462/euc2011-draft2.pdf

Changes relative to dtrace-review2 branch:

* Fix errors in OTP test suite 'kernel' file_SUITE
* Fix errors in OTP test suite 'kernel' prim_file_SUITE
* Fix bad pointer bug in efile_drv.c flush_write()
* Move the DTrace material from the top of `README.md` into a new file, `README.dtrace.md`

Changes since last push to GitHub (relative to commit 5828a4fb28, which was the former `dtrace-review1` branch):

* Rebased onto 14 Nov 2011's "master" branch
* Recent changes to the async task queuing mechanism mean that the async worker queue length is not available. A bogus value of -1 is hard-coded until there's a good way to peek into the new queue structure and find the queue length.
* Small fixes based on review comments by Mikael Pettersson, Andrew Thompson, and Andreas Schultz.

Add autoconf support: use "./configure --enable-dtrace" on all supported platforms:

* OS X Snow Leopard or later
* Solaris 10 or OpenSolaris
* Linux, via SystemTap's DTrace compatibility packages
* FreeBSD 9.0RC1. FreeBSD 8 and earlier do not have support for USDT, DTrace's User-land Statically Defined Tracing.

See the file `erts/emulator/beam/erlang_dtrace.d` for the definition of all DTrace probes in the virtual machine so far.

Example D scripts can be found in `lib/dtrace/examples`. Note that if you see the error message `{name of probe} does not match any probes`, then there is no Erlang VM process + DTrace probes running. To fix, start a DTrace-enabled VM or remove `-q` from the `dtrace` command line.

The `lib/dtrace` directory contains a small code-only OTP application that contains code that allows Erlang code to trigger a DTrace probe. Dynamic creation & deletion of DTrace probes is not currently supported, so the `dtrace:p()` function is hacked to allow a variable number of arguments (up to four integers and up to four strings) to be used. See the comments at the top of `lib/dtrace/src/dtrace.c` for more detail.

One feature that may be controversial is the notion I've introduced of a special process dictionary key that can be used by Erlang code to tag I/O operations for an application-specific purpose. Right now, that tag's name is `dtrace_utag`. The dictionary keys used by `sys` and other modules start with a dollar sign. Perhaps there is some convention (but not a dollar sign?) that this tag should use?

The purpose of the process dictionary key is to allow the tag to be included in trace messages, e.g. for file I/O, without changing the API of the `file.erl` module's functions. For example, here's a use of the tag when calling the `file:rename/2` function:

    (bar@sbb2)1> put(dtrace_utag, "GGOOOAAALL!!!!!").
    undefined
    (bar@sbb2)2> dtrace:init().
    ok

    %% Now start both the `user-probe.d` and `efile_drv.d` D scripts
    %% found in the `lib/dtrace/examples` directory.

    (bar@sbb2)3> dtrace:p(7, 8, 9, "one", "four").
    true

    %% The output from the `user-probe.d` script:
    <0.40.0> GGOOOAAALL!!!!! 7 8 9 0 'one' 'four' '' ''

    (bar@sbb2)4> file:rename("old-name", "new-name").
    {error,enoent}

    %% The output from the `efile_drv.d` script:
    async I/O pool port #Port<0.59> queue len 1
    async I/O pool port #Port<0.59> queue len 0
    efile_drv enter tag={1,110} user tag GGOOOAAALL!!!!! | RENAME (12) | args: old-name new-name , 0 0 (port #Port<0.59>)
    async I/O worker tag={1,110} | RENAME (12) | efile_drv-int_entry
    async I/O worker tag={1,110} | RENAME (12) | efile_drv-int_return
    efile_drv return tag={1,110} user tag GGOOOAAALL!!!!! | RENAME (12) | errno 2

I'm not exactly happy with this choice of tagging, namely using `put(dtrace_utag, Tag::list())`. But this is an experiment, so we'll see how it goes. I can't imagine changing the API for all file.erl functions in order to pass the tag explicitly.

Some modules have some extensive (ab)use of the C preprocessor to reduce the amount of #ifdefs that clutter the code. In several places, I have not #ifdef'ed automatic variables because of clutter. For the same reason, there are a handful of cases where I added DTrace-related members to a struct definition without an #ifdef. I feel that the result is easier to read than earlier drafts where I did use many more `https://github.com/slfritchie/otp/tree/dtrace-experiment+michal2` if you're curious.) I expect there may be some debate about whether the bloat of the affected structs is worthwhile. I erred on adding stuff to structs, especially in the efile_drv.c driver, not having a full grasp on what was thread-safe and what was not ... so I erred on the side of caution.

The efile_drv.c has a work-around for a crazy GCC optimization bug. Thank goodness for Google, I dunno how I would've found a work-around for this silly thing. Many thanks to Trond Norbye for writing clearly about the problem in a membase Git repo commit message.

    /*
     * A note on probe naming: if "__" appears in a provider probe
     * definition, then two things happen during compilation:
     *
     * 1. The "__" will turn into a hyphen, "-", for the probe name.
     * 2. The "__" will turn into a single underscore, "_", for the
     *    macro names and function definitions that the compiler and
     *    C developers will see.
     *
     * We'll try to use the following naming convention. We're a bit
     * limited because, as a USDT probe, we can only specify the 4th part
     * of the probe name, e.g. erlang*:::mumble. The 2nd part of the
     * probe name is always going to be "beam" or "beam.smp", and the 3rd
     * part of the probe name will always be the name of the function
     * that's calling the probe.
     *
     * So, all probes will have names defined in this file using the
     * convention category__name or category__sub_category__name. This
     * will translate to probe names of category-name or
     * category-sub_category-name.
     *
     * Each of "category", "sub_category", and "name" may have underscores
     * but may not have hyphens.
     */

Add tentative support for sequential tracing sending, queueing, and receiving a message. I don't believe I've fully covered all the major places where it would be useful to have the sequential trace token info in a probe -- guidance from the OTP team would be helpful, if there's time to do that kind of review.

Add global variable `erts_this_node_sysname`.
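The probe-naming note in the commit message can be illustrated with a small sketch. This is not code from the patch: `translate_probe_name` is a hypothetical helper that only mimics the textual rule described there (each `__` in the provider definition becomes `-` in the probe name and `_` in the C-visible name); the real translation is done by the DTrace tooling at build time.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper (an assumption for illustration, not part of
 * the patch): copy `def` into `out`, replacing every "__" with `sep`.
 * Single underscores are copied unchanged.
 *
 * With sep = "-" this gives the probe name used in D scripts; with
 * sep = "_" it gives the name that C code sees.
 * `out` must have room for strlen(def) + 1 bytes, which is enough
 * because `sep` is at most as long as the "__" it replaces.
 */
void translate_probe_name(const char *def, const char *sep, char *out)
{
    size_t seplen;

    assert(def != NULL && sep != NULL && out != NULL);
    seplen = strlen(sep);
    assert(seplen <= 2);

    while (*def != '\0') {
        if (def[0] == '_' && def[1] == '_') {
            memcpy(out, sep, seplen);   /* "__" -> separator */
            out += seplen;
            def += 2;
        } else {
            *out++ = *def++;            /* ordinary character */
        }
    }
    *out = '\0';
}
```

For example, calling `translate_probe_name("efile_drv__int_entry", "-", buf)` leaves `efile_drv-int_entry` in `buf`, which matches the probe name that appears in the `efile_drv.d` output above.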
Diffstat (limited to 'README.dtrace.md')
-rw-r--r-- README.dtrace.md 422
1 files changed, 422 insertions, 0 deletions
diff --git a/README.dtrace.md b/README.dtrace.md
new file mode 100644
index 0000000000..71023dfe69
--- /dev/null
+++ b/README.dtrace.md
@@ -0,0 +1,422 @@
+DTrace and Erlang/OTP
+=====================
+
+History
+-------
+
+The first implementation of DTrace probes for the Erlang virtual
+machine was presented at the [2008 Erlang User Conference] [1]. That
+work, based on the Erlang/OTP R12 release, was discontinued due to
+what appears to be miscommunication with the original developers.
+
+Several users have created Erlang port drivers, linked-in drivers, or
+NIFs that allow Erlang code to try to activate a probe,
+e.g. `foo_module:dtrace_probe("message goes here!")`.
+
+Goals
+-----
+
+1. Annotate as much of the Erlang VM as is practical.
+ * The initial goal is to trace file I/O operations.
+2. Support all platforms that implement DTrace: OS X, Solaris,
+ and (I hope) FreeBSD and NetBSD.
+3. To the extent that it's practical, support SystemTap on Linux
+ via DTrace provider compatibility.
+4. Allow Erlang code to supply annotations.
+
+Supported platforms
+-------------------
+
+The autoconf procedure is supported, I believe, for OS X/Snow Leopard
+and OpenSolaris/64-bit. Just add the `--enable-dtrace` option to the
+command that runs the `configure` script.
+
+The code has been only very lightly tested on OS X. It ought to
+compile on a Solaris 10 or OpenSolaris machine, but no promises yet.
+
+The autoconf stuff is ugly right now. It could use some cleaning up.
+For example:
+
+* After editing the `erlang_dtrace.d` file, you need to re-run the
+ top-level "configure" script in order to update `erlang_dtrace.h`.
+* `make clean` will remove `erlang_dtrace.h`. A build will fail
+ unless the top-level "configure" script is re-run to re-create that
+ file.
+* The `erlang_dtrace.h` file's location should probably be moved to an
+ OTP platform-specific build dir, for example,
+ `path/to/somewhere/i386-apple-darwin10.8.0`
+* There are probably some other build by-products that are also being
+ put into the "wrong" directory, for example, `erlang_dtrace.o` for
+ Solaris platforms.
+
+Contributions
+-------------
+
+Code contributions are welcome! This is a side project for me (SLF),
+so things would go faster if other people are willing to pitch in.
+Please use the GitHub pull request mechanism or send me an email
+message.
+
+To build from scratch, use this recipe. If you're an experienced Git
+user and wish to add my repository as a remote repository, be my
+guest. Just resume the recipe at command #4.
+
+ % git clone git://github.com/slfritchie/otp.git
+ % cd otp
+ % git checkout -b dtrace-experiment origin/dtrace-experiment
+ % env ERL_TOP=`pwd` ./otp_build autoconf
+ % env ERL_TOP=`pwd` ./configure --enable-dtrace + whatever args you need
+ % env ERL_TOP=`pwd` make
+
+Then `make install` and then start an Erlang shell via
+`/path/to/installed/bin/erl +A 8`. The Erlang shell's banner should
+include `[dtrace]`.
+
+Try using this (ugly) DTrace command to watch file I/O probes in use
+(tested on OS X only, sorry):
+
+ dtrace -Z -n 'erlang*:::efile_drv-entry {printf("%d %d %s | %d | %s %s , %d %d %d", arg0, arg1, arg2 == NULL ? "" : copyinstr(arg2), arg3, arg4 == NULL ? "" : copyinstr(arg4), arg5 == NULL ? "" : copyinstr(arg5), arg6, arg7, arg8)} erlang*:::efile_drv-int* {printf("%d %d %d | %d", arg0, arg1, arg2, arg3);} erlang*:::efile_drv-return {printf("%d %d %s | %d | %d %d %d", arg0, arg1, arg2 == NULL ? "" : copyinstr(arg2), arg3, arg4, arg5, arg6 ) ; }'
+
+Implementation summary
+----------------------
+
+So far, most effort has been focused on the `efile_drv.c` code,
+which implements most file I/O on behalf of the Erlang virtual
+machine. This driver also presents a big challenge: its use of an I/O
+worker pool (enabled by using the `erl +A 8` flag, for example) makes
+it much more difficult to trace I/O activity because each of the
+following may be executed in a different Pthread:
+
+* I/O initiation (Erlang code)
+* I/O proxy process handling, e.g. read/write when file is not opened
+ in `raw` mode, operations executed by the code & file server processes.
+ (Erlang code)
+* `efile_drv` command setup (C code)
+* `efile_drv` command execution (C code)
+* `efile_drv` status return (C code)
+
+**TODO: keep this description up-to-date.**
+
+Example output from `lib/dtrace/examples/efile_drv.d` while executing
+`file:rename("old-name", "new-name")`:
+
+ efile_drv enter tag={3,83} user tag some-user-tag | RENAME (12) | args: old-name new-name , 0 0 (port #Port<0.59>)
+ async I/O worker tag={3,83} | RENAME (12) | efile_drv-int_entry
+ async I/O worker tag={3,83} | RENAME (12) | efile_drv-int_return
+ efile_drv return tag={3,83} user tag | RENAME (12) | errno 2
+
+... where the following key can help decipher the output:
+
+* `{3,83}` is the Erlang scheduler thread number (3) and operation
+ counter number (83) assigned to this I/O operation. Together,
+ these two numbers form a unique ID for the I/O operation.
+* `12` is the command number for the rename operation. See the
+ definition for `FILE_RENAME` in the source code file `efile_drv.c`
+ or the `BEGIN` section of the D script `lib/dtrace/examples/efile_drv.d`.
+* `old-name` and `new-name` are the two string arguments for the
+ source and destination of the `rename(2)` system call.
+ The two integer arguments are unused; the simple formatting code
+ prints the arguments anyway, 0 and 0.
+* The worker pool code was called on behalf of Erlang port `#Port<0.59>`.
+* The system call failed with a POSIX errno value of 2: `ENOENT`,
+ because the path `old-name` does not exist.
+* The `efile_drv-int_entry` and `efile_drv-int_return` probes are
+ provided in case the user is
+ interested in measuring only the latency of code executed by
+ `efile_drv` asynchronous functions by I/O worker pool threads
+ and the OS system call that they encapsulate.
+
+So, where does the `some-user-tag` string come from?
+
+At the moment, the user tag comes from code like the following:
+
+ put(dtrace_utag, "some-user-tag"),
+ file:rename("old-name", "new-name").
+
+This method of tagging I/O at the Erlang level is subject to change.
+
+Example DTrace probe specification
+----------------------------------
+
+ /**
+ * Fired when a message is sent from one local process to another.
+ *
+ * NOTE: The 'size' parameter is in machine-dependent words and
+ * the actual size of any binary terms in the message
+ * is not included.
+ *
+ * @param sender the PID (string form) of the sender
+ * @param receiver the PID (string form) of the receiver
+ * @param size the size of the message being delivered (words)
+ * @param token_label for the sender's sequential trace token
+ * @param token_previous count for the sender's sequential trace token
+ * @param token_current count for the sender's sequential trace token
+ */
+ probe message__send(char *sender, char *receiver, uint32_t size,
+ int token_label, int token_previous, int token_current);
+
+ /**
+ * Fired when a message is sent from a local process to a remote process.
+ *
+ * NOTE: The 'size' parameter is in machine-dependent words and
+ * the actual size of any binary terms in the message
+ * is not included.
+ *
+ * @param sender the PID (string form) of the sender
+ * @param node_name the Erlang node name (string form) of the receiver
+ * @param receiver the PID/name (string form) of the receiver
+ * @param size the size of the message being delivered (words)
+ * @param token_label for the sender's sequential trace token
+ * @param token_previous count for the sender's sequential trace token
+ * @param token_current count for the sender's sequential trace token
+ */
+ probe message__send__remote(char *sender, char *node_name, char *receiver,
+ uint32_t size,
+ int token_label, int token_previous, int token_current);
+
+ /**
+ * Fired when a message is queued to a local process. This probe
+ * will not fire if the sender's pid == receiver's pid.
+ *
+ * NOTE: The 'size' parameter is in machine-dependent words and
+ * the actual size of any binary terms in the message
+ * is not included.
+ *
+ * @param receiver the PID (string form) of the receiver
+ * @param size the size of the message being delivered (words)
+ * @param queue_len length of the queue of the receiving process
+ * @param token_label for the sender's sequential trace token
+ * @param token_previous count for the sender's sequential trace token
+ * @param token_current count for the sender's sequential trace token
+ */
+ probe message__queued(char *receiver, uint32_t size, uint32_t queue_len,
+ int token_label, int token_previous, int token_current);
+
+ /**
+ * Fired when a message is 'receive'd by a local process and removed
+ * from its mailbox.
+ *
+ * NOTE: The 'size' parameter is in machine-dependent words and
+ * the actual size of any binary terms in the message
+ * is not included.
+ *
+ * @param receiver the PID (string form) of the receiver
+ * @param size the size of the message being delivered (words)
+ * @param queue_len length of the queue of the receiving process
+ * @param token_label for the sender's sequential trace token
+ * @param token_previous count for the sender's sequential trace token
+ * @param token_current count for the sender's sequential trace token
+ */
+ probe message__receive(char *receiver, uint32_t size, uint32_t queue_len,
+ int token_label, int token_previous, int token_current);
+
+ /* ... */
+
+ /* Async driver pool */
+
+ /**
+ * Show the post-add length of the async driver thread pool member's queue.
+ *
+ * NOTE: The port name is not available: additional lock(s) must
+ * be acquired in order to get the port name safely in an SMP
+ * environment. The same is true for the aio_pool__get probe.
+ *
+ * @param port the Port (string form)
+ * @param new queue length
+ */
+ probe aio_pool__add(char *, int);
+
+ /**
+ * Show the post-get length of the async driver thread pool member's queue.
+ *
+ * @param port the Port (string form)
+ * @param new queue length
+ */
+ probe aio_pool__get(char *, int);
+
+ /* Probes for efile_drv.c */
+
+ /**
+ * Entry into the efile_drv.c file I/O driver
+ *
+ * For a list of command numbers used by this driver, see the section
+ * "Guide to efile_drv.c probe arguments" in ../../../README.dtrace.md. That section
+ * also contains explanation of the various integer and string
+ * arguments that may be present when any particular probe fires.
+ *
+ * TODO: Adding the port string, args[10], is a pain. Making that
+ * port string available to all the other efile_drv.c probes
+ * will be more pain. Is the pain worth it? If yes, then
+ * add them everywhere else and grit our teeth. If no, then
+ * rip it out.
+ *
+ * @param thread-id number of the scheduler Pthread arg0
+ * @param tag number: {thread-id, tag} uniquely names a driver operation
+ * @param user-tag string arg2
+ * @param command number arg3
+ * @param string argument 1 arg4
+ * @param string argument 2 arg5
+ * @param integer argument 1 arg6
+ * @param integer argument 2 arg7
+ * @param integer argument 3 arg8
+ * @param integer argument 4 arg9
+ * @param port the port ID of the busy port args[10]
+ */
+ probe efile_drv__entry(int, int, char *, int, char *, char *,
+ int64_t, int64_t, int64_t, int64_t, char *);
+
+ /**
+ * Entry into the driver's internal work function. Computation here
+ * is performed by an async worker pool Pthread.
+ *
+ * @param thread-id number
+ * @param tag number
+ * @param command number
+ */
+ probe efile_drv__int_entry(int, int, int);
+
+ /**
+ * Return from the driver's internal work function.
+ *
+ * @param thread-id number
+ * @param tag number
+ * @param command number
+ */
+ probe efile_drv__int_return(int, int, int);
+
+ /**
+ * Return from the efile_drv.c file I/O driver
+ *
+ * @param thread-id number arg0
+ * @param tag number arg1
+ * @param user-tag string arg2
+ * @param command number arg3
+ * @param Success? 1 is success, 0 is failure arg4
+ * @param If failure, the errno of the error. arg5
+ */
+ probe efile_drv__return(int, int, char *, int, int, int);
+
+Guide to efile_drv.c probe arguments
+------------------------------------
+
+ /* Driver op code: used by efile_drv-entry arg3 */
+ /* used by efile_drv-int_entry arg2 */
+ /* used by efile_drv-int_return arg2 */
+ /* used by efile_drv-return arg3 */
+
+ #define FILE_OPEN 1 (probe arg3)
+ probe arg6 = C driver dt_i1 = flags;
+ probe arg4 = C driver dt_s1 = path;
+
+ #define FILE_READ 2 (probe arg3)
+ probe arg6 = C driver dt_i1 = fd;
+ probe arg7 = C driver dt_i2 = flags;
+ probe arg8 = C driver dt_i3 = size;
+
+ #define FILE_LSEEK 3 (probe arg3)
+ probe arg6 = C driver dt_i1 = fd;
+ probe arg7 = C driver dt_i2 = offset;
+ probe arg8 = C driver dt_i3 = origin;
+
+ #define FILE_WRITE 4 (probe arg3)
+ probe arg6 = C driver dt_i1 = fd;
+ probe arg7 = C driver dt_i2 = flags;
+ probe arg8 = C driver dt_i3 = size;
+
+ #define FILE_FSTAT 5 (probe arg3)
+ probe arg6 = C driver dt_i1 = fd;
+
+ #define FILE_PWD 6 (probe arg3)
+ none
+
+ #define FILE_READDIR 7 (probe arg3)
+ probe arg4 = C driver dt_s1 = path;
+
+ #define FILE_CHDIR 8 (probe arg3)
+ probe arg4 = C driver dt_s1 = path;
+
+ #define FILE_FSYNC 9 (probe arg3)
+ probe arg6 = C driver dt_i1 = fd;
+
+ #define FILE_MKDIR 10 (probe arg3)
+ probe arg4 = C driver dt_s1 = path;
+
+ #define FILE_DELETE 11 (probe arg3)
+ probe arg4 = C driver dt_s1 = path;
+
+ #define FILE_RENAME 12 (probe arg3)
+ probe arg4 = C driver dt_s1 = old_name;
+ probe arg5 = C driver dt_s2 = new_name;
+
+ #define FILE_RMDIR 13 (probe arg3)
+ probe arg4 = C driver dt_s1 = path;
+
+ #define FILE_TRUNCATE 14 (probe arg3)
+ probe arg6 = C driver dt_i1 = fd;
+ probe arg7 = C driver dt_i2 = flags;
+
+ #define FILE_READ_FILE 15 (probe arg3)
+ probe arg4 = C driver dt_s1 = path;
+
+ #define FILE_WRITE_INFO 16 (probe arg3)
+ probe arg6 = C driver dt_i1 = mode;
+ probe arg7 = C driver dt_i2 = uid;
+ probe arg8 = C driver dt_i3 = gid;
+
+ #define FILE_LSTAT 19 (probe arg3)
+ probe arg4 = C driver dt_s1 = path;
+
+ #define FILE_READLINK 20 (probe arg3)
+ probe arg4 = C driver dt_s1 = path;
+
+ #define FILE_LINK 21 (probe arg3)
+ probe arg4 = C driver dt_s1 = existing_path;
+ probe arg5 = C driver dt_s2 = new_path;
+
+ #define FILE_SYMLINK 22 (probe arg3)
+ probe arg4 = C driver dt_s1 = existing_path;
+ probe arg5 = C driver dt_s2 = new_path;
+
+ #define FILE_CLOSE 23 (probe arg3)
+ probe arg6 = C driver dt_i1 = fd;
+ probe arg7 = C driver dt_i2 = flags;
+
+ #define FILE_PWRITEV 24 (probe arg3)
+ probe arg6 = C driver dt_i1 = fd;
+ probe arg7 = C driver dt_i2 = flags;
+ probe arg8 = C driver dt_i3 = size;
+
+ #define FILE_PREADV 25 (probe arg3)
+ probe arg6 = C driver dt_i1 = fd;
+ probe arg7 = C driver dt_i2 = flags;
+ probe arg8 = C driver dt_i3 = size;
+
+ #define FILE_SETOPT 26 (probe arg3)
+ probe arg6 = C driver dt_i1 = opt_name;
+ probe arg7 = C driver dt_i2 = opt_specific_value;
+
+ #define FILE_IPREAD 27 (probe arg3)
+ probe arg6 = C driver dt_i1 = fd;
+ probe arg7 = C driver dt_i2 = flags;
+ probe arg8 = C driver dt_i3 = offsets[0];
+ probe arg9 = C driver dt_i4 = size;
+
+ #define FILE_ALTNAME 28 (probe arg3)
+ probe arg4 = C driver dt_s1 = path;
+
+ #define FILE_READ_LINE 29 (probe arg3)
+ probe arg6 = C driver dt_i1 = fd;
+ probe arg7 = C driver dt_i2 = flags;
+ probe arg8 = C driver dt_i3 = read_offset;
+ probe arg9 = C driver dt_i4 = read_ahead;
+
+ #define FILE_FDATASYNC 30 (probe arg3)
+ probe arg6 = C driver dt_i1 = fd;
+
+ #define FILE_FADVISE 31 (probe arg3)
+ probe arg6 = C driver dt_i1 = fd;
+ probe arg7 = C driver dt_i2 = offset;
+ probe arg8 = C driver dt_i3 = length;
+ probe arg9 = C driver dt_i4 = advise_type;
+
+ [1]: http://www.erlang.org/euc/08/
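
The op-code table in "Guide to efile_drv.c probe arguments" can be turned into a small lookup, sketched below. This is an illustration outside the patch itself: the numbers and names are copied from the `#define` lines in that section, while the helpers `efile_op_name` and `efile_op_code` are hypothetical names chosen here. Codes 17 and 18 are not listed in the guide, so the sketch reports them as unknown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Command numbers and names as listed in the guide above.
 * Slots that the guide does not list (0, 17, 18) stay NULL. */
static const char *const efile_op_names[] = {
    [1]  = "FILE_OPEN",      [2]  = "FILE_READ",
    [3]  = "FILE_LSEEK",     [4]  = "FILE_WRITE",
    [5]  = "FILE_FSTAT",     [6]  = "FILE_PWD",
    [7]  = "FILE_READDIR",   [8]  = "FILE_CHDIR",
    [9]  = "FILE_FSYNC",     [10] = "FILE_MKDIR",
    [11] = "FILE_DELETE",    [12] = "FILE_RENAME",
    [13] = "FILE_RMDIR",     [14] = "FILE_TRUNCATE",
    [15] = "FILE_READ_FILE", [16] = "FILE_WRITE_INFO",
    [19] = "FILE_LSTAT",     [20] = "FILE_READLINK",
    [21] = "FILE_LINK",      [22] = "FILE_SYMLINK",
    [23] = "FILE_CLOSE",     [24] = "FILE_PWRITEV",
    [25] = "FILE_PREADV",    [26] = "FILE_SETOPT",
    [27] = "FILE_IPREAD",    [28] = "FILE_ALTNAME",
    [29] = "FILE_READ_LINE", [30] = "FILE_FDATASYNC",
    [31] = "FILE_FADVISE",
};

#define EFILE_OP_COUNT (sizeof(efile_op_names) / sizeof(efile_op_names[0]))

/* Hypothetical helper: command number -> name, or "UNKNOWN". */
const char *efile_op_name(int op)
{
    if (op < 0 || (size_t)op >= EFILE_OP_COUNT || efile_op_names[op] == NULL)
        return "UNKNOWN";
    return efile_op_names[op];
}

/* Hypothetical helper: name -> command number, or -1 if not listed. */
int efile_op_code(const char *name)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < EFILE_OP_COUNT; i++) {
        if (efile_op_names[i] != NULL && strcmp(efile_op_names[i], name) == 0)
            return (int)i;
    }
    return -1;
}
```

With this table, the `RENAME (12)` in the example output corresponds to `efile_op_name(12)`, which returns `FILE_RENAME`. A post-processing tool for raw `dtrace` output could use such a lookup instead of repeating the mapping in each D script's `BEGIN` section.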