aboutsummaryrefslogblamecommitdiffstats
path: root/README.dtrace.md
blob: 71023dfe69aa60422d3c48adb4b19425a1bc6bf6 (plain) (tree)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422





































































































































































































































































































































































































































                                                                                                                                                                                                                                                                                                                                                                                                                                                                     
DTrace and Erlang/OTP
=====================

History
-------

The first implementation of DTrace probes for the Erlang virtual
machine was presented at the [2008 Erlang User Conference] [4].  That
work, based on the Erlang/OTP R12 release, was discontinued due to
what appears to be miscommunication with the original developers.

Several users have created Erlang port drivers, linked-in drivers, or
NIFs that allow Erlang code to try to activate a probe,
e.g. `foo_module:dtrace_probe("message goes here!")`.

Goals
-----

1. Annotate as much of the Erlang VM as is practical.
   * The initial goal is to trace file I/O operations.
2. Support all platforms that implement DTrace: OS X, Solaris,
   and (I hope) FreeBSD and NetBSD.
3. To the extent that it's practical, support SystemTap on Linux
   via DTrace provider compatibility.
4. Allow Erlang code to supply annotations.

Supported platforms
-------------------

The autoconf procedure is supported, I believe, for OS X/Snow Leopard
and OpenSolaris/64-bit.  Just add the `--enable-dtrace` option your
command to run the `configure` script.

The code has been only very lightly tested on OS X.  It ought to
compile on a Solaris 10 or OpenSolaris machine, but no promises yet.

The autoconf stuff is ugly right now.  It could use some cleaning up.
For example:

* After editing the `erlang_dtrace.d` file, you need to re-run the
* top-level "configure" script in order to update `erlang_dtrace.h`.
* `make clean` will remove `erlang_dtrace.h`.  A build will fail
  unless the top-level "configure" script is re-run to re-create that
  file.
* The `erlang_dtrace.h` file's location should probably be moved to an
  OTP platform-specific build dir, for example,
  `path/to/somewhere/i386-apple-darwin10.8.0`
* There are probably some other build by-products that are also being
  put into the "wrong" directory, for example, `erlang_dtrace.o` for
  Solaris platforms.

Contributions
-------------

Code contributions are welcome!  This is a side project for me (SLF),
so things would go faster if other people are willing to pitch in.
Please use the GitHub pull request mechanism or send me an email
message.

To build from scratch, use this recipe.  If you're an experienced Git
user and wish to add my repository as a remote repository, be my
guest.  Just resume the recipe at command #4.

    % git clone git://github.com/slfritchie/otp.git
    % cd otp
    % git checkout -b dtrace-experiment origin/dtrace-experiment
    % env ERL_TOP=`pwd` ./otp_build autoconf
    % env ERL_TOP=`pwd` ./configure --enable-dtrace + whatever args you need
    % env ERL_TOP=`pwd` make

Then `make install` and then start an Erlang shell via
`/path/to/installed/bin/erl +A 8`.  The Erlang shell's banner should
include `[dtrace]`.

Try using this (ugly) DTrace command to watch file I/O probes in use
(tested on OS X only, sorry):

    dtrace -Z -n 'erlang*:::efile_drv-entry {printf("%d %d %s | %d | %s %s , %d %d %d", arg0, arg1, arg2 == NULL ? "" : copyinstr(arg2), arg3, arg4 == NULL ? "" : copyinstr(arg4), arg5 == NULL ? "" : copyinstr(arg5), arg6, arg7, arg8)} erlang*:::efile_drv-int* {printf("%d %d %d | %d", arg0, arg1, arg2, arg3);} erlang*:::efile_drv-return {printf("%d %d %s | %d | %d %d %d", arg0, arg1, arg2 == NULL ? "" : copyinstr(arg2), arg3, arg4, arg5, arg6 ) ; }'

Implementation summary
----------------------

So far, most effort has been focused on the `efile_drv.erl` code,
which implements most file I/O on behalf of the Erlang virtual
machine.  This driver also presents a big challenge: its use of an I/O
worker pool (enabled by using the `erl +A 8` flag, for example) makes
it much more difficult to trace I/O activity because each of the
following may be executed in a different Pthread:

* I/O initiation (Erlang code)
* I/O proxy process handling, e.g. read/write when file is not opened
  in `raw` mode, operations executed by the code & file server processes.
  (Erlang code)
* `efile_drv` command setup (C code)
* `efile_drv` command execution (C code)
* `efile_drv` status return (C code)

**TODO: keep this description up-to-date.**

Example output from `lib/dtrace/examples/efile_drv.d` while executing
`file:rename("old-name", "new-name")`:

    efile_drv enter tag={3,84} user tag some-user-tag | RENAME (12) | args: old-name new-name , 0 0 (port #Port<0.59>)
    async I/O worker tag={3,83} | RENAME (12) | efile_drv-int_entry
    async I/O worker tag={3,83} | RENAME (12) | efile_drv-int_return
    efile_drv return tag={3,83} user tag  | RENAME (12) | errno 2

... where the following key can help decipher the output:

* `{3,83}` is the Erlang scheduler thread number (3) and operation
  counter number (83) assigned to this I/O operation.  Together,
  these two numbers form a unique ID for the I/O operation.
* `12` is the command number for the rename operation.  See the
  definition for `FILE_RENAME` in the source code file `efile_drv.c`
  or the `BEGIN` section of the D script `lib/dtrace/examples/efile_drv.d`.
* `old-name` and `new-name` are the two string arguments for the
  source and destination of the `rename(2)` system call.
  The two integer arguments are unused; the simple formatting code
  prints the arguments anyway, 0 and 0.
* The worker pool code was called on behalf of Erlang port `#Port<0.59>`.
* The system call failed with a POSIX errno value of 2: `ENOENT`,
  because the path `old-name` does not exist.
* The `efile_drv-int_entry` and `efile_drv_int_return` probes are
  provided in case the user is
  interested in measuring only the latency of code executed by
  `efile_drv` asynchronous functions by I/O worker pool threads
  and the OS system call that they encapsulate.

So, where does the `some-user-tag` string come from?

At the moment, the user tag comes from code like the following:

    put(dtrace_utag, "some-user-tag"),
    file:rename("old-name", "new-name").

This method of tagging I/O at the Erlang level is subject to change.

Example DTrace probe specification
----------------------------------

    /**
     * Fired when a message is sent from one local process to another.
     *
     * NOTE: The 'size' parameter is in machine-dependent words and
     *       that the actual size of any binary terms in the message
     *       are not included.
     *
     * @param sender the PID (string form) of the sender
     * @param receiver the PID (string form) of the receiver
     * @param size the size of the message being delivered (words)
     * @param token_label for the sender's sequential trace token
     * @param token_previous count for the sender's sequential trace token
     * @param token_current count for the sender's sequential trace token
     */
    probe message__send(char *sender, char *receiver, uint32_t size,
                        int token_label, int token_previous, int token_current);

    /**
     * Fired when a message is sent from a local process to a remote process.
     *
     * NOTE: The 'size' parameter is in machine-dependent words and
     *       that the actual size of any binary terms in the message
     *       are not included.
     *
     * @param sender the PID (string form) of the sender
     * @param node_name the Erlang node name (string form) of the receiver
     * @param receiver the PID/name (string form) of the receiver
     * @param size the size of the message being delivered (words)
     * @param token_label for the sender's sequential trace token
     * @param token_previous count for the sender's sequential trace token
     * @param token_current count for the sender's sequential trace token
     */
    probe message__send__remote(char *sender, char *node_name, char *receiver,
                                uint32_t size,
                        int token_label, int token_previous, int token_current);

    /**
     * Fired when a message is queued to a local process.  This probe
     * will not fire if the sender's pid == receiver's pid.
     *
     * NOTE: The 'size' parameter is in machine-dependent words and
     *       that the actual size of any binary terms in the message
     *       are not included.
     *
     * @param receiver the PID (string form) of the receiver
     * @param size the size of the message being delivered (words)
     * @param queue_len length of the queue of the receiving process
     * @param token_label for the sender's sequential trace token
     * @param token_previous count for the sender's sequential trace token
     * @param token_current count for the sender's sequential trace token
     */
    probe message__queued(char *receiver, uint32_t size, uint32_t queue_len,
                        int token_label, int token_previous, int token_current);

    /**
     * Fired when a message is 'receive'd by a local process and removed
     * from its mailbox.
     *
     * NOTE: The 'size' parameter is in machine-dependent words and
     *       that the actual size of any binary terms in the message
     *       are not included.
     *
     * @param receiver the PID (string form) of the receiver
     * @param size the size of the message being delivered (words)
     * @param queue_len length of the queue of the receiving process
     * @param token_label for the sender's sequential trace token
     * @param token_previous count for the sender's sequential trace token
     * @param token_current count for the sender's sequential trace token
     */
    probe message__receive(char *receiver, uint32_t size, uint32_t queue_len,
                        int token_label, int token_previous, int token_current);

    /* ... */

    /* Async driver pool */

    /**
     * Show the post-add length of the async driver thread pool member's queue.
     *
     * NOTE: The port name is not available: additional lock(s) must
     *       be acquired in order to get the port name safely in an SMP
     *       environment.  The same is true for the aio__pool_get probe.
     *
     * @param port the Port (string form)
     * @param new queue length
     */
    probe aio_pool__add(char *, int);

    /**
     * Show the post-get length of the async driver thread pool member's queue.
     *
     * @param port the Port (string form)
     * @param new queue length
     */
    probe aio_pool__get(char *, int);

    /* Probes for efile_drv.c */

    /**
     * Entry into the efile_drv.c file I/O driver
     *
     * For a list of command numbers used by this driver, see the section
     * "Guide to probe arguments" in ../../../README.md.  That section
     * also contains explanation of the various integer and string
     * arguments that may be present when any particular probe fires.
     *
     * TODO: Adding the port string, args[10], is a pain.  Making that
     *       port string available to all the other efile_drv.c probes
     *       will be more pain.  Is the pain worth it?  If yes, then
     *       add them everywhere else and grit our teeth.  If no, then
     *       rip it out.
     *
     * @param thread-id number of the scheduler Pthread                   arg0
     * @param tag number: {thread-id, tag} uniquely names a driver operation
     * @param user-tag string                                             arg2
     * @param command number                                              arg3
     * @param string argument 1                                           arg4
     * @param string argument 2                                           arg5
     * @param integer argument 1                                          arg6
     * @param integer argument 2                                          arg7
     * @param integer argument 3                                          arg8
     * @param integer argument 4                                          arg9
     * @param port the port ID of the busy port                       args[10]
     */
    probe efile_drv__entry(int, int, char *, int, char *, char *,
                           int64_t, int64_t, int64_t, int64_t, char *);

    /**
     * Entry into the driver's internal work function.  Computation here
     * is performed by a async worker pool Pthread.
     *
     * @param thread-id number
     * @param tag number
     * @param command number
     */
    probe efile_drv__int_entry(int, int, int);

    /**
     * Return from the driver's internal work function.
     *
     * @param thread-id number
     * @param tag number
     * @param command number
     */
    probe efile_drv__int_return(int, int, int);

    /**
     * Return from the efile_drv.c file I/O driver
     *
     * @param thread-id number                                            arg0
     * @param tag number                                                  arg1
     * @param user-tag string                                             arg2
     * @param command number                                              arg3
     * @param Success? 1 is success, 0 is failure                         arg4
     * @param If failure, the errno of the error.                         arg5
     */
    probe efile_drv__return(int, int, char *, int, int, int);

Guide to efile_drv.c probe arguments
------------------------------------

    /* Driver op code: used by efile_drv-entry      arg3 */
    /*                 used by efile_drv-int_entry  arg3 */
    /*                 used by efile_drv-int_return arg3 */
    /*                 used by efile_drv-return     arg3 */

    #define FILE_OPEN            1                 (probe arg3)
            probe arg6 = C driver dt_i1 = flags;
            probe arg4 = C driver dt_s1 = path;

    #define FILE_READ            2                 (probe arg3)
            probe arg6 = C driver dt_i1 = fd;
            probe arg7 = C driver dt_i2 = flags;
            probe arg8 = C driver dt_i3 = size;

    #define FILE_LSEEK           3                 (probe arg3)
            probe arg6 = C driver dt_i1 = fd;
            probe arg7 = C driver dt_i2 = offset;
            probe arg8 = C driver dt_i3 = origin;

    #define FILE_WRITE           4                 (probe arg3)
            probe arg6 = C driver dt_i1 = fd;
            probe arg7 = C driver dt_i2 = flags;
            probe arg8 = C driver dt_i3 = size;

    #define FILE_FSTAT           5                 (probe arg3)
            probe arg6 = C driver dt_i1 = fd;

    #define FILE_PWD             6                 (probe arg3)
            none

    #define FILE_READDIR         7                 (probe arg3)
            probe arg4 = C driver dt_s1 = path;

    #define FILE_CHDIR           8                 (probe arg3)
            probe arg4 = C driver dt_s1 = path;

    #define FILE_FSYNC           9                 (probe arg3)
                probe arg6 = C driver dt_i1 = fd;

    #define FILE_MKDIR          10                 (probe arg3)
            probe arg4 = C driver dt_s1 = path;

    #define FILE_DELETE         11                 (probe arg3)
            probe arg4 = C driver dt_s1 = path;

    #define FILE_RENAME         12                 (probe arg3)
            probe arg4 = C driver dt_s1 = old_name;
            probe arg5 = C driver dt_s2 = new_name;

    #define FILE_RMDIR          13                 (probe arg3)
            probe arg4 = C driver dt_s1 = path;

    #define FILE_TRUNCATE       14                 (probe arg3)
            probe arg6 = C driver dt_i1 = fd;
            probe arg7 = C driver dt_i2 = flags;

    #define FILE_READ_FILE      15                 (probe arg3)
            probe arg4 = C driver dt_s1 = path;

    #define FILE_WRITE_INFO     16                 (probe arg3)
            probe arg6 = C driver dt_i1 = mode;
            probe arg7 = C driver dt_i2 = uid;
            probe arg8 = C driver dt_i3 = gid;

    #define FILE_LSTAT          19                 (probe arg3)
            probe arg4 = C driver dt_s1 = path;

    #define FILE_READLINK       20                 (probe arg3)
            probe arg4 = C driver dt_s1 = path;

    #define FILE_LINK           21                 (probe arg3)
            probe arg4 = C driver dt_s1 = existing_path;
            probe arg5 = C driver dt_s2 = new_path;

    #define FILE_SYMLINK        22                 (probe arg3)
            probe arg4 = C driver dt_s1 = existing_path;
            probe arg5 = C driver dt_s2 = new_path;

    #define FILE_CLOSE          23                 (probe arg3)
            probe arg6 = C driver dt_i1 = fd;
            probe arg7 = C driver dt_i2 = flags;

    #define FILE_PWRITEV        24                 (probe arg3)
            probe arg6 = C driver dt_i1 = fd;
            probe arg7 = C driver dt_i2 = flags;
            probe arg8 = C driver dt_i3 = size;

    #define FILE_PREADV         25                 (probe arg3)
            probe arg6 = C driver dt_i1 = fd;
            probe arg7 = C driver dt_i2 = flags;
            probe arg8 = C driver dt_i3 = size;

    #define FILE_SETOPT         26                 (probe arg3)
            probe arg6 = C driver dt_i1 = opt_name;
            probe arg7 = C driver dt_i2 = opt_specific_value;

    #define FILE_IPREAD         27                 (probe arg3)
            probe arg6 = C driver dt_i1 = fd;
            probe arg7 = C driver dt_i2 = flags;
            probe arg8 = C driver dt_i3 = offsets[0];
            probe arg9 = C driver dt_i4 = size;

    #define FILE_ALTNAME        28                 (probe arg3)
            probe arg4 = C driver dt_s1 = path;

    #define FILE_READ_LINE      29                 (probe arg3)
            probe arg6 = C driver dt_i1 = fd;
            probe arg7 = C driver dt_i2 = flags;
            probe arg8 = C driver dt_i3 = read_offset;
            probe arg9 = C driver dt_i4 = read_ahead;

    #define FILE_FDATASYNC      30                 (probe arg3)
            probe arg6 = C driver dt_i1 = fd;

    #define FILE_FADVISE        31                 (probe arg3)
            probe arg6 = C driver dt_i1 = fd;
            probe arg7 = C driver dt_i2 = offset;
            probe arg8 = C driver dt_i3 = length;
            probe arg9 = C driver dt_i4 = advise_type;

   [1]: http://www.erlang.org/euc/08/