diff options
author | Erlang/OTP <[email protected]> | 2010-02-10 13:03:54 +0000 |
---|---|---|
committer | Erlang/OTP <[email protected]> | 2010-02-10 13:03:54 +0000 |
commit | ada6afd00530d6569c41741cfd9d63311ff60f25 (patch) | |
tree | 3ff92e054e2babf7478fa0a96e7d82fd42ab5595 /lib/tools/doc/src/lcnt_chapter.xml | |
parent | e7d6097b0f015d5a489ea3a3ad71e064a87c0576 (diff) | |
parent | 9a22cca549f88f955163e165b8849a6129925e8b (diff) | |
download | otp-ada6afd00530d6569c41741cfd9d63311ff60f25.tar.gz otp-ada6afd00530d6569c41741cfd9d63311ff60f25.tar.bz2 otp-ada6afd00530d6569c41741cfd9d63311ff60f25.zip |
Merge branch 'egil/lcnt' into ccase/r13b04_dev
* egil/lcnt:
Add test suite for lcnt in tools
Add lcnt:rt_opt/1 bindings to erts_debug
Add runtime option to enable/disable lcnt stats
Add auto width on string output
Add lcnt documentation
Add lock profiling tool
OTP-8424 Add lock profiling tool.
The Lock profiling tool, lcnt, can make use of the internal lock
statistics when the runtime system is built with this feature
enabled.
This provides a mechanism to examine potential lock bottlenecks
within the runtime itself.
- Add erts_debug:lock_counters({copy_save, bool()}). This option
enables or disables statistics saving for destroyed processes and
ets-tables. Enabling this might consume a lot of memory.
- Add id-numbering for lock classes which is otherwise undefined.
Diffstat (limited to 'lib/tools/doc/src/lcnt_chapter.xml')
-rw-r--r-- | lib/tools/doc/src/lcnt_chapter.xml | 301 |
1 files changed, 301 insertions, 0 deletions
diff --git a/lib/tools/doc/src/lcnt_chapter.xml b/lib/tools/doc/src/lcnt_chapter.xml new file mode 100644 index 0000000000..8f44b23f59 --- /dev/null +++ b/lib/tools/doc/src/lcnt_chapter.xml @@ -0,0 +1,301 @@ +<?xml version="1.0" encoding="latin1" ?> +<!DOCTYPE chapter SYSTEM "chapter.dtd"> + +<chapter> + <header> + <copyright> + <year>2009</year><year>2010</year> + <holder>Ericsson AB. All Rights Reserved.</holder> + </copyright> + <legalnotice> + The contents of this file are subject to the Erlang Public License, + Version 1.1, (the "License"); you may not use this file except in + compliance with the License. You should have received a copy of the + Erlang Public License along with this software. If not, it can be + retrieved online at http://www.erlang.org/. + + Software distributed under the License is distributed on an "AS IS" + basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See + the License for the specific language governing rights and limitations + under the License. + + </legalnotice> + + <title>lcnt - The Lock Profiler</title> + <prepared>Björn-Egil Dahlberg</prepared> + <responsible>nobody</responsible> + <docno></docno> + <approved>nobody</approved> + <checked>no</checked> + <date>2009-11-26</date> + <rev>PA1</rev> + <file>lcnt_chapter.xml</file> + </header> + <p> + Internally in the Erlang runtime system locks are used to protect resources from being updated from multiple threads in a fatal way. Locks are necessary + to ensure that the runtime system works properly but it also introduces a couple of limitations. Lock contention and locking overhead. + </p> + <p> + With lock contention we mean when one thread locks a resource and another thread, or threads, tries to acquire the same resource at the same time. The lock will deny + the other thread access to the resource and the thread will be blocked from continuing its execution. The second thread has to wait until the first thread has + completed its access to the resource and unlocked it. The <c>lcnt</c> tool measures these lock conflicts. + </p> + <p> + Locks has an inherent cost in execution time and memory space. It takes time initialize, destroy, aquiring or releasing locks. To decrease lock contention it + some times necessary to use finer grained locking strategies. This will usually also increase the locking overhead and hence there is a tradeoff + between lock contention and overhead. In general, lock contention increases with the number of threads running concurrently. The <c>lcnt</c> tool does not measure locking overhead. + </p> + <section> + <title> Enabling lock-counting </title> + <p>For investigation of locks in the emulator we use an internal tool called <c>lcnt</c> (short for lock-count). The VM needs to be compiled with this option enabled. To enable this, use:</p> + + <pre> +cd $ERL_TOP +./configure --enable-lock-counter + </pre> + + <p> + Another way to enable this alongside a normal VM is to compile it at emulator directory level, much like a debug build. To compile it this way do the following, + </p> + <pre> +cd $ERL_TOP/erts/emulator +make lcnt FLAVOR=smp + </pre> + <p> and then starting Erlang with,</p> + <pre> +$ERL_TOP/bin/cerl -lcnt + </pre> + <p>To verify that you lock-counting enabled check that <c>[lock-counting]</c> appears in the status text when the VM is started.</p> + <pre> +Erlang R13B03 (erts-5.7.4) [source] [64-bit] [smp:8:8] [rq:8] [async-threads:0] [hipe] + [kernel-poll:false] [lock-counting] + </pre> + </section> + <section> + <title>Getting started</title> + <p>Once you have a lock counting enabled VM the module <c>lcnt</c> can be used. The module is intended to be used from the current running nodes shell. To access remote nodes use <c>lcnt:clear(Node)</c> and <c>lcnt:collect(Node)</c>. </p> + + <p>All locks are continuously monitored and its statistics updated. Use <c>lcnt:clear/0</c> to initially clear all counters before running any specific tests. This command will also reset the duration timer internally.</p> + <p>To retrieve lock statistics information use, <c>lcnt:collect/0,1</c>. The collect operation will start a <c>lcnt</c> server if it not already started. All collected data will be built into an erlang term and uploaded to the server and a duration time will also be uploaded. This duration is the time between <c>lcnt:clear/0,1</c> and <c>lcnt:collect/0,1</c>.</p> + + <p>Once the data is collected to the server it can be filtered, sorted and printed in many different ways.</p> + <p>See the <seealso marker="lcnt">reference manual</seealso> for a description of each function.</p> + </section> + <section> + <title> Example of usage </title> + <p>From the Erlang shell:</p> + <pre> +Erlang R13B03 (erts-5.7.4) [source] [smp:8:8] [rq:8] [async-threads:0] [hipe] + [kernel-poll:false] [lock-counting] +1> lcnt:rt_opt({copy_save, true}). +false +2> lcnt:clear(), big:bang(1000), lcnt:collect(). +ok +3> lcnt:conflicts(). + lock id #tries #collisions collisions [%] time [us] duration [%] + ----- --- ------- ------------ --------------- ---------- ------------- + alcu_allocator 50 4113692 158921 3.8632 215464 4.4962 + pix_lock 256 4007140 4882 0.1218 12221 0.2550 + run_queue 8 2287246 6949 0.3038 9825 0.2050 + proc_main 1029 3115778 25755 0.8266 1199 0.0250 + proc_msgq 1029 2467022 1910 0.0774 1048 0.0219 + proc_status 1029 5708439 2435 0.0427 706 0.0147 + message_pre_alloc_lock 8 2008569 134 0.0067 90 0.0019 + timeofday 1 54065 8 0.0148 22 0.0005 + gc_info 1 7071 7 0.0990 5 0.0001 +ok +</pre> +<p> + Another way to to profile a specific function is to use <c>lcnt:apply/3</c> or <c>lcnt:apply/1</c> which does <c>lcnt:clear/0</c> before the function and <c>lcnt:collect/0</c> after its invocation. + It also sets <c>copy_save</c> to <c>true</c> for the duration of the function call +</p> +<pre> +Erlang R13B03 (erts-5.7.4) [source] [smp:8:8] [rq:8] [async-threads:0] [hipe] + [kernel-poll:false] [lock-counting] +1> lcnt:apply(fun() -> big:bang(1000) end). +4384.338 +2> lcnt:conflicts(). + lock id #tries #collisions collisions [%] time [us] duration [%] + ----- --- ------- ------------ --------------- ---------- ------------- + alcu_allocator 50 4117913 183091 4.4462 234232 5.1490 + run_queue 8 2050398 3801 0.1854 6700 0.1473 + pix_lock 256 4007080 4943 0.1234 2847 0.0626 + proc_main 1028 3000178 28247 0.9415 1022 0.0225 + proc_msgq 1028 2293677 1352 0.0589 545 0.0120 + proc_status 1028 5258029 1744 0.0332 442 0.0097 + message_pre_alloc_lock 8 2009322 147 0.0073 82 0.0018 + timeofday 1 48616 9 0.0185 13 0.0003 + gc_info 1 7455 12 0.1610 9 0.0002 +ok +</pre> +<p> The process locks are sorted after its class like all other locks. It is convenient to look at specific processes and ports as classes. We can do this by swapping class and class identifiers with <c>lcnt:swap_pid_keys/0</c>. </p> +<pre> +3> lcnt:swap_pid_keys(). +ok +4> lcnt:conflicts([{print, [name, tries, ratio, time]}]). + lock #tries collisions [%] time [us] + ----- ------- --------------- ---------- + alcu_allocator 4117913 4.4462 234232 + run_queue 2050398 0.1854 6700 + pix_lock 4007080 0.1234 2847 + message_pre_alloc_lock 2009322 0.0073 82 + <[email protected]> 13493 1.4452 41 + <[email protected]> 13504 1.1404 36 + <[email protected]> 13181 1.6235 35 + <[email protected]> 13534 0.8202 22 + <[email protected]> 8744 5.8326 22 + <[email protected]> 13335 1.1174 19 + <[email protected]> 13452 1.3678 19 + <[email protected]> 13497 1.8745 18 + <[email protected]> 11009 2.5343 18 + <[email protected]> 13131 1.2566 16 + <[email protected]> 13216 1.7327 15 + <[email protected]> 13156 1.1098 15 + <[email protected]> 13420 0.7303 14 + <[email protected]> 13141 1.6437 14 + <[email protected]> 13346 1.2064 13 + <[email protected]> 13076 1.1701 13 +ok +</pre> + </section> + <section> + <title> Example with Mnesia Transaction Benchmark </title> + <p>From the Erlang shell:</p> +<pre> +Erlang R13B03 (erts-5.7.4) [source] [smp:8:8] [rq:8] [async-threads:0] [hipe] + [kernel-poll:false] [lock-counting] + +Eshell V5.7.4 (abort with ^G) +1> Conf=[{db_nodes, [node()]}, {driver_nodes, [node()]}, {replica_nodes, [node()]}, + {n_drivers_per_node, 10}, {n_branches, 1000}, {n_accounts_per_branch, 10}, + {replica_type, ram_copies}, {stop_after, 60000}, {reuse_history_id, true}]. +[{db_nodes,[nonode@nohost]}, + {driver_nodes,[nonode@nohost]}, + {replica_nodes,[nonode@nohost]}, + {n_drivers_per_node,10}, + {n_branches,1000}, + {n_accounts_per_branch,10}, + {replica_type,ram_copies}, + {stop_after,60000}, + {reuse_history_id,true}] +2> mnesia_tpcb:init([{use_running_mnesia, false}|Conf]). +ignore +</pre> +<p>Initial configuring of the benchmark is done. It is time to profile the actual benchmark and Mnesia</p> +<pre> +3> lcnt:apply(fun() -> {ok,{time, Tps,_,_,_,_}} = mnesia_tpcb:run([{use_running_mnesia, + true}|Conf]), Tps/60 end). +12037.483333333334 +ok +4> lcnt:swap_pid_keys(). +ok +</pre> +<p>The <c>id</c> header represents the number of unique identifiers under a class when the option <c>{combine, true}</c> is used (which is on by default). It will otherwise show the specific identifier. +The <c>db_tab</c> listing shows 722287 unique locks, it is one for each ets-table created and Mnesia creates one for each transaction. +</p> +<pre> +5> lcnt:conflicts(). + lock id #tries #collisions collisions [%] time [us] duration [%] + ----- --- ------- ------------ --------------- ---------- ------------- + alcu_allocator 50 56355118 732662 1.3001 2934747 4.8862 + db_tab 722287 94513441 63203 0.0669 1958797 3.2613 + timeofday 1 2701048 175854 6.5106 1746079 2.9071 + pix_lock 256 24306168 163214 0.6715 918309 1.5289 + run_queue 8 11813811 152637 1.2920 357040 0.5945 + message_pre_alloc_lock 8 17671449 57203 0.3237 263043 0.4380 + mnesia_locker 4 17477633 1618548 9.2607 97092 0.1617 + mnesia_tm 4 9891408 463788 4.6888 86353 0.1438 + gc_info 1 823460 628 0.0763 24826 0.0413 + meta_main_tab_slot 16 41393400 7193 0.0174 11393 0.0190 + <[email protected]> 4 4331412 333 0.0077 7148 0.0119 + timer_wheel 1 203185 30 0.0148 3108 0.0052 + <[email protected]> 4 4291098 210 0.0049 885 0.0015 + <[email protected]> 4 4294702 288 0.0067 442 0.0007 + <[email protected]> 4 4346066 235 0.0054 390 0.0006 + <[email protected]> 4 4348159 287 0.0066 379 0.0006 + <[email protected]> 4 4279309 290 0.0068 325 0.0005 + <[email protected]> 4 4292190 302 0.0070 315 0.0005 + <[email protected]> 4 4208858 265 0.0063 276 0.0005 + <[email protected]> 4 4377502 267 0.0061 276 0.0005 +ok +</pre> +<p>The listing shows <c>mnesia_locker</c>, a process, has highly contended locks.</p> +<pre> +6> lcnt:inspect(mnesia_locker). + lock id #tries #collisions collisions [%] time [us] duration [%] + ----- --- ------- ------------ --------------- ---------- ------------- + mnesia_locker proc_msgq 5449930 59374 1.0894 69781 0.1162 + mnesia_locker proc_main 4462782 1487374 33.3284 14398 0.0240 + mnesia_locker proc_status 7564921 71800 0.9491 12913 0.0215 + mnesia_locker proc_link 0 0 0.0000 0 0.0000 +ok +</pre> +<p>Listing without class combiner.</p> + +<pre> +7> lcnt:conflicts([{combine, false}, {print, [name, id, tries, ratio, time]}]). + lock id #tries collisions [%] time [us] + ----- --- ------- --------------- ---------- + db_tab mnesia_transient_decision 722250 3.9463 1856852 + timeofday undefined 2701048 6.5106 1746079 + alcu_allocator ets_alloc 7490696 2.2737 692655 + alcu_allocator ets_alloc 7081771 2.3294 664522 + alcu_allocator ets_alloc 7047750 2.2520 658495 + alcu_allocator ets_alloc 5883537 2.3177 610869 + pix_lock 58 11011355 1.1924 564808 + pix_lock 60 4426484 0.7120 262490 + alcu_allocator ets_alloc 1897004 2.4248 219543 + message_pre_alloc_lock undefined 4211267 0.3242 128299 + run_queue 3 2801555 1.3003 116792 + run_queue 2 2799988 1.2700 100091 + run_queue 1 2966183 1.2712 78834 + mnesia_locker proc_msgq 5449930 1.0894 69781 + message_pre_alloc_lock undefined 3495672 0.3262 65773 + message_pre_alloc_lock undefined 4189752 0.3174 58607 + mnesia_tm proc_msgq 2094144 1.7184 56361 + run_queue 4 2343585 1.3115 44300 + db_tab branch 1446529 0.5229 38244 + gc_info undefined 823460 0.0763 24826 +ok +</pre> +<p> +In this scenario the lock that protects ets-table <c>mnesia_transient_decision</c> has spent most of its waiting for. That is 1.8 seconds in a test that run for 60 seconds. The time is also spread on eight different scheduler threads. +</p> +<pre> +8> lcnt:inspect(db_tab, [{print, [name, id, tries, colls, ratio, duration]}]). + lock id #tries #collisions collisions [%] duration [%] + ----- --- ------- ------------ --------------- ------------- + db_tab mnesia_transient_decision 722250 28502 3.9463 3.0916 + db_tab branch 1446529 7564 0.5229 0.0637 + db_tab account 1464500 8203 0.5601 0.0357 + db_tab teller 1464529 8110 0.5538 0.0291 + db_tab history 722250 3767 0.5216 0.0232 + db_tab mnesia_stats 750332 7057 0.9405 0.0180 + db_tab mnesia_trans_store 61 0 0.0000 0.0000 + db_tab mnesia_trans_store 61 0 0.0000 0.0000 + db_tab mnesia_trans_store 53 0 0.0000 0.0000 + db_tab mnesia_trans_store 53 0 0.0000 0.0000 + db_tab mnesia_trans_store 53 0 0.0000 0.0000 + db_tab mnesia_trans_store 53 0 0.0000 0.0000 + db_tab mnesia_trans_store 53 0 0.0000 0.0000 + db_tab mnesia_trans_store 53 0 0.0000 0.0000 + db_tab mnesia_trans_store 53 0 0.0000 0.0000 + db_tab mnesia_trans_store 53 0 0.0000 0.0000 + db_tab mnesia_trans_store 53 0 0.0000 0.0000 + db_tab mnesia_trans_store 53 0 0.0000 0.0000 + db_tab mnesia_trans_store 53 0 0.0000 0.0000 + db_tab mnesia_trans_store 53 0 0.0000 0.0000 +ok +</pre> + + </section> + <section> + <title> Deciphering the output </title> + <p> Typically high <c>time</c> values are bad and this is often the thing to look for. However, one should also look for high lock acquisition frequencies (#tries) since locks generate overhead and because high frequency could become problematic if they begin to have conflicts even if it is not shown in a particular test. </p> + </section> + + <section> + <title>See Also</title> + <p> <seealso marker="lcnt">LCNT Reference Manual</seealso></p> + </section> +</chapter> |