aboutsummaryrefslogtreecommitdiffstats
path: root/lib/tools/doc/src/lcnt_chapter.xml
blob: 1981d66117f11cf2e63e7e31750700e0b71f8b61 (plain) (blame)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE chapter SYSTEM "chapter.dtd">

<chapter>
    <header>
	<copyright>
	    <year>2009</year><year>2016</year>
	    <holder>Ericsson AB. All Rights Reserved.</holder>
	</copyright>
	<legalnotice>
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
 
          http://www.apache.org/licenses/LICENSE-2.0

      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License.

	</legalnotice>

	<title>lcnt - The Lock Profiler</title>
	<prepared>Björn-Egil Dahlberg</prepared>
	<responsible>nobody</responsible>
	<docno></docno>
	<approved>nobody</approved>
	<checked>no</checked>
	<date>2009-11-26</date>
	<rev>PA1</rev>
	<file>lcnt_chapter.xml</file>
    </header>
    <p>
	Internally in the Erlang runtime system locks are used to protect resources from being updated from multiple threads in a fatal way. Locks are necessary
	to ensure that the runtime system works properly but it also introduces a couple of limitations. Lock contention and locking overhead.
    </p>
    <p>
	With lock contention we mean when one thread locks a resource and another thread, or threads, tries to acquire the same resource at the same time. The lock will deny
	the other thread access to the resource and the thread will be blocked from continuing its execution. The second thread has to wait until the first thread has
	completed its access to the resource and unlocked it. The <c>lcnt</c> tool measures these lock conflicts.
    </p>
    <p>
	Locks have an inherent cost in execution time and memory space. It takes time initialize, destroy, aquiring or releasing locks. To decrease lock contention it
	some times necessary to use finer grained locking strategies. This will usually also increase the locking overhead and hence there is a tradeoff
	between lock contention and overhead. In general, lock contention increases with the number of threads running concurrently. The <c>lcnt</c> tool does not measure locking overhead.
    </p>
    <section>
	<title> Enabling lock-counting </title>
	<p>For investigation of locks in the emulator we use an internal tool called <c>lcnt</c> (short for lock-count). The VM needs to be compiled with this option enabled.
	To compile a lock-counting VM along with a normal VM, use:</p>

	<pre>
cd $ERL_TOP
./configure --enable-lock-counter</pre>
	<p>Start the lock-counting VM like this:</p>
	<pre>
$ERL_TOP/bin/erl -emu_type lcnt</pre>
	<p>To verify that lock counting is enabled check that <c>[lock-counting]</c> appears in the status text when the VM is started.</p>
	<pre>
Erlang/OTP 20 [erts-9.0] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:10] [hipe]
 [kernel-poll:false] [lock-counting]</pre>
    </section>
    <section>
	<title>Getting started</title>
	<p>Once you have a lock counting enabled VM the module <c>lcnt</c> can be used. The module is intended to be used from the current running nodes shell. To access remote nodes use <c>lcnt:clear(Node)</c> and <c>lcnt:collect(Node)</c>. </p>

	<p>All locks are continuously monitored and its statistics updated. Use <c>lcnt:clear/0</c> to initially clear all counters before running any specific tests. This command will also reset the duration timer internally.</p>
	<p>To retrieve lock statistics information, use <c>lcnt:collect/0,1</c>. The collect operation will start a <c>lcnt</c> server if it not already started. All collected data will be built into an Erlang term and uploaded to the server and a duration time will also be uploaded. This duration is the time between <c>lcnt:clear/0,1</c> and <c>lcnt:collect/0,1</c>.</p>

	<p>Once the data is collected to the server it can be filtered, sorted and printed in many different ways.</p>
	<p>See the <seealso marker="lcnt">reference manual</seealso> for a description of each function.</p>
    </section>
    <section>
	<title> Example of usage </title>
	<p>From the Erlang shell:</p>
	<pre>
Erlang R13B03 (erts-5.7.4) [source] [smp:8:8] [rq:8] [async-threads:0] [hipe]
 [kernel-poll:false] [lock-counting]
1> lcnt:rt_opt({copy_save, true}).
false
2> lcnt:clear(), big:bang(1000), lcnt:collect().
ok
3> lcnt:conflicts().
                   lock   id  #tries  #collisions  collisions [%]  time [us]  duration [%]
                  -----  --- ------- ------------ --------------- ---------- -------------
         alcu_allocator   50 4113692       158921          3.8632     215464        4.4962
               pix_lock  256 4007140         4882          0.1218      12221        0.2550
              run_queue    8 2287246         6949          0.3038       9825        0.2050
              proc_main 1029 3115778        25755          0.8266       1199        0.0250
              proc_msgq 1029 2467022         1910          0.0774       1048        0.0219
            proc_status 1029 5708439         2435          0.0427        706        0.0147
 message_pre_alloc_lock    8 2008569          134          0.0067         90        0.0019
              timeofday    1   54065            8          0.0148         22        0.0005
                gc_info    1    7071            7          0.0990          5        0.0001
ok
</pre>
<p>
    Another way to to profile a specific function is to use <c>lcnt:apply/3</c> or <c>lcnt:apply/1</c> which does <c>lcnt:clear/0</c> before the function and <c>lcnt:collect/0</c> after its invocation.
    It also sets <c>copy_save</c> to <c>true</c> for the duration of the function call
</p>
<pre>
Erlang R13B03 (erts-5.7.4) [source] [smp:8:8] [rq:8] [async-threads:0] [hipe]
 [kernel-poll:false] [lock-counting]
1> lcnt:apply(fun() -> big:bang(1000) end).
4384.338
2> lcnt:conflicts().
                   lock   id  #tries  #collisions  collisions [%]  time [us]  duration [%]
                  -----  --- ------- ------------ --------------- ---------- -------------
         alcu_allocator   50 4117913       183091          4.4462     234232        5.1490
              run_queue    8 2050398         3801          0.1854       6700        0.1473
               pix_lock  256 4007080         4943          0.1234       2847        0.0626
              proc_main 1028 3000178        28247          0.9415       1022        0.0225
              proc_msgq 1028 2293677         1352          0.0589        545        0.0120
            proc_status 1028 5258029         1744          0.0332        442        0.0097
 message_pre_alloc_lock    8 2009322          147          0.0073         82        0.0018
              timeofday    1   48616            9          0.0185         13        0.0003
                gc_info    1    7455           12          0.1610          9        0.0002
ok
</pre>
<p> The process locks are sorted after its class like all other locks. It is convenient to look at specific processes and ports as classes. We can do this by swapping class and class identifiers with <c>lcnt:swap_pid_keys/0</c>.  </p>
<pre>
3> lcnt:swap_pid_keys().
ok
4> lcnt:conflicts([{print, [name, tries, ratio, time]}]).
                   lock  #tries  collisions [%]  time [us]
                  ----- ------- --------------- ----------
         alcu_allocator 4117913          4.4462     234232
              run_queue 2050398          0.1854       6700
               pix_lock 4007080          0.1234       2847
 message_pre_alloc_lock 2009322          0.0073         82
  &lt;[email protected]&gt;   13493          1.4452         41
  &lt;[email protected]&gt;   13504          1.1404         36
  &lt;[email protected]&gt;   13181          1.6235         35
  &lt;[email protected]&gt;   13534          0.8202         22
   &lt;[email protected]&gt;    8744          5.8326         22
  &lt;[email protected]&gt;   13335          1.1174         19
  &lt;[email protected]&gt;   13452          1.3678         19
  &lt;[email protected]&gt;   13497          1.8745         18
  &lt;[email protected]&gt;   11009          2.5343         18
  &lt;[email protected]&gt;   13131          1.2566         16
  &lt;[email protected]&gt;   13216          1.7327         15
  &lt;[email protected]&gt;   13156          1.1098         15
  &lt;[email protected]&gt;   13420          0.7303         14
  &lt;[email protected]&gt;   13141          1.6437         14
  &lt;[email protected]&gt;   13346          1.2064         13
  &lt;[email protected]&gt;   13076          1.1701         13
ok
</pre>
      </section>
      <section>
	<title> Example with Mnesia Transaction Benchmark </title>
	<p>From the Erlang shell:</p>
<pre>
Erlang R13B03 (erts-5.7.4) [source] [smp:8:8] [rq:8] [async-threads:0] [hipe]
 [kernel-poll:false] [lock-counting]

Eshell V5.7.4  (abort with ^G)
1> Conf=[{db_nodes, [node()]}, {driver_nodes, [node()]}, {replica_nodes, [node()]},
 {n_drivers_per_node, 10}, {n_branches, 1000}, {n_accounts_per_branch, 10},
 {replica_type, ram_copies}, {stop_after, 60000}, {reuse_history_id, true}].
[{db_nodes,[nonode@nohost]},
 {driver_nodes,[nonode@nohost]},
 {replica_nodes,[nonode@nohost]},
 {n_drivers_per_node,10},
 {n_branches,1000},
 {n_accounts_per_branch,10},
 {replica_type,ram_copies},
 {stop_after,60000},
 {reuse_history_id,true}]
2> mnesia_tpcb:init([{use_running_mnesia, false}|Conf]).
ignore
</pre>
<p>Initial configuring of the benchmark is done. It is time to profile the actual benchmark and Mnesia</p>
<pre>
3> lcnt:apply(fun() -> {ok,{time, Tps,_,_,_,_}} = mnesia_tpcb:run([{use_running_mnesia,
 true}|Conf]), Tps/60 end).
12037.483333333334
ok
4> lcnt:swap_pid_keys().
ok
</pre>
<p>The <c>id</c> header represents the number of unique identifiers under a class when the option <c>{combine, true}</c> is used (which is on by default). It will otherwise show the specific identifier.
The <c>db_tab</c> listing shows 722287 unique locks, it is one for each ets-table created and Mnesia creates one for each transaction.
</p>
<pre>
5> lcnt:conflicts().
                   lock     id   #tries  #collisions  collisions [%]  time [us]  duration [%]
                  -----    ---  ------- ------------ --------------- ---------- -------------
         alcu_allocator     50 56355118       732662          1.3001    2934747        4.8862
                 db_tab 722287 94513441        63203          0.0669    1958797        3.2613
              timeofday      1  2701048       175854          6.5106    1746079        2.9071
               pix_lock    256 24306168       163214          0.6715     918309        1.5289
              run_queue      8 11813811       152637          1.2920     357040        0.5945
 message_pre_alloc_lock      8 17671449        57203          0.3237     263043        0.4380
          mnesia_locker      4 17477633      1618548          9.2607      97092        0.1617
              mnesia_tm      4  9891408       463788          4.6888      86353        0.1438
                gc_info      1   823460          628          0.0763      24826        0.0413
     meta_main_tab_slot     16 41393400         7193          0.0174      11393        0.0190
 &lt;[email protected]&gt;      4  4331412          333          0.0077       7148        0.0119
            timer_wheel      1   203185           30          0.0148       3108        0.0052
 &lt;[email protected]&gt;      4  4291098          210          0.0049        885        0.0015
 &lt;[email protected]&gt;      4  4294702          288          0.0067        442        0.0007
 &lt;[email protected]&gt;      4  4346066          235          0.0054        390        0.0006
 &lt;[email protected]&gt;      4  4348159          287          0.0066        379        0.0006
 &lt;[email protected]&gt;      4  4279309          290          0.0068        325        0.0005
 &lt;[email protected]&gt;      4  4292190          302          0.0070        315        0.0005
 &lt;[email protected]&gt;      4  4208858          265          0.0063        276        0.0005
 &lt;[email protected]&gt;      4  4377502          267          0.0061        276        0.0005
ok
</pre>
<p>The listing shows <c>mnesia_locker</c>, a process, has highly contended locks.</p>
<pre>
6> lcnt:inspect(mnesia_locker).
          lock          id  #tries  #collisions  collisions [%]  time [us]  duration [%]
         -----         --- ------- ------------ --------------- ---------- -------------
 mnesia_locker   proc_msgq 5449930        59374          1.0894      69781        0.1162
 mnesia_locker   proc_main 4462782      1487374         33.3284      14398        0.0240
 mnesia_locker proc_status 7564921        71800          0.9491      12913        0.0215
 mnesia_locker   proc_link       0            0          0.0000          0        0.0000
ok
</pre>
<p>Listing without class combiner.</p>

<pre>
7> lcnt:conflicts([{combine, false}, {print, [name, id, tries, ratio, time]}]).
                   lock                        id   #tries  collisions [%]  time [us]
                  -----                       ---  ------- --------------- ----------
                 db_tab mnesia_transient_decision   722250          3.9463    1856852
              timeofday                 undefined  2701048          6.5106    1746079
         alcu_allocator                 ets_alloc  7490696          2.2737     692655
         alcu_allocator                 ets_alloc  7081771          2.3294     664522
         alcu_allocator                 ets_alloc  7047750          2.2520     658495
         alcu_allocator                 ets_alloc  5883537          2.3177     610869
               pix_lock                        58 11011355          1.1924     564808
               pix_lock                        60  4426484          0.7120     262490
         alcu_allocator                 ets_alloc  1897004          2.4248     219543
 message_pre_alloc_lock                 undefined  4211267          0.3242     128299
              run_queue                         3  2801555          1.3003     116792
              run_queue                         2  2799988          1.2700     100091
              run_queue                         1  2966183          1.2712      78834
          mnesia_locker                 proc_msgq  5449930          1.0894      69781
 message_pre_alloc_lock                 undefined  3495672          0.3262      65773
 message_pre_alloc_lock                 undefined  4189752          0.3174      58607
              mnesia_tm                 proc_msgq  2094144          1.7184      56361
              run_queue                         4  2343585          1.3115      44300
                 db_tab                    branch  1446529          0.5229      38244
                gc_info                 undefined   823460          0.0763      24826
ok
</pre>
<p>
In this scenario the lock that protects ets-table <c>mnesia_transient_decision</c> has spent most of its waiting for. That is 1.8 seconds in a test that run for 60 seconds. The time is also spread on eight different scheduler threads.
</p>
<pre>
8> lcnt:inspect(db_tab, [{print, [name, id, tries, colls, ratio, duration]}]).
   lock                        id  #tries  #collisions  collisions [%]  duration [%]
  -----                       --- ------- ------------ --------------- -------------
 db_tab mnesia_transient_decision  722250        28502          3.9463        3.0916
 db_tab                    branch 1446529         7564          0.5229        0.0637
 db_tab                   account 1464500         8203          0.5601        0.0357
 db_tab                    teller 1464529         8110          0.5538        0.0291
 db_tab                   history  722250         3767          0.5216        0.0232
 db_tab              mnesia_stats  750332         7057          0.9405        0.0180
 db_tab        mnesia_trans_store      61            0          0.0000        0.0000
 db_tab        mnesia_trans_store      61            0          0.0000        0.0000
 db_tab        mnesia_trans_store      53            0          0.0000        0.0000
 db_tab        mnesia_trans_store      53            0          0.0000        0.0000
 db_tab        mnesia_trans_store      53            0          0.0000        0.0000
 db_tab        mnesia_trans_store      53            0          0.0000        0.0000
 db_tab        mnesia_trans_store      53            0          0.0000        0.0000
 db_tab        mnesia_trans_store      53            0          0.0000        0.0000
 db_tab        mnesia_trans_store      53            0          0.0000        0.0000
 db_tab        mnesia_trans_store      53            0          0.0000        0.0000
 db_tab        mnesia_trans_store      53            0          0.0000        0.0000
 db_tab        mnesia_trans_store      53            0          0.0000        0.0000
 db_tab        mnesia_trans_store      53            0          0.0000        0.0000
 db_tab        mnesia_trans_store      53            0          0.0000        0.0000
ok
</pre>

    </section>
    <section>
	<title> Deciphering the output </title>
	<p> Typically high <c>time</c> values are bad and this is often the thing to look for. However, one should also look for high lock acquisition frequencies (#tries) since locks generate overhead and because high frequency could become problematic if they begin to have conflicts even if it is not shown in a particular test.  </p>
    </section>

    <section>
	<title>See Also</title>
	<p> <seealso marker="lcnt">LCNT Reference Manual</seealso></p>
    </section>
</chapter>