<?xml version="1.0" encoding="latin1" ?>
<!DOCTYPE chapter SYSTEM "chapter.dtd">

<chapter>
  <header>
    <copyright>
      <year>2001</year><year>2009</year>
      <holder>Ericsson AB. All Rights Reserved.</holder>
    </copyright>
    <legalnotice>
      The contents of this file are subject to the Erlang Public License,
      Version 1.1, (the "License"); you may not use this file except in
      compliance with the License. You should have received a copy of the
      Erlang Public License along with this software. If not, it can be
      retrieved online at http://www.erlang.org/.
    
      Software distributed under the License is distributed on an "AS IS"
      basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See
      the License for the specific language governing rights and limitations
      under the License.
    
    </legalnotice>

    <title>Processes</title>
    <prepared>Bjorn Gustavsson</prepared>
    <docno></docno>
    <date>2007-11-21</date>
    <rev></rev>
    <file>processes.xml</file>
  </header>

  <section>
    <title>Creation of an Erlang process</title>

    <p>An Erlang process is lightweight compared to operating
    system threads and processes.</p>

    <p>A newly spawned Erlang process uses 309 words of memory
    in the non-SMP emulator without HiPE support. (SMP support
    and HiPE support will both add to this size.) The size can
    be determined as follows:</p>

    <pre>
Erlang (BEAM) emulator version 5.6 [async-threads:0] [kernel-poll:false]

Eshell V5.6  (abort with ^G)
1> <input>Fun = fun() -> receive after infinity -> ok end end.</input>
#Fun&lt;...>
2> <input>{_,Bytes} = process_info(spawn(Fun), memory).</input>
{memory,1232}
3> <input>Bytes div erlang:system_info(wordsize).</input>
309</pre>
    
    <p>The size includes 233 words for the heap area (which includes the stack).
    The garbage collector will increase the heap as needed.</p>

    <p>The main (outer) loop for a process <em>must</em> be tail-recursive.
    If not, the stack will grow until the process terminates.</p>

    <p><em>DO NOT</em></p>
    <code type="erl">
loop() ->
    receive
        {sys, Msg} ->
            handle_sys_msg(Msg),
            loop();
        {From, Msg} ->
            Reply = handle_msg(Msg),
            From ! Reply,
            loop()
    end,
    io:format("Message is processed~n", []).</code>

    <p>The call to <c>io:format/2</c> will never be executed, but a
    return address will still be pushed to the stack each time
    <c>loop/0</c> is called recursively. The correct tail-recursive
    version of the function looks like this:</p>

    <p><em>DO</em></p>
    <code type="erl">
loop() ->
    receive
        {sys, Msg} ->
            handle_sys_msg(Msg),
            loop();
        {From, Msg} ->
            Reply = handle_msg(Msg),
            From ! Reply,
            loop()
    end.</code>

    <section>
      <title>Initial heap size</title>

      <p>The default initial heap size of 233 words is quite conservative
      in order to support Erlang systems with hundreds of thousands or
      even millions of processes. The garbage collector will grow and
      shrink the heap as needed.</p>

      <p>In a system that uses comparatively few processes, performance
      <em>might</em> be improved by increasing the minimum heap size, using either
      the <c>+h</c> option for
      <seealso marker="erts:erl">erl</seealso> or, on a per-process
      basis, the <c>min_heap_size</c> option for
      <seealso marker="erts:erlang#spawn_opt/4">spawn_opt/4</seealso>.</p>
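
      <p>As a minimal sketch of the per-process variant, a process can be
      spawned with a larger minimum heap and its actual heap size inspected
      with <c>process_info/2</c>. The module name <c>heap_demo</c> and the
      function <c>spawn_with_min_heap/1</c> are hypothetical and chosen only
      for illustration. The run-time system may round the requested size up,
      so the reported size can be larger than the one asked for:</p>

      <code type="erl">
-module(heap_demo).
-export([spawn_with_min_heap/1]).

%% Hypothetical helper: spawn an idle process whose heap will not
%% shrink below MinWords words, and return its pid together with
%% the heap size (in words) reported right after the spawn.
spawn_with_min_heap(MinWords) ->
    Idle = fun() -> receive stop -> ok end end,
    Pid = spawn_opt(Idle, [{min_heap_size, MinWords}]),
    {heap_size, Words} = process_info(Pid, heap_size),
    {Pid, Words}.</code>

      <p>The spawned process waits until it receives the atom <c>stop</c>,
      so the caller is expected to send that message when it is done.</p>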

      <p>The gain is twofold: Firstly, although the garbage collector will
      grow the heap, it will grow it step by step, which will be more
      costly than directly establishing a larger heap when the process
      is spawned. Secondly, the garbage collector may also shrink the
      heap if it is much larger than the amount of data stored on it;
      setting the minimum heap size will prevent that.</p>

      <warning><p>The emulator will probably use more memory, and because garbage
      collections occur less frequently, huge binaries could be
      kept much longer.</p></warning>

      <p>In systems with many processes, computation tasks that run
      for a short time could be spawned off into a new process with
      a higher minimum heap size. When the process is done, it will
      send the result of the computation to another process and terminate.
      If the minimum heap size is calculated properly, the process may not
      have to do any garbage collections at all.
      <em>This optimization should not be attempted
      without proper measurements.</em></p>
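
      <p>The pattern described above could look roughly like the following
      sketch. The module <c>worker_demo</c> and the function <c>run/2</c>
      are hypothetical, and <c>MinWords</c> is an assumption that has to
      come from measuring the real computation:</p>

      <code type="erl">
-module(worker_demo).
-export([run/2]).

%% Hypothetical helper: run Fun in a short-lived process that starts
%% with a minimum heap of MinWords words, and return its result.
%% The monitor lets the caller notice if the worker crashes.
run(Fun, MinWords) ->
    Parent = self(),
    Ref = make_ref(),
    Worker = fun() -> Parent ! {Ref, Fun()} end,
    {Pid, MRef} = spawn_opt(Worker, [monitor, {min_heap_size, MinWords}]),
    receive
        {Ref, Result} ->
            %% Drop the monitor and any 'DOWN' message already queued.
            erlang:demonitor(MRef, [flush]),
            {ok, Result};
        {'DOWN', MRef, process, Pid, Reason} ->
            {error, Reason}
    end.</code>

      <p>Note that the result is still copied when it is sent back to the
      caller, so the sketch only pays off when the intermediate data is
      much larger than the result.</p>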
    </section>

  </section>

  <section>
    <title>Process messages</title>

    <p>All data in messages between Erlang processes is copied, with
      the exception of
      <seealso marker="binaryhandling#refc_binary">refc binaries</seealso>
      on the same Erlang node.</p>

    <p>When a message is sent to a process on another Erlang node,
      it will first be encoded to the Erlang External Format before
      being sent via a TCP/IP socket. The receiving Erlang node decodes
      the message and distributes it to the right process.</p>
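
    <p>The encoding and decoding steps can be observed locally with
      <c>term_to_binary/1</c> and <c>binary_to_term/1</c>, which use the
      same external term format. The following is only a sketch: the
      module <c>ext_format_demo</c> and the function <c>round_trip/1</c>
      are hypothetical, and whatever framing the distribution layer adds
      on the socket is not shown:</p>

    <code type="erl">
-module(ext_format_demo).
-export([round_trip/1]).

%% Hypothetical helper: encode Term to the external term format and
%% decode it again. Returns the first byte of the encoding (the
%% version tag), the encoded size in bytes, and the decoded term.
round_trip(Term) ->
    Bin = term_to_binary(Term),
    Tag = hd(binary_to_list(Bin)),
    {Tag, byte_size(Bin), binary_to_term(Bin)}.</code>

    <p>The encoded size gives a rough idea of how many bytes a term
      costs on the wire.</p>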

    <section>
      <title>The constant pool</title>

      <p>Constant Erlang terms (also called <em>literals</em>) are now
      kept in constant pools; each loaded module has its own pool.
      The following function</p>

    <p><em>DO</em> (in R12B and later)</p>
      <code type="erl">
days_in_month(M) ->
    element(M, {31,28,31,30,31,30,31,31,30,31,30,31}).</code>     

      <p>will no longer build the tuple every time it is called (only
      to have it discarded the next time the garbage collector runs).
      Instead, the tuple is located in the module's constant pool.</p>

      <p>But if a constant is sent to another process (or stored in
      an ETS table), it will be <em>copied</em>.
      The reason is that the run-time system must be able
      to keep track of all references to constants in order to properly
      unload code containing constants. (When the code is unloaded,
      the constants will be copied to the heap of the processes that refer
      to them.) The copying of constants might be eliminated in a future
      release.</p>
    </section>

    <section>
      <title>Loss of sharing</title>

      <p>Shared sub-terms are <em>not</em> preserved when a term is sent
      to another process, passed as the initial process arguments in
      the <c>spawn</c> call, or stored in an ETS table.
      That is an optimization. Most applications do not send messages
      with shared sub-terms.</p>

      <p>Here is an example of how a shared sub-term can be created:</p>

      <code type="erl">
kilo_byte() ->
    kilo_byte(10, [42]).

kilo_byte(0, Acc) ->
    Acc;
kilo_byte(N, Acc) ->
    kilo_byte(N-1, [Acc|Acc]).</code>

       <p><c>kilo_byte/0</c> creates a deep list. If we call
       <c>list_to_binary/1</c>, we can convert the deep list to a binary
       of 1024 bytes:</p>

      <pre>
1> <input>byte_size(list_to_binary(efficiency_guide:kilo_byte())).</input>
1024</pre>

       <p>Using the <c>erts_debug:size/1</c> BIF, we can see that the
       deep list only requires 22 words of heap space:</p>

      <pre>
2> <input>erts_debug:size(efficiency_guide:kilo_byte()).</input>
22</pre>

       <p>Using the <c>erts_debug:flat_size/1</c> BIF, we can calculate
       the size of the deep list if sharing is ignored. It will be
       the size of the list when it has been sent to another process
       or stored in an ETS table:</p>

      <pre>
3> <input>erts_debug:flat_size(efficiency_guide:kilo_byte()).</input>
4094</pre>
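
      <p>The two figures can be reproduced by counting list cells, assuming
      two words per cell, which is consistent with the numbers above. With
      sharing, the initial list is one cell and each of the 10 iterations
      adds one more. Without sharing, every level holds one cell plus two
      full copies of the level below. The module <c>sharing_math</c> and
      its functions are hypothetical and only illustrate the arithmetic:</p>

      <code type="erl">
-module(sharing_math).
-export([shared_words/1, flat_words/1]).

%% Words used with sharing after N iterations: the initial cell plus
%% one new cell per iteration, two words per cell.
shared_words(N) -> 2 * (N + 1).

%% Words used without sharing after N iterations: one cell at this
%% level plus two complete copies of the level below.
flat_words(0) -> 2;
flat_words(N) when N > 0 -> 2 + 2 * flat_words(N - 1).</code>

      <p>For 10 iterations, <c>shared_words(10)</c> gives 22 and
      <c>flat_words(10)</c> gives 4094, matching the output of the
      two BIFs.</p>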

      <p>We can verify that sharing will be lost if we insert the
      data into an ETS table:</p>

      <pre>
4> <input>T = ets:new(tab, []).</input>
17
5> <input>ets:insert(T, {key,efficiency_guide:kilo_byte()}).</input>
true
6> <input>erts_debug:size(element(2, hd(ets:lookup(T, key)))).</input>
4094
7> <input>erts_debug:flat_size(element(2, hd(ets:lookup(T, key)))).</input>
4094</pre>

      <p>When the data has passed through an ETS table,
      <c>erts_debug:size/1</c> and <c>erts_debug:flat_size/1</c>
      return the same value. Sharing has been lost.</p>
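
      <p>The same loss can be observed for message passing. In the
      following sketch, a second process measures the term after
      receiving it and reports the size back. The module
      <c>sharing_demo</c> and the function <c>sizes/1</c> are
      hypothetical:</p>

      <code type="erl">
-module(sharing_demo).
-export([sizes/1]).

%% Hypothetical helper: return {LocalSize, FlatSize, ReceivedSize}
%% in words for Term. ReceivedSize is measured by a second process
%% after Term has been sent to it as a message.
sizes(Term) ->
    Parent = self(),
    Ref = make_ref(),
    %% Term is deliberately not captured by the fun; it reaches the
    %% other process only through the message below.
    Pid = spawn(fun() ->
                    receive
                        {Ref, T} -> Parent ! {Ref, erts_debug:size(T)}
                    end
                end),
    Pid ! {Ref, Term},
    receive
        {Ref, Received} ->
            {erts_debug:size(Term), erts_debug:flat_size(Term), Received}
    end.</code>

      <p>Calling <c>sharing_demo:sizes(efficiency_guide:kilo_byte())</c>
      should return a received size equal to the flat size, not to the
      local size.</p>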

      <p>In a future release of Erlang/OTP, we might implement a
      way to (optionally) preserve sharing. We have no plans to make
      preserving of sharing the default behaviour, since that would
      penalize the vast majority of Erlang applications.</p>
    </section>
  </section>

  <section>
    <title>The SMP emulator</title>

    <p>The SMP emulator (introduced in R11B) will take advantage of a
    multi-core or multi-CPU computer by running several Erlang scheduler
    threads (typically, the same as the number of cores). Each scheduler
    thread schedules Erlang processes in the same way as the Erlang scheduler
    in the non-SMP emulator.</p>

    <p>To gain performance by using the SMP emulator, your application
    <em>must have more than one runnable Erlang process</em> most of the time.
    Otherwise, the Erlang emulator can still only run one Erlang process
    at a time, but you must still pay the overhead for locking. Although
    we try to reduce the locking overhead as much as possible, it will never
    become exactly zero.</p>
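
    <p>One simple way to get several runnable processes is to spread
    independent computations over one process each. The following is a
    minimal sketch, not a production-quality parallel map: the module
    <c>smp_demo</c> and its functions are hypothetical, and there is no
    error handling, so <c>pmap/2</c> waits forever if <c>F</c> crashes.
    Whether it is faster than <c>lists:map/2</c> depends on how expensive
    <c>F</c> is compared to spawning and message copying, which has to be
    measured:</p>

    <code type="erl">
-module(smp_demo).
-export([pmap/2, schedulers/0]).

%% Number of scheduler threads in the running emulator.
schedulers() ->
    erlang:system_info(schedulers).

%% Apply F to every element of List, using one process per element,
%% and collect the results in the original order.
pmap(F, List) ->
    Parent = self(),
    Spawn = fun(X) ->
                Ref = make_ref(),
                spawn(fun() -> Parent ! {Ref, F(X)} end),
                Ref
            end,
    Refs = lists:map(Spawn, List),
    %% Ref is already bound in each call, so every receive waits
    %% for the reply that belongs to that particular element.
    lists:map(fun(Ref) -> receive {Ref, Result} -> Result end end, Refs).</code>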

    <p>Benchmarks that may seem to be concurrent are often sequential.
    The estone benchmark, for instance, is entirely sequential. So is
    the most common implementation of the "ring benchmark"; usually one process
    is active, while the others wait in a <c>receive</c> statement.</p>

    <p>The <seealso marker="percept:percept">percept</seealso> application
    can be used to profile your application to see how much potential (or lack
    thereof) it has for concurrency.</p>
  </section>

</chapter>