<?xml version="1.0" encoding="latin1" ?>
<!DOCTYPE chapter SYSTEM "chapter.dtd">
<chapter>
<header>
<copyright>
<year>2001</year><year>2009</year>
<holder>Ericsson AB. All Rights Reserved.</holder>
</copyright>
<legalnotice>
The contents of this file are subject to the Erlang Public License,
Version 1.1, (the "License"); you may not use this file except in
compliance with the License. You should have received a copy of the
Erlang Public License along with this software. If not, it can be
retrieved online at http://www.erlang.org/.
Software distributed under the License is distributed on an "AS IS"
basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See
the License for the specific language governing rights and limitations
under the License.
</legalnotice>
<title>Processes</title>
<prepared>Bjorn Gustavsson</prepared>
<docno></docno>
<date>2007-11-21</date>
<rev></rev>
<file>processes.xml</file>
</header>
<section>
<title>Creation of an Erlang process</title>
<p>An Erlang process is lightweight compared to operating
systems threads and processes.</p>
<p>A newly spawned Erlang process uses 309 words of memory
in the non-SMP emulator without HiPE support. (SMP support
and HiPE support will both add to this size.) The size can
be found out like this:</p>
<pre>
Erlang (BEAM) emulator version 5.6 [async-threads:0] [kernel-poll:false]
Eshell V5.6 (abort with ^G)
1> <input>Fun = fun() -> receive after infinity -> ok end end.</input>
#Fun<...>
2> <input>{_,Bytes} = process_info(spawn(Fun), memory).</input>
{memory,1232}
3> <input>Bytes div erlang:system_info(wordsize).</input>
309</pre>
<p>The size includes 233 words for the heap area (which includes the stack).
The garbage collector will increase the heap as needed.</p>
<p>The main (outer) loop for a process <em>must</em> be tail-recursive.
If not, the stack will grow until the process terminates.</p>
<p><em>DO NOT</em></p>
<code type="erl">
loop() ->
receive
{sys, Msg} ->
handle_sys_msg(Msg),
loop();
{From, Msg} ->
Reply = handle_msg(Msg),
From ! Reply,
loop()
end,
io:format("Message is processed~n", []).</code>
<p>The call to <c>io:format/2</c> will never be executed, but a
return address will still be pushed to the stack each time
<c>loop/0</c> is called recursively. The correct tail-recursive
version of the function looks like this:</p>
<p><em>DO</em></p>
<code type="erl">
loop() ->
receive
{sys, Msg} ->
handle_sys_msg(Msg),
loop();
{From, Msg} ->
Reply = handle_msg(Msg),
From ! Reply,
loop()
end.</code>
<section>
<title>Initial heap size</title>
<p>The default initial heap size of 233 words is quite conservative
in order to support Erlang systems with hundreds of thousands or
even millions of processes. The garbage collector will grow and
shrink the heap as needed.</p>
<p>In a system that use comparatively few processes, performance
<em>might</em> be improved by increasing the minimum heap size using either
the <c>+h</c> option for
<seealso marker="erts:erl">erl</seealso> or on a process-per-process
basis using the <c>min_heap_size</c> option for
<seealso marker="erts:erlang#spawn_opt/4">spawn_opt/4</seealso>.</p>
<p>The gain is twofold: Firstly, although the garbage collector will
grow the heap, it will it grow it step by step, which will be more
costly than directly establishing a larger heap when the process
is spawned. Secondly, the garbage collector may also shrink the
heap if it is much larger than the amount of data stored on it;
setting the minimum heap size will prevent that.</p>
<warning><p>The emulator will probably use more memory, and because garbage
collections occur less frequently, huge binaries could be
kept much longer.</p></warning>
<p>In systems with many processes, computation tasks that run
for a short time could be spawned off into a new process with
a higher minimum heap size. When the process is done, it will
send the result of the computation to another process and terminate.
If the minimum heap size is calculated properly, the process may not
have to do any garbage collections at all.
<em>This optimization should not be attempted
without proper measurements.</em></p>
</section>
</section>
<section>
<title>Process messages</title>
<p>All data in messages between Erlang processes is copied, with
the exception of
<seealso marker="binaryhandling#refc_binary">refc binaries</seealso>
on the same Erlang node.</p>
<p>When a message is sent to a process on another Erlang node,
it will first be encoded to the Erlang External Format before
being sent via an TCP/IP socket. The receiving Erlang node decodes
the message and distributes it to the right process.</p>
<section>
<title>The constant pool</title>
<p>Constant Erlang terms (also called <em>literals</em>) are now
kept in constant pools; each loaded module has its own pool.
The following function</p>
<p><em>DO</em> (in R12B and later)</p>
<code type="erl">
days_in_month(M) ->
element(M, {31,28,31,30,31,30,31,31,30,31,30,31}).</code>
<p>will no longer build the tuple every time it is called (only
to have it discarded the next time the garbage collector was run), but
the tuple will be located in the module's constant pool.</p>
<p>But if a constant is sent to another process (or stored in
an ETS table), it will be <em>copied</em>.
The reason is that the run-time system must be able
to keep track of all references to constants in order to properly
unload code containing constants. (When the code is unloaded,
the constants will be copied to the heap of the processes that refer
to them.) The copying of constants might be eliminated in a future
release.</p>
</section>
<section>
<title>Loss of sharing</title>
<p>Shared sub-terms are <em>not</em> preserved when a term is sent
to another process, passed as the initial process arguments in
the <c>spawn</c> call, or stored in an ETS table.
That is an optimization. Most applications do not send message
with shared sub-terms.</p>
<p>Here is an example of how a shared sub-term can be created:</p>
<code type="erl">
kilo_byte() ->
kilo_byte(10, [42]).
kilo_byte(0, Acc) ->
Acc;
kilo_byte(N, Acc) ->
kilo_byte(N-1, [Acc|Acc]).</code>
<p><c>kilo_byte/1</c> creates a deep list. If we call
<c>list_to_binary/1</c>, we can convert the deep list to a binary
of 1024 bytes:</p>
<pre>
1> <input>byte_size(list_to_binary(efficiency_guide:kilo_byte())).</input>
1024</pre>
<p>Using the <c>erts_debug:size/1</c> BIF we can see that the
deep list only requires 22 words of heap space:</p>
<pre>
2> <input>erts_debug:size(efficiency_guide:kilo_byte()).</input>
22</pre>
<p>Using the <c>erts_debug:flat_size/1</c> BIF, we can calculate
the size of the deep list if sharing is ignored. It will be
the size of the list when it has been sent to another process
or stored in an ETS table:</p>
<pre>
3> <input>erts_debug:flat_size(efficiency_guide:kilo_byte()).</input>
4094</pre>
<p>We can verify that sharing will be lost if we insert the
data into an ETS table:</p>
<pre>
4> <input>T = ets:new(tab, []).</input>
17
5> <input>ets:insert(T, {key,efficiency_guide:kilo_byte()}).</input>
true
6> <input>erts_debug:size(element(2, hd(ets:lookup(T, key)))).</input>
4094
7> <input>erts_debug:flat_size(element(2, hd(ets:lookup(T, key)))).</input>
4094</pre>
<p>When the data has passed through an ETS table,
<c>erts_debug:size/1</c> and <c>erts_debug:flat_size/1</c>
return the same value. Sharing has been lost.</p>
<p>In a future release of Erlang/OTP, we might implement a
way to (optionally) preserve sharing. We have no plans to make
preserving of sharing the default behaviour, since that would
penalize the vast majority of Erlang applications.</p>
</section>
</section>
<section>
<title>The SMP emulator</title>
<p>The SMP emulator (introduced in R11B) will take advantage of
multi-core or multi-CPU computer by running several Erlang schedulers
threads (typically, the same as the number of cores). Each scheduler
thread schedules Erlang processes in the same way as the Erlang scheduler
in the non-SMP emulator.</p>
<p>To gain performance by using the SMP emulator, your application
<em>must have more than one runnable Erlang process</em> most of the time.
Otherwise, the Erlang emulator can still only run one Erlang process
at the time, but you must still pay the overhead for locking. Although
we try to reduce the locking overhead as much as possible, it will never
become exactly zero.</p>
<p>Benchmarks that may seem to be concurrent are often sequential.
The estone benchmark, for instance, is entirely sequential. So is also
the most common implementation of the "ring benchmark"; usually one process
is active, while the others wait in a <c>receive</c> statement.</p>
<p>The <seealso marker="percept:percept">percept</seealso> application
can be used to profile your application to see how much potential (or lack
thereof) it has for concurrency.</p>
</section>
</chapter>