Processes
Creation of an Erlang process

An Erlang process is lightweight compared to threads and processes in operating systems.

A newly spawned Erlang process uses 309 words of memory in the non-SMP emulator without HiPE support. (SMP support and HiPE support will both add to this size.) The size can be found out like this:

Erlang (BEAM) emulator version 5.6 [async-threads:0] [kernel-poll:false]

Eshell V5.6  (abort with ^G)
1> Fun = fun() -> receive after infinity -> ok end end.
#Fun<...>
2> {_,Bytes} = process_info(spawn(Fun), memory).
{memory,1232}
3> Bytes div erlang:system_info(wordsize).
309

The size includes 233 words for the heap area (which includes the stack). The garbage collector will increase the heap as needed.

The main (outer) loop for a process must be tail-recursive. If not, the stack will grow until the process terminates.

DO NOT

loop() ->
    receive
        {sys, Msg} ->
            handle_sys_msg(Msg),
            loop();
        {From, Msg} ->
            Reply = handle_msg(Msg),
            From ! Reply,
            loop()
    end,
    io:format("Message is processed~n", []).

The call to io:format/2 will never be executed, but a return address will still be pushed to the stack each time loop/0 is called recursively. The correct tail-recursive version of the function looks like this:

DO

loop() ->
    receive
        {sys, Msg} ->
            handle_sys_msg(Msg),
            loop();
        {From, Msg} ->
            Reply = handle_msg(Msg),
            From ! Reply,
            loop()
    end.
Initial heap size

The default initial heap size of 233 words is quite conservative in order to support Erlang systems with hundreds of thousands or even millions of processes. The garbage collector will grow and shrink the heap as needed.

In a system that uses comparatively few processes, performance might be improved by increasing the minimum heap size, either with the +h option for erl or on a per-process basis with the min_heap_size option for spawn_opt/4.
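As a sketch, both approaches could look like this (the heap size of 10000 words, the module my_mod, and the function my_mod:run/0 are made-up values for illustration; a suitable size must be determined by measurement):

%% Globally, when starting the emulator:
%%     erl +h 10000
%% Per process, using spawn_opt/4 with the min_heap_size option:
Pid = spawn_opt(my_mod, run, [], [{min_heap_size, 10000}]).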

The gain is twofold: Firstly, although the garbage collector will grow the heap, it will grow it step by step, which will be more costly than directly establishing a larger heap when the process is spawned. Secondly, the garbage collector may also shrink the heap if it is much larger than the amount of data stored on it; setting the minimum heap size will prevent that.

The emulator will probably use more memory, and because garbage collections occur less frequently, huge binaries could be kept much longer.

In systems with many processes, computation tasks that run for a short time could be spawned off into a new process with a higher minimum heap size. When the process is done, it will send the result of the computation to another process and terminate. If the minimum heap size is calculated properly, the process may not have to do any garbage collections at all. This optimization should not be attempted without proper measurements.
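A minimal sketch of this pattern, assuming a hypothetical compute/1 function and an arbitrary minimum heap size of 10000 words (the proper value must be found by measurement):

spawn_compute(Parent, Data) ->
    spawn_opt(fun() ->
                      Parent ! {result, compute(Data)}   % compute/1 is hypothetical
              end,
              [{min_heap_size, 10000}]).                 % heap size is given in words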

Process messages

All data in messages between Erlang processes is copied, with the exception of refc binaries on the same Erlang node.

When a message is sent to a process on another Erlang node, it will first be encoded to the Erlang External Format before being sent via a TCP/IP socket. The receiving Erlang node decodes the message and distributes it to the right process.
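As an illustration, the encoding used by the distribution is essentially the external format produced by term_to_binary/1 (the registered name remote_proc and the node name other@host below are placeholders):

Encoded = term_to_binary({hello, "world"}),      % roughly what goes over the wire
{remote_proc, 'other@host'} ! {hello, "world"}.  % encoded, sent over TCP/IP, decoded on the other node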

The constant pool

Constant Erlang terms (also called literals) are now kept in constant pools; each loaded module has its own pool. The following function

DO (in R12B and later)

days_in_month(M) -> element(M, {31,28,31,30,31,30,31,31,30,31,30,31}).

will no longer build the tuple every time it is called (only to have it discarded the next time the garbage collector is run), but the tuple will be located in the module's constant pool.

But if a constant is sent to another process (or stored in an ETS table), it will be copied. The reason is that the run-time system must be able to keep track of all references to constants in order to properly unload code containing constants. (When the code is unloaded, the constants will be copied to the heap of the processes that refer to them.) The copying of constants might be eliminated in a future release.

Loss of sharing

Shared sub-terms are not preserved when a term is sent to another process, passed as the initial process arguments in the spawn call, or stored in an ETS table. That is an optimization. Most applications do not send messages with shared sub-terms.

Here is an example of how a shared sub-term can be created:

kilo_byte() ->
    kilo_byte(10, [42]).

kilo_byte(0, Acc) ->
    Acc;
kilo_byte(N, Acc) ->
    kilo_byte(N-1, [Acc|Acc]).

kilo_byte/0 creates a deep list. If we call list_to_binary/1, we can convert the deep list to a binary of 1024 bytes:

1> byte_size(list_to_binary(efficiency_guide:kilo_byte())).
1024

Using the erts_debug:size/1 BIF we can see that the deep list only requires 22 words of heap space:

2> erts_debug:size(efficiency_guide:kilo_byte()).
22

Using the erts_debug:flat_size/1 BIF, we can calculate the size of the deep list if sharing is ignored. It will be the size of the list when it has been sent to another process or stored in an ETS table:

3> erts_debug:flat_size(efficiency_guide:kilo_byte()).
4094

We can verify that sharing will be lost if we insert the data into an ETS table:

4> T = ets:new(tab, []).
17
5> ets:insert(T, {key,efficiency_guide:kilo_byte()}).
true
6> erts_debug:size(element(2, hd(ets:lookup(T, key)))).
4094
7> erts_debug:flat_size(element(2, hd(ets:lookup(T, key)))).
4094

When the data has passed through an ETS table, erts_debug:size/1 and erts_debug:flat_size/1 return the same value. Sharing has been lost.

In a future release of Erlang/OTP, we might implement a way to (optionally) preserve sharing. We have no plans to make preserving of sharing the default behaviour, since that would penalize the vast majority of Erlang applications.

The SMP emulator

The SMP emulator (introduced in R11B) will take advantage of a multi-core or multi-CPU computer by running several Erlang scheduler threads (typically, the same number as the number of cores). Each scheduler thread schedules Erlang processes in the same way as the Erlang scheduler in the non-SMP emulator.

To gain performance by using the SMP emulator, your application must have more than one runnable Erlang process most of the time. Otherwise, the Erlang emulator can still only run one Erlang process at a time, but you must still pay the overhead for locking. Although we try to reduce the locking overhead as much as possible, it will never become exactly zero.
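As a rough sketch, independent pieces of work can be spread over several processes so that more than one process is runnable at the same time (pmap/2 here is a made-up helper, not a standard library function):

pmap(F, List) ->
    Parent = self(),
    Pids = [spawn(fun() -> Parent ! {self(), F(X)} end) || X <- List],
    [receive {Pid, Result} -> Result end || Pid <- Pids].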

Benchmarks that may seem to be concurrent are often sequential. The estone benchmark, for instance, is entirely sequential, and so is the most common implementation of the "ring benchmark"; usually one process is active, while the others wait in a receive statement.

The percept application can be used to profile your application to see how much potential (or lack thereof) it has for concurrency.
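A typical percept session looks roughly like this (my_mod:run/0 and the file name are placeholders, and the exact options may differ between releases; consult the percept documentation):

percept:profile("procs.dat", {my_mod, run, []}, [procs]),
percept:analyze("procs.dat"),
percept:start_webserver(8888).    % then point a browser at the reported URL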