Erlang Scalability
18 Feb
I would like to share some experience and theories on Erlang scalability.
This will be in the form of a series of hints, which may or may not be accompanied with explanations as to why things are this way, or how they improve or reduce the scalability of a system. I will try to do my best to avoid giving falsehoods, even if that means a few things won’t be explained.
NIFs
NIFs are considered harmful. I don’t know any single NIF-based library that I would recommend. That doesn’t mean they should all be avoided, just that if you’re going to want your system to scale, you probably shouldn’t use a NIF.
A common case for using NIFs is JSON processing. The problem is that JSON is a highly inefficient data structure (similar in inefficiency to XML, although perhaps not as bad). If you can avoid using JSON, you probably should. MessagePack can replace it in many situations.
Long-running NIFs will take over a scheduler and prevent Erlang from efficiently handling many processes.
Short-running NIFs will still confuse the scheduler if they take more than a few microseconds to run.
Threaded NIFs, or the use of the enif_consume_timeslice
might help allievate this problem, but they’re not a silver bullet.
And as you already know, a crashing NIF will take down your VM, destroying any claims you may have at being scalable.
Never use a NIF because "C is fast". This is only true in single-threaded programs.
BIFs
BIFs can also be harmful. While they are generally better than NIFs, they are not perfect and some of them might have harmful effects on the scheduler.
A great example of this is the erlang:decode_packet/3
BIF, when used for HTTP request or response decoding. Avoiding
its use in Cowboy allowed us to see a big increase in
the number of requests production systems were able to handle,
up to two times the original amount. Incidentally this is something
that is impossible to detect using synthetic benchmarks.
BIFs that return immediately are perfectly fine though.
Binary strings
Binary strings use less memory, which means you spend less time allocating memory compared to list-based strings. They are also more natural for strings manipulation because they are optimized for appending (as opposed to prepending for lists).
If you can process a binary string using a single match context, then the code will run incredibly fast. The effects will be much increased if the code was compiled using HiPE, even if your Erlang system isn’t compiled natively.
Avoid using binary:split
or binary:replace
if you can avoid it. They have a certain overhead due to supporting
many options that you probably don’t need for most operations.
Buffering and streaming
Use binaries. They are great for appending, and it’s a direct copy from what you receive from a stream (usually a socket or a file).
Be careful to not indefinitely receive data, as you might end up having a single binary taking up huge amounts of memory.
If you stream from a socket and know how much data you expect,
then fetch that data in a single recv
call.
Similarly, if you can use a single send
call, then
you should do so, to avoid going back and forth unnecessarily between
your Erlang process and the Erlang port for your socket.
List and binary comprehensions
Prefer list comprehensions over lists:map/2
. The
compiler will be able to optimize your code greatly, for example
not creating the result if you don’t need it. As time goes on,
more optimizations will be added to the compiler and you will
automatically benefit from them.
gen_server is no silver bullet
It’s a bad idea to use gen_server
for everything.
For two reasons.
There is an overhead everytime the gen_server
receives
a call, a cast or a simple message. It can be a problem if your
gen_server
is in a critical code path where speed
is all that matters. Do not hesitate to create other kinds of
processes where it makes sense. And depending on the kind of process,
you might want to consider making them special processes, which
would essentially behave the same as a gen_server
.
A common mistake is to have a unique gen_server
to
handle queries from many processes. This generally becomes the
biggest bottleneck you’ll want to fix. You should try to avoid
relying on a single process, using a pool if you can.
Supervisor and monitoring
A supervisor
is also a gen_server
,
so the previous points also apply to them.
Sometimes you’re in a situation where you have supervised processes but also want to monitor them in one or more other processes, effectively duplicating the work. The supervisor already knows when processes die, why not use this to our advantage?
You can create a custom supervisor process that will perform both the supervision and handle exit and other events, allowing to avoid the combination of supervising and monitoring which can prove harmful when many processes die at once, or when you have many short lived processes.
ets for LOLSPEED(tm)
If you have data queried or modified by many processes, then
ets
public or protected tables will give you the
performance boost you require. Do not forget to set the
read_concurrency
or write_concurrency
options though.
You might also be thrilled to know that Erlang R16B will feature
a big performance improvement for accessing ets
tables
concurrently.