Beyond OTP
All about the Psychobitch!
Loïc Hoguin - @lhoguin
Erlang Cowboy and Nine Nines Founder
OTP
OTP?
- Erlang is concurrent
- Erlang is fault tolerant
- Erlang is transparently distributed
- Erlang provides hot upgrades
You need OTP.
- Because Francesco says so.
OTP gives you...
- Architecture patterns
- Middlewares
- Libraries
- Tools
OTP is built on top of Erlang
- You could use Erlang without OTP
- But you would have to reimplement most of it
- OTP comes with the Erlang distribution
- OTP is closely tied to the Erlang VM
VM boot sequence
- Load a few very important modules
- Load the modules for kernel and stdlib
- Start the heart process
- Start the error_logger process
- Start the application_controller process
- Load and start OTP applications
Two types of applications
- OTP library applications
- OTP applications
OTP library applications
- Set of modules
- No process can ever belong to this application
OTP applications
- Set of modules
- Set of processes running in the application's supervision tree
- Implements the application behaviour
- Has one top-level supervisor and possibly more child supervisors
Application behaviour
- Middleware for starting and stopping applications
- Pretty much just starts the top-level supervisor
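That "pretty much just" can be shown in a few lines. A minimal application callback module, sketched with hypothetical `my_app`/`my_sup` names:

```erlang
%% Illustrative sketch; module and supervisor names are made up.
-module(my_app).
-behaviour(application).
-export([start/2, stop/1]).

%% Called by the application_controller when the application starts;
%% it does little more than start the top-level supervisor.
start(_Type, _Args) ->
    my_sup:start_link().

stop(_State) ->
    ok.
```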
Supervisor behaviour
- Middleware for starting and supervising processes
- Restarts processes automatically
- Also keeps track of processes for code upgrades
- Key component of Erlang's fault tolerance claims
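A minimal top-level supervisor looks like this (again with hypothetical names; the child spec and restart intensity values are just examples):

```erlang
-module(my_sup).
-behaviour(supervisor).
-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    %% one_for_one: restart only the crashed child,
    %% at most 5 times within 10 seconds.
    {ok, {{one_for_one, 5, 10},
          [{my_worker, {my_worker, start_link, []},
            permanent, 5000, worker, [my_worker]}]}}.
```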
Great, but my application does nothing!
- There are more behaviours for that!

Generic server behaviour
- Erlang processes are isolated
- Communication occurs through a user-defined client-server protocol
- gen_server implements all the client-server communication logic
- gen_server also implements the server's receive loop
- gen_server provides a callback for updating the state data during code upgrades
- Experienced developers use gen_server 90% of the time
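The points above map directly onto the callbacks. A minimal gen_server, using a made-up `my_worker` module holding a counter as its state:

```erlang
-module(my_worker).
-behaviour(gen_server).
-export([start_link/0, get/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% Synchronous call; gen_server implements the whole
%% client-server protocol and the receive loop for us.
get() ->
    gen_server:call(?MODULE, get).

init([]) ->
    {ok, 0}.

handle_call(get, _From, State) ->
    {reply, State, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

handle_info(_Info, State) ->
    {noreply, State}.

terminate(_Reason, _State) ->
    ok.

%% Called during hot code upgrades to convert the state data.
code_change(_OldVsn, State, _Extra) ->
    {ok, State}.
```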
Other behaviours
- gen_fsm is a generic finite state machine
- gen_event is a generic event handler
- They are more specialized but just as useful as gen_server
Great, I'll use behaviours then!
- You should!
- People expect you to respect OTP principles by using behaviours
- But...
- They're no silver bullet
Sometimes a supervisor isn't enough
- Common case is having a supervisor + another process that monitors the same processes to maintain some kind of state
- The supervisor already keeps track of processes, why duplicate the work?
Sometimes a gen_server is too much
- Common case is needing LOLSPEED for a crucial part of the program
- Sometimes the convenience of gen_server gives too much overhead
Time to follow Joe Armstrong's advice
- Condensed quote
- "When the abstraction is inappropriate, you should ditch the gen_server and roll your own."
- And when you do, you should roll special processes
Special processes
Special processes?
- They are implemented using sys and proc_lib
- They comply with the OTP design principles
- They are familiar
- Other behaviours are implemented the same way
proc_lib
- Ensures new processes are started properly (init_ack)
- Identifies the current process (initial call)
- Identifies the process' parent and ancestors
- Prints a crash log on failure if SASL is available
sys
- Debug and trace special processes
- Access and modify special processes state
- Suspend and resume special processes
- Safe hot code upgrade (code_change)
Using special processes
- Start your process with proc_lib:start_link/3
- Call proc_lib:init_ack/1 from the newly started process
- Write a receive loop
- Die if the parent process dies
- Handle system messages
- Implement system_continue/3, system_terminate/4 and system_code_change/4
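Putting all these steps together, a special process skeleton looks roughly like this (a sketch; the module name and the `handle/2` function are made up, and the state here is just an empty map):

```erlang
-module(my_proc).
-export([start_link/0]).
-export([init/1, system_continue/3, system_terminate/4,
         system_code_change/4]).

%% Start the process through proc_lib so OTP knows about it.
start_link() ->
    proc_lib:start_link(?MODULE, init, [self()]).

init(Parent) ->
    %% Trap exits so we can notice the parent dying.
    process_flag(trap_exit, true),
    Deb = sys:debug_options([]),
    %% Tell the parent we started successfully.
    ok = proc_lib:init_ack(Parent, {ok, self()}),
    loop(Parent, Deb, #{}).

%% Our own receive loop.
loop(Parent, Deb, State) ->
    receive
        %% Die if the parent process dies.
        {'EXIT', Parent, Reason} ->
            exit(Reason);
        %% Hand system messages over to sys.
        {system, From, Request} ->
            sys:handle_system_msg(Request, From, Parent,
                ?MODULE, Deb, State);
        Msg ->
            %% Normal messages are handled here.
            loop(Parent, Deb, handle(Msg, State))
    end.

handle(_Msg, State) -> State.

%% Called by sys when resuming after a system message.
system_continue(Parent, Deb, State) ->
    loop(Parent, Deb, State).

system_terminate(Reason, _Parent, _Deb, _State) ->
    exit(Reason).

%% Called by sys during a hot code upgrade.
system_code_change(State, _Module, _OldVsn, _Extra) ->
    {ok, State}.
```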
Not quite a gen_server yet
- Our process fits OTP Design Principles
- Our process can receive and send messages
- Some processes need synchronized calls
Anatomy of a call
I know that!
- Right, that's usually explained when you learn Erlang
- Forget everything you learnt
- Let's look at OTP directly
General steps
- Resolve the Pid (locally registered name, globally registered name, remote process...)
- Try monitoring the process
- If monitor returns, continue to next slide
- Otherwise we have a C/Java node that might not support monitors
- Monitor the node instead and hope for the best
General steps, after monitor
- Send the message with noconnect (the monitor already connected), catching exceptions (remote pid or port process)
- Receive either a reply, a node down, a process down or timeout
- Down? Exit with appropriate reason
- Timeout? Demonitor and exit(timeout)
- Reply? Demonitor and return {ok, Reply}
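The monitor-and-receive part boils down to something like this simplified sketch of what OTP's gen:call does for a plain local pid (the real code additionally handles name resolution, remote nodes and C/Java nodes):

```erlang
%% Simplified synchronous call: monitor, send, then wait for
%% a reply, a 'DOWN', or a timeout.
call(Pid, Label, Request, Timeout) ->
    Mref = erlang:monitor(process, Pid),
    Pid ! {Label, {self(), Mref}, Request},
    receive
        {Mref, Reply} ->
            erlang:demonitor(Mref, [flush]),
            {ok, Reply};
        {'DOWN', Mref, _, _, Reason} ->
            %% Server died before replying.
            exit(Reason)
    after Timeout ->
        erlang:demonitor(Mref, [flush]),
        exit(timeout)
    end.
```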
Do I need all this?
- No!
- Can't be a registered name? Skip the resolve part!
- Can't be a C/Java node? Cut that part
- Can't be a remote pid or a port? No exceptions will occur, nothing to catch
- Supervisor strategy restarts the calling process if the server process crashes? No need for a monitor or a timeout
Make your own call
- Pick and choose what you need
- Discard the rest
- Get more bang for your buck
- Remember, only do this if you really need it!
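For example, when the server is known to be a local pid and the caller gets restarted by the same supervision strategy if the server dies, the call can shrink down to this (illustrative sketch):

```erlang
%% No name resolution, no monitor, no timeout, no catch:
%% just a tagged send and a selective receive.
call(Pid, Request) ->
    Ref = make_ref(),
    Pid ! {call, {self(), Ref}, Request},
    receive
        {Ref, Reply} -> Reply
    end.
```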
Case study: custom supervisor
Ranch connections supervisor
- Closely tied to acceptors
- When this process dies, acceptors die
- Acceptor creates connection processes with this supervisor
- Must limit the connections accept rate
Two processes
- Supervisor process links all connection processes
- Extra process used for rate limiting
- Extra process monitors all connection processes
- Two processes doing the same thing, waste of resources
One custom supervisor
- Supervisor process links all connection processes
- Supervisor process used for rate limiting
More savings
- start_protocol call is always local
- start_protocol call doesn't need a monitor or timeout
Is that really safe?
- Yes
- We don't have to make any assumptions
- We KNOW what to expect
Even more savings
- Acceptors don't need to pass around all parameters
- Supervisor can keep them and use them when needed
- Supervisor can send {shoot, Ref} itself directly
Supervisor savings
- We only need to handle which_children and count_children
- No need for child specs
- No need for strategies, they're all temporary!
- Only need to keep the Pid around
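A sketch of how a custom supervisor can stay compatible with supervisor:which_children/1 and count_children/1 while only keeping pids around. Those functions go through gen_server:call, so the custom loop answers the `'$gen_call'` messages itself (details differ in the real Ranch code; the reply shapes follow the supervisor API):

```erlang
%% State is just a list of connection pids.
loop(Pids) ->
    receive
        {'$gen_call', {To, Tag}, which_children} ->
            %% Dynamic children: no Id, all workers.
            To ! {Tag, [{undefined, Pid, worker, []} || Pid <- Pids]},
            loop(Pids);
        {'$gen_call', {To, Tag}, count_children} ->
            To ! {Tag, [{specs, 1}, {active, length(Pids)},
                        {supervisors, 0}, {workers, length(Pids)}]},
            loop(Pids);
        {'EXIT', Pid, _Reason} ->
            %% All children are temporary: just forget the pid.
            loop(lists:delete(Pid, Pids))
    end.
```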
With all these savings we must be rich!
- Benchmarking shows Cowboy handling 10% more requests/s
- Also shows latency reduced by 20%
- We also recover much better when it all goes to hell (too many connections or too many processes dying)
Conclusion
Keep it smart
- Start with gen_server and supervisor
- See if they're good enough
- Go custom otherwise
- (Make it work, make it pretty, make it fast)
Questions