Beyond OTP
All about the Psychobitch!
Loïc Hoguin - @lhoguin
Erlang Cowboy and Nine Nines Founder
OTP
OTP?
- Erlang is concurrent
- Erlang is fault tolerant
- Erlang is transparently distributed
- Erlang provides hot upgrades
You need OTP.
- Because Francesco says so.
OTP gives you...
- Architecture patterns
- Middlewares
- Libraries
- Tools
OTP is built on top of Erlang
- You could use Erlang without OTP
- But you would have to reimplement most of it
- OTP comes with the Erlang distribution
- OTP is closely tied to the Erlang VM
VM boot sequence
- Load a few very important modules
- Load the modules for kernel and stdlib
- Start the heart process
- Start the error_logger process
- Start the application_controller process
- Load and start OTP applications
Two types of applications
- OTP library applications
- OTP applications
OTP library applications
- Set of modules
- No process can ever belong to this application
OTP applications
- Set of modules
- Set of processes running in the application's supervision tree
- Implements the application behaviour
- Has one top-level supervisor and possibly more child supervisors
Application behaviour
- Middleware for starting and stopping applications
- Pretty much just starts the top-level supervisor
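That "pretty much just" can be shown in a few lines. A minimal application callback module, sketched with hypothetical `my_app`/`my_sup` names:

```erlang
%% Illustrative sketch; module and supervisor names are made up.
-module(my_app).
-behaviour(application).
-export([start/2, stop/1]).

%% Called by the application_controller when the application starts;
%% it does little more than start the top-level supervisor.
start(_Type, _Args) ->
    my_sup:start_link().

stop(_State) ->
    ok.
```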
Supervisor behaviour
- Middleware for starting and supervising processes
- Restarts processes automatically
- Also keeps track of processes for code upgrades
- Key component of Erlang's fault tolerance claims
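A minimal top-level supervisor looks like this (again with hypothetical names; the child spec and restart intensity values are just examples):

```erlang
-module(my_sup).
-behaviour(supervisor).
-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    %% one_for_one: restart only the crashed child,
    %% at most 5 times within 10 seconds.
    {ok, {{one_for_one, 5, 10},
          [{my_worker, {my_worker, start_link, []},
            permanent, 5000, worker, [my_worker]}]}}.
```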
Great, but my application does nothing!
- There are more behaviours for that!

Generic server behaviour
- Erlang processes are isolated
- Communication occurs through a user-defined client-server protocol
- gen_server implements all the client-server communication logic
- gen_server also implements the server's receive loop
- gen_server provides a callback for updating the state data during code upgrades
- Experienced developers use gen_server 90% of the time
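The points above map directly onto the callbacks. A minimal gen_server, using a made-up `my_worker` module holding a counter as its state:

```erlang
-module(my_worker).
-behaviour(gen_server).
-export([start_link/0, get/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% Synchronous call; gen_server implements the whole
%% client-server protocol and the receive loop for us.
get() ->
    gen_server:call(?MODULE, get).

init([]) ->
    {ok, 0}.

handle_call(get, _From, State) ->
    {reply, State, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

handle_info(_Info, State) ->
    {noreply, State}.

terminate(_Reason, _State) ->
    ok.

%% Called during hot code upgrades to convert the state data.
code_change(_OldVsn, State, _Extra) ->
    {ok, State}.
```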
Other behaviours
- gen_fsm is a generic finite state machine
- gen_event is a generic event handler
- They are more specialized but just as useful as gen_server
Great, I'll use behaviours then!
- You should!
- People expect you to respect OTP principles by using behaviours
- But...
- They're no silver bullet
Sometimes a supervisor isn't enough
- Common case is having a supervisor + another process that monitors the same processes to maintain some kind of state
- The supervisor already keeps track of processes, why duplicate the work?
Sometimes a gen_server is too much
- Common case is needing LOLSPEED for a crucial part of the program
- Sometimes the convenience of gen_server gives too much overhead
Time to follow Joe Armstrong's advice
- Condensed quote
- "When the abstraction is inappropriate, you should ditch the gen_server and roll your own."
- And when you do, you should roll special processes
Special processes
Special processes?
- They are implemented using sys and proc_lib
- They comply with the OTP design principles
- They are familiar
- Other behaviours are implemented the same way
proc_lib
- Ensures new processes are started properly (init_ack)
- Identifies the current process (initial call)
- Identifies the process' parent and ancestors
- Prints a crash log on failure if SASL is available
sys
- Debug and trace special processes
- Access and modify special processes state
- Suspend and resume special processes
- Safe hot code upgrade (code_change)
Using special processes
- Start your process with proc_lib:start_link/3
- Call proc_lib:init_ack/1 from the newly started process
- Write a receive loop
- Die if the parent process dies
- Handle system messages
- Implement system_continue/3, system_terminate/4 and system_code_change/4
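Putting all these steps together, a special process skeleton looks roughly like this (a sketch; the module name and the `handle/2` function are made up, and the state here is just an empty map):

```erlang
-module(my_proc).
-export([start_link/0]).
-export([init/1, system_continue/3, system_terminate/4,
         system_code_change/4]).

%% Start the process through proc_lib so OTP knows about it.
start_link() ->
    proc_lib:start_link(?MODULE, init, [self()]).

init(Parent) ->
    %% Trap exits so we can notice the parent dying.
    process_flag(trap_exit, true),
    Deb = sys:debug_options([]),
    %% Tell the parent we started successfully.
    ok = proc_lib:init_ack(Parent, {ok, self()}),
    loop(Parent, Deb, #{}).

%% Our own receive loop.
loop(Parent, Deb, State) ->
    receive
        %% Die if the parent process dies.
        {'EXIT', Parent, Reason} ->
            exit(Reason);
        %% Hand system messages over to sys.
        {system, From, Request} ->
            sys:handle_system_msg(Request, From, Parent,
                ?MODULE, Deb, State);
        Msg ->
            %% Normal messages are handled here.
            loop(Parent, Deb, handle(Msg, State))
    end.

handle(_Msg, State) -> State.

%% Called by sys when resuming after a system message.
system_continue(Parent, Deb, State) ->
    loop(Parent, Deb, State).

system_terminate(Reason, _Parent, _Deb, _State) ->
    exit(Reason).

%% Called by sys during a hot code upgrade.
system_code_change(State, _Module, _OldVsn, _Extra) ->
    {ok, State}.
```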
Not quite a gen_server yet
- Our process fits OTP Design Principles
- Our process can receive and send messages
- Some processes need synchronized calls
Anatomy of a call
I know that!
- Right, that's usually explained when you learn Erlang
- Forget everything you learnt
- Let's look at OTP directly
General steps
- Resolve the Pid (locally registered name, globally registered name, remote process...)
- Try monitoring the process
- If monitor returns, continue to next slide
- Otherwise we have a C/Java node that might not support monitors
- Monitor the node instead and hope for the best
General steps, after monitor
- Send the message with noconnect (the monitor already connected), catching exceptions (remote pid or port process)
- Receive either a reply, a node down, a process down or timeout
- Down? Exit with appropriate reason
- Timeout? Demonitor and exit(timeout)
- Reply? Demonitor and return {ok, Reply}
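The monitor-and-receive part boils down to something like this simplified sketch of what OTP's gen:call does for a plain local pid (the real code additionally handles name resolution, remote nodes and C/Java nodes):

```erlang
%% Simplified synchronous call: monitor, send, then wait for
%% a reply, a 'DOWN', or a timeout.
call(Pid, Label, Request, Timeout) ->
    Mref = erlang:monitor(process, Pid),
    Pid ! {Label, {self(), Mref}, Request},
    receive
        {Mref, Reply} ->
            erlang:demonitor(Mref, [flush]),
            {ok, Reply};
        {'DOWN', Mref, _, _, Reason} ->
            %% Server died before replying.
            exit(Reason)
    after Timeout ->
        erlang:demonitor(Mref, [flush]),
        exit(timeout)
    end.
```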
Do I need all this?
- No!
- Can't be a registered name? Skip the resolve part!
- Can't be a C/Java node? Cut that part
- Can't be a remote pid or a port? No exceptions will occur, nothing to catch
- Supervisor strategy restarts the calling process if the server process crashes? No need for a monitor or a timeout
Make your own call
- Pick and choose what you need
- Discard the rest
- Get more bang for your buck
- Remember, only do this if you really need it!
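For example, when the server is known to be a local pid and the caller gets restarted by the same supervision strategy if the server dies, the call can shrink down to this (illustrative sketch):

```erlang
%% No name resolution, no monitor, no timeout, no catch:
%% just a tagged send and a selective receive.
call(Pid, Request) ->
    Ref = make_ref(),
    Pid ! {call, {self(), Ref}, Request},
    receive
        {Ref, Reply} -> Reply
    end.
```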
Case study: custom supervisor
Ranch connections supervisor
- Closely tied to acceptors
- When this process dies, acceptors die
- Acceptor creates connection processes with this supervisor
- Must limit the connections accept rate
Two processes
- Supervisor process links all connection processes
- Extra process used for rate limiting
- Extra process monitors all connection processes
- Two processes doing the same thing, waste of resources
One custom supervisor
- Supervisor process links all connection processes
- Supervisor process used for rate limiting
More savings
- start_protocol call is always local
- start_protocol call doesn't need a monitor or timeout
Is that really safe?
- Yes
- We don't have to make any assumptions
- We KNOW what to expect
Even more savings
- Acceptors don't need to pass around all parameters
- Supervisor can keep them and use them when needed
- Supervisor can send {shoot, Ref} itself directly
Supervisor savings
- We only need to handle which_children and count_children
- No need for child specs
- No need for strategies, they're all temporary!
- Only need to keep the Pid around
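A sketch of how a custom supervisor can stay compatible with supervisor:which_children/1 and count_children/1 while only keeping pids around. Those functions go through gen_server:call, so the custom loop answers the `'$gen_call'` messages itself (details differ in the real Ranch code; the reply shapes follow the supervisor API):

```erlang
%% State is just a list of connection pids.
loop(Pids) ->
    receive
        {'$gen_call', {To, Tag}, which_children} ->
            %% Dynamic children: no Id, all workers.
            To ! {Tag, [{undefined, Pid, worker, []} || Pid <- Pids]},
            loop(Pids);
        {'$gen_call', {To, Tag}, count_children} ->
            To ! {Tag, [{specs, 1}, {active, length(Pids)},
                        {supervisors, 0}, {workers, length(Pids)}]},
            loop(Pids);
        {'EXIT', Pid, _Reason} ->
            %% All children are temporary: just forget the pid.
            loop(lists:delete(Pid, Pids))
    end.
```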
With all these savings we must be rich!
- Benchmarking shows Cowboy handling 10% more requests/s
- Also shows latency reduced by 20%
- We also recover much better when it all goes to hell (too many connections or too many processes dying)
Conclusion
Keep it smart
- Start with gen_server and supervisor
- See if they're good enough
- Go custom otherwise
- (Make it work, make it pretty, make it fast)
Questions