summaryrefslogtreecommitdiffstats
path: root/_build/content/articles/cowboy-2.13.0-performance.asciidoc
blob: 22c0b7a7c6098a59c664ee0efc89f8f43811794e (plain) (blame)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
+++
date = "2025-02-13T07:00:00+01:00"
title = "Performance improvements in Cowboy 2.13"

+++

Cowboy `2.13.0` is close to being released. Much of the
work done on this release focused on performance, and in
particular the performance of the Websocket protocol.

The Websocket protocol requires clients to mask the data
they send, originally to avoid issues with proxies. This
requirement is still present for Websocket over HTTP/2
nevertheless. This means that the server has to unmask
the data, which has a notable impact on performance.
Cowboy 2.13 will attempt to unmask 16 bytes at a time
instead of 4 previously, leading to a rouhgly 10%
faster processing of Websocket frames.

Similarly, the Websocket protocol requires validating
that the data in text frames is valid UTF-8. This part
of the Websocket code was always highly optimised, but
changes in the VM made the optimisations less impactful.
The validation code was therefore rewritten, inspired
by the validation code in Erlang/OTP's new `json` module
(itself inspired by the validation in Cowboy!) and
Cowboy 2.13 is now 10% to 15% faster processing Websocket
text frames.

Cowboy must keep track of the data coming in to detect
whether the client is still around, or gone. It does so
via a timeout that is reset when data is received. Since
it may receive a lot of data, there may be a lot of
timer resets, impacting performance. Cowboy 2.13 will
instead run a timer periodically with a timeout that is
a tenth of `idle_timeout`, and count how many times in
a row that timeout was triggered without receiving data.
This greatly reduces the number of times we set or reset
the timer and makes some Websocket scenarios 10% faster.

These timer improvements were also brought to HTTP/2.

Cowboy by default will not configure transport options,
because they are highly dependent on environment, so
anyone looking to get the best performance out of Cowboy
has to figure out the best values. One of these values
is the `buffer` option, which represents the size of
the socket's buffer on OTP's side. Its default is very
low (1460 bytes) and so performance is highly degraded
in some scenarios, such as reading large request bodies,
or large Websocket frames. During experiments it was
found that there is no best default value, even in a
single environment. Indeed the performance was highly
dependent on the size of the data we were receiving
for each packet. Cowboy 2.13 therefore comes with a
new `dynamic_buffer` mechanism that keeps track of the
data received and updates the `buffer` size dynamically
accordingly. The performance gains of this new approach
are enormous as you will see at the end of this article,
and have a positive impact on all protocols in most scenarios.
The technical details can be found in commit https://github.com/ninenines/cowboy/commit/49be0f57cf5ce66178dc24b9c08c835888d1ce0e[49be0f57cf5ce66178dc24b9c08c835888d1ce0e].

Finally, HTTP/1.1 and HTTP/2 connection processes can
now be set to `hibernate` automatically, which can
have a positive impact on performance in some scenarios
(such as receiving large HTTP/1.1 request bodies) even
if most scenarios will see a small drop in performance.
Cowboy 2.13 will not `hibernate` by default, but it
felt important to mention in this article: if your
service deals with GB-sized request bodies, it may help.

Cumulatively, the optimisations improve the performance
of Cowboy 2.13, compared to Cowboy 2.12, in these terms:

A single HTTP request uploading a 10GB file is handled
11x faster over HTTP, 8x faster over HTTPS and 7.5 faster
over HTTP/2.

Ten thousand sequential HTTP requests each uploading a 1MB
file is handled 23x faster over HTTP, 6x faster over HTTPS
and 5x faster over HTTP/2.

The same benchmark over 10 concurrent connections is
7x faster over HTTP, 1.7x faster over HTTPS and 3.5x
faster over HTTP/2.

Performance of "hello world" type of benchmarks, where
the response is received before issuing a new request,
sees little impact from the changes in Cowboy 2.13.

Performance of HTTP/2 over TCP connections, which is
typically not used in production, is dramatically
improved depending on the HTTP/2 options used.
Performance improvements of up to 57x faster processing
of 10GB request bodies have been observed. I suspect
there might be an underlying issue in Erlang/OTP that
leads to performance issues that only HTTP/2 over TCP
could see in Cowboy 2.12, and have opened a ticket.

Websocket performance improvements vary depending on the
type of data: binary frames, text frames containing ASCII,
text frames containing mixed ASCII/UTF-8 and text frames
containing mostly UTF-8. They also vary depending on the
underlying protocol (HTTP/1.1 or HTTP/2). Again, comparing
Cowboy 2.13 to Cowboy 2.12:

Binary text frames are processed 5x faster in both
Websocket over HTTP/1.1 and Websocket over HTTP/2.

ASCII text frames are processed 4x to 6x faster in
Websocket over HTTP/1.1, depending on the scenario,
and 4x faster in Websocket over HTTP/2.

Mixed text frames are processed 3x to 4.5x faster in
Websocket over HTTP/1.1, depending on the scenario,
and 3x faster in Websocket over HTTP/2.

Mostly UTF-8 text frames are processed 3x faster in
Websocket over HTTP/1.1, depending on the scenario,
and 2.5x faster in Websocket over HTTP/2.

When deflate compression is enabled, the performance
is roughly the same between Cowboy 2.12 and Cowboy 2.13.

Most of the performance improvements are due to the
new `dynamic_buffer` option. Users that already set
a custom `buffer` value will not see as much improvement
in this release, since they already gained much from
tweaking `buffer`. Do note that the `dynamic_buffer`
option will not be enabled by default if `buffer` is
configured.

Still, all other improvements should be beneficial
to all users, particularly the Websocket improvements.
I hope that you are looking forward to these changes!
I will now be preparing the Cowboy 2.13 release.

You can donate to this project via
https://github.com/sponsors/essen[GitHub Sponsors].

As usual, feedback is appreciated, and issues or
questions should be sent via Github tickets or
discussions. We also have a new Discord server.
https://discord.gg/x25nNq2fFE[Join Erlang OSS Discord now!]