+++
date = "2014-08-20T00:00:00+01:00"
title = "Cowboy 2.0 and query strings"

+++

Now that Cowboy 1.0 is out, I can spend some of my time thinking
about Cowboy 2.0, which will be released soon after Erlang/OTP 18.0.
This entry discusses the proposed changes to query string handling
in Cowboy.

Cowboy 2.0 will respond to user wishes by simplifying the interface
of the `cowboy_req` module. Users want two things: less
juggling with the Req variable, and more maps. Maps are the only
dynamic key/value data structure in Erlang that we can match directly
to extract values, allowing users to greatly simplify their code as
they no longer need to call functions to do everything.

Query strings are a good candidate for maps. A query string is a list
of key/value pairs, so it's pretty obvious we can win a lot by using maps.
However, query strings differ from maps in one way: they can have
duplicate keys.
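
To make the mismatch concrete, here is a minimal sketch of what
happens when an already parsed query string is naively turned into
a map with `maps:from_list/1`. The module and function names are
made up for illustration; this is not Cowboy code.

[source,erlang]
----
-module(qs_map_dups).
-export([to_map/1]).

%% Naive conversion of a key/value list into a map.
%% maps:from_list/1 keeps only the last value of a repeated key,
%% so duplicates are silently lost.
to_map(KVs) ->
    maps:from_list(KVs).
----

Calling `qs_map_dups:to_map([{<<"color">>, <<"blue">>}, {<<"color">>, <<"red">>}])`
returns a map with the single entry `<<"color">> => <<"red">>`.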

How are we expected to handle duplicate keys? There's no standard
behavior. It's up to applications. And looking at what is done in
the wild, there's no de facto standard either. Some ignore
duplicate keys (keeping the first or the last they find). Others
require duplicate keys to end with `[]` to automatically
put the values in a list. Even worse, languages like PHP
allow you to do things like `key[something][other]` and
create a deep structure for it. Finally, some allow any key to have
duplicates and just give you lists of key/values.

So far, Cowboy has had functions to retrieve query string values one
value at a time; if there are duplicates, they return the
first one found. It also has a function returning the entire list
with all duplicates, allowing you to filter it to get all of them,
and another function that returns the raw query string.
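
The difference between "first value only" and "filter the full list"
can be sketched as follows. This works on a plain list of key/value
binaries rather than on a Req, and the module and function names are
hypothetical.

[source,erlang]
----
-module(qs_dups).
-export([first_value/2, all_values/2]).

%% Return the first value found for Key, or undefined if absent.
first_value(Key, KVs) ->
    case lists:keyfind(Key, 1, KVs) of
        {Key, Value} -> Value;
        false -> undefined
    end.

%% Return every value found for Key, in order of appearance.
all_values(Key, KVs) ->
    [V || {K, V} <- KVs, K =:= Key].
----

With `[{<<"color">>, <<"blue">>}, {<<"size">>, <<"m">>}, {<<"color">>, <<"red">>}]`,
`first_value(<<"color">>, KVs)` gives `<<"blue">>` while
`all_values(<<"color">>, KVs)` gives `[<<"blue">>, <<"red">>]`.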

What are duplicates used for? Not that many things actually.

One use of duplicate keys is with HTML forms. It is common practice
to give all related checkboxes the same name so you get a list of
what's been checked. When nothing is checked, nothing is sent at all:
the key is not in the list.

Another use of duplicate keys is when generating forms. A good
example of that would be a form that allows uploading any number
of files. When you add a file, client-side code adds another field
to the form. Repeat up to a certain limit.

And that's about it. Of note is that HTML radio elements share
the same name too, but only one key/value is sent, so they are not
relevant here.

Normally this would be the part where I tell you how we solve
this elegantly. But I had doubts. Why? Because there is no good
solution to only this particular problem.

I then stopped thinking about duplicate keys for a minute and
started to think about the larger problem.

Query strings are input data. They take a particular form,
and may be sent as part of the URI or as part of the request
body. We have other kinds of input data. We have headers and
cookies and the request body in various forms. We also have
path segments in URIs.

What do you do with input data? Well you use it to do
something. But there is one thing that you almost always do
(and if you don't, you really should): you validate it and
you map it into Erlang terms.

So far, Cowboy has left the user to take care of validation and
conversion into Erlang terms. Rather, it has left the user to take care
of it everywhere except one place. Guess where? That's right,
bindings.

If you define routes with bindings then you have the option
to provide constraints. Constraints can be used to do two things:
validate the data and convert it into a more appropriate term. For
example if you use the `int` constraint, Cowboy will
make sure the binding is an integer, and will replace the value
with the integer representation so that you can use it directly.
In this particular case it not only routes the URI, but also
validates and converts the bindings directly.
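
As an illustration of what a constraint does, here is a minimal
sketch of an integer constraint written as a plain function. It is
not Cowboy's implementation; the module name is made up, and the
return convention (`{true, Value}` or `false`) is an assumption
borrowed from the `check_choices/1` function in the snippet later
in this post.

[source,erlang]
----
-module(constraint_int).
-export([int/1]).

%% Validate that the binary holds an integer and convert it.
%% Returns {true, Integer} on success, false otherwise.
int(Bin) when is_binary(Bin) ->
    try
        {true, binary_to_integer(Bin)}
    catch
        error:badarg -> false
    end.
----

`constraint_int:int(<<"42">>)` returns `{true, 42}`, while
`constraint_int:int(<<"abc">>)` returns `false`.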

This is very relevant in the case of our duplicate keys,
because if we have a list with duplicates of a key, chances
are we want to convert that into a list of Erlang terms, and
also make sure that all the elements in this list are expected.

The answer to this particular problem is simple. We need a
function that will parse the query string and apply constraints.
But this is not all, there is one other problem to be solved.

The other problem is that for the user some keys are mandatory
and some are optional. Optional keys include the ones that
correspond to HTML checkboxes: if the key for one or more
checkboxes is missing from the query string, we still want to
have an empty list in our map so we can easily match. Matching
maps is great, but not so much when values might be missing,
so we have to normalize this data a little.

This problem is solved by allowing a default value. If the
key is missing and a default exists, set it. If no default
exists, then the key was mandatory and we want to crash.
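
The rule above can be sketched with two small lookup functions over
a list of key/value binaries. The names `fetch/2` and `fetch/3` and
the `missing_key` error term are assumptions made for this example.

[source,erlang]
----
-module(qs_default).
-export([fetch/2, fetch/3]).

%% Mandatory key: crash if it is missing.
fetch(Key, KVs) ->
    case lists:keyfind(Key, 1, KVs) of
        {Key, Value} -> Value;
        false -> error({missing_key, Key})
    end.

%% Optional key: fall back to Default if it is missing.
fetch(Key, KVs, Default) ->
    case lists:keyfind(Key, 1, KVs) of
        {Key, Value} -> Value;
        false -> Default
    end.
----

For an unchecked group of checkboxes, `fetch(<<"choices">>, KVs, [])`
yields `[]`, which keeps later map or list matching uniform.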

I therefore make a proposal for changing the query string
interface to three functions.

The first function already exists: `cowboy_req:qs(Req)`.
It returns only the query string binary. No more Req returned.

The second function is a renaming of `cowboy_req:qs_vals(Req)`
to something more explicit: `cowboy_req:parse_qs(Req)`.
The new name implies that a parsing operation is done. It was implicit
and cached before. It will be explicit and not cached anymore now.
Again, no more Req returned.

The third function is the one I mentioned above. I think
the interface `cowboy_req:match_qs(Req, Fields)` is
most appropriate. It returns a normalized map that is the same
regardless of optional fields being provided with the request,
allowing for easy matching. It crashes if something went wrong.
Still no Req returned.

I feel that this three function interface provides everything
one would need to comfortably write applications. You can get
low level and get the query string directly; you can get a list
of key/value binaries without any additional processing and do it
on your own; or you can get a processed map that contains Erlang
terms ready to be used.

I strongly believe that by extending constraints beyond
bindings to query strings, cookies and
other key/values in Cowboy, we can allow the developer to quickly
and easily go from HTTP request to Erlang function calls. The
constraints are reusable functions that can serve as guards
against unwanted data, providing convenience in the process.

Your handlers will no longer look like an endless series of calls
to get and convert the input data; they will instead be just
one call at the beginning followed by the actual application
logic, thanks to constraints and maps.

[source,erlang]
----
handle(Req, State) ->
    #{name:=Name, email:=Email, choices:=ChoicesList, remember_me:=RememberMe} =
        cowboy_req:match_qs(Req, [
            name, {email, email},
            {choices, fun check_choices/1, []},
            {remember_me, boolean, false}]),
    save_choices(Name, Email, ChoicesList),
    if RememberMe -> create_account(Name, Email); true -> ok end,
    {ok, Req, State}.

check_choices(<<"blue">>) -> {true, blue};
check_choices(<<"red">>) -> {true, red};
check_choices(_) -> false.
----

(Don't look too closely at the structure yet.)

As you can see in the above snippet, it becomes really easy
to go from query string to values. You can also use the map
directly as it is guaranteed to only contain the keys you
specified; any extra key is not returned.

This would, I believe, be a huge step up, as we can now
focus on writing applications instead of translating HTTP
calls. Cowboy can now take care of it.

And to conclude, this also solves our duplicate keys
dilemma, as they now automatically become a list of binaries,
and this list is then checked against constraints that
will fail if they were not expecting a list. And in the
example above, it even converts the values to atoms for
easier manipulation.
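
To show how the pieces could fit together, here is a rough sketch
of the matching logic. It is a stand-in, not the proposed
`match_qs` itself: it takes an already parsed list of key/value
binaries instead of a Req, it only accepts funs as constraints, and
the module name, field format and error terms are all assumptions.
It also simplifies the handling of duplicates: the constraint is
applied to each value, a key that appears once yields a single
term, and a key that appears several times yields a list.

[source,erlang]
----
-module(qs_match).
-export([match/2]).

%% Fields are Name | {Name, Constraint} | {Name, Constraint, Default}.
%% Name is an atom; Constraint is a fun returning
%% true | {true, NewValue} | false.
%% Only the requested keys end up in the returned map.
match(KVs, Fields) ->
    maps:from_list([match_field(normalize(F), KVs) || F <- Fields]).

%% Bring all field forms to {Name, Constraint, Presence}.
normalize(Name) when is_atom(Name) ->
    {Name, fun(_) -> true end, mandatory};
normalize({Name, Fun}) ->
    {Name, Fun, mandatory};
normalize({Name, Fun, Default}) ->
    {Name, Fun, {default, Default}}.

match_field({Name, Fun, Presence}, KVs) ->
    Key = atom_to_binary(Name, utf8),
    case [V || {K, V} <- KVs, K =:= Key] of
        %% Missing mandatory key: crash.
        [] when Presence =:= mandatory ->
            error({missing_key, Name});
        %% Missing optional key: use the default as is.
        [] ->
            {default, Default} = Presence,
            {Name, Default};
        %% Key present once: a single checked value.
        [Single] ->
            {Name, check(Name, Fun, Single)};
        %% Duplicate keys: a list of checked values.
        Many ->
            {Name, [check(Name, Fun, V) || V <- Many]}
    end.

%% Apply the constraint; crash on invalid input.
check(Name, Fun, Value) ->
    case Fun(Value) of
        true -> Value;
        {true, NewValue} -> NewValue;
        false -> error({invalid_value, Name, Value})
    end.
----

Given the pairs `name=joe`, `choices=blue`, `choices=red` and
`extra=x`, matching on `name`, on `choices` with a constraint like
`check_choices/1` and default `[]`, and on `remember_me` with
default `false` produces a map holding `<<"joe">>`, `[blue, red]`
and `false` under those three keys, and nothing for `extra`.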

As usual, feedback is more than welcome, and I apologize
for the rocky structure of this post as it contains all the
thoughts that went into this rather than just the conclusion.