summaryrefslogblamecommitdiffstats
path: root/_build/content/articles/xerl-0.1-empty-modules.asciidoc
blob: b2c178b2396c1ecf2e997f3b08353c06855c326a (plain) (tree)
























































































































































                                                                                           
+++
date = "2013-01-30T00:00:00+01:00"
title = "Xerl: empty modules"

+++

Let's build a programming language. I call it Xerl: eXtended ERLang.
It'll be an occasion for us to learn a few things, especially me.

Unlike in Erlang, in this language, everything is an expression.
This means that modules and functions are expression, and indeed that
you can have more than one module per file.

We are just starting, so let's no go ahead of ourselves here. We'll
begin with writing the code allowing us to compile an empty module.

We will compile to Core Erlang: this is one of the many intermediate
step your Erlang code compiles to before it becomes BEAM machine code.
Core Erlang is a very neat language for machine generated code, and we
will learn many things about it.

Today we will only focus on compiling the following code:

[source,erlang]
mod my_module
begin
end

Compilation will be done in a few steps. First, the source file will
be transformed in a tree of tokens by the lexer. Then, the parser will
use that tree of tokens and convert it to the AST, bringing semantical
meaning to our representation. Finally, the code generator will transform
this AST to Core Erlang AST, which will then be compiled.

We will use _leex_ for the lexer. This lexer uses .xrl files
which are then compiled to .erl files that you can then compile to BEAM.
The file is divided in three parts: definitions, rules and Erlang code.
Definitions and Erlang code are obvious; rules are what concerns us.

We only need two things: atoms and whitespaces. Atoms are a lowercase
letter followed by any letter, number, _ or @. Whitespace is either a
space, an horizontal tab, \r or \n. There exists other kinds of whitespaces
but we simply do not allow them in the Xerl language.

Rules consist of a regular expression followed by Erlang code. The
latter must return a token representation or the atom `skip_token`.

[source,erlang]
----
{L}{A}* :
    Atom = list_to_atom(TokenChars),
    {token, case reserved_word(Atom) of
        true -> {Atom, TokenLine};
        false -> {atom, TokenLine, Atom}
    end}.

{WS}+ : skip_token.
----

The first rule matches an atom, which is converted to either a special
representation for reserved words, or an atom tuple. The
`TokenChars` variable represents the match as a string, and
the `TokenLine` variable contains the line number.
https://github.com/extend/xerl/blob/0.1/src/xerl_lexer.xrl[View the complete file].

We obtain the following result from the lexer:

[source,erlang]
----
[{mod,1},{atom,1,my_module},{'begin',2},{'end',3}]
----

The second step is to parse this list of tokens to add semantic meaning
and generate what is called an _abstract syntax tree_. We will be
using the _yecc_ parser generator for this. This time it will take
.yrl files but the process is the same as before. The file is a little
more complex than for the lexer, we need to define at the very least
terminals, nonterminals and root symbols, the grammar itself, and
optionally some Erlang code.

To compile our module, we need a few things. First, everything is an
expression. We thus need list of expressions and individual expressions.
We will support a single expression for now, the `mod`
expression which defines a module. And that's it! We end up with the
following grammar:

[source,erlang]
----
exprs -> expr : ['$1'].
exprs -> expr exprs : ['$1' | '$2'].

expr -> module : '$1'.

module -> 'mod' atom 'begin' 'end' :
	{'mod', ?line('$1'), '$2', []}.
----

https://github.com/extend/xerl/blob/0.1/src/xerl_parser.yrl[View the complete file].

We obtain the following result from the parser:

[source,erlang]
----
[{mod,1,{atom,1,my_module},[]}]
----

We obtain a list of a single `mod` expression. Just like
we wanted. Last step is generating the Core Erlang code from it.

Code generation generally is comprised of several steps. We will
discuss these in more details later on. For now, we will focus on the
minimal needed for successful compilation.

We can use the `cerl` module to generate Core Erlang AST.
We will simply be using functions, which allows us to avoid learning
and keeping up to date with the internal representation.

There's one important thing to do when generating Core Erlang AST
for a module: create the `module_info/{0,1}` functions.
Indeed, these are added to Erlang before it becomes Core Erlang, and
so we need to replicate this ourselves. Do not be concerned however,
as this only takes a few lines of extra code.

As you can see by
https://github.com/extend/xerl/blob/0.1/src/xerl_codegen.erl[looking at the complete file],
the code generator echoes the grammar we defined in the parser, and
simply applies the appropriate Core Erlang functions for each expressions.

We obtain the following pretty-printed Core Erlang generated code:

[source,erlang]
----
module 'my_module' ['module_info'/0,
                       'module_info'/1]
    attributes []
'module_info'/0 =
    fun () ->
        call 'erlang':'get_module_info'
            ('empty_module')
'module_info'/1 =
    fun (Key) ->
        call 'erlang':'get_module_info'
            ('empty_module', Key)
end
----

For convenience I added all the steps in a `xerl:compile/1`
function that you can use against your own .xerl files.

That's it for today! We will go into more details over each steps in
the next few articles.

* https://github.com/extend/xerl/blob/0.1/[View the source]