summaryrefslogtreecommitdiffstats
path: root/_build/content/articles/xerl-0.1-empty-modules.asciidoc
diff options
context:
space:
mode:
Diffstat (limited to '_build/content/articles/xerl-0.1-empty-modules.asciidoc')
-rw-r--r--_build/content/articles/xerl-0.1-empty-modules.asciidoc153
1 files changed, 153 insertions, 0 deletions
diff --git a/_build/content/articles/xerl-0.1-empty-modules.asciidoc b/_build/content/articles/xerl-0.1-empty-modules.asciidoc
new file mode 100644
index 00000000..b2c178b2
--- /dev/null
+++ b/_build/content/articles/xerl-0.1-empty-modules.asciidoc
@@ -0,0 +1,153 @@
++++
+date = "2013-01-30T00:00:00+01:00"
+title = "Xerl: empty modules"
+
++++
+
+Let's build a programming language. I call it Xerl: eXtended ERLang.
+It'll be an occasion for us to learn a few things, especially me.
+
+Unlike in Erlang, in this language, everything is an expression.
+This means that modules and functions are expression, and indeed that
+you can have more than one module per file.
+
+We are just starting, so let's no go ahead of ourselves here. We'll
+begin with writing the code allowing us to compile an empty module.
+
+We will compile to Core Erlang: this is one of the many intermediate
+step your Erlang code compiles to before it becomes BEAM machine code.
+Core Erlang is a very neat language for machine generated code, and we
+will learn many things about it.
+
+Today we will only focus on compiling the following code:
+
+[source,erlang]
+mod my_module
+begin
+end
+
+Compilation will be done in a few steps. First, the source file will
+be transformed in a tree of tokens by the lexer. Then, the parser will
+use that tree of tokens and convert it to the AST, bringing semantical
+meaning to our representation. Finally, the code generator will transform
+this AST to Core Erlang AST, which will then be compiled.
+
+We will use _leex_ for the lexer. This lexer uses .xrl files
+which are then compiled to .erl files that you can then compile to BEAM.
+The file is divided in three parts: definitions, rules and Erlang code.
+Definitions and Erlang code are obvious; rules are what concerns us.
+
+We only need two things: atoms and whitespaces. Atoms are a lowercase
+letter followed by any letter, number, _ or @. Whitespace is either a
+space, an horizontal tab, \r or \n. There exists other kinds of whitespaces
+but we simply do not allow them in the Xerl language.
+
+Rules consist of a regular expression followed by Erlang code. The
+latter must return a token representation or the atom `skip_token`.
+
+[source,erlang]
+----
+{L}{A}* :
+ Atom = list_to_atom(TokenChars),
+ {token, case reserved_word(Atom) of
+ true -> {Atom, TokenLine};
+ false -> {atom, TokenLine, Atom}
+ end}.
+
+{WS}+ : skip_token.
+----
+
+The first rule matches an atom, which is converted to either a special
+representation for reserved words, or an atom tuple. The
+`TokenChars` variable represents the match as a string, and
+the `TokenLine` variable contains the line number.
+https://github.com/extend/xerl/blob/0.1/src/xerl_lexer.xrl[View the complete file].
+
+We obtain the following result from the lexer:
+
+[source,erlang]
+----
+[{mod,1},{atom,1,my_module},{'begin',2},{'end',3}]
+----
+
+The second step is to parse this list of tokens to add semantic meaning
+and generate what is called an _abstract syntax tree_. We will be
+using the _yecc_ parser generator for this. This time it will take
+.yrl files but the process is the same as before. The file is a little
+more complex than for the lexer, we need to define at the very least
+terminals, nonterminals and root symbols, the grammar itself, and
+optionally some Erlang code.
+
+To compile our module, we need a few things. First, everything is an
+expression. We thus need list of expressions and individual expressions.
+We will support a single expression for now, the `mod`
+expression which defines a module. And that's it! We end up with the
+following grammar:
+
+[source,erlang]
+----
+exprs -> expr : ['$1'].
+exprs -> expr exprs : ['$1' | '$2'].
+
+expr -> module : '$1'.
+
+module -> 'mod' atom 'begin' 'end' :
+ {'mod', ?line('$1'), '$2', []}.
+----
+
+https://github.com/extend/xerl/blob/0.1/src/xerl_parser.yrl[View the complete file].
+
+We obtain the following result from the parser:
+
+[source,erlang]
+----
+[{mod,1,{atom,1,my_module},[]}]
+----
+
+We obtain a list of a single `mod` expression. Just like
+we wanted. Last step is generating the Core Erlang code from it.
+
+Code generation generally is comprised of several steps. We will
+discuss these in more details later on. For now, we will focus on the
+minimal needed for successful compilation.
+
+We can use the `cerl` module to generate Core Erlang AST.
+We will simply be using functions, which allows us to avoid learning
+and keeping up to date with the internal representation.
+
+There's one important thing to do when generating Core Erlang AST
+for a module: create the `module_info/{0,1}` functions.
+Indeed, these are added to Erlang before it becomes Core Erlang, and
+so we need to replicate this ourselves. Do not be concerned however,
+as this only takes a few lines of extra code.
+
+As you can see by
+https://github.com/extend/xerl/blob/0.1/src/xerl_codegen.erl[looking at the complete file],
+the code generator echoes the grammar we defined in the parser, and
+simply applies the appropriate Core Erlang functions for each expressions.
+
+We obtain the following pretty-printed Core Erlang generated code:
+
+[source,erlang]
+----
+module 'my_module' ['module_info'/0,
+ 'module_info'/1]
+ attributes []
+'module_info'/0 =
+ fun () ->
+ call 'erlang':'get_module_info'
+ ('empty_module')
+'module_info'/1 =
+ fun (Key) ->
+ call 'erlang':'get_module_info'
+ ('empty_module', Key)
+end
+----
+
+For convenience I added all the steps in a `xerl:compile/1`
+function that you can use against your own .xerl files.
+
+That's it for today! We will go into more details over each steps in
+the next few articles.
+
+* https://github.com/extend/xerl/blob/0.1/[View the source]