diff options
Diffstat (limited to '_build/content/articles/xerl-0.1-empty-modules.asciidoc')
-rw-r--r-- | _build/content/articles/xerl-0.1-empty-modules.asciidoc | 153 |
1 files changed, 153 insertions, 0 deletions
diff --git a/_build/content/articles/xerl-0.1-empty-modules.asciidoc b/_build/content/articles/xerl-0.1-empty-modules.asciidoc new file mode 100644 index 00000000..b2c178b2 --- /dev/null +++ b/_build/content/articles/xerl-0.1-empty-modules.asciidoc @@ -0,0 +1,153 @@ ++++ +date = "2013-01-30T00:00:00+01:00" +title = "Xerl: empty modules" + ++++ + +Let's build a programming language. I call it Xerl: eXtended ERLang. +It'll be an occasion for us to learn a few things, especially me. + +Unlike in Erlang, in this language, everything is an expression. +This means that modules and functions are expression, and indeed that +you can have more than one module per file. + +We are just starting, so let's no go ahead of ourselves here. We'll +begin with writing the code allowing us to compile an empty module. + +We will compile to Core Erlang: this is one of the many intermediate +step your Erlang code compiles to before it becomes BEAM machine code. +Core Erlang is a very neat language for machine generated code, and we +will learn many things about it. + +Today we will only focus on compiling the following code: + +[source,erlang] +mod my_module +begin +end + +Compilation will be done in a few steps. First, the source file will +be transformed in a tree of tokens by the lexer. Then, the parser will +use that tree of tokens and convert it to the AST, bringing semantical +meaning to our representation. Finally, the code generator will transform +this AST to Core Erlang AST, which will then be compiled. + +We will use _leex_ for the lexer. This lexer uses .xrl files +which are then compiled to .erl files that you can then compile to BEAM. +The file is divided in three parts: definitions, rules and Erlang code. +Definitions and Erlang code are obvious; rules are what concerns us. + +We only need two things: atoms and whitespaces. Atoms are a lowercase +letter followed by any letter, number, _ or @. Whitespace is either a +space, an horizontal tab, \r or \n. There exists other kinds of whitespaces +but we simply do not allow them in the Xerl language. + +Rules consist of a regular expression followed by Erlang code. The +latter must return a token representation or the atom `skip_token`. + +[source,erlang] +---- +{L}{A}* : + Atom = list_to_atom(TokenChars), + {token, case reserved_word(Atom) of + true -> {Atom, TokenLine}; + false -> {atom, TokenLine, Atom} + end}. + +{WS}+ : skip_token. +---- + +The first rule matches an atom, which is converted to either a special +representation for reserved words, or an atom tuple. The +`TokenChars` variable represents the match as a string, and +the `TokenLine` variable contains the line number. +https://github.com/extend/xerl/blob/0.1/src/xerl_lexer.xrl[View the complete file]. + +We obtain the following result from the lexer: + +[source,erlang] +---- +[{mod,1},{atom,1,my_module},{'begin',2},{'end',3}] +---- + +The second step is to parse this list of tokens to add semantic meaning +and generate what is called an _abstract syntax tree_. We will be +using the _yecc_ parser generator for this. This time it will take +.yrl files but the process is the same as before. The file is a little +more complex than for the lexer, we need to define at the very least +terminals, nonterminals and root symbols, the grammar itself, and +optionally some Erlang code. + +To compile our module, we need a few things. First, everything is an +expression. We thus need list of expressions and individual expressions. +We will support a single expression for now, the `mod` +expression which defines a module. And that's it! We end up with the +following grammar: + +[source,erlang] +---- +exprs -> expr : ['$1']. +exprs -> expr exprs : ['$1' | '$2']. + +expr -> module : '$1'. + +module -> 'mod' atom 'begin' 'end' : + {'mod', ?line('$1'), '$2', []}. +---- + +https://github.com/extend/xerl/blob/0.1/src/xerl_parser.yrl[View the complete file]. + +We obtain the following result from the parser: + +[source,erlang] +---- +[{mod,1,{atom,1,my_module},[]}] +---- + +We obtain a list of a single `mod` expression. Just like +we wanted. Last step is generating the Core Erlang code from it. + +Code generation generally is comprised of several steps. We will +discuss these in more details later on. For now, we will focus on the +minimal needed for successful compilation. + +We can use the `cerl` module to generate Core Erlang AST. +We will simply be using functions, which allows us to avoid learning +and keeping up to date with the internal representation. + +There's one important thing to do when generating Core Erlang AST +for a module: create the `module_info/{0,1}` functions. +Indeed, these are added to Erlang before it becomes Core Erlang, and +so we need to replicate this ourselves. Do not be concerned however, +as this only takes a few lines of extra code. + +As you can see by +https://github.com/extend/xerl/blob/0.1/src/xerl_codegen.erl[looking at the complete file], +the code generator echoes the grammar we defined in the parser, and +simply applies the appropriate Core Erlang functions for each expressions. + +We obtain the following pretty-printed Core Erlang generated code: + +[source,erlang] +---- +module 'my_module' ['module_info'/0, + 'module_info'/1] + attributes [] +'module_info'/0 = + fun () -> + call 'erlang':'get_module_info' + ('empty_module') +'module_info'/1 = + fun (Key) -> + call 'erlang':'get_module_info' + ('empty_module', Key) +end +---- + +For convenience I added all the steps in a `xerl:compile/1` +function that you can use against your own .xerl files. + +That's it for today! We will go into more details over each steps in +the next few articles. + +* https://github.com/extend/xerl/blob/0.1/[View the source] |