The xmerl XML parser can parse XML documents according to the XML 1.0 standard. By default it performs well-formed parsing (syntax checks and checks of well-formedness constraints). Optionally, xmerl can also be used as a validating parser (validating against the referenced DTD and the validity constraints). By means of, for example, the xmerl_xs module, the parsed result can be transformed to other formats, such as text, HTML, or XML.
This document does not give an introduction to XML. There
are a lot of books available that describe XML from
different views. At the
However, here you will find some examples of how to use xmerl and what you can use it for. A detailed description of the user interface can be found in the reference manual.
There are two known shortcomings in xmerl:
Parsing an XML document returns a record that represents the structure of the document and also holds its data. xmerl is convenient to use in, for instance, the following scenarios:
You need to retrieve data from XML documents. Your Erlang software can handle information from the XML document by extracting data from the data structure received by parsing.
It is also possible to process parsed XML further with xmerl. If you want to change the format of the XML document to, for instance, HTML, text, or another XML format, you can transform it. xmerl has support for such transformations.
You can also convert arbitrary data to XML, for instance to make it easy for humans to read. In this case you first create xmerl data structures out of your data, then transform them to XML.
Examples of these three usage scenarios are given below.
The following records used by xmerl to save the parsed
data are defined in
The result of a successful parsing is a tuple
#xmlElement{name=Name, ... parents=Parents, ... attributes=Attrs, content=Content, ...}
The name of the element is found in the
The record
The
<?xml version="1.0"?>
<dog>
Grand Danois
</dog>
The parse result will be:
#xmlElement{name = dog,
            ...
            parents = [],
            ...
            attributes = [],
            content = [{xmlText,[{dog,1}],1,[],"\nGrand Danois\n",text}],
            ...
            }
Where the content of the top element is:
Space characters between mark-up as
An unsuccessful parse results in an error, which may be a
tuple
In the following examples we use the XML file
"motorcycles.xml" and the corresponding DTD
"motorcycles.dtd". motorcycles.xml looks like:
and motorcycles.dtd looks like:
To parse the XML file motorcycles.xml, run the following in the Erlang shell:
3> {ParsResult,Misc}=xmerl_scan:file("motorcycles.xml").
{{xmlElement,motorcycles,
             motorcycles,
             [],
             {xmlNamespace,[],[]},
             [],
             1,
             [],
             [{xmlText,[{motorcycles,1}],1,[],"\n ",text},
              {xmlElement,bike,
                          bike,
                          [],
                          {xmlNamespace,[],[]},
                          [{motorcycles,1}],
                          2,
                          [{xmlAttribute,year,[],[],[],[]|...},
                           {xmlAttribute,color,[],[],[]|...}],
                          [{xmlText,[{bike,2},{motorcycles|...}],
                                    1,
                                    []|...},
                           {xmlElement,name,name,[]|...},
                           {xmlText,[{...}|...],3|...},
                           {xmlElement,engine|...},
                           {xmlText|...},
                           {...}|...],
                          [],
                          ".",
                          undeclared},
              ...
              ],
             [],
             ".",
             undeclared},
 []}
4>
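The nested tuples in such a result are records, so they are easier to read through the record definitions than positionally. The following is a minimal sketch, not part of xmerl itself: the module name `xmerl_inspect_sketch`, the function `summary/1`, and the inline XML are assumptions made for illustration, and the document is parsed with `xmerl_scan:string/1` only so that the example does not depend on a file.

```erlang
-module(xmerl_inspect_sketch).
-export([summary/1, demo/0]).

%% Record definitions (#xmlElement{}, #xmlAttribute{}, ...) shipped with xmerl.
-include_lib("xmerl/include/xmerl.hrl").

%% Return the element name, its attributes as {Name, Value} pairs,
%% and the names of its direct child elements (text nodes are skipped).
summary(#xmlElement{name = Name, attributes = Attrs, content = Content}) ->
    AttrPairs = [{A#xmlAttribute.name, A#xmlAttribute.value} || A <- Attrs],
    ChildNames = [C#xmlElement.name || #xmlElement{} = C <- Content],
    {Name, AttrPairs, ChildNames}.

%% Hypothetical input, loosely modelled on one bike entry.
demo() ->
    Xml = "<bike year=\"2000\" color=\"black\">"
          "<name>x</name><engine>y</engine></bike>",
    {Root, _Misc} = xmerl_scan:string(Xml),
    summary(Root).
```

Calling `xmerl_inspect_sketch:demo()` gives a three-tuple with the atom `bike`, the two attribute pairs, and the child element names.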
If you instead receive the XML document as a string, you can
parse it by
In this example, consider the situation where you want to examine a particular piece of data in the XML file. For instance, you want to check for how long each motorcycle has been recorded.
Take a look at the DTD and observe that an XML document
conforming to this DTD must have one motorcycles element
(the root element). The motorcycles element must have at
least one bike element. Each bike element may be followed by
a date element. The content of the date element is
#PCDATA (Parsed Character DATA), i.e. raw text. Observe that if
#PCDATA must have a
If you successfully parse the XML file with the validation
on as in:
Thus, knowing the allowed structure, it is easy to write a program that traverses the data structure and picks out the information in the xmlElement records named date.
Observe that white space (each space, tab, or line feed) between mark-up results in an xmlText record.
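Such a traversal could be sketched as below. This is an illustrative assumption rather than code from the xmerl distribution: the module name `xmerl_dates_sketch`, the function `dates/1`, and the inline document (including the date value) are made up, and the document is parsed without validation from a string.

```erlang
-module(xmerl_dates_sketch).
-export([dates/1, demo/0]).

-include_lib("xmerl/include/xmerl.hrl").

%% Collect the text of every element named 'date', depth first.
dates(#xmlElement{name = date, content = Content}) ->
    %% Join the values of the text nodes directly below the date element.
    [lists:flatten([V || #xmlText{value = V} <- Content])];
dates(#xmlElement{content = Content}) ->
    lists:append([dates(C) || C <- Content]);
dates(_Other) ->
    %% xmlText records (including white space between mark-up) and
    %% other node types contribute nothing.
    [].

%% Hypothetical input: one bike followed by a date element.
demo() ->
    Xml = "<motorcycles>\n"
          "  <bike year=\"2000\" color=\"black\"><name>A</name></bike>\n"
          "  <date>2004.08.25</date>\n"
          "</motorcycles>",
    {Root, _Misc} = xmerl_scan:string(Xml),
    dates(Root).
```

The last clause is what makes the white-space xmlText records harmless: they are visited but ignored.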
There is more than one way to accomplish this task. The "brute force" method is to create the records you need and feed your data into the content and attribute fields of the appropriate element.
There is support for this in xmerl by the "simple-form"
format. You can put your data in a simple-form data structure
and feed it into
The Types are:
Element is any of:
The simple-form structure is any of
See also reference manual for
If you want to add information about a black Harley Davidsson 1200 cc Sportster motorcycle from 2003, in as-new condition, to the motorcycles.xml document, you can put the data in a simple-form data structure like:
Data =
  {bike,
   [{year,"2003"},{color,"black"},{condition,"new"}],
   [{name,
     [{manufacturer,["Harley Davidsson"]},
      {brandName,["XL1200C"]},
      {additionalName,["Sportster"]}]},
    {engine,
     ["V-engine, 2-cylinders, 1200 cc"]},
    {kind,["custom"]},
    {drive,["belt"]}]}
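A simple-form term can also be exported on its own, which is a convenient way to check it before merging it into a parsed document. The sketch below assumes a hypothetical module `xmerl_simple_sketch` with a helper `to_xml/1`, and uses a shortened bike entry; only `xmerl:export_simple/2` and the `xmerl_xml` callback module come from xmerl.

```erlang
-module(xmerl_simple_sketch).
-export([to_xml/1, demo/0]).

%% Export one simple-form element as a flat XML string.
to_xml(SimpleForm) ->
    lists:flatten(xmerl:export_simple([SimpleForm], xmerl_xml)).

%% Hypothetical, shortened bike entry in simple form:
%% {Tag, Attributes, Content}.
demo() ->
    Bike = {bike,
            [{year, "2003"}, {color, "black"}],
            [{kind, ["custom"]},
             {drive, ["belt"]}]},
    to_xml(Bike).
```

The returned string contains the bike element with its two attributes and the nested kind and drive elements.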
In order to append this data to the end of the motorcycles.xml document you have to parse the file and add Data to the end of the root element content.
{RootEl,Misc}=xmerl_scan:file('motorcycles.xml'),
#xmlElement{content=Content} = RootEl,
NewContent=Content++lists:flatten([Data]),
NewRootEl=RootEl#xmlElement{content=NewContent},
Then you can run it through the export_simple/2 function:
{ok,IOF}=file:open('new_motorcycles.xml',[write]),
Export=xmerl:export_simple([NewRootEl],xmerl_xml),
io:format(IOF,"~s~n",[lists:flatten(Export)]),
The result would be:
If it is important to get similar indentation and newlines
as in the original document you have to add #xmlText{} records
with space and newline values in appropriate places. It may also
be necessary to keep the original prolog where the DTD is
referenced. If so, it is possible to pass a RootAttribute
Data =
  [#xmlText{value=" "},
   {bike,[{year,"2003"},{color,"black"},{condition,"new"}],
         [#xmlText{value="\n "},
          {name,[#xmlText{value="\n "},
                 {manufacturer,["Harley Davidsson"]},
                 #xmlText{value="\n "},
                 {brandName,["XL1200C"]},
                 #xmlText{value="\n "},
                 {additionalName,["Sportster"]},
                 #xmlText{value="\n "}]},
          {engine,["V-engine, 2-cylinders, 1200 cc"]},
          #xmlText{value="\n "},
          {kind,["custom"]},
          #xmlText{value="\n "},
          {drive,["belt"]},
          #xmlText{value="\n "}]},
   #xmlText{value="\n "}],
...
NewContent=Content++lists:flatten([Data]),
NewRootEl=RootEl#xmlElement{content=NewContent},
...
Prolog = ["<?xml version=\"1.0\" encoding=\"utf-8\" ?>
<!DOCTYPE motorcycles SYSTEM \"motorcycles.dtd\">\n"],
Export=xmerl:export_simple([NewRootEl],xmerl_xml,[{prolog,Prolog}]),
...
The result will be:
Assume that you want to transform the
2> {Doc,Misc}=xmerl_scan:file('motorcycles.xml').
{{xmlElement,motorcycles,
             motorcycles,
             [],
             {xmlNamespace,[],[]},
             [],
             1,
             [],
             [{xmlText,[{motorcycles,1}],1,[],"\n ",text},
              {xmlElement,bike,
...
3> DocHtml=xmerl:export([Doc],xmerl_html).
["<!DOCTYPE HTML PUBLIC \"",
 "-//W3C//DTD HTML 4.01 Transitional//EN",
 "\"",
 [],
 ">\n",
 [[["<","motorcycles",">"],
   ["\n ",
    [["<",
      "bike",
      [[" ","year","=\"","2000","\""],[" ","color","=\"","black","\""]],
      ">"],
...
Will give the result
Perhaps you want to produce something better arranged for human
reading. Suppose that you want to list all the different brands at
the beginning, with links to each group of motorcycles. You also
want all motorcycles sorted by brand, with some flashy colors
on top of it. Thus you rearrange the order of the elements and
put in arbitrary HTML tags. This is possible to do by means of
the
Even though the following example shows one way to transform data from XML to HTML it also applies to transformations to other formats.
First, some words about the xmerl_xs functionality:
You need to write template functions to be able to control
what kind of output you want. Thus if you want to encapsulate a
template(E = #xmlElement{name='bike'}) ->
    ["<p>",xslapply(fun template/1,E),"</p>"];
With
template(E = #xmlElement{name='bike'}) ->
    ["<p>",xslapply(fun template/1,select("bike/name/manufacturer",E)),"</p>"];
If you want to output the content of an XML element or an attribute you will get the value as a string by the
template(E = #xmlElement{name='motorcycles'}) ->
    ["<p>",value_of(select("bike/name/manufacturer",E)),"</p>"];
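To see what this combination returns outside a template, it can be called directly on a parsed document. This is a minimal sketch under stated assumptions: the module name `xmerl_value_of_sketch`, the function `manufacturers/1`, and the inline two-bike document are hypothetical, while `xmerl_xs:select/2` and `xmerl_xs:value_of/1` are the xmerl_xs functions used above.

```erlang
-module(xmerl_value_of_sketch).
-export([manufacturers/1, demo/0]).

-import(xmerl_xs, [select/2, value_of/1]).

%% Select all manufacturer elements below the root element and
%% return their text content as a list of strings.
manufacturers(Root) ->
    value_of(select("bike/name/manufacturer", Root)).

%% Hypothetical input with two bikes and no white space between mark-up.
demo() ->
    Xml = "<motorcycles>"
          "<bike><name><manufacturer>Suzuki</manufacturer></name></bike>"
          "<bike><name><manufacturer>Yamaha</manufacturer></name></bike>"
          "</motorcycles>",
    {Root, _Misc} = xmerl_scan:string(Xml),
    manufacturers(Root).
```

The result is a list with one string per selected text node, which is why the templates can embed it directly in an iolist.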
In the xmerl_xs functions you can provide a select(String)
call, which is an
Now, back to the example where we wanted to make the output more arranged. With the template:
template(E = #xmlElement{name='motorcycles'}) ->
    [ "<head>\n<title>motorcycles</title>\n</head>\n",
      "<body>\n",
      "<h1>Used Motorcycles</h1>\n",
      "<ul>\n",
      remove_duplicates(value_of(select("bike/name/manufacturer",E))),
      "\n</ul>\n",
      sort_by_manufacturer(xslapply(fun template/1, E)),
      "</body>\n",
      "</html>\n"];
We match on the top element and embed the inner parts in an
HTML body. Then we extract the string values of all motorcycle
brands, sort them, and remove duplicates by
The next template matches on the
template(E = #xmlElement{name='bike'}) ->
    {value_of(select("name/manufacturer",E)),
     ["<dt>",xslapply(fun template/1,select("name",E)),"</dt>",
      "<dd><ul>\n",
      "<li style=\"color:green\">Manufacturing year: ",
      xslapply(fun template/1,select("@year",E)),"</li>\n",
      "<li style=\"color:red\">Color: ",
      xslapply(fun template/1,select("@color",E)),"</li>\n",
      "<li style=\"color:blue\">Shape : ",
      xslapply(fun template/1,select("@condition",E)),"</li>\n",
      "</ul></dd>\n"]};
This creates a tuple with the brand of the motorcycle and
the output format. We use the brand name only for sorting
purposes. We have to end the template function with the "built
in clause"
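How such a final clause fits together with the other templates can be sketched in a small, self-contained module. Everything specific here is an assumption for illustration: the module name `xmerl_xs_sketch`, the `process/1` entry point, the reduced templates, and the inline document are not the motorcycles2html.erl program. The fallback clause relies on `xmerl_xs:built_in_rules/2`, which applies the template function to child nodes and outputs text values.

```erlang
-module(xmerl_xs_sketch).
-export([process/1, demo/0]).

-include_lib("xmerl/include/xmerl.hrl").
-import(xmerl_xs, [xslapply/2, select/2, built_in_rules/2]).

%% Run the templates on a parsed document and flatten the result.
process(Doc) ->
    lists:flatten(template(Doc)).

%% Wrap the output for all children of the root in a list.
template(E = #xmlElement{name = motorcycles}) ->
    ["<ul>", xslapply(fun template/1, E), "</ul>"];
%% One list item per bike; the manufacturer element itself has no
%% template of its own, so it is handled by the final clause.
template(E = #xmlElement{name = bike}) ->
    ["<li>", xslapply(fun template/1, select("name/manufacturer", E)), "</li>"];
%% Fallback for everything without a specific template.
template(E) ->
    built_in_rules(fun template/1, E).

%% Hypothetical input with no white space between mark-up.
demo() ->
    Xml = "<motorcycles>"
          "<bike><name><manufacturer>Suzuki</manufacturer></name></bike>"
          "<bike><name><manufacturer>Yamaha</manufacturer></name></bike>"
          "</motorcycles>",
    {Doc, _Misc} = xmerl_scan:string(Xml),
    process(Doc).
```

Without the last clause, the call for a manufacturer element or a text node would fail with a function clause error, since no other template matches them.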
The entire program is motorcycles2html.erl:
If we run it like this: