Diffstat (limited to 'lib/xmerl/doc/examples/xml')
-rwxr-xr-x | lib/xmerl/doc/examples/xml/test.xml | 6
-rwxr-xr-x | lib/xmerl/doc/examples/xml/test2.xml | 8
-rwxr-xr-x | lib/xmerl/doc/examples/xml/test3.xml | 8
-rwxr-xr-x | lib/xmerl/doc/examples/xml/test4.xml | 9
-rwxr-xr-x | lib/xmerl/doc/examples/xml/test5.xml | 9
-rwxr-xr-x | lib/xmerl/doc/examples/xml/testdtd.dtd | 17
-rwxr-xr-x | lib/xmerl/doc/examples/xml/xmerl.xml | 523
-rw-r--r-- | lib/xmerl/doc/examples/xml/xmerl_xs.xml | 541
8 files changed, 1121 insertions, 0 deletions
diff --git a/lib/xmerl/doc/examples/xml/test.xml b/lib/xmerl/doc/examples/xml/test.xml new file mode 100755 index 0000000000..e803a83560 --- /dev/null +++ b/lib/xmerl/doc/examples/xml/test.xml @@ -0,0 +1,6 @@ +<?xml version="1.0" ?> +<People> + <Person Type = "Personal"> + </Person> +</People> + diff --git a/lib/xmerl/doc/examples/xml/test2.xml b/lib/xmerl/doc/examples/xml/test2.xml new file mode 100755 index 0000000000..0cb11194fc --- /dev/null +++ b/lib/xmerl/doc/examples/xml/test2.xml @@ -0,0 +1,8 @@ +<?xml version="1.0" encoding = "ISO-8859-1" ?> +<People> + <!-- This is a real comment --> + <comment>This is a comment</comment> + <Person Type = "Personal"> + </Person> +</People> + diff --git a/lib/xmerl/doc/examples/xml/test3.xml b/lib/xmerl/doc/examples/xml/test3.xml new file mode 100755 index 0000000000..dbdc1e62c2 --- /dev/null +++ b/lib/xmerl/doc/examples/xml/test3.xml @@ -0,0 +1,8 @@ +<?xml version="1.0" encoding = 'ISO-8859-1' ?> +<People> + <!-- This is a real comment --> + <comment>This is a comment</comment> + <Person Type = "Personal"> + </Person> +</People> + diff --git a/lib/xmerl/doc/examples/xml/test4.xml b/lib/xmerl/doc/examples/xml/test4.xml new file mode 100755 index 0000000000..e9d85b8d8f --- /dev/null +++ b/lib/xmerl/doc/examples/xml/test4.xml @@ -0,0 +1,9 @@ +<?xml version="1.0" encoding = 'ISO-8859-1' ?> +<People> + <!-- This is a real comment --> + <comment> + This is a comment + </comment> + <Person Type = "Personal"> + </Person> +</People> diff --git a/lib/xmerl/doc/examples/xml/test5.xml b/lib/xmerl/doc/examples/xml/test5.xml new file mode 100755 index 0000000000..e9d85b8d8f --- /dev/null +++ b/lib/xmerl/doc/examples/xml/test5.xml @@ -0,0 +1,9 @@ +<?xml version="1.0" encoding = 'ISO-8859-1' ?> +<People> + <!-- This is a real comment --> + <comment> + This is a comment + </comment> + <Person Type = "Personal"> + </Person> +</People> diff --git a/lib/xmerl/doc/examples/xml/testdtd.dtd b/lib/xmerl/doc/examples/xml/testdtd.dtd new file mode 
100755 index 0000000000..2ce1c513a6 --- /dev/null +++ b/lib/xmerl/doc/examples/xml/testdtd.dtd @@ -0,0 +1,17 @@ +<!ELEMENT PARAMETER ( #PCDATA | PARAMETER )* > +<!ATTLIST PARAMETER NR ( 1000024 | 1000025 | 1000101 | 1000102 | 1000103 +| 1000105 | 1000110 | 1000115 | 1000198 ) #REQUIRED > +<!ATTLIST PARAMETER UNIT CDATA #REQUIRED > + +<!ELEMENT PRODUCT ( USER_DEF, PRODUCTELEMENT+ ) > +<!ATTLIST PRODUCT CUSTOMER CDATA #REQUIRED > +<!ATTLIST PRODUCT DESCRIPTION CDATA #REQUIRED > +<!ATTLIST PRODUCT GENERATOR NMTOKEN #REQUIRED > +<!ATTLIST PRODUCT PRODUCTID NMTOKEN #REQUIRED > + +<!ELEMENT PRODUCTELEMENT ( PARAMETER+ ) > +<!ATTLIST PRODUCTELEMENT ELEMENTID CDATA #REQUIRED > +<!ATTLIST PRODUCTELEMENT TYPE NMTOKEN #REQUIRED > + +<!ELEMENT USER_DEF ( #PCDATA ) > + diff --git a/lib/xmerl/doc/examples/xml/xmerl.xml b/lib/xmerl/doc/examples/xml/xmerl.xml new file mode 100755 index 0000000000..f02282dbef --- /dev/null +++ b/lib/xmerl/doc/examples/xml/xmerl.xml @@ -0,0 +1,523 @@ +<?xml version="1.0" encoding="iso-8859-1"?> +<!DOCTYPE article + PUBLIC "-//OASIS//DTD Simplified DocBook XML V1.0//EN" + "http://www.oasis-open.org/docbook/xml/simple/1.0/sdocbook.dtd"> + +<article lang="en" xml:lang="en" > + <articleinfo> + <title>XMerL - XML processing tools for Erlang</title> + <subtitle>Reference Manual</subtitle> + <authorgroup> + <author> + <firstname>Ulf</firstname> + <surname>Wiger</surname> + </author> + </authorgroup> + <revhistory> + <revision> + <revnumber>1.0</revnumber><date>2003-02-04</date> + <revremark>Converted xml from html</revremark> + </revision> + </revhistory> + <abstract> + <para>XMerL tools contains xmerl_scan; a non-validating XML + processor, xmerl_xpath; a XPath implementation, xmerl for export + of XML trees to HTML, XML or text and xmerl_xs for XSLT like + transforms in erlang. 
+ </para> + </abstract> + </articleinfo> + + <section> + <title>xmerl_scan - the XML processor</title> + <para>The (non-validating) XML processor is activated through + <computeroutput>xmerl_scan:string/[1,2]</computeroutput> or + <computeroutput>xmerl_scan:file/[1,2]</computeroutput>. + It returns records of the type defined in xmerl.hrl. + </para> + + <para>As far as I can tell, xmerl_scan implements the complete XML + 1.0 spec, including:</para> + <itemizedlist> + <listitem><para>entity expansion</para></listitem> + <listitem><para>fetching and parsing external DTDs</para></listitem> + <listitem><para>conditional processing</para></listitem> + <listitem><para>Unicode</para></listitem> + <listitem><para>XML Names</para></listitem> + </itemizedlist> + <programlisting> +xmerl_scan:string(Text [ , Options ]) -> #xmlElement{}. +xmerl_scan:file(Filename [ , Options ]) -> #xmlElement{}. </programlisting> + + <para>The Options specify the behaviour of the + scanner. See the source code for details, but you can specify + funs to handle scanner events (event_fun), process the document + entities once identified (hook_fun), and decide what to do if the + scanner runs into eof before the document is complete + (continuation_fun).</para> + + <para>You can also specify a path (fetch_path) as a list of + directories to search when fetching files. If the file in question + is not in the fetch_path, the URI will be used as a file + name.</para> + + + <section> + <title>Customization functions</title> + <para>The XML processor offers a number of hooks for + customization. These hooks are defined as function objects, and + can be provided by the caller.</para> + + <para>The following customization functions are available. 
If + they also have access to their own state variable, the access + function for this state is identified within parentheses:</para> + + <itemizedlist> + + <listitem><para>event function (<computeroutput> + xmerl_scan:event_state/[1,2] + </computeroutput>)</para></listitem> + + <listitem><para>hook function (<computeroutput> + xmerl_scan:hook_state/[1,2] + </computeroutput>)</para></listitem> + + <listitem><para>fetch function (<computeroutput> + xmerl_scan:fetch_state/[1,2] </computeroutput>) + </para></listitem> + + <listitem><para>continuation function (<computeroutput> + xmerl_scan:cont_state/[1,2] </computeroutput>) + </para></listitem> + + <listitem><para>rules function (<computeroutput> + xmerl_scan:rules_state/[1,2] </computeroutput>) + </para></listitem> + + <listitem><para>accumulator function</para></listitem> + + <listitem><para>close function</para></listitem> + + </itemizedlist> + + <para>For all of the above state access functions, the function + with one argument + (e.g. <computeroutput>event_state(GlobalState)</computeroutput>) + will read the state variable, while the function with two + arguments (e.g. <computeroutput>event_state(NewStateData, + GlobalState)</computeroutput>) will modify it.</para> + + <para>For each function, the description starts with the syntax + for specifying the function in the + <computeroutput>Options</computeroutput> list. The general forms + are <computeroutput>{Tag, Fun}</computeroutput>, or + <computeroutput>{Tag, Fun, LocalState}</computeroutput>. The + second form can be used to initialize the state variable in + question.</para> + + <section> + <title>User State</title> + + <para>All customization functions are free to access a + "User state" variable. Care must of course be taken + to coordinate the use of this state. It is recommended that + functions that do not really have anything to contribute to + the "global" user state use their own state + variable instead. Another option (used in + e.g. 
<computeroutput>xmerl_eventp.erl</computeroutput>) is for + customization functions to share one of the local states (in + <computeroutput>xmerl_eventp.erl</computeroutput>, the + continuation function and the fetch function both access the + <computeroutput>cont_state</computeroutput>).</para> + + <para>Functions to access user state:</para> + + <itemizedlist> + + <listitem><para><computeroutput> + xmerl_scan:user_state(GlobalState) </computeroutput> + </para></listitem> + + <listitem><para><computeroutput>xmerl_scan:user_state(UserState', + GlobalState) </computeroutput></para></listitem> + + </itemizedlist> + + </section> + <section> + <title>Event Function</title> + + <para><computeroutput>{event_fun, fun()} | {event_fun, fun(), + LocalState}</computeroutput></para> + + <para>The event function is called at the beginning and at the + end of a parsed entity. It has the following format and + semantics:</para> + +<programlisting> +<![CDATA[ +fun(Event, GlobalState) -> + EventState = xmerl_scan:event_state(GlobalState), + EventState' = foo(Event, EventState), + GlobalState' = xmerl_scan:event_state(EventState', GlobalState) +end. +]]></programlisting> + + </section> + <section> + <title>Hook Function</title> + <para> <computeroutput>{hook_fun, fun()} | {hook_fun, fun(), + LocalState}</computeroutput></para> + + + +<para>The hook function is called when the processor has parsed a complete +entity. Format and semantics:</para> + +<programlisting> +<![CDATA[ +fun(Entity, GlobalState) -> + HookState = xmerl_scan:hook_state(GlobalState), + {TransformedEntity, HookState'} = foo(Entity, HookState), + GlobalState' = xmerl_scan:hook_state(HookState', GlobalState), + {TransformedEntity, GlobalState'} +end. 
+]]></programlisting> + + <para>The relationship between the event function, the hook + function and the accumulator function is as follows:</para> + + <orderedlist> + <listitem><para>The event function is first called with an + 'ended' event for the parsed entity.</para></listitem> + + <listitem><para>The hook function is called, possibly + re-formatting the entity.</para></listitem> + + <listitem><para>The acc function is called in order to + (optionally) add the re-formatted entity to the contents of + its parent element.</para></listitem> + + </orderedlist> + + </section> + <section> + <title>Fetch Function</title> +<para> +<computeroutput>{fetch_fun, fun()} | {fetch_fun, fun(), LocalState}</computeroutput> +</para> +<para>The fetch function is called in order to fetch an external resource +(e.g. a DTD).</para> + +<para>The fetch function can respond with three different return values:</para> + + <programlisting> +<![CDATA[ + Result ::= + {ok, GlobalState'} | + {ok, {file, Filename}, GlobalState'} | + {ok, {string, String}, GlobalState'} +]]></programlisting> + +<para>Format and semantics:</para> + + <programlisting> +<![CDATA[ +fun(URI, GlobalState) -> + FetchState = xmerl_scan:fetch_state(GlobalState), + Result = foo(URI, FetchState) % Result being one of the above +end. +]]></programlisting> + + </section> + <section> + <title>Continuation Function</title> +<para> +<computeroutput>{continuation_fun, fun()} | {continuation_fun, fun(), LocalState}</computeroutput> +</para> +<para>The continuation function is called when the parser encounters the end +of the byte stream. 
Format and semantics:</para> + + <programlisting> +<![CDATA[ +fun(Continue, Exception, GlobalState) -> + ContState = xmerl_scan:cont_state(GlobalState), + {Result, ContState'} = get_more_bytes(ContState), + GlobalState' = xmerl_scan:cont_state(ContState', GlobalState), + case Result of + [] -> + GlobalState' = xmerl_scan:cont_state(ContState', GlobalState), + Exception(GlobalState'); + MoreBytes -> + {MoreBytes', Rest} = end_on_whitespace_char(MoreBytes), + ContState'' = update_cont_state(Rest, ContState'), + GlobalState' = xmerl_scan:cont_state(ContState'', GlobalState), + Continue(MoreBytes', GlobalState') + end +end. +]]></programlisting> + </section> + <section> + <title>Rules Functions</title> + <para> +<computeroutput> +{rules, ReadFun : fun(), WriteFun : fun(), LocalState} | +{rules, Table : ets()}</computeroutput> +</para> + <para>The rules functions take care of storing scanner + information in a rules database. User-provided rules functions + may opt to store the information in mnesia, or perhaps in the + user_state(LocalState).</para> + + <para>The following modes exist:</para> + + <itemizedlist> + + <listitem><para>If the user doesn't specify an option, the + scanner creates an ets table, and uses built-in functions to + read and write data to it. When the scanner is done, the ets + table is deleted.</para></listitem> + + <listitem><para>If the user specifies an ets table via the + <computeroutput>{rules, Table}</computeroutput> option, the + scanner uses this table. When the scanner is done, it does + <emphasis>not</emphasis> delete the table.</para></listitem> + + <listitem><para>If the user specifies read and write + functions, the scanner will use them instead.</para></listitem> + + </itemizedlist> + + <para>The format for the read and write functions are as + follows:</para> + + +<programlisting> +<![CDATA[ +WriteFun(Context, Name, Definition, ScannerState) -> NewScannerState. +ReadFun(Context, Name, ScannerState) -> Definition | undefined. 
+]]></programlisting> + + <para>Here is a summary of the data objects currently being + written by the scanner:</para> + + <table> + <title>Scanner data objects</title> + <tgroup cols="3"> + <thead> + <row> + <entry>Context</entry> + <entry>Key Value</entry> + <entry>Definition</entry> + </row> + </thead> + <tbody> + <row> + <entry>notation</entry> + <entry>NotationName</entry> + <entry><computeroutput>{system, SL} | {public, PIDL, SL}</computeroutput></entry> + </row> + <row> + <entry>elem_def</entry> + <entry>ElementName</entry> + <entry><computeroutput>#xmlElement{content = ContentSpec}</computeroutput></entry> + </row> + <row> + <entry>parameter_entity</entry> + <entry>PEName</entry> + <entry><computeroutput>PEDef</computeroutput></entry> + </row> + <row> + <entry>entity</entry> + <entry>EntityName</entry> + <entry><computeroutput>EntityDef</computeroutput></entry> + </row> + </tbody> + </tgroup> + </table> + + +<programlisting> +<![CDATA[ +ContentSpec ::= empty | any | ElemContent +ElemContent ::= {Mode, Elems} +Mode ::= seq | choice +Elems ::= [Elem] +Elem ::= '#PCDATA' | Name | ElemContent | {Occurrence, Elems} +Occurrence ::= '*' | '?' | '+' +]]></programlisting> + <note><para>When <Elem> is not wrapped with +<Occurrence>, (Occurrence = once) is implied.</para></note> + + </section> + <section> + <title>Accumulator Function</title> + <para><computeroutput>{acc_fun, fun()} | {acc_fun, fun(), + LocalState}</computeroutput></para> + + <para>The accumulator function is called to accumulate the + contents of an entity. When parsing very large files, it may + not be desirable to do so. In this case, an acc function can + be provided that simply doesn't accumulate.</para> + + <para>Note that it is possible to even modify the parsed + entity before accumulating it, but this must be done with + care. <computeroutput>xmerl_scan</computeroutput> performs + post-processing of the element for namespace management. 
Thus, + the element must keep its original structure for this to + work.</para> + + <para>The acc function has the following format and + semantics:</para> + + <programlisting> +<![CDATA[ +%% default accumulating acc fun +fun(ParsedEntity, Acc, GlobalState) -> + {[ParsedEntity|Acc], GlobalState}. + +%% non-accumulating acc fun +fun(ParsedEntity, Acc, GlobalState) -> + {Acc, GlobalState}. +]]></programlisting> + </section> + <section> + <title>Close Function</title> + + <para>The close function is called when a document (either the + main document or an external DTD) has been completely + parsed. When xmerl_scan was started using + <computeroutput>xmerl_scan:file/[1,2]</computeroutput>, the + file will be read in full, and closed immediately, before the + parsing starts, so when the close function is called, it will + not need to actually close the file. In this case, the close + function will be a good place to modify the state + variables.</para> + + <para>Format and semantics:</para> + + <programlisting> +<![CDATA[ +fun(GlobalState) -> + GlobalState' = .... % state variables may be altered +]]></programlisting> + </section> + + </section> + + </section> + + <section> + <title>XPATH</title> + + <programlisting> +<![CDATA[ +xmerl_xpath:string(QueryString, #xmlElement{}) -> + [DocEntity] + +DocEntity : #xmlElement{} + | #xmlAttribute{} + | #xmlText{} + | #xmlPI{} + | #xmlComment{} +]]></programlisting> + + <para>The xmerl_xpath module does seem to handle the entire XPATH + 1.0 spec, but I haven't tested that much yet. The grammar is + defined in + <computeroutput>xmerl_xpath_parse.yrl</computeroutput>. The core + functions are defined in + <computeroutput>xmerl_xpath_pred.erl</computeroutput>.</para> + </section> + <section> + <title>Some useful shell commands for debugging the XPath parser</title> +<para> + <command> +<![CDATA[ +c(xmerl_xpath_scan). +yecc:yecc("xmerl_xpath_parse.yrl", "xmerl_xpath_parse", true, []). +c(xmerl_xpath_parse). 
+ +xmerl_xpath_parse:parse(xmerl_xpath_scan:tokens("position() > -1")). +xmerl_xpath_parse:parse(xmerl_xpath_scan:tokens("5 * 6 div 2")). +xmerl_xpath_parse:parse(xmerl_xpath_scan:tokens("5 + 6 mod 2")). +xmerl_xpath_parse:parse(xmerl_xpath_scan:tokens("5 * 6")). +xmerl_xpath_parse:parse(xmerl_xpath_scan:tokens("5 * 6")). +xmerl_xpath_parse:parse(xmerl_xpath_scan:tokens("-----6")). +xmerl_xpath_parse:parse(xmerl_xpath_scan:tokens("parent::node()")). +xmerl_xpath_parse:parse(xmerl_xpath_scan:tokens("descendant-or-self::node()")). +xmerl_xpath_parse:parse(xmerl_xpath_scan:tokens("parent::processing-instruction('foo')")).]]></command></para> + </section> + <section> + <title>Erlang Data Structure Export</title> + + <para>The idea is as follows:</para> + + <para>The Erlang data structure should look like this:</para> + <programlisting> +<![CDATA[ +Element: {Tag, Attributes, Content} +Tag : atom() +Attributes: [{Key, Value}] +Content: [String | Element] +String: [char() | binary() | String] +]]></programlisting> + + <para>Some short forms are allowed:</para> + <programlisting> +<![CDATA[ +{Tag, Content} -> {Tag, [], Content} +Tag -> {Tag, [], []} +]]></programlisting> + + <para>Note that content lists must be flat, but strings can be + deep.</para> + + <para>It is also allowed to include normal + <computeroutput>#xml...</computeroutput> elements in the simple + format.</para> + + <para><computeroutput>xmerl:export_simple(Data, + Callback)</computeroutput> takes the above data structure and + exports it, using the callback module + <computeroutput>Callback</computeroutput>.</para> + + <para>The callback module should contain hook functions for all + tags present in the data structure. 
The hook function must have + the format:</para> + <para><computeroutput> Tag(Data, Attrs, Parents, E) + </computeroutput></para> + + <para>where E is an <computeroutput>#xmlElement{}</computeroutput> + record (see <computeroutput>xmerl.hrl</computeroutput>).</para> + + <para>Attrs is converted from the simple <computeroutput>[{Key, + Value}]</computeroutput> to + <computeroutput>[#xmlAttribute{}]</computeroutput></para> + + <para>Parents is a list of <computeroutput>[{ParentTag, + ParentTagPosition}]</computeroutput>.</para> + + <para>The hook function should return either the Data to be + exported, or the tuple <computeroutput>{'#xml-redefine#', + NewStructure}</computeroutput>, where + <computeroutput>NewStructure</computeroutput> is an element (which + can be simple), or a (simple-) content list wrapped in a 1-tuple + as <computeroutput>{NewContent}</computeroutput>.</para> + + <para>The callback module can inherit definitions from other + callback modules, through the required function + <computeroutput>'#xml-inheritance#'() -> + [ModuleName]</computeroutput>. </para> + + <para>As long as a tag is represented in one of the callback + modules, things will work. It is of course also possible to + redefine a tag.</para> + <section> + <title>XSLT like transforms</title> + <para>See separate document <ulink url="xmerl_xs.html" >xmerl_xs.html + </ulink>.</para> 
+ </section> + </section> + +</article> diff --git a/lib/xmerl/doc/examples/xml/xmerl_xs.xml b/lib/xmerl/doc/examples/xml/xmerl_xs.xml new file mode 100644 index 0000000000..9a798808b9 --- /dev/null +++ b/lib/xmerl/doc/examples/xml/xmerl_xs.xml @@ -0,0 +1,541 @@ +<?xml version="1.0" encoding="iso-8859-1"?> +<!DOCTYPE article + PUBLIC "-//OASIS//DTD Simplified DocBook XML V1.0//EN" + "http://www.oasis-open.org/docbook/xml/simple/1.0/sdocbook.dtd"> + +<article lang="en" xml:lang="en" > + <articleinfo> + <title>XSLT like transformations in Erlang </title> + <subtitle>User Guide</subtitle> + <authorgroup> + <author> + <firstname>Mikael</firstname> + <surname>Karlsson</surname> + </author> + </authorgroup> + <revhistory> + <revision> + <revnumber>1.0</revnumber><date>2002-10-25</date> + <revremark>First Draft</revremark> + </revision> + <revision> + <revnumber>1.1</revnumber><date>2003-02-05</date> + <revremark>Moved module xserl to xmerl application, renamed to + xmerl_xs</revremark> + </revision> + </revhistory> + <abstract> + <para>Erlang has similarities to XSLT since both languages + have a functional programming approach. Using the xpath implementation + in the existing xmerl application it is possible to write XSLT + like transforms in Erlang. One can also combine the + transformations with the erlang scripting possibility + in the yaws webserver to implement "on the fly" html + conversions of xml documents. 
+ </para> + </abstract> + </articleinfo> + + + <section> + <title>Terminology</title> + <variablelist> + <varlistentry> + <term>XML</term> + <listitem> + <para>Extensible Markup Language</para> + </listitem> + </varlistentry> + <varlistentry> + <term>XSLT</term> + <listitem> + <para>Extensible Stylesheet Language: Transformations</para> + </listitem> + </varlistentry> + </variablelist> + </section> + <section> + <title>Introduction</title> + <para>XSLT stylesheets are often used when transforming XML + documents to other XML documents or (X)HTML for presentation. + There are a number of brick-sized books written on the + topic. XSLT contains quite a few + functions and learning them all may take some effort, which + could be a reason why the author has only reached a basic level of + understanding. This document assumes a basic level of + understanding of XSLT. + </para> + <para>Since XSLT is based on a functional programming approach + with pattern matching and recursion it is possible to write + similar style sheets in Erlang, at least for basic + transforms. XPath, which is used in XSLT, is also already + implemented in the xmerl application written in Erlang. This + document describes how to use the XPath implementation together + with Erlang's pattern matching and a couple of functions to write + XSLT like transforms.</para> + <para>This approach is probably easier for an Erlanger but + if you need to use real XSLT stylesheets in order to "comply with + the standard" there is an adapter available to the Sablotron + XSLT package which is written in C++. + </para> + <para> + This document is written in the Simplified Docbook DTD which is + a subset of the complete one and converted to xhtml using a + stylesheet written in Erlang. 
+ </para> + </section> + + <section> + <title>Tools</title> + <section> + <title>xmerl</title> + <para><ulink url="http://sowap.sourceforge.net/" >xmerl</ulink> + is an XML parser written in Erlang</para> + <section> + <title>xmerl_xpath</title> + <para>XPath is an important part of XSLT and is implemented in + xmerl</para> + </section> + <section> + <title>xmerl_xs</title> + <para> + <ulink url="xmerl_xs.yaws" >xmerl_xs</ulink> is a very small + module acting as "syntactic sugar" for the XSLT lookalike + transforms. It uses xmerl_xpath. + </para> + </section> + </section> + + <section> + <title>yaws</title> + <para> + <ulink url="http://yaws.hyber.org/" >Yaws</ulink>, Yet Another + Webserver, is a web server written in Erlang that supports dynamic + content generation using embedded scripts, also written in Erlang. + </para> +<!-- + <figure> + <title>The Yaws logo</title> + <mediaobject> + <imageobject> + <imagedata fileref="yaws_pb.gif" format="GIF" scale="50%"/> + </imageobject> + </mediaobject> + </figure> +--> + <para>Yaws is not needed to make the XSLT like transformations, but + combining yaws and xmerl it is possible to do transformations + of XML documents to HTML in realtime, when a client requests a + web page. As an example I am able to edit this document using + emacs with psgml tools, save the document and just do a reload + in my browser to see the result. The parse/transform time is not + noticeably different compared to loading any other document in the + browser. + </para> + </section> + + </section> + + <section> + <title>Transformations</title> +<para> + When xmerl_scan parses an xml string/file it returns a record of: +</para> + <programlisting> +<![CDATA[ + -record(xmlElement, { + name, + parents = [], + pos, + attributes = [], + content = [], + language = [], + expanded_name = [], + nsinfo = [],% {Prefix, Local} | [] + namespace = #xmlNamespace{} + }). 
+ ]]> +</programlisting> +<para> + Where content is a mixed list of yet other xmlElement records and/or + xmlText (or other node types). +</para> + <section> + <title>xmerl_xs functions</title> + <para> + Functions used: + </para> + <variablelist> + <varlistentry> + <term>xslapply/2</term> + <listitem> + <para>Function to make things look similar + to xsl:apply-templates. + </para> + </listitem> + </varlistentry> + <varlistentry> + <term>value_of/1</term> + <listitem> + <para>Concatenates all text nodes within a tree.</para> + </listitem> + </varlistentry> + <varlistentry> + <term>select/2</term> + <listitem> + <para>select(Str, E) extracts nodes from the XML tree using + xmerl_xpath. + </para> + </listitem> + </varlistentry> + <varlistentry> + <term>built_in_rules/2</term> + <listitem> + <para>The default fallback behaviour; template funs should + end with: + <computeroutput>template(E)->built_in_rules(fun + template/1, E). +</computeroutput> + </para> + </listitem> + </varlistentry> + </variablelist> +<note><para>Text is escaped using xmerl_lib:export_text/1 for + "<", ">" and other relevant xml + characters when exported. So the value_of/1 and built_in_rules/2 + functions should be replaced when not exporting to xml or html. +</para></note> + </section> + + +<section><title>Examples</title> + <example> + <title>Using xslapply</title> + <para>original XSLT:</para> + <programlisting> +<![CDATA[ + <xsl:template match="doc/title"> + <h1> + <xsl:apply-templates/> + </h1> + </xsl:template> + ]]> + </programlisting> + <para> + becomes in Erlang:</para> + <programlisting> +<![CDATA[ + template(E = #xmlElement{ parents=[{'doc',_}|_], name='title'}) -> + ["<h1>", + xslapply(fun template/1, E), + "</h1>"]; + ]]> + </programlisting> + + </example> + <example> + <title>Using value_of and select</title> + <programlisting> +<![CDATA[ + <xsl:template match="title"> + <div align="center"><h1><xsl:value-of select="." 
/></h1></div> + </xsl:template> + ]]> + </programlisting> + <para> + becomes: + </para> + <programlisting> +<![CDATA[ +template(E = #xmlElement{name='title'}) -> + ["<div align=\"center\"><h1>", value_of(select(".", E)), "</h1></div>"]; + ]]> + </programlisting> + </example> + <example> + <title>Simple xsl stylesheet</title> +<para> + A complete example with the XSLT sheet in the xmerl distribution. +</para> + <programlisting> +<![CDATA[ + +<xsl:stylesheet version="1.0" + xmlns:xsl="http://www.w3.org/1999/XSL/Transform" + xmlns="http://www.w3.org/TR/xhtml1/strict"> + + <xsl:strip-space elements="doc chapter section"/> + <xsl:output + method="xml" + indent="yes" + encoding="iso-8859-1" + /> + + <xsl:template match="doc"> + <html> + <head> + <title> + <xsl:value-of select="title"/> + </title> + </head> + <body> + <xsl:apply-templates/> + </body> + </html> + </xsl:template> + + <xsl:template match="doc/title"> + <h1> + <xsl:apply-templates/> + </h1> + </xsl:template> + + <xsl:template match="chapter/title"> + <h2> + <xsl:apply-templates/> + </h2> + </xsl:template> + + <xsl:template match="section/title"> + <h3> + <xsl:apply-templates/> + </h3> + </xsl:template> + + <xsl:template match="para"> + <p> + <xsl:apply-templates/> + </p> + </xsl:template> + + <xsl:template match="note"> + <p class="note"> + <b>NOTE: </b> + <xsl:apply-templates/> + </p> + </xsl:template> + + <xsl:template match="emph"> + <em> + <xsl:apply-templates/> + </em> + </xsl:template> + +</xsl:stylesheet> + ]]> + </programlisting> + </example> + <example> + <title>Erlang version</title> + <para> + Erlang transformation of previous example: + </para> + <programlisting> +<![CDATA[ + +-include("xmerl.hrl"). + +-import(xmerl_xs, + [ xslapply/2, value_of/1, select/2, built_in_rules/2 ]). + +doctype()-> + "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\"\ + \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd \">". + +process_xml(Doc)-> + template(Doc). 
+ +template(E = #xmlElement{name='doc'})-> + [ "<\?xml version=\"1.0\" encoding=\"iso-8859-1\"\?>", + doctype(), + "<html xmlns=\"http://www.w3.org/1999/xhtml\" >" + "<head>" + "<title>", value_of(select("title",E)), "</title>" + "</head>" + "<body>", + xslapply( fun template/1, E), + "</body>" + "</html>" ]; + + +template(E = #xmlElement{ parents=[{'doc',_}|_], name='title'}) -> + ["<h1>", + xslapply( fun template/1, E), + "</h1>"]; + +template(E = #xmlElement{ parents=[{'chapter',_}|_], name='title'}) -> + ["<h2>", + xslapply( fun template/1, E), + "</h2>"]; + +template(E = #xmlElement{ parents=[{'section',_}|_], name='title'}) -> + ["<h3>", + xslapply( fun template/1, E), + "</h3>"]; + +template(E = #xmlElement{ name='para'}) -> + ["<p>", xslapply( fun template/1, E), "</p>"]; + +template(E = #xmlElement{ name='note'}) -> + ["<p class=\"note\">" + "<b>NOTE: </b>", + xslapply( fun template/1, E), + "</p>"]; + +template(E = #xmlElement{ name='emph'}) -> + ["<em>", xslapply( fun template/1, E), "</em>"]; + +template(E)-> + built_in_rules( fun template/1, E). + ]]> + </programlisting> + <para> + It is important to end with a call to + <computeroutput>xmerl_xs:built_in_rules/2</computeroutput> + if you want any text to be written in "push" transforms. + That are the ones using a lot <computeroutput>xslapply( fun + template/1, E )</computeroutput> instead of + <computeroutput>value_of(select("xpath",E))</computeroutput>, + which is pull... + </para> + </example> +<para>The largest example is the stylesheet to transform this document + from the Simplified Docbook XML format to xhtml. The source + file is <computeroutput>sdocbook2xhtml.erl</computeroutput>. +</para> +</section> + <section> + <title>Tips and tricks</title> + <section> + <title>for-each</title> + <para>The function for-each is quite common in XSLT stylesheets. + It can often be rewritten and replaced by select/1. 
Since + select/1 returns a list of #xmlElements and xslapply/2 + traverses them it is more or less the same as to loop over all + the elements. + </para> + </section> + <section> + <title>position()</title> + <para>The XSLT position() and #xmlElement.pos are not the + same. One has to make an own position in Erlang.</para> + <example> + <title>Counting positions</title> + <programlisting> +<![CDATA[ +<xsl:template match="stanza"> + <p><xsl:apply-templates select="line" /></p> +</xsl:template> + +<xsl:template match="line"> + <xsl:if test="position() mod 2 = 0">  </xsl:if> + <xsl:value-of select="." /><br /> +</xsl:template> + ]]> + </programlisting> +<para>Can be written as</para> + <programlisting> +<![CDATA[ +template(E = #xmlElement{name='stanza'}) -> + {Lines,LineNo} = lists:mapfoldl(fun template_pos/2, 1, select("line", E)), + ["<p>", Lines, "</p>"]. + +template_pos(E = #xmlElement{name='line'}, P) -> + {[indent_line(P rem 2), value_of(E#xmlElement.content), "<br />"], P + 1 }. + +indent_line(0)->"  "; +indent_line(_)->"". + ]]> + </programlisting> + </example> + </section> + <section> + <title>Global tree awareness</title> + <para>In XSLT you have "root" access to the top of the tree + with XPath, even though you are somewhere deep in your + tree.</para> + <para>The xslapply/2 function only carries back the child part + of the tree to the template fun. 
But it is quite easy to write + template funs that handles both the child and top tree.</para> + <example> + <title>Passing the root tree</title> + <para>The following example piece will prepend the article + title to any section title</para> + <programlisting> +<![CDATA[ +template(E = #xmlElement{name='title'}, ETop ) -> + ["<h3>", value_of(select("title", ETop))," - ", + xslapply( fun(A) -> template(A, ETop) end, E), + "</h3>"]; + ]]> + </programlisting> + </example> + </section> + </section> + + </section> + + + <section> + <title>Utility functions</title> + <para> + The module xmerl_xs contains the functions + <computeroutput>mapxml/2, foldxml/3</computeroutput> and + <computeroutput> mapfoldxml/3</computeroutput> to traverse + <literal>#xmlElement</literal> trees. They can be used in order + to build cross-references, see sdocbook2xhtml.erl for instance + where <computeroutput>foldxml/3</computeroutput> and + <computeroutput> mapfoldxml/3</computeroutput> are used to + number chapters, examples and figures and to build the Table of + contents for the document. + </para> + </section> + + + <section> + <title>Future enhancements</title> + <para> + More wish- than task-list at the moment. + </para> + <itemizedlist> + <listitem> + <para>More stylesheets</para> + </listitem> + <listitem> + <para>On the fly exports to PDF for printing and also more + "polished" presentations. + </para> + </listitem> + </itemizedlist> + </section> + + <section> + <title>References</title> + <orderedlist> + <listitem> + <para><ulink url="../xml/xmerl_xs.xml" >XML source + file</ulink> for this document. + </para> + </listitem> + <listitem> + <para><ulink url="../xs/sdocbook2xhtml.erl" >Erlang style + sheet</ulink> used for this document. 
(Simplified Docbook DTD).</para> + </listitem> + <listitem> + <para><ulink url="http://www.erlang.org/" >Open Source Erlang</ulink> + </para> + </listitem> + </orderedlist> + + </section> +</article> + +<!-- +Local Variables: +mode: xml +sgml-indent-step: 2 +sgml-indent-data: t +sgml-set-face: t +sgml-insert-missing-element-comment: nil +End: +-->