xmerl: Add doc/examples directory

Needed by the test suite.
author: Björn Gustavsson <[email protected]> 2010-09-02 16:56:23 +0200
committer: Lars Thorsen <[email protected]> 2011-05-10 09:13:22 +0200
commit: 1a5796cd12061ebb21e7e51a0b7bdf05ed4786a7 (patch)
tree: 7d4a0418919b15ebe2ca9993c3a4a9999f6de006 /lib/xmerl/doc/examples/xml
parent: e3af9123e7ef9291535cafbd0ecb9d3309d674f7 (diff)
8 files changed, 1121 insertions, 0 deletions
diff --git a/lib/xmerl/doc/examples/xml/test.xml b/lib/xmerl/doc/examples/xml/test.xml
new file mode 100755
index 0000000000..e803a83560
--- /dev/null
+++ b/lib/xmerl/doc/examples/xml/test.xml
@@ -0,0 +1,6 @@
+<?xml version="1.0" ?>
+<People>
+  <Person Type = "Personal">
+  </Person>
+</People>
+  
diff --git a/lib/xmerl/doc/examples/xml/test2.xml b/lib/xmerl/doc/examples/xml/test2.xml
new file mode 100755
index 0000000000..0cb11194fc
--- /dev/null
+++ b/lib/xmerl/doc/examples/xml/test2.xml
@@ -0,0 +1,8 @@
+<?xml version="1.0" encoding = "ISO-8859-1" ?>
+<People>
+  <!-- This is a real comment -->
+  <comment>This is a comment</comment>
+  <Person Type = "Personal">
+  </Person>
+</People>
+  
diff --git a/lib/xmerl/doc/examples/xml/test3.xml b/lib/xmerl/doc/examples/xml/test3.xml
new file mode 100755
index 0000000000..dbdc1e62c2
--- /dev/null
+++ b/lib/xmerl/doc/examples/xml/test3.xml
@@ -0,0 +1,8 @@
+<?xml version="1.0" encoding = 'ISO-8859-1' ?>
+<People>
+  <!-- This is a real comment -->
+  <comment>This is a comment</comment>
+  <Person Type = "Personal">
+  </Person>
+</People>
+  
diff --git a/lib/xmerl/doc/examples/xml/test4.xml b/lib/xmerl/doc/examples/xml/test4.xml
new file mode 100755
index 0000000000..e9d85b8d8f
--- /dev/null
+++ b/lib/xmerl/doc/examples/xml/test4.xml
@@ -0,0 +1,9 @@
+<?xml version="1.0" encoding = 'ISO-8859-1' ?>
+<People>
+  <!-- This is a real comment -->
+  <comment>
+	This is a comment
+  </comment>
+  <Person Type = "Personal">
+  </Person>
+</People>
diff --git a/lib/xmerl/doc/examples/xml/test5.xml b/lib/xmerl/doc/examples/xml/test5.xml
new file mode 100755
index 0000000000..e9d85b8d8f
--- /dev/null
+++ b/lib/xmerl/doc/examples/xml/test5.xml
@@ -0,0 +1,9 @@
+<?xml version="1.0" encoding = 'ISO-8859-1' ?>
+<People>
+  <!-- This is a real comment -->
+  <comment>
+	This is a comment
+  </comment>
+  <Person Type = "Personal">
+  </Person>
+</People>
diff --git a/lib/xmerl/doc/examples/xml/testdtd.dtd b/lib/xmerl/doc/examples/xml/testdtd.dtd
new file mode 100755
index 0000000000..2ce1c513a6
--- /dev/null
+++ b/lib/xmerl/doc/examples/xml/testdtd.dtd
@@ -0,0 +1,17 @@
+<!ELEMENT PARAMETER ( #PCDATA | PARAMETER )* >
+<!ATTLIST PARAMETER NR ( 1000024 | 1000025 | 1000101 | 1000102 | 1000103 
+| 1000105 | 1000110 | 1000115 | 1000198 ) #REQUIRED >
+<!ATTLIST PARAMETER UNIT CDATA #REQUIRED >
+
+<!ELEMENT PRODUCT ( USER_DEF, PRODUCTELEMENT+ ) >
+<!ATTLIST PRODUCT CUSTOMER CDATA #REQUIRED >
+<!ATTLIST PRODUCT DESCRIPTION CDATA #REQUIRED >
+<!ATTLIST PRODUCT GENERATOR NMTOKEN #REQUIRED >
+<!ATTLIST PRODUCT PRODUCTID NMTOKEN #REQUIRED >
+
+<!ELEMENT PRODUCTELEMENT ( PARAMETER+ ) >
+<!ATTLIST PRODUCTELEMENT ELEMENTID CDATA #REQUIRED >
+<!ATTLIST PRODUCTELEMENT TYPE NMTOKEN #REQUIRED >
+
+<!ELEMENT USER_DEF ( #PCDATA ) >
+
diff --git a/lib/xmerl/doc/examples/xml/xmerl.xml b/lib/xmerl/doc/examples/xml/xmerl.xml
new file mode 100755
index 0000000000..f02282dbef
--- /dev/null
+++ b/lib/xmerl/doc/examples/xml/xmerl.xml
@@ -0,0 +1,523 @@
+<?xml version="1.0" encoding="iso-8859-1"?>
+<!DOCTYPE article
+      PUBLIC "-//OASIS//DTD Simplified DocBook XML V1.0//EN"
+      "http://www.oasis-open.org/docbook/xml/simple/1.0/sdocbook.dtd">
+
+<article  lang="en" xml:lang="en" >
+  <articleinfo>
+    <title>XMerL - XML processing tools for Erlang</title>
+    <subtitle>Reference Manual</subtitle>
+    <authorgroup>
+      <author>
+	<firstname>Ulf</firstname>
+	<surname>Wiger</surname>
+      </author>
+    </authorgroup>
+    <revhistory>
+      <revision>
+      <revnumber>1.0</revnumber><date>2003-02-04</date>
+      <revremark>Converted xml from html</revremark>
+      </revision>
+    </revhistory>
+    <abstract>
+      <para>The XMerL tools comprise xmerl_scan, a non-validating XML
+      processor; xmerl_xpath, an XPath implementation; xmerl, for export
+      of XML trees to HTML, XML or text; and xmerl_xs, for XSLT-like
+      transforms in Erlang.
+      </para>
+    </abstract>
+  </articleinfo>
+  
+  <section>
+    <title>xmerl_scan - the XML processor</title>
+    <para>The (non-validating) XML processor is activated through 
+    <computeroutput>xmerl_scan:string/[1,2]</computeroutput> or 
+    <computeroutput>xmerl_scan:file/[1,2]</computeroutput>.
+    It returns records of the type defined in xmerl.hrl.
+    </para>
+  
+    <para>As far as I can tell, xmerl_scan implements the complete XML
+    1.0 spec, including:</para>
+    <itemizedlist>
+      <listitem><para>entity expansion</para></listitem>
+      <listitem><para>fetching and parsing external DTDs</para></listitem>
+      <listitem><para>conditional processing</para></listitem>
+      <listitem><para>Unicode</para></listitem>
+      <listitem><para>XML Names</para></listitem>
+    </itemizedlist>
+    <programlisting>
+xmerl_scan:string(Text [ , Options ]) -> {#xmlElement{}, Rest}.
+xmerl_scan:file(Filename [ , Options ]) -> {#xmlElement{}, Rest}. </programlisting>
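+
+    <para>A minimal usage sketch, assuming the xmerl application of a
+    current Erlang/OTP release, where both functions return a
+    <computeroutput>{Document, Rest}</computeroutput> tuple. The module
+    name <computeroutput>scan_example</computeroutput> is hypothetical;
+    the input mirrors <computeroutput>test.xml</computeroutput> from
+    this directory:</para>
+    <programlisting>
+<![CDATA[
+-module(scan_example).
+-export([root_name/0]).
+-include_lib("xmerl/include/xmerl.hrl").
+
+%% Parse a small document held in a string and return the name of
+%% its root element (an atom).
+root_name() ->
+    Xml = "<?xml version=\"1.0\" ?>"
+          "<People><Person Type = \"Personal\"></Person></People>",
+    {#xmlElement{name = Name}, _Rest} = xmerl_scan:string(Xml),
+    Name.
+]]></programlisting>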
+    
+    <para>The Options specify the behaviour of the scanner. See the
+    source code for details; among other things, you can specify
+    funs to handle scanner events (event_fun), process the document
+    entities once identified (hook_fun), and decide what to do if the
+    scanner runs into eof before the document is complete
+    (continuation_fun).</para>
+
+    <para>You can also specify a path (fetch_path) as a list of
+    directories to search when fetching files. If the file in question
+    is not in the fetch_path, the URI will be used as a file
+    name.</para>
+
+
+    <section>
+      <title>Customization functions</title>
+      <para>The XML processor offers a number of hooks for
+      customization. These hooks are defined as function objects, and
+      can be provided by the caller.</para>
+      
+      <para>The following customization functions are available. If
+      they also have access to their own state variable, the access
+      function for this state is identified within parentheses:</para>
+
+      <itemizedlist>
+
+	<listitem><para>event function (<computeroutput>
+	xmerl_scan:event_state/[1,2]
+	</computeroutput>)</para></listitem>
+
+	<listitem><para>hook function (<computeroutput>
+	xmerl_scan:hook_state/[1,2]
+	</computeroutput>)</para></listitem>
+
+	<listitem><para>fetch function (<computeroutput>
+	xmerl_scan:fetch_state/[1,2] </computeroutput>)
+	</para></listitem>
+
+	<listitem><para>continuation function (<computeroutput>
+	xmerl_scan:cont_state/[1,2] </computeroutput>)
+	</para></listitem>
+
+	<listitem><para>rules function (<computeroutput>
+      xmerl_scan:rules_state/[1,2] </computeroutput>)
+      </para></listitem>
+
+	<listitem><para>accumulator function</para></listitem>
+
+	<listitem><para>close function</para></listitem>
+
+      </itemizedlist>
+
+      <para>For all of the above state access functions, the function
+      with one argument
+      (e.g. <computeroutput>event_state(GlobalState)</computeroutput>)
+      will read the state variable, while the function with two
+      arguments (e.g. <computeroutput>event_state(NewStateData,
+      GlobalState)</computeroutput>) will modify it.</para>
+
+      <para>For each function, the description starts with the syntax
+      for specifying the function in the
+      <computeroutput>Options</computeroutput> list. The general forms
+      are <computeroutput>{Tag, Fun}</computeroutput>, or
+      <computeroutput>{Tag, Fun, LocalState}</computeroutput>. The
+      second form can be used to initialize the state variable in
+      question.</para>
+
+      <section>
+	<title>User State</title>
+
+	<para>All customization functions are free to access a
+	&quot;User state&quot; variable. Care must of course be taken
+	to coordinate the use of this state. It is recommended that
+	functions, which do not really have anything to contribute to
+	the &quot;global&quot; user state, use their own state
+	variable instead. Another option (used in
+	e.g. <computeroutput>xmerl_eventp.erl</computeroutput>) is for
+	customization functions to share one of the local states (in
+	<computeroutput>xmerl_eventp.erl</computeroutput>, the
+	continuation function and the fetch function both access the
+	<computeroutput>cont_state</computeroutput>.)</para>
+
+	<para>Functions to access user state:</para>
+
+	<itemizedlist>
+
+	  <listitem><para><computeroutput>
+	  xmerl_scan:user_state(GlobalState) </computeroutput>
+	  </para></listitem>
+
+	  <listitem><para><computeroutput>xmerl_scan:user_state(UserState',
+	  GlobalState) </computeroutput></para></listitem>
+
+	</itemizedlist>
+
+      </section>
+      <section>
+	<title>Event Function</title>
+
+	<para><computeroutput>{event_fun, fun()} | {event_fun, fun(),
+	LocalState}</computeroutput></para>
+
+	<para>The event function is called at the beginning and at the
+	end of a parsed entity. It has the following format and
+	semantics:</para>
+
+<programlisting>
+<![CDATA[
+fun(Event, GlobalState) ->
+   EventState = xmerl_scan:event_state(GlobalState),
+   EventState1 = foo(Event, EventState),   % foo/2 is a placeholder
+   xmerl_scan:event_state(EventState1, GlobalState)   % new global state
+end.
+]]></programlisting>
+
+      </section>
+      <section>
+	<title>Hook Function</title>
+	<para> <computeroutput>{hook_fun, fun()} | {hook_fun, fun(),
+	LocalState}</computeroutput></para>
+
+
+
+<para>The hook function is called when the processor has parsed a complete
+entity. Format and semantics:</para>
+
+<programlisting>
+<![CDATA[
+fun(Entity, GlobalState) ->
+   HookState = xmerl_scan:hook_state(GlobalState),
+   {TransformedEntity, HookState1} = foo(Entity, HookState),
+   GlobalState1 = xmerl_scan:hook_state(HookState1, GlobalState),
+   {TransformedEntity, GlobalState1}
+end.
+]]></programlisting>
+
+	<para>The relationship between the event function, the hook
+	function and the accumulator function is as follows:</para>
+
+	<orderedlist>
+	  <listitem><para>The event function is first called with an
+	  'ended' event for the parsed entity.</para></listitem>
+
+	  <listitem><para>The hook function is called, possibly
+	  re-formatting the entity.</para></listitem>
+
+	  <listitem><para>The acc function is called in order to
+	  (optionally) add the re-formatted entity to the contents of
+	  its parent element.</para></listitem>
+
+    </orderedlist>
+
+      </section>
+      <section>
+	<title>Fetch Function</title>
+<para>
+<computeroutput>{fetch_fun, fun()} | {fetch_fun, fun(), LocalState}</computeroutput>
+</para>
+<para>The fetch function is called in order to fetch an external resource
+(e.g. a DTD).</para>
+
+<para>The fetch function can respond with three different return values:</para>
+
+    <programlisting>
+<![CDATA[
+    Result ::=
+      {ok, GlobalState'} |
+      {ok, {file, Filename}, GlobalState'} |
+      {ok, {string, String}, GlobalState'}
+]]></programlisting>
+
+<para>Format and semantics:</para>
+
+    <programlisting>
+<![CDATA[
+fun(URI, GlobalState) ->
+   FetchState = xmerl_scan:fetch_state(GlobalState),
+   Result = foo(URI, FetchState)  % Result being one of the above
+end.
+]]></programlisting>
+
+      </section>
+      <section>
+	<title>Continuation Function</title>
+<para>
+<computeroutput>{continuation_fun, fun()} | {continuation_fun, fun(), LocalState}</computeroutput>
+</para>
+<para>The continuation function is called when the parser encounters the end
+of the byte stream. Format and semantics:</para>
+
+    <programlisting>
+<![CDATA[
+fun(Continue, Exception, GlobalState) ->
+   ContState = xmerl_scan:cont_state(GlobalState),
+   {Result, ContState1} = get_more_bytes(ContState),
+   GlobalState1 = xmerl_scan:cont_state(ContState1, GlobalState),
+   case Result of
+      [] ->
+         %% no more input: report the premature end of the stream
+         Exception(GlobalState1);
+      MoreBytes ->
+         {MoreBytes1, Rest} = end_on_whitespace_char(MoreBytes),
+         ContState2 = update_cont_state(Rest, ContState1),
+         GlobalState2 = xmerl_scan:cont_state(ContState2, GlobalState),
+         Continue(MoreBytes1, GlobalState2)
+   end
+end.
+]]></programlisting>
+      </section>
+      <section>
+	<title>Rules Functions</title>
+	<para>
+<computeroutput>
+{rules, ReadFun : fun(), WriteFun : fun(), LocalState} |
+{rules, Table : ets()}</computeroutput>
+</para>
+	<para>The rules functions take care of storing scanner
+	information in a rules database. User-provided rules functions
+	may opt to store the information in mnesia, or perhaps in the
+	user_state(LocalState).</para>
+
+	<para>The following modes exist:</para>
+
+	<itemizedlist>
+
+	  <listitem><para>If the user doesn't specify an option, the
+	  scanner creates an ets table, and uses built-in functions to
+	  read and write data to it. When the scanner is done, the ets
+	  table is deleted.</para></listitem>
+
+	  <listitem><para>If the user specifies an ets table via the 
+	<computeroutput>{rules, Table}</computeroutput> option, the
+	scanner uses this table. When the scanner is done, it does
+	<emphasis>not</emphasis> delete the table.</para></listitem>
+	  
+	  <listitem><para>If the user specifies read and write
+	  functions, the scanner will use them instead.</para></listitem>
+
+	</itemizedlist>
+	
+	<para>The formats of the read and write functions are as
+	follows:</para>
+
+
+<programlisting>
+<![CDATA[
+WriteFun(Context, Name, Definition, ScannerState) -> NewScannerState.
+ReadFun(Context, Name, ScannerState) -> Definition | undefined.
+]]></programlisting>
+
+	<para>Here is a summary of the data objects currently being
+	written by the scanner:</para>
+	
+	<table>
+	  <title>Scanner data objects</title>
+	  <tgroup cols="3">
+	    <thead>
+	      <row>
+		<entry>Context</entry>
+		<entry>Key Value</entry>
+		<entry>Definition</entry>
+	      </row>
+	    </thead>
+	    <tbody>
+	      <row>
+		<entry>notation</entry>
+		<entry>NotationName</entry>
+		<entry><computeroutput>{system, SL} | {public, PIDL, SL}</computeroutput></entry>
+	      </row>
+	      <row>
+		<entry>elem_def</entry>
+		<entry>ElementName</entry>
+		<entry><computeroutput>#xmlElement{content = ContentSpec}</computeroutput></entry>
+	      </row>
+	      <row>
+		<entry>parameter_entity</entry>
+		<entry>PEName</entry>
+		<entry><computeroutput>PEDef</computeroutput></entry>
+	      </row>
+	      <row>
+		<entry>entity</entry>
+		<entry>EntityName</entry>
+	  <entry><computeroutput>EntityDef</computeroutput></entry>
+	      </row>
+	    </tbody>
+	  </tgroup>
+	</table>
+      
+	
+<programlisting>
+<![CDATA[
+ContentSpec ::= empty | any | ElemContent
+ElemContent ::= {Mode, Elems}
+Mode        ::= seq | choice
+Elems       ::= [Elem]
+Elem        ::= '#PCDATA' | Name | ElemContent | {Occurrence, Elems}
+Occurrence  ::= '*' | '?' | '+'
+]]></programlisting>
+	<note><para>When &lt;Elem&gt; is not wrapped with
+&lt;Occurrence&gt;, (Occurrence = once) is implied.</para></note>
+
+      </section>
+      <section>
+	<title>Accumulator Function</title>
+	<para><computeroutput>{acc_fun, fun()} | {acc_fun, fun(),
+	LocalState}</computeroutput></para>
+
+	<para>The accumulator function is called to accumulate the
+	contents of an entity. When parsing very large files, it may
+	not be desirable to do so. In this case, an acc function can
+	be provided that simply doesn't accumulate.</para>
+
+	<para>Note that it is even possible to modify the parsed
+	entity before accumulating it, but this must be done with
+	care. <computeroutput>xmerl_scan</computeroutput> performs
+	post-processing of the element for namespace management. Thus,
+	the element must keep its original structure for this to
+	work.</para>
+
+	<para>The acc function has the following format and
+	semantics:</para>
+
+	<programlisting>
+<![CDATA[
+%% default accumulating acc fun
+fun(ParsedEntity, Acc, GlobalState) ->
+   {[ParsedEntity|Acc], GlobalState} end.
+
+%% non-accumulating acc fun
+fun(_ParsedEntity, Acc, GlobalState) ->
+   {Acc, GlobalState} end.
+]]></programlisting>
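+
+	<para>As an illustration, the following sketch drops text nodes
+	that hold nothing but a single space and accumulates everything
+	else. It assumes the <computeroutput>{space, normalize}</computeroutput>
+	scanner option, which is expected to collapse runs of whitespace
+	into one space; the module name
+	<computeroutput>acc_example</computeroutput> is hypothetical:</para>
+
+	<programlisting>
+<![CDATA[
+-module(acc_example).
+-export([scan/1]).
+-include_lib("xmerl/include/xmerl.hrl").
+
+%% Scan File with whitespace normalization, discarding text nodes
+%% that are left holding just one space.
+scan(File) ->
+    AccFun = fun(#xmlText{value = " "}, Acc, GlobalState) ->
+                     {Acc, GlobalState};                  % skip it
+                (ParsedEntity, Acc, GlobalState) ->
+                     {[ParsedEntity | Acc], GlobalState}  % keep it
+             end,
+    xmerl_scan:file(File, [{space, normalize}, {acc_fun, AccFun}]).
+]]></programlisting>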
+      </section>
+      <section>
+	<title>Close Function</title>
+
+	<para>The close function is called when a document (either the
+	main document or an external DTD) has been completely
+	parsed. When xmerl_scan was started using
+	<computeroutput>xmerl_scan:file/[1,2]</computeroutput>, the
+	file will be read in full, and closed immediately, before the
+	parsing starts, so when the close function is called, it will
+	not need to actually close the file. In this case, the close
+	function will be a good place to modify the state
+	variables.</para>
+
+	<para>Format and semantics:</para>
+
+	<programlisting>
+<![CDATA[
+fun(GlobalState) ->
+   GlobalState' = ....  % state variables may be altered
+]]></programlisting>
+      </section>
+
+    </section>
+
+  </section>
+
+  <section>
+    <title>XPATH</title>
+
+    <programlisting>
+<![CDATA[
+xmerl_xpath:string(QueryString, #xmlElement{}) ->
+	[DocEntity]
+
+DocEntity :	#xmlElement{} 
+		| #xmlAttribute{} 
+		| #xmlText{} 
+		| #xmlPI{}
+		| #xmlComment{}
+]]></programlisting>
+
+    <para>The xmerl_xpath module does seem to handle the entire XPath
+    1.0 spec, but I haven't tested that much yet. The grammar is
+    defined in
+    <computeroutput>xmerl_xpath_parse.yrl</computeroutput>.  The core
+    functions are defined in
+    <computeroutput>xmerl_xpath_pred.erl</computeroutput>.</para>
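+
+    <para>A small usage sketch, assuming that
+    <computeroutput>xmerl_scan:string/1</computeroutput> returns a
+    <computeroutput>{Document, Rest}</computeroutput> tuple as in current
+    Erlang/OTP releases. The module name
+    <computeroutput>xpath_example</computeroutput> and the sample
+    document are made up for illustration:</para>
+    <programlisting>
+<![CDATA[
+-module(xpath_example).
+-export([person_types/0]).
+-include_lib("xmerl/include/xmerl.hrl").
+
+%% Return the values of all Type attributes found on Person elements.
+person_types() ->
+    Xml = "<People><Person Type=\"Personal\"></Person></People>",
+    {Doc, _Rest} = xmerl_scan:string(Xml),
+    Attrs = xmerl_xpath:string("//Person/@Type", Doc),
+    [Value || #xmlAttribute{value = Value} <- Attrs].
+]]></programlisting>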
+  </section>
+  <section>
+    <title>Some useful shell commands for debugging the XPath parser</title>
+<para>
+    <command>
+<![CDATA[
+c(xmerl_xpath_scan).
+yecc:yecc("xmerl_xpath_parse.yrl", "xmerl_xpath_parse", true, []).
+c(xmerl_xpath_parse).
+
+xmerl_xpath_parse:parse(xmerl_xpath_scan:tokens("position() > -1")).
+xmerl_xpath_parse:parse(xmerl_xpath_scan:tokens("5 * 6 div 2")).
+xmerl_xpath_parse:parse(xmerl_xpath_scan:tokens("5 + 6 mod 2")).
+xmerl_xpath_parse:parse(xmerl_xpath_scan:tokens("5 * 6")).
+xmerl_xpath_parse:parse(xmerl_xpath_scan:tokens("5 * 6")).
+xmerl_xpath_parse:parse(xmerl_xpath_scan:tokens("-----6")).
+xmerl_xpath_parse:parse(xmerl_xpath_scan:tokens("parent::node()")).
+xmerl_xpath_parse:parse(xmerl_xpath_scan:tokens("descendant-or-self::node()")).
+xmerl_xpath_parse:parse(xmerl_xpath_scan:tokens("parent::processing-instruction('foo')")).]]></command></para>
+  </section>
+  <section>
+    <title>Erlang Data Structure Export</title>
+
+    <para>The idea is as follows:</para>
+
+    <para>The Erlang data structure should look like this:</para>
+    <programlisting>
+<![CDATA[
+Element:	{Tag, Attributes, Content}
+Tag :		atom()
+Attributes:	[{Key, Value}]
+Content:	[String | Element]
+String:		[char() | binary() | String]
+]]></programlisting>
+
+    <para>Some short forms are allowed:</para>
+    <programlisting>
+<![CDATA[
+{Tag, Content}	-> {Tag, [], Content}
+Tag		-> {Tag, [], []}
+]]></programlisting>
+
+    <para>Note that content lists must be flat, but strings can be
+    deep.</para>
+
+    <para>It is also allowed to include normal
+    <computeroutput>#xml...</computeroutput> elements in the simple
+    format.</para>
+
+    <para><computeroutput>xmerl:export_simple(Data,
+    Callback)</computeroutput> takes the above data structure and
+    exports it, using the callback module
+    <computeroutput>Callback</computeroutput>.</para>
+
+    <para>The callback module should contain hook functions for all
+    tags present in the data structure. The hook function must have
+    the format:</para>
+    <para><computeroutput> Tag(Data, Attrs, Parents, E)
+    </computeroutput></para>
+
+    <para>where E is an <computeroutput>#xmlElement{}</computeroutput>
+    record  (see <computeroutput>xmerl.hrl</computeroutput>).</para>
+
+    <para>Attrs is converted from the simple <computeroutput>[{Key,
+    Value}]</computeroutput> to
+    <computeroutput>[#xmlAttribute{}]</computeroutput></para>
+
+    <para>Parents is a list of <computeroutput>[{ParentTag,
+    ParentTagPosition}]</computeroutput>.</para>
+
+    <para>The hook function should return either the Data to be
+    exported, or the tuple <computeroutput>{'#xml-redefine#',
+    NewStructure}</computeroutput>, where
+    <computeroutput>NewStructure</computeroutput> is an element (which
+    can be simple), or a (simple-) content list wrapped in a 1-tuple
+    as <computeroutput>{NewContent}</computeroutput>.</para>
+
+    <para>The callback module can inherit definitions from other
+    callback modules, through the required function
+    <computeroutput>'#xml-inheritance#'() ->
+    [ModuleName]</computeroutput>. </para>
+
+    <para>As long as a tag is represented in one of the callback
+    modules, things will work. It is of course also possible to
+    redefine a tag.</para>
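+
+    <para>A brief sketch of an export, assuming the
+    <computeroutput>xmerl_xml</computeroutput> callback module shipped
+    with xmerl. The module name
+    <computeroutput>export_example</computeroutput> and the data are
+    hypothetical; the outer element uses the short form
+    <computeroutput>{Tag, Content}</computeroutput>:</para>
+    <programlisting>
+<![CDATA[
+-module(export_example).
+-export([to_xml/0]).
+
+%% Build the simple-format tree for a small document and export it
+%% as a flat string of XML text via the xmerl_xml callback module.
+to_xml() ->
+    Data = [{'People', [{'Person', [{'Type', "Personal"}], []}]}],
+    lists:flatten(xmerl:export_simple(Data, xmerl_xml)).
+]]></programlisting>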
+      <section>
+      <title>XSLT like transforms</title>
+	<para>See the separate document <ulink url="xmerl_xs.html" >xmerl_xs.html
+	</ulink>.</para>
+      </section>
+  </section>
+
+</article>
diff --git a/lib/xmerl/doc/examples/xml/xmerl_xs.xml b/lib/xmerl/doc/examples/xml/xmerl_xs.xml
new file mode 100644
index 0000000000..9a798808b9
--- /dev/null
+++ b/lib/xmerl/doc/examples/xml/xmerl_xs.xml
@@ -0,0 +1,541 @@
+<?xml version="1.0" encoding="iso-8859-1"?>
+<!DOCTYPE article
+      PUBLIC "-//OASIS//DTD Simplified DocBook XML V1.0//EN"
+      "http://www.oasis-open.org/docbook/xml/simple/1.0/sdocbook.dtd">
+
+<article  lang="en" xml:lang="en" >
+  <articleinfo>
+    <title>XSLT like transformations in Erlang </title>
+    <subtitle>User Guide</subtitle>
+    <authorgroup>
+      <author>
+	<firstname>Mikael</firstname>
+	<surname>Karlsson</surname>
+      </author>
+    </authorgroup>
+    <revhistory>
+      <revision>
+      <revnumber>1.0</revnumber><date>2002-10-25</date>
+      <revremark>First Draft</revremark>
+      </revision>
+      <revision>
+      <revnumber>1.1</revnumber><date>2003-02-05</date>
+      <revremark>Moved module xserl to xmerl application, renamed to 
+	  xmerl_xs</revremark>
+      </revision>
+    </revhistory>
+    <abstract>
+      <para>Erlang has similarities to XSLT since both languages
+	have a functional programming approach. Using the xpath implementation
+	in the existing xmerl application it is possible to write XSLT
+	like transforms in Erlang. One can also combine the
+	transformations with the erlang scripting possibility
+	in the yaws webserver to implement "on the fly" html
+	conversions of xml documents. 
+      </para>
+    </abstract>
+  </articleinfo>
+
+
+  <section>
+    <title>Terminology</title>
+    <variablelist>
+      <varlistentry>
+	<term>XML</term>
+	<listitem>
+	  <para>Extensible Markup Language</para>
+	</listitem>
+      </varlistentry>
+      <varlistentry>
+	<term>XSLT</term>
+	<listitem>
+	  <para>Extensible Stylesheet Language: Transformations</para>
+	</listitem>
+      </varlistentry>
+    </variablelist>
+  </section>
+  <section>
+    <title>Introduction</title>
+    <para>XSLT stylesheets are often used when transforming XML
+      documents to other XML documents or to (X)HTML for presentation.
+      There are a number of brick-sized books written on the
+      topic. XSLT contains quite a few
+      functions and learning them all may take some effort, which
+      could be a reason why the author has only reached a basic level of
+      understanding. This document assumes a basic level of
+      understanding of XSLT.
+    </para>
+    <para>Since XSLT is based on a functional programming approach
+      with pattern matching and recursion, it is possible to write
+      similar style sheets in Erlang, at least for basic
+      transforms. XPath, which is used in XSLT, is also already
+      implemented in the xmerl application, written in Erlang. This
+      document describes how to use the XPath implementation together
+      with Erlang's pattern matching and a couple of functions to write
+      XSLT-like transforms.</para>
+    <para>This approach is probably easier for an Erlanger, but
+      if you need to use real XSLT stylesheets in order to "comply with
+      the standard" there is an adapter available to the Sablotron
+      XSLT package, which is written in C++.
+    </para>
+    <para>
+      This document is written in the Simplified Docbook DTD which is
+      a subset of the complete one and converted to xhtml using a
+      stylesheet written in Erlang.
+    </para>
+  </section>
+
+  <section>
+    <title>Tools</title>
+    <section>
+      <title>xmerl</title>
+      <para><ulink url="http://sowap.sourceforge.net/" >xmerl</ulink>
+      is an XML parser written in Erlang.</para>
+      <section>
+	<title>xmerl_xpath</title>
+	<para>XPath is an important part of XSLT and is implemented in
+	xmerl.</para>
+      </section>
+      <section>
+	<title>xmerl_xs</title>
+	<para>
+	  <ulink url="xmerl_xs.yaws" >xmerl_xs</ulink> is a very small
+	  module acting as "syntactic sugar" for the XSLT lookalike 
+	  transforms. It uses xmerl_xpath.
+	</para>
+      </section>
+    </section>
+
+    <section>
+      <title>yaws</title>
+      <para>
+	<ulink url="http://yaws.hyber.org/" >Yaws</ulink>, Yet Another
+	Webserver, is a web server written in Erlang that supports dynamic
+	content generation using embedded scripts, also written in Erlang.
+      </para>
+<!--
+      <figure>
+	<title>The Yaws logo</title>
+	<mediaobject>
+	  <imageobject>
+	    <imagedata fileref="yaws_pb.gif" format="GIF" scale="50%"/>
+	  </imageobject>
+	</mediaobject>
+      </figure>
+-->
+      <para>Yaws is not needed to make the XSLT-like transformations, but
+	by combining yaws and xmerl it is possible to do transformations
+	of XML documents to HTML in real time, when a client requests a
+	web page. As an example I am able to edit this document using 
+	emacs with psgml tools, save the document and just do a reload
+	in my browser to see the result. The parse/transform time is not
+	visually different compared to loading any other document in the
+	browser.
+      </para>
+    </section>
+    
+  </section>
+  
+  <section>
+    <title>Transformations</title>
+<para>
+ When xmerl_scan parses an xml string/file it returns a record of:
+</para>
+ <programlisting>
+<![CDATA[ 
+  -record(xmlElement, {
+      name,
+      parents = [],
+      pos,
+      attributes = [],
+      content = [],
+      language = [],
+      expanded_name = [],
+      nsinfo = [],% {Prefix, Local} | []
+      namespace = #xmlNamespace{}
+     }).
+ ]]> 
+</programlisting>
+<para>
+ Where content is a mixed list of yet other xmlElement records and/or
+ xmlText  (or other node types).
+</para>
+    <section>
+      <title>xmerl_xs functions</title>
+      <para>
+	Functions used:
+      </para>
+      <variablelist>
+	<varlistentry>
+	  <term>xslapply/2</term>
+	  <listitem>
+	    <para>function to make things look similar 
+	      to xsl:apply-templates. 
+	    </para>
+	  </listitem>
+	</varlistentry>
+	<varlistentry>
+	  <term>value_of/1</term>
+	  <listitem>
+	    <para>Concatenates all text nodes within a tree.</para>
+	  </listitem>
+	</varlistentry>
+	<varlistentry>
+	  <term>select/2</term>
+	  <listitem>
+	    <para>select(Str, E) extracts nodes from the XML tree using
+	      xmerl_xpath.
+	    </para>
+	  </listitem>
+	</varlistentry>
+	<varlistentry>
+	  <term>built_in_rules/2</term>
+	  <listitem>
+	    <para>The default fallback behaviour, template funs should
+	      end with:
+	      <computeroutput>template(E)->built_in_rules(fun
+	      template/1, E).
+</computeroutput>
+	    </para>
+	  </listitem>
+	</varlistentry>
+      </variablelist>
+<note><para>Text is escaped using xmerl_lib:export_text/1 for 
+	"&lt;", "&gt;" and other relevant xml
+	characters when exported. So the value_of/1 and built_in_rules/2
+	functions should be replaced when not exporting to xml or html.
+</para></note>
+    </section>
+    
+
+<section><title>Examples</title>
+      <example>
+	<title>Using xslapply</title>
+	<para>original XSLT:</para>
+	<programlisting>
+<![CDATA[
+  <xsl:template match="doc/title">
+      <h1>
+        <xsl:apply-templates/>
+      </h1>
+  </xsl:template>
+ ]]> 
+	    </programlisting>
+	    <para>
+	      becomes in Erlang:</para>
+	    <programlisting>
+<![CDATA[
+  template(E = #xmlElement{ parents=[{'doc',_}|_], name='title'}) ->
+      ["<h1>",
+           xslapply(fun template/1, E),
+       "</h1>"];
+ ]]> 
+	    </programlisting>
+
+      </example>
+      <example>
+	<title>Using value_of and select</title>
+	<programlisting>
+<![CDATA[
+  <xsl:template match="title">
+    <div align="center"><h1><xsl:value-of select="." /></h1></div>
+  </xsl:template>
+ ]]> 
+	</programlisting>
+	<para>
+	  becomes:
+	</para>
+	<programlisting>
+<![CDATA[
+template(E = #xmlElement{name='title'}) ->
+    ["<div align=\"center\"><h1>", value_of(select(".", E)), "</h1></div>"];
+ ]]> 
+	    </programlisting>
+      </example>
+    <example>
+      <title>Simple xsl stylesheet</title>
+<para>
+ A complete example with the XSLT sheet in the xmerl distribution. 
+</para>
+ <programlisting>
+<![CDATA[
+
+<xsl:stylesheet version="1.0"
+		xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
+		xmlns="http://www.w3.org/TR/xhtml1/strict">
+
+  <xsl:strip-space elements="doc chapter section"/>
+  <xsl:output
+	method="xml"
+	indent="yes"
+	encoding="iso-8859-1"
+  />
+
+  <xsl:template match="doc">
+    <html>
+      <head>
+        <title>
+          <xsl:value-of select="title"/>
+        </title>
+      </head>
+      <body>
+        <xsl:apply-templates/>
+      </body>
+    </html>
+  </xsl:template>
+
+  <xsl:template match="doc/title">
+    <h1>
+      <xsl:apply-templates/>
+    </h1>
+  </xsl:template>
+
+  <xsl:template match="chapter/title">
+    <h2>
+      <xsl:apply-templates/>
+    </h2>
+  </xsl:template>
+
+  <xsl:template match="section/title">
+    <h3>
+      <xsl:apply-templates/>
+    </h3>
+  </xsl:template>
+
+  <xsl:template match="para">
+    <p>
+      <xsl:apply-templates/>
+    </p>
+  </xsl:template>
+
+  <xsl:template match="note">
+    <p class="note">
+      <b>NOTE: </b>
+      <xsl:apply-templates/>
+    </p>
+  </xsl:template>
+
+  <xsl:template match="emph">
+    <em>
+      <xsl:apply-templates/>
+    </em>
+  </xsl:template>
+
+</xsl:stylesheet>
+ ]]>
+      </programlisting>
+    </example>
+    <example>
+      <title>Erlang version</title>
+      <para>
+	Erlang transformation of previous example:
+      </para>
+      <programlisting>
+<![CDATA[
+
+-include("xmerl.hrl").
+
+-import(xmerl_xs, 
+	[ xslapply/2, value_of/1, select/2, built_in_rules/2 ]).
+
+doctype()->
+    "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\"\
+ \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd \">".
+
+process_xml(Doc)->
+	template(Doc).
+
+template(E = #xmlElement{name='doc'})->
+    [ "<\?xml version=\"1.0\" encoding=\"iso-8859-1\"\?>",
+      doctype(),
+      "<html xmlns=\"http://www.w3.org/1999/xhtml\" >"
+      "<head>"
+      "<title>", value_of(select("title",E)), "</title>"
+      "</head>"
+      "<body>",
+      xslapply( fun template/1, E),
+      "</body>"
+      "</html>" ];
+
+
+template(E = #xmlElement{ parents=[{'doc',_}|_], name='title'}) ->
+    ["<h1>", 
+     xslapply( fun template/1, E), 
+     "</h1>"];
+
+template(E = #xmlElement{ parents=[{'chapter',_}|_], name='title'}) ->
+    ["<h2>", 
+     xslapply( fun template/1, E),
+     "</h2>"];
+
+template(E = #xmlElement{ parents=[{'section',_}|_], name='title'}) ->
+    ["<h3>", 
+     xslapply( fun template/1, E),
+     "</h3>"];
+
+template(E = #xmlElement{ name='para'}) ->
+    ["<p>", xslapply( fun template/1, E), "</p>"];
+
+template(E = #xmlElement{ name='note'}) ->
+    ["<p class=\"note\">"
+     "<b>NOTE: </b>",
+     xslapply( fun template/1, E),
+     "</p>"];
+
+template(E = #xmlElement{ name='emph'}) ->
+    ["<em>", xslapply( fun template/1, E), "</em>"];
+
+template(E)->
+    built_in_rules( fun template/1, E).
+ ]]>
+      </programlisting>
+      <para>
+	It is important to end with a call to
+	<computeroutput>xmerl_xs:built_in_rules/2</computeroutput>
+	if you want any text to be written in "push" transforms.
+	Push transforms are the ones that mostly use
+	<computeroutput>xslapply( fun template/1, E )</computeroutput>
+	instead of
+	<computeroutput>value_of(select("xpath",E))</computeroutput>,
+	which is the "pull" style.
+      </para>
+    </example>
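+    <example>
+      <title>Minimal push transform (sketch)</title>
+      <para>
+	To make the role of the final clause concrete, here is a
+	minimal sketch of a push transform run on an in-memory
+	string. The module name <computeroutput>push_sketch</computeroutput>,
+	the function <computeroutput>run/0</computeroutput> and the
+	input string are assumptions made for illustration only. Text
+	nodes match none of the element clauses, so they only reach
+	the output through the last clause; without it the call
+	fails on the first text node.
+      </para>
+      <programlisting>
+<![CDATA[
+-module(push_sketch).
+-export([run/0]).
+
+-include_lib("xmerl/include/xmerl.hrl").
+
+-import(xmerl_xs, [ xslapply/2, built_in_rules/2 ]).
+
+%% Parse a small (hypothetical) document and flatten the resulting iolist.
+run() ->
+    {Doc, _Rest} = xmerl_scan:string("<para>Hello <emph>world</emph></para>"),
+    lists:flatten(template(Doc)).
+
+template(E = #xmlElement{ name='para'}) ->
+    ["<p>", xslapply( fun template/1, E), "</p>"];
+
+template(E = #xmlElement{ name='emph'}) ->
+    ["<em>", xslapply( fun template/1, E), "</em>"];
+
+%% Catch-all: #xmlText nodes end up here and are written by built_in_rules/2.
+template(E) ->
+    built_in_rules( fun template/1, E).
+ ]]>
+      </programlisting>
+    </example>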
+<para>The largest example is the stylesheet to transform this document
+	from the Simplified Docbook XML format to xhtml. The source
+	file is <computeroutput>sdocbook2xhtml.erl</computeroutput>.
+</para>
+</section>
+  <section>
+    <title>Tips and tricks</title>
+      <section>
+	<title>for-each</title>
+	<para>The for-each instruction is quite common in XSLT stylesheets.
+	  It can often be replaced by a call to select/2. Since
+	select/2 returns a list of #xmlElement records and xslapply/2
+	traverses them, the result is more or less the same as looping
+	over all the elements.
+	</para>
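+	<example>
+	  <title>Replacing for-each (sketch)</title>
+	  <para>The rewrite described above could look roughly like
+	    this. The element names <computeroutput>list</computeroutput>
+	    and <computeroutput>item</computeroutput>, the module name
+	    and <computeroutput>run/0</computeroutput> are hypothetical;
+	    a list comprehension over the result of select/2 plays the
+	    role of the XSLT loop.</para>
+	  <programlisting>
+<![CDATA[
+-module(foreach_sketch).
+-export([run/0]).
+
+-include_lib("xmerl/include/xmerl.hrl").
+
+-import(xmerl_xs, [ value_of/1, select/2 ]).
+
+run() ->
+    Xml = "<list><item>a</item><item>b</item></list>",
+    {Doc, _Rest} = xmerl_scan:string(Xml),
+    lists:flatten(template(Doc)).
+
+%% XSLT counterpart (for comparison):
+%%   <ul><xsl:for-each select="item">
+%%     <li><xsl:value-of select="."/></li>
+%%   </xsl:for-each></ul>
+template(E = #xmlElement{name='list'}) ->
+    ["<ul>",
+     [ ["<li>", value_of(Item), "</li>"] || Item <- select("item", E) ],
+     "</ul>"].
+ ]]>
+	  </programlisting>
+	</example>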
+      </section>
+      <section>
+	<title>position()</title>
+	<para>The XSLT position() function and #xmlElement.pos are not
+	the same, so you have to keep track of the position yourself
+	in Erlang.</para>
+	<example>
+	  <title>Counting positions</title>
+	  <programlisting>
+<![CDATA[
+<xsl:template match="stanza">
+  <p><xsl:apply-templates select="line" /></p>
+</xsl:template>
+
+<xsl:template match="line">
+  <xsl:if test="position() mod 2 = 0">&#160;&#160;</xsl:if>
+  <xsl:value-of select="." /><br />
+</xsl:template>
+ ]]>
+	  </programlisting>
+<para>This can be written in Erlang as:</para>
+	  <programlisting>
+<![CDATA[
+template(E = #xmlElement{name='stanza'}) ->
+    {Lines,_LineNo} = lists:mapfoldl(fun template_pos/2, 1, select("line", E)),
+    ["<p>", Lines, "</p>"].
+
+template_pos(E = #xmlElement{name='line'}, P) ->
+    {[indent_line(P rem 2), value_of(E#xmlElement.content), "<br />"], P + 1 }.
+
+indent_line(0)->"&#160;&#160;";
+indent_line(_)->"".
+ ]]>
+	  </programlisting>
+	</example>
+      </section>
+      <section>
+	<title>Global tree awareness</title>
+	<para>In XSLT you have "root" access to the top of the tree
+	with XPath, even though you are somewhere deep in your
+	tree.</para>
+	<para>The xslapply/2 function only passes the child part
+	  of the tree to the template fun. But it is quite easy to write
+	  template funs that handle both the child and the top tree.</para>
+	<example>
+	  <title>Passing the root tree</title>
+	  <para>The following example prepends the article
+	    title to any section title.</para>
+	  <programlisting>
+<![CDATA[
+template(E = #xmlElement{name='title'}, ETop ) ->
+    ["<h3>", value_of(select("title", ETop))," - ",
+     xslapply( fun(A) -> template(A, ETop) end, E),
+     "</h3>"];
+ ]]>
+	  </programlisting>
+	</example>
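+	<example>
+	  <title>Starting a transform with the root tree (sketch)</title>
+	  <para>The clause above is only a fragment. One possible way
+	    to complete it is sketched here: the entry point passes
+	    the document as both the current node and the top tree,
+	    and a catch-all clause keeps handing the top tree down.
+	    The module name and the restriction to titles directly
+	    under a <computeroutput>section</computeroutput> are
+	    assumptions made for illustration.</para>
+	  <programlisting>
+<![CDATA[
+-module(root_sketch).
+-export([process_xml/1]).
+
+-include_lib("xmerl/include/xmerl.hrl").
+
+-import(xmerl_xs,
+	[ xslapply/2, value_of/1, select/2, built_in_rules/2 ]).
+
+%% The root element is both the current node and the top tree.
+process_xml(Doc) ->
+    template(Doc, Doc).
+
+template(E = #xmlElement{ parents=[{'section',_}|_], name='title'}, ETop) ->
+    ["<h3>", value_of(select("title", ETop)), " - ",
+     xslapply( fun(A) -> template(A, ETop) end, E),
+     "</h3>"];
+
+%% Catch-all: keep passing ETop down while using the built-in rules.
+template(E, ETop) ->
+    built_in_rules( fun(A) -> template(A, ETop) end, E).
+ ]]>
+	  </programlisting>
+	</example>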
+      </section>
+    </section>
+
+  </section>
+  
+
+  <section>
+    <title>Utility functions</title>
+    <para>
+      The module xmerl_lib contains the functions
+      <computeroutput>mapxml/2</computeroutput>,
+      <computeroutput>foldxml/3</computeroutput> and
+      <computeroutput>mapfoldxml/3</computeroutput> for traversing
+      <literal>#xmlElement</literal> trees. They can be used to
+      build cross-references. See sdocbook2xhtml.erl, for instance,
+      where <computeroutput>foldxml/3</computeroutput> and
+      <computeroutput>mapfoldxml/3</computeroutput> are used to
+      number chapters, examples and figures and to build the table of
+      contents for the document.
+    </para>
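+    <example>
+      <title>Counting elements with foldxml/3 (sketch)</title>
+      <para>
+	As a small illustration of such a traversal, the sketch below
+	counts <computeroutput>chapter</computeroutput> elements. It
+	assumes that <computeroutput>foldxml/3</computeroutput> is
+	called as <computeroutput>xmerl_lib:foldxml(Fun, Acc0,
+	Tree)</computeroutput> and that the fun receives the node
+	first and the accumulator second; the module name and
+	<computeroutput>run/0</computeroutput> are hypothetical.
+      </para>
+      <programlisting>
+<![CDATA[
+-module(count_sketch).
+-export([count/1, run/0]).
+
+-include_lib("xmerl/include/xmerl.hrl").
+
+%% Fold over every node in the tree, starting the counter at 0.
+count(Doc) ->
+    xmerl_lib:foldxml(fun count_chapter/2, 0, Doc).
+
+count_chapter(#xmlElement{name='chapter'}, N) -> N + 1;
+count_chapter(_Other, N) -> N.
+
+run() ->
+    Xml = "<doc><chapter/><chapter><section/></chapter></doc>",
+    {Doc, _Rest} = xmerl_scan:string(Xml),
+    count(Doc).
+ ]]>
+      </programlisting>
+    </example>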
+  </section>
+
+
+  <section>
+    <title>Future enhancements</title>
+    <para>
+      This is more of a wish list than a task list at the moment.
+    </para>
+    <itemizedlist>
+      <listitem>
+	<para>More stylesheets</para>
+      </listitem>
+      <listitem>
+	<para>On-the-fly export to PDF for printing, and also for more
+	  "polished" presentations.
+	</para>
+      </listitem>
+    </itemizedlist>
+  </section>
+
+  <section>
+    <title>References</title>
+    <orderedlist>
+      <listitem>
+	<para><ulink url="../xml/xmerl_xs.xml" >XML source
+	file</ulink> for this document.
+	</para>
+      </listitem>
+      <listitem>
+	<para><ulink url="../xs/sdocbook2xhtml.erl" >Erlang style
+	sheet</ulink> used for this document. (Simplified Docbook DTD).</para>
+      </listitem>
+      <listitem>
+	    <para><ulink url="http://www.erlang.org/" >Open Source Erlang</ulink>
+    </para>
+      </listitem>
+    </orderedlist>
+
+  </section>
+</article>
+
+<!-- 
+Local Variables:
+mode: xml
+sgml-indent-step: 2
+sgml-indent-data: t
+sgml-set-face: t
+sgml-insert-missing-element-comment: nil
+End:
+-->