diff options
author | Erlang/OTP <[email protected]> | 2009-11-20 14:54:40 +0000 |
---|---|---|
committer | Erlang/OTP <[email protected]> | 2009-11-20 14:54:40 +0000 |
commit | 84adefa331c4159d432d22840663c38f155cd4c1 (patch) | |
tree | bff9a9c66adda4df2106dfd0e5c053ab182a12bd /lib/xmerl/doc/src/xmerl_ug.xmlsrc | |
download | otp-84adefa331c4159d432d22840663c38f155cd4c1.tar.gz otp-84adefa331c4159d432d22840663c38f155cd4c1.tar.bz2 otp-84adefa331c4159d432d22840663c38f155cd4c1.zip |
The R13B03 release.OTP_R13B03
Diffstat (limited to 'lib/xmerl/doc/src/xmerl_ug.xmlsrc')
-rw-r--r-- | lib/xmerl/doc/src/xmerl_ug.xmlsrc | 490 |
1 files changed, 490 insertions, 0 deletions
diff --git a/lib/xmerl/doc/src/xmerl_ug.xmlsrc b/lib/xmerl/doc/src/xmerl_ug.xmlsrc new file mode 100644 index 0000000000..6ee6707e53 --- /dev/null +++ b/lib/xmerl/doc/src/xmerl_ug.xmlsrc @@ -0,0 +1,490 @@ +<?xml version="1.0" encoding="latin1" ?> +<!DOCTYPE chapter SYSTEM "chapter.dtd"> + +<chapter> + <header> + <copyright> + <year>2004</year><year>2009</year> + <holder>Ericsson AB. All Rights Reserved.</holder> + </copyright> + <legalnotice> + The contents of this file are subject to the Erlang Public License, + Version 1.1, (the "License"); you may not use this file except in + compliance with the License. You should have received a copy of the + Erlang Public License along with this software. If not, it can be + retrieved online at http://www.erlang.org/. + + Software distributed under the License is distributed on an "AS IS" + basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See + the License for the specific language governing rights and limitations + under the License. + + </legalnotice> + + <title>xmerl</title> + <prepared>UKH/L Bertil Karlsson</prepared> + <docno></docno> + <date>2004-06-16</date> + <rev>D</rev> + <file>xmerl_ug.xml</file> + </header> + + <section> + <title>Introduction</title> + + <section> + <title>Features</title> + <p>The <em>xmerl</em> XML parser is able to parse XML documents + according to the XML 1.0 standard. As default it performs + well-formed parsing,(syntax checks and checks of well-formed + constraints). Optionally one can also use xmerl as a validating + parser,(validate according to referenced DTD and validating + constraints). By means of for example the xmerl_xs module it is + possible to transform the parsed result to other formats, + e.g. text, HTML, XML etc.</p> + </section> + + <section> + <title>Overview</title> + <p>This document does not give an introduction to XML. There + are a lot of books available that describe XML from + different views. At the <url href="http://www.w3.org">www.W3.org</url> site you will find + the <url href="http://www.w3.org/TR/REC-xml/">XML 1.0 specification</url> and other related specs. One site were + you can find tutorials on XML and related specs is <url href="http://www.zvon.org">ZVON.org</url>.</p> + <p>However, here you will find some examples of how to use + and to what you can use xmerl. A detailed description of the + user interface can be found in the reference manual.</p> + <p>There are two known shortcomings in xmerl:</p> + <list type="bulleted"> + <item>It cannot retrieve external entities on the Internet + by a URL reference, only resources in the local file + system.</item> + <item>xmerl can parse Unicode encoded data. But, it fails + on tag names, attribute names and other mark-up names that + are encoded Unicode characters not mapping on ASCII.</item> + </list> + <p>By parsing an XML document you will get a record, + displaying the structure of the document, as return + value. The record also holds the data of the document. xmerl + is convenient to use in for instance the following scenarios:</p> + <p>You need to retrieve data from XML documents. Your + Erlang software can handle information from the XML document + by extracting data from the data structure received by + parsing.</p> + <p>It is also possible to do further processing of parsed + XML with xmerl. If you want to change format of the XML + document to for instance HTML, text or other XML format you + can transform it. There is support for such transformations + in xmerl.</p> + <p>One may also convert arbitrary data to XML. So it for + instance is easy to make it readable by humans. In this case + you first create xmerl data structures out of your data, then + transform it to XML. </p> + <p>You can find examples of these three examples of usage + below.</p> + </section> + </section> + + <section> + <title>xmerl User Interface Data Structure</title> + <p>The following records used by xmerl to save the parsed + data are defined in <c>xmerl.hrl</c></p> + <p>The result of a successful parsing is a tuple + <c>{DataStructure,M}</c>. <c>M</c> is the XML production Misc, + which is the mark-up that comes after the element of the + document. It is returned "as is". <c>DataStructure</c> is an + <c>xmlElement</c> record, that among others have the fields + <c>name</c>, <c>parents</c>, <c>attributes</c> and + <c>content</c> like:</p> + <pre> +#xmlElement{name=Name, + ... + parents=Parents, + ... + attributes=Attrs, + content=Content, + ...} </pre> + <p>The name of the element is found in the <c>name</c> + field. In the <c>parents</c> field is the names of the parent + elements saved. Parents is a list of tuples where the first + element in each tuple is the name of the parent element. The + list is in reverse order.</p> + <p>The record <c>xmlAttribute</c> holds the name and value of + an attribute in the fields <c>name</c> and <c>value</c>. All + attributes of an element is a list of xmlAttribute in the + field <c>attributes</c> of the xmlElement record. + </p> + <p>The <c>content</c> field of the top element is a list of + records that shows the structure and data of the document. If + it is a simple document like: </p> + <pre> +<?xml version="1.0"?> +<dog> +Grand Danois +</dog> </pre> + <p>The parse result will be:</p> + <pre> +#xmlElement{name = dog, + ... + parents = [], + ... + attributes = [], + content = [{xmlText,[{dog,1}],1,[],"\ +Grand Danois\ +",text}], + ... + } </pre> + <p>Where the content of the top element is: + <c>[{xmlText,[{dog,1}],1,[],"\ Grand Danois\ ",text}]</c>. Text will be returned in <c>xmlText</c> records. Though, + usually documents are more complex, and the content of the top + element will in that case be a nested structure with + xmlElement records that in turn may have complex content. All of + this reflects the structure of the XML document.</p> + <p>Space characters between mark-up as <c>space</c>, + <c>tab</c> and <c>line feed</c> are normalized and returned as + xmlText records.</p> + + <section> + <title>Errors</title> + <p>An unsuccessful parse results in an error, which may be a + tuple <c>{error,Reason}</c> or an exit: + <c>{'EXIT',Reason}</c>. According to the XML 1.0 standard + there are <c>fatal error</c> and <c>error</c> situations. The + fatal errors <em>must</em> be detected by a conforming parser + while an error <em>may</em> be detected. Both categories of + errors are reported as fatal errors by this version of xmerl, + most often as an exit.</p> + </section> + </section> + + <section> + <title>Getting Started</title> + <p>In the following examples we use the XML file + "motorcycles.xml" and the corresponding DTD + "motorcycles.dtd". motorcycles.xml looks like: + <marker id="motorcyclesxml"></marker> +</p> + <codeinclude file="motorcycles.txt" tag="" type="none"></codeinclude> + <p>and motorcycles.dtd looks like: </p> + <codeinclude file="motorcycles_dtd.txt" tag="" type="none"></codeinclude> + <p>If you want to parse the XML file motorcycles.xml you run + it in the Erlang shell like:</p> + <pre> +3> {ParsResult,Misc}=xmerl_scan:file("motorcycles.xml"). +{{xmlElement,motorcycles, + motorcycles, + [], + {xmlNamespace,[],[]}, + [], + 1, + [], + [{xmlText,[{motorcycles,1}],1,[],"\ + ",text}, + {xmlElement,bike, + bike, + [], + {xmlNamespace,[],[]}, + [{motorcycles,1}], + 2, + [{xmlAttribute,year,[],[],[],[]|...}, + {xmlAttribute,color,[],[],[]|...}], + [{xmlText,[{bike,2},{motorcycles|...}], + 1, + []|...}, + {xmlElement,name,name,[]|...}, + {xmlText,[{...}|...],3|...}, + {xmlElement,engine|...}, + {xmlText|...}, + {...}|...], + [], + ".", + undeclared}, + ... + ], + [], + ".", + undeclared}, + []} +4> </pre> + <p>If you instead receives the XML doc as a string you can + parse it by <c>xmerl_scan:string/1</c>. Both file/2 and string/2 + exists where the second argument is a list of options to the + parser, see the <seealso marker="xmerl_scan">reference manual</seealso>.</p> + </section> + + <section> + <title>Example: Extracting Data From XML Content</title> + <p>In this example consider the situation where you want to + examine a particular data in the XML file. For instance, you + want to check for how long each motorcycle have been recorded.</p> + <p>Take a look at the DTD and observe that the structure of an + XML document that is conformant to this DTD must have one + motorcycles element (the root element). The motorcycles element + must have at least one bike element. After each bike element it + may be a date element. The content of the date element is + #PCDATA (Parsed Character DATA), i.e. raw text. Observe that if + #PCDATA must have a <c><![CDATA["<"]]></c> or a <c><![CDATA["&"]]></c> character it must + be written as <c><![CDATA["<"]]></c> and <c><![CDATA["&"]]></c> + respectively. Also other character entities exists similar to + the ones in HTML and SGML.</p> + <p>If you successfully parse the XML file with the validation + on as in: + <c>xmerl_scan:file('motorcycles.xml',[{validation,true}])</c> + you know that the XML document is valid and has the structure + according to the DTD.</p> + <p>Thus, knowing the allowed structure it is easy to write a + program that traverses the data structure and picks the + information in the xmlElements records with name date.</p> + <p>Observe that white space: each space, tab or line feed, + between mark-up results in an xmlText record.</p> + <p></p> + </section> + + <section> + <title>Example: Create XML Out Of Arbitrary Data</title> + <p>For this task there are more than one way to go. The "brute + force" method is to create the records you need and feed your + data in the content and attribute fields of the appropriate + element.</p> + <p>There is support for this in xmerl by the "simple-form" + format. You can put your data in a simple-form data structure + and feed it into + <c>xmerl:export_simple(Content,Callback,RootAttributes)</c>. Content + may be a mixture of simple-form and xmerl records as xmlElement + and xmlText.</p> + <p>The Types are:</p> + <list type="bulleted"> + <item>Content = [Element]</item> + <item>Callback = atom()</item> + <item>RootAttributes = [Attributes]</item> + </list> + <p>Element is any of:</p> + <list type="bulleted"> + <item>{Tag, Attributes, Content}</item> + <item>{Tag, Content}</item> + <item>Tag</item> + <item>IOString</item> + <item>#xmlText{}</item> + <item>#xmlElement{}</item> + <item>#xmlPI{}</item> + <item>#xmlComment{}</item> + <item>#xmlDecl{}</item> + </list> + <p>The simple-form structure is any of <c>{Tag, Attributes, Content}</c>, <c>{Tag, Content}</c> or <c>Tag</c> where:</p> + <p></p> + <list type="bulleted"> + <item>Tag = atom()</item> + <item>Attributes = [{Name, Value}| #xmlAttribute{}]</item> + <item>Name = atom()</item> + <item>Value = IOString | atom() | integer()</item> + </list> + <p>See also reference manual for + <seealso marker="xmerl#export_simple-3">xmerl</seealso></p> + <p>If you want to add the information about a black Harley + Davidsson 1200 cc Sportster motorcycle from 2003 that is in + shape as new in the motorcycles.xml document you can put the + data in a simple-form data structure like:</p> + <pre> +Data = + {bike, + [{year,"2003"},{color,"black"},{condition,"new"}], + [{name, + [{manufacturer,["Harley Davidsson"]}, + {brandName,["XL1200C"]}, + {additionalName,["Sportster"]}]}, + {engine, + ["V-engine, 2-cylinders, 1200 cc"]}, + {kind,["custom"]}, + {drive,["belt"]}]} </pre> + <p>In order to append this data to the end of the + motorcycles.xml document you have to parse the file and add Data + to the end of the root element content.</p> + <pre> + {RootEl,Misc}=xmerl_scan:file('motorcycles.xml'), + #xmlElement{content=Content} = RootEl, + NewContent=Content++lists:flatten([Data]), + NewRootEl=RootEl#xmlElement{content=NewContent}, </pre> + <p>Then you can run it through the export_simple/2 function: </p> + <pre> + {ok,IOF}=file:open('new_motorcycles.xml',[write]), + Export=xmerl:export_simple([NewRootEl],xmerl_xml), + io:format(IOF,"~s~n",[lists:flatten(Export)]), </pre> + <marker id="new_motorcyclesxml"></marker> + <p>The result would be: </p> + <codeinclude file="new_motorcycles.txt" tag="" type="none"></codeinclude> + <p>If it is important to get similar indentation and newlines + as in the original document you have to add #xmlText{} records + with space and newline values in appropriate places. It may also + be necessary to keep the original prolog where the DTD is + referenced. If so, it is possible to pass a RootAttribute + <c>{prolog,Value}</c> to <c>export_simple/3</c>. The following + example code fixes those changes in the previous example:</p> + <pre> + Data = + [#xmlText{value=" "}, + {bike,[{year,"2003"},{color,"black"},{condition,"new"}], + [#xmlText{value="\ + "}, + {name,[#xmlText{value="\ + "}, + {manufacturer,["Harley Davidsson"]}, + #xmlText{value="\ + "}, + {brandName,["XL1200C"]}, + #xmlText{value="\ + "}, + {additionalName,["Sportster"]}, + #xmlText{value="\ + "}]}, + {engine,["V-engine, 2-cylinders, 1200 cc"]}, + #xmlText{value="\ + "}, + {kind,["custom"]}, + #xmlText{value="\ + "}, + {drive,["belt"]}, + #xmlText{value="\ + "}]}, + #xmlText{value="\ +"}], + ... + NewContent=Content++lists:flatten([Data]), + NewRootEl=RootEl#xmlElement{content=NewContent}, + ... + Prolog = ["<?xml version=\\"1.0\\" encoding=\\"utf-8\\" ?> +<!DOCTYPE motorcycles SYSTEM \\"motorcycles.dtd\\">\ +"], + Export=xmerl:export_simple([NewRootEl],xmerl_xml,[{prolog,Prolog}]), + ... </pre> + <p>The result will be: </p> + <codeinclude file="new_motorcycles2.txt" tag="" type="none"></codeinclude> + </section> + + <section> + <title>Example: Transforming XML To HTML</title> + <p>Assume that you want to transform the <seealso marker="#motorcyclesxml">motorcycles.xml</seealso> document to + HTML. If you want the same structure and tags of the resulting + HTML document as of the XML document then you can use the + <c>xmerl:export/2</c> function. The following:</p> + <pre> +2> {Doc,Misc}=xmerl_scan:file('motorcycles.xml'). +{{xmlElement,motorcycles, + motorcycles, + [], + {xmlNamespace,[],[]}, + [], + 1, + [], + [{xmlText,[{motorcycles,1}],1,[],"\ + ",text}, + {xmlElement,bike, +... +3> DocHtml=xmerl:export([Doc],xmerl_html). +["<!DOCTYPE HTML PUBLIC \\"", + "-//W3C//DTD HTML 4.01 Transitional//EN", + "\\"", + [], + ">\ +", + [[["<","motorcycles",">"], + ["\ + ", + [["<", + "bike", + [[" ","year","=\\"","2000","\\""],[" ","color","=\\"","black","\\""]], + ">"], +... </pre> + <p>Will give the result <url href="result_export.html">result_export.html</url></p> + <p>Perhaps you want to do something more arranged for human + reading. Suppose that you want to list all different brands in + the beginning with links to each group of motorcycles. You also + want all motorcycles sorted by brand, then some flashy colors + on top of it. Thus you rearrange the order of the elements and + put in arbitrary HTML tags. This is possible to do by means of + the <url href="http://www.w3.org/Style/XSL/">XSL Transformation (XSLT)</url> like functionality in xmerl. </p> + <p>Even though the following example shows one way to transform data + from XML to HTML it also applies to transformations to other + formats.</p> + <p><c>xmerl_xs</c> does not implement the entire XSLT + specification but the basic functionality. For all details see + the <seealso marker="xmerl_xs">reference manual</seealso></p> + <p>First, some words about the xmerl_xs functionality:</p> + <p>You need to wright template functions to be able to control + what kind of output you want. Thus if you want to encapsulate a + <c>bike</c> element in <p> tags you simply wright a + function:</p> + <pre> +template(E = #xmlElement{name='bike'}) -> + ["<p>",xslapply(fun template/1,E),"</p>"]; </pre> + <p>With <c>xslapply</c> you tell the XSLT processor in which + order it should traverse the XML structure. By default it goes + in preorder traversal, but with the following we make a + deliberate choice to break that order:</p> + <pre> +template(E = #xmlElement{name='bike'}) -> + ["<p>",xslapply(fun template/1,select("bike/name/manufacturer")),"</p>"]; </pre> + <p>If you want to output the content of an XML element or an attribute you will get the value as a string by the <c>value_of</c> function:</p> + <pre> +template(E = #xmlElement{name='motorcycles'}) -> + ["<p>",value_of(select("bike/name/manufacturer",E),"</p>"]; </pre> + <p>In the xmerl_xs functions you can provide a select(String) + call, which is an <url href="http://www.w3.org/TR/xpath">XPath</url> + functionality. For more details see the xmerl_xs <url href="xmerl_xs_examples.html">tutorial</url>.</p> + <p>Now, back to the example where we wanted to make the output + more arranged. With the template:</p> + <pre> +template(E = #xmlElement{name='motorcycles'}) -> + [ "<head>\ +<title>motorcycles</title>\ +</head>\ +", + "<body>\ +", +\011 "<h1>Used Motorcycles</h1>\ +", +\011 "<ul>\ +", +\011 remove_duplicates(value_of(select("bike/name/manufacturer",E))), +\011 "\ +</ul>\ +", +\011 sort_by_manufacturer(xslapply(fun template/1, E)), + "</body>\ +", +\011 "</html>\ +"]; </pre> + <p>We match on the top element and embed the inner parts in an + HTML body. Then we extract the string values of all motorcycle + brands, sort them and removes duplicates by + <c>remove_duplicates(value_of(select("bike/name/manufacturer", E)))</c>. We also process the substructure of the top element + and pass it to a function that sorts all motorcycle information + by brand according to the task formulation in the beginning of + this example.</p> + <p>The next template matches on the <c>bike</c> element:</p> + <pre> +template(E = #xmlElement{name='bike'}) -> + {value_of(select("name/manufacturer",E)),["<dt>",xslapply(fun template/1,select("name",E)),"</dt>", + "<dd><ul>\ +", + "<li style="color:green">Manufacturing year: ",xslapply(fun template/1,select("@year",E)),"</li>\ +", + "<li style="color:red">Color: ",xslapply(fun template/1,select("@color",E)),"</li>\ +", + "<li style="color:blue">Shape : ",xslapply(fun template/1,select("@condition",E)),"</li>\ +", + "</ul></dd>\ +"]}; </pre> + <p>This creates a tuple with the brand of the motorcycle and + the output format. We use the brand name only for sorting + purpose. We have to end the template function with the "built + in clause" <c>template(E) -> built_in_rules(fun template/1, E).</c></p> + <p>The entire program is motorcycles2html.erl:</p> + <codeinclude file="motorcycles2html.erl" tag="" type="erl"></codeinclude> + <p>If we run it like this: + <c>motorcycles2html:process_to_file('result_xs.html', 'motorcycles2.xml').</c> The result will be <url href="result_xs.html">result_xs.html</url>. When the + input file is of the same structure as the previous + "motorcycles" XML files but it has a little more 'bike' + elements and the 'manufacturer' elements are not in order.</p> + </section> +</chapter> + |