diff options
Diffstat (limited to 'lib/xmerl/doc/src/xmerl_examples.html')
-rw-r--r-- | lib/xmerl/doc/src/xmerl_examples.html | 408 |
1 files changed, 408 insertions, 0 deletions
diff --git a/lib/xmerl/doc/src/xmerl_examples.html b/lib/xmerl/doc/src/xmerl_examples.html new file mode 100644 index 0000000000..1305f59d4a --- /dev/null +++ b/lib/xmerl/doc/src/xmerl_examples.html @@ -0,0 +1,408 @@ +<meta http-equiv="Context-Type" content="text/html; charset=iso-8859-1"> +<?xml version="1.0" encoding="iso-8859-1"?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" + "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd "> +<html xmlns="http://www.w3.org/1999/xhtml" ><head> +<title>Customization functions</title> +<link rel="stylesheet" type="text/css" href="stylesheet.css"> +</head> +<body> +<h1>Customization functions</h1> + <h3 id="sect_1.1">1 Description</h3> + <p>The XML processor offers a number of hooks for + customization. These hooks are defined as function objects, and + can be provided by the caller.</p> + + <p>The following customization functions are available. If + they also have access to their own state variable, the access + function for this state is identified within parentheses:</p> + + <ul> + + <li>event function (<tt> + xmerl_scan:event_state/[1,2] + </tt>)</li> + + <li>hook function (<tt> + xmerl_scan:hook_state/[1,2] + </tt>)</li> + + <li>fetch function (<tt> + xmerl_scan:fetch_state/[1,2] </tt>) + </li> + + <li>continuation function (<tt> + xmerl_scan:cont_state/[1,2] </tt>) + </li> + + <li>rules function (<tt> + xmerl_scan:rules_state/[1,2] </tt>) + </li> + + <li>accumulator function</li> + + <li>close function</li> + + </ul> + + <p>For all of the above state access functions, the function + with one argument + (e.g. <tt>event_state(GlobalState)</tt>) + will read the state variable, while the function with two + arguments (e.g.: <tt>event_state(NewEventState, + GlobalState)</tt>) will modify it.</p> + + <p>For each function, the description starts with the syntax + for specifying the function in the + <a href="xmerl_scan.html#type-option_list"><tt>Option_list</tt></a>. + The general forms are <tt>{Tag, Fun}</tt>, or + <tt>{Tag, Fun, LocalState}</tt>. The + second form can be used to initialize the state variable in + question.</p> + + + <h4 id="sect_1.1.1">1.1 User State</h4> + + <p>All customization functions are free to access a + "User state" variable. Care must of course be taken + to coordinate the use of this state. It is recommended that + functions, which do not really have anything to contribute to + the "global" user state, use their own state + variable instead. Another option (used in + e.g. <tt>xmerl_eventp.erl</tt>) is for + customization functions to share one of the local states (in + <tt>xmerl_eventp.erl</tt>, the + continuation function and the fetch function both acces the + <tt>cont_state</tt>.)</p> + + <p>Functions to access user state:</p> + + <ul> + + <li><tt> + xmerl_scan:user_state(GlobalState) </tt> + </li> + + <li><tt>xmerl_scan:user_state(UserState, + GlobalState) </tt></li> + + </ul> + + + + <h4 id="sect_1.1.2">1.2 Event Function</h4> + + <p><tt>{event_fun, fun()} | {event_fun, fun(), + EventState}</tt></p> + + <p>The event function is called at the beginning and at the + end of a parsed entity. It has the following format and + semantics:</p> + + <table border="1" cellspacing="0" cellpadding="5" width="100%" + bgcolor="#CCCCCC"><tr><td><pre><code> + + fun(Event, GlobalState) -> + EventState = xmerl_scan:event_state(GlobalState), + EventState2 = foo(Event, EventState), + GlobalState2 = xmerl_scan:event_state(EventState2, GlobalState) + end. + </code></pre></td></tr></table> + + + + <h4 id="sect_1.1.3">1.3 Hook Function</h4> + <p> <tt>{hook_fun, fun()} | {hook_fun, fun(), + HookState}</tt></p> + + + + <p>The hook function is called when the processor has parsed a complete + entity. Format and semantics:</p> + + <table border="1" cellspacing="0" cellpadding="5" width="100%" + bgcolor="#CCCCCC"><tr><td><pre><code> + + fun(Entity, GlobalState) -> + HookState = xmerl_scan:hook_state(GlobalState), + {TransformedEntity, HookState2} = foo(Entity, HookState), + GlobalState2 = xmerl_scan:hook_state(HookState2, GlobalState), + {TransformedEntity, GlobalState2} + end. + </code></pre></td></tr></table> + + <p>The relationship between the event function, the hook + function and the accumulator function is as follows:</p> + + <ol> + <li>The event function is first called with an + 'ended' event for the parsed entity.</li> + + <li>The hook function is called, possibly + re-formatting the entity.</li> + + <li>The acc function is called in order to + (optionally) add the re-formatted entity to the contents of + its parent element.</li> + + </ol> + + + + <h4 id="sect_1.1.4">1.4 Fetch Function</h4> + <p> + <tt>{fetch_fun, fun()} | {fetch_fun, fun(), FetchState}</tt> + </p> + <p>The fetch function is called in order to fetch an external resource + (e.g. a DTD).</p> + + <p>The fetch function can respond with three different return values:</p> + + <table border="1" cellspacing="0" cellpadding="5" width="100%" + bgcolor="#CCCCCC"><tr><td><pre><code> + + Result ::= + {ok, {file, Filename}, NewGlobalState} | + {ok, {string, String}, NewGlobalState} | + {ok, not_fetched, NewGlobalState} + </code></pre></td></tr></table> + + <p>Format and semantics:</p> + + <table border="1" cellspacing="0" cellpadding="5" width="100%" + bgcolor="#CCCCCC"><tr><td><pre><code> + + fun(URI, GlobalState) -> + FetchState = xmerl_scan:fetch_state(GlobalState), + Result = foo(URI, FetchState). % Result being one of the above + end. + </code></pre></td></tr></table> + + + + <h4 id="sect_1.1.5">1.5 Continuation Function</h4> + <p> + <tt>{continuation_fun, fun()} | {continuation_fun, fun(), ContinuationState}</tt> + </p> + <p>The continuation function is called when the parser encounters the end + of the byte stream. Format and semantics:</p> + + <table border="1" cellspacing="0" cellpadding="5" width="100%" + bgcolor="#CCCCCC"><tr><td><pre><code> + + fun(Continue, Exception, GlobalState) -> + ContState = xmerl_scan:cont_state(GlobalState), + {Result, ContState2} = get_more_bytes(ContState), + case Result of + [] -> + GlobalState2 = xmerl_scan:cont_state(ContState2, GlobalState), + Exception(GlobalState2); + MoreBytes -> + {MoreBytes2, Rest} = end_on_whitespace_char(MoreBytes), + ContState3 = update_cont_state(Rest, ContState2), + GlobalState3 = xmerl_scan:cont_state(ContState3, GlobalState), + Continue(MoreBytes2, GlobalState3) + end + end. + </code></pre></td></tr></table> + + + <h4 id="sect_1.1.6">1.6 Rules Functions</h4> + <p> + <tt> + {rules, ReadFun : fun(), WriteFun : fun(), RulesState} | + {rules, Table : ets()}</tt> + </p> + <p>The rules functions take care of storing scanner + information in a rules database. User-provided rules functions + may opt to store the information in mnesia, or perhaps in the + user_state(RulesState).</p> + + <p>The following modes exist:</p> + + <ul> + + <li>If the user doesn't specify an option, the + scanner creates an ets table, and uses built-in functions to + read and write data to it. When the scanner is done, the ets + table is deleted.</li> + + <li>If the user specifies an ets table via the + <tt>{rules, Table}</tt> option, the + scanner uses this table. When the scanner is done, it does + <em>not</em> delete the table.</li> + + <li>If the user specifies read and write + functions, the scanner will use them instead.</li> + + </ul> + + <p>The format for the read and write functions are as + follows:</p> + + + <table border="1" cellspacing="0" cellpadding="5" width="100%" + bgcolor="#CCCCCC"><tr><td><pre><code> + + WriteFun(Context, Name, Definition, ScannerState) -> NewScannerState. + ReadFun(Context, Name, ScannerState) -> Definition | undefined. + </code></pre></td></tr></table> + + <p>Here is a summary of the data objects currently being + written by the scanner:</p> + + <table border="1" cellspacing="0" cellpadding="4" > + <colgroup cols="3"> + <thead> + <tr> + <th>Context</th> + <th>Key Value</th> + <th>Definition</th> + </tr> + </thead> + <tbody> + <tr> + <td>notation</td> + <td>NotationName</td> + <td><tt>{system, SL} | {public, PIDL, SL}</tt></td> + </tr> + <tr> + <td>elem_def</td> + <td>ElementName</td> + <td><tt>#xmlElement{content = ContentSpec}</tt></td> + </tr> + <tr> + <td>parameter_entity</td> + <td>PEName</td> + <td><tt>PEDef</tt></td> + </tr> + <tr> + <td>entity</td> + <td>EntityName</td> + <td><tt>EntityDef</tt></td> + </tr> + </tbody> + </colgroup> + <caption>Table 1 - 1 Scanner data objects</caption> + </table> + + <p>where</p> + <table border="1" cellspacing="0" cellpadding="5" width="100%" + bgcolor="#CCCCCC"><tr><td><pre><code> + + ContentSpec ::= empty | any | ElemContent + ElemContent ::= {Mode, Elems} + Mode ::= seq | choice + Elems ::= [Elem] + Elem ::= '#PCDATA' | Name | ElemContent | {Occurrence, Elems} + Occurrence ::= '*' | '?' | '+' + </code></pre></td></tr></table> + <table border="1" cellspacing="0" cellpadding="5" width="80%" + bgcolor="#CCCCCC"> + </table> + <p> + NOTE: <i>When <Elem> is not wrapped with + <Occurrence>, (Occurrence = once) is implied.</i></p> + + + + <h4 id="sect_1.1.7">1.7 Accumulator Function</h4> + <p><tt>{acc_fun, fun()}</tt></p> + + <p>The accumulator function is called to accumulate the + contents of an entity.When parsing very large files, it may + not be desireable to do so.In this case, an acc function can + be provided that simply doesn't accumulate.</p> + + <p>Note that it is possible to even modify the parsed + entity before accumulating it, but this must be done with + care. <tt>xmerl_scan</tt> performs + post-processing of the element for namespace management. Thus, + the element must keep its original structure for this to + work.</p> + + <p>The acc function has the following format and + semantics:</p> + + <table border="1" cellspacing="0" cellpadding="5" width="100%" + bgcolor="#CCCCCC"><tr><td><pre><code> + + %% default accumulating acc fun + fun(ParsedEntity, Acc, GlobalState) -> + {[ParsedEntity|Acc], GlobalState}. + + %% non-accumulating acc fun + fun(ParsedEntity, Acc, GlobalState) -> + {Acc, GlobalState}. + </code></pre></td></tr></table> + + + <h4 id="sect_1.1.8">1.8 Close Function</h4> + + <p>The close function is called when a document (either the + main document or an external DTD) has been completely + parsed. When xmerl_scan was started using + <tt>xmerl_scan:file/[1,2]</tt>, the + file will be read in full, and closed immediately, before the + parsing starts, so when the close function is called, it will + not need to actually close the file. In this case, the close + function will be a good place to modify the state + variables.</p> + + <p>Format and semantics:</p> + + <table border="1" cellspacing="0" cellpadding="5" width="100%" + bgcolor="#CCCCCC"><tr><td><pre><code> + + fun(GlobalState) -> + GlobalState1 = .... % state variables may be altered + </code></pre></td></tr></table> + + + <h3 id="sect_2.1">2 Examples</h3> +<p> +See <code>xmerl_test.erl</code> for more examples. +</p> + + <h4 id="sect_2.1.1">2.1 Handling spaces</h3> +<p> +The following sample program illustrates three ways of +scanning a document: +<ol> +<li> the default scan, which leaves whitespace untouched +<li> normalizing spaces +<li> normalizing spaces, then removing text elements that only + contain one space. +</ol> + +</p> + <table border="1" cellspacing="0" cellpadding="5" width="100%" +bgcolor="#CCCCCC"><tr><td><pre><code> +-module(tmp). + +-include("xmerl.hrl"). + +-export([file1/1, + file2/1, + file3/1]). + +file1(F) -> + xmerl_scan:file(F). + +file2(F) -> + xmerl_scan:file(F, [{space,normalize}]). + +file3(F) -> + Acc = fun(#xmlText{value = " ", pos = P}, Acc, S) -> + {Acc, P, S}; % new return format + (X, Acc, S) -> + {[X|Acc], S} + end, + xmerl_scan:file(F, [{space,normalize}, {acc_fun, Acc}]). + +</code></pre></td></tr></table> + + + + +</body> +</html> |