diff options
author | Sverker Eriksson <[email protected]> | 2017-08-30 21:00:35 +0200 |
---|---|---|
committer | Sverker Eriksson <[email protected]> | 2017-08-30 21:00:35 +0200 |
commit | 44a83c8860bbd00878c720a7b9d940b4630bab8a (patch) | |
tree | 101b3c52ec505a94f56c8f70e078ecb8a2e8c6cd /lib/xmerl | |
parent | 7c67bbddb53c364086f66260701bc54a61c9659c (diff) | |
parent | 040bdce67f88d833bfb59adae130a4ffb4c180f0 (diff) | |
download | otp-44a83c8860bbd00878c720a7b9d940b4630bab8a.tar.gz otp-44a83c8860bbd00878c720a7b9d940b4630bab8a.tar.bz2 otp-44a83c8860bbd00878c720a7b9d940b4630bab8a.zip |
Merge tag 'OTP-20.0' into sverker/20/binary_to_atom-utf8-crash/ERL-474/OTP-14590
Diffstat (limited to 'lib/xmerl')
29 files changed, 1344 insertions, 422 deletions
diff --git a/lib/xmerl/doc/src/notes.xml b/lib/xmerl/doc/src/notes.xml index 0abcb87998..1162561225 100644 --- a/lib/xmerl/doc/src/notes.xml +++ b/lib/xmerl/doc/src/notes.xml @@ -4,7 +4,7 @@ <chapter> <header> <copyright> - <year>2004</year><year>2016</year> + <year>2004</year><year>2017</year> <holder>Ericsson AB. All Rights Reserved.</holder> </copyright> <legalnotice> @@ -32,6 +32,116 @@ <p>This document describes the changes made to the Xmerl application.</p> +<section><title>Xmerl 1.3.15</title> + + <section><title>Fixed Bugs and Malfunctions</title> + <list> + <item> + <p> + Improves accumulator fun in xmerl_scan so that only one + #xmlText record is returned for strings which have + character references.</p> + <p> + (Thanks to Jimmy Zöger)</p> + <p> + Own Id: OTP-14377 Aux Id: PR-1369 </p> + </item> + </list> + </section> + +</section> + +<section><title>Xmerl 1.3.14</title> + + <section><title>Fixed Bugs and Malfunctions</title> + <list> + <item> + <p> A couple of bugs are fixed in the sax parser + (xmerl_sax_parser). </p> <list> <item>The continuation + function was not called correctly when the XML directive + was fragmented. </item> <item>When the event callback + modules (xmerl_sax_old_dom and xmerl_sax_simple) got an + endDocument event at certain conditions the parser + crashed.</item> <item>Replaced internal ets table with + map to avoid table leakage.</item> </list> + <p> + Own Id: OTP-14430</p> + </item> + </list> + </section> + +</section> + +<section><title>Xmerl 1.3.13</title> + + <section><title>Fixed Bugs and Malfunctions</title> + <list> + <item> + <p> + The namespace_conformant option in xmerl_scan did not + work when parsing documents without explicit XML + namespace declaration.</p> + <p> + Own Id: OTP-14139</p> + </item> + <item> + <p> Fix a "well-formedness" bug in the XML Sax parser so + it returns an error if there are something more in the + file after the matching document. If one using the + xmerl_sax_parser:stream() a rest is allowed which then + can be sent to a new call of xmerl_sax_parser:stream() to + parse next document. </p> <p> This is done to be + compliant with XML conformance tests. </p> + <p> + Own Id: OTP-14211</p> + </item> + <item> + <p> Fixed compiler and dialyzer warnings in the XML SAX + parser. </p> + <p> + Own Id: OTP-14212</p> + </item> + <item> + <p> Change how to interpret end of document in the XML + SAX parser to comply with Tim Brays comment on the + standard. This makes it possible to handle more than one + doc on a stream, the standard makes it impossible to know + when the document is ended without waiting for the next + document (and not always even that). </p> <p> Tim Brays + comment: </p> <p> Trailing "Misc"<br/> The fact that + you're allowed some trailing junk after the root element, + I decided (but unfortunately too late) is a real design + error in XML. If I'm writing a network client, I'm + probably going to close the link as soon as a I see the + root element end-tag, and not depend on the other end + closing it down properly.<br/> Furthermore, if I want to + send a succession of XML documents over a network link, + if I find a processing instruction after a root element, + is it a trailer on the previous document, or part of the + prolog of the next? </p> + <p> + Own Id: OTP-14213</p> + </item> + </list> + </section> + +</section> + +<section><title>Xmerl 1.3.12</title> + + <section><title>Fixed Bugs and Malfunctions</title> + <list> + <item> + <p> Fix a number of broken links in the xmerl + documentation. </p> + <p> + Own Id: OTP-13880</p> + </item> + </list> + </section> + +</section> + <section><title>Xmerl 1.3.11</title> <section><title>Improvements and New Features</title> diff --git a/lib/xmerl/notes.html b/lib/xmerl/notes.html deleted file mode 100644 index 43610a37fa..0000000000 --- a/lib/xmerl/notes.html +++ /dev/null @@ -1,28 +0,0 @@ -<HTML> -<HEAD> - <TITLE>xmerl Release Notes</TITLE> - <style type="text/css"> -<!-- - body { background: white; margin: 3em } - - body { font-family: Verdana, Arial, Helvetica, sans-serif } - h1 h2 h3 h4 { font-family: Verdana, Arial, Helvetica, sans-serif } - h1 { font-size: 48 } - p li { font-family: Verdana, Arial, Helvetica, sans-serif } ---> - </style> -</HEAD> - -<BODY BGCOLOR="#FFFFFF"> - -<CENTER><H1>xmerl Release Notes</H1></CENTER> - -<h2>xmerl 1.0</h2> - - -<p> -There are also release notes for -<a href="notes_history.html">older versions</a>. - -</body> -</html> diff --git a/lib/xmerl/src/xmerl_eventp.erl b/lib/xmerl/src/xmerl_eventp.erl index 2cb76abc6e..8d7ea25e24 100644 --- a/lib/xmerl/src/xmerl_eventp.erl +++ b/lib/xmerl/src/xmerl_eventp.erl @@ -25,6 +25,90 @@ %% Each contain more elaborate settings of xmerl_scan that makes usage of %% the customization functions. %% +%% @type xmlElement() = #xmlElement{}. +%% +%% @type option_list(). <p>Options allow to customize the behaviour of the +%% scanner. +%% See also <a href="xmerl_examples.html">tutorial</a> on customization +%% functions. +%% </p> +%% <p> +%% Possible options are: +%% </p> +%% <dl> +%% <dt><code>{acc_fun, Fun}</code></dt> +%% <dd>Call back function to accumulate contents of entity.</dd> +%% <dt><code>{continuation_fun, Fun} | +%% {continuation_fun, Fun, ContinuationState}</code></dt> +%% <dd>Call back function to decide what to do if the scanner runs into EOF +%% before the document is complete.</dd> +%% <dt><code>{event_fun, Fun} | +%% {event_fun, Fun, EventState}</code></dt> +%% <dd>Call back function to handle scanner events.</dd> +%% <dt><code>{fetch_fun, Fun} | +%% {fetch_fun, Fun, FetchState}</code></dt> +%% <dd>Call back function to fetch an external resource.</dd> +%% <dt><code>{hook_fun, Fun} | +%% {hook_fun, Fun, HookState}</code></dt> +%% <dd>Call back function to process the document entities once +%% identified.</dd> +%% <dt><code>{close_fun, Fun}</code></dt> +%% <dd>Called when document has been completely parsed.</dd> +%% <dt><code>{rules, ReadFun, WriteFun, RulesState} | +%% {rules, Rules}</code></dt> +%% <dd>Handles storing of scanner information when parsing.</dd> +%% <dt><code>{user_state, UserState}</code></dt> +%% <dd>Global state variable accessible from all customization functions</dd> +%% +%% <dt><code>{fetch_path, PathList}</code></dt> +%% <dd>PathList is a list of +%% directories to search when fetching files. If the file in question +%% is not in the fetch_path, the URI will be used as a file +%% name.</dd> +%% <dt><code>{space, Flag}</code></dt> +%% <dd>'preserve' (default) to preserve spaces, 'normalize' to +%% accumulate consecutive whitespace and replace it with one space.</dd> +%% <dt><code>{line, Line}</code></dt> +%% <dd>To specify starting line for scanning in document which contains +%% fragments of XML.</dd> +%% <dt><code>{namespace_conformant, Flag}</code></dt> +%% <dd>Controls whether to behave as a namespace conformant XML parser, +%% 'false' (default) to not otherwise 'true'.</dd> +%% <dt><code>{validation, Flag}</code></dt> +%% <dd>Controls whether to process as a validating XML parser: +%% 'off' (default) no validation, or validation 'dtd' by DTD or 'schema' +%% by XML Schema. 'false' and 'true' options are obsolete +%% (i.e. they may be removed in a future release), if used 'false' +%% equals 'off' and 'true' equals 'dtd'.</dd> +%% <dt><code>{schemaLocation, [{Namespace,Link}|...]}</code></dt> +%% <dd>Tells explicitly which XML Schema documents to use to validate +%% the XML document. Used together with the +%% <code>{validation,schema}</code> option.</dd> +%% <dt><code>{quiet, Flag}</code></dt> +%% <dd>Set to 'true' if xmerl should behave quietly and not output any +%% information to standard output (default 'false').</dd> +%% <dt><code>{doctype_DTD, DTD}</code></dt> +%% <dd>Allows to specify DTD name when it isn't available in the XML +%% document. This option has effect only together with +%% <code>{validation,'dtd'</code> option.</dd> +%% <dt><code>{xmlbase, Dir}</code></dt> +%% <dd>XML Base directory. If using string/1 default is current directory. +%% If using file/1 default is directory of given file.</dd> +%% <dt><code>{encoding, Enc}</code></dt> +%% <dd>Set default character set used (default UTF-8). +%% This character set is used only if not explicitly given by the XML +%% declaration. </dd> +%% <dt><code>{document, Flag}</code></dt> +%% <dd>Set to 'true' if xmerl should return a complete XML document +%% as an xmlDocument record (default 'false').</dd> +%% <dt><code>{comments, Flag}</code></dt> +%% <dd>Set to 'false' if xmerl should skip comments otherwise they will +%% be returned as xmlComment records (default 'true').</dd> +%% <dt><code>{default_attrs, Flag}</code></dt> +%% <dd>Set to 'true' if xmerl should add to elements missing attributes +%% with a defined default value (default 'false').</dd> +%% </dl> +%% -module(xmerl_eventp). -vsn('0.19'). -date('03-09-17'). diff --git a/lib/xmerl/src/xmerl_regexp.erl b/lib/xmerl/src/xmerl_regexp.erl index fc89b80ff1..1bf8496673 100644 --- a/lib/xmerl/src/xmerl_regexp.erl +++ b/lib/xmerl/src/xmerl_regexp.erl @@ -1,7 +1,7 @@ %% %% %CopyrightBegin% %% -%% Copyright Ericsson AB 2006-2016. All Rights Reserved. +%% Copyright Ericsson AB 2006-2017. All Rights Reserved. %% %% Licensed under the Apache License, Version 2.0 (the "License"); %% you may not use this file except in compliance with the License. @@ -1154,7 +1154,7 @@ comp_crs([], Last) -> [{Last,maxchar}]. %% build_dfa(NFA, NfaStartState) -> {DFA,DfaStartState}. %% Build a DFA from an NFA using "subset construction". The major %% difference from the book is that we keep the marked and unmarked -%% DFA states in seperate lists. New DFA states are added to the +%% DFA states in separate lists. New DFA states are added to the %% unmarked list and states are marked by moving them to the marked %% list. We assume that the NFA accepting state numbers are in %% ascending order for the rules and use ordsets to keep this order. diff --git a/lib/xmerl/src/xmerl_sax_old_dom.erl b/lib/xmerl/src/xmerl_sax_old_dom.erl index fefcf03fce..6d0d836487 100644 --- a/lib/xmerl/src/xmerl_sax_old_dom.erl +++ b/lib/xmerl/src/xmerl_sax_old_dom.erl @@ -2,7 +2,7 @@ %%-------------------------------------------------------------------- %% %CopyrightBegin% %% -%% Copyright Ericsson AB 2009-2016. All Rights Reserved. +%% Copyright Ericsson AB 2009-2017. All Rights Reserved. %% %% Licensed under the Apache License, Version 2.0 (the "License"); %% you may not use this file except in compliance with the License. @@ -127,9 +127,10 @@ build_dom(endDocument, State#xmerl_sax_old_dom_state{dom=[Decl, Current#xmlElement{ content=lists:reverse(C) }]}; - _ -> - %%?dbg("~p\n", [D]), - ?error("we're not at end the document when endDocument event is encountered.") + _ -> + %% endDocument is also sent by the parser when a fault occur to tell + %% the event receiver that no more input will be sent + State end; %% Element diff --git a/lib/xmerl/src/xmerl_sax_parser.erl b/lib/xmerl/src/xmerl_sax_parser.erl index 318a0cf7f4..e383c4c349 100644 --- a/lib/xmerl/src/xmerl_sax_parser.erl +++ b/lib/xmerl/src/xmerl_sax_parser.erl @@ -1,7 +1,7 @@ %%-------------------------------------------------------------------- %% %CopyrightBegin% %% -%% Copyright Ericsson AB 2008-2016. All Rights Reserved. +%% Copyright Ericsson AB 2008-2017. All Rights Reserved. %% %% Licensed under the Apache License, Version 2.0 (the "License"); %% you may not use this file except in compliance with the License. @@ -33,6 +33,7 @@ %% External exports %%---------------------------------------------------------------------- -export([file/2, + stream/3, stream/2]). %%---------------------------------------------------------------------- @@ -63,7 +64,7 @@ %% Description: Parse file containing an XML document. %%---------------------------------------------------------------------- file(Name,Options) -> - case file:open(Name, [raw, read,binary]) of + case file:open(Name, [raw, read_ahead, read,binary]) of {error, Reason} -> {error,{Name, file:format_error(Reason)}}; {ok, FD} -> @@ -72,11 +73,12 @@ file(Name,Options) -> File = filename:basename(Name), ContinuationFun = fun default_continuation_cb/1, Res = stream(<<>>, - [{continuation_fun, ContinuationFun}, - {continuation_state, FD}, - {current_location, CL}, - {entity, File} - |Options]), + [{continuation_fun, ContinuationFun}, + {continuation_state, FD}, + {current_location, CL}, + {entity, File} + |Options], + file), ok = file:close(FD), Res end. @@ -92,19 +94,22 @@ file(Name,Options) -> %% EventState = term() %% Description: Parse a stream containing an XML document. %%---------------------------------------------------------------------- -stream(Xml, Options) when is_list(Xml), is_list(Options) -> +stream(Xml, Options) -> + stream(Xml, Options, stream). + +stream(Xml, Options, InputType) when is_list(Xml), is_list(Options) -> State = parse_options(Options, initial_state()), - case State#xmerl_sax_parser_state.file_type of + case State#xmerl_sax_parser_state.file_type of dtd -> xmerl_sax_parser_list:parse_dtd(Xml, State#xmerl_sax_parser_state{encoding = list, - input_type = stream}); + input_type = InputType}); normal -> xmerl_sax_parser_list:parse(Xml, State#xmerl_sax_parser_state{encoding = list, - input_type = stream}) + input_type = InputType}) end; -stream(Xml, Options) when is_binary(Xml), is_list(Options) -> +stream(Xml, Options, InputType) when is_binary(Xml), is_list(Options) -> case parse_options(Options, initial_state()) of {error, Reason} -> {error, Reason}; State -> @@ -115,21 +120,22 @@ stream(Xml, Options) when is_binary(Xml), is_list(Options) -> normal -> parse end, - case detect_charset(Xml, State) of - {error, Reason} -> {fatal_error, - { - State#xmerl_sax_parser_state.current_location, - State#xmerl_sax_parser_state.entity, - 1 - }, - Reason, - [], - State#xmerl_sax_parser_state.event_state}; - {Xml1, State1} -> - parse_binary(Xml1, - State1#xmerl_sax_parser_state{input_type = stream}, - ParseFunction) - end + try + {Xml1, State1} = detect_charset(Xml, State), + parse_binary(Xml1, + State1#xmerl_sax_parser_state{input_type = InputType}, + ParseFunction) + catch + throw:{fatal_error, {State2, Reason}} -> + {fatal_error, + { + State2#xmerl_sax_parser_state.current_location, + State2#xmerl_sax_parser_state.entity, + 1 + }, + Reason, [], + State2#xmerl_sax_parser_state.event_state} + end end. %%---------------------------------------------------------------------- @@ -151,8 +157,8 @@ parse_binary(Xml, #xmerl_sax_parser_state{encoding={utf16,big}}=State, F) -> xmerl_sax_parser_utf16be:F(Xml, State); parse_binary(Xml, #xmerl_sax_parser_state{encoding=latin1}=State, F) -> xmerl_sax_parser_latin1:F(Xml, State); -parse_binary(_, #xmerl_sax_parser_state{encoding=Enc}, _) -> - {error, lists:flatten(io_lib:format("Charcter set ~p not supported", [Enc]))}. +parse_binary(_, #xmerl_sax_parser_state{encoding=Enc}, State) -> + ?fatal_error(State, lists:flatten(io_lib:format("Charcter set ~p not supported", [Enc]))). %%---------------------------------------------------------------------- %% Function: initial_state/0 @@ -206,8 +212,7 @@ parse_options([{entity, Entity} |Options], State) -> parse_options([skip_external_dtd |Options], State) -> parse_options(Options, State#xmerl_sax_parser_state{skip_external_dtd = true}); parse_options([O |_], _State) -> - {error, - lists:flatten(io_lib:format("Option: ~p not supported", [O]))}. + {error, lists:flatten(io_lib:format("Option: ~p not supported", [O]))}. check_encoding_option(E) when E==utf8; E=={utf16,little}; E=={utf16,big}; @@ -225,16 +230,10 @@ check_encoding_option(E) -> %% Output: {utf8|utf16le|utf16be|iso8859, Xml, State} %% Description: Detects which character set is used in a binary stream. %%---------------------------------------------------------------------- -detect_charset(<<>>, #xmerl_sax_parser_state{continuation_fun = undefined} = _) -> - throw({error, "Can't detect character encoding due to no indata"}); -detect_charset(<<>>, #xmerl_sax_parser_state{continuation_fun = CFun, - continuation_state = CState} = State) -> - case CFun(CState) of - {<<>>, _} -> - throw({error, "Can't detect character encoding due to lack of indata"}); - {NewBytes, NewContState} -> - detect_charset(NewBytes, State#xmerl_sax_parser_state{continuation_state = NewContState}) - end; +detect_charset(<<>>, #xmerl_sax_parser_state{continuation_fun = undefined} = State) -> + ?fatal_error(State, "Can't detect character encoding due to lack of indata"); +detect_charset(<<>>, State) -> + cf(<<>>, State, fun detect_charset/2); detect_charset(Bytes, State) -> case unicode:bom_to_encoding(Bytes) of {latin1, 0} -> @@ -244,25 +243,47 @@ detect_charset(Bytes, State) -> {RealBytes, State#xmerl_sax_parser_state{encoding=Enc}} end. +detect_charset_1(<<16#00>> = Xml, State) -> + cf(Xml, State, fun detect_charset_1/2); +detect_charset_1(<<16#00, 16#3C>> = Xml, State) -> + cf(Xml, State, fun detect_charset_1/2); +detect_charset_1(<<16#00, 16#3C, 16#00>> = Xml, State) -> + cf(Xml, State, fun detect_charset_1/2); detect_charset_1(<<16#00, 16#3C, 16#00, 16#3F, _/binary>> = Xml, State) -> {Xml, State#xmerl_sax_parser_state{encoding={utf16, big}}}; +detect_charset_1(<<16#3C>> = Xml, State) -> + cf(Xml, State, fun detect_charset_1/2); +detect_charset_1(<<16#3C, 16#00>> = Xml, State) -> + cf(Xml, State, fun detect_charset_1/2); +detect_charset_1(<<16#3C, 16#00, 16#3F>> = Xml, State) -> + cf(Xml, State, fun detect_charset_1/2); detect_charset_1(<<16#3C, 16#00, 16#3F, 16#00, _/binary>> = Xml, State) -> {Xml, State#xmerl_sax_parser_state{encoding={utf16, little}}}; -detect_charset_1(<<16#3C, 16#3F, 16#78, 16#6D, 16#6C, Xml2/binary>> = Xml, State) -> - case parse_xml_directive(Xml2) of +detect_charset_1(<<16#3C>> = Xml, State) -> + cf(Xml, State, fun detect_charset_1/2); +detect_charset_1(<<16#3C, 16#3F>> = Xml, State) -> + cf(Xml, State, fun detect_charset_1/2); +detect_charset_1(<<16#3C, 16#3F, 16#78>> = Xml, State) -> + cf(Xml, State, fun detect_charset_1/2); +detect_charset_1(<<16#3C, 16#3F, 16#78, 16#6D>> = Xml, State) -> + cf(Xml, State, fun detect_charset_1/2); +detect_charset_1(<<16#3C, 16#3F, 16#78, 16#6D, 16#6C, Xml2/binary>>, State) -> + {Xml3, State1} = read_until_end_of_xml_directive(Xml2, State), + case parse_xml_directive(Xml3) of {error, Reason} -> - {error, Reason}; + ?fatal_error(State, Reason); AttrList -> case lists:keysearch("encoding", 1, AttrList) of {value, {_, E}} -> case convert_encoding(E) of {error, Reason} -> - {error, Reason}; + ?fatal_error(State, Reason); Enc -> - {Xml, State#xmerl_sax_parser_state{encoding=Enc}} + {<<16#3C, 16#3F, 16#78, 16#6D, 16#6C, Xml3/binary>>, + State1#xmerl_sax_parser_state{encoding=Enc}} end; _ -> - {Xml, State} + {<<16#3C, 16#3F, 16#78, 16#6D, 16#6C, Xml3/binary>>, State1} end end; detect_charset_1(Xml, State) -> @@ -372,7 +393,7 @@ parse_value_1(<<C, Rest/binary>>, Stop, Acc) -> parse_value_1(Rest, Stop, [C |Acc]). %%====================================================================== -%%Default functions +%% Default functions %%====================================================================== %%---------------------------------------------------------------------- %% Function: default_event_cb(Event, LineNo, State) -> Result @@ -388,7 +409,7 @@ default_event_cb(_Event, _LineNo, State) -> %%---------------------------------------------------------------------- %% Function: default_continuation_cb(IoDevice) -> Result %% IoDevice = iodevice() -%% Output: Result = {[char()], State} +%% Output: Result = {binary(), IoDevice} %% Description: Default continuation callback reading blocks. %%---------------------------------------------------------------------- default_continuation_cb(IoDevice) -> @@ -398,3 +419,82 @@ default_continuation_cb(IoDevice) -> {ok, FileBin} -> {FileBin, IoDevice} end. + +%%---------------------------------------------------------------------- +%% Function: read_until_end_of_xml_directive(Rest, State) -> Result +%% Rest = binary() +%% Output: Result = {binary(), State} +%% Description: Reads a utf8 or latin1 until it finds '?>' +%%---------------------------------------------------------------------- +read_until_end_of_xml_directive(Rest, State) -> + case binary:match(Rest, <<"?>">>) of + nomatch -> + case cf(Rest, State) of + {<<>>, _} -> + ?fatal_error(State, "Can't detect character encoding due to lack of indata"); + {NewBytes, NewState} -> + read_until_end_of_xml_directive(NewBytes, NewState) + end; + _ -> + {Rest, State} + end. + + +%%---------------------------------------------------------------------- +%% Function : cf(Rest, State) -> Result +%% Parameters: Rest = binary() +%% State = #xmerl_sax_parser_state{} +%% NextCall = fun() +%% Result : {Rest, State} +%% Description: Function that uses provided fun to read another chunk from +%% input stream and calls the fun in NextCall. +%%---------------------------------------------------------------------- +cf(_Rest, #xmerl_sax_parser_state{continuation_fun = undefined} = State) -> + ?fatal_error(State, "Continuation function undefined"); +cf(Rest, #xmerl_sax_parser_state{continuation_fun = CFun, continuation_state = CState} = State) -> + Result = + try + CFun(CState) + catch + throw:ErrorTerm -> + ?fatal_error(State, ErrorTerm); + exit:Reason -> + ?fatal_error(State, {'EXIT', Reason}) + end, + case Result of + {<<>>, _} -> + ?fatal_error(State, "Can't detect character encoding due to lack of indata"); + {NewBytes, NewContState} -> + {<<Rest/binary, NewBytes/binary>>, + State#xmerl_sax_parser_state{continuation_state = NewContState}} + end. + +%%---------------------------------------------------------------------- +%% Function : cf(Rest, State, NextCall) -> Result +%% Parameters: Rest = binary() +%% State = #xmerl_sax_parser_state{} +%% NextCall = fun() +%% Result : {Rest, State} +%% Description: Function that uses provided fun to read another chunk from +%% input stream and calls the fun in NextCall. +%%---------------------------------------------------------------------- +cf(_Rest, #xmerl_sax_parser_state{continuation_fun = undefined} = State, _) -> + ?fatal_error(State, "Continuation function undefined"); +cf(Rest, #xmerl_sax_parser_state{continuation_fun = CFun, continuation_state = CState} = State, + NextCall) -> + Result = + try + CFun(CState) + catch + throw:ErrorTerm -> + ?fatal_error(State, ErrorTerm); + exit:Reason -> + ?fatal_error(State, {'EXIT', Reason}) + end, + case Result of + {<<>>, _} -> + ?fatal_error(State, "Can't detect character encoding due to lack of indata"); + {NewBytes, NewContState} -> + NextCall(<<Rest/binary, NewBytes/binary>>, + State#xmerl_sax_parser_state{continuation_state = NewContState}) + end. diff --git a/lib/xmerl/src/xmerl_sax_parser.hrl b/lib/xmerl/src/xmerl_sax_parser.hrl index 932ab0cec5..56a3a42e5f 100644 --- a/lib/xmerl/src/xmerl_sax_parser.hrl +++ b/lib/xmerl/src/xmerl_sax_parser.hrl @@ -1,7 +1,7 @@ %%-------------------------------------------------------------------- %% %CopyrightBegin% %% -%% Copyright Ericsson AB 2008-2016. All Rights Reserved. +%% Copyright Ericsson AB 2008-2017. All Rights Reserved. %% %% Licensed under the Apache License, Version 2.0 (the "License"); %% you may not use this file except in compliance with the License. @@ -88,14 +88,7 @@ current_location, % Location of the currently parsed XML entity entity, % Parsed XML entity skip_external_dtd = false,% If true the external DTD is skipped during parsing - input_type % Source type: file | stream. - % This field is a preparation for an fix in R17 of a bug in - % the conformance against the standard. - % Today a file which contains two XML documents will be considered - % well-formed and the second is placed in the rest part of the - % return tuple, according to the conformance tests this should fail. - % In the future this will fail if xmerl_sax_aprser:file/2 is used but - % left to the user in the xmerl_sax_aprser:stream/2 case. + input_type % Source type: file | stream }). diff --git a/lib/xmerl/src/xmerl_sax_parser_base.erlsrc b/lib/xmerl/src/xmerl_sax_parser_base.erlsrc index 4d75805b9b..1dca9608cb 100644 --- a/lib/xmerl/src/xmerl_sax_parser_base.erlsrc +++ b/lib/xmerl/src/xmerl_sax_parser_base.erlsrc @@ -1,7 +1,7 @@ %%-*-erlang-*- %% %CopyrightBegin% %% -%% Copyright Ericsson AB 2008-2016. All Rights Reserved. +%% Copyright Ericsson AB 2008-2017. All Rights Reserved. %% %% Licensed under the Apache License, Version 2.0 (the "License"); %% you may not use this file except in compliance with the License. @@ -64,29 +64,42 @@ %% Description: Parsing XML from input stream. %%---------------------------------------------------------------------- parse(Xml, State) -> - RefTable = ets:new(xmerl_sax_entity_refs, [private]), - - State1 = event_callback(startDocument, State), - - case catch parse_document(Xml, State1#xmerl_sax_parser_state{ref_table=RefTable}) of - {ok, Rest, State2} -> - State3 = event_callback(endDocument, State2), - ets:delete(RefTable), - {ok, State3#xmerl_sax_parser_state.event_state, Rest}; - {fatal_error, {State2, Reason}} -> - State3 = event_callback(endDocument, State2), - ets:delete(RefTable), - format_error(fatal_error, State3, Reason); - {event_receiver_error, State2, {Tag, Reason}} -> - State3 = event_callback(endDocument, State2), - ets:delete(RefTable), - format_error(Tag, State3, Reason); - Other -> - _State2 = event_callback(endDocument, State1), - ets:delete(RefTable), - throw(Other) + RefTable = maps:new(), + + try + State1 = event_callback(startDocument, State), + Result = parse_document(Xml, State1#xmerl_sax_parser_state{ref_table=RefTable}), + handle_end_document(Result) + catch + throw:Exception -> + handle_end_document(Exception); + _:OtherError -> + handle_end_document({other, OtherError, State}) end. + % case catch parse_document(Xml, State1#xmerl_sax_parser_state{ref_table=RefTable}) of + % {ok, Rest, State2} -> + % State3 = event_callback(endDocument, State2), + % case check_if_rest_ok(State3#xmerl_sax_parser_state.input_type, Rest) of + % true -> + % {ok, State3#xmerl_sax_parser_state.event_state, Rest}; + % false -> + % format_error(fatal_error, State3, "Input found after legal document") + % end; + % {fatal_error, {State2, Reason}} -> + % State3 = event_callback(endDocument, State2), + % format_error(fatal_error, State3, Reason); + % {event_receiver_error, State2, {Tag, Reason}} -> + % State3 = event_callback(endDocument, State2), + % format_error(Tag, State3, Reason); + % {endDocument, Rest, State2} -> + % State3 = event_callback(endDocument, State2), + % {ok, State3#xmerl_sax_parser_state.event_state, Rest}; + % Other -> + % _State2 = event_callback(endDocument, State1), + % {fatal_error, Other} + % end. + %%---------------------------------------------------------------------- %% Function: parse_dtd(Xml, State) -> Result %% Input: Xml = string() | binary() @@ -96,38 +109,120 @@ parse(Xml, State) -> %% Description: Parsing XML DTD from input stream. %%---------------------------------------------------------------------- parse_dtd(Xml, State) -> - RefTable = ets:new(xmerl_sax_entity_refs, [private]), - - State1 = event_callback(startDocument, State), - - case catch parse_external_entity_1(Xml, State1#xmerl_sax_parser_state{ref_table=RefTable}) of - {fatal_error, {State2, Reason}} -> - State3 = event_callback(endDocument, State2), - ets:delete(RefTable), - format_error(fatal_error, State3, Reason); - {event_receiver_error, State2, {Tag, Reason}} -> - State3 = event_callback(endDocument, State2), - format_error(Tag, State3, Reason); - {Rest, State2} when is_record(State2, xmerl_sax_parser_state) -> - State3 = event_callback(endDocument, State2), - ets:delete(RefTable), - {ok, State3#xmerl_sax_parser_state.event_state, Rest}; - {endDocument, Rest, State2} when is_record(State2, xmerl_sax_parser_state) -> - State3 = event_callback(endDocument, State2), - ets:delete(RefTable), - {ok, State3#xmerl_sax_parser_state.event_state, Rest}; - Other -> - _State2 = event_callback(endDocument, State1), - ets:delete(RefTable), - throw(Other) + RefTable = maps:new(), + + try + State1 = event_callback(startDocument, State), + Result = parse_external_entity_1(Xml, State1#xmerl_sax_parser_state{ref_table=RefTable}), + handle_end_document(Result) + catch + throw:Exception -> + handle_end_document(Exception); + _:OtherError -> + handle_end_document({other, OtherError, State}) end. + % case catch parse_external_entity_1(Xml, State1#xmerl_sax_parser_state{ref_table=RefTable}) of + % {fatal_error, {State2, Reason}} -> + % State3 = event_callback(endDocument, State2), + % format_error(fatal_error, State3, Reason); + % {event_receiver_error, State2, {Tag, Reason}} -> + % State3 = event_callback(endDocument, State2), + % format_error(Tag, State3, Reason); + % {Rest, State2} when is_record(State2, xmerl_sax_parser_state) -> + % State3 = event_callback(endDocument, State2), + % {ok, State3#xmerl_sax_parser_state.event_state, Rest}; + % {endDocument, Rest, State2} when is_record(State2, xmerl_sax_parser_state) -> + % State3 = event_callback(endDocument, State2), + % {ok, State3#xmerl_sax_parser_state.event_state, Rest}; + % Other -> + % _State2 = event_callback(endDocument, State1), + % {fatal_error, Other} + % end. + + %%====================================================================== %% Internal functions %%====================================================================== %%---------------------------------------------------------------------- +%% Function: handle_end_document(ParserResult) -> Result +%% Input: ParseResult = term() +%% Output: Result = {ok, Rest, EventState} | +%% EventState = term() +%% Description: Ends the parsing and formats output +%%---------------------------------------------------------------------- +handle_end_document({ok, Rest, State}) -> + %%ok case from parse + try + State1 = event_callback(endDocument, State), + case check_if_rest_ok(State1#xmerl_sax_parser_state.input_type, Rest) of + true -> + {ok, State1#xmerl_sax_parser_state.event_state, Rest}; + false -> + format_error(fatal_error, State1, "Input found after legal document") + end + catch + throw:{event_receiver_error, State2, {Tag, Reason}} -> + format_error(Tag, State2, Reason); + _:Other -> + {fatal_error, Other} + end; +handle_end_document({endDocument, Rest, State}) -> + %% ok case from parse and parse_dtd + try + State1 = event_callback(endDocument, State), + {ok, State1#xmerl_sax_parser_state.event_state, Rest} + catch + throw:{event_receiver_error, State2, {Tag, Reason}} -> + format_error(Tag, State2, Reason); + _:Other -> + {fatal_error, Other} + end; +handle_end_document({fatal_error, {State, Reason}}) -> + try + State1 = event_callback(endDocument, State), + format_error(fatal_error, State1, Reason) + catch + throw:{event_receiver_error, State2, {Tag, Reason}} -> + format_error(Tag, State2, Reason); + _:Other -> + {fatal_error, Other} + end; +handle_end_document({event_receiver_error, State, {Tag, Reason}}) -> + try + State1 = event_callback(endDocument, State), + format_error(Tag, State1, Reason) + catch + throw:{event_receiver_error, State2, {Tag, Reason}} -> + format_error(Tag, State2, Reason); + _:Other -> + {fatal_error, Other} + end; +handle_end_document({Rest, State}) when is_record(State, xmerl_sax_parser_state) -> + %%ok case from parse_dtd + try + State1 = event_callback(endDocument, State), + {ok, State1#xmerl_sax_parser_state.event_state, Rest} + catch + throw:{event_receiver_error, State2, {Tag, Reason}} -> + format_error(Tag, State2, Reason); + _:Other -> + {fatal_error, Other} + end; +handle_end_document({other, Error, State}) -> + try + _State1 = event_callback(endDocument, State), + {fatal_error, Error} + catch + throw:{event_receiver_error, State2, {Tag, Reason}} -> + format_error(Tag, State2, Reason); + _:Other -> + {fatal_error, Other} + end. + +%%---------------------------------------------------------------------- %% Function: parse_document(Rest, State) -> Result %% Input: Rest = string() | binary() %% State = #xmerl_sax_parser_state{} @@ -136,10 +231,11 @@ parse_dtd(Xml, State) -> %% [1] document ::= prolog element Misc* %%---------------------------------------------------------------------- parse_document(Rest, State) when is_record(State, xmerl_sax_parser_state) -> - {Rest1, State1} = parse_xml_decl(Rest, State), + {Rest1, State1} = parse_byte_order_mark(Rest, State), {Rest2, State2} = parse_misc(Rest1, State1, true), {ok, Rest2, State2}. +?PARSE_BYTE_ORDER_MARK(Bytes, State). %%---------------------------------------------------------------------- %% Function: parse_xml_decl(Rest, State) -> Result @@ -150,15 +246,8 @@ parse_document(Rest, State) when is_record(State, xmerl_sax_parser_state) -> %% [22] prolog ::= XMLDecl? Misc* (doctypedecl Misc*)? %% [23] XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>' %%---------------------------------------------------------------------- --dialyzer({[no_fail_call, no_match], parse_xml_decl/2}). parse_xml_decl(?STRING_EMPTY, State) -> cf(?STRING_EMPTY, State, fun parse_xml_decl/2); -parse_xml_decl(?BYTE_ORDER_MARK_1, State) -> - cf(?BYTE_ORDER_MARK_1, State, fun parse_xml_decl/2); -parse_xml_decl(?BYTE_ORDER_MARK_2, State) -> - cf(?BYTE_ORDER_MARK_2, State, fun parse_xml_decl/2); -parse_xml_decl(?BYTE_ORDER_MARK_REST(Rest), State) -> - cf(Rest, State, fun parse_xml_decl/2); parse_xml_decl(?STRING("<") = Bytes, State) -> cf(Bytes, State, fun parse_xml_decl/2); parse_xml_decl(?STRING("<?") = Bytes, State) -> @@ -170,31 +259,19 @@ parse_xml_decl(?STRING("<?xm") = Bytes, State) -> parse_xml_decl(?STRING("<?xml") = Bytes, State) -> cf(Bytes, State, fun parse_xml_decl/2); parse_xml_decl(?STRING_REST("<?xml", Rest1), State) -> - parse_xml_decl_1(Rest1, State); -parse_xml_decl(Bytes, #xmerl_sax_parser_state{encoding=Enc} = State) when is_binary(Bytes) -> - case unicode:characters_to_list(Bytes, Enc) of - {incomplete, _, _} -> - cf(Bytes, State, fun parse_xml_decl/2); - {error, _Encoded, _Rest} -> - ?fatal_error(State, lists:flatten(io_lib:format("Bad character, not in ~p\n", [Enc]))); - _ -> - parse_prolog(Bytes, State) - end; -parse_xml_decl(Bytes, State) -> - parse_prolog(Bytes, State). - + parse_xml_decl_rest(Rest1, State); +?PARSE_XML_DECL(Bytes, State). -parse_xml_decl_1(?STRING_UNBOUND_REST(C, Rest) = Bytes, State) -> +parse_xml_decl_rest(?STRING_UNBOUND_REST(C, Rest) = Bytes, State) -> if ?is_whitespace(C) -> {_XmlAttributes, Rest1, State1} = parse_version_info(Rest, State, []), - %State2 = event_callback({processingInstruction, "xml", XmlAttributes}, State1),% The XML decl. should not be reported as a PI parse_prolog(Rest1, State1); true -> parse_prolog(?STRING_REST("<?xml", Bytes), State) end; -parse_xml_decl_1(Bytes, State) -> - unicode_incomplete_check([Bytes, State, fun parse_xml_decl_1/2], undefined). +parse_xml_decl_rest(Bytes, State) -> + unicode_incomplete_check([Bytes, State, fun parse_xml_decl_rest/2], undefined). @@ -216,8 +293,6 @@ parse_prolog(?STRING_REST("<?", Rest), State) -> parse_prolog(Rest1, State1); {endDocument, Rest1, State1} -> parse_prolog(Rest1, State1) - % IValue = ?TO_INPUT_FORMAT("<?"), - % {?APPEND_STRING(IValue, Rest1), State1} end; parse_prolog(?STRING_REST("<!", Rest), State) -> parse_prolog_1(Rest, State); @@ -230,7 +305,6 @@ parse_prolog(Bytes, State) -> unicode_incomplete_check([Bytes, State, fun parse_prolog/2], "expecting < or whitespace"). - parse_prolog_1(?STRING_EMPTY, State) -> cf(?STRING_EMPTY, State, fun parse_prolog_1/2); parse_prolog_1(?STRING("D") = Bytes, State) -> @@ -442,6 +516,16 @@ check_if_new_doc_allowed(stream, []) -> check_if_new_doc_allowed(_, _) -> false. +check_if_rest_ok(file, []) -> + true; +check_if_rest_ok(file, <<>>) -> + true; +check_if_rest_ok(stream, _) -> + true; +check_if_rest_ok(_, _) -> + false. + + %%---------------------------------------------------------------------- %% Function: parse_pi_1(Rest, State) -> Result %% Input: Rest = string() | binary() @@ -886,11 +970,11 @@ send_end_prefix_mapping_event([{Prefix, _Uri} |Ns], State) -> parse_eq(?STRING_EMPTY, State) -> cf(?STRING_EMPTY, State, fun parse_eq/2); parse_eq(?STRING_REST("=", Rest), State) -> - {Rest, State}; + {Rest, State}; parse_eq(?STRING_UNBOUND_REST(C, _) = Bytes, State) when ?is_whitespace(C) -> - {_WS, Rest, State1} = - whitespace(Bytes, State, []), - parse_eq(Rest, State1); + {_WS, Rest, State1} = + whitespace(Bytes, State, []), + parse_eq(Rest, State1); parse_eq(Bytes, State) -> unicode_incomplete_check([Bytes, State, fun parse_eq/2], "expecting = or whitespace"). @@ -908,11 +992,11 @@ parse_eq(Bytes, State) -> parse_att_value(?STRING_EMPTY, State) -> cf(?STRING_EMPTY, State, fun parse_att_value/2); parse_att_value(?STRING_UNBOUND_REST(C, Rest), State) when C == $'; C == $" -> - parse_att_value(Rest, State, C, []); + parse_att_value(Rest, State, C, []); parse_att_value(?STRING_UNBOUND_REST(C, _) = Bytes, State) when ?is_whitespace(C) -> - {_WS, Rest, State1} = - whitespace(Bytes, State, []), - parse_att_value(Rest, State1); + {_WS, Rest, State1} = + whitespace(Bytes, State, []), + parse_att_value(Rest, State1); parse_att_value(Bytes, State) -> unicode_incomplete_check([Bytes, State, fun parse_att_value/2], "\', \" or whitespace expected"). @@ -1024,16 +1108,21 @@ parse_etag(Bytes, State) -> unicode_incomplete_check([Bytes, State, fun parse_etag/2], undefined). - parse_etag_1(?STRING_REST(">", Rest), #xmerl_sax_parser_state{end_tags=[{_ETag, Uri, LocalName, QName, OldNsList, NewNsList} - |RestOfETags]} = State, _Tag) -> + |RestOfETags], + input_type=InputType} = State, _Tag) -> State1 = event_callback({endElement, Uri, LocalName, QName}, State), State2 = send_end_prefix_mapping_event(NewNsList, State1), - parse_content(Rest, - State2#xmerl_sax_parser_state{end_tags=RestOfETags, - ns = OldNsList}, - [], true); + case check_if_new_doc_allowed(InputType, RestOfETags) of + true -> + throw({endDocument, Rest, State2#xmerl_sax_parser_state{ns = OldNsList}}); + false -> + parse_content(Rest, + State2#xmerl_sax_parser_state{end_tags=RestOfETags, + ns = OldNsList}, + [], true) + end; parse_etag_1(?STRING_UNBOUND_REST(_C, _), State, Tag) -> {P,TN} = Tag, ?fatal_error(State, "Bad EndTag: " ++ P ++ ":" ++ TN); @@ -1051,21 +1140,26 @@ parse_etag_1(Bytes, State, Tag) -> %% Description: Parsing the content part of tags %% [43] content ::= (element | CharData | Reference | CDSect | PI | Comment)* %%---------------------------------------------------------------------- - parse_content(?STRING_EMPTY, State, Acc, IgnorableWS) -> - case catch cf(?STRING_EMPTY, State, Acc, IgnorableWS, fun parse_content/4) of - {Rest, State1} when is_record(State1, xmerl_sax_parser_state) -> - {Rest, State1}; - {fatal_error, {State1, Msg}} -> - case check_if_document_complete(State1, Msg) of - true -> - State2 = send_character_event(length(Acc), IgnorableWS, lists:reverse(Acc), State1), - {?STRING_EMPTY, State2}; - false -> - ?fatal_error(State1, Msg) - end; - Other -> - throw(Other) + case check_if_document_complete(State, "No more bytes") of + true -> + State1 = send_character_event(length(Acc), IgnorableWS, lists:reverse(Acc), State), + {?STRING_EMPTY, State1}; + false -> + case catch cf(?STRING_EMPTY, State, Acc, IgnorableWS, fun parse_content/4) of + {Rest, State1} when is_record(State1, xmerl_sax_parser_state) -> + {Rest, State1}; + {fatal_error, {State1, Msg}} -> + case check_if_document_complete(State1, Msg) of + true -> + State2 = send_character_event(length(Acc), IgnorableWS, lists:reverse(Acc), State1), + {?STRING_EMPTY, State2}; + false -> + ?fatal_error(State1, Msg) + end; + Other -> + throw(Other) + end end; parse_content(?STRING("\r") = Bytes, State, Acc, IgnorableWS) -> cf(Bytes, State, Acc, IgnorableWS, fun parse_content/4); @@ -1094,7 +1188,7 @@ parse_content(?STRING_REST("<?", Rest), State, Acc, IgnorableWS) -> parse_content(?STRING_REST("<!", Rest1) = Rest, #xmerl_sax_parser_state{end_tags = ET} = State, Acc, IgnorableWS) -> case ET of [] -> - {Rest, State}; %%LATH : Skicka ignorable WS ??? + {Rest, State}; %% Skicka ignorable WS ??? _ -> State1 = send_character_event(length(Acc), IgnorableWS, lists:reverse(Acc), State), parse_cdata(Rest1, State1) @@ -1102,7 +1196,7 @@ parse_content(?STRING_REST("<!", Rest1) = Rest, #xmerl_sax_parser_state{end_tags parse_content(?STRING_REST("<", Rest1) = Rest, #xmerl_sax_parser_state{end_tags = ET} = State, Acc, IgnorableWS) -> case ET of [] -> - {Rest, State}; %%LATH : Skicka ignorable WS ??? + {Rest, State}; %% Skicka ignorable WS ??? _ -> State1 = send_character_event(length(Acc), IgnorableWS, lists:reverse(Acc), State), parse_stag(Rest1, State1) @@ -1204,7 +1298,6 @@ send_character_event(_, true, String, State) -> %% Description: Parse whitespaces. %% [3] S ::= (#x20 | #x9 | #xD | #xA)+ %%---------------------------------------------------------------------- --dialyzer({no_fail_call, whitespace/3}). whitespace(?STRING_EMPTY, State, Acc) -> case cf(?STRING_EMPTY, State, Acc, fun whitespace/3) of {?STRING_EMPTY, State} -> @@ -1230,16 +1323,7 @@ whitespace(?STRING_REST("\r", Rest), State, Acc) -> whitespace(Rest, State#xmerl_sax_parser_state{line_no=N+1}, [?lf |Acc]); whitespace(?STRING_UNBOUND_REST(C, Rest), State, Acc) when ?is_whitespace(C) -> whitespace(Rest, State, [C|Acc]); -whitespace(?STRING_UNBOUND_REST(_C, _) = Bytes, State, Acc) -> - {lists:reverse(Acc), Bytes, State}; -whitespace(Bytes, #xmerl_sax_parser_state{encoding=Enc} = State, Acc) when is_binary(Bytes) -> - case unicode:characters_to_list(Bytes, Enc) of - {incomplete, _, _} -> - cf(Bytes, State, Acc, fun whitespace/3); - {error, _Encoded, _Rest} -> - ?fatal_error(State, lists:flatten(io_lib:format("Bad character, not in ~p\n", [Enc]))) - end. - +?WHITESPACE(Bytes, State, Acc). %%---------------------------------------------------------------------- %% Function: parse_reference(Rest, State, HaveToExist) -> Result @@ -1362,23 +1446,24 @@ parse_pe_reference_1(Bytes, State, Name) -> "missing ; after reference " ++ Name). - %%---------------------------------------------------------------------- -%% Function: insert_reference(Reference, State) -> Result -%% Parameters: Reference = string() +%% Function: insert_reference(Name, Ref, State) -> Result +%% Parameters: Name = string() +%% Ref = {Type, Value} +%% Type = atom() +%% Value = term() %% State = #xmerl_sax_parser_state{} %% Result : %%---------------------------------------------------------------------- -insert_reference({Name, Type, Value}, Table) -> - case ets:lookup(Table, Name) of - [{Name, _, _}] -> - ok; +insert_reference(Name, Value, #xmerl_sax_parser_state{ref_table = Map} = State) -> + case maps:find(Name, Map) of + error -> + State#xmerl_sax_parser_state{ref_table = maps:put(Name, Value, Map)}; _ -> - ets:insert(Table, {Name, Type, Value}) + State end. - %%---------------------------------------------------------------------- %% Function: look_up_reference(Reference, State) -> Result %% Parameters: Reference = string() @@ -1396,8 +1481,8 @@ look_up_reference("apos", _, _) -> look_up_reference("quot", _, _) -> {internal_general, "quot", "\""}; look_up_reference(Name, HaveToExist, State) -> - case ets:lookup(State#xmerl_sax_parser_state.ref_table, Name) of - [{Name, Type, Value}] -> + case maps:find(Name, State#xmerl_sax_parser_state.ref_table) of + {ok, {Type, Value}} -> {Type, Name, Value}; _ -> case HaveToExist of @@ -1479,7 +1564,7 @@ parse_system_litteral(?STRING_EMPTY, State, Stop, Acc) -> parse_system_litteral(?STRING_UNBOUND_REST(Stop, Rest), State, Stop, Acc) -> {lists:reverse(Acc), Rest, State}; parse_system_litteral(?STRING_UNBOUND_REST(C, Rest), State, Stop, Acc) -> - parse_system_litteral(Rest, State, Stop, [C |Acc]); + parse_system_litteral(Rest, State, Stop, [C |Acc]); parse_system_litteral(Bytes, State, Stop, Acc) -> unicode_incomplete_check([Bytes, State, Stop, Acc, fun parse_system_litteral/4], undefined). @@ -1651,9 +1736,11 @@ parse_external_entity(State, _PubId, SysId) -> end_tags = []}, - EventState = handle_external_entity(ExtRef, State1), + {EventState, RefTable} = handle_external_entity(ExtRef, State1), - NewState = event_callback({endEntity, SysId}, SaveState#xmerl_sax_parser_state{event_state=EventState}), + NewState = event_callback({endEntity, SysId}, + SaveState#xmerl_sax_parser_state{event_state=EventState, + ref_table=RefTable}), NewState#xmerl_sax_parser_state{file_type=normal}. @@ -1680,7 +1767,8 @@ handle_external_entity({file, FileToOpen}, State) -> entity=filename:basename(FileToOpen), input_type=file}), ok = file:close(FD), - EntityState#xmerl_sax_parser_state.event_state + {EntityState#xmerl_sax_parser_state.event_state, + EntityState#xmerl_sax_parser_state.ref_table} end; handle_external_entity({http, Url}, State) -> @@ -1693,14 +1781,16 @@ handle_external_entity({http, Url}, State) -> ++ file:format_error(Reason)); {ok, FD} -> {?STRING_EMPTY, EntityState} = - parse_external_entity_1(<<>>, + parse_external_entity_byte_order_mark(<<>>, State#xmerl_sax_parser_state{continuation_state=FD, current_location=filename:dirname(Url), entity=filename:basename(Url), input_type=file}), ok = file:close(FD), ok = file:delete(TmpFile), - EntityState#xmerl_sax_parser_state.event_state + {EntityState#xmerl_sax_parser_state.event_state, + EntityState#xmerl_sax_parser_state.ref_table} + end catch throw:{error, Error} -> @@ -1709,6 +1799,8 @@ handle_external_entity({http, Url}, State) -> handle_external_entity({Tag, _Url}, State) -> ?fatal_error(State, "Unsupported URI type: " ++ atom_to_list(Tag)). +?PARSE_EXTERNAL_ENTITY_BYTE_ORDER_MARK(Bytes, State). + %%---------------------------------------------------------------------- %% Function : parse_external_entity_1(Rest, State) -> Result %% Parameters: Rest = string() | binary() @@ -1716,22 +1808,15 @@ handle_external_entity({Tag, _Url}, State) -> %% Result : {Rest, State} %% Description: Parse the external entity. %%---------------------------------------------------------------------- --dialyzer({[no_fail_call, no_match], parse_external_entity_1/2}). parse_external_entity_1(?STRING_EMPTY, #xmerl_sax_parser_state{file_type=Type} = State) -> case catch cf(?STRING_EMPTY, State, fun parse_external_entity_1/2) of {Rest, State1} when is_record(State1, xmerl_sax_parser_state) -> - {Rest, State}; + {Rest, State1}; {fatal_error, {State1, "No more bytes"}} when Type == dtd; Type == entity -> {?STRING_EMPTY, State1}; Other -> throw(Other) end; -parse_external_entity_1(?BYTE_ORDER_MARK_1, State) -> - cf(?BYTE_ORDER_MARK_1, State, fun parse_external_entity_1/2); -parse_external_entity_1(?BYTE_ORDER_MARK_2, State) -> - cf(?BYTE_ORDER_MARK_2, State, fun parse_external_entity_1/2); -parse_external_entity_1(?BYTE_ORDER_MARK_REST(Rest), State) -> - parse_external_entity_1(Rest, State); parse_external_entity_1(?STRING("<") = Bytes, State) -> cf(Bytes, State, fun parse_external_entity_1/2); parse_external_entity_1(?STRING("<?") = Bytes, State) -> @@ -2452,24 +2537,24 @@ parse_entity_def(?STRING_EMPTY, State, Name) -> cf(?STRING_EMPTY, State, Name, fun parse_entity_def/3); parse_entity_def(?STRING_UNBOUND_REST(C, Rest), State, Name) when C == $'; C == $" -> {Value, Rest1, State1} = parse_entity_value(Rest, State, C, []), - insert_reference({Name, internal_general, Value}, State1#xmerl_sax_parser_state.ref_table), - State2 = event_callback({internalEntityDecl, Name, Value}, State1), - {_WS, Rest2, State3} = whitespace(Rest1, State2, []), - parse_def_end(Rest2, State3); + State2 = insert_reference(Name, {internal_general, Value}, State1), + State3 = event_callback({internalEntityDecl, Name, Value}, State2), + {_WS, Rest2, State4} = whitespace(Rest1, State3, []), + parse_def_end(Rest2, State4); parse_entity_def(?STRING_UNBOUND_REST(C, _) = Rest, State, Name) when C == $S; C == $P -> {PubId, SysId, Rest1, State1} = parse_external_id(Rest, State, false), {Ndata, Rest2, State2} = parse_ndata(Rest1, State1), case Ndata of undefined -> - insert_reference({Name, external_general, {PubId, SysId}}, - State2#xmerl_sax_parser_state.ref_table), - State3 = event_callback({externalEntityDecl, Name, PubId, SysId}, State2), - {Rest2, State3}; + State3 = insert_reference(Name, {external_general, {PubId, SysId}}, + State2), + State4 = event_callback({externalEntityDecl, Name, PubId, SysId}, State3), + {Rest2, State4}; _ -> - insert_reference({Name, unparsed, {PubId, SysId, Ndata}}, - State2#xmerl_sax_parser_state.ref_table), - State3 = event_callback({unparsedEntityDecl, Name, PubId, SysId, Ndata}, State2), - {Rest2, State3} + State3 = insert_reference(Name, {unparsed, {PubId, SysId, Ndata}}, + State2), + State4 = event_callback({unparsedEntityDecl, Name, PubId, SysId, Ndata}, State3), + {Rest2, State4} end; parse_entity_def(Bytes, State, Name) -> unicode_incomplete_check([Bytes, State, Name, fun parse_entity_def/3], @@ -2646,19 +2731,19 @@ parse_pe_def(?STRING_EMPTY, State, Name) -> parse_pe_def(?STRING_UNBOUND_REST(C, Rest), State, Name) when C == $'; C == $" -> {Value, Rest1, State1} = parse_entity_value(Rest, State, C, []), Name1 = "%" ++ Name, - insert_reference({Name1, internal_parameter, Value}, - State1#xmerl_sax_parser_state.ref_table), - State2 = event_callback({internalEntityDecl, Name1, Value}, State1), - {_WS, Rest2, State3} = whitespace(Rest1, State2, []), - parse_def_end(Rest2, State3); + State2 = insert_reference(Name1, {internal_parameter, Value}, + State1), + State3 = event_callback({internalEntityDecl, Name1, Value}, State2), + {_WS, Rest2, State4} = whitespace(Rest1, State3, []), + parse_def_end(Rest2, State4); parse_pe_def(?STRING_UNBOUND_REST(C, _) = Bytes, State, Name) when C == $S; C == $P -> {PubId, SysId, Rest1, State1} = parse_external_id(Bytes, State, false), Name1 = "%" ++ Name, - insert_reference({Name1, external_parameter, {PubId, SysId}}, - State1#xmerl_sax_parser_state.ref_table), - State2 = event_callback({externalEntityDecl, Name1, PubId, SysId}, State1), - {_WS, Rest2, State3} = whitespace(Rest1, State2, []), - parse_def_end(Rest2, State3); + State2 = insert_reference(Name1, {external_parameter, {PubId, SysId}}, + State1), + State3 = event_callback({externalEntityDecl, Name1, PubId, SysId}, State2), + {_WS, Rest2, State4} = whitespace(Rest1, State3, []), + parse_def_end(Rest2, State4); parse_pe_def(Bytes, State, Name) -> unicode_incomplete_check([Bytes, State, Name, fun parse_pe_def/3], "\", \', SYSTEM or PUBLIC expected"). @@ -3290,7 +3375,7 @@ cf(Rest, #xmerl_sax_parser_state{continuation_fun = CFun, continuation_state = C catch throw:ErrorTerm -> ?fatal_error(State, ErrorTerm); - exit:Reason -> + exit:Reason -> ?fatal_error(State, {'EXIT', Reason}) end, case Result of diff --git a/lib/xmerl/src/xmerl_sax_parser_latin1.erlsrc b/lib/xmerl/src/xmerl_sax_parser_latin1.erlsrc index 961806bf4c..6e59347fb8 100644 --- a/lib/xmerl/src/xmerl_sax_parser_latin1.erlsrc +++ b/lib/xmerl/src/xmerl_sax_parser_latin1.erlsrc @@ -2,7 +2,7 @@ %%-------------------------------------------------------------------- %% %CopyrightBegin% %% -%% Copyright Ericsson AB 2008-2016. All Rights Reserved. +%% Copyright Ericsson AB 2008-2017. All Rights Reserved. %% %% Licensed under the Apache License, Version 2.0 (the "License"); %% you may not use this file except in compliance with the License. @@ -34,8 +34,36 @@ -define(APPEND_STRING(Rest, New), <<Rest/binary, New/binary>>). -define(TO_INPUT_FORMAT(Val), unicode:characters_to_binary(Val, unicode, latin1)). -%% STRING_REST and STRING_UNBOUND_REST is only different in the list case -define(STRING_UNBOUND_REST(MatchChar, Rest), <<MatchChar, Rest/binary>>). --define(BYTE_ORDER_MARK_1, undefined_bom1). --define(BYTE_ORDER_MARK_2, undefined_bom2). --define(BYTE_ORDER_MARK_REST(Rest), <<undefined, Rest/binary>>). + +-define(PARSE_BYTE_ORDER_MARK(Bytes, State), + parse_byte_order_mark(Bytes, State) -> + parse_xml_decl(Bytes, State)). + +-define(PARSE_XML_DECL(Bytes, State), + parse_xml_decl(Bytes, #xmerl_sax_parser_state{encoding=Enc} = State) when is_binary(Bytes) -> + case unicode:characters_to_list(Bytes, Enc) of + {incomplete, _, _} -> + cf(Bytes, State, fun parse_xml_decl/2); + {error, _Encoded, _Rest} -> + ?fatal_error(State, lists:flatten(io_lib:format("Bad character, not in ~p\n", [Enc]))); + _ -> + parse_prolog(Bytes, State) + end; + parse_xml_decl(Bytes, State) -> + parse_prolog(Bytes, State)). + +-define(WHITESPACE(Bytes, State, Acc), + whitespace(?STRING_UNBOUND_REST(_C, _) = Bytes, State, Acc) -> + {lists:reverse(Acc), Bytes, State}; + whitespace(Bytes, #xmerl_sax_parser_state{encoding=Enc} = State, Acc) when is_binary(Bytes) -> + case unicode:characters_to_list(Bytes, Enc) of + {incomplete, _, _} -> + cf(Bytes, State, Acc, fun whitespace/3); + {error, _Encoded, _Rest} -> + ?fatal_error(State, lists:flatten(io_lib:format("Bad character, not in ~p\n", [Enc]))) + end). + +-define(PARSE_EXTERNAL_ENTITY_BYTE_ORDER_MARK(Bytes, State), + parse_external_entity_byte_order_mark(Bytes, State) -> + parse_external_entity_1(Bytes, State)). diff --git a/lib/xmerl/src/xmerl_sax_parser_list.erlsrc b/lib/xmerl/src/xmerl_sax_parser_list.erlsrc index 624a621d92..ac89896215 100644 --- a/lib/xmerl/src/xmerl_sax_parser_list.erlsrc +++ b/lib/xmerl/src/xmerl_sax_parser_list.erlsrc @@ -2,7 +2,7 @@ %%-------------------------------------------------------------------- %% %CopyrightBegin% %% -%% Copyright Ericsson AB 2008-2016. All Rights Reserved. +%% Copyright Ericsson AB 2008-2017. All Rights Reserved. %% %% Licensed under the Apache License, Version 2.0 (the "License"); %% you may not use this file except in compliance with the License. @@ -36,6 +36,19 @@ %% In the list case we can't use a '++' when matchin against an unbound variable -define(STRING_UNBOUND_REST(MatchChar, Rest), [MatchChar | Rest]). --define(BYTE_ORDER_MARK_1, undefined_bom1). --define(BYTE_ORDER_MARK_2, undefined_bom2). --define(BYTE_ORDER_MARK_REST(Rest), [undefined|Rest]). + +-define(PARSE_BYTE_ORDER_MARK(Bytes, State), + parse_byte_order_mark(Bytes, State) -> + parse_xml_decl(Bytes, State)). + +-define(PARSE_XML_DECL(Bytes, State), + parse_xml_decl(Bytes, State) -> + parse_prolog(Bytes, State)). + +-define(WHITESPACE(Bytes, State, Acc), + whitespace(?STRING_UNBOUND_REST(_C, _) = Bytes, State, Acc) -> + {lists:reverse(Acc), Bytes, State}). + +-define(PARSE_EXTERNAL_ENTITY_BYTE_ORDER_MARK(Bytes, State), + parse_external_entity_byte_order_mark(Bytes, State) -> + parse_external_entity_1(Bytes, State)). diff --git a/lib/xmerl/src/xmerl_sax_parser_utf16be.erlsrc b/lib/xmerl/src/xmerl_sax_parser_utf16be.erlsrc index ff84ece97a..ec89024729 100644 --- a/lib/xmerl/src/xmerl_sax_parser_utf16be.erlsrc +++ b/lib/xmerl/src/xmerl_sax_parser_utf16be.erlsrc @@ -2,7 +2,7 @@ %%-------------------------------------------------------------------- %% %CopyrightBegin% %% -%% Copyright Ericsson AB 2008-2016. All Rights Reserved. +%% Copyright Ericsson AB 2008-2017. All Rights Reserved. %% %% Licensed under the Apache License, Version 2.0 (the "License"); %% you may not use this file except in compliance with the License. @@ -34,8 +34,50 @@ -define(APPEND_STRING(Rest, New), <<Rest/binary, New/binary>>). -define(TO_INPUT_FORMAT(Val), unicode:characters_to_binary(Val, unicode, {utf16, big})). -%% STRING_REST and STRING_UNBOUND_REST is only different in the list case -define(STRING_UNBOUND_REST(MatchChar, Rest), <<MatchChar/big-utf16, Rest/binary>>). --define(BYTE_ORDER_MARK_1, undefined_bom1). --define(BYTE_ORDER_MARK_2, <<16#FE>>). +-define(BYTE_ORDER_MARK_1, <<16#FE>>). -define(BYTE_ORDER_MARK_REST(Rest), <<16#FE, 16#FF, Rest/binary>>). + +-define(PARSE_BYTE_ORDER_MARK(Bytes, State), + parse_byte_order_mark(?STRING_EMPTY, State) -> + cf(?STRING_EMPTY, State, fun parse_byte_order_mark/2); + parse_byte_order_mark(?BYTE_ORDER_MARK_1, State) -> + cf(?BYTE_ORDER_MARK_1, State, fun parse_byte_order_mark/2); + parse_byte_order_mark(?BYTE_ORDER_MARK_REST(Rest), State) -> + parse_xml_decl(Rest, State); + parse_byte_order_mark(Bytes, State) -> + parse_xml_decl(Bytes, State)). + +-define(PARSE_XML_DECL(Bytes, State), + parse_xml_decl(Bytes, #xmerl_sax_parser_state{encoding=Enc} = State) when is_binary(Bytes) -> + case unicode:characters_to_list(Bytes, Enc) of + {incomplete, _, _} -> + cf(Bytes, State, fun parse_xml_decl/2); + {error, _Encoded, _Rest} -> + ?fatal_error(State, lists:flatten(io_lib:format("Bad character, not in ~p\n", [Enc]))); + _ -> + parse_prolog(Bytes, State) + end; + parse_xml_decl(Bytes, State) -> + parse_prolog(Bytes, State)). + +-define(WHITESPACE(Bytes, State, Acc), + whitespace(?STRING_UNBOUND_REST(_C, _) = Bytes, State, Acc) -> + {lists:reverse(Acc), Bytes, State}; + whitespace(Bytes, #xmerl_sax_parser_state{encoding=Enc} = State, Acc) when is_binary(Bytes) -> + case unicode:characters_to_list(Bytes, Enc) of + {incomplete, _, _} -> + cf(Bytes, State, Acc, fun whitespace/3); + {error, _Encoded, _Rest} -> + ?fatal_error(State, lists:flatten(io_lib:format("Bad character, not in ~p\n", [Enc]))) + end). + +-define(PARSE_EXTERNAL_ENTITY_BYTE_ORDER_MARK(Bytes, State), + parse_external_entity_byte_order_mark(?STRING_EMPTY, State) -> + cf(?STRING_EMPTY, State, fun parse_external_entity_byte_order_mark/2); + parse_external_entity_byte_order_mark(?BYTE_ORDER_MARK_1, State) -> + cf(?BYTE_ORDER_MARK_1, State, fun parse_external_entity_byte_order_mark/2); + parse_external_entity_byte_order_mark(?BYTE_ORDER_MARK_REST(Rest), State) -> + parse_external_entity_1(Rest, State); + parse_external_entity_byte_order_mark(Bytes, State) -> + parse_external_entity_1(Bytes, State)). diff --git a/lib/xmerl/src/xmerl_sax_parser_utf16le.erlsrc b/lib/xmerl/src/xmerl_sax_parser_utf16le.erlsrc index a330fce8d0..566333a045 100644 --- a/lib/xmerl/src/xmerl_sax_parser_utf16le.erlsrc +++ b/lib/xmerl/src/xmerl_sax_parser_utf16le.erlsrc @@ -2,7 +2,7 @@ %%-------------------------------------------------------------------- %% %CopyrightBegin% %% -%% Copyright Ericsson AB 2008-2016. All Rights Reserved. +%% Copyright Ericsson AB 2008-2017. All Rights Reserved. %% %% Licensed under the Apache License, Version 2.0 (the "License"); %% you may not use this file except in compliance with the License. @@ -34,8 +34,50 @@ -define(APPEND_STRING(Rest, New), <<Rest/binary, New/binary>>). -define(TO_INPUT_FORMAT(Val), unicode:characters_to_binary(Val, unicode, {utf16, little})). -%% STRING_REST and STRING_UNBOUND_REST is only different in the list case -define(STRING_UNBOUND_REST(MatchChar, Rest), <<MatchChar/little-utf16, Rest/binary>>). --define(BYTE_ORDER_MARK_1, undefined_bom1). --define(BYTE_ORDER_MARK_2, <<16#FF>>). +-define(BYTE_ORDER_MARK_1, <<16#FF>>). -define(BYTE_ORDER_MARK_REST(Rest), <<16#FF, 16#FE, Rest/binary>>). + +-define(PARSE_BYTE_ORDER_MARK(Bytes, State), + parse_byte_order_mark(?STRING_EMPTY, State) -> + cf(?STRING_EMPTY, State, fun parse_byte_order_mark/2); + parse_byte_order_mark(?BYTE_ORDER_MARK_1, State) -> + cf(?BYTE_ORDER_MARK_1, State, fun parse_byte_order_mark/2); + parse_byte_order_mark(?BYTE_ORDER_MARK_REST(Rest), State) -> + parse_xml_decl(Rest, State); + parse_byte_order_mark(Bytes, State) -> + parse_xml_decl(Bytes, State)). + +-define(PARSE_XML_DECL(Bytes, State), + parse_xml_decl(Bytes, #xmerl_sax_parser_state{encoding=Enc} = State) when is_binary(Bytes) -> + case unicode:characters_to_list(Bytes, Enc) of + {incomplete, _, _} -> + cf(Bytes, State, fun parse_xml_decl/2); + {error, _Encoded, _Rest} -> + ?fatal_error(State, lists:flatten(io_lib:format("Bad character, not in ~p\n", [Enc]))); + _ -> + parse_prolog(Bytes, State) + end; + parse_xml_decl(Bytes, State) -> + parse_prolog(Bytes, State)). + +-define(WHITESPACE(Bytes, State, Acc), + whitespace(?STRING_UNBOUND_REST(_C, _) = Bytes, State, Acc) -> + {lists:reverse(Acc), Bytes, State}; + whitespace(Bytes, #xmerl_sax_parser_state{encoding=Enc} = State, Acc) when is_binary(Bytes) -> + case unicode:characters_to_list(Bytes, Enc) of + {incomplete, _, _} -> + cf(Bytes, State, Acc, fun whitespace/3); + {error, _Encoded, _Rest} -> + ?fatal_error(State, lists:flatten(io_lib:format("Bad character, not in ~p\n", [Enc]))) + end). + +-define(PARSE_EXTERNAL_ENTITY_BYTE_ORDER_MARK(Bytes, State), + parse_external_entity_byte_order_mark(?STRING_EMPTY, State) -> + cf(?STRING_EMPTY, State, fun parse_external_entity_byte_order_mark/2); + parse_external_entity_byte_order_mark(?BYTE_ORDER_MARK_1, State) -> + cf(?BYTE_ORDER_MARK_1, State, fun parse_external_entity_byte_order_mark/2); + parse_external_entity_byte_order_mark(?BYTE_ORDER_MARK_REST(Rest), State) -> + parse_external_entity_1(Rest, State); + parse_external_entity_byte_order_mark(Bytes, State) -> + parse_external_entity_1(Bytes, State)). diff --git a/lib/xmerl/src/xmerl_sax_parser_utf8.erlsrc b/lib/xmerl/src/xmerl_sax_parser_utf8.erlsrc index d46d60d237..f41d06d013 100644 --- a/lib/xmerl/src/xmerl_sax_parser_utf8.erlsrc +++ b/lib/xmerl/src/xmerl_sax_parser_utf8.erlsrc @@ -2,7 +2,7 @@ %%-------------------------------------------------------------------- %% %CopyrightBegin% %% -%% Copyright Ericsson AB 2008-2016. All Rights Reserved. +%% Copyright Ericsson AB 2008-2017. All Rights Reserved. %% %% Licensed under the Apache License, Version 2.0 (the "License"); %% you may not use this file except in compliance with the License. @@ -34,11 +34,55 @@ -define(APPEND_STRING(Rest, New), <<Rest/binary, New/binary>>). -define(TO_INPUT_FORMAT(Val), unicode:characters_to_binary(Val, unicode, utf8)). - -%% STRING_REST and STRING_UNBOUND_REST is only different in the list case -define(STRING_UNBOUND_REST(MatchChar, Rest), <<MatchChar/utf8, Rest/binary>>). -define(BYTE_ORDER_MARK_1, <<16#EF>>). -define(BYTE_ORDER_MARK_2, <<16#EF, 16#BB>>). -define(BYTE_ORDER_MARK_REST(Rest), <<16#EF, 16#BB, 16#BF, Rest/binary>>). +-define(PARSE_BYTE_ORDER_MARK(Bytes, State), + parse_byte_order_mark(?STRING_EMPTY, State) -> + cf(?STRING_EMPTY, State, fun parse_byte_order_mark/2); + parse_byte_order_mark(?BYTE_ORDER_MARK_1, State) -> + cf(?BYTE_ORDER_MARK_1, State, fun parse_byte_order_mark/2); + parse_byte_order_mark(?BYTE_ORDER_MARK_2, State) -> + cf(?BYTE_ORDER_MARK_2, State, fun parse_byte_order_mark/2); + parse_byte_order_mark(?BYTE_ORDER_MARK_REST(Rest), State) -> + parse_xml_decl(Rest, State); + parse_byte_order_mark(Bytes, State) -> + parse_xml_decl(Bytes, State)). + +-define(PARSE_XML_DECL(Bytes, State), + parse_xml_decl(Bytes, #xmerl_sax_parser_state{encoding=Enc} = State) when is_binary(Bytes) -> + case unicode:characters_to_list(Bytes, Enc) of + {incomplete, _, _} -> + cf(Bytes, State, fun parse_xml_decl/2); + {error, _Encoded, _Rest} -> + ?fatal_error(State, lists:flatten(io_lib:format("Bad character, not in ~p\n", [Enc]))); + _ -> + parse_prolog(Bytes, State) + end; + parse_xml_decl(Bytes, State) -> + parse_prolog(Bytes, State)). + +-define(WHITESPACE(Bytes, State, Acc), + whitespace(?STRING_UNBOUND_REST(_C, _) = Bytes, State, Acc) -> + {lists:reverse(Acc), Bytes, State}; + whitespace(Bytes, #xmerl_sax_parser_state{encoding=Enc} = State, Acc) when is_binary(Bytes) -> + case unicode:characters_to_list(Bytes, Enc) of + {incomplete, _, _} -> + cf(Bytes, State, Acc, fun whitespace/3); + {error, _Encoded, _Rest} -> + ?fatal_error(State, lists:flatten(io_lib:format("Bad character, not in ~p\n", [Enc]))) + end). +-define(PARSE_EXTERNAL_ENTITY_BYTE_ORDER_MARK(Bytes, State), + parse_external_entity_byte_order_mark(?STRING_EMPTY, State) -> + cf(?STRING_EMPTY, State, fun parse_external_entity_byte_order_mark/2); + parse_external_entity_byte_order_mark(?BYTE_ORDER_MARK_1, State) -> + cf(?BYTE_ORDER_MARK_1, State, fun parse_external_entity_byte_order_mark/2); + parse_external_entity_byte_order_mark(?BYTE_ORDER_MARK_2, State) -> + cf(?BYTE_ORDER_MARK_2, State, fun parse_external_entity_byte_order_mark/2); + parse_external_entity_byte_order_mark(?BYTE_ORDER_MARK_REST(Rest), State) -> + parse_external_entity_1(Rest, State); + parse_external_entity_byte_order_mark(Bytes, State) -> + parse_external_entity_1(Bytes, State)). diff --git a/lib/xmerl/src/xmerl_sax_simple_dom.erl b/lib/xmerl/src/xmerl_sax_simple_dom.erl index 7eb3afd499..7b15cd92dc 100644 --- a/lib/xmerl/src/xmerl_sax_simple_dom.erl +++ b/lib/xmerl/src/xmerl_sax_simple_dom.erl @@ -2,7 +2,7 @@ %%-------------------------------------------------------------------- %% %CopyrightBegin% %% -%% Copyright Ericsson AB 2009-2016. All Rights Reserved. +%% Copyright Ericsson AB 2009-2017. All Rights Reserved. %% %% Licensed under the Apache License, Version 2.0 (the "License"); %% you may not use this file except in compliance with the License. @@ -129,8 +129,9 @@ build_dom(endDocument, State#xmerl_sax_simple_dom_state{dom=[Decl, {Tag, Attributes, lists:reverse(Content)}]}; _ -> - ?dbg("~p\n", [D]), - ?error("we're not at end the document when endDocument event is encountered.") + %% endDocument is also sent by the parser when a fault occur to tell + %% the event receiver that no more input will be sent + State end; %% Element diff --git a/lib/xmerl/src/xmerl_scan.erl b/lib/xmerl/src/xmerl_scan.erl index 2147a46a13..a1f6ad4e2c 100644 --- a/lib/xmerl/src/xmerl_scan.erl +++ b/lib/xmerl/src/xmerl_scan.erl @@ -1,7 +1,7 @@ %% %% %CopyrightBegin% %% -%% Copyright Ericsson AB 2003-2016. All Rights Reserved. +%% Copyright Ericsson AB 2003-2017. All Rights Reserved. %% %% Licensed under the Apache License, Version 2.0 (the "License"); %% you may not use this file except in compliance with the License. @@ -111,13 +111,16 @@ %% <dd>Set to 'true' if xmerl should add to elements missing attributes %% with a defined default value (default 'false').</dd> %% </dl> +%% @type xmlElement() = #xmlElement{}. +%% The record definition is found in xmerl.hrl. +%% @type xmlDocument() = #xmlDocument{}. +%% The record definition is found in xmerl.hrl. %% @type document() = xmlElement() | xmlDocument(). <p> %% The document returned by <tt>xmerl_scan:string/[1,2]</tt> and %% <tt>xmerl_scan:file/[1,2]</tt>. The type of the returned record depends on %% the value of the document option passed to the function. %% </p> - -module(xmerl_scan). -vsn('0.20'). -date('03-09-16'). @@ -471,8 +474,8 @@ event(_X, S) -> %% into multiple objects (in which case {Acc',Pos',S'} should be returned.) %% If {Acc',S'} is returned, Pos will be incremented by 1 by default. %% Below is an example of an acceptable operation -acc(X = #xmlText{value = Text}, Acc, S) -> - {[X#xmlText{value = Text}|Acc], S}; +acc(#xmlText{value = Text}, [X = #xmlText{value = AccText}], S) -> + {[X#xmlText{value = AccText ++ Text}], S}; acc(X, Acc, S) -> {[X|Acc], S}. @@ -2222,16 +2225,18 @@ processed_whole_element(S=#xmerl_scanner{hook_fun = _Hook, AllAttrs = case S#xmerl_scanner.default_attrs of true -> - [ #xmlAttribute{name = AttName, - parents = [{Name, Pos} | Parents], - language = Lang, - nsinfo = NSI, - namespace = Namespace, - value = AttValue, - normalized = true} || - {AttName, AttValue} <- get_default_attrs(S, Name), - AttValue =/= no_value, - not lists:keymember(AttName, #xmlAttribute.name, Attrs) ]; + DefaultAttrs = + [ #xmlAttribute{name = AttName, + parents = [{Name, Pos} | Parents], + language = Lang, + nsinfo = NSI, + namespace = Namespace, + value = AttValue, + normalized = true} || + {AttName, AttValue} <- get_default_attrs(S, Name), + AttValue =/= no_value, + not lists:keymember(AttName, #xmlAttribute.name, Attrs) ], + lists:append(Attrs, DefaultAttrs); false -> Attrs end, @@ -2304,7 +2309,9 @@ expanded_name(Name, [], #xmlNamespace{default = URI}, S) -> expanded_name(Name, N = {"xmlns", Local}, #xmlNamespace{nodes = Ns}, S) -> {_, Value} = lists:keyfind(Local, 1, Ns), case Name of - 'xmlns:xml' when Value =/= 'http://www.w3.org/XML/1998/namespace' -> + 'xmlns:xml' when Value =:= 'http://www.w3.org/XML/1998/namespace' -> + N; + 'xmlns:xml' when Value =/= 'http://www.w3.org/XML/1998/namespace' -> ?fatal({xml_prefix_cannot_be_redeclared, Value}, S); 'xmlns:xmlns' -> ?fatal({xmlns_prefix_cannot_be_declared, Value}, S); @@ -2318,6 +2325,8 @@ expanded_name(Name, N = {"xmlns", Local}, #xmlNamespace{nodes = Ns}, S) -> N end end; +expanded_name(_Name, {"xml", Local}, _NS, _S) -> + {'http://www.w3.org/XML/1998/namespace', list_to_atom(Local)}; expanded_name(_Name, {Prefix, Local}, #xmlNamespace{nodes = Ns}, S) -> case lists:keysearch(Prefix, 1, Ns) of {value, {_, URI}} -> @@ -2328,9 +2337,6 @@ expanded_name(_Name, {Prefix, Local}, #xmlNamespace{nodes = Ns}, S) -> ?fatal({namespace_prefix_not_declared, Prefix}, S) end. - - - keyreplaceadd(K, Pos, [H|T], Obj) when K == element(Pos, H) -> [Obj|T]; keyreplaceadd(K, Pos, [H|T], Obj) -> diff --git a/lib/xmerl/src/xmerl_xpath.erl b/lib/xmerl/src/xmerl_xpath.erl index bbebda1030..6146feba49 100644 --- a/lib/xmerl/src/xmerl_xpath.erl +++ b/lib/xmerl/src/xmerl_xpath.erl @@ -43,13 +43,27 @@ %% </pre> %% %% @type nodeEntity() = -%% xmlElement() -%% | xmlAttribute() -%% | xmlText() -%% | xmlPI() -%% | xmlComment() -%% | xmlNsNode() -%% | xmlDocument() +%% #xmlElement{} +%% | #xmlAttribute{} +%% | #xmlText{} +%% | #xmlPI{} +%% | #xmlComment{} +%% | #xmlNsNode{} +%% | #xmlDocument{} +%% +%% @type docNodes() = #xmlElement{} +%% | #xmlAttribute{} +%% | #xmlText{} +%% | #xmlPI{} +%% | #xmlComment{} +%% | #xmlNsNode{} +%% +%% @type docEntity() = #xmlDocument{} | [docNodes()] +%% +%% @type xPathString() = string() +%% +%% @type parentList() = [{atom(), integer()}] +%% %% @type option_list(). <p>Options allows to customize the behaviour of the %% XPath scanner. %% </p> @@ -115,7 +129,7 @@ string(Str, Doc, Options) -> %% Parents = parentList() %% Doc = nodeEntity() %% Options = option_list() -%% Scalar = xmlObj +%% Scalar = #xmlObj{} %% @doc Extracts the nodes from the parsed XML tree according to XPath. %% xmlObj is a record with fields type and value, %% where type is boolean | number | string diff --git a/lib/xmerl/src/xmerl_xs.erl b/lib/xmerl/src/xmerl_xs.erl index 3e9f6622b8..1ce76cfa41 100644 --- a/lib/xmerl/src/xmerl_xs.erl +++ b/lib/xmerl/src/xmerl_xs.erl @@ -45,7 +45,6 @@ % XSLT package which is written i C++. % See also the <a href="xmerl_xs_examples.html">Tutorial</a>. % </p> - -module(xmerl_xs). -export([xslapply/2, value_of/1, select/2, built_in_rules/2 ]). @@ -71,15 +70,13 @@ %% xslapply(fun template/1, E), %% "</h1>"]; %% </pre> - xslapply(Fun, EList) when is_list(EList) -> - lists:map( Fun, EList); + lists:map(Fun, EList); xslapply(Fun, E = #xmlElement{})-> lists:map( Fun, E#xmlElement.content). - %% @spec value_of(E) -> List -%% E = unknown() +%% E = term() %% %% @doc Concatenates all text nodes within the tree. %% diff --git a/lib/xmerl/src/xmerl_xsd.erl b/lib/xmerl/src/xmerl_xsd.erl index 4b5efae8dd..a89b3159ec 100644 --- a/lib/xmerl/src/xmerl_xsd.erl +++ b/lib/xmerl/src/xmerl_xsd.erl @@ -49,6 +49,7 @@ %% <dd>It is possible by this option to provide a state with process %% information from an earlier validation.</dd> %% </dl> +%% @type filename() = string() %% @end %%%------------------------------------------------------------------- -module(xmerl_xsd). @@ -138,7 +139,7 @@ state2file(S=#xsd_state{schema_name=SN}) -> %% @spec state2file(State,FileName) -> ok | {error,Reason} %% State = global_state() -%% FileName = filename() +%% FileName = string() %% @doc Saves the schema state with all information of the processed %% schema in a file. You can provide the file name for the saved %% state. FileName is saved with the <code>.xss</code> extension @@ -153,7 +154,7 @@ state2file(S,FileName) when is_record(S,xsd_state) -> %% @spec file2state(FileName) -> {ok,State} | {error,Reason} %% State = global_state() -%% FileName = filename() +%% FileName = string() %% @doc Reads the schema state with all information of the processed %% schema from a file created with <code>state2file/[1,2]</code>. The %% format of this file is internal. The state can then be used @@ -202,7 +203,7 @@ xmerl_xsd_vsn_check(S=#xsd_state{vsn=MD5_VSN}) -> process_validate(Schema,Xml) -> process_validate(Schema,Xml,[]). %% @spec process_validate(Schema,Element,Options) -> Result -%% Schema = filename() +%% Schema = string() %% Element = XmlElement %% Options = option_list() %% Result = {ValidXmlElement,State} | {error,Reason} @@ -282,7 +283,7 @@ validate3(_,_,S) -> process_schema(Schema) -> process_schema(Schema,[]). %% @spec process_schema(Schema,Options) -> Result -%% Schema = filename() +%% Schema = string() %% Result = {ok,State} | {error,Reason} %% State = global_state() %% Reason = [ErrorReason] | ErrorReason @@ -324,7 +325,7 @@ process_schema2({SE,_},State,_Schema) -> process_schemas(Schemas) -> process_schemas(Schemas,[]). %% @spec process_schemas(Schemas,Options) -> Result -%% Schemas = [{NameSpace,filename()}|Schemas] | [] +%% Schemas = [{NameSpace,string()}|Schemas] | [] %% Result = {ok,State} | {error,Reason} %% Reason = [ErrorReason] | ErrorReason %% Options = option_list() @@ -5426,7 +5427,7 @@ add_key_once(Key,N,El,L) -> %% {filename:join([[io_lib:format("/~w(~w)",[X,Y])||{X,Y}<-Parents],Type]),Pos}. %% @spec format_error(Errors) -> Result -%% Errors = error_tuple() | [error_tuple()] +%% Errors = tuple() | [tuple()] %% Result = string() | [string()] %% @doc Formats error descriptions to human readable strings. format_error(L) when is_list(L) -> diff --git a/lib/xmerl/test/Makefile b/lib/xmerl/test/Makefile index 7a326e334f..3204f081ba 100644 --- a/lib/xmerl/test/Makefile +++ b/lib/xmerl/test/Makefile @@ -1,7 +1,7 @@ # # %CopyrightBegin% # -# Copyright Ericsson AB 2004-2016. All Rights Reserved. +# Copyright Ericsson AB 2004-2017. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. @@ -55,7 +55,8 @@ SUITE_FILES= \ xmerl_xsd_SUITE.erl \ xmerl_xsd_MS2002-01-16_SUITE.erl \ xmerl_xsd_NIST2002-01-16_SUITE.erl \ - xmerl_xsd_Sun2002-01-16_SUITE.erl + xmerl_xsd_Sun2002-01-16_SUITE.erl \ + xmerl_sax_stream_SUITE.erl XML_FILES= \ testcases.dtd \ @@ -125,4 +126,6 @@ release_tests_spec: opt @tar cfh - xmerl_xsd_MS2002-01-16_SUITE_data | (cd "$(RELSYSDIR)"; tar xf -) @tar cfh - xmerl_xsd_NIST2002-01-16_SUITE_data | (cd "$(RELSYSDIR)"; tar xf -) @tar cfh - xmerl_xsd_Sun2002-01-16_SUITE_data | (cd "$(RELSYSDIR)"; tar xf -) + @tar cfh - xmerl_sax_SUITE_data | (cd "$(RELSYSDIR)"; tar xf -) + @tar cfh - xmerl_sax_stream_SUITE_data | (cd "$(RELSYSDIR)"; tar xf -) chmod -R u+w "$(RELSYSDIR)" diff --git a/lib/xmerl/test/xmerl_SUITE.erl b/lib/xmerl/test/xmerl_SUITE.erl index e97b8c6a4b..66f04f6c25 100644 --- a/lib/xmerl/test/xmerl_SUITE.erl +++ b/lib/xmerl/test/xmerl_SUITE.erl @@ -1,7 +1,7 @@ %% %% %CopyrightBegin% %% -%% Copyright Ericsson AB 2008-2016. All Rights Reserved. +%% Copyright Ericsson AB 2008-2017. All Rights Reserved. %% %% Licensed under the Apache License, Version 2.0 (the "License"); %% you may not use this file except in compliance with the License. @@ -54,7 +54,8 @@ groups() -> cpd_expl_provided_DTD]}, {misc, [], [latin1_alias, syntax_bug1, syntax_bug2, syntax_bug3, - pe_ref1, copyright, testXSEIF, export_simple1, export]}, + pe_ref1, copyright, testXSEIF, export_simple1, export, + default_attrs_bug, xml_ns, scan_splits_string_bug]}, {eventp_tests, [], [sax_parse_and_export]}, {ticket_tests, [], [ticket_5998, ticket_7211, ticket_7214, ticket_7430, @@ -223,6 +224,54 @@ syntax_bug3(Config) -> Err -> Err end. +default_attrs_bug(Config) -> + file:set_cwd(datadir(Config)), + Doc = "<!DOCTYPE doc [<!ATTLIST doc b CDATA \"default\">]>\n" + "<doc a=\"explicit\"/>", + {#xmlElement{attributes = [#xmlAttribute{name = a, value = "explicit"}, + #xmlAttribute{name = b, value = "default"}]}, + [] + } = xmerl_scan:string(Doc, [{default_attrs, true}]), + Doc2 = "<!DOCTYPE doc [<!ATTLIST doc b CDATA \"default\">]>\n" + "<doc b=\"also explicit\" a=\"explicit\"/>", + {#xmlElement{attributes = [#xmlAttribute{name = b, value = "also explicit"}, + #xmlAttribute{name = a, value = "explicit"}]}, + [] + } = xmerl_scan:string(Doc2, [{default_attrs, true}]), + ok. + + +xml_ns(Config) -> + Doc = "<?xml version='1.0'?>\n" + "<doc xml:attr1=\"implicit xml ns\"/>", + {#xmlElement{namespace=#xmlNamespace{default = [], nodes = []}, + attributes = [#xmlAttribute{name = 'xml:attr1', + expanded_name = {'http://www.w3.org/XML/1998/namespace',attr1}, + nsinfo = {"xml","attr1"}, + namespace = #xmlNamespace{default = [], nodes = []}}]}, + [] + } = xmerl_scan:string(Doc, [{namespace_conformant, true}]), + Doc2 = "<?xml version='1.0'?>\n" + "<doc xmlns:xml=\"http://www.w3.org/XML/1998/namespace\" xml:attr1=\"explicit xml ns\"/>", + {#xmlElement{namespace=#xmlNamespace{default = [], nodes = [{"xml",'http://www.w3.org/XML/1998/namespace'}]}, + attributes = [#xmlAttribute{name = 'xmlns:xml', + expanded_name = {"xmlns","xml"}, + nsinfo = {"xmlns","xml"}, + namespace = #xmlNamespace{default = [], + nodes = [{"xml",'http://www.w3.org/XML/1998/namespace'}]}}, + #xmlAttribute{name = 'xml:attr1', + expanded_name = {'http://www.w3.org/XML/1998/namespace',attr1}, + nsinfo = {"xml","attr1"}, + namespace = #xmlNamespace{default = [], + nodes = [{"xml",'http://www.w3.org/XML/1998/namespace'}]}}]}, + [] + } = xmerl_scan:string(Doc2, [{namespace_conformant, true}]), + ok. + +scan_splits_string_bug(_Config) -> + {#xmlElement{ content = [#xmlText{ value = "Jimmy Zöger" }] }, []} + = xmerl_scan:string("<name>Jimmy Zöger</name>"). + pe_ref1(Config) -> file:set_cwd(datadir(Config)), {#xmlElement{},[]} = xmerl_scan:file(datadir_join(Config,[misc,"PE_ref1.xml"]),[{validation,true}]). @@ -488,8 +537,7 @@ ticket_7430(Config) -> {xmlElement,a,a,[], {xmlNamespace,[],[]}, [],1,[], - [{xmlText,[{a,1}],1,[],"é",text}, - {xmlText,[{a,1}],2,[],"\né",text}], + [{xmlText,[{a,1}],1,[],"é\né",text}], [],_,undeclared} -> ok; _ -> diff --git a/lib/xmerl/test/xmerl_sax_SUITE.erl b/lib/xmerl/test/xmerl_sax_SUITE.erl index f5c0a783c4..68b9bcc4a2 100644 --- a/lib/xmerl/test/xmerl_sax_SUITE.erl +++ b/lib/xmerl/test/xmerl_sax_SUITE.erl @@ -2,7 +2,7 @@ %%---------------------------------------------------------------------- %% %CopyrightBegin% %% -%% Copyright Ericsson AB 2010-2016. All Rights Reserved. +%% Copyright Ericsson AB 2010-2017. All Rights Reserved. %% %% Licensed under the Apache License, Version 2.0 (the "License"); %% you may not use this file except in compliance with the License. @@ -33,7 +33,6 @@ %%====================================================================== %% External functions %%====================================================================== - %%---------------------------------------------------------------------- %% Initializations %%---------------------------------------------------------------------- @@ -42,12 +41,15 @@ all() -> [{group, bugs}]. groups() -> - [{bugs, [], [ticket_8213, ticket_8214, ticket_11551]}]. + [{bugs, [], [ticket_8213, ticket_8214, ticket_11551, + fragmented_xml_directive, + old_dom_event_fun_endDocument_bug, + event_fun_endDocument_error_test, + event_fun_startDocument_error_test]}]. -%%---------------------------------------------------------------------- +%%====================================================================== %% Tests -%%---------------------------------------------------------------------- - +%%====================================================================== %%---------------------------------------------------------------------- %% Test Case %% ID: ticket_8213 @@ -56,7 +58,6 @@ ticket_8213(_Config) -> {ok,ok,[]} = xmerl_sax_parser:stream("<elem/>", [{event_fun, fun (_E,_,_) -> ok end}]), ok. - %%---------------------------------------------------------------------- %% Test Case %% ID: ticket_8214 @@ -85,17 +86,80 @@ ticket_11551(_Config) -> <a>hej</a> <?xml version=\"1.0\" encoding=\"utf-8\" ?> <a>hej</a>">>, - {ok, undefined, <<"<?xml", _/binary>>} = xmerl_sax_parser:stream(Stream1, []), + {ok, undefined, <<"\n<?xml", _/binary>>} = xmerl_sax_parser:stream(Stream1, []), Stream2= <<"<?xml version=\"1.0\" encoding=\"utf-8\" ?> <a>hej</a> <?xml version=\"1.0\" encoding=\"utf-8\" ?> <a>hej</a>">>, - {ok, undefined, <<"<?xml", _/binary>>} = xmerl_sax_parser:stream(Stream2, []), + {ok, undefined, <<"\n\n\n<?xml", _/binary>>} = xmerl_sax_parser:stream(Stream2, []), Stream3= <<"<a>hej</a> <?xml version=\"1.0\" encoding=\"utf-8\" ?> <a>hej</a>">>, - {ok, undefined, <<"<?xml", _/binary>>} = xmerl_sax_parser:stream(Stream3, []), + {ok, undefined, <<"\n\n<?xml", _/binary>>} = xmerl_sax_parser:stream(Stream3, []), + ok. + +%%---------------------------------------------------------------------- +%% Test Case +%% ID: fragmented_xml_directive +%% Test of fragmented xml directive by reading one byte per continuation ca +fragmented_xml_directive(Config) -> + DataDir = proplists:get_value(data_dir, Config), + Name = filename:join(DataDir, "test_data_1.xml"), + {ok, Fd} = file:open(Name, [raw, read,binary]), + Cf = fun cf_fragmented_xml_directive/1, + {ok, undefined, _} = xmerl_sax_parser:stream(<<>>, + [{continuation_fun, Cf}, + {continuation_state, Fd}]), + ok. + +%%---------------------------------------------------------------------- +%% Test Case +%% ID: old_dom_event_fun_endDocument_bug +%% The old_dom backend previous generateded an uncatched exception +%% instead of the correct fatal_error from the parser. +old_dom_event_fun_endDocument_bug(_Config) -> + %% Stream contains bad characters, + {fatal_error, _, _, _, _} = + xmerl_sax_parser:stream([60,63,120,109,108,32,118,101,114,115,105,111,110,61,39,49,46,48,39,32,101,110,99,111,100,105,110,103,61,39,117,116,102,45,56,39,63,62,60, + 99,111,109,109,97,110,100,62,60,104,101,97,100,101,114,62,60,116,114,97,110,115,97,99,116,105,111,110,73,100,62,49,60,47,116,114,97,110, + 115,97,99,116,105,111,110,73,100,62,60,47,104,101,97,100,101,114,62,60,98,111,100,121,62,95,226,130,172,59,60,60,47,98,111,100,121,62,60, + 47,99,111,109,109,97,110,100,62,60,47,120,49,95,49,62], + [{event_fun,fun xmerl_sax_old_dom:event/3}, + {event_state,xmerl_sax_old_dom:initial_state()}]), ok. + +%%---------------------------------------------------------------------- +%% Test Case +%% ID: event_fun_endDocument_error_test +event_fun_endDocument_error_test(_Config) -> + Stream = <<"<?xml version=\"1.0\" encoding=\"utf-8\"?><a>hej</a>">>, + Ef = fun(endDocument, _ , _) -> throw({event_error, "endDocument error"}); + (_, _, S) -> S + end, + {event_error, _, _, _, _} = xmerl_sax_parser:stream(Stream, [{event_fun, Ef}]), + ok. + +%%---------------------------------------------------------------------- +%% Test Case +%% ID: event_fun_startDocument_error_test +event_fun_startDocument_error_test(_Config) -> + Stream = <<"<?xml version=\"1.0\" encoding=\"utf-8\"?><a>hej</a>">>, + Ef = fun(startDocument, _ , _) -> throw({event_error, "endDocument error"}); + (_, _, S) -> S + end, + {event_error, _, _, _, _} = xmerl_sax_parser:stream(Stream, [{event_fun, Ef}]), + ok. + +%%====================================================================== +%% Internal functions +%%====================================================================== +cf_fragmented_xml_directive(IoDevice) -> + case file:read(IoDevice, 1) of + eof -> + {<<>>, IoDevice}; + {ok, FileBin} -> + {FileBin, IoDevice} + end. diff --git a/lib/xmerl/test/xmerl_sax_SUITE_data/test_data_1.xml b/lib/xmerl/test/xmerl_sax_SUITE_data/test_data_1.xml new file mode 100644 index 0000000000..efbaee6b81 --- /dev/null +++ b/lib/xmerl/test/xmerl_sax_SUITE_data/test_data_1.xml @@ -0,0 +1,4 @@ +<?xml version="1.0" encoding="utf-8"?> +<a> +Hej +</a> diff --git a/lib/xmerl/test/xmerl_sax_std_SUITE.erl b/lib/xmerl/test/xmerl_sax_std_SUITE.erl index 525a3b175a..b8412206cc 100644 --- a/lib/xmerl/test/xmerl_sax_std_SUITE.erl +++ b/lib/xmerl/test/xmerl_sax_std_SUITE.erl @@ -2,7 +2,7 @@ %%---------------------------------------------------------------------- %% %CopyrightBegin% %% -%% Copyright Ericsson AB 2010-2016. All Rights Reserved. +%% Copyright Ericsson AB 2010-2017. All Rights Reserved. %% %% Licensed under the Apache License, Version 2.0 (the "License"); %% you may not use this file except in compliance with the License. @@ -507,11 +507,8 @@ end_per_testcase(_Func,_Config) -> 'not-wf-sa-036'(Config) -> file:set_cwd(xmerl_test_lib:get_data_dir(Config)), Path = filename:join([xmerl_test_lib:get_data_dir(Config),"xmltest","not-wf/sa/036.xml"]), - %% Special case becase we returns everything after a legal document - %% as an rest instead of giving and error to let the user handle - %% multipple docs on a stream. - {ok,_,<<"Illegal data\r\n">>} = xmerl_sax_parser:file(Path, [{event_fun, fun(_,_,S) -> S end}]). - %%check_result(R, "not-wf"). + R = xmerl_sax_parser:file(Path, [{event_fun, fun(_,_,S) -> S end}]), + check_result(R, "not-wf"). %%---------------------------------------------------------------------- %% Test Case @@ -522,11 +519,8 @@ end_per_testcase(_Func,_Config) -> 'not-wf-sa-037'(Config) -> file:set_cwd(xmerl_test_lib:get_data_dir(Config)), Path = filename:join([xmerl_test_lib:get_data_dir(Config),"xmltest","not-wf/sa/037.xml"]), - %% Special case becase we returns everything after a legal document - %% as an rest instead of giving and error to let the user handle - %% multipple docs on a stream. - {ok,_,<<" \r\n">>} = xmerl_sax_parser:file(Path, [{event_fun, fun(_,_,S) -> S end}]). - %%check_result(R, "not-wf"). + R = xmerl_sax_parser:file(Path, [{event_fun, fun(_,_,S) -> S end}]), + check_result(R, "not-wf"). %%---------------------------------------------------------------------- %% Test Case @@ -561,11 +555,8 @@ end_per_testcase(_Func,_Config) -> 'not-wf-sa-040'(Config) -> file:set_cwd(xmerl_test_lib:get_data_dir(Config)), Path = filename:join([xmerl_test_lib:get_data_dir(Config),"xmltest","not-wf/sa/040.xml"]), - %% Special case becase we returns everything after a legal document - %% as an rest instead of giving and error to let the user handle - %% multipple docs on a stream. - {ok,_,<<"<doc></doc>\r\n">>} = xmerl_sax_parser:file(Path, [{event_fun, fun(_,_,S) -> S end}]). - %%check_result(R, "not-wf"). + R = xmerl_sax_parser:file(Path, [{event_fun, fun(_,_,S) -> S end}]), + check_result(R, "not-wf"). %%---------------------------------------------------------------------- %% Test Case @@ -576,11 +567,8 @@ end_per_testcase(_Func,_Config) -> 'not-wf-sa-041'(Config) -> file:set_cwd(xmerl_test_lib:get_data_dir(Config)), Path = filename:join([xmerl_test_lib:get_data_dir(Config),"xmltest","not-wf/sa/041.xml"]), - %% Special case becase we returns everything after a legal document - %% as an rest instead of giving and error to let the user handle - %% multipple docs on a stream. - {ok,_,<<"<doc></doc>\r\n">>} = xmerl_sax_parser:file(Path, [{event_fun, fun(_,_,S) -> S end}]). - %%check_result(R, "not-wf"). + R = xmerl_sax_parser:file(Path, [{event_fun, fun(_,_,S) -> S end}]), + check_result(R, "not-wf"). %%---------------------------------------------------------------------- %% Test Case @@ -603,11 +591,8 @@ end_per_testcase(_Func,_Config) -> 'not-wf-sa-043'(Config) -> file:set_cwd(xmerl_test_lib:get_data_dir(Config)), Path = filename:join([xmerl_test_lib:get_data_dir(Config),"xmltest","not-wf/sa/043.xml"]), - %% Special case becase we returns everything after a legal document - %% as an rest instead of giving and error to let the user handle - %% multipple docs on a stream. - {ok,_,<<"Illegal data\r\n">>} = xmerl_sax_parser:file(Path, [{event_fun, fun(_,_,S) -> S end}]). - %%check_result(R, "not-wf"). + R = xmerl_sax_parser:file(Path, [{event_fun, fun(_,_,S) -> S end}]), + check_result(R, "not-wf"). %%---------------------------------------------------------------------- %% Test Case @@ -618,11 +603,8 @@ end_per_testcase(_Func,_Config) -> 'not-wf-sa-044'(Config) -> file:set_cwd(xmerl_test_lib:get_data_dir(Config)), Path = filename:join([xmerl_test_lib:get_data_dir(Config),"xmltest","not-wf/sa/044.xml"]), - %% Special case becase we returns everything after a legal document - %% as an rest instead of giving and error to let the user handle - %% multipple docs on a stream. - {ok,_,<<"<doc/>\r\n">>} = xmerl_sax_parser:file(Path, [{event_fun, fun(_,_,S) -> S end}]). - %%check_result(R, "not-wf"). + R = xmerl_sax_parser:file(Path, [{event_fun, fun(_,_,S) -> S end}]), + check_result(R, "not-wf"). %%---------------------------------------------------------------------- %% Test Case @@ -669,11 +651,8 @@ end_per_testcase(_Func,_Config) -> 'not-wf-sa-048'(Config) -> file:set_cwd(xmerl_test_lib:get_data_dir(Config)), Path = filename:join([xmerl_test_lib:get_data_dir(Config),"xmltest","not-wf/sa/048.xml"]), - %% Special case becase we returns everything after a legal document - %% as an rest instead of giving and error to let the user handle - %% multipple docs on a stream. - {ok,_,<<"<![CDATA[]]>\r\n">>} = xmerl_sax_parser:file(Path, [{event_fun, fun(_,_,S) -> S end}]). - %%check_result(R, "not-wf"). + R = xmerl_sax_parser:file(Path, [{event_fun, fun(_,_,S) -> S end}]), + check_result(R, "not-wf"). %%---------------------------------------------------------------------- %% Test Case @@ -1416,11 +1395,8 @@ end_per_testcase(_Func,_Config) -> 'not-wf-sa-110'(Config) -> file:set_cwd(xmerl_test_lib:get_data_dir(Config)), Path = filename:join([xmerl_test_lib:get_data_dir(Config),"xmltest","not-wf/sa/110.xml"]), - %% Special case becase we returns everything after a legal document - %% as an rest instead of giving and error to let the user handle - %% multipple docs on a stream. - {ok,_,<<"&e;\r\n">>} = xmerl_sax_parser:file(Path, [{event_fun, fun(_,_,S) -> S end}]). - %%check_result(R, "not-wf"). + R = xmerl_sax_parser:file(Path, [{event_fun, fun(_,_,S) -> S end}]), + check_result(R, "not-wf"). %%---------------------------------------------------------------------- %% Test Case @@ -1914,9 +1890,9 @@ end_per_testcase(_Func,_Config) -> %% Special case becase we returns everything after a legal document %% as an rest instead of giving and error to let the user handle %% multipple docs on a stream. - {ok,_,<<"<?xml version=\"1.0\"?>\r\n">>} = xmerl_sax_parser:file(Path, [{event_fun, fun(_,_,S) -> S end}]). - % R = xmerl_sax_parser:file(Path, [{event_fun, fun(_,_,S) -> S end}]), - % check_result(R, "not-wf"). + %{ok,_,<<"<?xml version=\"1.0\"?>\r\n">>} = xmerl_sax_parser:file(Path, [{event_fun, fun(_,_,S) -> S end}]). + R = xmerl_sax_parser:file(Path, [{event_fun, fun(_,_,S) -> S end}]), + check_result(R, "not-wf"). %%---------------------------------------------------------------------- %% Test Case @@ -7784,11 +7760,8 @@ end_per_testcase(_Func,_Config) -> 'o-p01fail3'(Config) -> file:set_cwd(xmerl_test_lib:get_data_dir(Config)), Path = filename:join([xmerl_test_lib:get_data_dir(Config),"oasis","p01fail3.xml"]), - %% Special case becase we returns everything after a legal document - %% as an rest instead of giving and error to let the user handle - %% multipple docs on a stream. - {ok,_, <<"<bad/>", _/binary>>} = xmerl_sax_parser:file(Path, [{event_fun, fun(_,_,S) -> S end}]). - %%check_result(R, "not-wf"). + R = xmerl_sax_parser:file(Path, [{event_fun, fun(_,_,S) -> S end}]), + check_result(R, "not-wf"). %%---------------------------------------------------------------------- %% Test Case @@ -11417,12 +11390,8 @@ end_per_testcase(_Func,_Config) -> 'ibm-not-wf-P01-ibm01n02'(Config) -> file:set_cwd(xmerl_test_lib:get_data_dir(Config)), Path = filename:join([xmerl_test_lib:get_data_dir(Config),"ibm","not-wf/P01/ibm01n02.xml"]), - %% Special case becase we returns everything after a legal document - %% as an rest instead of giving and error to let the user handle - %% multipple docs on a stream. - {ok,_, <<"<?xml version=\"1.0\"?>", _/binary>>} = xmerl_sax_parser:file(Path, [{event_fun, fun(_,_,S) -> S end}]). - % R = xmerl_sax_parser:file(Path, [{event_fun, fun(_,_,S) -> S end}]), - % check_result(R, "not-wf"). + R = xmerl_sax_parser:file(Path, [{event_fun, fun(_,_,S) -> S end}]), + check_result(R, "not-wf"). %%---------------------------------------------------------------------- %% Test Case @@ -11433,11 +11402,8 @@ end_per_testcase(_Func,_Config) -> 'ibm-not-wf-P01-ibm01n03'(Config) -> file:set_cwd(xmerl_test_lib:get_data_dir(Config)), Path = filename:join([xmerl_test_lib:get_data_dir(Config),"ibm","not-wf/P01/ibm01n03.xml"]), - %% Special case becase we returns everything after a legal document - %% as an rest instead of giving and error to let the user handle - %% multipple docs on a stream. - {ok,_, <<"<title>Wrong combination!</title>", _/binary>>} = xmerl_sax_parser:file(Path, [{event_fun, fun(_,_,S) -> S end}]). - %%check_result(R, "not-wf"). + R = xmerl_sax_parser:file(Path, [{event_fun, fun(_,_,S) -> S end}]), + check_result(R, "not-wf"). %%---------------------------------------------------------------------- %% Test Cases @@ -13027,11 +12993,8 @@ end_per_testcase(_Func,_Config) -> 'ibm-not-wf-P27-ibm27n01'(Config) -> file:set_cwd(xmerl_test_lib:get_data_dir(Config)), Path = filename:join([xmerl_test_lib:get_data_dir(Config),"ibm","not-wf/P27/ibm27n01.xml"]), - %% Special case becase we returns everything after a legal document - %% as an rest instead of giving and error to let the user handle - %% multipple docs on a stream. - {ok,_, <<"<!ELEMENT cat EMPTY>">>} = xmerl_sax_parser:file(Path, [{event_fun, fun(_,_,S) -> S end}]). - %%check_result(R, "not-wf"). + R = xmerl_sax_parser:file(Path, [{event_fun, fun(_,_,S) -> S end}]), + check_result(R, "not-wf"). %%---------------------------------------------------------------------- %% Test Cases @@ -13461,11 +13424,8 @@ end_per_testcase(_Func,_Config) -> 'ibm-not-wf-P39-ibm39n06'(Config) -> file:set_cwd(xmerl_test_lib:get_data_dir(Config)), Path = filename:join([xmerl_test_lib:get_data_dir(Config),"ibm","not-wf/P39/ibm39n06.xml"]), - %% Special case becase we returns everything after a legal document - %% as an rest instead of giving and error to let the user handle - %% multipple docs on a stream. - {ok,_,<<"content after end tag\r\n">>} = xmerl_sax_parser:file(Path, [{event_fun, fun(_,_,S) -> S end}]). - %%check_result(R, "not-wf"). + R = xmerl_sax_parser:file(Path, [{event_fun, fun(_,_,S) -> S end}]), + check_result(R, "not-wf"). %%---------------------------------------------------------------------- %% Test Cases diff --git a/lib/xmerl/test/xmerl_sax_stream_SUITE.erl b/lib/xmerl/test/xmerl_sax_stream_SUITE.erl new file mode 100644 index 0000000000..7315f67374 --- /dev/null +++ b/lib/xmerl/test/xmerl_sax_stream_SUITE.erl @@ -0,0 +1,260 @@ +%%-*-erlang-*- +%%---------------------------------------------------------------------- +%% %CopyrightBegin% +%% +%% Copyright Ericsson AB 2017. All Rights Reserved. +%% +%% Licensed under the Apache License, Version 2.0 (the "License"); +%% you may not use this file except in compliance with the License. +%% You may obtain a copy of the License at +%% +%% http://www.apache.org/licenses/LICENSE-2.0 +%% +%% Unless required by applicable law or agreed to in writing, software +%% distributed under the License is distributed on an "AS IS" BASIS, +%% WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +%% See the License for the specific language governing permissions and +%% limitations under the License. +%% +%% %CopyrightEnd% +%%---------------------------------------------------------------------- +%% File : xmerl_sax_stream_SUITE.erl +%%---------------------------------------------------------------------- +-module(xmerl_sax_stream_SUITE). +-compile(export_all). + +%%---------------------------------------------------------------------- +%% Include files +%%---------------------------------------------------------------------- +-include_lib("common_test/include/ct.hrl"). +-include_lib("kernel/include/file.hrl"). + +%%====================================================================== +%% External functions +%%====================================================================== + +%%---------------------------------------------------------------------- +%% Initializations +%%---------------------------------------------------------------------- +all() -> + [ + one_document, + two_documents, + one_document_and_junk, + end_of_stream + ]. + +%%---------------------------------------------------------------------- +%% Initializations +%%---------------------------------------------------------------------- + +init_per_suite(Config) -> + Config. + +end_per_suite(_Config) -> + ok. + +init_per_testcase(_TestCase, Config) -> + Config. + +end_per_testcase(_Func, _Config) -> + ok. + +%%---------------------------------------------------------------------- +%% Tests +%%---------------------------------------------------------------------- + +%%---------------------------------------------------------------------- +%% Send One doc over stream +one_document(Config) -> + Port = 11111, + + {ok, ListenSocket} = listen(Port), + Self = self(), + + spawn( + fun() -> + case catch gen_tcp:accept(ListenSocket) of + {ok, S} -> + Result = xmerl_sax_parser:stream(<<>>, + [{continuation_state, S}, + {continuation_fun, + fun(Sd) -> + io:format("Continuation called!!", []), + case gen_tcp:recv(Sd, 0) of + {ok, Packet} -> + io:format("Packet: ~p\n", [Packet]), + {Packet, Sd}; + {error, Reason} -> + throw({error, Reason}) + end + end}]), + Self ! {xmerl_sax, Result}, + close(S); + Error -> + Self ! {xmerl_sax, {error, {accept, Error}}} + end + end), + + {ok, SendSocket} = connect(localhost, Port), + + {ok, Binary} = file:read_file(filename:join([datadir(Config), "xmerl_sax_stream_one.xml"])), + + send_chunks(SendSocket, Binary), + + receive + {xmerl_sax, {ok, undefined, Rest}} -> + <<"\n">> = Rest, + io:format("Ok Rest: ~p\n", [Rest]) + after 5000 -> + ct:fail("Timeout") + end, + ok. + +%%---------------------------------------------------------------------- +%% Send Two doc over stream +two_documents(Config) -> + Port = 11111, + + {ok, ListenSocket} = listen(Port), + Self = self(), + + spawn( + fun() -> + case catch gen_tcp:accept(ListenSocket) of + {ok, S} -> + Result = xmerl_sax_parser:stream(<<>>, + [{continuation_state, S}, + {continuation_fun, + fun(Sd) -> + io:format("Continuation called!!", []), + case gen_tcp:recv(Sd, 0) of + {ok, Packet} -> + io:format("Packet: ~p\n", [Packet]), + {Packet, Sd}; + {error, Reason} -> + throw({error, Reason}) + end + end}]), + Self ! {xmerl_sax, Result}, + close(S); + Error -> + Self ! {xmerl_sax, {error, {accept, Error}}} + end + end), + + {ok, SendSocket} = connect(localhost, Port), + + {ok, Binary} = file:read_file(filename:join([datadir(Config), "xmerl_sax_stream_two.xml"])), + + send_chunks(SendSocket, Binary), + + receive + {xmerl_sax, {ok, undefined, Rest}} -> + <<"\n<?x", _R/binary>> = Rest, + io:format("Ok Rest: ~p\n", [Rest]) + after 5000 -> + ct:fail("Timeout") + end, + ok. + +%%---------------------------------------------------------------------- +%% Send one doc and then junk on stream +one_document_and_junk(Config) -> + Port = 11111, + + {ok, ListenSocket} = listen(Port), + Self = self(), + + spawn( + fun() -> + case catch gen_tcp:accept(ListenSocket) of + {ok, S} -> + Result = xmerl_sax_parser:stream(<<>>, + [{continuation_state, S}, + {continuation_fun, + fun(Sd) -> + io:format("Continuation called!!", []), + case gen_tcp:recv(Sd, 0) of + {ok, Packet} -> + io:format("Packet: ~p\n", [Packet]), + {Packet, Sd}; + {error, Reason} -> + throw({error, Reason}) + end + end}]), + Self ! {xmerl_sax, Result}, + close(S); + Error -> + Self ! {xmerl_sax, {error, {accept, Error}}} + end + end), + + {ok, SendSocket} = connect(localhost, Port), + + {ok, Binary} = file:read_file(filename:join([datadir(Config), "xmerl_sax_stream_one_junk.xml"])), + + send_chunks(SendSocket, Binary), + + receive + {xmerl_sax, {ok, undefined, Rest}} -> + <<"\nth", _R/binary>> = Rest, + io:format("Ok Rest: ~p\n", [Rest]) + after 10000 -> + ct:fail("Timeout") + end, + ok. + +%%---------------------------------------------------------------------- +%% Test of continuation when end of stream +end_of_stream(Config) -> + Stream = <<"<?xml version=\"1.0\" encoding=\"utf-8\"?><a>hej</a>">>, + {ok, undefined, <<>>} = xmerl_sax_parser:stream(Stream, []), + ok. + +%%---------------------------------------------------------------------- +%% Utility functions +%%---------------------------------------------------------------------- +listen(Port) -> + case catch gen_tcp:listen(Port, [{active, false}, + binary, + {keepalive, true}, + {reuseaddr,true}]) of + {ok, ListenSocket} -> + {ok, ListenSocket}; + {error, Reason} -> + {error, {listen, Reason}} + end. + +close(Socket) -> + (catch gen_tcp:close(Socket)). + +connect(Host, Port) -> + Timeout = 5000, + % Options1 = check_options(Options), + Options = [binary], + case catch gen_tcp:connect(Host, Port, Options, Timeout) of + {ok, Socket} -> + {ok, Socket}; + {error, Reason} -> + {error, Reason} + end. + +send_chunks(Socket, Binary) -> + BSize = erlang:size(Binary), + if + BSize > 25 -> + <<Head:25/binary, Tail/binary>> = Binary, + case gen_tcp:send(Socket, Head) of + ok -> + timer:sleep(1000), + send_chunks(Socket, Tail); + {error,closed} -> + ok + end; + true -> + gen_tcp:send(Socket, Binary) + end. + +datadir(Config) -> + proplists:get_value(data_dir, Config). diff --git a/lib/xmerl/test/xmerl_sax_stream_SUITE_data/xmerl_sax_stream_one.xml b/lib/xmerl/test/xmerl_sax_stream_SUITE_data/xmerl_sax_stream_one.xml new file mode 100644 index 0000000000..30328bb188 --- /dev/null +++ b/lib/xmerl/test/xmerl_sax_stream_SUITE_data/xmerl_sax_stream_one.xml @@ -0,0 +1,17 @@ +<?xml version="1.0"?> +<person> +<name> +Arne Andersson +</name> +<address> +<street> + Old Road 456 +</street> +<zip> +12323 +</zip> +<city> +Small City +</city> +</address> +</person> diff --git a/lib/xmerl/test/xmerl_sax_stream_SUITE_data/xmerl_sax_stream_one_junk.xml b/lib/xmerl/test/xmerl_sax_stream_SUITE_data/xmerl_sax_stream_one_junk.xml new file mode 100644 index 0000000000..f730a95865 --- /dev/null +++ b/lib/xmerl/test/xmerl_sax_stream_SUITE_data/xmerl_sax_stream_one_junk.xml @@ -0,0 +1,18 @@ +<?xml version="1.0"?> +<person> +<name> +Arne Andersson +</name> +<address> +<street> + Old Road 456 +</street> +<zip> +12323 +</zip> +<city> +Small City +</city> +</address> +</person> +this is junk ...... diff --git a/lib/xmerl/test/xmerl_sax_stream_SUITE_data/xmerl_sax_stream_two.xml b/lib/xmerl/test/xmerl_sax_stream_SUITE_data/xmerl_sax_stream_two.xml new file mode 100644 index 0000000000..e241a02190 --- /dev/null +++ b/lib/xmerl/test/xmerl_sax_stream_SUITE_data/xmerl_sax_stream_two.xml @@ -0,0 +1,34 @@ +<?xml version="1.0"?> +<person> +<name> +Arne Andersson +</name> +<address> +<street> + Old Road 456 +</street> +<zip> +12323 +</zip> +<city> +Small City +</city> +</address> +</person> +<?xml version="1.0"?> +<person> +<name> +Bertil Bengtson +</name> +<address> +<street> + New Road 4 +</street> +<zip> +12328 +</zip> +<city> +Small City +</city> +</address> +</person> diff --git a/lib/xmerl/vsn.mk b/lib/xmerl/vsn.mk index a78a035a1f..2e9c9061d9 100644 --- a/lib/xmerl/vsn.mk +++ b/lib/xmerl/vsn.mk @@ -1 +1 @@ -XMERL_VSN = 1.3.11 +XMERL_VSN = 1.3.15 diff --git a/lib/xmerl/xmerl.pub b/lib/xmerl/xmerl.pub deleted file mode 100644 index 29a81bbde2..0000000000 --- a/lib/xmerl/xmerl.pub +++ /dev/null @@ -1,19 +0,0 @@ -{name, "xmerl"}. -{vsn, {0,18}}. -{summary, "XML processing tools"}. -{author, "Ulf Wiger, Johan Blom, Richard Carlsson, Mickael Remond", "[email protected]", "20020307"}. -{keywords, ["xml", "html"]}. -{needs, [{compiler, "3.0"}]}. -{abstract, - "Implements a set of tools for processing XML documents, as well as " - "working with XML-like structures in Erlang. The main attraction so far " - "is a single-pass, highly customizable XML processor. Other components are " - "an export/translation facility and an XPATH query engine. " - "This version fixes a few bugs in the scanner, and improves HTML export. " - "The latest version can be found at http://sowap.sourceforge.net/ " - "Note that this is still very much a beta product."}. - - - - - |