author | Tom Moertel <[email protected]> | 2011-04-28 17:15:16 -0400 |
---|---|---|
committer | Tom Moertel <[email protected]> | 2011-04-28 17:15:16 -0400 |
commit | a011451e7e40690b533003802ee54f7c6f77e16e (patch) | |
tree | 7785b731ce89bebba8c873a992f5fe0111c3264f /lib/xmerl/src/xmerl_sax_parser.erl | |
parent | 3e815447cafbcb704bac1fac3d195e94def7080f (diff) | |
Prevent xmerl from over-normalizing character references in attributes
Section 3.3.3 of the XML Recommendation gives the rules for
attribute-value normalization. One of those rules requires
that character references not be re-normalized after being
replaced with the referenced characters:
    For a character reference, append the referenced
    character to the normalized value.
And, in particular:
    Note that if the unnormalized attribute value contains
    a character reference to a white space character other
    than space (#x20), the normalized value contains the
    referenced character itself (#xD, #xA or #x9).
Source: http://www.w3.org/TR/xml/#AVNormalize
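The rule quoted above can be illustrated with a minimal sketch. This is
not xmerl's implementation: the module name attr_norm_sketch and the
function normalize/1 are hypothetical, the value is assumed to be a
CDATA attribute, entity references such as &amp; are not handled, and a
reference without a closing ";" simply crashes with a badmatch. The
point is only that a character produced by a character reference goes
straight into the result and is never examined again, whereas a literal
white space character is turned into a space.

```erlang
-module(attr_norm_sketch).
-export([normalize/1]).

%% Hypothetical helper, not part of xmerl: normalize a raw CDATA
%% attribute value. Entity references are deliberately not handled.
normalize(Raw) ->
    normalize(Raw, []).

normalize([], Acc) ->
    lists:reverse(Acc);
%% Hexadecimal character reference (&#x...;): append the referenced
%% character as-is, without normalizing it again.
normalize("&#x" ++ Rest, Acc) ->
    {Digits, [$; | Tail]} = lists:splitwith(fun(C) -> C =/= $; end, Rest),
    normalize(Tail, [list_to_integer(Digits, 16) | Acc]);
%% Decimal character reference (&#...;): same treatment.
normalize("&#" ++ Rest, Acc) ->
    {Digits, [$; | Tail]} = lists:splitwith(fun(C) -> C =/= $; end, Rest),
    normalize(Tail, [list_to_integer(Digits) | Acc]);
%% Literal white space (#x9, #xA, #xD, #x20) becomes a space (#x20).
normalize([C | Rest], Acc) when C =:= $\t; C =:= $\n; C =:= $\r; C =:= $\s ->
    normalize(Rest, [$\s | Acc]);
%% Any other character is copied unchanged.
normalize([C | Rest], Acc) ->
    normalize(Rest, [C | Acc]).
```

With this sketch, attr_norm_sketch:normalize("a&#xA;b\nc") yields
"a\nb c": the referenced line feed survives, while the literal one
becomes a space.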
In xmerl_scan, however, character references in attributes are
normalized an extra time after replacement. For example, the
character reference "&#xA;" in the following XML document gets
normalized (incorrectly) into a space when parsed:
    2> xmerl_scan:string("<root x='&#xA;'/>").
    {... [{xmlAttribute,x,[],[],[],[],1,[]," ",false}] ...}
This short patch restores the correct behavior:
    2> xmerl_scan:string("<root x='&#xA;'/>").
    {... [{xmlAttribute,x,[],[],[],[],1,[],"\n",false}] ...}
NOTE: This change does not include tests because I could not
find a test suite for xmerl.
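For anyone who wants to check the behavior by hand, a regression check
might look like the following EUnit sketch. It is not part of this
change: the module name xmerl_attr_charref_sketch and the test name are
hypothetical, and it assumes the xmerl and eunit applications are on the
code path. It relies only on xmerl_scan:string/1 and the records defined
in xmerl.hrl.

```erlang
-module(xmerl_attr_charref_sketch).
-include_lib("xmerl/include/xmerl.hrl").
-include_lib("eunit/include/eunit.hrl").

%% A character reference to a line feed inside an attribute value
%% should come back as "\n", not as a space.
char_ref_preserved_test() ->
    {#xmlElement{attributes = [Attr]}, _Rest} =
        xmerl_scan:string("<root x='&#xA;'/>"),
    ?assertEqual(x, Attr#xmlAttribute.name),
    ?assertEqual("\n", Attr#xmlAttribute.value).
```

Once compiled, it can be run with eunit:test(xmerl_attr_charref_sketch).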