author Péter Dimitrov <[email protected]> 2017-10-30 13:38:28 +0100
committer Péter Dimitrov <[email protected]> 2017-10-30 16:18:34 +0100
commit f7d3033dfeeb012841729bf8ed3889da8457b4f7 (patch)
tree a23dc5a387204123053c9cac8c0765b073141805
parent ce78af7e5a76dc4a27673ab5c80a315762b992b1 (diff)
stdlib: Update documentation (normalize/1)
-rw-r--r-- lib/stdlib/doc/src/uri_string.xml 115
1 files changed, 83 insertions, 32 deletions
diff --git a/lib/stdlib/doc/src/uri_string.xml b/lib/stdlib/doc/src/uri_string.xml
index 8322eecb24..55d8690b98 100644
--- a/lib/stdlib/doc/src/uri_string.xml
+++ b/lib/stdlib/doc/src/uri_string.xml
@@ -69,6 +69,9 @@
<item>Changing inbound binary and percent-encoding of URIs<br></br>
<seealso marker="#transcode/2"><c>transcode/2</c></seealso>
</item>
+ <item>Transforming URIs into a normalized form<br></br>
+ <seealso marker="#normalize/1"><c>normalize/1</c></seealso>
+ </item>
<item>Composing form-urlencoded query strings from a list of key-value pairs<br></br>
<seealso marker="#compose_query/1"><c>compose_query/1</c></seealso><br></br>
<seealso marker="#compose_query/2"><c>compose_query/2</c></seealso>
@@ -84,12 +87,21 @@
<item>Outbound binary encoding in binaries</item>
<item>Outbound percent-encoding in lists and binaries</item>
</list>
+ <p>Functions with a <c>uri_string()</c> argument accept lists, binaries and
+ mixed lists (lists with binary elements) as input type. All of the functions except
+ <c>transcode/2</c> expect input as lists of unicode codepoints, UTF-8 encoded binaries
+ and UTF-8 percent-encoded URI parts ("%C3%B6" corresponds to the unicode character "ö").</p>
<p>Unless otherwise specified the return value type and encoding are the same as the input
type and encoding. That is, binary input returns binary output, list input returns a list
output but mixed input returns list output.</p>
- <p>All of the functions but <c>transcode/2</c> expects input as unicode codepoints in
- lists, UTF-8 encoding in binaries and UTF-8 encoding in percent-encoded URI parts.
- <c>transcode/2</c> provides the means to convert between the supported URI encodings.</p>
+ <p>For lists, only percent-encoding applies. In binaries, however, both binary encoding
+ and percent-encoding must be considered. <c>transcode/2</c> provides the means to convert
+ between the supported encodings; it takes a <c>uri_string()</c> and a list of options
+ specifying inbound and outbound encodings.</p>
+ <p><url href="https://www.ietf.org/rfc/rfc3986.txt">RFC 3986</url> does not mandate any specific
+ character encoding; the encoding is usually defined by the protocol or surrounding text. This library
+ makes the same assumption: binary and percent-encoding are handled as one configuration unit
+ and cannot be set to different values.</p>
</description>
<datatypes>
@@ -97,28 +109,30 @@
<name name="error"/>
<desc>
<p>Error tuple indicating the type of error. Possible values of the second component:</p>
- <list type="bulleted">
- <item><c>invalid_character</c></item>
- <item><c>invalid_input</c></item>
- <item><c>invalid_map</c></item>
- <item><c>invalid_percent_encoding</c></item>
- <item><c>invalid_scheme</c></item>
- <item><c>invalid_uri</c></item>
- <item><c>invalid_utf8</c></item>
- <item><c>missing_value</c></item>
- </list>
+ <list type="bulleted">
+ <item><c>invalid_character</c></item>
+ <item><c>invalid_input</c></item>
+ <item><c>invalid_map</c></item>
+ <item><c>invalid_percent_encoding</c></item>
+ <item><c>invalid_scheme</c></item>
+ <item><c>invalid_uri</c></item>
+ <item><c>invalid_utf8</c></item>
+ <item><c>missing_value</c></item>
+ </list>
+ <p>The third component is a list or binary providing additional information about the
+ cause of the error.</p>
</desc>
</datatype>
<datatype>
<name name="uri_map"/>
<desc>
- <p>URI map holding the main components of a URI.</p>
+ <p>Map holding the main components of a URI.</p>
</desc>
</datatype>
<datatype>
<name name="uri_string"/>
<desc>
- <p>List of unicode codepoints, UTF-8 encoded binary, or a mix of the two,
+ <p>List of unicode codepoints, a UTF-8 encoded binary, or a mix of the two,
representing an <url href="https://www.ietf.org/rfc/rfc3986.txt">RFC 3986</url>
compliant URI (<em>percent-encoded form</em>).
A URI is a sequence of characters from a very limited set: the letters of
@@ -134,10 +148,11 @@
<fsummary>Compose urlencoded query string.</fsummary>
<desc>
<p>Composes a form-urlencoded <c><anno>QueryString</anno></c> based on a
- <c><anno>QueryList</anno></c>, a list of unescaped key-value pairs.
- Media type <c>application/x-www-form-urlencoded</c> is defined in section
+ <c><anno>QueryList</anno></c>, a list of non-percent-encoded key-value pairs.
+ Form-urlencoding is defined in section
8.2.1 of <url href="https://www.ietf.org/rfc/rfc1866.txt">RFC 1866</url>
- (HTML 2.0). Reserved and unsafe characters, as
+ (HTML 2.0) for media type <c>application/x-www-form-urlencoded</c>.
+ Reserved and unsafe characters, as
defined by <url href="https://www.ietf.org/rfc/rfc1738.txt">RFC 1738</url>
(Uniform Resource Locators), are percent-encoded.</p>
<p>See also the opposite operation <seealso marker="#dissect_query/1">
@@ -145,9 +160,8 @@
</p>
<p><em>Example:</em></p>
<pre>
-1> <input>uri_string:compose_query([{"foo bar","1"},{"city","örebro"}],</input>
-1> [{separator, semicolon}]).
-"foo+bar=1;city=%C3%B6rebro"
+1> <input>uri_string:compose_query([{"foo bar","1"},{"city","örebro"}]).</input>
+<![CDATA["foo+bar=1&amp;city=%C3%B6rebro"]]>
2> <![CDATA[uri_string:compose_query([{<<"foo bar">>,<<"1">>},
2> {<<"city">>,<<"örebro"/utf8>>}]).]]>
<![CDATA[<<"foo+bar=1&amp;city=%C3%B6rebro">>]]>
@@ -169,7 +183,10 @@
<pre>
1> <input>uri_string:compose_query([{"foo bar","1"},{"city","örebro"}],</input>
1> [{separator, amp}]).
-<![CDATA["foo+bar=1&city=%C3%B6rebro"]]>
+<![CDATA["foo+bar=1&city=%C3%B6rebro"
+2> uri_string:compose_query([{<<"foo bar">>,<<"1">>},
+2> {<<"city">>,<<"örebro"/utf8>>}], [{separator, escaped_amp}]).]]>
+<![CDATA[<<"foo+bar=1&amp;city=%C3%B6rebro">>]]>
</pre>
</desc>
</func>
@@ -179,12 +196,15 @@
<fsummary>Dissect query string.</fsummary>
<desc>
<p>Dissects an urlencoded <c><anno>QueryString</anno></c> and returns a
- <c><anno>QueryList</anno></c>, a list of unescaped key-value pairs.
- Media type <c>application/x-www-form-urlencoded</c> is defined in section
+ <c><anno>QueryList</anno></c>, a list of non-percent-encoded key-value pairs.
+ Form-urlencoding is defined in section
8.2.1 of <url href="https://www.ietf.org/rfc/rfc1866.txt">RFC 1866</url>
- (HTML 2.0). Percent-encoded segments are decoded
- as defined by <url href="https://www.ietf.org/rfc/rfc1738.txt">RFC 1738</url>
+ (HTML 2.0) for media type <c>application/x-www-form-urlencoded</c>.
+ Percent-encoded segments are decoded as defined by
+ <url href="https://www.ietf.org/rfc/rfc1738.txt">RFC 1738</url>
(Uniform Resource Locators).</p>
+ <p>Supported separator types: <c>amp</c> (<![CDATA[&]]>), <c>escaped_amp</c>
+ (<![CDATA[&amp;]]>) and <c>semicolon</c> (;).</p>
<p>See also the opposite operation <seealso marker="#compose_query/1">
<c>compose_query/1</c></seealso>.
</p>
@@ -192,18 +212,42 @@
<pre>
1> <input>uri_string:dissect_query("foo+bar=1;city=%C3%B6rebro").</input>
[{"foo bar","1"},{"city","örebro"}]
-2> <![CDATA[uri_string:dissect_query(<<"foo+bar=1;city=%C3%B6rebro">>).]]>
+2> <![CDATA[uri_string:dissect_query(<<"foo+bar=1&city=%C3%B6rebro">>).]]>
<![CDATA[[{<<"foo bar">>,<<"1">>},{<<"city">>,<<"örebro"/utf8>>}] ]]>
</pre>
</desc>
</func>
<func>
+ <name name="normalize" arity="1"/>
+ <fsummary>Syntax-based normalization.</fsummary>
+ <desc>
+ <p>Transforms <c><anno>URIString</anno></c> into a normalized form
+ using Syntax-Based Normalization as defined by
+ <url href="https://www.ietf.org/rfc/rfc3986.txt">RFC 3986</url>.</p>
+ <p>This function implements case normalization, percent-encoding
+ normalization, path segment normalization and scheme-based normalization
+ for HTTP(S), with basic support for FTP, SSH, SFTP and TFTP.</p>
+ <p><em>Example:</em></p>
+ <pre>
+1> <input>uri_string:normalize("/a/b/c/./../../g").</input>
+"/a/g"
+2> <![CDATA[uri_string:normalize(<<"mid/content=5/../6">>).]]>
+<![CDATA[<<"mid/6">>]]>
+3> uri_string:normalize("http://localhost:80").
+"http://localhost/"
+ </pre>
+ </desc>
+ </func>
+
+ <func>
<name name="parse" arity="1"/>
<fsummary>Parse URI into a map.</fsummary>
<desc>
- <p>Returns a <c>URIMap</c>, that is a <em>uri_map()</em> with the parsed components
- of the <c><anno>URIString</anno></c>. If parsing fails, an error tuple is returned.</p>
+ <p>Parses an <url href="https://www.ietf.org/rfc/rfc3986.txt">RFC 3986</url>
+ compliant <c>uri_string()</c> into a <c>uri_map()</c>, that holds the parsed
+ components of the <c>URI</c>.
+ If parsing fails, an error tuple is returned.</p>
<p>See also the opposite operation <seealso marker="#recompose/1">
<c>recompose/1</c></seealso>.</p>
<p><em>Example:</em></p>
@@ -224,8 +268,9 @@
<name name="recompose" arity="1"/>
<fsummary>Recompose URI.</fsummary>
<desc>
- <p>Returns an <url href="https://www.ietf.org/rfc/rfc3986.txt">RFC 3986</url> compliant
- <c><anno>URIString</anno></c> (percent-encoded).
+ <p>Creates an <url href="https://www.ietf.org/rfc/rfc3986.txt">RFC 3986</url> compliant
+ <c><anno>URIString</anno></c> (percent-encoded), based on the components of
+ <c><anno>URIMap</anno></c>.
If the <c><anno>URIMap</anno></c> is invalid, an error tuple is returned.</p>
<p>See also the opposite operation <seealso marker="#parse/1">
<c>parse/1</c></seealso>.</p>
@@ -249,13 +294,19 @@
<p>Transcodes an <url href="https://www.ietf.org/rfc/rfc3986.txt">RFC 3986</url>
compliant <c><anno>URIString</anno></c>,
where <c><anno>Options</anno></c> is a list of tagged tuples, specifying the inbound
- (<c>in_encoding</c>) and outbound (<c>out_encoding</c>) encodings.
+ (<c>in_encoding</c>) and outbound (<c>out_encoding</c>) encodings. <c>in_encoding</c>
+ and <c>out_encoding</c> specify both binary encoding and percent-encoding for the
+ input and output data. Mixed encoding, where binary encoding is not the same as
+ percent-encoding, is not supported.
If an argument is invalid, an error tuple is returned.</p>
<p><em>Example:</em></p>
<pre>
1> <input><![CDATA[uri_string:transcode(<<"foo%00%00%00%F6bar"/utf32>>,]]></input>
1> [{in_encoding, utf32},{out_encoding, utf8}]).
<![CDATA[<<"foo%C3%B6bar"/utf8>>]]>
+2> uri_string:transcode("foo%F6bar", [{in_encoding, latin1},
+2> {out_encoding, utf8}]).
+"foo%C3%B6bar"
</pre>
</desc>
</func>
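
The calls documented in this patch can be exercised from a small module. The following is only a sketch, assuming an OTP release that ships `uri_string`; the module name `uri_string_sketch` and the helper names `roundtrip/1` and `latin1_to_utf8/1` are hypothetical and are not part of the commit.

```erlang
%% Hypothetical helper module, for illustration only (not part of OTP).
-module(uri_string_sketch).
-export([roundtrip/1, latin1_to_utf8/1]).

%% Parse a URI into a uri_map() and rebuild it with recompose/1.
%% An error tuple from parse/1 is passed through unchanged; its third
%% element carries additional information about the cause.
roundtrip(URI) ->
    case uri_string:parse(URI) of
        {error, _Reason, _Info} = Error ->
            Error;
        URIMap ->
            uri_string:recompose(URIMap)
    end.

%% Convert a Latin-1 percent-encoded URI to UTF-8 percent-encoding.
%% in_encoding and out_encoding each cover both the binary encoding
%% and the percent-encoding, as described in the transcode/2 text.
latin1_to_utf8(URI) ->
    uri_string:transcode(URI, [{in_encoding, latin1},
                               {out_encoding, utf8}]).
```

With this module compiled, `uri_string_sketch:latin1_to_utf8("foo%F6bar")` corresponds to the second `transcode/2` example added by the patch.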