Diffstat (limited to 'lib/stdlib')
-rw-r--r-- | lib/stdlib/doc/src/uri_string.xml | 115 |
1 files changed, 83 insertions, 32 deletions
diff --git a/lib/stdlib/doc/src/uri_string.xml b/lib/stdlib/doc/src/uri_string.xml
index 8322eecb24..55d8690b98 100644
--- a/lib/stdlib/doc/src/uri_string.xml
+++ b/lib/stdlib/doc/src/uri_string.xml
@@ -69,6 +69,9 @@
  <item>Changing inbound binary and percent-encoding of URIs<br></br>
  <seealso marker="#transcode/2"><c>transcode/2</c></seealso>
  </item>
+ <item>Transforming URIs into a normalized form<br></br>
+ <seealso marker="#normalize/1"><c>normalize/1</c></seealso>
+ </item>
  <item>Composing form-urlencoded query strings from a list of key-value pairs<br></br>
  <seealso marker="#compose_query/1"><c>compose_query/1</c></seealso><br></br>
  <seealso marker="#compose_query/2"><c>compose_query/2</c></seealso>
@@ -84,12 +87,21 @@
  <item>Outbound binary encoding in binaries</item>
  <item>Outbound percent-encoding in lists and binaries</item>
  </list>
+ <p>Functions with <c>uri_string()</c> argument accept lists, binaries and
+ mixed lists (lists with binary elements) as input type. All of the functions but
+ <c>transcode/2</c> expects input as lists of unicode codepoints, UTF-8 encoded binaries
+ and UTF-8 percent-encoded URI parts ("%C3%B6" corresponds to the unicode character "ö").</p>
  <p>Unless otherwise specified the return value type and encoding are the same as the input
  type and encoding. That is, binary input returns binary output, list input returns a list
  output but mixed input returns list output.</p>
- <p>All of the functions but <c>transcode/2</c> expects input as unicode codepoints in
- lists, UTF-8 encoding in binaries and UTF-8 encoding in percent-encoded URI parts.
- <c>transcode/2</c> provides the means to convert between the supported URI encodings.</p>
+ <p>In case of lists there is only percent-encoding. In binaries, however, both binary encoding
+ and percent-encoding shall be considered. <c>transcode/2</c> provides the means to convert
+ between the supported encodings, it takes a <c>uri_string()</c> and a list of options
+ specifying inbound and outbound encodings.</p>
+ <p><url href="https://www.ietf.org/rfc/rfc3986.txt">RFC 3986</url> does not mandate any specific
+ character encoding and it is usually defined by the protocol or surrounding text. This library
+ takes the same assumption, binary and percent-encoding are handled as one configuration unit,
+ they cannot be set to different values.</p>
  </description>
 
  <datatypes>
@@ -97,28 +109,30 @@
  <name name="error"/>
  <desc>
  <p>Error tuple indicating the type of error. Possible values of the second component:</p>
- <list type="bulleted">
- <item><c>invalid_character</c></item>
- <item><c>invalid_input</c></item>
- <item><c>invalid_map</c></item>
- <item><c>invalid_percent_encoding</c></item>
- <item><c>invalid_scheme</c></item>
- <item><c>invalid_uri</c></item>
- <item><c>invalid_utf8</c></item>
- <item><c>missing_value</c></item>
- </list>
+ <list type="bulleted">
+ <item><c>invalid_character</c></item>
+ <item><c>invalid_input</c></item>
+ <item><c>invalid_map</c></item>
+ <item><c>invalid_percent_encoding</c></item>
+ <item><c>invalid_scheme</c></item>
+ <item><c>invalid_uri</c></item>
+ <item><c>invalid_utf8</c></item>
+ <item><c>missing_value</c></item>
+ </list>
+ <p>The third component is a list or binary providing additional information about the
+ cause of the error.</p>
  </desc>
  </datatype>
  <datatype>
  <name name="uri_map"/>
  <desc>
- <p>URI map holding the main components of a URI.</p>
+ <p>Map holding the main components of a URI.</p>
  </desc>
  </datatype>
  <datatype>
  <name name="uri_string"/>
  <desc>
- <p>List of unicode codepoints, UTF-8 encoded binary, or a mix of the two,
+ <p>List of unicode codepoints, a UTF-8 encoded binary, or a mix of the two,
  representing an <url href="https://www.ietf.org/rfc/rfc3986.txt">RFC 3986</url>
  compliant URI (<em>percent-encoded form</em>).
  A URI is a sequence of characters from a very limited set: the letters of
@@ -134,10 +148,11 @@
  <fsummary>Compose urlencoded query string.</fsummary>
  <desc>
  <p>Composes a form-urlencoded <c><anno>QueryString</anno></c> based on a
- <c><anno>QueryList</anno></c>, a list of unescaped key-value pairs.
- Media type <c>application/x-www-form-urlencoded</c> is defined in section
+ <c><anno>QueryList</anno></c>, a list of non-percent-encoded key-value pairs.
+ Form-urlencoding is defined in section
  8.2.1 of <url href="https://www.ietf.org/rfc/rfc1866.txt">RFC 1866</url>
- (HTML 2.0). Reserved and unsafe characters, as
+ (HTML 2.0) for media type <c>application/x-www-form-urlencoded</c>.
+ Reserved and unsafe characters, as
  defined by <url href="https://www.ietf.org/rfc/rfc1738.txt">RFC 1738</url>
  (Uniform Resource Locators), are percent-encoded.</p>
  <p>See also the opposite operation <seealso marker="#dissect_query/1">
@@ -145,9 +160,8 @@
  </p>
  <p><em>Example:</em></p>
  <pre>
-1> <input>uri_string:compose_query([{"foo bar","1"},{"city","örebro"}],</input>
-1> [{separator, semicolon}]).
-"foo+bar=1;city=%C3%B6rebro"
+1> <input>uri_string:compose_query([{"foo bar","1"},{"city","örebro"}]).</input>
+<![CDATA["foo+bar=1&city=%C3%B6rebro"]]>
 2> <![CDATA[uri_string:compose_query([{<<"foo bar">>,<<"1">>},
 2> {<<"city">>,<<"örebro"/utf8>>}]).]]>
 <![CDATA[<<"foo+bar=1&city=%C3%B6rebro">>]]>
@@ -169,7 +183,10 @@
  <pre>
 1> <input>uri_string:compose_query([{"foo bar","1"},{"city","örebro"}],</input>
 1> [{separator, amp}]).
-<![CDATA["foo+bar=1&city=%C3%B6rebro"]]>
+<![CDATA["foo+bar=1&city=%C3%B6rebro"
+2> uri_string:compose_query([{<<"foo bar">>,<<"1">>},
+2> {<<"city">>,<<"örebro"/utf8>>}], [{separator, escaped_amp}]).]]>
+<![CDATA[<<"foo+bar=1&amp;city=%C3%B6rebro">>]]>
  </pre>
  </desc>
  </func>
@@ -179,12 +196,15 @@
  <fsummary>Dissect query string.</fsummary>
  <desc>
  <p>Dissects an urlencoded <c><anno>QueryString</anno></c> and returns a
- <c><anno>QueryList</anno></c>, a list of unescaped key-value pairs.
- Media type <c>application/x-www-form-urlencoded</c> is defined in section
+ <c><anno>QueryList</anno></c>, a list of non-percent-encoded key-value pairs.
+ Form-urlencoding is defined in section
  8.2.1 of <url href="https://www.ietf.org/rfc/rfc1866.txt">RFC 1866</url>
- (HTML 2.0). Percent-encoded segments are decoded
- as defined by <url href="https://www.ietf.org/rfc/rfc1738.txt">RFC 1738</url>
+ (HTML 2.0) for media type <c>application/x-www-form-urlencoded</c>.
+ Percent-encoded segments are decoded as defined by
+ <url href="https://www.ietf.org/rfc/rfc1738.txt">RFC 1738</url>
  (Uniform Resource Locators).</p>
+ <p>Supported separator types: <c>amp</c> (<![CDATA[&]]>), <c>escaped_amp</c>
+ (<![CDATA[&amp;]]>) and <c>semicolon</c> (;).</p>
  <p>See also the opposite operation <seealso marker="#compose_query/1">
  <c>compose_query/1</c></seealso>.
  </p>
@@ -192,18 +212,42 @@
  <pre>
 1> <input>uri_string:dissect_query("foo+bar=1;city=%C3%B6rebro").</input>
 [{"foo bar","1"},{"city","örebro"}]
-2> <![CDATA[uri_string:dissect_query(<<"foo+bar=1;city=%C3%B6rebro">>).]]>
+2> <![CDATA[uri_string:dissect_query(<<"foo+bar=1&city=%C3%B6rebro">>).]]>
 <![CDATA[[{<<"foo bar">>,<<"1">>},{<<"city">>,<<"örebro"/utf8>>}] ]]>
  </pre>
  </desc>
  </func>
 
  <func>
+ <name name="normalize" arity="1"/>
+ <fsummary>Syntax-based normalization.</fsummary>
+ <desc>
+ <p>Transforms <c><anno>URIString</anno></c> into a normalized form
+ using Syntax-Based Normalization as defined by
+ <url href="https://www.ietf.org/rfc/rfc3986.txt">RFC 3986</url>.</p>
+ <p>This function implements case normalization, percent-encoding
+ normalization, path segment normalization and scheme based normalization
+ for HTTP(S) with basic support for FTP, SSH, SFTP and TFTP.</p>
+ <p><em>Example:</em></p>
+ <pre>
+1> <input>uri_string:normalize("/a/b/c/./../../g").</input>
+"/a/g"
+2> <![CDATA[uri_string:normalize(<<"mid/content=5/../6">>).]]>
+<![CDATA[<<"mid/6">>]]>
+3> uri_string:normalize("http://localhost:80").
+"https://localhost/"
+ </pre>
+ </desc>
+ </func>
+
+ <func>
  <name name="parse" arity="1"/>
  <fsummary>Parse URI into a map.</fsummary>
  <desc>
- <p>Returns a <c>URIMap</c>, that is a <em>uri_map()</em> with the parsed components
- of the <c><anno>URIString</anno></c>. If parsing fails, an error tuple is returned.</p>
+ <p>Parses an <url href="https://www.ietf.org/rfc/rfc3986.txt">RFC 3986</url>
+ compliant <c>uri_string()</c> into a <c>uri_map()</c>, that holds the parsed
+ components of the <c>URI</c>.
+ If parsing fails, an error tuple is returned.</p>
  <p>See also the opposite operation <seealso marker="#recompose/1">
  <c>recompose/1</c></seealso>.</p>
  <p><em>Example:</em></p>
@@ -224,8 +268,9 @@
  <name name="recompose" arity="1"/>
  <fsummary>Recompose URI.</fsummary>
  <desc>
- <p>Returns an <url href="https://www.ietf.org/rfc/rfc3986.txt">RFC 3986</url> compliant
- <c><anno>URIString</anno></c> (percent-encoded).
+ <p>Creates an <url href="https://www.ietf.org/rfc/rfc3986.txt">RFC 3986</url> compliant
+ <c><anno>URIString</anno></c> (percent-encoded), based on the components of
+ <c><anno>URIMap</anno></c>.
  If the <c><anno>URIMap</anno></c> is invalid, an error tuple is returned.</p>
  <p>See also the opposite operation <seealso marker="#parse/1">
  <c>parse/1</c></seealso>.</p>
@@ -249,13 +294,19 @@
  <p>Transcodes an <url href="https://www.ietf.org/rfc/rfc3986.txt">RFC 3986</url>
  compliant <c><anno>URIString</anno></c>,
  where <c><anno>Options</anno></c> is a list of tagged tuples, specifying the inbound
- (<c>in_encoding</c>) and outbound (<c>out_encoding</c>) encodings.
+ (<c>in_encoding</c>) and outbound (<c>out_encoding</c>) encodings. <c>in_encoding</c>
+ and <c>out_encoding</c> specifies both binary encoding and percent-encoding for the
+ input and output data. Mixed encoding, where binary encoding is not the same as
+ percent-encoding, is not supported.
  If an argument is invalid, an error tuple is returned.</p>
  <p><em>Example:</em></p>
  <pre>
 1> <input><![CDATA[uri_string:transcode(<<"foo%00%00%00%F6bar"/utf32>>,]]></input>
 1> [{in_encoding, utf32},{out_encoding, utf8}]).
 <![CDATA[<<"foo%C3%B6bar"/utf8>>]]>
+2> uri_string:transcode("foo%F6bar", [{in_encoding, latin1},
+2> {out_encoding, utf8}]).
+"foo%C3%B6bar"
  </pre>
  </desc>
  </func>
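
The shell examples touched by this change can be exercised from a small module. The following is a minimal sketch, assuming an Erlang/OTP release that ships uri_string:normalize/1; the module name uri_string_check is hypothetical and not part of the commit. The asserted values are those of the documented examples, with the escaped_amp separator producing a literal "&amp;". The scheme-based case is printed instead of asserted, because removing a default port is not expected to change the scheme, so the result may differ from the documented string.

```erlang
%% Hypothetical check module; not part of the change shown above.
-module(uri_string_check).
-export([run/0]).

run() ->
    %% Path segment normalization removes "." and ".." segments.
    "/a/g" = uri_string:normalize("/a/b/c/./../../g"),
    %% Binary input returns binary output.
    <<"mid/6">> = uri_string:normalize(<<"mid/content=5/../6">>),
    %% compose_query/1 uses the amp separator; 16#F6 is the codepoint of "ö".
    "foo+bar=1&city=%C3%B6rebro" =
        uri_string:compose_query([{"foo bar", "1"}, {"city", "\x{F6}rebro"}]),
    %% escaped_amp joins the pairs with the five characters "&amp;".
    <<"foo+bar=1&amp;city=%C3%B6rebro">> =
        uri_string:compose_query([{<<"foo bar">>, <<"1">>},
                                  {<<"city">>, <<"\x{F6}rebro"/utf8>>}],
                                 [{separator, escaped_amp}]),
    %% Latin-1 percent-encoding transcoded to UTF-8 percent-encoding.
    "foo%C3%B6bar" =
        uri_string:transcode("foo%F6bar",
                             [{in_encoding, latin1}, {out_encoding, utf8}]),
    %% Scheme-based normalization: printed, deliberately not asserted.
    io:format("~p~n", [uri_string:normalize("http://localhost:80")]),
    ok.
```

Compile with c(uri_string_check). in the shell and call uri_string_check:run(). A mismatch raises a badmatch error carrying the actual value returned by the library.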