From f7d3033dfeeb012841729bf8ed3889da8457b4f7 Mon Sep 17 00:00:00 2001
From: Péter Dimitrov
Date: Mon, 30 Oct 2017 13:38:28 +0100
Subject: stdlib: Update documentation (normalize/1)

---
lib/stdlib/doc/src/uri_string.xml | 115 +++++++++++++++++++++++++++-----------
1 file changed, 83 insertions(+), 32 deletions(-)

diff --git a/lib/stdlib/doc/src/uri_string.xml b/lib/stdlib/doc/src/uri_string.xml
index 8322eecb24..55d8690b98 100644
--- a/lib/stdlib/doc/src/uri_string.xml
+++ b/lib/stdlib/doc/src/uri_string.xml
@@ -69,6 +69,9 @@ Changing inbound binary and percent-encoding of URIs

transcode/2
+ Transforming URIs into a normalized form

+ normalize/1 +
Composing form-urlencoded query strings from a list of key-value pairs

compose_query/1

compose_query/2 @@ -84,12 +87,21 @@ Outbound binary encoding in binaries Outbound percent-encoding in lists and binaries +

Functions with uri_string() argument accept lists, binaries and + mixed lists (lists with binary elements) as input type. All of the functions but + transcode/2 expect input as lists of unicode codepoints, UTF-8 encoded binaries + and UTF-8 percent-encoded URI parts ("%C3%B6" corresponds to the unicode character "ö").

Unless otherwise specified the return value type and encoding are the same as the input type and encoding. That is, binary input returns binary output, list input returns a list output but mixed input returns list output.

-

All of the functions but transcode/2 expects input as unicode codepoints in - lists, UTF-8 encoding in binaries and UTF-8 encoding in percent-encoded URI parts. - transcode/2 provides the means to convert between the supported URI encodings.

+

For lists, only percent-encoding applies. For binaries, however, both the binary encoding + and the percent-encoding must be considered. transcode/2 provides the means to convert + between the supported encodings; it takes a uri_string() and a list of options + specifying inbound and outbound encodings.

+

RFC 3986 does not mandate any specific + character encoding; it is usually defined by the protocol or surrounding text. This library + makes the same assumption: binary encoding and percent-encoding are handled as one configuration unit + and cannot be set to different values.

@@ -97,28 +109,30 @@

Error tuple indicating the type of error. Possible values of the second component:

-
- invalid_character
- invalid_input
- invalid_map
- invalid_percent_encoding
- invalid_scheme
- invalid_uri
- invalid_utf8
- missing_value
-
+
+ invalid_character
+ invalid_input
+ invalid_map
+ invalid_percent_encoding
+ invalid_scheme
+ invalid_uri
+ invalid_utf8
+ missing_value
+
+

The third component is a list or binary providing additional information about the + cause of the error.

-

URI map holding the main components of a URI.

+

Map holding the main components of a URI.

-

List of unicode codepoints, UTF-8 encoded binary, or a mix of the two, +

List of unicode codepoints, a UTF-8 encoded binary, or a mix of the two, representing an RFC 3986 compliant URI (percent-encoded form). A URI is a sequence of characters from a very limited set: the letters of @@ -134,10 +148,11 @@ Compose urlencoded query string.

Composes a form-urlencoded QueryString based on a - QueryList, a list of unescaped key-value pairs. - Media type application/x-www-form-urlencoded is defined in section + QueryList, a list of non-percent-encoded key-value pairs. + Form-urlencoding is defined in section 8.2.1 of RFC 1866 - (HTML 2.0). Reserved and unsafe characters, as + (HTML 2.0) for media type application/x-www-form-urlencoded. + Reserved and unsafe characters, as defined by RFC 1738 (Uniform Resource Locators), are percent-encoded.

See also the opposite operation @@ -145,9 +160,8 @@

Example:

-1> uri_string:compose_query([{"foo bar","1"},{"city","örebro"}],
-1> [{separator, semicolon}]).
-"foo+bar=1;city=%C3%B6rebro"
+1> uri_string:compose_query([{"foo bar","1"},{"city","örebro"}]).
+
 2> <![CDATA[uri_string:compose_query([{<<"foo bar">>,<<"1">>},
 2> {<<"city">>,<<"örebro"/utf8>>}]).]]>
 >]]>
@@ -169,7 +183,10 @@
         
 1> uri_string:compose_query([{"foo bar","1"},{"city","örebro"}],
 1> [{separator, amp}]).
-
+ uri_string:compose_query([{<<"foo bar">>,<<"1">>},
+2> {<<"city">>,<<"örebro"/utf8>>}], [{separator, escaped_amp}]).]]>
+>]]>
 	
@@ -179,12 +196,15 @@ Dissect query string.

Dissects an urlencoded QueryString and returns a - QueryList, a list of unescaped key-value pairs. - Media type application/x-www-form-urlencoded is defined in section + QueryList, a list of non-percent-encoded key-value pairs. + Form-urlencoding is defined in section 8.2.1 of RFC 1866 - (HTML 2.0). Percent-encoded segments are decoded - as defined by RFC 1738 + (HTML 2.0) for media type application/x-www-form-urlencoded. + Percent-encoded segments are decoded as defined by + RFC 1738 (Uniform Resource Locators).

+

Supported separator types: amp (&), escaped_amp + (&amp;) and semicolon (;).

See also the opposite operation compose_query/1.

@@ -192,18 +212,42 @@
 1> uri_string:dissect_query("foo+bar=1;city=%C3%B6rebro").
 [{"foo bar","1"},{"city","örebro"}]
-2> >).]]>
+2> >).]]>
 <![CDATA[[{<<"foo bar">>,<<"1">>},{<<"city">>,<<"örebro"/utf8>>}] ]]>
 	
+ + + Syntax-based normalization. + +

Transforms URIString into a normalized form + using Syntax-Based Normalization as defined by + RFC 3986.

+

This function implements case normalization, percent-encoding + normalization, path segment normalization and scheme-based normalization + for HTTP(S) with basic support for FTP, SSH, SFTP and TFTP.

+

Example:

+
+1> uri_string:normalize("/a/b/c/./../../g").
+"/a/g"
+2> >).]]>
+>]]>
+3> uri_string:normalize("http://localhost:80").
+"http://localhost/"
+	
+
+
+ Parse URI into a map. -

Returns a URIMap, that is a uri_map() with the parsed components - of the URIString. If parsing fails, an error tuple is returned.

+

Parses an RFC 3986 + compliant uri_string() into a uri_map() that holds the parsed + components of the URI. + If parsing fails, an error tuple is returned.

See also the opposite operation recompose/1.

Example:

@@ -224,8 +268,9 @@ Recompose URI. -

Returns an RFC 3986 compliant - URIString (percent-encoded). +

Creates an RFC 3986 compliant + URIString (percent-encoded), based on the components of + URIMap. If the URIMap is invalid, an error tuple is returned.

See also the opposite operation parse/1.

@@ -249,13 +294,19 @@

Transcodes an RFC 3986 compliant URIString, where Options is a list of tagged tuples, specifying the inbound - (in_encoding) and outbound (out_encoding) encodings. + (in_encoding) and outbound (out_encoding) encodings. in_encoding + and out_encoding specify both binary encoding and percent-encoding for the + input and output data. Mixed encoding, where binary encoding is not the same as + percent-encoding, is not supported. If an argument is invalid, an error tuple is returned.

Example:

 1> >,]]>
 1> [{in_encoding, utf32},{out_encoding, utf8}]).
 >]]>
+2> uri_string:transcode("foo%F6bar", [{in_encoding, latin1},
+2> {out_encoding, utf8}]).
+"foo%C3%B6bar"
 	
-- cgit v1.2.3
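
Note: the behaviour documented in this patch can be exercised with a small sketch like the one below. The module name uri_string_sketch and the function run/0 are hypothetical and not part of the patch; only the uri_string calls and the list-input results are taken from the documented examples. The binary result assumes the rule stated in the patch that binary input returns binary output. Each match fails with a badmatch error if the library returns something else.

```erlang
%% uri_string_sketch.erl: hypothetical demo module, not part of the patch.
-module(uri_string_sketch).
-export([run/0]).

run() ->
    %% Path segment normalization; list input yields list output.
    "/a/g" = uri_string:normalize("/a/b/c/./../../g"),
    %% Assumption from the documented type rule: binary in, binary out.
    <<"/a/g">> = uri_string:normalize(<<"/a/b/c/./../../g">>),
    %% Latin-1 "%F6" and UTF-8 "%C3%B6" both denote the character "ö";
    %% transcode/2 converts the percent-encoding between the two.
    "foo%C3%B6bar" = uri_string:transcode("foo%F6bar",
                                          [{in_encoding, latin1},
                                           {out_encoding, utf8}]),
    ok.
```

Compiling the module and calling uri_string_sketch:run() in an Erlang shell should return ok on a release that ships uri_string.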