From 80feeb36f92a923f57f740c7c28c12bb8b69ec16 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?P=C3=A9ter=20Dimitrov?=
+This module contains functions for parsing and handling RFC 3986 compliant URIs.
+A URI is an identifier consisting of a sequence of characters matching the syntax
+ rule named URI in RFC 3986. The generic URI syntax consists of a hierarchical sequence of components referred
+ to as the scheme, authority, path, query, and fragment:
+ URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
+ hier-part = "//" authority path-abempty
+ / path-absolute
+ / path-rootless
+ / path-empty
+ scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
+ authority = [ userinfo "@" ] host [ ":" port ]
+ userinfo = *( unreserved / pct-encoded / sub-delims / ":" )
+
+ reserved = gen-delims / sub-delims
+ gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
+ sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
+ / "*" / "+" / "," / ";" / "="
+
+ unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
+
+
+The interpretation of a URI depends only on the characters used and not on how those
+ characters are represented in a network protocol.
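The generic syntax above can be approximated with the regular expression given in RFC 3986, Appendix B. The following is a minimal sketch of that idea and is not part of `uri_string`; the module name `uri_split_sketch` and the function `components/1` are hypothetical, and the sketch only splits the string without validating it or decoding percent-encoding.

```erlang
-module(uri_split_sketch).
-export([components/1]).

%% Regular expression from RFC 3986, Appendix B. Groups 2, 4, 5, 7 and 9
%% hold the scheme, authority, path, query and fragment respectively.
-define(URI_RE,
        "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?").

%% Hypothetical helper (an assumption for illustration): splits a URI given
%% as a list into its five generic components. Missing components come back
%% as empty lists because every group in the expression is optional.
components(URI) when is_list(URI) ->
    {match, [Scheme, Authority, Path, Query, Fragment]} =
        re:run(URI, ?URI_RE, [{capture, [2, 4, 5, 7, 9], list}]),
    #{scheme => Scheme,
      authority => Authority,
      path => Path,
      query => Query,
      fragment => Fragment}.
```

Unlike this sketch, `parse/1` also breaks the authority into userinfo, host and port.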
+The functions implemented by this module covers the following use cases:
+
+
+
+
+
+
+
+
+
+
There are four different encodings present during the handling of URIs:
+
+
+
Unless otherwise specified the return value type and encoding are the same as the input
+ type and encoding. That is, binary input returns binary output, list input returns a list
+ output but mixed input returns list output. Input and output encodings are the same except
+ for
All of the functions but
Maybe improper list of bytes (0..255).
+URI map holding the main components of a URI.
+List of unicode codepoints, UTF-8 encoded binary, or a mix of the two,
+ representing an RFC 3986 compliant URI (percent-encoded form).
+ A URI is a sequence of characters from a very limited set: the letters of
+ the basic Latin alphabet, digits, and a few special characters.
+Composes an urlencoded
If an argument is invalid, a
Example:
++1> uri_string:compose_query(...). ++
Creates an RFC 3986 compliant
If an argument is invalid, a
Example:
++1> uri_string:create_uri_reference(...,...). ++
Dissects an urlencoded
If an argument is invalid, a
Example:
++1> uri_string:dissect_query(...). ++
Normalizes an RFC 3986 compliant
If an argument is invalid, a
Example:
+1> uri_string:normalize("http://example.org/one/two/../../one").
+"http://example.org/one"
Returns a
If parsing fails, a
Example:
+1> uri_string:parse("foo://user@example.com:8042/over/there?name=ferret#nose").
+#{fragment => "nose",host => "example.com",
+  path => "/over/there",port => 8042,query => "name=ferret",
+  scheme => foo,userinfo => "user"}
+2>
Returns an RFC 3986 compliant
If the
Example:
+1> URIMap = #{fragment => "nose", host => "example.com", path => "/over/there",
+port => 8042, query => "name=ferret", scheme => foo, userinfo => "user"}.
+#{fragment => "nose",host => "example.com",
+  path => "/over/there",port => 8042,query => "name=ferret",
+  scheme => foo,userinfo => "user"}
+
+2> uri_string:recompose(URIMap, []).
+"foo://user@example.com:8042/over/there?name=ferret#nose"
Resolves an RFC 3986 compliant
If an argument is invalid, a
Example:
++1> uri_string:resolve_uri_reference(...,...). ++
Transcodes an RFC 3986 compliant
If an argument is invalid, a
Example:
+1> uri_string:transcode(<<"foo://f%20oo">>, [{in_encoding, utf8},
+{out_encoding, utf16}]).
+<<0,102,0,111,0,111,0,58,0,47,0,47,0,102,0,37,0,48,0,48,0,37,0,50,0,48,0,
+  111,0,111>>
Maybe improper list of bytes (0..255).
-A URI is an identifier consisting of a sequence of characters matching the syntax rule named URI in RFC 3986.
The generic URI syntax consists of a hierarchical sequence of components referred - to as the scheme, authority, path, query, and fragment:
+ to as the scheme, authority, path, query, and fragment: +URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ] hier-part = "//" authority path-abempty / path-absolute @@ -51,35 +52,26 @@ unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
-The interpretation of a URI depends only on the characters used and not on how those characters are represented in a network protocol.
-The functions implemented by this module covers the following use cases: +
The functions implemented by this module covers the following use cases:
- -
- Parsing URIs
parse/1 - Recomposing URIs
-
recompose/2 - Resolving URI references
-
-resolve_uri_reference/3 - Creating URI references
-
-create_uri_reference/3 - Normalizing URIs
-normalize/1 - Transcoding URIs
-
transcode/2 - Working with urlencoded query strings
+
-compose_query/1, dissect_query/1 - Working with form-urlencoded query strings
+compose_query/[1,2], dissect_query/1 There are four different encodings present during the handling of URIs: +
There are four different encodings present during the handling of URIs:
-
- Inbound binary encoding in binaries
- Inbound percent-encoding in lists and binaries
- Outbound binary encoding in binaries
- Outbound percent-encoding in lists and binaries
Unless otherwise specified the return value type and encoding are the same as the input type and encoding. That is, binary input returns binary output, list input returns a list output but mixed input returns list output. Input and output encodings are the same except @@ -113,31 +105,34 @@
Compose urlencoded query string. - Composes an urlencoded
based on a + QueryString Composes a form-urlencoded
-based on a QueryString , a list of unescaped key-value pairs. Media type QueryList application/x-www-form-urlencoded is defined in section
- 8.2.1 of RFC 1866 (HTML 2.0).
+ 8.2.1 of RFC 1866 (HTML 2.0). Reserved and unsafe characters, as
+ defined by RFC 1738 (Uniform Resource Locators), are percent-encoded.
If an argument is invalid, a
badarg exception is raised.Example:
-1> uri_string:compose_query(...). -+1> uri_string:compose_query([{"foo bar","1"},{"city","örebro"}]). + +
Creates an RFC 3986 compliant
If an argument is invalid, a
Same as
Example:
-1> uri_string:create_uri_reference(...,...). -+1> uri_string:compose_query([{"foo bar","1"},{"city","örebro"}], +2> [{separator, semicolon}]). +"foo+bar=1;city=%C3%B6rebro" +
Dissects an urlencoded
If an argument is invalid, a
Example:
-1> uri_string:dissect_query(...). -- - - -
Normalizes an RFC 3986 compliant
If an argument is invalid, a
Example:
--1> uri_string:normalize("http://example.org/one/two/../../one"). -"http://example.org/one" -+1> uri_string:dissect_query("foo+bar=1;city=%C3%B6rebro"). +[{"foo bar","1"},{"city","örebro"}] +
Returns a
If parsing fails, a
If parsing fails, an error tuple is returned.
Example:
1> uri_string:parse("foo://user@example.com:8042/over/there?name=ferret#nose"). #{fragment => "nose",host => "example.com", path => "/over/there",port => 8042,query => "name=ferret", scheme => foo,userinfo => "user"} -2>+
Returns an RFC 3986 compliant
If the
If the
Example:
1> URIMap = #{fragment => "nose", host => "example.com", path => "/over/there", -port => 8042, query => "name=ferret", scheme => foo, userinfo => "user"}. +port => 8042, query => "name=ferret", scheme => "foo", userinfo => "user"}. #{fragment => "top",host => "example.com", path => "/over/there",port => 8042,query => "?name=ferret", scheme => foo,userinfo => "user"} -2> uri_string:recompose(URIMap, []). +2> uri_string:recompose(URIMap). "foo://example.com:8042/over/there?name=ferret#nose"
Resolves an RFC 3986 compliant
If an argument is invalid, a
Example:
--1> uri_string:resolve_uri_reference(...,...). --
Transcodes an RFC 3986 compliant
If an argument is invalid, a
If an argument is invalid, an error tuple is returned.
Example:
-1> uri_string:transcode(<<"foo://f%20oo">>, [{in_encoding, utf8}, -{out_encoding, utf16}]). -<<0,102,0,111,0,111,0,58,0,47,0,47,0,102,0,37,0,48,0,48,0,37,0,50,0,48,0, - 111,0,111>> -+1> >,]]> +2> [{in_encoding, utf32},{out_encoding, utf8}]). +>]]> +
This module contains functions for parsing and handling RFC 3986 compliant URIs.
+This module contains functions for parsing and handling URIs (RFC 3986) and + form-urlencoded query strings (RFC 1866).
A URI is an identifier consisting of a sequence of characters matching the syntax rule named URI in RFC 3986.
The generic URI syntax consists of a hierarchical sequence of components referred
@@ -109,7 +110,7 @@
Example:
@@ -125,8 +126,7 @@
@@ -143,13 +181,19 @@
 Same as compose_query/1 but with an additional Options parameter, that controls the type of separator used
- between key-value pairs. There are two supported separator types: amp ()
- and semicolon (;).
+ between key-value pairs. There are three supported separator types: amp (), escaped_amp () and semicolon (;). If the parameter Options is empty, separator takes the default value (escaped_amp).
 Example:
1> uri_string:compose_query([{"foo bar","1"},{"city","örebro"}], -- cgit v1.2.3 From 642bb27f8104991445a1f507f6b065d3cd7cd1ae Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?P=C3=A9ter=20Dimitrov?=Date: Tue, 24 Oct 2017 09:17:55 +0200 Subject: stdlib: Fix title in uri_string.xml --- lib/stdlib/doc/src/uri_string.xml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'lib/stdlib/doc/src') diff --git a/lib/stdlib/doc/src/uri_string.xml b/lib/stdlib/doc/src/uri_string.xml index 97b38ea93e..d67c687fd1 100644 --- a/lib/stdlib/doc/src/uri_string.xml +++ b/lib/stdlib/doc/src/uri_string.xml @@ -21,10 +21,10 @@ limitations under the License. - maps +uri_string Péter Dimitrov 1 -2017-10-20 +2017-10-24 A uri_string -- cgit v1.2.3 From b0c682a8118c5775da784e9a0f569ee995319f80 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?P=C3=A9ter=20Dimitrov?=Date: Thu, 26 Oct 2017 11:29:48 +0200 Subject: stdlib: Update documentation, error tuples --- lib/stdlib/doc/src/uri_string.xml | 117 +++++++++++++++++++++++++++----------- 1 file changed, 85 insertions(+), 32 deletions(-) (limited to 'lib/stdlib/doc/src') diff --git a/lib/stdlib/doc/src/uri_string.xml b/lib/stdlib/doc/src/uri_string.xml index d67c687fd1..8322eecb24 100644 --- a/lib/stdlib/doc/src/uri_string.xml +++ b/lib/stdlib/doc/src/uri_string.xml @@ -30,10 +30,13 @@ uri_string URI processing functions. - This module contains functions for parsing and handling URIs (RFC 3986) and - form-urlencoded query strings (RFC 1866).
+This module contains functions for parsing and handling URIs + (
RFC 3986 ) and + form-urlencoded query strings (RFC 1866 ). +A URI is an identifier consisting of a sequence of characters matching the syntax - rule named URI in RFC 3986.
+ rule named URI inRFC 3986 . +The generic URI syntax consists of a hierarchical sequence of components referred to as the scheme, authority, path, query, and fragment:
@@ -55,16 +58,24 @@
The interpretation of a URI depends only on the characters used and not on how those characters are represented in a network protocol.
-The functions implemented by this module covers the following use cases:
+The functions implemented by this module cover the following use cases:
-
- Parsing URIs
-
-parse/1 - Recomposing URIs
-
-recompose/2 - Transcoding URIs
-
-transcode/2 - Working with form-urlencoded query strings
+
-compose_query/[1,2], dissect_query/1 - Parsing a URI into its components and returning a map
+
++ parse/1 - Recomposing a map of URI components into a URI string
+
++ recompose/1 - Changing inbound binary and percent-encoding of URIs
+
++ transcode/2 - Composing form-urlencoded query strings from a list of key-value pairs
+
+compose_query/1
++ compose_query/2 - Dissecting form-urlencoded query strings into a list of key-value pairs
++ dissect_query/1 There are four different encodings present during the handling of URIs:
@@ -75,14 +86,29 @@
Unless otherwise specified the return value type and encoding are the same as the input type and encoding. That is, binary input returns binary output, list input returns a list - output but mixed input returns list output. Input and output encodings are the same except - for
+ output but mixed input returns list output.transcode/2 .All of the functions but
transcode/2 expects input as unicode codepoints in lists, UTF-8 encoding in binaries and UTF-8 encoding in percent-encoded URI parts.transcode/2 provides the means to convert between the supported URI encodings.+ + + + +Error tuple indicating the type of error. Possible values of the second component:
++
+- +
invalid_character - +
invalid_input - +
invalid_map - +
invalid_percent_encoding - +
invalid_scheme - +
invalid_uri - +
invalid_utf8 - +
missing_value @@ -93,7 +119,8 @@ @@ -127,11 +162,14 @@@@ -109,13 +136,21 @@ List of unicode codepoints, UTF-8 encoded binary, or a mix of the two, - representing an RFC 3986 compliant URI (percent-encoded form). + representing an
RFC 3986 + compliant URI (percent-encoded form). A URI is a sequence of characters from a very limited set: the letters of the basic Latin alphabet, digits, and a few special characters.Composes a form-urlencoded
+based on a QueryString , a list of unescaped key-value pairs. Media type QueryList application/x-www-form-urlencoded is defined in section - 8.2.1 ofRFC 1866 (HTML 2.0). Reserved and unsafe characters, as - defined by RFC 1738 (Uniform Resource Locators), are percent-encoded. + 8.2.1 ofRFC 1866 + (HTML 2.0). Reserved and unsafe characters, as + defined byRFC 1738 + (Uniform Resource Locators), are percent-encoded.See also the opposite operation
+ .dissect_query/1 Example:
-1> uri_string:compose_query([{"foo bar","1"},{"city","örebro"}]). - +1> uri_string:compose_query([{"foo bar","1"},{"city","örebro"}], +1> [{separator, semicolon}]). +"foo+bar=1;city=%C3%B6rebro" +2> >,<<"1">>}, +2> {<<"city">>,<<"örebro"/utf8>>}]).]]> +>]]>Same as
+compose_query/1 but with an additional Options parameter, that controls the type of separator used between key-value pairs. There are three supported separator types: amp (), escaped_amp () and semicolon (;). If the parameter Options is empty, separator takes the default value (escaped_amp).
See also the opposite operation
+ . +dissect_query/1 Example:
1> uri_string:compose_query([{"foo bar","1"},{"city","örebro"}], -2> [{separator, semicolon}]). -"foo+bar=1;city=%C3%B6rebro" +1> [{separator, amp}]). +Dissects an urlencoded
+and returns a QueryString , a list of unescaped key-value pairs. Media type QueryList application/x-www-form-urlencoded is defined in section - 8.2.1 ofRFC 1866 (HTML 2.0). Percent-encoded segments are decoded - as defined by RFC 1738 (Uniform Resource Locators). + 8.2.1 ofRFC 1866 + (HTML 2.0). Percent-encoded segments are decoded + as defined byRFC 1738 + (Uniform Resource Locators).See also the opposite operation
+ .compose_query/1 Example:
1> uri_string:dissect_query("foo+bar=1;city=%C3%B6rebro"). [{"foo bar","1"},{"city","örebro"}] +2> >).]]> +>,<<"1">>},{<<"city">>,<<"örebro"/utf8>>}] ]]>@@ -159,14 +203,19 @@Parse URI into a map. @@ -175,12 +224,15 @@ Returns a
-URIMap , that is a uri_map() with the parsed components - of the. URIString If parsing fails, an error tuple is returned.
+ of the. If parsing fails, an error tuple is returned. + URIString See also the opposite operation
+ .recompose/1 Example:
1> uri_string:parse("foo://user@example.com:8042/over/there?name=ferret#nose"). #{fragment => "nose",host => "example.com", path => "/over/there",port => 8042,query => "name=ferret", scheme => foo,userinfo => "user"} +2> >).]]> + <<"example.com">>,path => <<"/over/there">>, + port => 8042,query => <<"name=ferret">>,scheme => <<"foo">>, + userinfo => <<"user">>}]]>Recompose URI. - Returns an RFC 3986 compliant
-(percent-encoded). URIString If the
+is invalid, an error tuple is returned. URIMap Returns an
+RFC 3986 compliant +(percent-encoded). + If the URIString is invalid, an error tuple is returned. URIMap See also the opposite operation
+ .parse/1 Example:
1> URIMap = #{fragment => "nose", host => "example.com", path => "/over/there", -port => 8042, query => "name=ferret", scheme => "foo", userinfo => "user"}. +1> port => 8042, query => "name=ferret", scheme => "foo", userinfo => "user"}. #{fragment => "top",host => "example.com", path => "/over/there",port => 8042,query => "?name=ferret", scheme => foo,userinfo => "user"} @@ -194,14 +246,15 @@ port => 8042, query => "name=ferret", scheme => "foo", userinfo => "user"}.Transcode URI. - -- cgit v1.2.3 From f7d3033dfeeb012841729bf8ed3889da8457b4f7 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?P=C3=A9ter=20Dimitrov?=Transcodes an RFC 3986 compliant
, + URIString Transcodes an
-RFC 3986 + compliant, where URIString is a list of tagged tuples, specifying the inbound - ( Options in_encoding ) and outbound (out_encoding ) encodings.If an argument is invalid, an error tuple is returned.
+ (in_encoding ) and outbound (out_encoding ) encodings. + If an argument is invalid, an error tuple is returned.Example:
1> >,]]> -2> [{in_encoding, utf32},{out_encoding, utf8}]). +1> [{in_encoding, utf32},{out_encoding, utf8}]). >]]>Date: Mon, 30 Oct 2017 13:38:28 +0100 Subject: stdlib: Update documentation (normalize/1) --- lib/stdlib/doc/src/uri_string.xml | 115 +++++++++++++++++++++++++++----------- 1 file changed, 83 insertions(+), 32 deletions(-) (limited to 'lib/stdlib/doc/src') diff --git a/lib/stdlib/doc/src/uri_string.xml b/lib/stdlib/doc/src/uri_string.xml index 8322eecb24..55d8690b98 100644 --- a/lib/stdlib/doc/src/uri_string.xml +++ b/lib/stdlib/doc/src/uri_string.xml @@ -69,6 +69,9 @@ - Changing inbound binary and percent-encoding of URIs
+
transcode/2 - Transforming URIs into a normalized form
++ normalize/1 - Composing form-urlencoded query strings from a list of key-value pairs
compose_query/1
@@ -84,12 +87,21 @@ compose_query/2 - Outbound binary encoding in binaries
- Outbound percent-encoding in lists and binaries
+Functions with a uri_string() argument accept lists, binaries and
+ mixed lists (lists with binary elements) as input type. All of the functions but
+ transcode/2 expect input as lists of unicode codepoints, UTF-8 encoded binaries
+ and UTF-8 percent-encoded URI parts ("%C3%B6" corresponds to the unicode character "ö").
 Unless otherwise specified, the return value type and encoding are the same as the input type and encoding. That is, binary input returns binary output, list input returns list output, and mixed input returns list output.
-All of the functions but
+transcode/2 expects input as unicode codepoints in - lists, UTF-8 encoding in binaries and UTF-8 encoding in percent-encoded URI parts. -transcode/2 provides the means to convert between the supported URI encodings.In case of lists there is only percent-encoding. In binaries, however, both binary encoding + and percent-encoding shall be considered.
+transcode/2 provides the means to convert
+ between the supported encodings: it takes a uri_string() and a list of options
+ specifying the inbound and outbound encodings.
+RFC 3986 does not mandate any specific
+ character encoding; it is usually defined by the protocol or the surrounding text. This library
+ makes the same assumption: binary encoding and percent-encoding are handled as one configuration unit
+ and cannot be set to different values.
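The type rule described above (list in, list out; binary in, binary out) can be checked with a small sketch. It assumes the revision of `parse/1` in which the scheme is returned as a string rather than an atom; the module name `uri_types_sketch` and the function `same_type/0` are hypothetical.

```erlang
-module(uri_types_sketch).
-export([same_type/0]).

%% Hypothetical check (an assumption for illustration): the pattern matches
%% below crash with badmatch if the output type does not follow the input type.
same_type() ->
    %% List input: the textual components are returned as lists.
    #{scheme := "http", host := "example.com", path := "/a"} =
        uri_string:parse("http://example.com/a"),
    %% Binary input: the same components are returned as binaries.
    #{scheme := <<"http">>, host := <<"example.com">>, path := <<"/a">>} =
        uri_string:parse(<<"http://example.com/a">>),
    ok.
```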
Error tuple indicating the type of error. Possible values of the second component:
-The third component is a list or binary providing additional information about the + cause of the error.
URI map holding the main components of a URI.
+Map holding the main components of a URI.
List of unicode codepoints, UTF-8 encoded binary, or a mix of the two, +
List of unicode codepoints, a UTF-8 encoded binary, or a mix of the two,
representing an Composes a form-urlencoded See also the opposite operation Example: Dissects an urlencoded Supported separator types: See also the opposite operation Transforms This function implements case normalization, percent-encoding
+ normalization, path segment normalization and scheme based normalization
+ for HTTP(S) with basic support for FTP, SSH, SFTP and TFTP. Example: Returns a Parses an See also the opposite operation Example: Returns an Creates an See also the opposite operation Transcodes an Example: The third component is a list or binary providing additional information about the
+ The third component is a term providing additional information about the
cause of the error.
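Because errors are returned as three-element tuples rather than raised as exceptions, callers usually branch on the result. The wrapper below is a minimal sketch of that pattern; `uri_error_sketch` and `safe_parse/1` are hypothetical names, and the `{ok, Map}` wrapping is a choice made for this illustration, not something the module does itself.

```erlang
-module(uri_error_sketch).
-export([safe_parse/1]).

%% Hypothetical wrapper (an assumption for illustration): normalizes the
%% result of uri_string:parse/1 into {ok, Map} | {error, {Reason, Detail}}.
safe_parse(URIString) ->
    case uri_string:parse(URIString) of
        %% Error tuple: the second element is the error type, the third
        %% element carries additional information about the cause.
        {error, Reason, Detail} ->
            {error, {Reason, Detail}};
        %% Successful parse: a map of URI components.
        URIMap when is_map(URIMap) ->
            {ok, URIMap}
    end.
```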
-1> uri_string:compose_query([{"foo bar","1"},{"city","örebro"}],
-1> [{separator, semicolon}]).
-"foo+bar=1;city=%C3%B6rebro"
+1> uri_string:compose_query([{"foo bar","1"},{"city","örebro"}]).
+
2> >,<<"1">>},
2> {<<"city">>,<<"örebro"/utf8>>}]).]]>
>]]>
@@ -169,7 +183,10 @@
1> uri_string:compose_query([{"foo bar","1"},{"city","örebro"}],
1> [{separator, amp}]).
-
+ uri_string:compose_query([{<<"foo bar">>,<<"1">>},
+2> {<<"city">>,<<"örebro"/utf8>>}], [{separator, escaped_amp}]).]]>
+>]]>
1> uri_string:dissect_query("foo+bar=1;city=%C3%B6rebro").
[{"foo bar","1"},{"city","örebro"}]
-2> >).]]>
+2> >).]]>
>,<<"1">>},{<<"city">>,<<"örebro"/utf8>>}] ]]>
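Since `compose_query/2` and `dissect_query/1` are documented as opposite operations, a round trip should give back the original key-value list. The following is a sketch of that check using the semicolon separator from the examples in this patch; `query_roundtrip_sketch` and `roundtrip/1` are hypothetical names.

```erlang
-module(query_roundtrip_sketch).
-export([roundtrip/1]).

%% Hypothetical helper (an assumption for illustration): composes a query
%% string from QueryList, dissects it again and asserts that the result
%% equals the input. Returns the composed query string.
roundtrip(QueryList) ->
    QueryString = uri_string:compose_query(QueryList, [{separator, semicolon}]),
    %% QueryList is already bound, so this match fails with badmatch
    %% if the round trip does not reproduce the input.
    QueryList = uri_string:dissect_query(QueryString),
    QueryString.
```

With the list used in the examples, `[{"foo bar","1"},{"city","örebro"}]`, the composed string is the `"foo+bar=1;city=%C3%B6rebro"` shown above.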
+1> uri_string:normalize("/a/b/c/./../../g").
+"/a/g"
+2> >).]]>
+>]]>
+3> uri_string:normalize("http://localhost:80").
+"http://localhost/"
+
+
1> >,]]>
1> [{in_encoding, utf32},{out_encoding, utf8}]).
>]]>
+2> uri_string:transcode("foo%F6bar", [{in_encoding, latin1},
+2> {out_encoding, utf8}]).
+"foo%C3%B6bar"
This module contains functions for parsing and handling URIs
- (
A URI is an identifier consisting of a sequence of characters matching the syntax
rule named URI in
-
-
-
There are four different encodings present during the handling of URIs:
Error tuple indicating the type of error. Possible values of the second component:
The third component is a term providing additional information about the cause of the error.
@@ -143,81 +133,6 @@Composes a form-urlencoded
See also the opposite operation
Example:
--1> uri_string:compose_query([{"foo bar","1"},{"city","örebro"}]). - -2> >,<<"1">>}, -2> {<<"city">>,<<"örebro"/utf8>>}]).]]> ->]]> --
Same as
See also the opposite operation
Example:
--1> uri_string:compose_query([{"foo bar","1"},{"city","örebro"}], -1> [{separator, amp}]). - uri_string:compose_query([{<<"foo bar">>,<<"1">>}, -2> {<<"city">>,<<"örebro"/utf8>>}], [{separator, escaped_amp}]).]]> ->]]> --
Dissects an urlencoded
Supported separator types:
See also the opposite operation
Example:
--1> uri_string:dissect_query("foo+bar=1;city=%C3%B6rebro"). -[{"foo bar","1"},{"city","örebro"}] -2> >).]]> ->,<<"1">>},{<<"city">>,<<"örebro"/utf8>>}] ]]> --