From 80feeb36f92a923f57f740c7c28c12bb8b69ec16 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?P=C3=A9ter=20Dimitrov?=
Date: Fri, 28 Jul 2017 11:04:19 +0200
Subject: stdlib: Add API and doc of uri_string module

---
 lib/stdlib/doc/src/Makefile       |   1 +
 lib/stdlib/doc/src/ref_man.xml    |   1 +
 lib/stdlib/doc/src/specs.xml      |   1 +
 lib/stdlib/doc/src/uri_string.xml | 255 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 258 insertions(+)
 create mode 100644 lib/stdlib/doc/src/uri_string.xml

(limited to 'lib/stdlib/doc/src')

diff --git a/lib/stdlib/doc/src/Makefile b/lib/stdlib/doc/src/Makefile
index 93eac8220d..aeed79408b 100644
--- a/lib/stdlib/doc/src/Makefile
+++ b/lib/stdlib/doc/src/Makefile
@@ -98,6 +98,7 @@ XML_REF3_FILES = \
 	sys.xml \
 	timer.xml \
 	unicode.xml \
+	uri_string.xml \
 	win32reg.xml \
 	zip.xml
 
diff --git a/lib/stdlib/doc/src/ref_man.xml b/lib/stdlib/doc/src/ref_man.xml
index 878a3babc5..68bfddbc71 100644
--- a/lib/stdlib/doc/src/ref_man.xml
+++ b/lib/stdlib/doc/src/ref_man.xml
@@ -93,6 +93,7 @@
+
diff --git a/lib/stdlib/doc/src/specs.xml b/lib/stdlib/doc/src/specs.xml
index 45b207b13d..d559adf9b6 100644
--- a/lib/stdlib/doc/src/specs.xml
+++ b/lib/stdlib/doc/src/specs.xml
@@ -60,6 +60,7 @@
+
diff --git a/lib/stdlib/doc/src/uri_string.xml b/lib/stdlib/doc/src/uri_string.xml
new file mode 100644
index 0000000000..e6b2bd5e80
--- /dev/null
+++ b/lib/stdlib/doc/src/uri_string.xml
@@ -0,0 +1,255 @@
+
+
+
+
+
+
+      2017 2017
+      Ericsson AB. All Rights Reserved.
+
+
+      Licensed under the Apache License, Version 2.0 (the "License");
+      you may not use this file except in compliance with the License.
+      You may obtain a copy of the License at
+
+          http://www.apache.org/licenses/LICENSE-2.0
+
+      Unless required by applicable law or agreed to in writing, software
+      distributed under the License is distributed on an "AS IS" BASIS,
+      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+      See the License for the specific language governing permissions and
+      limitations under the License.
+
+
+
+    maps
+    Péter Dimitrov
+    1
+    2017-08-23
+    A
+
+ uri_string + RFC 3986 compliant URI processing functions. + +

This module contains functions for parsing and handling RFC 3986 compliant URIs.

+

A URI is an identifier consisting of a sequence of characters matching the syntax + rule named URI in RFC 3986.

+

The generic URI syntax consists of a hierarchical sequence of components referred + to as the scheme, authority, path, query, and fragment:

+    URI         = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
+    hier-part   = "//" authority path-abempty
+                   / path-absolute
+                   / path-rootless
+                   / path-empty
+    scheme      = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
+    authority   = [ userinfo "@" ] host [ ":" port ]
+    userinfo    = *( unreserved / pct-encoded / sub-delims / ":" )
+
+    reserved    = gen-delims / sub-delims
+    gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"
+    sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
+                / "*" / "+" / "," / ";" / "="
+
+    unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
+    


+

+

The interpretation of a URI depends only on the characters used and not on how those + characters are represented in a network protocol.

+

The functions implemented by this module covers the following use cases: + + Parsing URIs

+ parse/1
+ Recomposing URIs

+ recompose/2
+ Resolving URI references

+ resolve_uri_reference/3
+ Creating URI references

+ create_uri_reference/3
+ Normalizing URIs

+ normalize/1
+ Transcoding URIs

+ transcode/2
+ Working with urlencoded query strings

+ compose_query/1, dissect_query/1
+
+

+

There are four different encodings present during the handling of URIs: + + Inbound binary encoding in binaries + Inbound percent-encoding in lists and binaries + Outbound binary encoding in binaries + Outbound percent-encoding in lists and binaries + +

+

Unless otherwise specified the return value type and encoding are the same as the input + type and encoding. That is, binary input returns binary output, list input returns a list + output but mixed input returns list output. Input and output encodings are the same except + for transcode/2.

+

All of the functions but transcode/2 expects input as unicode codepoints in + lists, UTF-8 encoding in binaries and UTF-8 encoding in percent-encoded URI parts. + transcode/2 provides the means to convert between the supported URI encodings.

+
+ + + + + +

Maybe improper list of bytes (0..255).

+
+
+ + + +

URI map holding the main components of a URI.

+
+
+ + + +

List of unicode codepoints, UTF-8 encoded binary, or a mix of the two, + representing an RFC 3986 compliant URI (percent-encoded form). + A URI is a sequence of characters from a very limited set: the letters of + the basic Latin alphabet, digits, and a few special characters.

+
+
+
+ + + + + + Compose urlencoded query string. + +

Composes an urlencoded QueryString based on a + QueryList, a list of unescaped key-value pairs. + Media type application/x-www-form-urlencoded is defined in section + 8.2.1 of RFC 1866 (HTML 2.0). +

+

If an argument is invalid, a badarg exception is raised.

+

Example:

+
+1> uri_string:compose_query(...).
+
+
+
+ + + + Create references. + +

Creates an RFC 3986 compliant RelativeDestURI, + based AbsoluteSourceURI and AbsoluteSourceURI +

+

If an argument is invalid, a badarg exception is raised.

+

Example:

+
+1> uri_string:create_uri_reference(...,...).
+
+
+
+ + + + Dissect query string. + +

Dissects an urlencoded QueryString and returns a + QueryList, a list of unescaped key-value pairs. + Media type application/x-www-form-urlencoded is defined in section + 8.2.1 of RFC 1866 (HTML 2.0). +

+

If an argument is invalid, a badarg exception is raised.

+

Example:

+
+1> uri_string:dissect_query(...).
+
+
+
+ + + + Normalize URI. + +

Normalizes an RFC 3986 compliant URIString and returns + a NormalizedURI. The algorithm used to shorten the input + URI is called Syntax-Based Normalization and described at + Section 6.2.2 of RFC 3986. +

+

If an argument is invalid, a badarg exception is raised.

+

Example:

+
+1> uri_string:normalize("http://example.org/one/two/../../one").
+"http://example.org/one"
+
+
+
+ + + + Parse URI into a map. + +

Returns a URIMap, that is a uri_map() with the parsed components + of the URIString.

+

If parsing fails, a parse_error exception is raised.

+

Example:

+
+1> uri_string:parse("foo://user@example.com:8042/over/there?name=ferret#nose").
+#{fragment => "nose",host => "example.com",
+  path => "/over/there",port => 8042,query => "name=ferret",
+  scheme => foo,userinfo => "user"}
+2> 
+
+
+ + + + Recompose URI. + +

Returns an RFC 3986 compliant URIString (percent-encoded).

+

If the URIMap is invalid, a badarg exception is raised.

+

Example:

+
+1> URIMap = #{fragment => "nose", host => "example.com", path => "/over/there",
+port => 8042, query => "name=ferret", scheme => foo, userinfo => "user"}.
+#{fragment => "top",host => "example.com",
+  path => "/over/there",port => 8042,query => "?name=ferret",
+  scheme => foo,userinfo => "user"}
+
+2> uri_string:recompose(URIMap, []).
+"foo://example.com:8042/over/there?name=ferret#nose"
+
+
+ + + + Resolve URI reference. + +

Resolves an RFC 3986 compliant RelativeURI, + based AbsoluteBaseURI and returns a new absolute URI + (AbsoluteDestURI).

+

If an argument is invalid, a badarg exception is raised.

+

Example:

+
+1> uri_string:resolve_uri_reference(...,...).
+
+
+
+ + + + Transcode URI. + +

Transcodes an RFC 3986 compliant URIString, + where Options is a list of tagged tuples, specifying the inbound + (in_encoding) and outbound (out_encoding) encodings.

+

If an argument is invalid, a badarg exception is raised.

+

Example:

+
+1> uri_string:transcode(<<"foo://f%20oo">>, [{in_encoding, utf8},
+{out_encoding, utf16}]).
+<<0,102,0,111,0,111,0,58,0,47,0,47,0,102,0,37,0,48,0,48,0,37,0,50,0,48,0,
+  111,0,111>>
+
+
+
+ +
+
+
-- 
cgit v1.2.3


From 29a9dd0e17a97a3e6e46f0d08c6ba8f31db33f5e Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?P=C3=A9ter=20Dimitrov?=
Date: Thu, 31 Aug 2017 15:39:45 +0200
Subject: stdlib: Implement uri_string:parse

---
 lib/stdlib/doc/src/uri_string.xml | 6 ------
 1 file changed, 6 deletions(-)

(limited to 'lib/stdlib/doc/src')

diff --git a/lib/stdlib/doc/src/uri_string.xml b/lib/stdlib/doc/src/uri_string.xml
index e6b2bd5e80..8283b8ca0e 100644
--- a/lib/stdlib/doc/src/uri_string.xml
+++ b/lib/stdlib/doc/src/uri_string.xml
@@ -90,12 +90,6 @@
-
-
-
-

Maybe improper list of bytes (0..255).

-
-
-- 
cgit v1.2.3


From b439d19d38479d6264d906dd926a168c9c514da3 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?P=C3=A9ter=20Dimitrov?=
Date: Fri, 20 Oct 2017 16:32:42 +0200
Subject: stdlib: Update documentation (uri_string)

---
 lib/stdlib/doc/src/uri_string.xml | 114 +++++++++++++-------------------------
 1 file changed, 38 insertions(+), 76 deletions(-)

(limited to 'lib/stdlib/doc/src')

diff --git a/lib/stdlib/doc/src/uri_string.xml b/lib/stdlib/doc/src/uri_string.xml
index 8283b8ca0e..496573ae2f 100644
--- a/lib/stdlib/doc/src/uri_string.xml
+++ b/lib/stdlib/doc/src/uri_string.xml
@@ -24,7 +24,7 @@
     maps
     Péter Dimitrov
     1
-    2017-08-23
+    2017-10-20
     A
   
   uri_string
@@ -34,7 +34,8 @@

A URI is an identifier consisting of a sequence of characters matching the syntax rule named URI in RFC 3986.

The generic URI syntax consists of a hierarchical sequence of components referred - to as the scheme, authority, path, query, and fragment:

+    to as the scheme, authority, path, query, and fragment:

+
     URI         = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
     hier-part   = "//" authority path-abempty
                    / path-absolute
@@ -51,35 +52,26 @@
 
     unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
     


-

The interpretation of a URI depends only on the characters used and not on how those characters are represented in a network protocol.

-

The functions implemented by this module covers the following use cases: +

The functions implemented by this module covers the following use cases:

Parsing URIs

parse/1
Recomposing URIs

recompose/2
- Resolving URI references

- resolve_uri_reference/3
- Creating URI references

- create_uri_reference/3
- Normalizing URIs

- normalize/1
Transcoding URIs

transcode/2
- Working with urlencoded query strings

- compose_query/1, dissect_query/1
+ Working with form-urlencoded query strings

+ compose_query/[1,2], dissect_query/1
-

-

There are four different encodings present during the handling of URIs: +

There are four different encodings present during the handling of URIs:

Inbound binary encoding in binaries Inbound percent-encoding in lists and binaries Outbound binary encoding in binaries Outbound percent-encoding in lists and binaries -

Unless otherwise specified the return value type and encoding are the same as the input type and encoding. That is, binary input returns binary output, list input returns a list output but mixed input returns list output. Input and output encodings are the same except @@ -113,31 +105,34 @@ Compose urlencoded query string. -

Composes an urlencoded QueryString based on a +

Composes a form-urlencoded QueryString based on a QueryList, a list of unescaped key-value pairs. Media type application/x-www-form-urlencoded is defined in section - 8.2.1 of RFC 1866 (HTML 2.0). + 8.2.1 of RFC 1866 (HTML 2.0). Reserved and unsafe characters, as + defined by RFC 1738 (Uniform Resource Locators), are procent-encoded.

-

If an argument is invalid, a badarg exception is raised.

Example:

-1> uri_string:compose_query(...).
-
+1> uri_string:compose_query([{"foo bar","1"},{"city","örebro"}]). + +
- - Create references. + + Compose urlencoded query string. -

Creates an RFC 3986 compliant RelativeDestURI, - based AbsoluteSourceURI and AbsoluteSourceURI -

-

If an argument is invalid, a badarg exception is raised.

+

Same as compose_query/1 but with an additional + Options parameter, that controls the type of separator used + between key-value pairs. There are two supported separator types: amp () + and semicolon (;).

Example:

-1> uri_string:create_uri_reference(...,...).
-
+1> uri_string:compose_query([{"foo bar","1"},{"city","örebro"}], +2> [{separator, semicolon}]). +"foo+bar=1;city=%C3%B6rebro" +
@@ -148,31 +143,14 @@

Dissects an urlencoded QueryString and returns a QueryList, a list of unescaped key-value pairs. Media type application/x-www-form-urlencoded is defined in section - 8.2.1 of RFC 1866 (HTML 2.0). + 8.2.1 of RFC 1866 (HTML 2.0). Percent-encoded segments are decoded + as defined by RFC 1738 (Uniform Resource Locators).

-

If an argument is invalid, a badarg exception is raised.

Example:

-1> uri_string:dissect_query(...).
-
- - - - - - Normalize URI. - -

Normalizes an RFC 3986 compliant URIString and returns - a NormalizedURI. The algorithm used to shorten the input - URI is called Syntax-Based Normalization and described at - Section 6.2.2 of RFC 3986. -

-

If an argument is invalid, a badarg exception is raised.

-

Example:

-
-1> uri_string:normalize("http://example.org/one/two/../../one").
-"http://example.org/one"
-
+1> uri_string:dissect_query("foo+bar=1;city=%C3%B6rebro"). +[{"foo bar","1"},{"city","örebro"}] +
@@ -182,14 +160,14 @@

Returns a URIMap, that is a uri_map() with the parsed components of the URIString.

-

If parsing fails, a parse_error exception is raised.

+

If parsing fails, an error tuple is returned.

Example:

 1> uri_string:parse("foo://user@example.com:8042/over/there?name=ferret#nose").
 #{fragment => "nose",host => "example.com",
   path => "/over/there",port => 8042,query => "name=ferret",
   scheme => foo,userinfo => "user"}
-2> 
+
@@ -198,35 +176,20 @@ Recompose URI.

Returns an RFC 3986 compliant URIString (percent-encoded).

-

If the URIMap is invalid, a badarg exception is raised.

+

If the URIMap is invalid, an error tuple is returned.

Example:

 1> URIMap = #{fragment => "nose", host => "example.com", path => "/over/there",
-port => 8042, query => "name=ferret", scheme => foo, userinfo => "user"}.
+port => 8042, query => "name=ferret", scheme => "foo", userinfo => "user"}.
 #{fragment => "top",host => "example.com",
   path => "/over/there",port => 8042,query => "?name=ferret",
   scheme => foo,userinfo => "user"}
 
-2> uri_string:recompose(URIMap, []).
+2> uri_string:recompose(URIMap).
 "foo://example.com:8042/over/there?name=ferret#nose"
- - - Resolve URI reference. - -

Resolves an RFC 3986 compliant RelativeURI, - based AbsoluteBaseURI and returns a new absolute URI - (AbsoluteDestURI).

-

If an argument is invalid, a badarg exception is raised.

-

Example:

-
-1> uri_string:resolve_uri_reference(...,...).
-
-
-
- Transcode URI. @@ -234,14 +197,13 @@ port => 8042, query => "name=ferret", scheme => foo, userinfo => "user"}.

Transcodes an RFC 3986 compliant URIString, where Options is a list of tagged tuples, specifying the inbound (in_encoding) and outbound (out_encoding) encodings.

-

If an argument is invalid, a badarg exception is raised.

+

If an argument is invalid, an error tuple is returned.

Example:

-1> uri_string:transcode(<<"foo://f%20oo">>, [{in_encoding, utf8},
-{out_encoding, utf16}]).
-<<0,102,0,111,0,111,0,58,0,47,0,47,0,102,0,37,0,48,0,48,0,37,0,50,0,48,0,
-  111,0,111>>
-
+1> >,]]> +2> [{in_encoding, utf32},{out_encoding, utf8}]). +>]]> +
-- 
cgit v1.2.3


From da11b15aef87f392a807b4756bf285160e15a194 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?P=C3=A9ter=20Dimitrov?=
Date: Mon, 23 Oct 2017 12:02:16 +0200
Subject: stdlib: Update supported separators (query string)

Update list of supported separators:
- escaped_amp (default): "&amp;"
- amp: "&"
- semicolon: ";"
---
 lib/stdlib/doc/src/uri_string.xml | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

(limited to 'lib/stdlib/doc/src')

diff --git a/lib/stdlib/doc/src/uri_string.xml b/lib/stdlib/doc/src/uri_string.xml
index 496573ae2f..97b38ea93e 100644
--- a/lib/stdlib/doc/src/uri_string.xml
+++ b/lib/stdlib/doc/src/uri_string.xml
@@ -28,9 +28,10 @@
     A
   
   uri_string
-  RFC 3986 compliant URI processing functions.
+  URI processing functions.
   
-

This module contains functions for parsing and handling RFC 3986 compliant URIs.

+

This module contains functions for parsing and handling URIs (RFC 3986) and + form-urlencoded query strings (RFC 1866).

A URI is an identifier consisting of a sequence of characters matching the syntax rule named URI in RFC 3986.

The generic URI syntax consists of a hierarchical sequence of components referred @@ -109,7 +110,7 @@ QueryList, a list of unescaped key-value pairs. Media type application/x-www-form-urlencoded is defined in section 8.2.1 of RFC 1866 (HTML 2.0). Reserved and unsafe characters, as - defined by RFC 1738 (Uniform Resource Locators), are procent-encoded. + defined by RFC 1738 (Uniform Resource Locators), are percent-encoded.

Example:

@@ -125,8 +126,7 @@
       
         

Same as compose_query/1 but with an additional Options parameter, that controls the type of separator used - between key-value pairs. There are two supported separator types: amp () - and semicolon (;).

+ between key-value pairs. There are three supported separator types: amp (&), escaped_amp (&amp;) and semicolon (;). If the parameter Options is empty, separator takes the default value (escaped_amp).

Example:

 1> uri_string:compose_query([{"foo bar","1"},{"city","örebro"}],
-- 
cgit v1.2.3


From 642bb27f8104991445a1f507f6b065d3cd7cd1ae Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?P=C3=A9ter=20Dimitrov?= 
Date: Tue, 24 Oct 2017 09:17:55 +0200
Subject: stdlib: Fix title in uri_string.xml

---
 lib/stdlib/doc/src/uri_string.xml | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

(limited to 'lib/stdlib/doc/src')

diff --git a/lib/stdlib/doc/src/uri_string.xml b/lib/stdlib/doc/src/uri_string.xml
index 97b38ea93e..d67c687fd1 100644
--- a/lib/stdlib/doc/src/uri_string.xml
+++ b/lib/stdlib/doc/src/uri_string.xml
@@ -21,10 +21,10 @@
       limitations under the License.
     
 
-    maps
+    uri_string
     Péter Dimitrov
     1
-    2017-10-20
+    2017-10-24
     A
   
   uri_string
-- 
cgit v1.2.3


From b0c682a8118c5775da784e9a0f569ee995319f80 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?P=C3=A9ter=20Dimitrov?= 
Date: Thu, 26 Oct 2017 11:29:48 +0200
Subject: stdlib: Update documentation, error tuples

---
 lib/stdlib/doc/src/uri_string.xml | 117 +++++++++++++++++++++++++++-----------
 1 file changed, 85 insertions(+), 32 deletions(-)

(limited to 'lib/stdlib/doc/src')

diff --git a/lib/stdlib/doc/src/uri_string.xml b/lib/stdlib/doc/src/uri_string.xml
index d67c687fd1..8322eecb24 100644
--- a/lib/stdlib/doc/src/uri_string.xml
+++ b/lib/stdlib/doc/src/uri_string.xml
@@ -30,10 +30,13 @@
   uri_string
   URI processing functions.
   
-    

This module contains functions for parsing and handling URIs (RFC 3986) and - form-urlencoded query strings (RFC 1866).

+

This module contains functions for parsing and handling URIs + (RFC 3986) and + form-urlencoded query strings (RFC 1866). +

A URI is an identifier consisting of a sequence of characters matching the syntax - rule named URI in RFC 3986.

+ rule named URI in RFC 3986. +

The generic URI syntax consists of a hierarchical sequence of components referred to as the scheme, authority, path, query, and fragment:

@@ -55,16 +58,24 @@
     


The interpretation of a URI depends only on the characters used and not on how those characters are represented in a network protocol.

-

The functions implemented by this module covers the following use cases:

+

The functions implemented by this module cover the following use cases:

- Parsing URIs

- parse/1
- Recomposing URIs

- recompose/2
- Transcoding URIs

- transcode/2
- Working with form-urlencoded query strings

- compose_query/[1,2], dissect_query/1
+ Parsing URIs into their components and returning a map

+ parse/1 +
+ Recomposing a map of URI components into a URI string

+ recompose/1 +
+ Changing inbound binary and percent-encoding of URIs

+ transcode/2 +
+ Composing form-urlencoded query strings from a list of key-value pairs

+ compose_query/1

+ compose_query/2 +
+ Dissecting form-urlencoded query strings into a list of key-value pairs

+ dissect_query/1 +

There are four different encodings present during the handling of URIs:

@@ -75,14 +86,29 @@

Unless otherwise specified the return value type and encoding are the same as the input type and encoding. That is, binary input returns binary output, list input returns a list - output but mixed input returns list output. Input and output encodings are the same except - for transcode/2.

+ output but mixed input returns list output.

All of the functions but transcode/2 expects input as unicode codepoints in lists, UTF-8 encoding in binaries and UTF-8 encoding in percent-encoded URI parts. transcode/2 provides the means to convert between the supported URI encodings.

+ + + +

Error tuple indicating the type of error. Possible values of the second component:

+ + invalid_character + invalid_input + invalid_map + invalid_percent_encoding + invalid_scheme + invalid_uri + invalid_utf8 + missing_value + +
+
@@ -93,7 +119,8 @@

List of unicode codepoints, UTF-8 encoded binary, or a mix of the two, - representing an RFC 3986 compliant URI (percent-encoded form). + representing an RFC 3986 + compliant URI (percent-encoded form). A URI is a sequence of characters from a very limited set: the letters of the basic Latin alphabet, digits, and a few special characters.

@@ -109,13 +136,21 @@

Composes a form-urlencoded QueryString based on a QueryList, a list of unescaped key-value pairs. Media type application/x-www-form-urlencoded is defined in section - 8.2.1 of RFC 1866 (HTML 2.0). Reserved and unsafe characters, as - defined by RFC 1738 (Uniform Resource Locators), are percent-encoded. + 8.2.1 of RFC 1866 + (HTML 2.0). Reserved and unsafe characters, as + defined by RFC 1738 + (Uniform Resource Locators), are percent-encoded.

+

See also the opposite operation + dissect_query/1.

Example:

-1> uri_string:compose_query([{"foo bar","1"},{"city","örebro"}]).
-
+1> uri_string:compose_query([{"foo bar","1"},{"city","örebro"}],
+1> [{separator, semicolon}]).
+"foo+bar=1;city=%C3%B6rebro"
+2> >,<<"1">>},
+2> {<<"city">>,<<"örebro"/utf8>>}]).]]>
+>]]>
 	
@@ -127,11 +162,14 @@

Same as compose_query/1 but with an additional Options parameter, that controls the type of separator used between key-value pairs. There are three supported separator types: amp (&), escaped_amp (&amp;) and semicolon (;). If the parameter Options is empty, separator takes the default value (escaped_amp).

+

See also the opposite operation + dissect_query/1. +

Example:

 1> uri_string:compose_query([{"foo bar","1"},{"city","örebro"}],
-2> [{separator, semicolon}]).
-"foo+bar=1;city=%C3%B6rebro"
+1> [{separator, amp}]).
+
 	
@@ -143,13 +181,19 @@

Dissects an urlencoded QueryString and returns a QueryList, a list of unescaped key-value pairs. Media type application/x-www-form-urlencoded is defined in section - 8.2.1 of RFC 1866 (HTML 2.0). Percent-encoded segments are decoded - as defined by RFC 1738 (Uniform Resource Locators). + 8.2.1 of RFC 1866 + (HTML 2.0). Percent-encoded segments are decoded + as defined by RFC 1738 + (Uniform Resource Locators).

+

See also the opposite operation + compose_query/1.

Example:

 1> uri_string:dissect_query("foo+bar=1;city=%C3%B6rebro").
 [{"foo bar","1"},{"city","örebro"}]
+2> >).]]>
+>,<<"1">>},{<<"city">>,<<"örebro"/utf8>>}] ]]>
 	
@@ -159,14 +203,19 @@ Parse URI into a map.

Returns a URIMap, that is a uri_map() with the parsed components - of the URIString.

-

If parsing fails, an error tuple is returned.

+ of the URIString. If parsing fails, an error tuple is returned.

+

See also the opposite operation + recompose/1.

Example:

 1> uri_string:parse("foo://user@example.com:8042/over/there?name=ferret#nose").
 #{fragment => "nose",host => "example.com",
   path => "/over/there",port => 8042,query => "name=ferret",
   scheme => foo,userinfo => "user"}
+2> >).]]>
+ <<"example.com">>,path => <<"/over/there">>,
+  port => 8042,query => <<"name=ferret">>,scheme => <<"foo">>,
+  userinfo => <<"user">>}]]>
 	
@@ -175,12 +224,15 @@ Recompose URI. -

Returns an RFC 3986 compliant URIString (percent-encoded).

-

If the URIMap is invalid, an error tuple is returned.

+

Returns an RFC 3986 compliant + URIString (percent-encoded). + If the URIMap is invalid, an error tuple is returned.

+

See also the opposite operation + parse/1.

Example:

 1> URIMap = #{fragment => "nose", host => "example.com", path => "/over/there",
-port => 8042, query => "name=ferret", scheme => "foo", userinfo => "user"}.
+1> port => 8042, query => "name=ferret", scheme => "foo", userinfo => "user"}.
 #{fragment => "top",host => "example.com",
   path => "/over/there",port => 8042,query => "?name=ferret",
   scheme => foo,userinfo => "user"}
@@ -194,14 +246,15 @@ port => 8042, query => "name=ferret", scheme => "foo", userinfo => "user"}.
       
       Transcode URI.
       
-        

Transcodes an RFC 3986 compliant URIString, +

Transcodes an RFC 3986 + compliant URIString, where Options is a list of tagged tuples, specifying the inbound - (in_encoding) and outbound (out_encoding) encodings.

-

If an argument is invalid, an error tuple is returned.

+ (in_encoding) and outbound (out_encoding) encodings. + If an argument is invalid, an error tuple is returned.

Example:

 1> >,]]>
-2> [{in_encoding, utf32},{out_encoding, utf8}]).
+1> [{in_encoding, utf32},{out_encoding, utf8}]).
 >]]>
 	
-- 
cgit v1.2.3


From f7d3033dfeeb012841729bf8ed3889da8457b4f7 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?P=C3=A9ter=20Dimitrov?=
Date: Mon, 30 Oct 2017 13:38:28 +0100
Subject: stdlib: Update documentation (normalize/1)

---
 lib/stdlib/doc/src/uri_string.xml | 115 +++++++++++++++++++++++++++-----------
 1 file changed, 83 insertions(+), 32 deletions(-)

(limited to 'lib/stdlib/doc/src')

diff --git a/lib/stdlib/doc/src/uri_string.xml b/lib/stdlib/doc/src/uri_string.xml
index 8322eecb24..55d8690b98 100644
--- a/lib/stdlib/doc/src/uri_string.xml
+++ b/lib/stdlib/doc/src/uri_string.xml
@@ -69,6 +69,9 @@
 Changing inbound binary and percent-encoding of URIs

transcode/2
+ Transforming URIs into a normalized form

+ normalize/1 +
Composing form-urlencoded query strings from a list of key-value pairs

compose_query/1

compose_query/2 @@ -84,12 +87,21 @@ Outbound binary encoding in binaries Outbound percent-encoding in lists and binaries +

Functions with uri_string() argument accept lists, binaries and + mixed lists (lists with binary elements) as input type. All of the functions but + transcode/2 expects input as lists of unicode codepoints, UTF-8 encoded binaries + and UTF-8 percent-encoded URI parts ("%C3%B6" corresponds to the unicode character "ö").

Unless otherwise specified the return value type and encoding are the same as the input type and encoding. That is, binary input returns binary output, list input returns a list output but mixed input returns list output.

-

All of the functions but transcode/2 expects input as unicode codepoints in - lists, UTF-8 encoding in binaries and UTF-8 encoding in percent-encoded URI parts. - transcode/2 provides the means to convert between the supported URI encodings.

+

In case of lists there is only percent-encoding. In binaries, however, both binary encoding + and percent-encoding shall be considered. transcode/2 provides the means to convert + between the supported encodings, it takes a uri_string() and a list of options + specifying inbound and outbound encodings.

+

RFC 3986 does not mandate any specific + character encoding and it is usually defined by the protocol or surrounding text. This library + takes the same assumption, binary and percent-encoding are handled as one configuration unit, + they cannot be set to different values.

@@ -97,28 +109,30 @@

Error tuple indicating the type of error. Possible values of the second component:

- - invalid_character - invalid_input - invalid_map - invalid_percent_encoding - invalid_scheme - invalid_uri - invalid_utf8 - missing_value - + + invalid_character + invalid_input + invalid_map + invalid_percent_encoding + invalid_scheme + invalid_uri + invalid_utf8 + missing_value + +

The third component is a list or binary providing additional information about the + cause of the error.

-

URI map holding the main components of a URI.

+

Map holding the main components of a URI.

-

List of unicode codepoints, UTF-8 encoded binary, or a mix of the two, +

List of unicode codepoints, a UTF-8 encoded binary, or a mix of the two, representing an RFC 3986 compliant URI (percent-encoded form). A URI is a sequence of characters from a very limited set: the letters of @@ -134,10 +148,11 @@ Compose urlencoded query string.

Composes a form-urlencoded QueryString based on a - QueryList, a list of unescaped key-value pairs. - Media type application/x-www-form-urlencoded is defined in section + QueryList, a list of non-percent-encoded key-value pairs. + Form-urlencoding is defined in section 8.2.1 of RFC 1866 - (HTML 2.0). Reserved and unsafe characters, as + (HTML 2.0) for media type application/x-www-form-urlencoded. + Reserved and unsafe characters, as defined by RFC 1738 (Uniform Resource Locators), are percent-encoded.

See also the opposite operation @@ -145,9 +160,8 @@

Example:

-1> uri_string:compose_query([{"foo bar","1"},{"city","örebro"}],
-1> [{separator, semicolon}]).
-"foo+bar=1;city=%C3%B6rebro"
+1> uri_string:compose_query([{"foo bar","1"},{"city","örebro"}]).
+
 2> >,<<"1">>},
 2> {<<"city">>,<<"örebro"/utf8>>}]).]]>
 >]]>
@@ -169,7 +183,10 @@
         
 1> uri_string:compose_query([{"foo bar","1"},{"city","örebro"}],
 1> [{separator, amp}]).
-
+ uri_string:compose_query([{<<"foo bar">>,<<"1">>},
+2> {<<"city">>,<<"örebro"/utf8>>}], [{separator, escaped_amp}]).]]>
+>]]>
 	
@@ -179,12 +196,15 @@ Dissect query string.

Dissects an urlencoded QueryString and returns a - QueryList, a list of unescaped key-value pairs. - Media type application/x-www-form-urlencoded is defined in section + QueryList, a list of non-percent-encoded key-value pairs. + Form-urlencoding is defined in section 8.2.1 of RFC 1866 - (HTML 2.0). Percent-encoded segments are decoded - as defined by RFC 1738 + (HTML 2.0) for media type application/x-www-form-urlencoded. + Percent-encoded segments are decoded as defined by + RFC 1738 (Uniform Resource Locators).

+

Supported separator types: amp (&), escaped_amp + (&amp;) and semicolon (;).

See also the opposite operation compose_query/1.

@@ -192,18 +212,42 @@
 1> uri_string:dissect_query("foo+bar=1;city=%C3%B6rebro").
 [{"foo bar","1"},{"city","örebro"}]
-2> >).]]>
+2> >).]]>
 >,<<"1">>},{<<"city">>,<<"örebro"/utf8>>}] ]]>
 	
+ + + Syntax-based normalization. + +

Transforms URIString into a normalized form + using Syntax-Based Normalization as defined by + RFC 3986.

+

This function implements case normalization, percent-encoding + normalization, path segment normalization and scheme based normalization + for HTTP(S) with basic support for FTP, SSH, SFTP and TFTP.

+

Example:

+
+1> uri_string:normalize("/a/b/c/./../../g").
+"/a/g"
+2> >).]]>
+>]]>
+3> uri_string:normalize("http://localhost:80").
+"https://localhost/"
+	
+
+
+ Parse URI into a map. -

Returns a URIMap, that is a uri_map() with the parsed components - of the URIString. If parsing fails, an error tuple is returned.

+

Parses an RFC 3986 + compliant uri_string() into a uri_map(), that holds the parsed + components of the URI. + If parsing fails, an error tuple is returned.

See also the opposite operation recompose/1.

Example:

@@ -224,8 +268,9 @@ Recompose URI. -

Returns an RFC 3986 compliant - URIString (percent-encoded). +

Creates an RFC 3986 compliant + URIString (percent-encoded), based on the components of + URIMap. If the URIMap is invalid, an error tuple is returned.

See also the opposite operation parse/1.

@@ -249,13 +294,19 @@

Transcodes an RFC 3986 compliant URIString, where Options is a list of tagged tuples, specifying the inbound - (in_encoding) and outbound (out_encoding) encodings. + (in_encoding) and outbound (out_encoding) encodings. in_encoding + and out_encoding specify both binary encoding and percent-encoding for the + input and output data. Mixed encoding, where binary encoding is not the same as + percent-encoding, is not supported. If an argument is invalid, an error tuple is returned.

Example:

 1> >,]]>
 1> [{in_encoding, utf32},{out_encoding, utf8}]).
 >]]>
+2> uri_string:transcode("foo%F6bar", [{in_encoding, latin1},
+2> {out_encoding, utf8}]).
+"foo%C3%B6bar"
 	
-- cgit v1.2.3 From 7a4d4e183ae5567d6242184b8268918904c872c6 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?P=C3=A9ter=20Dimitrov?= Date: Mon, 30 Oct 2017 16:57:49 +0100 Subject: stdlib: Refactor helper functions in uri_string --- lib/stdlib/doc/src/uri_string.xml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'lib/stdlib/doc/src') diff --git a/lib/stdlib/doc/src/uri_string.xml b/lib/stdlib/doc/src/uri_string.xml index 55d8690b98..8fa2a92370 100644 --- a/lib/stdlib/doc/src/uri_string.xml +++ b/lib/stdlib/doc/src/uri_string.xml @@ -119,7 +119,7 @@ invalid_utf8 missing_value -

The third component is a list or binary providing additional information about the +

The third component is a term providing additional information about the cause of the error.

-- cgit v1.2.3 From 7e5d062973e7cb4f9ee949529e9dcdb5785c1304 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?P=C3=A9ter=20Dimitrov?= Date: Mon, 6 Nov 2017 09:54:12 +0100 Subject: stdlib: Remove compose_query and dissect_query compose_query/{1,2} and dissect_query/1 removed as the implemented specification (HTML 2.0) is old. They will be re-implemented based on HTML5. --- lib/stdlib/doc/src/uri_string.xml | 87 +-------------------------------------- 1 file changed, 1 insertion(+), 86 deletions(-) (limited to 'lib/stdlib/doc/src') diff --git a/lib/stdlib/doc/src/uri_string.xml b/lib/stdlib/doc/src/uri_string.xml index 8fa2a92370..9ace2b0a05 100644 --- a/lib/stdlib/doc/src/uri_string.xml +++ b/lib/stdlib/doc/src/uri_string.xml @@ -31,8 +31,7 @@ URI processing functions.

This module contains functions for parsing and handling URIs - (RFC 3986) and - form-urlencoded query strings (RFC 1866). + (RFC 3986).

A URI is an identifier consisting of a sequence of characters matching the syntax rule named URI in RFC 3986. @@ -72,13 +71,6 @@ Transforming URIs into a normalized form

normalize/1
- Composing form-urlencoded query strings from a list of key-value pairs

- compose_query/1

- compose_query/2 -
- Dissecting form-urlencoded query strings into a list of key-value pairs

- dissect_query/1 -

There are four different encodings present during the handling of URIs:

@@ -110,14 +102,12 @@

Error tuple indicating the type of error. Possible values of the second component:

- invalid_character invalid_input invalid_map invalid_percent_encoding invalid_scheme invalid_uri invalid_utf8 - missing_value

The third component is a term providing additional information about the cause of the error.

@@ -143,81 +133,6 @@ - - - Compose urlencoded query string. - -

Composes a form-urlencoded QueryString based on a - QueryList, a list of non-percent-encoded key-value pairs. - Form-urlencoding is defined in section - 8.2.1 of RFC 1866 - (HTML 2.0) for media type application/x-www-form-urlencoded. - Reserved and unsafe characters, as - defined by RFC 1738 - (Uniform Resource Locators), are percent-encoded.

-

See also the opposite operation - dissect_query/1. -

-

Example:

-
-1> uri_string:compose_query([{"foo bar","1"},{"city","örebro"}]).
-
-2> >,<<"1">>},
-2> {<<"city">>,<<"örebro"/utf8>>}]).]]>
->]]>
-	
-
-
- - - - Compose urlencoded query string. - -

Same as compose_query/1 but with an additional - Options parameter, that controls the type of separator used - between key-value pairs. There are three supported separator types: amp (&), escaped_amp (&amp;) and semicolon (;). If the parameter Options is empty, separator takes the default value (escaped_amp).

-

See also the opposite operation - dissect_query/1. -

-

Example:

-
-1> uri_string:compose_query([{"foo bar","1"},{"city","örebro"}],
-1> [{separator, amp}]).
- uri_string:compose_query([{<<"foo bar">>,<<"1">>},
-2> {<<"city">>,<<"örebro"/utf8>>}], [{separator, escaped_amp}]).]]>
->]]>
-	
-
-
- - - - Dissect query string. - -

Dissects an urlencoded QueryString and returns a - QueryList, a list of non-percent-encoded key-value pairs. - Form-urlencoding is defined in section - 8.2.1 of RFC 1866 - (HTML 2.0) for media type application/x-www-form-urlencoded. - Percent-encoded segments are decoded as defined by - RFC 1738 - (Uniform Resource Locators).

-

Supported separator types: amp (&), escaped_amp - (&amp;) and semicolon (;).

-

See also the opposite operation - compose_query/1. -

-

Example:

-
-1> uri_string:dissect_query("foo+bar=1;city=%C3%B6rebro").
-[{"foo bar","1"},{"city","örebro"}]
-2> >).]]>
->,<<"1">>},{<<"city">>,<<"örebro"/utf8>>}] ]]>
-	
-
-
- Syntax-based normalization. -- cgit v1.2.3