From 80feeb36f92a923f57f740c7c28c12bb8b69ec16 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?P=C3=A9ter=20Dimitrov?= Date: Fri, 28 Jul 2017 11:04:19 +0200 Subject: stdlib: Add API and doc of uri_string module --- lib/stdlib/doc/src/Makefile | 1 + lib/stdlib/doc/src/ref_man.xml | 1 + lib/stdlib/doc/src/specs.xml | 1 + lib/stdlib/doc/src/uri_string.xml | 255 ++++++++++++++++++++++++++++++++++++++ 4 files changed, 258 insertions(+) create mode 100644 lib/stdlib/doc/src/uri_string.xml (limited to 'lib/stdlib/doc/src') diff --git a/lib/stdlib/doc/src/Makefile b/lib/stdlib/doc/src/Makefile index 93eac8220d..aeed79408b 100644 --- a/lib/stdlib/doc/src/Makefile +++ b/lib/stdlib/doc/src/Makefile @@ -98,6 +98,7 @@ XML_REF3_FILES = \ sys.xml \ timer.xml \ unicode.xml \ + uri_string.xml \ win32reg.xml \ zip.xml diff --git a/lib/stdlib/doc/src/ref_man.xml b/lib/stdlib/doc/src/ref_man.xml index 878a3babc5..68bfddbc71 100644 --- a/lib/stdlib/doc/src/ref_man.xml +++ b/lib/stdlib/doc/src/ref_man.xml @@ -93,6 +93,7 @@ + diff --git a/lib/stdlib/doc/src/specs.xml b/lib/stdlib/doc/src/specs.xml index 45b207b13d..d559adf9b6 100644 --- a/lib/stdlib/doc/src/specs.xml +++ b/lib/stdlib/doc/src/specs.xml @@ -60,6 +60,7 @@ + diff --git a/lib/stdlib/doc/src/uri_string.xml b/lib/stdlib/doc/src/uri_string.xml new file mode 100644 index 0000000000..e6b2bd5e80 --- /dev/null +++ b/lib/stdlib/doc/src/uri_string.xml @@ -0,0 +1,255 @@ + + + + +
+ + 20172017 + Ericsson AB. All Rights Reserved. + + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. + + + uri_string + Péter Dimitrov + 1 + 2017-08-23 + A +
+ uri_string + RFC 3986 compliant URI processing functions. + +

This module contains functions for parsing and handling RFC 3986 compliant URIs.

+

A URI is an identifier consisting of a sequence of characters matching the syntax + rule named URI in RFC 3986.

+

The generic URI syntax consists of a hierarchical sequence of components referred + to as the scheme, authority, path, query, and fragment:

+    URI         = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
+    hier-part   = "//" authority path-abempty
+                   / path-absolute
+                   / path-rootless
+                   / path-empty
+    scheme      = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
+    authority   = [ userinfo "@" ] host [ ":" port ]
+    userinfo    = *( unreserved / pct-encoded / sub-delims / ":" )
+
+    reserved    = gen-delims / sub-delims
+    gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"
+    sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
+                / "*" / "+" / "," / ";" / "="
+
+    unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
+    
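The split into the five generic components can be illustrated with the regular expression from Appendix B of RFC 3986. The following is a minimal sketch, not the implementation of this module: the module name uri_split_sketch and the function split/1 are hypothetical, and the expression only separates components without validating them against the grammar above.

```erlang
-module(uri_split_sketch).
-export([split/1]).

%% Hypothetical helper, not part of uri_string.
%% Splits a URI reference into its five generic components using the
%% regular expression from RFC 3986, Appendix B. Components that are
%% absent come back as empty strings. No validation is performed.
split(URI) when is_list(URI) ->
    RE = "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?",
    %% Groups 2, 4, 5, 7 and 9 hold scheme, authority, path, query, fragment.
    {match, [Scheme, Authority, Path, Query, Fragment]} =
        re:run(URI, RE, [{capture, [2, 4, 5, 7, 9], list}]),
    #{scheme => Scheme, authority => Authority, path => Path,
      query => Query, fragment => Fragment}.
```

Because every group in the expression is optional, any input string matches; a real parser would additionally check each component against its ABNF rule.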


+

+

The interpretation of a URI depends only on the characters used and not on how those + characters are represented in a network protocol.

+

The functions implemented by this module cover the following use cases: + + Parsing URIs

+ parse/1
+ Recomposing URIs

+ recompose/2
+ Resolving URI references

+ resolve_uri_reference/3
+ Creating URI references

+ create_uri_reference/3
+ Normalizing URIs

+ normalize/1
+ Transcoding URIs

+ transcode/2
+ Working with urlencoded query strings

+ compose_query/1, dissect_query/1
+
+

+

There are four different encodings present during the handling of URIs: + + Inbound binary encoding in binaries + Inbound percent-encoding in lists and binaries + Outbound binary encoding in binaries + Outbound percent-encoding in lists and binaries + +

+

Unless otherwise specified, the return value type and encoding are the same as the input + type and encoding. That is, binary input returns binary output, while list input and mixed + input both return list output. Input and output encodings are the same except + for transcode/2.

+

All functions except transcode/2 expect input as Unicode code points in + lists, UTF-8 encoding in binaries, and UTF-8 encoding in percent-encoded URI parts. + transcode/2 provides the means to convert between the supported URI encodings.
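To make the relationship between code points, UTF-8 and percent-encoding concrete, here is a minimal sketch. It assumes a hypothetical module pct_sketch with a function encode/1, which are not part of uri_string. Each code point is converted to UTF-8, and every byte outside the RFC 3986 unreserved set is written as %XX.

```erlang
-module(pct_sketch).
-export([encode/1]).

%% Hypothetical helper, not part of uri_string.
%% Percent-encodes a list of Unicode code points: the input is first
%% converted to UTF-8, then each byte that is not in the RFC 3986
%% "unreserved" set is replaced by "%" followed by two hex digits.
encode(Chars) when is_list(Chars) ->
    Bytes = unicode:characters_to_binary(Chars),
    lists:flatten([encode_byte(B) || <<B>> <= Bytes]).

%% unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
encode_byte(B) when B >= $a, B =< $z; B >= $A, B =< $Z; B >= $0, B =< $9;
                    B =:= $-; B =:= $.; B =:= $_; B =:= $~ ->
    [B];
encode_byte(B) ->
    io_lib:format("%~2.16.0B", [B]).
```

For instance, code point 246 (LATIN SMALL LETTER O WITH DIAERESIS) is the two UTF-8 bytes 16#C3 and 16#B6, so it is encoded as "%C3%B6".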

+
+ + + + + +

Maybe improper list of bytes (0..255).

+
+
+ + + +

URI map holding the main components of a URI.

+
+
+ + + +

List of unicode codepoints, UTF-8 encoded binary, or a mix of the two, + representing an RFC 3986 compliant URI (percent-encoded form). + A URI is a sequence of characters from a very limited set: the letters of + the basic Latin alphabet, digits, and a few special characters.

+
+
+
+ + + + + + Compose urlencoded query string. + +

Composes a urlencoded QueryString based on a + QueryList, a list of unescaped key-value pairs. + Media type application/x-www-form-urlencoded is defined in section + 8.2.1 of RFC 1866 (HTML 2.0). +

+

If an argument is invalid, a badarg exception is raised.

+

Example:

+
+1> uri_string:compose_query(...).
+
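The form-urlencoding rules of RFC 1866, section 8.2.1 can be sketched as follows. This is an illustration under stated assumptions, not the implementation of compose_query/1: the module query_compose_sketch and its function compose/1 are hypothetical, keys and values are assumed to be lists of Unicode code points, and every non-alphanumeric byte is escaped.

```erlang
-module(query_compose_sketch).
-export([compose/1]).

%% Hypothetical helper, not part of uri_string.
%% Builds "k1=v1&k2=v2" from a list of {Key, Value} string pairs.
compose(Pairs) when is_list(Pairs) ->
    Fields = [[escape(K), "=", escape(V)] || {K, V} <- Pairs],
    lists:flatten(lists:join("&", Fields)).

%% Keys and values are converted to UTF-8 and escaped byte by byte.
escape(Str) ->
    [escape_byte(B) || <<B>> <= unicode:characters_to_binary(Str)].

%% Space becomes "+", alphanumerics are kept, everything else is %HH.
escape_byte($\s) -> "+";
escape_byte(B) when B >= $a, B =< $z; B >= $A, B =< $Z; B >= $0, B =< $9 ->
    [B];
escape_byte(B) ->
    io_lib:format("%~2.16.0B", [B]).
```

With this sketch, the pair list [{"name", "ferret"}, {"color", "dark purple"}] yields "name=ferret&color=dark+purple".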
+
+
+ + + + Create references. + +

Creates an RFC 3986 compliant RelativeDestURI, + based on AbsoluteSourceURI and AbsoluteSourceURI. +

+

If an argument is invalid, a badarg exception is raised.

+

Example:

+
+1> uri_string:create_uri_reference(...,...).
+
+
+
+ + + + Dissect query string. + +

Dissects a urlencoded QueryString and returns a + QueryList, a list of unescaped key-value pairs. + Media type application/x-www-form-urlencoded is defined in section + 8.2.1 of RFC 1866 (HTML 2.0). +

+

If an argument is invalid, a badarg exception is raised.

+

Example:

+
+1> uri_string:dissect_query(...).
+
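The reverse direction can be sketched in the same spirit. The module query_dissect_sketch and its function dissect/1 are hypothetical and do not show how dissect_query/1 is implemented. The sketch assumes a flat list of ASCII characters as input and UTF-8 inside the percent-encoded parts.

```erlang
-module(query_dissect_sketch).
-export([dissect/1]).

%% Hypothetical helper, not part of uri_string.
%% Splits "k1=v1&k2=v2" into [{"k1","v1"},{"k2","v2"}], undoing the
%% "+" and %HH escapes. Requires string:split/2,3 (OTP 20 or later).
dissect(QueryString) when is_list(QueryString) ->
    [split_pair(F) || F <- string:split(QueryString, "&", all), F =/= ""].

%% A field without "=" is treated as a key with an empty value.
split_pair(Field) ->
    case string:split(Field, "=") of
        [K, V] -> {unescape(K), unescape(V)};
        [K]    -> {unescape(K), ""}
    end.

%% Decode to bytes first, then interpret the bytes as UTF-8.
unescape(Str) ->
    unicode:characters_to_list(list_to_binary(unescape_bytes(Str))).

%% Non-hex digits after "%" raise badarg in list_to_integer/2;
%% a truncated escape at the end of the input is passed through as is.
unescape_bytes([$+ | T]) -> [$\s | unescape_bytes(T)];
unescape_bytes([$%, H, L | T]) -> [list_to_integer([H, L], 16) | unescape_bytes(T)];
unescape_bytes([C | T]) -> [C | unescape_bytes(T)];
unescape_bytes([]) -> [].
```

Applied to "name=ferret&color=dark+purple", the sketch returns [{"name","ferret"},{"color","dark purple"}].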
+
+
+ + + + Normalize URI. + +

Normalizes an RFC 3986 compliant URIString and returns + a NormalizedURI. The algorithm used to shorten the input + URI is called Syntax-Based Normalization and is described in + Section 6.2.2 of RFC 3986. +

+

If an argument is invalid, a badarg exception is raised.

+

Example:

+
+1> uri_string:normalize("http://example.org/one/two/../../one").
+"http://example.org/one"
+
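The shortening seen in the example comes from removing dot-segments from the path, one step of Syntax-Based Normalization (RFC 3986, Section 5.2.4). The following is a minimal sketch of that step alone, assuming an absolute path given as a flat string. The module dot_segments_sketch and its function remove_dot_segments/1 are hypothetical; case normalization and percent-encoding normalization are not covered.

```erlang
-module(dot_segments_sketch).
-export([remove_dot_segments/1]).

%% Hypothetical helper, not part of uri_string.
%% Removes "." and ".." segments from an absolute path (one that starts
%% with "/"). Requires string:split/3 (OTP 20 or later).
remove_dot_segments([$/ | _] = Path) ->
    Segments = string:split(Path, "/", all),
    lists:flatten(lists:join("/", lists:reverse(walk(Segments, [])))).

%% Acc holds the output segments in reverse order. Its last element is
%% the empty segment in front of the leading "/".
walk([], Acc)         -> Acc;
walk(["."], Acc)      -> ["" | Acc];       % trailing "." keeps the final "/"
walk([".."], Acc)     -> ["" | pop(Acc)];  % trailing ".." keeps the final "/"
walk(["." | T], Acc)  -> walk(T, Acc);
walk([".." | T], Acc) -> walk(T, pop(Acc));
walk([S | T], Acc)    -> walk(T, [S | Acc]).

%% Drop the most recent segment, but never the root.
pop([_ | [_ | _] = Rest]) -> Rest;
pop(Acc)                  -> Acc.
```

For the path of the example above, "/one/two/../../one", the sketch returns "/one".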
+
+
+ + + + Parse URI into a map. + +

Returns a URIMap, that is a uri_map() with the parsed components + of the URIString.

+

If parsing fails, a parse_error exception is raised.

+

Example:

+
+1> uri_string:parse("foo://user@example.com:8042/over/there?name=ferret#nose").
+#{fragment => "nose",host => "example.com",
+  path => "/over/there",port => 8042,query => "name=ferret",
+  scheme => foo,userinfo => "user"}
+2> 
+
+
+ + + + Recompose URI. + +

Returns an RFC 3986 compliant URIString (percent-encoded).

+

If the URIMap is invalid, a badarg exception is raised.

+

Example:

+
+1> URIMap = #{fragment => "nose", host => "example.com", path => "/over/there",
+port => 8042, query => "name=ferret", scheme => foo, userinfo => "user"}.
+#{fragment => "nose",host => "example.com",
+  path => "/over/there",port => 8042,query => "name=ferret",
+  scheme => foo,userinfo => "user"}
+
+2> uri_string:recompose(URIMap, []).
+"foo://user@example.com:8042/over/there?name=ferret#nose"
+
+
+ + + + Resolve URI reference. + +

Resolves an RFC 3986 compliant RelativeURI, + based on AbsoluteBaseURI and returns a new absolute URI + (AbsoluteDestURI).

+

If an argument is invalid, a badarg exception is raised.

+

Example:

+
+1> uri_string:resolve_uri_reference(...,...).
+
+
+
+ + + + Transcode URI. + +

Transcodes an RFC 3986 compliant URIString, + where Options is a list of tagged tuples, specifying the inbound + (in_encoding) and outbound (out_encoding) encodings.

+

If an argument is invalid, a badarg exception is raised.

+

Example:

+
+1> uri_string:transcode(<<"foo://f%20oo">>, [{in_encoding, utf8},
+{out_encoding, utf16}]).
+<<0,102,0,111,0,111,0,58,0,47,0,47,0,102,0,37,0,48,0,48,0,37,0,50,0,48,0,
+  111,0,111>>
+
+
+
+ +
+
-- cgit v1.2.3