Age | Commit message (Collapse) | Author |
|
|
|
Implement socket options recvtclass, recvtos, recvttl and pktoptions.
Document the implemented socket options, new types and message formats.
The options recvtclass, recvtos and recvttl are boolean options that
when activated (true) for a socket will cause ancillary data to be
received through recvmsg(). That is for packet oriented sockets
(UDP and SCTP).
The required options for this feature were recvtclass and recvtos,
and recvttl was only added to test that the ancillary data parsing
handled multiple data items in one message correctly.
These options does not work on Windows since ancillary data
is not handled by the Winsock2 API.
For stream sockets (TCP) there is no clear connection between
a received packet and what is returned when reading data from
the socket, so recvmsg() is not useful. It is possible to get
the same ancillary data through a getsockopt() call with
the IPv6 socket option IPV6_PKTOPTIONS, on Linux named
IPV6_2292PKTOPTIONS after the now obsoleted RFC where it originated.
(unfortunately RFC 3542 that obsoletes it explicitly undefines
this way to get packet ancillary data from a stream socket)
Linux also has got a way to get packet ancillary data for IPv4
TCP sockets through a getsockopt() call with IP_PKTOPTIONS,
which appears to be Linux specific.
This implementation uses a flag field in the inet_drv.c socket
internal data that records if any setsockopt() call with recvtclass,
recvtos or recvttl (IPV6_RECVTCLASS, IP_RECVTOS or IP_RECVTTL)
has been activated. If so recvmsg() is used instead of recvfrom().
Ancillary data is delivered to the application by a new return
tuple format from gen_udp:recv/2,3 containing a list of
ancillary data tuples [{tclass,TCLASS} | {tos,TOS} | {ttl,TTL}],
as returned by recvmsg(). For a socket in active mode a new
message format, containing the ancillary data list, delivers
the data in the same way.
For gen_sctp the ancillary data is delivered in the same way,
except that the gen_sctp return tuple format already contained
an ancillary data list so there are just more possible elements
when using these socket options. Note that the active mode
message format has got an extra tuple level for the ancillary
data compared to what is now implemented gen_udp.
The gen_sctp active mode format was considered to be the odd one
- now all tuples containing ancillary data are flat,
except for gen_sctp active mode.
Note that testing has not shown that Linux SCTP sockets deliver
any ancillary data for these socket options, so it is probably
not implemented yet. Remains to be seen what FreeBSD does...
For gen_tcp inet:getopts([pktoptions]) will deliver the latest
received ancillary data for any activated socket option recvtclass,
recvtos or recvttl, on platforms where IP_PKTOPTIONS is defined
for an IPv4 socket, or where IPV6_PKTOPTIONS or IPV6_2292PKTOPTIONS
is defined for an IPv6 socket. It will be delivered as a
list of ancillary data items in the same way as for gen_udp
(and gen_sctp).
On some platforms, e.g the BSD:s, when you activate IP_RECVTOS
you get ancillary data tagged IP_RECVTOS with the TOS value,
but on Linux you get ancillary data tagged IP_TOS with the
TOS value. Linux follows the style of RFC 2292, and the BSD:s
use an older notion. For RFC 2292 that defines the IP_PKTOPTIONS
socket option it is more logical to tag the items with the
tag that is the item's, than with the tag that defines that you
want the item. Therefore this implementation translates all
BSD style ancillary data tags to the corresponding Linux style
data tags, so the application will only see the tags 'tclass',
'tos' and 'ttl' on all platforms.
|
|
I did not find any legitimate use of "can not", however skipped
changing e.g RFCs archived in the source tree.
|
|
This improves the latency of file operations as dirty schedulers
are a bit more eager to run jobs than async threads, and use a
single global queue rather than per-thread queues, eliminating the
risk of a job stalling behind a long-running job on the same thread
while other async threads sit idle.
There's no such thing as a free lunch though; the lowered latency
comes at the cost of increased busy-waiting which may have an
adverse effect on some applications. This behavior can be tweaked
with the +sbwt flag, but unfortunately it affects all types of
schedulers and not just dirty ones. We plan to add type-specific
flags at a later stage.
sendfile has been moved to inet_drv to lessen the effect of a nasty
race; the cooperation between inet_drv and efile has never been
airtight and the socket dying at the wrong time (Regardless of
reason) could result in fd aliasing. Moving it to the inet driver
makes it impossible to trigger this by closing the socket in the
middle of a sendfile operation, while still allowing it to be
aborted -- something that can't be done if it stays in the file
driver.
The race still occurs if the controlling process dies in the short
window between dispatching the sendfile operation and the dup(2)
call in the driver, but it's much less likely to happen now.
A proper fix is in the works.
--
Notable functional differences:
* The use_threads option for file:sendfile/5 no longer has any
effect.
* The file-specific DTrace probes have been removed. The same
effect can be achieved with normal tracing together with the
nif__entry/nif__return probes to track scheduling.
--
OTP-14256
|
|
|
|
bind to device is needed to properly support VRF-Lite under
Linux (see [1] for details).
[1]: https://www.kernel.org/doc/Documentation/networking/vrf.txt
|
|
|
|
|
|
|
|
|
|
Fix dialyzer warning for improper list in prim_inet
by not using an improper list.
|
|
|
|
When a AF_LOCAL file descriptor is created externally (e.g. Unix
Domain Socket) and passed to `gen_tcp:listen(0, [{fd, FD}])`, the
implementation incorrectly assigned the address family to be equal
to `inet`, which in the inet_drv driver translated to AF_INET instead
of AF_LOCAL (or AF_UNIX), and an `einval` error code was returned.
This patch fixes this problem such that the file descriptors of the
`local` address family are supported in the inet:fdopen/5,
gen_tcp:connect/3, gen_tcp:listen/2, gen_udp:open/2 calls
|
|
* c-rack/fix-typo-prim-inet:
Fix minor typo "timout" -> "timeout"
|
|
A new {line_delimiter, byte()} option allows line-oriented TCP-based
protocols to use a custom line delimiting character. It is to be
used in conjunction with {packet, line}.
This option also works with erlang:decode_packet/3 when its first argument
is 'line'.
|
|
|
|
|
|
Up until now, if {linger, {true, 0}} is set on the socket and there is
data in the port driver queue, the connection is not aborted until
the port queue is empty and close() is called on the underlying file
descriptor. This bug allows an idle TCP client to prevent a server
from terminating the connection and freeing resources. This patch
fixes the problem by discarding the port queue if the socket is closed
when {linger, {true, 0}} is set.
|
|
An ECONNRESET is a socket error which tells us that a TCP peer has sent
an RST. The RST indicates that they have aborted the connection and
that the payload we have received should not be considered complete. Up
until now, the implementation of TCP in inet_drv.c has hidden the
receipt of the RST from the user, treating it as though it was just
a FIN terminating the read side of the socket.
There are many cases where user code needs to be able to distinguish
between a socket that was closed normally and one that was aborted.
Setting the option {show_econnreset, true} enables the user to receive
ECONNRESET errors on both active and passive sockets.
A connected socket returned from gen_tcp:accept/1 will inherit the
show_econnreset setting of the listening socket.
By default this option is set to {show_econnreset, false}.
Note that this patch only enables the reporting of ECONNRESET when
the socket is being read from. It does not report ECONNRESET (or
EPIPE) when the user tries to write to a connection when an RST
has already been received. Currently the TCP implementation in
inet_drv.c hides all such send errors from the user in favour
of returning {error, close}. A separate patch will be needed to
enable the reporting of such errors.
|
|
If the driver queue is empty, or the user is requesting a 'read'
shutdown, then the shutdown() syscall is performed synchronously, as
per the old version of shutdown/2.
However, if the user is requesting a 'write' or 'read_write' shutdown,
and there is data in the driver queue for the socket, then the
shutdown() syscall is delayed and handled asynchronously when the
driver queue is written out.
This version of shutdown solves a number of issues with the old
version. The two main solutions it offers are:
* It doesn't block when the TCP peer is idle or slow. This is the
expected behaviour when shutdown() is called: the caller needs
to be able to continue reading from the socket, not be prevented
from doing so.
* It doesn't truncate the output. The current version of
gen_tcp:shutdown/2 will truncate any outbound data in the driver
queue after about 10 seconds if the TCP peer is idle of slow. Worse
yet, it doesn't even inform anyone that the data has been
truncated: 'ok' is returned to the caller; and a FIN rather than
an RST is sent to the TCP peer.
For a detailed description of all the problems with the old version
of shutdown, please see the EEP Light that was written to justify
this patch.
|
|
Conflicts:
erts/doc/src/notes.xml
erts/preloaded/ebin/prim_inet.beam
erts/vsn.mk
lib/kernel/doc/src/notes.xml
lib/kernel/vsn.mk
|
|
|
|
* maint:
Fix prim_inet:close/1
Ensure exit signal due to link precede port BIF return
Conflicts:
erts/preloaded/ebin/prim_inet.beam
|
|
|
|
Conflicts:
erts/preloaded/ebin/prim_inet.beam
lib/kernel/test/gen_sctp_SUITE.erl
|
|
|
|
Add the {active,N} socket option, where N is an integer in the range
-32768..32767, to allow a caller to specify the number of data messages to
be delivered to the controlling process. Once the socket's delivered
message count either reaches 0 or is explicitly set to 0 with
inet:setopts/2 or by including {active,0} as an option when the socket is
created, the socket transitions to passive ({active, false}) mode and the
socket's controlling process receives a message to inform it of the
transition. TCP sockets receive {tcp_passive,Socket}, UDP sockets receive
{udp_passive,Socket} and SCTP sockets receive {sctp_passive,Socket}.
The socket's delivered message counter defaults to 0, but it can be set
using {active,N} via any gen_tcp, gen_udp, or gen_sctp function that takes
socket options as arguments, or via inet:setopts/2. New N values are added
to the socket's current counter value, and negative numbers can be used to
reduce the counter value. Specifying a number that would cause the socket's
counter value to go above 32767 causes an einval error. If a negative
number is specified such that the counter value would become negative, the
socket's counter value is set to 0 and the socket transitions to passive
mode. If the counter value is already 0 and inet:setopts(Socket,
[{active,0}]) is specified, the counter value remains at 0 but the
appropriate passive mode transition message is generated for the socket.
This commit contains a modified preloaded prim_inet.beam due to changes in
prim_inet.erl.
Add tests for {active,N} mode for TCP, UDP, and SCTP sockets.
Add documentation for {active,N} mode for inet, gen_tcp, gen_udp, and
gen_sctp.
|
|
|
|
|
|
|
|
|
|
rickard/r16/port-optimizations/OTP-10336
* rickard/port-optimizations/OTP-10336:
Change annotate level for emacs-22 in cerl
Update etp-commands
Add documentation on communication in Erlang
Add support for busy port message queue
Add driver callback epilogue
Implement true asynchronous signaling between processes and ports
Add erl_drv_[send|output]_term
Move busy port flag
Use rwlock for driver list
Optimize management of port tasks
Improve configuration of process and port tables
Remove R9 compatibility features
Use ptab functionality also for ports
Prepare for use of ptab functionality also for ports
Atomic port state
Generalize process table implementation
Implement functionality for delaying thread progress from unmanaged threads
Conflicts:
erts/doc/src/erl_driver.xml
erts/doc/src/erlang.xml
erts/emulator/beam/beam_bif_load.c
erts/emulator/beam/beam_bp.c
erts/emulator/beam/beam_emu.c
erts/emulator/beam/bif.c
erts/emulator/beam/copy.c
erts/emulator/beam/erl_alloc.c
erts/emulator/beam/erl_alloc.types
erts/emulator/beam/erl_bif_info.c
erts/emulator/beam/erl_bif_port.c
erts/emulator/beam/erl_bif_trace.c
erts/emulator/beam/erl_init.c
erts/emulator/beam/erl_message.c
erts/emulator/beam/erl_port_task.c
erts/emulator/beam/erl_process.c
erts/emulator/beam/erl_process.h
erts/emulator/beam/erl_process_lock.c
erts/emulator/beam/erl_trace.c
erts/emulator/beam/export.h
erts/emulator/beam/global.h
erts/emulator/beam/io.c
erts/emulator/sys/unix/sys.c
erts/emulator/sys/vxworks/sys.c
erts/emulator/test/port_SUITE.erl
erts/etc/unix/cerl.src
erts/preloaded/ebin/erlang.beam
erts/preloaded/ebin/prim_inet.beam
erts/preloaded/src/prim_inet.erl
lib/hipe/cerl/erl_bif_types.erl
lib/kernel/doc/src/inet.xml
lib/kernel/src/inet.erl
|
|
|
|
|
|
* maint:
Bumped version nr
ssl & public_key: Workaround that some certificates encode countryname as utf8 and close down gracefully if other ASN-1 errors occur.
Add more cross reference links to ct docs
Remove config option from common_test args
Update user config to use nested tuple keys
Allow mixed IPv4 and IPv6 addresses to sctp_bindx
Add checks for in6addr_any and in6addr_loopback
Fix SCTP multihoming
observer: fix app file (Noticed-by: Motiejus Jakstys)
Fix lib/src/test/ssh_basic_SUITE.erl to fix IPv6 option typos
Prevent index from being corrupted if a nonexistent item is deleted
Add tests showing that trying to delete non-existing object may corrupt the table index
Fix Table Viewer search crash on new|changed|deleted rows
Escape control characters in Table Viewer
Fix Table Viewer crash after a 'Found' -> 'Not found' search sequence
inet_drv.c: Set sockaddr lengths in inet_set_[f]address
Conflicts:
erts/preloaded/ebin/prim_inet.beam
|
|
Also allow mixed address families to bind, since the first address on
a multihomed sctp socket must be bound with bind, while the rest are
to be bound using sctp_bindx.
At least Linux supports adding address of mixing families.
Make inet_set_faddress function available also when HAVE_SCTP is not
defined, since we use it to find an address for bind to be able to mix
ipv4 and ipv6 addresses.
|
|
|
|
|
|
Will catch system_limit and return error tuple instead. An uncaught
exception would be an incorrect behaviour. This problem would occur
for gen_tcp:listen/1,2 for example.
|
|
Ignore fd is a feature used by sendfile to temporarily remove
all driver_select calls on that fd so that another driver can
select on it. It also delays all actions which sends or receives
data in that fd until in the fd is no longer ignored.
Only the controlling_process should use the feature as it is otherwise
possible that the ignore will never be cleaned up and hence create
a memory leak in the driver.
An ignored driver will not detect that an fd has been closed until
it is unignored.
|
|
Created erlang fallback for sendfile in gen_tcp and
moved sendfile from file to gen_tcp. Also created testcases
for testing all different options to sendfile.
For info about how sendfile should work see the BSD man pages
as they contain a more complete API than other *nixes.
|
|
It is better that gen_sctp:open/0-2 returns the informative Posix
return code {error,eprotonosupport} than previously {error,badarg}
when SCTP is not supported since it is so platform dependent.
|
|
|
|
|
|
|
|
|
|
Joint effort by Kostis, pan, egil & sverker
|
|
|
|
|
|
* rani/sctp-sndrcvinfo/OTP-8795:
Fix xfer_active close expection for Solaris behaviour
Keep default #sctp_sndrcvinfo{} fields on gen_sctp:send/4
Fill in sinfo_assoc_id in struct sctp_sndrcvinfo for getopt()
Conflicts:
lib/kernel/test/gen_sctp_SUITE.erl
|