Age | Commit message (Collapse) | Author |
|
* maint:
kernel runtime dependency to erts
erts: Add yield via timeout to inet read_packet
erts: Don't increase buffer when sctp sndbuf is set
erts: Only change inet buffer if not set
|
|
into maint
* lukas/erts/fix_inet_buffer_auto_adjust/OTP-15651/OTP-15652:
kernel runtime dependency to erts
erts: Add yield via timeout to inet read_packet
erts: Don't increase buffer when sctp sndbuf is set
erts: Only change inet buffer if not set
|
|
The idea here is that the timeout of 0 will work like a yield
so that that we don't starve other ports/processes, but it
is faster than the select trigger. We don't deselect on the socket
because it does not matter if it is triggered laster or not
as we'll just get an EAGAIN.
Doing this also circumvents the fact that no select is done on
active true style sockets until all I/O has been handled. So in
a system with a lot of active true style I/O this will could be
very benificial.
|
|
This is most likely a copy-paste bug that has lived in the code
unnoticed for 5+ years...
|
|
We only want to autoupdate update the buffer if recbuf
is set if the buffer has not been set before.
|
|
|
|
to increase the probablity of a nice badarg
from erlang:port_control.
|
|
|
|
* lukas/erts/fix_inet_multitimer_cleanup/OTP-15536:
erts: Fix cleanup of the inet MultiTimer
|
|
|
|
|
|
* lukas/erts/inet_pktopts_old_linux/OTP-15494:
erts: Fix inet pktopts on very old linux kernels
|
|
Conflicts:
erts/emulator/beam/erl_process.c
|
|
* lukas/OTP-21.1.1/scheduler_pollset/OTP-15475:
erts: Move fds with active true behaviour to own pollset
erts: Fix lists_member_2 reduction count
erts: Allow code_model_small to be set in xcomp setting
erts: Implement delay_send using timer instead of poll
erts: Optimize driver_set_timer(0) to fire at once
erts: Optimize the inet driver multi timers for one timer
erts: Move all inet tcp CONNECTED timers to multi timer
erts: Add erts_io_notify_port_task_executed to check_io msacc state
erts: Add pre-alloc to ALLOC msacc state
erts: Make thr prgr wakeup current or sched 1
erts: Pass thread progress data where possible
|
|
|
|
The previous implementation uses a round-trip in the poll-set
to simulate a yield of the port context. With the poll thread
implementation this is no longer a good idea as it generated a
lot more work for the system. So this commit changes the
implementation to use a timer instead.
OTP-15471
|
|
The most common case for any socket is to have zero or one
timer, so we optimize for the one case. The only case when
we have more than one timer is when the multi accept feature
is used.
|
|
|
|
|
|
* sverker/erts/sendfile-error-bug/ERL-784/OTP-15461:
erts: Fix hanging sendfile bugs when socket closes unexpectedly
erts: Fix unexpected inet_reply message from failing file:sendfile
erts: Fix bug in sendfile for active socket
|
|
|
|
A failing file:sendfile call would often send a message
{inet_reply, Port, {error, Reason}} that would pollute the mailbox
of the calling process.
TCP_REQ_SENDFILE has its own reply messages format
{sendfile, _, _} and does not expect an inet_reply message.
Solution: Suppress inet_reply error message if TCP_ADDF_SENDFILE
is set.
|
|
driver_select() was called after port had been killed
by tcp_inet_sendfile() calling tcp_send_error().
|
|
* maint:
"cork" tcp socket around file:sendfile
Add nopush TCP socket option
|
|
* igor/tcp-nopush-ERL-698/OTP-15357:
"cork" tcp socket around file:sendfile
Add nopush TCP socket option
|
|
* maint:
Updated OTP version
Prepare release
erts: Fix UNC path handling on Windows
erts: Fix a compiler warning
eldap: Fix race at socket close
Fix bug for sockopt pktoptions on BSD
erts: Fix memory leak on file read errors
|
|
* maint-21:
Updated OTP version
Prepare release
erts: Fix UNC path handling on Windows
erts: Fix a compiler warning
eldap: Fix race at socket close
Fix bug for sockopt pktoptions on BSD
erts: Fix memory leak on file read errors
|
|
Conflicts:
erts/preloaded/ebin/prim_inet.beam
|
|
This translates to TCP_CORK on Linux and TCP_NOPUSH on
BSD.
In effect, this acts as super-Nagle: no partial TCP segments
are sent out until this option is turned off. Once turned off,
all accumulated unsent data is sent out immediately. The latter
is *not* the case on OSX, hence the implementation ignores
"nopush" on OSX to reduce confusion.
|
|
Also implement the same option for the legacy undocumented functions
inet:getif/1,getiflist/1,ifget/2,ifset/2.
The arity 1 functions had before this change got signatures that
took a socket port that was used to do the needed syscall, so now
the signature was extended to also take an option list with the
only supported option {netns,Namespace}. The Socket argument
variant remains unsupported.
For inet:getifaddrs/1 the documentation file was changed to old
style function name definition so be able to hide the Socket
argument variant that is visible in the type spec.
The arity 2 functions had got an option list as second argument.
This list had to be partitioned into one list for the namespace
option(s) and the other for the rest.
The namespace option list was then fed to the already existing
namespace support for socket opening, which places the socket
in a namespace and hence made all these functions that in
inet_drv.c used getsockopt() work without change.
The functions that used getifaddrs() in inet_drv.c had to be
changed in inet_drv.c to swap namespaces around the
getifaddrs() syscall. This functionality was separated into
a new function call_getifaddrs().
|
|
The macros for the BSD style option names had accidentally
wound up outside the option parsing loop, causing unclear
behaviour and Valgrind errors.
|
|
|
|
|
|
|
|
|
|
Implement socket options recvtclass, recvtos, recvttl and pktoptions.
Document the implemented socket options, new types and message formats.
The options recvtclass, recvtos and recvttl are boolean options that
when activated (true) for a socket will cause ancillary data to be
received through recvmsg(). That is for packet oriented sockets
(UDP and SCTP).
The required options for this feature were recvtclass and recvtos,
and recvttl was only added to test that the ancillary data parsing
handled multiple data items in one message correctly.
These options does not work on Windows since ancillary data
is not handled by the Winsock2 API.
For stream sockets (TCP) there is no clear connection between
a received packet and what is returned when reading data from
the socket, so recvmsg() is not useful. It is possible to get
the same ancillary data through a getsockopt() call with
the IPv6 socket option IPV6_PKTOPTIONS, on Linux named
IPV6_2292PKTOPTIONS after the now obsoleted RFC where it originated.
(unfortunately RFC 3542 that obsoletes it explicitly undefines
this way to get packet ancillary data from a stream socket)
Linux also has got a way to get packet ancillary data for IPv4
TCP sockets through a getsockopt() call with IP_PKTOPTIONS,
which appears to be Linux specific.
This implementation uses a flag field in the inet_drv.c socket
internal data that records if any setsockopt() call with recvtclass,
recvtos or recvttl (IPV6_RECVTCLASS, IP_RECVTOS or IP_RECVTTL)
has been activated. If so recvmsg() is used instead of recvfrom().
Ancillary data is delivered to the application by a new return
tuple format from gen_udp:recv/2,3 containing a list of
ancillary data tuples [{tclass,TCLASS} | {tos,TOS} | {ttl,TTL}],
as returned by recvmsg(). For a socket in active mode a new
message format, containing the ancillary data list, delivers
the data in the same way.
For gen_sctp the ancillary data is delivered in the same way,
except that the gen_sctp return tuple format already contained
an ancillary data list so there are just more possible elements
when using these socket options. Note that the active mode
message format has got an extra tuple level for the ancillary
data compared to what is now implemented gen_udp.
The gen_sctp active mode format was considered to be the odd one
- now all tuples containing ancillary data are flat,
except for gen_sctp active mode.
Note that testing has not shown that Linux SCTP sockets deliver
any ancillary data for these socket options, so it is probably
not implemented yet. Remains to be seen what FreeBSD does...
For gen_tcp inet:getopts([pktoptions]) will deliver the latest
received ancillary data for any activated socket option recvtclass,
recvtos or recvttl, on platforms where IP_PKTOPTIONS is defined
for an IPv4 socket, or where IPV6_PKTOPTIONS or IPV6_2292PKTOPTIONS
is defined for an IPv6 socket. It will be delivered as a
list of ancillary data items in the same way as for gen_udp
(and gen_sctp).
On some platforms, e.g the BSD:s, when you activate IP_RECVTOS
you get ancillary data tagged IP_RECVTOS with the TOS value,
but on Linux you get ancillary data tagged IP_TOS with the
TOS value. Linux follows the style of RFC 2292, and the BSD:s
use an older notion. For RFC 2292 that defines the IP_PKTOPTIONS
socket option it is more logical to tag the items with the
tag that is the item's, than with the tag that defines that you
want the item. Therefore this implementation translates all
BSD style ancillary data tags to the corresponding Linux style
data tags, so the application will only see the tags 'tclass',
'tos' and 'ttl' on all platforms.
|
|
|
|
There is no reason to have a larger buffer than this as
the recvmsg call will never return more data.
OTP-15206
|
|
I did not find any legitimate use of "can not", however skipped
changing e.g RFCs archived in the source tree.
|
|
Only SCPT should keep the recv buffer when going into
select. If UDP does it, it will result in many more
memory allocations than there should be which can be
very detrimental to performance.
|
|
* john/erts/inet-drv-race/OTP-15158/ERL-654:
Fix a race condition when generating async operation ids
|
|
The counter used for generating async operation ids was a plain int
shared between all ports, which was incorrect but mostly worked
fine since the ids only had to be unique on a per-port basis.
However, some compilers (notably GCC 8.1.1) generated code that
assumed that this value didn't change between reads. Using a
shortened version of enq_async_w_tmo as an example:
int id = async_ref++;
op->id = id; //A
return id; //B
In GCC 7 and earlier, `async_ref` would be read once and assigned
to `id` before being incremented, which kept the values at A and B
consistent. In GCC 8, `async_ref` was read when assigned at A and
read again at B, and then incremented, which made them inconsistent
if we raced with another port.
This commit fixes the issue by removing `async_ref` altogether and
replacing it with a per-port counter which makes it impossible to
race with someone else.
|
|
|
|
* lukas/erts/tcp_send_return_closed/OTP-15001:
erts: tcp send should return {error,closed}
|
|
This improves the latency of file operations as dirty schedulers
are a bit more eager to run jobs than async threads, and use a
single global queue rather than per-thread queues, eliminating the
risk of a job stalling behind a long-running job on the same thread
while other async threads sit idle.
There's no such thing as a free lunch though; the lowered latency
comes at the cost of increased busy-waiting which may have an
adverse effect on some applications. This behavior can be tweaked
with the +sbwt flag, but unfortunately it affects all types of
schedulers and not just dirty ones. We plan to add type-specific
flags at a later stage.
sendfile has been moved to inet_drv to lessen the effect of a nasty
race; the cooperation between inet_drv and efile has never been
airtight and the socket dying at the wrong time (Regardless of
reason) could result in fd aliasing. Moving it to the inet driver
makes it impossible to trigger this by closing the socket in the
middle of a sendfile operation, while still allowing it to be
aborted -- something that can't be done if it stays in the file
driver.
The race still occurs if the controlling process dies in the short
window between dispatching the sendfile operation and the dup(2)
call in the driver, but it's much less likely to happen now.
A proper fix is in the works.
--
Notable functional differences:
* The use_threads option for file:sendfile/5 no longer has any
effect.
* The file-specific DTrace probes have been removed. The same
effect can be achieved with normal tracing together with the
nif__entry/nif__return probes to track scheduling.
--
OTP-14256
|
|
* maint:
Updated OTP version
Prepare release
ssh: testcases for space trailing Hello msg
ssh: Don't remove trailing WS in Hello msg
ssh: dialyzer fixes
ssh: Fix broken error handling during session setup
Remove invalid EINTR loop around close(2)
Conflicts:
OTP_VERSION
|
|
* maint-20:
Updated OTP version
Prepare release
ssh: testcases for space trailing Hello msg
ssh: Don't remove trailing WS in Hello msg
ssh: dialyzer fixes
ssh: Fix broken error handling during session setup
Remove invalid EINTR loop around close(2)
Conflicts:
lib/ssh/test/ssh_options_SUITE.erl
|
|
* john/erts/fix-close-eintr/OTP-14775:
Remove invalid EINTR loop around close(2)
|
|
Retrying close(2) on anything other than HP-UX is likely to close
something entirely different. POSIX says that the state of the file
descriptor is unspecified, and Linux/BSD guarantee that it's closed
on return.
|
|
OTP-13760
SCTP connect could return error even if the connect is ongoing
|