aboutsummaryrefslogtreecommitdiffstats
path: root/erts/preloaded/src/prim_inet.erl
AgeCommit message (Collapse)Author
2018-10-22Merge branch 'raimo/tcp-close-while-send/maint/ERL-561/OTP-12242' into maintRaimo Niskanen
* raimo/tcp-close-while-send/maint/ERL-561/OTP-12242: Write test case Fix hanging gen_tcp send vs close race Conflicts: erts/preloaded/ebin/prim_inet.beam
2018-10-19Fix hanging gen_tcp send vs close raceRaimo Niskanen
While a gen_tcp send was in progress with filled buffers and slow receiver a close (from another process) would place the port in a half dead state so the port could not signal back to send, that waited for confirmation. The solution is to after some time (5 s) of waiting for send confirmation set a monitor on the port, which detects if the port becomes half dead due to close from another process. The close pending loop has also been improved to use the linger timeout for waiting, and to set a system timeout (arbitrarily selected 3 min) to not wait forever when the other end reads data s l o w l y (tarpitting, kind of).
2018-10-17Merge branch 'igor/tcp-nopush-ERL-698/OTP-15357' into maintJohn Högberg
* igor/tcp-nopush-ERL-698/OTP-15357: "cork" tcp socket around file:sendfile Add nopush TCP socket option
2018-10-11"cork" tcp socket around file:sendfileIgor Slepchin
This fixes 200ms delay on the last TCP segment when using file:sendfile/2 on Linux (ERL-698).
2018-10-11Add nopush TCP socket optionIgor Slepchin
This translates to TCP_CORK on Linux and TCP_NOPUSH on BSD. In effect, this acts as super-Nagle: no partial TCP segments are sent out until this option is turned off. Once turned off, all accumulated unsent data is sent out immediately. The latter is *not* the case on OSX, hence the implementation ignores "nopush" on OSX to reduce confusion.
2018-10-02Implement {netns,NS} option for inet:getifaddrs/1Raimo Niskanen
Also implement the same option for the legacy undocumented functions inet:getif/1,getiflist/1,ifget/2,ifset/2. The arity 1 functions had before this change got signatures that took a socket port that was used to do the needed syscall, so now the signature was extended to also take an option list with the only supported option {netns,Namespace}. The Socket argument variant remains unsupported. For inet:getifaddrs/1 the documentation file was changed to old style function name definition so be able to hide the Socket argument variant that is visible in the type spec. The arity 2 functions had got an option list as second argument. This list had to be partitioned into one list for the namespace option(s) and the other for the rest. The namespace option list was then fed to the already existing namespace support for socket opening, which places the socket in a namespace and hence made all these functions that in inet_drv.c used getsockopt() work without change. The functions that used getifaddrs() in inet_drv.c had to be changed in inet_drv.c to swap namespaces around the getifaddrs() syscall. This functionality was separated into a new function call_getifaddrs().
2018-09-04Implement socket option recvtos and friendsRaimo Niskanen
Implement socket options recvtclass, recvtos, recvttl and pktoptions. Document the implemented socket options, new types and message formats. The options recvtclass, recvtos and recvttl are boolean options that when activated (true) for a socket will cause ancillary data to be received through recvmsg(). That is for packet oriented sockets (UDP and SCTP). The required options for this feature were recvtclass and recvtos, and recvttl was only added to test that the ancillary data parsing handled multiple data items in one message correctly. These options does not work on Windows since ancillary data is not handled by the Winsock2 API. For stream sockets (TCP) there is no clear connection between a received packet and what is returned when reading data from the socket, so recvmsg() is not useful. It is possible to get the same ancillary data through a getsockopt() call with the IPv6 socket option IPV6_PKTOPTIONS, on Linux named IPV6_2292PKTOPTIONS after the now obsoleted RFC where it originated. (unfortunately RFC 3542 that obsoletes it explicitly undefines this way to get packet ancillary data from a stream socket) Linux also has got a way to get packet ancillary data for IPv4 TCP sockets through a getsockopt() call with IP_PKTOPTIONS, which appears to be Linux specific. This implementation uses a flag field in the inet_drv.c socket internal data that records if any setsockopt() call with recvtclass, recvtos or recvttl (IPV6_RECVTCLASS, IP_RECVTOS or IP_RECVTTL) has been activated. If so recvmsg() is used instead of recvfrom(). Ancillary data is delivered to the application by a new return tuple format from gen_udp:recv/2,3 containing a list of ancillary data tuples [{tclass,TCLASS} | {tos,TOS} | {ttl,TTL}], as returned by recvmsg(). For a socket in active mode a new message format, containing the ancillary data list, delivers the data in the same way. For gen_sctp the ancillary data is delivered in the same way, except that the gen_sctp return tuple format already contained an ancillary data list so there are just more possible elements when using these socket options. Note that the active mode message format has got an extra tuple level for the ancillary data compared to what is now implemented gen_udp. The gen_sctp active mode format was considered to be the odd one - now all tuples containing ancillary data are flat, except for gen_sctp active mode. Note that testing has not shown that Linux SCTP sockets deliver any ancillary data for these socket options, so it is probably not implemented yet. Remains to be seen what FreeBSD does... For gen_tcp inet:getopts([pktoptions]) will deliver the latest received ancillary data for any activated socket option recvtclass, recvtos or recvttl, on platforms where IP_PKTOPTIONS is defined for an IPv4 socket, or where IPV6_PKTOPTIONS or IPV6_2292PKTOPTIONS is defined for an IPv6 socket. It will be delivered as a list of ancillary data items in the same way as for gen_udp (and gen_sctp). On some platforms, e.g the BSD:s, when you activate IP_RECVTOS you get ancillary data tagged IP_RECVTOS with the TOS value, but on Linux you get ancillary data tagged IP_TOS with the TOS value. Linux follows the style of RFC 2292, and the BSD:s use an older notion. For RFC 2292 that defines the IP_PKTOPTIONS socket option it is more logical to tag the items with the tag that is the item's, than with the tag that defines that you want the item. Therefore this implementation translates all BSD style ancillary data tags to the corresponding Linux style data tags, so the application will only see the tags 'tclass', 'tos' and 'ttl' on all platforms.
2017-11-30Reimplement efile_drv as a dirty NIFJohn Högberg
This improves the latency of file operations as dirty schedulers are a bit more eager to run jobs than async threads, and use a single global queue rather than per-thread queues, eliminating the risk of a job stalling behind a long-running job on the same thread while other async threads sit idle. There's no such thing as a free lunch though; the lowered latency comes at the cost of increased busy-waiting which may have an adverse effect on some applications. This behavior can be tweaked with the +sbwt flag, but unfortunately it affects all types of schedulers and not just dirty ones. We plan to add type-specific flags at a later stage. sendfile has been moved to inet_drv to lessen the effect of a nasty race; the cooperation between inet_drv and efile has never been airtight and the socket dying at the wrong time (Regardless of reason) could result in fd aliasing. Moving it to the inet driver makes it impossible to trigger this by closing the socket in the middle of a sendfile operation, while still allowing it to be aborted -- something that can't be done if it stays in the file driver. The race still occurs if the controlling process dies in the short window between dispatching the sendfile operation and the dup(2) call in the driver, but it's much less likely to happen now. A proper fix is in the works. -- Notable functional differences: * The use_threads option for file:sendfile/5 no longer has any effect. * The file-specific DTrace probes have been removed. The same effect can be achieved with normal tracing together with the nif__entry/nif__return probes to track scheduling. -- OTP-14256
2017-05-04Update copyright yearRaimo Niskanen
2017-04-20implement SO_BINDTODEVICE for inet protocolsAndreas Schultz
bind to device is needed to properly support VRF-Lite under Linux (see [1] for details). [1]: https://www.kernel.org/doc/Documentation/networking/vrf.txt
2016-09-12Implement IPV6_TCLASSRaimo Niskanen
2016-07-29Handle AF_UNSPEC and AF_UNDEFINED correctlyRaimo Niskanen
2016-06-09Document the local (unix) address familyRaimo Niskanen
2016-06-08Remove internal state BOUND from inet_drvRaimo Niskanen
2016-06-02AF_UNIX is more portableRaimo Niskanen
Fix dialyzer warning for improper list in prim_inet by not using an improper list.
2016-06-01Rewrite inet_drv for AF_LOCALRaimo Niskanen
2016-01-12Assign externally open fd to gen_tcp (UDS support)Serge Aleynikov
When a AF_LOCAL file descriptor is created externally (e.g. Unix Domain Socket) and passed to `gen_tcp:listen(0, [{fd, FD}])`, the implementation incorrectly assigned the address family to be equal to `inet`, which in the inet_drv driver translated to AF_INET instead of AF_LOCAL (or AF_UNIX), and an `einval` error code was returned. This patch fixes this problem such that the file descriptors of the `local` address family are supported in the inet:fdopen/5, gen_tcp:connect/3, gen_tcp:listen/2, gen_udp:open/2 calls
2015-11-16Merge branch 'c-rack/fix-typo-prim-inet' into maintHenrik Nord
* c-rack/fix-typo-prim-inet: Fix minor typo "timout" -> "timeout"
2015-10-26erts: Add {line_delimiter, byte()} option to inet:setopts/2Serge Aleynikov
A new {line_delimiter, byte()} option allows line-oriented TCP-based protocols to use a custom line delimiting character. It is to be used in conjunction with {packet, line}. This option also works with erlang:decode_packet/3 when its first argument is 'line'.
2015-10-15Fix minor typo "timout" -> "timeout"Constantin Rack
2015-06-18Change license text to APLv2Bruce Yinhe
2015-06-09Fix socket option {linger, {true, 0}} to abort TCP connectionsRory Byrne
Up until now, if {linger, {true, 0}} is set on the socket and there is data in the port driver queue, the connection is not aborted until the port queue is empty and close() is called on the underlying file descriptor. This bug allows an idle TCP client to prevent a server from terminating the connection and freeing resources. This patch fixes the problem by discarding the port queue if the socket is closed when {linger, {true, 0}} is set.
2015-06-09Add 'show_econnreset' TCP socket optionRory Byrne
An ECONNRESET is a socket error which tells us that a TCP peer has sent an RST. The RST indicates that they have aborted the connection and that the payload we have received should not be considered complete. Up until now, the implementation of TCP in inet_drv.c has hidden the receipt of the RST from the user, treating it as though it was just a FIN terminating the read side of the socket. There are many cases where user code needs to be able to distinguish between a socket that was closed normally and one that was aborted. Setting the option {show_econnreset, true} enables the user to receive ECONNRESET errors on both active and passive sockets. A connected socket returned from gen_tcp:accept/1 will inherit the show_econnreset setting of the listening socket. By default this option is set to {show_econnreset, false}. Note that this patch only enables the reporting of ECONNRESET when the socket is being read from. It does not report ECONNRESET (or EPIPE) when the user tries to write to a connection when an RST has already been received. Currently the TCP implementation in inet_drv.c hides all such send errors from the user in favour of returning {error, close}. A separate patch will be needed to enable the reporting of such errors.
2015-05-12Fix gen_tcp:shutdown/2 by making it asynchronousRory Byrne
If the driver queue is empty, or the user is requesting a 'read' shutdown, then the shutdown() syscall is performed synchronously, as per the old version of shutdown/2. However, if the user is requesting a 'write' or 'read_write' shutdown, and there is data in the driver queue for the socket, then the shutdown() syscall is delayed and handled asynchronously when the driver queue is written out. This version of shutdown solves a number of issues with the old version. The two main solutions it offers are: * It doesn't block when the TCP peer is idle or slow. This is the expected behaviour when shutdown() is called: the caller needs to be able to continue reading from the socket, not be prevented from doing so. * It doesn't truncate the output. The current version of gen_tcp:shutdown/2 will truncate any outbound data in the driver queue after about 10 seconds if the TCP peer is idle of slow. Worse yet, it doesn't even inform anyone that the data has been truncated: 'ok' is returned to the caller; and a FIN rather than an RST is sent to the TCP peer. For a detailed description of all the problems with the old version of shutdown, please see the EEP Light that was written to justify this patch.
2014-07-24Merge branch 'maint-r16' into maintHenrik Nord
Conflicts: erts/doc/src/notes.xml erts/preloaded/ebin/prim_inet.beam erts/vsn.mk lib/kernel/doc/src/notes.xml lib/kernel/vsn.mk
2014-07-22kernel: When doing an fdopen we now also bind the fd to the specified addr/portLukas Larsson
2013-11-28Merge branch 'maint'Rickard Green
* maint: Fix prim_inet:close/1 Ensure exit signal due to link precede port BIF return Conflicts: erts/preloaded/ebin/prim_inet.beam
2013-11-26Fix prim_inet:close/1Rickard Green
2013-11-26Merge branch 'maint'Raimo Niskanen
Conflicts: erts/preloaded/ebin/prim_inet.beam lib/kernel/test/gen_sctp_SUITE.erl
2013-11-07Implement prim_inet:socknames/1,2 and prim_inet:peernames/1,2Raimo Niskanen
2013-09-23add {active,N} socket option for TCP, UDP, and SCTPSteve Vinoski
Add the {active,N} socket option, where N is an integer in the range -32768..32767, to allow a caller to specify the number of data messages to be delivered to the controlling process. Once the socket's delivered message count either reaches 0 or is explicitly set to 0 with inet:setopts/2 or by including {active,0} as an option when the socket is created, the socket transitions to passive ({active, false}) mode and the socket's controlling process receives a message to inform it of the transition. TCP sockets receive {tcp_passive,Socket}, UDP sockets receive {udp_passive,Socket} and SCTP sockets receive {sctp_passive,Socket}. The socket's delivered message counter defaults to 0, but it can be set using {active,N} via any gen_tcp, gen_udp, or gen_sctp function that takes socket options as arguments, or via inet:setopts/2. New N values are added to the socket's current counter value, and negative numbers can be used to reduce the counter value. Specifying a number that would cause the socket's counter value to go above 32767 causes an einval error. If a negative number is specified such that the counter value would become negative, the socket's counter value is set to 0 and the socket transitions to passive mode. If the counter value is already 0 and inet:setopts(Socket, [{active,0}]) is specified, the counter value remains at 0 but the appropriate passive mode transition message is generated for the socket. This commit contains a modified preloaded prim_inet.beam due to changes in prim_inet.erl. Add tests for {active,N} mode for TCP, UDP, and SCTP sockets. Add documentation for {active,N} mode for inet, gen_tcp, gen_udp, and gen_sctp.
2013-07-17Implement emulator netns support for TCP and UDPRaimo Niskanen
2013-06-15Improve solutionRaimo Niskanen
2013-05-24Do not unlink before closing portRaimo Niskanen
2013-05-06Make high_msgq_watermark and low_msgq_watermark generic inet optionsRickard Green
2012-12-07Merge branch 'rickard/port-optimizations/OTP-10336' into ↵Rickard Green
rickard/r16/port-optimizations/OTP-10336 * rickard/port-optimizations/OTP-10336: Change annotate level for emacs-22 in cerl Update etp-commands Add documentation on communication in Erlang Add support for busy port message queue Add driver callback epilogue Implement true asynchronous signaling between processes and ports Add erl_drv_[send|output]_term Move busy port flag Use rwlock for driver list Optimize management of port tasks Improve configuration of process and port tables Remove R9 compatibility features Use ptab functionality also for ports Prepare for use of ptab functionality also for ports Atomic port state Generalize process table implementation Implement functionality for delaying thread progress from unmanaged threads Conflicts: erts/doc/src/erl_driver.xml erts/doc/src/erlang.xml erts/emulator/beam/beam_bif_load.c erts/emulator/beam/beam_bp.c erts/emulator/beam/beam_emu.c erts/emulator/beam/bif.c erts/emulator/beam/copy.c erts/emulator/beam/erl_alloc.c erts/emulator/beam/erl_alloc.types erts/emulator/beam/erl_bif_info.c erts/emulator/beam/erl_bif_port.c erts/emulator/beam/erl_bif_trace.c erts/emulator/beam/erl_init.c erts/emulator/beam/erl_message.c erts/emulator/beam/erl_port_task.c erts/emulator/beam/erl_process.c erts/emulator/beam/erl_process.h erts/emulator/beam/erl_process_lock.c erts/emulator/beam/erl_trace.c erts/emulator/beam/export.h erts/emulator/beam/global.h erts/emulator/beam/io.c erts/emulator/sys/unix/sys.c erts/emulator/sys/vxworks/sys.c erts/emulator/test/port_SUITE.erl erts/etc/unix/cerl.src erts/preloaded/ebin/erlang.beam erts/preloaded/ebin/prim_inet.beam erts/preloaded/src/prim_inet.erl lib/hipe/cerl/erl_bif_types.erl lib/kernel/doc/src/inet.xml lib/kernel/src/inet.erl
2012-12-07Add support for busy port message queueRickard Green
2012-10-31erts,kernel: Implement socket option ipv6_v6only in erlang codeRaimo Niskanen
2012-08-27Merge branch 'maint'Lukas Larsson
* maint: Bumped version nr ssl & public_key: Workaround that some certificates encode countryname as utf8 and close down gracefully if other ASN-1 errors occur. Add more cross reference links to ct docs Remove config option from common_test args Update user config to use nested tuple keys Allow mixed IPv4 and IPv6 addresses to sctp_bindx Add checks for in6addr_any and in6addr_loopback Fix SCTP multihoming observer: fix app file (Noticed-by: Motiejus Jakstys) Fix lib/src/test/ssh_basic_SUITE.erl to fix IPv6 option typos Prevent index from being corrupted if a nonexistent item is deleted Add tests showing that trying to delete non-existing object may corrupt the table index Fix Table Viewer search crash on new|changed|deleted rows Escape control characters in Table Viewer Fix Table Viewer crash after a 'Found' -> 'Not found' search sequence inet_drv.c: Set sockaddr lengths in inet_set_[f]address Conflicts: erts/preloaded/ebin/prim_inet.beam
2012-08-16Allow mixed IPv4 and IPv6 addresses to sctp_bindxTomas Abrahamsson
Also allow mixed address families to bind, since the first address on a multihomed sctp socket must be bound with bind, while the rest are to be bound using sctp_bindx. At least Linux supports adding address of mixing families. Make inet_set_faddress function available also when HAVE_SCTP is not defined, since we use it to find an address for bind to be able to mix ipv4 and ipv6 addresses.
2012-04-16erts: Remove bit8 option from prim_inetBjörn-Egil Dahlberg
2012-03-30Update copyright yearsBjörn-Egil Dahlberg
2012-03-16prim_inet: Catch system_limit in open_portBjörn-Egil Dahlberg
Will catch system_limit and return error tuple instead. An uncaught exception would be an incorrect behaviour. This problem would occur for gen_tcp:listen/1,2 for example.
2011-12-01Implement ignorefd for TCPLukas Larsson
Ignore fd is a feature used by sendfile to temporarily remove all driver_select calls on that fd so that another driver can select on it. It also delays all actions which sends or receives data in that fd until in the fd is no longer ignored. Only the controlling_process should use the feature as it is otherwise possible that the ignore will never be cleaned up and hence create a memory leak in the driver. An ignored driver will not detect that an fd has been closed until it is unignored.
2011-12-01Create erlang fallback for sendfileLukas Larsson
Created erlang fallback for sendfile in gen_tcp and moved sendfile from file to gen_tcp. Also created testcases for testing all different options to sendfile. For info about how sendfile should work see the BSD man pages as they contain a more complete API than other *nixes.
2011-11-17erts,kernel: Return eprotonosupport when SCTP is not supportedRaimo Niskanen
It is better that gen_sctp:open/0-2 returns the informative Posix return code {error,eprotonosupport} than previously {error,badarg} when SCTP is not supported since it is so platform dependent.
2011-11-17erts,kernel: Implement gen_sctp:peeloff/2Raimo Niskanen
2011-11-17erts,kernel: Add type stream sockets to SCTPRaimo Niskanen
2011-11-16erts,kernel: Rename operations common to TCP and SCTPRaimo Niskanen
2011-03-11Update copyright yearsBjörn-Egil Dahlberg