Age | Commit message | Author |
|
This improves the latency of file operations as dirty schedulers
are a bit more eager to run jobs than async threads, and use a
single global queue rather than per-thread queues, eliminating the
risk of a job stalling behind a long-running job on the same thread
while other async threads sit idle.
There's no such thing as a free lunch though; the lowered latency
comes at the cost of increased busy-waiting which may have an
adverse effect on some applications. This behavior can be tweaked
with the +sbwt flag, but unfortunately it affects all types of
schedulers and not just dirty ones. We plan to add type-specific
flags at a later stage.
sendfile has been moved to inet_drv to lessen the effect of a nasty
race; the cooperation between inet_drv and efile has never been
airtight and the socket dying at the wrong time (regardless of
reason) could result in fd aliasing. Moving it to the inet driver
makes it impossible to trigger this by closing the socket in the
middle of a sendfile operation, while still allowing it to be
aborted, which can't be done if it stays in the file driver.
The race still occurs if the controlling process dies in the short
window between dispatching the sendfile operation and the dup(2)
call in the driver, but it's much less likely to happen now.
A proper fix is in the works.
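The race above revolves around the driver taking its own copy of a descriptor with dup(2). As a minimal illustration of why that copy matters (the helper name `hold_fd` is hypothetical and not taken from inet_drv), a duplicated descriptor refers to the same open file and stays usable after the original number is closed, so a later close of the original cannot redirect it to an unrelated file:

```c
#include <unistd.h>

/* Hypothetical helper: take a private reference to the open file behind
 * fd. The returned descriptor stays valid even if fd itself is closed
 * (and its number reused by another open()) afterwards. */
static int hold_fd(int fd)
{
    return dup(fd);   /* -1 with errno set on failure */
}
```

The window the commit message mentions is the time before this dup happens: until then, the driver only knows the original number.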
--
Notable functional differences:
* The use_threads option for file:sendfile/5 no longer has any
effect.
* The file-specific DTrace probes have been removed. The same
effect can be achieved with normal tracing together with the
nif__entry/nif__return probes to track scheduling.
--
OTP-14256
|
|
* john/erts/fix-close-eintr/OTP-14775:
Remove invalid EINTR loop around close(2)
|
|
Retrying close(2) on anything other than HP-UX is likely to close
something entirely different. POSIX says that the state of the file
descriptor is unspecified after close(2) fails with EINTR, and
Linux/BSD guarantee that it's closed on return.
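A minimal sketch of the close-once pattern, assuming Linux/BSD semantics where the descriptor is released even when close(2) reports EINTR. `close_once` is a hypothetical name; this is not the OTP code itself:

```c
#include <errno.h>
#include <unistd.h>

/* Close fd exactly once. On Linux/BSD the descriptor is already released
 * when close() reports EINTR, so retrying could close a descriptor that
 * another thread has just been handed by open() or socket(). */
static int close_once(int fd)
{
    if (close(fd) == 0)
        return 0;
    if (errno == EINTR)
        return 0;   /* already released: treat as closed, never retry */
    return -1;      /* e.g. EBADF: report to the caller */
}
```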
|
|
On Solaris, giving a too long sfv_len results in an
EINVAL error, but data is still transmitted and len is
set correctly. So we translate this to a success with that
amount of data sent. This may hide some other errors
that cause EINVAL, but it is the best we can do for now.
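The translation can be sketched as a small post-processing step. This assumes the transmitted length is the `xferred` out-parameter of Solaris sendfilev(); the helper name `fixup_sendfilev` and the choice to treat EINVAL with zero bytes transmitted as a genuine error are assumptions of this sketch. Other errno values (including EAGAIN on non-blocking sockets) are simply passed through here:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical post-processing of a sendfilev() result.
 * ret:     return value of sendfilev()
 * err:     errno observed right after the call
 * xferred: bytes reported as transmitted
 * Returns the number of bytes sent, or -1 for an error. */
static ssize_t fixup_sendfilev(ssize_t ret, int err, size_t xferred)
{
    if (ret >= 0)
        return ret;
    if (err == EINVAL && xferred > 0)
        return (ssize_t)xferred;   /* data went out despite EINVAL */
    return -1;                     /* anything else is a real failure */
}
```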
|
|
* henrik/update-copyrightyear:
update copyright-year
|
|
* bjorn/erts/huge-file-fix/OTP-13461:
Handle multi-gigabyte writes to files
|
|
Test cases that write 4GB to a file at once would fail on
OS X and FreeBSD.
By running a simple test program on OS X (El Capitan 10.11.4/Darwin
15.4.0), I found that writev() can handle more than 4GB of data, while
write() can only handle less than 2GB. (Note that efile_drv.c will use
write() if there is only one element in the io vector, and writev() if
there is more than one.)
It is tempting to attempt to piggy-back on the existing mechanism
for segmenting write operations in efile_drv.c, but because of the
complex code I find it too dangerous, both from a correctness and
performance perspective.
Instead do the change in unix_efile.c, which is considerably
simpler.
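The general idea of bounding each write() can be sketched as follows. The 1 GiB `MAX_WRITE_CHUNK` value and the helper name `write_all_chunked` are assumptions chosen only to stay under the ~2GB write() limit observed above; they are not taken from unix_efile.c:

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Assumed cap, kept well below the ~2GB limit observed for write() on OS X. */
#define MAX_WRITE_CHUNK ((size_t)1 << 30)   /* 1 GiB */

/* Write all len bytes of buf, issuing at most `chunk` bytes per write()
 * call (callers would normally pass MAX_WRITE_CHUNK).
 * Returns 0 on success, -1 with errno set on failure. */
static int write_all_chunked(int fd, const char *buf, size_t len, size_t chunk)
{
    while (len > 0) {
        size_t n = len < chunk ? len : chunk;
        ssize_t w = write(fd, buf, n);
        if (w < 0) {
            if (errno == EINTR)
                continue;          /* interrupted before writing: retry */
            return -1;
        }
        buf += w;                  /* short writes are simply continued */
        len -= (size_t)w;
    }
    return 0;
}
```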
|
|
The fdatasync system call does not work as intended on Mac OS X.
Both fsync and fdatasync now use fcntl(fd, F_FULLFSYNC) on Mac OS X.
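A sketch of the platform switch, assuming `F_FULLFSYNC` is available whenever `__APPLE__` is defined (the extra `defined(F_FULLFSYNC)` check guards that assumption). `full_sync` is a hypothetical name:

```c
#include <fcntl.h>
#include <unistd.h>

/* Flush fd to stable storage. data_only selects fdatasync() semantics
 * where the platform distinguishes them; on Mac OS X both variants map
 * to fcntl(F_FULLFSYNC), as the commit above describes. */
static int full_sync(int fd, int data_only)
{
#if defined(__APPLE__) && defined(F_FULLFSYNC)
    (void)data_only;
    return fcntl(fd, F_FULLFSYNC);
#else
    return data_only ? fdatasync(fd) : fsync(fd);
#endif
}
```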
|
|
* theom/freebsd-sendfile-patch-2/OTP-13271:
erts: Fix sendfile:ing of large files on FreeBSD
|
|
If the file was larger than the OS send buffer the call
would fail before this patch.
|
|
Lots of pthread platforms unnecessarily fell back on the pipe/select
solution. This was because we tried to use the same monotonic clock source
for pthread_cond_timedwait() as used by OS monotonic time. This has
been fixed on most platforms by using another clock source.
Darwin, however, cannot use pthread_cond_timedwait() with a monotonic
clock source and has to use the pipe/select solution. On Darwin we
now use select with _DARWIN_UNLIMITED_SELECT in order to be able to
handle a large number of file descriptors.
|
|
If the initial stat() fails then efile_openfile() will still proceed
to open() the file. If that succeeds and the caller passed a non-NULL
pSize, then it will copy bogus data from the statbuf into *pSize. This
has been observed to cause file:read_file/1 to return truncated file
data with no error indication.
The use case involved a large file system mounted via NFS, with some
directories containing a large number of files, and NFS mount options
that allow the NFS client to return EIO if the NFS server does not
respond quickly enough. Depending on the caching state of the client
and server machines, a few stat() calls (fewer than 1 per 10 million)
would take long enough to trigger EIO errors, but subsequent open()
calls would succeed, and read_file/1 would return truncated data. This
sequence of events has been observed via "strace" on beam.smp.
Signed-off-by: Mikael Pettersson <[email protected]>
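One way to avoid handing back an unverified size is sketched below; it is not necessarily the change this commit makes. The size is taken from fstat() on the descriptor that was actually opened and is stored only if that call succeeds. `open_with_size` is a hypothetical name:

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Open path read-only and, if size_out is non-NULL, report its size.
 * Returns the fd, or -1 with errno set; *size_out is never written
 * with data from a failed stat. */
static int open_with_size(const char *path, off_t *size_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (size_out != NULL) {
        struct stat st;
        if (fstat(fd, &st) < 0) {
            int saved = errno;     /* close() may clobber errno */
            close(fd);
            errno = saved;
            return -1;
        }
        *size_out = st.st_size;
    }
    return fd;
}
```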
|
|
If writev returns an error (e.g. ENOSPC), we do not want to abort here
but instead propagate it up to Erlang.
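A minimal sketch of propagating the failure instead of aborting, assuming errors are handed back as a negative errno value (a convention chosen for this sketch; `writev_or_error` is a hypothetical name). The caller can then turn, for example, ENOSPC into `{error, enospc}` on the Erlang side:

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Issue writev() and return either the byte count or -errno.
 * No abort: every failure is left for the caller to report. */
static ssize_t writev_or_error(int fd, const struct iovec *iov, int iovcnt)
{
    ssize_t n;
    do {
        n = writev(fd, iov, iovcnt);
    } while (n < 0 && errno == EINTR);
    return n < 0 ? -(ssize_t)errno : n;
}
```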
|
|
The sync option adds the POSIX O_SYNC flag to the open system call on
platforms that support the flag or its equivalent, e.g.,
FILE_FLAG_WRITE_THROUGH on Windows. For platforms that don't support it,
file:open/2 returns {error, enotsup} if the sync option is passed in.
The semantics of O_SYNC are platform-specific. For example, not all
platforms guarantee that all file metadata are written to the disk along
with the file data when the flag is in effect. This issue is noted in the
documentation this commit adds for the sync option.
Add a test for the sync option. Note however that the underlying OS
semantics for O_SYNC can't be tested automatically in any practical way, so
the test assumes the OS does the right thing with the flag when
present. For manual verification, dtruss on OS X and strace on Linux were
both run against beam processes to watch calls to open(), and file:open/2
was called in Erlang shells to open files for writing, both with and
without the sync option. Both the dtruss output and the strace output
showed that the O_SYNC flag was present in the open() calls when sync was
specified and was clear when sync was not specified.
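The flag selection can be sketched like this for POSIX systems. The Windows FILE_FLAG_WRITE_THROUGH path is left out, and `open_for_write` is a hypothetical name; a build without O_SYNC reports ENOTSUP, mirroring the `{error, enotsup}` result described above:

```c
#include <errno.h>
#include <fcntl.h>

/* Open path for writing, adding O_SYNC when want_sync is set.
 * Returns an fd, or -1 with errno set (ENOTSUP if O_SYNC is missing). */
static int open_for_write(const char *path, int want_sync)
{
    int flags = O_WRONLY | O_CREAT;
    if (want_sync) {
#ifdef O_SYNC
        flags |= O_SYNC;
#else
        errno = ENOTSUP;
        return -1;
#endif
    }
    return open(path, flags, 0644);
}
```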
|
|
This operation allows pre-allocation of space for files.
It succeeds only on systems that support such an operation.
The POSIX standard defines the optional system call
posix_fallocate() to implement this feature. However,
some systems implement more specific functions to
accomplish the same operation.
On Linux, if the more specific function fallocate() is
implemented, it is used instead of posix_fallocate(),
falling back to posix_fallocate() if the fallocate()
call failed (it's only supported for the ext4, ocfs2,
xfs and btrfs file systems at the moment).
On Mac OS X it uses the specific fcntl() operation
F_PREALLOCATE, falling back to posix_fallocate() if
it's available (at the moment Mac OS X doesn't provide
posix_fallocate()).
On any other UNIX system, it uses posix_fallocate() if it's
available. On any other system that provides neither this system
call nor any other function to pre-allocate space for files, the
operation always fails with the POSIX error ENOTSUP.
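The fallback chain can be sketched as below, returning 0 or an errno value. This is a simplified sketch: the Mac OS X F_PREALLOCATE branch is omitted (that path just returns ENOTSUP here), it assumes every other non-Linux target provides posix_fallocate(), and `preallocate` is a hypothetical name:

```c
#define _GNU_SOURCE          /* exposes fallocate() in glibc's <fcntl.h> */
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

/* Reserve space for [offset, offset + len) in fd.
 * Returns 0 on success or an errno value such as ENOTSUP. */
static int preallocate(int fd, off_t offset, off_t len)
{
#if defined(__linux__)
    /* Linux-specific call first; it fails on file systems without support. */
    if (fallocate(fd, 0, offset, len) == 0)
        return 0;
#endif
#if defined(__APPLE__)
    /* The F_PREALLOCATE fcntl() path is omitted in this sketch. */
    (void)fd; (void)offset; (void)len;
    return ENOTSUP;
#else
    /* Portable fallback; it returns the error number directly. */
    return posix_fallocate(fd, offset, len);
#endif
}
```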
|
|
When using values of sfv_len and sfv_off which are larger than
the file in question, sendfilev can sometimes return -1 even though
data was sent. This seems to happen only on Oracle SunOS.
|
|
Ensure displayed sizes are not negative.
|
|
The return value from efile_sendfile was not consistent
between platforms. The API should now work as
intended.
OTP-9994
|
|
* jz/error-logic-efile_sendfile:
erts: minor fix for unnecessary condition
OTP-9872
|
|
* jz/sendfile_chunk_size:
erts: change SENDFILE_CHUNK_SIZE from signed to unsigned
Conflicts:
erts/emulator/drivers/unix/unix_efile.c
OTP-9872
|
|
* ta/sendfile/OTP-9240:
Do not use async threads on DARWIN
Fix cleanup when sendfile process crashes
Return {error,closed} from sendfile if closed
Do not use SFV_NOWAIT as it does not exist on all solaris
Clarify some code comments
Make solaris use sendfilev
|
|
Both mtime and atime were incorrectly checked for zero
|
|
In "while (retval != -1 && retval == SENDFILE_CHUNK_SIZE)", "retval != -1" is redundant: if retval equals SENDFILE_CHUNK_SIZE, it cannot be -1.
|
|
It's reasonable to use UL in SENDFILE_CHUNK_SIZE
|
|
Thanks Tuncer Ayaz
|
|
sendfilev is a richer API which allows us to
do non-blocking TCP on Solaris. The normal
sendfile API seems to have some issues with
non-blocking sockets and with the return value
of sendfile.
|
|
First stage in utc-time for prim_file.
|
|
Since the API for headers/trailers seems to be very awkward to
work with when using non-blocking I/O, the feature is dropped
for now. See unix_efile.c for more details.
|
|
Have to figure out how to represent progress in header writing when
using non-blocking I/O; not sure how to do this yet.
|
|
It is not possible to use the maximum size_t/off_t for the chunks,
as that causes sendfile to return EINVAL. 3GB seems to work on all
*nix platforms.
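A sketch of a chunked transfer loop using the Linux sendfile(2) signature (other platforms' sendfile() calls differ). `SENDFILE_CHUNK_SIZE` follows the 3GB figure above and uses unsigned long arithmetic, since 3 * 2^30 does not fit in an int. The loop ends when sendfile() returns 0 (end of input) instead of testing for -1 in its condition. Blocking descriptors are assumed, and `sendfile_chunked` is a hypothetical name:

```c
#include <errno.h>
#include <sys/sendfile.h>
#include <sys/types.h>

/* 3 GiB per call: large, but below the size_t/off_t maximum that makes
 * sendfile() fail with EINVAL. The UL suffixes avoid int overflow. */
#define SENDFILE_CHUNK_SIZE (3UL * 1024UL * 1024UL * 1024UL)

/* Send up to nbytes of in_fd, starting at *offset, to out_fd, issuing at
 * most `chunk` bytes per call (normally SENDFILE_CHUNK_SIZE).
 * Returns bytes sent, or -1 with errno set on error. */
static ssize_t sendfile_chunked(int out_fd, int in_fd, off_t *offset,
                                size_t nbytes, size_t chunk)
{
    size_t total = 0;
    while (total < nbytes) {
        size_t want = nbytes - total;
        if (want > chunk)
            want = chunk;
        ssize_t n = sendfile(out_fd, in_fd, offset, want);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)              /* reached end of the input file */
            break;
        total += (size_t)n;      /* sendfile() advanced *offset for us */
    }
    return (ssize_t)total;
}
```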
|
|
When there are no async threads, sendfile will use the
ready_output select on the socket fd to know when to send
data.
The file_desc will also be put in the sending sendfile_state,
which buffers all other commands to that file until the
sendfile is done.
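Outside the driver framework, the same readiness check can be sketched with poll(). Here plain poll() stands in for the ERTS driver_select()/ready_output mechanism, and `wait_writable` is a hypothetical name:

```c
#include <errno.h>
#include <poll.h>

/* Wait until sock_fd can accept more data (POLLOUT), the point at which
 * the next sendfile chunk would be issued.
 * Returns 1 if writable, 0 on timeout, -1 on error or hangup. */
static int wait_writable(int sock_fd, int timeout_ms)
{
    struct pollfd p = { .fd = sock_fd, .events = POLLOUT, .revents = 0 };
    int rc;
    do {
        rc = poll(&p, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);
    if (rc <= 0)
        return rc;
    return (p.revents & POLLOUT) ? 1 : -1;
}
```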
|