Age | Commit message (Collapse) | Author |
|
* rickard/dist-system-limit/OTP-15708:
Fail when we cannot encode term in binary
|
|
Before this fix the process would continue to process
more distributed down or exit messages until it
ran out of reductions instead of being suspended
immediately.
|
|
Complementory fix to 922fd355831575965
|
|
Fail when we cannot encode term in binary instead of producing a
faulty result.
|
|
|
|
When the GC is run or the link/monitor node is deleted,
the terms associated with it will also be deallocated
so we have to make sure that we don't do any GCs and that
all link/monitor nodes are copied onto the heap before
the node is deallocated.
|
|
* sverker/master/enif_whereis_pid-dirty-dtor:
erts: Add test of enif_whereis* from resource destructor
erts: Simplify nif_SUITE:nif_whereis* tests
erts: Schedule resource destructors always
|
|
* max-au/erts/dirty_scheduler_shutdown/PR-2172/OTP-15690:
erts: release dirty runqueue lock before entering endless loop when BEAM is shutting down
|
|
shutting down
This patch fixes a problem happening when BEAM is shutting down. It is possible for a dirty scheduler to take the lock, and keep it, when the system is shutting down. It may also happen that a normal scheduler decides to schedule some dirty job (example is major garbage collection that results in migrating the process into dirty CPU queue), and hangs trying to take the lock that will never be released.
To fix the problem, either release the lock before entering endless wait loop, or reverse the order in which schedulers are stopped. Either fix works, and, of course, it works even better to apply both.
|
|
into sverker/master/enif_whereis_pid-dirty-dtor
|
|
to run user NIF code in a more known execution context.
Fixes problems like user calling enif_whereis_pid() in destructor
which may need to release process main lock in order to lock reg_tab.
|
|
'lukas/erts/fragment-dist-messages/OTP-13397/OTP-15610/OTP-15611/OTP-15612/OTP-15613'
* lukas/erts/fragment-dist-messages/OTP-13397/OTP-15610/OTP-15611/OTP-15612/OTP-15613:
erts: Add debug dist obuf memory leak check
win32: Fix ./otp_build debuginfo_win32
Make ld.sh on windows print better error reason
erts: Fix so that externals with creation 0 compare equal to all
erts: Expand etp to look for free processes
erts: Implement trapping while sending distr exit/down
erts: Add ERL_NODE_BOOKKEEP to node tables refc
erts: Refactor ErtsSendContext to be ErtsDSigSendContext
erts: Add distr testcases for fragmentation
erts: Make remote send of exit/2 trap
erts: Implement fragmentation of distrubution messages
erts: Expand distribution protocol documentation
erts: Move reason in dist messages to payload
erts: Remove a copy of distribution data payload
erts: Yield later during process exit and allow free procs to run
erts: Refactor rbt _yielding to use reductions
erts: Limit binary printout for %.XT in erts_print
|
|
The reason in EXIT and DOWN may be arbitrarily large,
so we yield and allow other processes to execute while
encoding and sending the signals over the distribution.
|
|
This commit removed the general send context (which was used
very little anyways) and only uses the distributed send context.
This will make it easier to use the dist API at the cost of
a little bit more code for the local send.
|
|
|
|
OTP-15610
|
|
|
|
* sverker/heart-nice-exit/OTP-15599:
erts: Avoid heart killing a nicely exiting emulator
|
|
Symptom:
Heart kills exiting emulator before is has flushed all ports
and with HEART_KILL_SIGNAL=SIGABRT it may also produce
unnecessary core dumps from doing init:reboot() for example.
Problem:
Heart port is closed together with all the others in handle_reap_ports()
which is detected by heart OS process.
Solution 1:
Leave the heart port alone in handle_reap_ports() and let it be closed
by OS when emulator exists. It doesn't need to be flushed anyway.
Solution 2:
When heart OS process gets EOF on connection let it wait max 5 seconds
for emulator process to self terminate before trying to kill it.
|
|
All of the Red-Black Tree _yielding functions have been
updated to work with reductions returned by the called
function instead of yielding on each element.
|
|
|
|
* lukas/erts/scheduler-pollset-fixes/OTP-15538:
erts: Fix getting of poll events on linux >= 4.15.0
erts: Use reduction based polling for starved poll-set
erts: Fix pollset test cases
|
|
When the schedulers never go to sleep (and thus never polls)
it may be that the fds in schedulers poll-sets are never polled.
Before this commit, this was solved by starting a timer when an
overload was detected. This had issues as overloads were not always
detected in time. So this commit reverts to the pre OTP-21 behaviour
so keep a global counter makes that the poll is called when it should.
|
|
* rickard/dirty_scheduler_collapse/OTP-15509:
Fix bug causing dirty scheduler sleeper list inconsistency
|
|
* maint:
Fix bug causing dirty scheduler sleeper list inconsistency
|
|
rickard/dirty_scheduler_collapse/maint-21/OTP-15509
* rickard/dirty_scheduler_collapse/OTP-15509:
Fix bug causing dirty scheduler sleeper list inconsistency
|
|
|
|
Conflicts:
erts/emulator/beam/erl_process.c
|
|
At start of the VM a poll-set that the schedulers
will check is created where fds that have triggered
many (at the moment, many means 10) times without
being deselected inbetween. In this scheduler specific
poll-set fds do not use ONESHOT, which means that the
number of syscalls goes down dramatically for such fds.
This pollset is introduced in order to handle fds that
are used by the erlang distribution and that never
change their state from {active, true}.
This pollset only handles ready_input events,
ready_output is still handled by the poll threads.
During overload, polling the scheduler poll-set is done
on a 10ms timer.
|
|
While the system will keep working, the aux work will never run
and the affected scheduler never goes to sleep. OTP-15446 is a
good example of this.
As this error easily flies under the radar it's best to make it
immediately visible. The assertions we had in debug builds were
clearly not enough to catch all sources of this problem.
|
|
The poll thread does a lot of waking up and then going
back to sleep. A large part of the waking up is managing
thread progress and a large part of that was using thread
specific data to get the thread progress data pointer.
With this refactor the tpd is passed to each of the functions
which greatly decreases the number of ethr_get_tsd calls
which in turn halves the CPU usage of the poller thread in
certain scenarios.
|
|
* sverker/erts/beautify-ifdef-DEBUG:
erts: Beautify away #ifdef DEBUG
|
|
"(void)result" will silence warning about unused variable
and compiler will optimize away such unused variables.
|
|
process_info(self(), ...)
It is possible that a process has to yield before completing process_info BIF when it runs out of reductions. If this BIF is called by the process itself, it does not send a signal but executes in the context of a process. If it has to yield, it turns F_LOCAL_SIGS_ONLY flag on, which means new signals won't be fetched from the outer message queue.
When the same process needs to execute dirty system code (e.g. dirty GC) it has to be run on a dirty scheduler. However signals enqueued into outer queue cause it to be rescheduled on a normal scheduler. F_LOCAL_SIGS_ONLY prevent outer queue signals delivery, creating an endless rescheduling loop.
This commit disengages F_LOCAL_SIG_ONLY if process needs to execute dirty code in order to complete signal delivery and allow process to be moved to dirty run queue.
|
|
* maint-21:
Updated OTP version
Update release notes
Update version numbers
Fix missing 'in' trace events during 'running' trace
|
|
'in' trace events could be lost when a process had to be
rescheduled on another scheduler type (normal <-> dirty).
|
|
A lot of erts internal messages used behind APIs to create
non-blocking calls, e.g. port_command, would cause the seq_trace
token to be cleared from the caller when it should not.
This commit fixes that and adds asserts that makes sure
that all messages sent have to correct token set.
Fixes: ERL-602
|
|
* john/erts/fix-dirty-reschedule-bug/OTP-15154:
Move to a dirty scheduler even when we have pending system tasks
|
|
into maint-20
* john/erts/fix-process-schedule-after-free/OTP-15067/ERL-573:
Don't enqueue system tasks if target process is in fail_state
Fix erroneous schedule of freed/exiting processes
Fix deadlock in run queue evacuation
Fix memory leak of processes that died in the run queue
|
|
When a system task was enqueued on a process between being
scheduled and the check altered in this commit, we'd run dirty
code on a normal scheduler as the RUNNING_SYS flag wasn't set
and we wouldn't migrate back.
This change migrates us to a dirty scheduler instead, which will
immediately bounce us back to a normal scheduler where
RUNNING_SYS will be set appropriately.
This is caught fairly reliably by
process_SUITE:system_task_failed_enqueue on machines with a lot of
cores.
|
|
into john/erts/merge-OTP-15067
|
|
The fail state wasn't re-checked in the state change loop; only
the FREE state was checked. In addition to that, we would leave
the task in the queue when bailing out which could lead to a
double-free.
This commit backports active_sys_enqueue from master to make it
easier to merge onwards.
|
|
When scheduled out, the process was never checked for the FREE state
before rescheduling, which meant that a system task could sneak in
and cause a double-free later on.
|
|
* sverker/system-profile-bug/OTP-15085:
erts: Fix bug in system_profile
|
|
If scheduler_data is not set correctly on normal schedulers
the code in erts_schedule_time_break and possibly others
will trigger asserts.
|
|
|
|
|
|
* sverker/system-profile-bug/OTP-15085:
erts: Fix bug in system_profile
|
|
seen to cause redundant {profile,_,active,_,_} messages
when process is terminating.
|
|
* rickard/delete_process_schedule/OTP-15081:
Do not hold runq lock while deleting a process
|