aboutsummaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
-rw-r--r--erts/emulator/internal_doc/CarrierMigration.md50
1 files changed, 25 insertions, 25 deletions
diff --git a/erts/emulator/internal_doc/CarrierMigration.md b/erts/emulator/internal_doc/CarrierMigration.md
index 7afdb70aef..2a9594db25 100644
--- a/erts/emulator/internal_doc/CarrierMigration.md
+++ b/erts/emulator/internal_doc/CarrierMigration.md
@@ -16,12 +16,12 @@ When a carrier is empty, i.e. contains only one large free block, it
is deallocated. Since multiblock carriers can contain both allocated
blocks and free blocks at the same time, an allocator instance might
be stuck with a large amount of poorly utilized carriers if the memory
-load decrease. After a peak in memory usage it is expected that not
-all memory can be returned since the blocks still allocated is likely
+load decreases. After a peak in memory usage it is expected that not
+all memory can be returned since the blocks still allocated are likely
to be dispersed over multiple carriers. Such poorly utilized carriers
-can usually be reused if the memory load increase again. However,
+can usually be reused if the memory load increases again. However,
since each scheduler thread manages its own set of allocator
-instances, and memory load is not necessarily connected to CPU load we
+instances, and memory load is not necessarily correlated to CPU load, we
might get into a situation where there are lots of poorly utilized
multiblock carriers on some allocator instances while we need to
allocate new multiblock carriers on other allocator instances. In
@@ -50,13 +50,13 @@ the allocator instance manages. Free blocks in one specific carrier
can be referred to from potentially every other carrier that is
managed, and the amount of such references can be huge. That is, the
work of removing the free blocks of such a carrier from the search
-tree will be huge. One way of solving this could be to not migrate
+tree will be huge. One way of solving this could be not to migrate
carriers that contain lots of free blocks, but this would prevent us
-from migrating carriers that potentially needs to be migrated in order
+from migrating carriers that potentially need to be migrated in order
to solve the problem we set out to solve.
By using one data structure of free blocks in each carrier and an
-allocator instance wide data structure of carriers managed by the
+allocator instance-wide data structure of carriers managed by the
allocator instance, the work needed in order to remove and add
carriers can be kept to a minimum. When migration of carriers is
enabled on a specific allocator type, we require that an allocation
@@ -76,9 +76,9 @@ through a pool of carriers. In order for a carrier migration to
complete, one scheduler needs to move the carrier into the pool, and
another scheduler needs to take the carrier out of the pool.
-The pool is implemented as a lock free, circular, double linked,
+The pool is implemented as a lock-free, circular, double linked,
list. The list contains a sentinel which is used as the starting point
-when inserting to, or fetching from the pool. Carriers in the pool are
+when inserting to, or fetching from, the pool. Carriers in the pool are
elements in this list.
The list can be modified by all scheduler threads
@@ -108,19 +108,19 @@ all search operations need to read the content of the sentinel. If we
were to modify the sentinel, the cache line containing the sentinel
would unnecessarily be bounced between processors.
-The `prev`, and `next` fields in the elements of the list contains the
+The `prev` and `next` fields in the elements of the list contain the
value of the pointer, a modification marker, and a deleted
marker. Memory operations on these fields are done using atomic memory
operations. When a thread has set the modification marker in a field,
no-one except the thread that set the marker is allowed to modify the
-field. If multiple modification markers needs to be set, we always
+field. If multiple modification markers need to be set, we always
begin with `next` fields followed by `prev` fields in the order
following the actual pointers. This guarantees that no deadlocks will
occur.
When a carrier is being removed from a pool, we mark it with a thread
progress value that needs to be reached before we are allowed to
-modify the `next`, and `prev` fields. That is, until we reach this
+modify the `next` and `prev` fields. That is, until we reach this
thread progress we are not allowed to insert the carrier into the pool
again, and we are not allowed to deallocate the carrier. This ensures
that threads inspecting the pool always will be able to traverse the
@@ -130,12 +130,12 @@ threads may have references to it via the pool.
### Migration ###
-There exist one pool for each allocator type enabling migration of
+There exists one pool for each allocator type enabling migration of
carriers between scheduler specific allocator instances of the same
allocator type.
Each allocator instance keeps track of the current utilization of its
-multiblock carriers. When the utilization falls below the "abandon
+multiblock carriers. When the total utilization falls below the "abandon
carrier utilization limit" it starts to inspect the utilization of the
current carrier when deallocations are made. If also the utilization
of the carrier falls below the "abandon carrier utilization limit" it
@@ -163,7 +163,7 @@ If a carrier in the pool becomes empty, it will be withdrawn from the
pool. All carriers that become empty are also always passed to its
**owning** allocator instance for deallocation using the delayed
dealloc functionality. Since carriers this way always will be
-deallocated by the owner, that allocated the carrier, the
+deallocated by the owner that allocated the carrier, the
underlying functionality of allocating and deallocating carriers can
remain simple and doesn't have to bother about multiple threads. In a
NUMA system we will also not mix carriers originating from multiple
@@ -185,10 +185,10 @@ In short:
To harbor real time characteristics, searching the pool is
limited. We only inspect a limited number of carriers. If none of
those carriers had a free block large enough to satisfy the allocation
-request, the search will fail. A carrier in the pool can also be busy,
+request, the search will fail. A carrier in the pool can also be busy
if another thread is currently doing block deallocation work on the
carrier. A busy carrier will also be skipped by the search as it can
-not satisfy the request. The pool is lock free and we do not want to
+not satisfy the request. The pool is lock-free and we do not want to
block, waiting for the other thread to finish.
#### Before OTP 17.4 ####
@@ -212,9 +212,9 @@ may later be inserted into the pool due to bad utilization. If the
frequency of insertions into the pool is higher than successful
fetching from the pool, memory will eventually get exhausted.
-This "bad" state, consist of a cluster of small and/or highly
+This "bad" state consists of a cluster of small and/or highly
fragmented carriers located at the sentinel in the pool. The largest free
-block in such a "bad" carrier is rather small, making it not able to satisfy
+block in such a "bad" carrier is rather small, making it unable to satisfy
most allocation requests. As the search always started at the
sentinel, any such "bad" carriers that had been left in the pool would
eventually cluster together at the sentinel. All searches first
@@ -229,7 +229,7 @@ allocator itself and later had been abandoned to the pool. If none of
our own abandoned carrier would do, then the search continues into the
pool, as before, to look for carriers created by other
allocators. However, if we have at least one abandoned carrier of our
-own, that could not satisfy the request, we can use that as entry point
+own that could not satisfy the request, we can use that as entry point
into the pool.
The result is that we prefer carriers created by the thread itself,
@@ -237,15 +237,15 @@ which is good for NUMA performance. And we get more entry points when
searching the pool, which will ease contention and clustering.
To do the first search among own carriers, every allocator instance
-has two new lists; `pooled_list` and `traitor_list`. These lists are only
-accessed by the allocator itself and they only contain the allocators
+has two new lists: `pooled_list` and `traitor_list`. These lists are only
+accessed by the allocator itself and they only contain the allocator's
own carriers. When an owned carrier is abandoned and put in the
pool, it is also linked into `pooled_list`. When we search our
`pooled_list` and find a carrier that is no longer in the pool, we
move that carrier from `pooled_list` to `traitor_list` as it is now
employed by another allocator. If searching `pooled_list` fails, we
also do a limited search of `traitor_list`. When finding an abandoned
-carrier in `traitor_list` it is either employed, or moved back to
+carrier in `traitor_list` it is either employed or moved back to
`pooled_list` if it could not satisfy the allocation request.
When searching `pooled_list` and `traitor_list` we always start at the
@@ -256,14 +256,14 @@ allocator instance, they need no thread synchronization at all.
Furthermore, the search for own carriers that are scheduled
for deallocation is now done as the last search option. The idea is
-that it is better to reuse a poorly utilized carrier, than to
+that it is better to reuse a poorly utilized carrier than to
resurrect an empty carrier that was just about to be released back to
the OS.
### Result ###
The use of this strategy of abandoning carriers with poor utilization
-and reusing these in allocator instances with an increased carrier
+and reusing them in allocator instances with an increased carrier
demand is extremely effective and completely eliminates the problems
that otherwise sometimes occurred when CPU load dropped while memory
load did not.