aboutsummaryrefslogtreecommitdiffstats
path: root/erts/emulator/beam/erl_db_catree.c
AgeCommit message (Collapse)Author
2019-04-18Fix broken ETS test caseKjell Winblad
This commit fixes an ETS test case that tests the decentralized memory counter in tables of type ordered_set with the write_concurrency option turned on. The test case assumed that the memory consumption of the table would only grow monotonically when terms are inserted. However, this was not the case when the emulator was compiled in debug mode as random splits and joins of CA tree nodes could happen. This commit fixes the test case by disabling random splits and joins in the tested table.
2019-04-10Decentralized counters for ETS ordered_set with write_concurrencyKjell Winblad
Previously, all ETS tables used centralized counter variables to keep track of the number of items stored and the amount of memory consumed. These counters can cause scalability problems (especially on big NUMA systems). This commit adds an implementation of a decentralized counter and modifies the implementation of ETS so that ETS tables of type ordered_set with write_concurrency enabled use the decentralized counter. [Experiments][1] indicate that this change substantially improves the scalability of ETS ordered_set tables with write_concurrency enabled in scenarios with frequent `ets:insert/2` and `ets:delete/2` calls. The new counter is implemented in the module erts_flxctr (`erts_flxctr.h` and `erts_flxctr.c`). The module has the suffix flxctr as it contains the implementation of a flexible counter (i.e., counter instances can be configured to be either centralized or decentralized). Counters that are configured to be centralized are implemented with a single counter variable which is modified with atomic operations. Decentralized counters are spread over several cache lines (how many can be configured with the parameter `+dcg`). The scheduler threads are mapped to cache lines so that there is no single point of contention when decentralized counters are updated. The thread progress functionality of the Erlang VM is utilized to implement support for linearizable snapshots of decentralized counters. The snapshot functionality is used by the `ets:info/1` and `ets:info/2` functions. [1]: http://winsh.me/ets_catree_benchmark/flxctr_res.html
2019-03-11erts: Fix ets:select table fixation leak at owner changeSverker Eriksson
Symtom: ETS table remains fixed after finished ets:select* call. Problem: The decision to unfix table after a yielding ets:select* is based on table ownership, but ownership might have changed while ets:select* was yielding. Solution: Remember and pass along whether table was fixed when the traversal started.
2018-12-05erts: Fix volatile ets test case failures on debug VMSverker Eriksson
Symptom: Test cases with small key ranges sometimes failed on debug VM with: "No routing nodes in table? Debug feature 'ets_force_split' does not seem to work." Solution: Don't provoke randomly joins when force_split is set.
2018-11-16erts: Fix offheap leak of ets catree tmp iteration keySverker Eriksson
Also fix erts_debug:get_internal_status(node_and_dist_references) for catree to also search route node keys for offheap stuff.
2018-11-16erts: Refactor erl_db_catree.cSverker Eriksson
with some code moving and removed obsolete comments.
2018-11-01erts: Tidy some ordered_set iteration codeSverker Eriksson
with variable name changes, some comments and moved catree_find_prev_from_pb_key_root after its twin.
2018-11-01erts: Fix bug for catree iterationSverker Eriksson
with keys containing off-heap terms. The passed key may actually be the one already saved (if nodes have been joined), in which case we do nothing. Calling destroy_route_key() may destroy off-heap data.
2018-10-26erts: Remove lock ordering of catree base nodesSverker Eriksson
We no longer lock more than one base node at a time. We do however trylock a second base node at join.
2018-10-26erts: Join empty base nodes in catreeSverker Eriksson
The original implementation did not do this due to fear of bad performance. But we think the negative effect of "leaking" empty base nodes is more important to fix. To get the bad performance a special kind of access patterns is needed where base nodes are frequently emptied and then repopulated soon again. ets_SUITE:throughput_benchmark for example did not show any negative effect from this commit at all.
2018-10-23erts: Refactor DbUpdateHandle with nicer typesSverker Eriksson
2018-10-23erts: Refactor away function generating macros in erl_db_catree.cSverker Eriksson
Easier to read and debug, and about the same lines of code.
2018-10-23erts: Fix faulty assert in catree_find_nextprev_rootSverker Eriksson
It's possible to first find an empty base node and then retry and find the same base node as invalid. It's a benign race with join which first makes the old invalid 'neighbor' accessible from 'gparent' before replacing it with 'new_neighbor'.
2018-10-23erts: Provoke random catree split/join for DEBUG emulatorSverker Eriksson
2018-10-23erts: Fix lc_key in base nodesSverker Eriksson
to actually pass the copy to lock checker.
2018-10-23erts: Do contention adaptions during (updating) iterationsSverker Eriksson
Once an iteration key has been found, never fall back to first/last key in next/prev tree as trees may split or join under our feet. I.e we must always use previous key when searching for the next key.
2018-10-23erts: Add erts_debug feature 'ets_force_split'Sverker Eriksson
to easier generate a routing tree for test without having to spend cpu to provoke actual repeated lock conflicts.
2018-10-23erts: Implement ets:info(T, stats) for catreesSverker Eriksson
{RouteNodes, BaseNodes, MaxRouteTreeDepth}
2018-10-19erts: Remove dead tree merging codeSverker Eriksson
2018-10-19erts: Remove tree merging for print and foreach_offheapSverker Eriksson
2018-10-19erts: Remove tree merging for ets:slotSverker Eriksson
Brute force solution will always iterate tree from slot 0 and forward. ToDo1: Yield. ToDo2: Maybe optimize by caching AVL tree size in each base node.
2018-10-19erts: Remove tree merging for ets:first,last,next,prevSverker Eriksson
2018-10-19erts: Remove tree merging for ets:select*Sverker Eriksson
2018-10-03erts: Refactor out DbRouteKey structSverker Eriksson
to not abuse DbTerm.
2018-10-03erts: Remove dead code in erl_db_catreeSverker Eriksson
'parent' is route node and 'neighbor' is base node, they can never be equal. 'neighbor_parent' has already been set correctly for all cases.
2018-10-03erts: Fix bug in erl_db_catreeSverker Eriksson
Search stack must be cleared before retry.
2018-10-03erts: Add lock order check for route nodesSverker Eriksson
Lock order is reverse tree depth, from leafs toward root. This solution may eventually fail if running too long as route nodes do not increase their 'lc_order' in a join operation when they move up in the tree. But who runs a VM with lock-checker for such a long time?
2018-10-03erts: Add lock order check of erl_db_catree base nodesSverker Eriksson
Lock order is key term order, so each base node needs its own key if lock check is enabled.
2018-10-03erts: Add some ERTS_RESTRICT pointersSverker Eriksson
2018-10-03erts: Do some refactoring in erl_db_catree.cSverker Eriksson
Fewer variables with shorter names and prefer DbTableCATree over DbTableCommon.
2018-10-03erts: Improve deallocation of CATree nodesSverker Eriksson
Update table memory stats before scheduling free to not be dependent on deallocation order with main table struct.
2018-10-03erts: Refactor rename union in DbTableCATreeNodeSverker Eriksson
u as in union
2018-09-05Add a more scalable ETS ordered_set implementationKjell Winblad
The current ETS ordered_set implementation can quickly become a scalability bottleneck on multicore machines when an application updates an ordered_set table from concurrent processes [1][2]. The current implementation is based on an AVL tree protected from concurrent writes by a single readers-writer lock. Furthermore, the current implementation has an optimization, called the stack optimization [3], that can improve the performance when only a single process accesses a table but can cause bad scalability even in read-only scenarios. It is possible to pass the option {write_concurrency, true} to ets:new/2 when creating an ETS table of type ordered_set but this option has no effect for tables of type ordered_set without this commit. The new ETS ordered_set implementation, added by this commit, is only activated when one passes the options ordered_set and {write_concurrency, true} to the ets:new/2 function. Thus, the previous ordered_set implementation (from here on called the default implementation) can still be used in applications that do not benefit from the new implementation. The benchmark results on the following web page show that the new implementation is many times faster than the old implementation in some scenarios and that the old implementation is still better than the new implementation in some scenarios. http://winsh.me/ets_catree_benchmark/ets_ca_tree_benchmark_results.html The new implementation is expected to scale better than the default implementation when concurrent processes use the following ETS operations to operate on a table: delete/2, delete_object/2, first/1, insert/2 (single object), insert_new/2 (single object), lookup/2, lookup_element/2, member/2, next/2, take/2 and update_element/3 (single object). Currently, the new implementation does not have scalable support for the other operations (e.g., select/2). However, when these operations are used infrequently, the new implantation may still scale better than the default implementation as the benchmark results at the URL above shows. Description of the New Implementation ---------------------------------- The new implementation is based on a data structure which is called the contention adapting search tree (CA tree for short). The following publication contains a detailed description of the CA tree: A Contention Adapting Approach to Concurrent Ordered Sets Journal of Parallel and Distributed Computing, 2018 Kjell Winblad and Konstantinos Sagonas https://doi.org/10.1016/j.jpdc.2017.11.007 http://www.it.uu.se/research/group/languages/software/ca_tree/catree_proofs.pdf A discussion of how the CA tree can be used as an ETS back-end can be found in another publication [1]. The CA tree is a data structure that dynamically changes its synchronization granularity based on detected contention. Internally, the CA tree uses instances of a sequential data structure to store items. The CA tree implementation contained in this commit uses the same AVL tree implementation as is used for the default ordered set implementation. This AVL tree implementation is reused so that much of the existing code to implement the ETS operations can be reused. Tests ----- The ETS tests in `lib/stdlib/test/ets_SUITE.erl` have been extended to also test the new ordered_set implementation. The function ets_SUITE:throughput_benchmark/0 has also been added to this file. This function can be used to measure and compare the performance of the different ETS table types and options. This function writes benchmark data to standard output that can be visualized by the HTML page `lib/stdlib/test/ets_SUITE_data/visualize_throughput.html`. [1] More Scalable Ordered Set for ETS Using Adaptation. In Thirteenth ACM SIGPLAN workshop on Erlang (2014). Kjell Winblad and Konstantinos Sagonas. https://doi.org/10.1145/2633448.2633455 http://www.it.uu.se/research/group/languages/software/ca_tree/erlang_paper.pdf [2] On the Scalability of the Erlang Term Storage In Twelfth ACM SIGPLAN workshop on Erlang (2013) Kjell Winblad, David Klaftenegger and Konstantinos Sagonas https://doi.org/10.1145/2505305.2505308 http://winsh.me/papers/erlang_workshop_2013.pdf [3] The stack optimization works by keeping one preallocated stack instance in every ordered_set table. This stack is updated so that it contains the search path in some read operations (e.g., ets:next/2). This makes it possible for a subsequent ets:next/2 to avoid traversing some nodes in some cases. Unfortunately, the preallocated stack needs to be flagged so that it is not updated concurrently by several threads which cause bad scalability.