diff options
author | John Högberg <[email protected]> | 2018-10-15 18:17:12 +0200 |
---|---|---|
committer | Lukas Larsson <[email protected]> | 2018-11-05 09:18:07 +0100 |
commit | d98da38562ec79360b58eed87eced3a506f1ff6d (patch) | |
tree | 4946a629854e4a65ae751bebc177390bc36450f8 /system/doc/efficiency_guide/commoncaveats.xml | |
parent | 53e7743216647d810d529e397bd3ea7278c6047c (diff) | |
download | otp-d98da38562ec79360b58eed87eced3a506f1ff6d.tar.gz otp-d98da38562ec79360b58eed87eced3a506f1ff6d.tar.bz2 otp-d98da38562ec79360b58eed87eced3a506f1ff6d.zip |
Optimize operator '--' and yield on large inputs
The removal set now uses a red-black tree instead of an array on
large inputs, decreasing runtime complexity from `n*n` to
`n*log(n)`. It will also exit early when there are no more items
left in the removal set, drastically improving performance and
memory use when the items to be removed are present near the head
of the list.
This got a lot more complicated than before as the overhead of
always using a red-black tree was unacceptable when either of the
inputs were small, but this compromise has okay-to-decent
performance regardless of input size.
Co-authored-by: Dmytro Lytovchenko <[email protected]>
Diffstat (limited to 'system/doc/efficiency_guide/commoncaveats.xml')
-rw-r--r-- | system/doc/efficiency_guide/commoncaveats.xml | 48 |
1 files changed, 0 insertions, 48 deletions
diff --git a/system/doc/efficiency_guide/commoncaveats.xml b/system/doc/efficiency_guide/commoncaveats.xml index c1d71e745a..854dd54660 100644 --- a/system/doc/efficiency_guide/commoncaveats.xml +++ b/system/doc/efficiency_guide/commoncaveats.xml @@ -169,53 +169,5 @@ multiple_setelement(T0) -> {Bin1,Bin2} = split_binary(Bin, Num)</code> </section> - <section> - <title>Operator "--"</title> - <p>The "<c>--</c>" operator has a complexity - proportional to the product of the length of its operands. - This means that the operator is very slow if both of its operands - are long lists:</p> - - <p><em>DO NOT</em></p> - <code type="none"><![CDATA[ - HugeList1 -- HugeList2]]></code> - - <p>Instead use the <seealso marker="stdlib:ordsets">ordsets</seealso> - module in STDLIB:</p> - - <p><em>DO</em></p> - <code type="none"> - HugeSet1 = ordsets:from_list(HugeList1), - HugeSet2 = ordsets:from_list(HugeList2), - ordsets:subtract(HugeSet1, HugeSet2)</code> - - <p>Obviously, that code does not work if the original order - of the list is important. If the order of the list must be - preserved, do as follows:</p> - - <p><em>DO</em></p> - <code type="none"><![CDATA[ - Set = gb_sets:from_list(HugeList2), - [E || E <- HugeList1, not gb_sets:is_element(E, Set)]]]></code> - - <note><p>This code behaves differently from "<c>--</c>" - if the lists contain duplicate elements (one occurrence - of an element in HugeList2 removes <em>all</em> - occurrences in HugeList1.)</p> - <p>Also, this code compares lists elements using the - "<c>==</c>" operator, while "<c>--</c>" uses the "<c>=:=</c>" operator. - If that difference is important, <c>sets</c> can be used instead of - <c>gb_sets</c>, but <c>sets:from_list/1</c> is much - slower than <c>gb_sets:from_list/1</c> for long lists.</p></note> - - <p>Using the "<c>--</c>" operator to delete an element - from a list is not a performance problem:</p> - - <p><em>OK</em></p> - <code type="none"> - HugeList1 -- [Element]</code> - - </section> - </chapter> |