Optimize operator '--' and yield on large inputs

The removal set now uses a red-black tree instead of an array on large inputs, decreasing runtime complexity from `n*n` to `n*log(n)`. It will also exit early when there are no more items left in the removal set, drastically improving performance and memory use when the items to be removed are present near the head of the list.

This got a lot more complicated than before as the overhead of always using a red-black tree was unacceptable when either of the inputs was small, but this compromise has okay-to-decent performance regardless of input size.

Co-authored-by: Dmytro Lytovchenko <[email protected]>
author: John Högberg <[email protected]> 2018-10-15 18:17:12 +0200
committer: Lukas Larsson <[email protected]> 2018-11-02 17:15:32 +0100
commit: bf5886b790f8f386ed425f543506a4bebb48448c (patch)
tree: b2ef444c7bc5fc21b9e1e4cf905041ca70832c78 /system
parent: db6059a9217767a6e42e93cec05089c0ec977d20 (diff)
download: otp-bf5886b790f8f386ed425f543506a4bebb48448c.tar.gz
otp-bf5886b790f8f386ed425f543506a4bebb48448c.tar.bz2
otp-bf5886b790f8f386ed425f543506a4bebb48448c.zip
2 files changed, 14 insertions, 48 deletions
diff --git a/system/doc/efficiency_guide/commoncaveats.xml b/system/doc/efficiency_guide/commoncaveats.xml
index b41ffc3902..367da09ba3 100644
--- a/system/doc/efficiency_guide/commoncaveats.xml
+++ b/system/doc/efficiency_guide/commoncaveats.xml
@@ -169,53 +169,5 @@ multiple_setelement(T0) ->
         {Bin1,Bin2} = split_binary(Bin, Num)</code>
    </section>
 
-  <section>
-    <title>Operator "--"</title>
-     <p>The "<c>--</c>" operator has a complexity
-     proportional to the product of the length of its operands.
-     This means that the operator is very slow if both of its operands
-     are long lists:</p>
-
-        <p><em>DO NOT</em></p>
-        <code type="none"><![CDATA[
-        HugeList1 -- HugeList2]]></code>
-
-     <p>Instead use the <seealso marker="stdlib:ordsets">ordsets</seealso>
-     module in STDLIB:</p>
-
-        <p><em>DO</em></p>
-        <code type="none">
-        HugeSet1 = ordsets:from_list(HugeList1),
-        HugeSet2 = ordsets:from_list(HugeList2),
-        ordsets:subtract(HugeSet1, HugeSet2)</code>
-
-     <p>Obviously, that code does not work if the original order
-     of the list is important. If the order of the list must be
-     preserved, do as follows:</p>
-
-        <p><em>DO</em></p>
-        <code type="none"><![CDATA[
-        Set = gb_sets:from_list(HugeList2),
-        [E || E <- HugeList1, not gb_sets:is_element(E, Set)]]]></code>
-
-     <note><p>This code behaves differently from "<c>--</c>"
-     if the lists contain duplicate elements (one occurrence
-     of an element in HugeList2 removes <em>all</em>
-     occurrences in HugeList1.)</p>
-     <p>Also, this code compares lists elements using the
-     "<c>==</c>" operator, while "<c>--</c>" uses the "<c>=:=</c>" operator.
-     If that difference is important, <c>sets</c> can be used instead of
-     <c>gb_sets</c>, but <c>sets:from_list/1</c> is much
-     slower than <c>gb_sets:from_list/1</c> for long lists.</p></note>
-
-     <p>Using the "<c>--</c>" operator to delete an element
-     from a list is not a performance problem:</p>
-
-        <p><em>OK</em></p>
-        <code type="none">
-        HugeList1 -- [Element]</code>
-
-   </section>
-
 </chapter>
 
diff --git a/system/doc/efficiency_guide/retired_myths.xml b/system/doc/efficiency_guide/retired_myths.xml
index 9b914a3b6e..144c942c2b 100644
--- a/system/doc/efficiency_guide/retired_myths.xml
+++ b/system/doc/efficiency_guide/retired_myths.xml
@@ -60,4 +60,18 @@
     That leads us to the myth that tail-recursive functions are faster
     than body-recursive functions.</p>
   </section>
+
+  <section>
+    <title>Myth: List subtraction ("--" operator) is slow</title>
+
+    <p>List subtraction used to have a run-time complexity proportional to the
+      product of the length of its operands, so it was extremely slow when both
+      lists were long.</p>
+
+    <p>As of OTP 22 the run-time complexity is "n log n" and the operation will
+      complete quickly even when both lists are very long. In fact, it is
+      faster and uses less memory than the commonly used workaround to convert
+      both lists to ordered sets before subtracting them with
+      <c>ordsets:subtract/2</c>.</p>
+  </section>
 </chapter>