From bf5886b790f8f386ed425f543506a4bebb48448c Mon Sep 17 00:00:00 2001
From: John Högberg
Date: Mon, 15 Oct 2018 18:17:12 +0200
Subject: Optimize operator '--' and yield on large inputs

The removal set now uses a red-black tree instead of an array on
large inputs, decreasing runtime complexity from `n*n` to
`n*log(n)`. It will also exit early when there are no more items
left in the removal set, drastically improving performance and
memory use when the items to be removed are present near the head
of the list.

This got a lot more complicated than before as the overhead of
always using a red-black tree was unacceptable when either of the
inputs was small, but this compromise has okay-to-decent
performance regardless of input size.

Co-authored-by: Dmytro Lytovchenko
---
 system/doc/efficiency_guide/commoncaveats.xml | 48 ---------------------------
 1 file changed, 48 deletions(-)

(limited to 'system/doc/efficiency_guide/commoncaveats.xml')

diff --git a/system/doc/efficiency_guide/commoncaveats.xml b/system/doc/efficiency_guide/commoncaveats.xml
index b41ffc3902..367da09ba3 100644
--- a/system/doc/efficiency_guide/commoncaveats.xml
+++ b/system/doc/efficiency_guide/commoncaveats.xml
@@ -169,53 +169,5 @@ multiple_setelement(T0) ->
         {Bin1,Bin2} = split_binary(Bin, Num)
-
-    Operator "--"
-    The "--" operator has a complexity
-      proportional to the product of the lengths of its operands.
-      This means that the operator is very slow if both of its operands
-      are long lists:
-
-    DO NOT
-        HugeList1 -- HugeList2
-
-    Instead use the ordsets
-      module in STDLIB:
-
-    DO
-        HugeSet1 = ordsets:from_list(HugeList1),
-        HugeSet2 = ordsets:from_list(HugeList2),
-        ordsets:subtract(HugeSet1, HugeSet2)
-
-    Obviously, that code does not work if the original order
-      of the list is important. If the order of the list must be
-      preserved, do as follows:
-
-    DO
-        Set = gb_sets:from_list(HugeList2),
-        [E || E <- HugeList1, not gb_sets:is_element(E, Set)]
-
-    This code behaves differently from "--"
-      if the lists contain duplicate elements (one occurrence
-      of an element in HugeList2 removes all
-      occurrences in HugeList1).
-      Also, this code compares list elements using the
-      "==" operator, while "--" uses the "=:=" operator.
-      If that difference is important, sets can be used instead of
-      gb_sets, but sets:from_list/1 is much
-      slower than gb_sets:from_list/1 for long lists.
-
-    Using the "--" operator to delete an element
-      from a list is not a performance problem:
-
-    OK
-        HugeList1 -- [Element]
-
--
cgit v1.2.3
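
For readers comparing the removed advice with the optimized operator, here is a minimal Erlang sketch. It is not the runtime's implementation, which per the commit message uses a red-black tree. The module name `subtract_sketch` and both function names are hypothetical. `subtract/2` imitates the approach the commit message describes, with a map of occurrence counts swapped in for the red-black tree, and it stops as soon as the removal set is empty. `filter_gb_sets/2` is an order-preserving `gb_sets` filter of the kind the removed note discusses.

```erlang
-module(subtract_sketch).
-export([subtract/2, filter_gb_sets/2]).

%% Same result as Xs -- Ys: each element of Ys removes the first
%% matching (=:=) element of Xs. A map of occurrence counts stands in
%% for the removal set (an assumption of this sketch); map keys are
%% matched exactly, so 1 and 1.0 are different keys.
subtract(Xs, Ys) ->
    Count = fun(Y, Acc) ->
                    maps:update_with(Y, fun(N) -> N + 1 end, 1, Acc)
            end,
    subtract_1(Xs, lists:foldl(Count, #{}, Ys)).

%% Early exit: nothing is left to remove, so the remaining tail is
%% returned as is, without being traversed or copied.
subtract_1(Rest, Removals) when map_size(Removals) =:= 0 ->
    Rest;
subtract_1([], _Removals) ->
    [];
subtract_1([X | Xs], Removals) ->
    case Removals of
        #{X := 1} -> subtract_1(Xs, maps:remove(X, Removals));
        #{X := N} -> subtract_1(Xs, Removals#{X := N - 1});
        #{} -> [X | subtract_1(Xs, Removals)]
    end.

%% Order-preserving filter through gb_sets. Unlike "--", one occurrence
%% in Ys removes every occurrence in Xs, and elements are compared
%% with ==, so 1 also removes 1.0.
filter_gb_sets(Xs, Ys) ->
    Set = gb_sets:from_list(Ys),
    [X || X <- Xs, not gb_sets:is_element(X, Set)].
```

With duplicates, `subtract_sketch:subtract([1,2,3,2,1,2], [2,1,2])` returns `[3,1,2]`, the same as `[1,2,3,2,1,2] -- [2,1,2]`, while `subtract_sketch:filter_gb_sets([1,2,3,2,1,2], [2,1,2])` returns `[3]`. With mixed integers and floats, `subtract([1.0,2], [1])` keeps `1.0`, whereas `filter_gb_sets([1.0,2], [1])` drops it.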