Age | Commit message (Collapse) | Author |
|
* anders/diameter/sctp/OTP-12768:
Fix connection timeouts in test transports
Fix start order of alternate transports
Log discarded answers
Ensure accepting processes are first in, first out
Remove upgrade-related code
Be less parallel in traffic suite
Increase send/receive buffers for testsuite SCTP listeners
Decrease unnecessarily long testsuite timetraps
Simplify accepting transport start
Simplify peeloff signaling
Simplify socket close at terminate
Don't monitor listener after peeloff
Don't receive initial messages out of order
Remove assumption that SCTP association ids will be unique
|
|
* anders/diameter/grouped_errors/OTP-12721:
Fix decode of Grouped AVPs containing errors
Simplify logic
Simplify logic
|
|
Without a timeout, TCP/SCTP connect can take some time to fail, which
resulted in failures in the pool suite after the parent commit fixed the
previously faulty sctp-first-then-tcp connect.
|
|
A transport configured with diameter:add_transport/2 can be passed
multiple transport_module/transport_config tuples in order to specify
alternate configuration, modules being attempted in order until one
succeeds. This is primarily for the connecting case, to allow a
transport to be configured to first attempt connection over SCTP, and
then TCP in case SCTP fails, with configuration like that documented:
{transport_module, diameter_sctp},
{transport_config, [...], 5000},
{transport_module, diameter_tcp},
{transport_config, [...]}
If the options are the same in both cases, another possibility would be
configuration like this, which attaches the same transport_config to
both modules:
{transport_module, diameter_sctp},
{transport_module, diameter_tcp},
{transport_config, [...], 5000},
However, in this case the start order was reversed relative to the
documented order: first tcp, then sctp. This commit restores the
intended order.
OTP-12851
|
|
|
|
Not needed with the parent commit's restart_application.
|
|
To diameter_lib:log/4, which was last motivated in commit 39acfdb0.
|
|
|
|
A listener process in diameter_sctp starts accepting transport processes
as required, either as associations are established or as diameter asks
for a processes to be started. Since this can happen in any order, the
listener maintains two queues: one for processes that diameter has
requested and which are waiting to be given an association, another for
processes that have been started to become owners of an association but
are waiting for diameter to request them. Only one queue at a time is
non-empty. The first queue's length is bounded by the number of
accepting processes configured as pool_size. Entries in the second queue
are short-lived since diameter starts a replacement transport process
whenever an existing one dies or communicates that it has an
association.
The two queues were previously implemented in an ets ordered_set, whose
keys were the pid() of transport processes. Removing an element from the
queue was then done with ets:first/1. The problem with this it's not
really a queue: there's no guarantee that pid-ordering is the same as
the order in which processes are started. If it isn't then it's possible
that an established association never be given to diameter as a
transport process if there's always a newer association whose pid sorts
first. This isn't a problem in practice since it would require new
associations to be established faster than diameter starts transport
processes, but redo the implementation as a queue, with strict FIFO
semantics.
|
|
The changes in some of the previous commits assume application restart.
|
|
At the current count, there are 128 groups run in parallel, each of
which runs 52 testcases in parallel. That makes for 128*52 = 6656
testcases, which is probably also a factor in the sporadic failures
addressed by the parent commit. Don't run the 128 groups in parallel.
|
|
The defaults result in sporadic timeouts in the traffic suite after
testing over SCTP was added in commit fadf753b. The behaviour looks to
be specific to SLES 11, and is presumably the same resends/congestion
that lead to the buffers being increased in the gen_sctp suite in commit
12febf13 (and commented in commit e931991f). The behaviour hasn't been
seen on SLES 10.
|
|
|
|
Don't pass an association id that's no longer used.
|
|
In particular, don't give the accepting transport process the listening
socket. It was used to match the initial sctp message received in a
peeloff message, but replace the socket in the forwarded message
instead.
|
|
The existing code was a remnant of the pre-peeloff implementation.
There's no need to close anything but the whole socket.
|
|
Listener death should have no effect on a peeled off association.
|
|
Forwarding an sctp message from the listener process at the same time
that the controlling process is changed means there's no guarantee that
the message order will be preserved. Selectively receive the peeloff
message before entering the gen_server loop to ensure the order is
preserved.
|
|
This is not the case under Solaris for one: successive
associations can receive the same association id as a result of peeloff,
the id only being unique for the controlling port, not for the listening
port as is the case under Linux for example. This made for many failures
in the diameter test suites, the traffic suite in particular.
Peeloff in diameter_sctp was introduced in 9a671bf0, before which the
assumption was fine since it was the listening process that owned all
associations. (Which obviously had other drawbacks.) Other remnants of
the pre-peeloff implementation have also been removed: that the listener
process might receive a message on a socket after peeloff for one.
Peeloff in gen_sctp became available in commit 067cfe79, after the
original implementation of diameter_sctp.
This is trace on the unpatched code showing id reuse under Solaris:
+ {trace_ts,<0.103.0>,call,
{diameter_sctp,handle_info,
[{sctp,#Port<0.1625>,
{127,0,0,1},
35904,
{[],{sctp_assoc_change,comm_up,0,32,32,1}}},
{listener,#Ref<0.0.1.948>,#Port<0.1625>,4,
57384,
{-4,61481},
#Ref<0.0.8.12>,
[]}]},
{1432,458752,612168}}
+ {trace_ts,<0.103.0>,call,
{diameter_sctp,handle_info,
[{sctp,#Port<0.1625>,
{127,0,0,1},
35905,
{[],{sctp_assoc_change,comm_up,0,32,32,1}}},
{listener,#Ref<0.0.1.948>,#Port<0.1625>,4,
57384,
{-3,61481},
#Ref<0.0.8.12>,
[]}]},
{1432,458752,613042}}
The result was this, when the second association was incorrectly
forwarded to the first association's controlling process:
** {function_clause,
[{diameter_sctp,transition,
[{peeloff,#Port<0.1635>,
{sctp,#Port<0.1625>,
{127,0,0,1},
35892,
{[],{sctp_assoc_change,comm_up,0,32,32,1}}},
[]},
{transport,<0.107.0>,accept,#Port<0.1634>,1,undefined,{32,32},0}],
[{file,"transport/diameter_sctp.erl"},{line,561}]},
{diameter_sctp,t,2,[{file,"transport/diameter_sctp.erl"},{line,549}]},
{diameter_sctp,handle_info,2,
[{file,"transport/diameter_sctp.erl"},{line,397}]},
{gen_server,try_dispatch,4,[{file,"gen_server.erl"},{line,614}]},
{gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,680}]},
{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,238}]}]}
|
|
Init esdp->timer_wheel as NULL to please setup_aux_work_timer().
|
|
* rickard/time-api-doc-fix:
Minor doc fixes
|
|
|
|
|
|
* hb/dialyzer/fix_opaque_types/OTP-12493:
dialyzer: Expand opaque types before other types
|
|
Opaque types need to be expanded before record field types, otherwise
any() could be used for opaque types.
|
|
RFC 6733 says this of Failed-AVP in 7.5:
In the case where the offending AVP is embedded within a Grouped AVP,
the Failed-AVP MAY contain the grouped AVP, which in turn contains
the single offending AVP. The same method MAY be employed if the
grouped AVP itself is embedded in yet another grouped AVP and so on.
In this case, the Failed-AVP MAY contain the grouped AVP hierarchy up
to the single offending AVP. This enables the recipient to detect
the location of the offending AVP when embedded in a group.
It says this of DIAMETER_INVALID_AVP_LENGTH in 7.1.5:
The request contained an AVP with an invalid length. A Diameter
message indicating this error MUST include the offending AVPs
within a Failed-AVP AVP. In cases where the erroneous AVP length
value exceeds the message length or is less than the minimum AVP
header length, it is sufficient to include the offending AVP
header and a zero filled payload of the minimum required length
for the payloads data type. If the AVP is a Grouped AVP, the
Grouped AVP header with an empty payload would be sufficient to
indicate the offending AVP. In the case where the offending AVP
header cannot be fully decoded when the AVP length is less than
the minimum AVP header length, it is sufficient to include an
offending AVP header that is formulated by padding the incomplete
AVP header with zero up to the minimum AVP header length.
The AVPs placed in the errors field of a diameter_packet record are
intended to be appropriate for inclusion in a Failed-AVP, but neither of
the above paragraphs has been followed in the Grouped case: the entire
faulty AVP (non-faulty components and all) has been included. This made
it impossible to identify the actual faulty AVP in all but simple case.
This commit adapts the decode to the RFC, and implements the suggested
single faulty AVP, nested in as many Grouped containers as required.
The best-effort decode of Failed-AVP in answer messages, initially
implemented in commit 0f9cdbaf, is also applied.
|
|
Testing is_failed() is unnecessary since put/2 a second time will
return a previously put 'true'.
|
|
Failed == undefined implies is_failed() == true. This was true even when
the code was written, in commit c2c00fdd.
|
|
* egil/fix-erts_debug-disasm-select_tuple_arity:
erts: Fix erts_debug:df/1 in debug
|
|
* sverk/poll-grow:
erts: Refactor growth of fd tables
|
|
to have one common implementation for both _kp and _nkp.
|
|
* egil/cuddle-tests:
kernel: Remove ?line macros in inet_SUITE:t_gethostbyaddr/1
erts: Tweak statistics_SUITE:scheduler_wall_time/1
|
|
* egil/refactor-close:
erts: Add brackets to statement
|
|
|
|
* egil/change-event-default/OTP-12849:
erts: Do not preallocate too large event pool
|
|
* egil/license-compliance/OTP-12848:
Revert "lcnt: Let runq locks reflect actual call location"
Revert "Demote rare debug slogan of message discarding to debug build"
Revert "Add missing error string to syslog logging in epmd"
Revert "Add run queue index to process dump info"
Revert "Add thread index to allocator enomem dump slogan"
Revert "Add number of entries to mnesia copy debug message"
|
|
* legoscia/dbg-doc:
Fix markup in dbg.xml
|
|
Put the name of the functions linked to inside the seealso tag, not the word "see".
|
|
|
|
|
|
|
|
* rickard/io-bytes/OTP-12842:
Save IO bytes in scheduler specific data
|
|
* egil/lcnt-refactor/OTP-12846:
erts: Refactor LCNT
|
|
Sentinels in select_tuple_arity instructions are not proper
tuple arities and thus cannot be checked in debug.
Print them as small integers instead.
|
|
* egil/fix-compiler-beam_bool/OTP-12844:
compiler: Add regressions_SUITE
compiler: Fix beam_bool pass for get_map_elements
|
|
regressions_SUITE will have code snippets which previously crashed the compiler.
This commits includes a test for Maps crash in beam_bool.
|
|
Before beam_split the get_map_elements instruction is still in
blocks and the helper function in beam_jump did not reflect this.
Reported-by: Quviq twitter account
|
|
* nybek/fix_so_linger_zero__simple:
Update prim_inet.beam
Fix socket option {linger, {true, 0}} to abort TCP connections
Apply 'show_econnreset' socket option to send errors as well
Add 'show_econnreset' TCP socket option
|
|
|
|
This reverts commit efefd4bfda3156c6c19a61d7aa3d2f50a026d0e5.
Conflicts:
erts/emulator/beam/erl_process.h
|