Age | Commit message (Collapse) | Author |
|
A listener process in diameter_sctp starts accepting transport processes
as required, either as associations are established or as diameter asks
for a processes to be started. Since this can happen in any order, the
listener maintains two queues: one for processes that diameter has
requested and which are waiting to be given an association, another for
processes that have been started to become owners of an association but
are waiting for diameter to request them. Only one queue at a time is
non-empty. The first queue's length is bounded by the number of
accepting processes configured as pool_size. Entries in the second queue
are short-lived since diameter starts a replacement transport process
whenever an existing one dies or communicates that it has an
association.
The two queues were previously implemented in an ets ordered_set, whose
keys were the pid() of transport processes. Removing an element from the
queue was then done with ets:first/1. The problem with this it's not
really a queue: there's no guarantee that pid-ordering is the same as
the order in which processes are started. If it isn't then it's possible
that an established association never be given to diameter as a
transport process if there's always a newer association whose pid sorts
first. This isn't a problem in practice since it would require new
associations to be established faster than diameter starts transport
processes, but redo the implementation as a queue, with strict FIFO
semantics.
|
|
The changes in some of the previous commits assume application restart.
|
|
At the current count, there are 128 groups run in parallel, each of
which runs 52 testcases in parallel. That makes for 128*52 = 6656
testcases, which is probably also a factor in the sporadic failures
addressed by the parent commit. Don't run the 128 groups in parallel.
|
|
The defaults result in sporadic timeouts in the traffic suite after
testing over SCTP was added in commit fadf753b. The behaviour looks to
be specific to SLES 11, and is presumably the same resends/congestion
that lead to the buffers being increased in the gen_sctp suite in commit
12febf13 (and commented in commit e931991f). The behaviour hasn't been
seen on SLES 10.
|
|
|
|
Don't pass an association id that's no longer used.
|
|
In particular, don't give the accepting transport process the listening
socket. It was used to match the initial sctp message received in a
peeloff message, but replace the socket in the forwarded message
instead.
|
|
The existing code was a remnant of the pre-peeloff implementation.
There's no need to close anything but the whole socket.
|
|
Listener death should have no effect on a peeled off association.
|
|
Forwarding an sctp message from the listener process at the same time
that the controlling process is changed means there's no guarantee that
the message order will be preserved. Selectively receive the peeloff
message before entering the gen_server loop to ensure the order is
preserved.
|
|
This is not the case under Solaris for one: successive
associations can receive the same association id as a result of peeloff,
the id only being unique for the controlling port, not for the listening
port as is the case under Linux for example. This made for many failures
in the diameter test suites, the traffic suite in particular.
Peeloff in diameter_sctp was introduced in 9a671bf0, before which the
assumption was fine since it was the listening process that owned all
associations. (Which obviously had other drawbacks.) Other remnants of
the pre-peeloff implementation have also been removed: that the listener
process might receive a message on a socket after peeloff for one.
Peeloff in gen_sctp became available in commit 067cfe79, after the
original implementation of diameter_sctp.
This is trace on the unpatched code showing id reuse under Solaris:
+ {trace_ts,<0.103.0>,call,
{diameter_sctp,handle_info,
[{sctp,#Port<0.1625>,
{127,0,0,1},
35904,
{[],{sctp_assoc_change,comm_up,0,32,32,1}}},
{listener,#Ref<0.0.1.948>,#Port<0.1625>,4,
57384,
{-4,61481},
#Ref<0.0.8.12>,
[]}]},
{1432,458752,612168}}
+ {trace_ts,<0.103.0>,call,
{diameter_sctp,handle_info,
[{sctp,#Port<0.1625>,
{127,0,0,1},
35905,
{[],{sctp_assoc_change,comm_up,0,32,32,1}}},
{listener,#Ref<0.0.1.948>,#Port<0.1625>,4,
57384,
{-3,61481},
#Ref<0.0.8.12>,
[]}]},
{1432,458752,613042}}
The result was this, when the second association was incorrectly
forwarded to the first association's controlling process:
** {function_clause,
[{diameter_sctp,transition,
[{peeloff,#Port<0.1635>,
{sctp,#Port<0.1625>,
{127,0,0,1},
35892,
{[],{sctp_assoc_change,comm_up,0,32,32,1}}},
[]},
{transport,<0.107.0>,accept,#Port<0.1634>,1,undefined,{32,32},0}],
[{file,"transport/diameter_sctp.erl"},{line,561}]},
{diameter_sctp,t,2,[{file,"transport/diameter_sctp.erl"},{line,549}]},
{diameter_sctp,handle_info,2,
[{file,"transport/diameter_sctp.erl"},{line,397}]},
{gen_server,try_dispatch,4,[{file,"gen_server.erl"},{line,614}]},
{gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,680}]},
{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,238}]}]}
|
|
|
|
* anders/diameter/test/OTP-12767:
Replace config suite call to erlang:now/0
Fix incorrect suite usage of OTP 18 monotonic time
Make tls suite crash more verbosely
|
|
* anders/diameter/17.5.5/OTP-12757:
vsn -> 1.9.2
Update appup for 17.5.5
Fix mangled release note
|
|
* anders/diameter/sctp/OTP-12744:
Fix diameter_sctp listener race
Tweak transport suite failures
Run traffic suite over SCTP
|
|
* anders/diameter/counters/OTP-12741:
Fix counting of no_result_code/invalid_error_bit
Count relayed answers
Rename dictionary-related functions/variables
Lift answer send up the call chain
Count discarded incoming messages
Include R-bit in unknown message counter keys
Fix broken relay counters
Fix broken result code counters
Add counters testcase to relay suite
|
|
Commit 4b691d8d made it possible for accepting transport processes to be
started concurrently, and commit 77c1b162 adapted diameter_sctp to this,
but missed that the publication of the listener process in diameter_reg
has to precede the return of its start function. As a result, concurrent
starts could result in multiple listener processes.
|
|
Make anything but a comm_up sctp_assoc_change crash. Make timeouts more
reasonable.
|
|
Previously it was only run over TCP.
Configure a pool of accepting processes since simultaneous connections
are otherwise prone to rejection, as discussed in commit 4b691d8d.
Tweak timeouts to more reasonable values.
|
|
To remove a compilation warning with OTP 18.
|
|
Value was used as strictly increasing when it's only non-decreasing,
causing testcases to fail.
|
|
To see why it's failing on at least one test machine.
|
|
|
|
- OTP-12741: disfunctional counters
- OTP-12744: diameter_sctp race
No load order requirements.
|
|
|
|
The message was regarded as unknown if the answer message in question
set the E-bit and the application dictionary was not the common
dictionary.
|
|
That is, outgoing answer messages received in response to a
handle_request callback having returned {relay, Opts}.
|
|
To clarify what it is that's being computed, which isn't entirely
obvious. No functional change, just renaming.
|
|
As the first step in starting to count outgoing, relayed answer
messages.
|
|
An incoming Diameter message is either a request, an answer to an
outstanding request, or an unexpected answer. The latter weren't
counted, but are now counted on keys of this form:
{pid(), {{unknown, 0}, recv, discarded}}
The form of the second element is similar to those of other counters,
like:
{{relay, 0|1}, send|recv, invalid_error_bit}
Compare this to the key used when counting known answers:
{{ApplicationId, CommandCode, 0}, recv}
The application id and command code aren't included so as not to count
on arbitrary keys, a topic last visited in commit 49e8b11c.
|
|
To differentiate between requests and answers, in analogy with relay
counters. This isn't backwards compatible, but these counters aren't yet
documented.
|
|
Commit 49e8b11c broke the counting of relayed message, causing them to
be accumulated as unknown messages.
|
|
Commit a1df50b3 broke result code counters in the case of answer
messages sent as a header/avp lists (unless the avps, untypically, set
the name field), and for answers sent/received in the relay application.
|
|
|
|
* hans/ssh/ssh_msg_debug_fun/OTP-12738:
ssh: option for handling the SSH_MSG_DEBUG message's printouts
|
|
Which fails for a variety of reasons to be addressed in subsequent
commits.
|
|
A fun could be given in the options that will be called whenever
the SSH_MSG_DEBUG message arrives. This enables the user to
format the printout or just discard it.
The default is changed to not print the message. In RFC4253
printing is a SHOULD, but our new default is to protect logs
from dos attacs.
|
|
|
|
|
|
* anders/diameter/17.5.3/OTP-12702:
Fix broken pre-17.4 appup
Update appup for 17.5.3
vsn -> 1.9.1
|
|
* anders/diameter/counters/OTP-12701:
Add counters testcase to 3xxx suite
Fix counting error with unknown application id
Add missing doc wording
|
|
* anders/diameter/result_codes/OTP-12654:
Fix broken traffic testcase
Match harder in traffic suite
Don't confuse Result-Code and Experimental-Result
|
|
* anders/diameter/extra_avp_bit/OTP-12642:
Remove extra avp bit from diameter_avp decode
|
|
* dgud/common_test/netconf-user-caps/OTP-12707:
common_test: Add user capability option to hello
|
|
* peppe/common_test/ct_telnet_wait_for_prompt.maint:
Introduce wait_for_prompt option
|
|
* peppe/common_test/timetrap_line.maint:
Fix problem with line number not always showing in log
|
|
* peppe/common_test/modify_vts.maint:
Get the VTS mode working with private CT version of webtool
Change order of ct_run help sections
Prepare for webtool integration with CT
|
|
* peppe/common_test/longname_problem.maint:
Fix error in ct_logs, not handling long names correctly
|
|
* peppe/test_tools_vsn.maint:
Bump version numbers
|
|
* dgud/common_test/netconf/OTP-12698:
common_test: Recurse when there is more data in netconf
|