Age | Commit message (Collapse) | Author |
|
* anders/diameter/pool/OTP-12428:
Fix SCTP match blunder in suites
Be backwards compatible with diameter_sctp listener state
Add gen_tcp testcase that fails sporadically
Simplify transport suite
Remove (ancient) dead code
Don't orphan slave nodes in example suite
Refresh example code
Improve language consistency in diameter(1)
Add pool suite to test transport_opt() pool_size
Adapt tcp/sctp transport modules for pool_size > 1
Add transport_opt() pool_size
|
|
* anders/diameter/shutdown/OTP-12412:
Increase service shutdown timeout
Set shutdown = infinity for supervisor children
Monitor more efficiently at shutdown
|
|
* anders/diameter/retransmission/OTP-12415:
Fix retransmission of messages sent as header/avps list
|
|
Clause matching error for specific test cases was harmless since the
subsequent clause also matched. Errors detected by the server result in
Failed-AVP being sent, which should not lead to a decode error in the
client.
|
|
Outgoing answers missing a Result-Code AVP or setting an E-bit
inappropriately were discarded, but there's no particular reason for
doing so if the answer can be encoded, and the sender has no way of
knowing that their answer has been discarded. It's also inappropriate
that the message be discarded in the relay case. Answers are now sent,
and an error counter incremented.
|
|
More than an incoming message can contain ancillary data, which the
gen_sctp and transport suites did not expect. On FreeBSD 10, an
sctp_assoc_change event appears always to contain ancillary data.
|
|
Commit 24993fc2 modified the state even in the case that the new
pool_size option the change was introduced to support was not used.
Doing so made downgrade impossible since old code would not be prepared
for the modified state. Retain a compatible state, so that simple code
replacement is enough.
|
|
On OS X at least. The testcase opens a listening socket, spawns 8
processes that call gen_tcp:accept/1, waits a couple of seconds, and
then spawns 8 processes that call gen_tcp:connect/3. Some of these
occasionally return {error, econnreset}.
|
|
Using the fact that transport processes can now be started concurrently.
The suite serialized starts itself when pretending to be diameter
starting a transport process.
|
|
Commit 9a671bf0 removed the need for diameter_sctp to send outgoing
messages through the listening process. That was prior to R5B02, so the
clause isn't need for any upgrade case.
|
|
Stops were aborted at the first failure.
|
|
Which hasn't received any attention for some time. Clean it up, rename
the poorly named peer.erl (it's Diameter *nodes* that are implemented),
and make the it possible to specify arbitrary transport configuration.
|
|
In particular, do away with unnecessary articles in the first sentence
of item lists.
|
|
With testcases that uses restrict_connections and pool_size config to
establish multiple connections between two Diameter nodes, checking for
the expected number of transport processes using
diameter:service_info/2.
|
|
In particular, that starts for the same transport reference can now be
concurrent. Looking up a listener process and starting a new one if not
found did handle this (more than one process could find no listener),
and diameter_sctp assumed there could only be one transport process
waiting for an association.
|
|
Transport processes are started by diameter one at a time. In the
listening case, a transport process accepts a connection, tells the
peer_fsm process, which tells its watchdog process, which tells its
service process, which then starts a new watchdog, which starts a new
peer_fsm, which starts a new transport process, which (finally) goes
about accepting another connection. In other words, not particularly
aggressive in accepting new connections. This behaviour doesn't do
particularly well with a large number of concurrent connections: with
TCP and 250 connecting peers we see connections being refused.
This commit adds the possibilty of configuring a pool of accepting
processes, by way of a new transport option, pool_size. Instead of
diameter:add_transport/2 starting just a single process, it now starts
the configured number, so that instead of a single process waiting for a
connection there's now a pool.
The option is even available for connecting processes, which provides an
alternate to adding multiple transports when multiple connections to the
same peer are required. In practice this also means configuring
{restrict_connections, false}: this is not implicit.
For backwards compatibility, the form of
diameter:service_info(_,transport) differs in the connecting case,
depending on whether or not pool_size is configured.
Note that transport processes for the same transport_ref() can be
started concurrently when pool_size > 1. This places additional
requirements on diameter_{tcp,sctp}, that will be dealt with in a
subsequent commit.
|
|
Extracting the End-to-End and Hop-by-Hop identifiers resulted in a
function clause error, causing the send to fail.
|
|
Shutting down the service causes DPR to be sent on all open transports
under the service. These in turn have a timeout for the reception of
DPA, but the timeout is bounded by the supervisor's in practice. Both
timeouts were 1 second. Increase the supervisor timeout to 5 seconds.
Note that the service supervisor is furthest to the right in the
supervision tree in diameter_sup. Thus is significant, so that the
transport-related processes aren't shutdown first.
|
|
As suggested in supervisor(3). The leaves of the supervision tree should
determine the timeouts.
|
|
There's no need for building a pid list only to map it to a list of
monitor references. Also, monitoring before banging the shutdown
message makes for better trace, avoiding unnecessary noproc reasons when
the process dies before the monitor is created.
|
|
|
|
|
|
OTP-12196 remote request table leak
OTP-12233 3xxx result code without E-bit
OTP-12281 ignored connect_timer
OTP-12308 filter ordering
|
|
* anders/diameter/filters/OTP-12308:
Order peers in pick_peer callbacks
|
|
* anders/diameter/connect_timer/OTP-12281:
Tweak reason in closed event
Fix ignored connect timer
Check {connect,watchdog}_timer distinction in event testcases
Rename reconnect_timer to connect_timer in examples and suites
|
|
* anders/diameter/3xxx/OTP-12233:
Fix handling of 3xxx Result-Code without E-bit
|
|
The order of peers presented to a diameter_app(3) pick_peer callback has
previously not been documented, but there are use cases that are
simplified by an ordering. For example, consider preferring a direct
connection to a specified Destination-Host/Realm to any host in the
realm. The implementation previously treated this as a special case by
placing matching hosts at the head of the peers list, but the
documentation made no guarantees. Now present peers in match-order, so
that the desired sorting is the result of the following filter.
{any, [{all, [host, realm]}, realm]}
The implementation is not backwards compatible in the sense that a realm
filter alone is no longer equivalent in this case. However, as stated,
the documentation never made any guarantees regarding the sorting.
|
|
From {error, Reason} to {no_connection, Reason} when a connection can't
be established. The exit reason of a diameter_peer_fsm process is turned
into a message from the corresponding diameter_watchdog process to the
relevant diameter_service process, the latter sending a 'closed' event
including the reason to any subscribers. Reason = [] when none of the
configured transport modules succeeds in establishing a connection,
which admittedly isn't terribly descriptive. (The lists is of error
reasons from transport start functions, which is empty as long as
transport processes start successfully.)
Note that this form of the closed event is undocumented, aside from the
documentation saying that one should expect undocumented events. The
explicitly documented forms are currently specific to CER/CEA failures.
|
|
There are two timers governing the establishment of peer connections:
connect_timer and watchdog_timer. The former is the RFC 6733 Tc timer
and is used by diameter_service to establish an initial connection. The
latter is RFC 3539 TwInit and is used by diameter_watchdog for
connection reestablishment after the watchdog leaves state INITIAL. A
connecting transport ignored the connect timer since the watchdog
process never died, regardless of the watchdog state, causing the
watchdog timer to handle reconnection.
This seems to have been broken for some time.
|
|
The connect timer is currently ignored by a connecting transport,
so the check causes one testcase to fail.
|
|
The timer was renamed in commit abea7186.
|
|
Commit 00584303 broke the population of the errors field of the
diameter_packet record when an incoming request with an
E-bit/Result-Code mismatch was decoded. Instead of the intended
{5004, #diameter_avp{value = integer()}},
the value was a 4-tuple containing the integer Result-Code.
|
|
An outgoing request whose pick_peer callback selected a transport on
another node resulted in an orphaned diameter_request entry on that
node.
|
|
|
|
* anders/diameter/17.3_release/OTP-12093:
Add recompilation admonition to 17.2 release notes
|
|
* anders/diameter/Failed-AVP/OTP-12094:
Fix ?MODULE in preprocessed dictionary forms
|
|
That dictionaries need to be recompiled, which is the case whenever
diameter_gen.hrl is modified.
|
|
By replacing literal diameter_gen_relay atoms in forms extracted from
that module by the name of the module in question. This has been wrong
for some time, but only became noticable when the parent commit started
using ?MODULE as more than a process dictionary key or tag to match on.
In particular, the function dict/1 in diameter_gen.hrl (included by
every dictionary module) can now return ?MODULE, which is (not
surprisingly) expected to be the name of the dictionary module in
question. It wasn't in the case of a module compiled from forms: it was
diameter_gen_relay, since that's the module the forms were extracted
from.
The fix only affects dictionaries compiled from forms, as returned by
diameter_make:codec/2. In particular, dictionaries compiled from Erlang
source returned by this function, or by diameterc(1), are unaffected.
|
|
* anders/diameter/17.3_release/OTP-12093:
vsn -> 1.7.1
Update appup for OTP-12094
Update appup for OTP-12080
Update appup for OTP-12069
|
|
* anders/diameter/5014/OTP-12074:
Don't leave extra bit in decoded AVP data
|
|
* anders/diameter/Failed-AVP/OTP-12094:
Fix best effort decode of Failed-AVP
Fix decode of Failed-AVP in RFC 3588 answer-message
|
|
* anders/diameter/counters/OTP-12080:
Fix counters for answer-message
Count relayed messages on {relay, Rbit}
Count request retransmissions
Fix counting of outgoing requests
|
|
The bit is added in diameter_codec to induce a decode error in the case
of 5014 errors, but was not removed before returning the decoded result.
Code examining the binary data in a diameter_avp record would then see
the extra bit.
|
|
|
|
diameter_codec must be loaded before diameter_traffic.
|
|
|
|
|
|
Commit c2c00fdd didn't get it quite right: it only decoded failed AVPs
in the common dictionary since it's this dictionary an answer-message is
decoded in. An extra dictionary isn't something that's easily passed
through the decode without rewriting dictionary compilation however, and
that's no small job, so continue with the use/abuse of the process
dictionary by storing the dictionary module for the decode to retrieve.
This is one step worse than previous uses since the dictionary is put in
one module (diameter_codec) and got in another (the dictionary module),
but it's the lesser of two evils.
|
|
Commit 066544fa had the unintended consequence of breaking the decode of
Failed-AVP in answer-message as defined in the RFC 3588, since the
grammar doesn't list Failed-AVP as an explicit component AVP, in
contrast to the RFC 6733 grammar, which does. Handle this case
explicitly, as an exception, just as with Failed-AVP as parent AVP.
|
|
An answer message that sets the E-bit is encoded/decoded with Diameter
common dictionary, using the answer-message grammar specified in the
RFC. However, the dictionary of the application in question is the one
that knows the command code of the message. Commit df19c272 didn't make
this distinction when incrementing counters for an answer-message, using
the common dictionary for both purposes, causing the message to be
counted as unknown. This commit remedies that.
|