Age | Commit message (Collapse) | Author |
|
|
|
|
|
Each service process maintains a dictionary of peers, mapping an
application alias to a {pid(), #diameter_caps{}} list of connected
peers. These lists are potentially large, peers were appended to the end
of the list for no particular reason, and these long lists were
constructed/deconstructed when filtering them for pick_peer callbacks.
Many simultaneous outgoing request could then slow the VM to a crawl,
with many scheduled processes mired in list manipulation.
The pseudo-dicts are now replaced by plain ets tables. The reason for
them was (once upon a time) to have an interface interchangeable with a
plain dict for debugging purposes, but strict swapablity hasn't been the
case for some time now, and in practice a swap has never taken place.
Additional tables mapping Origin-Host/Realm have also been introduced,
to minimize the size of the peers lists when peers are filtered on
host/realm. For example, a filter like
{any, [{all, [realm, host]}, realm]}
is probably a very common case: preferring a Destination-Realm/Host
match before falling back on Destination-Realm alone. This is now more
efficiently (but not equivalently) expressed as
{first, [{all, [realm, host]}, realm]}
to stop the search when the best match is made, and extracts peers from
host/realm tables instead of searching through the list of all peers
supporting the application in question. The code to try and start with a
lookup isn't exhaustive, and the 'any' filter is still as inefficient as
previously.
|
|
The diffs are all about adapting to the OTP 18 time interface. The code
was previously backwards compatible, falling back on the erlang:now/0 if
erlang:monotonic_time/0 is unavailable, but this was seen to be a bad
thing in commit 9c0f2f2c. Use of erlang:now/0 is now removed.
|
|
* anders/diameter/17/time/OTP-12926:
Simplify time manipulation
Remove use of monotonic time in pre-18 code
Remove unnecessary redefinition of erlang:max/2
|
|
* anders/diameter/grouped_errors/OTP-12930:
Fix decode of Grouped AVPs containing errors
Simplify logic
Simplify logic
|
|
* anders/diameter/lcnt/OTP-12912:
Make ets diameter_stats a set
Remove unnecessary sorting in stats suite
Set ets {write_concurrency, true} on diameter_stats
Don't start watchdog timers unnecessarily
Remove unnecessary erlang:monitor/2 qualification
Add missing watchdog suite clause
|
|
The ordering of (ets) diameter_stats (also unnecessary) ensures the
sorting.
|
|
By doing away with more wrapping that the parent commit started to
remove.
|
|
Commit c74b593a fixed the problem that a decoded deep diameter_avp list
couldn't be encoded, but did so in the wrong way: there's no need to
reencode component AVPs since the Grouped AVP itself already contains
the encoded binary. The blunder caused diameter_codec:pack_avp/1 to fail
if the first element of the AVP list to be encoded was itself a list.
Thanks to Andrzej TrawiĆski for reporting the problem.
|
|
The suite pretends to be gen_tcp-ish in configuring itself to
diameter_tcp. The function close/1 can be called as a result.
|
|
OTP-12845
* bruce/change-license:
fix errors caused by changed line numbers
Change license text to APLv2
|
|
* anders/diameter/sctp/OTP-12768:
Fix connection timeouts in test transports
Fix start order of alternate transports
Log discarded answers
Ensure accepting processes are first in, first out
Remove upgrade-related code
Be less parallel in traffic suite
Increase send/receive buffers for testsuite SCTP listeners
Decrease unnecessarily long testsuite timetraps
Simplify accepting transport start
Simplify peeloff signaling
Simplify socket close at terminate
Don't monitor listener after peeloff
Don't receive initial messages out of order
Remove assumption that SCTP association ids will be unique
|
|
* anders/diameter/grouped_errors/OTP-12721:
Fix decode of Grouped AVPs containing errors
Simplify logic
Simplify logic
|
|
Without a timeout, TCP/SCTP connect can take some time to fail, which
resulted in failures in the pool suite after the parent commit fixed the
previously faulty sctp-first-then-tcp connect.
|
|
At the current count, there are 128 groups run in parallel, each of
which runs 52 testcases in parallel. That makes for 128*52 = 6656
testcases, which is probably also a factor in the sporadic failures
addressed by the parent commit. Don't run the 128 groups in parallel.
|
|
The defaults result in sporadic timeouts in the traffic suite after
testing over SCTP was added in commit fadf753b. The behaviour looks to
be specific to SLES 11, and is presumably the same resends/congestion
that lead to the buffers being increased in the gen_sctp suite in commit
12febf13 (and commented in commit e931991f). The behaviour hasn't been
seen on SLES 10.
|
|
|
|
|
|
RFC 6733 says this of Failed-AVP in 7.5:
In the case where the offending AVP is embedded within a Grouped AVP,
the Failed-AVP MAY contain the grouped AVP, which in turn contains
the single offending AVP. The same method MAY be employed if the
grouped AVP itself is embedded in yet another grouped AVP and so on.
In this case, the Failed-AVP MAY contain the grouped AVP hierarchy up
to the single offending AVP. This enables the recipient to detect
the location of the offending AVP when embedded in a group.
It says this of DIAMETER_INVALID_AVP_LENGTH in 7.1.5:
The request contained an AVP with an invalid length. A Diameter
message indicating this error MUST include the offending AVPs
within a Failed-AVP AVP. In cases where the erroneous AVP length
value exceeds the message length or is less than the minimum AVP
header length, it is sufficient to include the offending AVP
header and a zero filled payload of the minimum required length
for the payloads data type. If the AVP is a Grouped AVP, the
Grouped AVP header with an empty payload would be sufficient to
indicate the offending AVP. In the case where the offending AVP
header cannot be fully decoded when the AVP length is less than
the minimum AVP header length, it is sufficient to include an
offending AVP header that is formulated by padding the incomplete
AVP header with zero up to the minimum AVP header length.
The AVPs placed in the errors field of a diameter_packet record are
intended to be appropriate for inclusion in a Failed-AVP, but neither of
the above paragraphs has been followed in the Grouped case: the entire
faulty AVP (non-faulty components and all) has been included. This made
it impossible to identify the actual faulty AVP in all but simple case.
This commit adapts the decode to the RFC, and implements the suggested
single faulty AVP, nested in as many Grouped containers as required.
The best-effort decode of Failed-AVP in answer messages, initially
implemented in commit 0f9cdbaf, is also applied.
|
|
|
|
* anders/diameter/test/OTP-12767:
Replace config suite call to erlang:now/0
Fix incorrect suite usage of OTP 18 monotonic time
Make tls suite crash more verbosely
|
|
* anders/diameter/sctp/OTP-12744:
Fix diameter_sctp listener race
Tweak transport suite failures
Run traffic suite over SCTP
|
|
Make anything but a comm_up sctp_assoc_change crash. Make timeouts more
reasonable.
|
|
Previously it was only run over TCP.
Configure a pool of accepting processes since simultaneous connections
are otherwise prone to rejection, as discussed in commit 4b691d8d.
Tweak timeouts to more reasonable values.
|
|
To remove a compilation warning with OTP 18.
|
|
Value was used as strictly increasing when it's only non-decreasing,
causing testcases to fail.
|
|
To see why it's failing on at least one test machine.
|
|
An incoming Diameter message is either a request, an answer to an
outstanding request, or an unexpected answer. The latter weren't
counted, but are now counted on keys of this form:
{pid(), {{unknown, 0}, recv, discarded}}
The form of the second element is similar to those of other counters,
like:
{{relay, 0|1}, send|recv, invalid_error_bit}
Compare this to the key used when counting known answers:
{{ApplicationId, CommandCode, 0}, recv}
The application id and command code aren't included so as not to count
on arbitrary keys, a topic last visited in commit 49e8b11c.
|
|
To differentiate between requests and answers, in analogy with relay
counters. This isn't backwards compatible, but these counters aren't yet
documented.
|
|
Which fails for a variety of reasons to be addressed in subsequent
commits.
|
|
Conflicts:
OTP_VERSION
erts/vsn.mk
lib/test_server/src/erl2html2.erl
|
|
* anders/diameter/17.5.3/OTP-12702:
Fix broken pre-17.4 appup
Update appup for 17.5.3
vsn -> 1.9.1
|
|
* anders/diameter/counters/OTP-12701:
Add counters testcase to 3xxx suite
Fix counting error with unknown application id
Add missing doc wording
|
|
The send_error testcase tested that Session-Id in an answer-message was
not undefined, but that's always the case since the AVP has arity 0 or
1. The correct test is that it's a list of length 1, to ensure that
diameter has inserted the session id as expected.
|
|
To ensure that the expected answer messages are received.
|
|
Decode of an answer message not setting the E-bit, and containing
Experiment-Result but not Result-Code, identified Result-Code as the
erroneous when Erroneous-Result-Code was 3xxx. Here's an example (from
trace) of a the errors field after decode:
[{5004,
{diameter_avp,undefined,undefined,false,false,undefined,'Result-Code',
3001,undefined,undefined}}],
The diameter_avp was just constructed from the AVP name and decoded
result, without regard for which result code AVP contained the value.
Fix by extracting the AVP from the incoming message.
|
|
Upgrade instructions have been added for each 17.X release without
adjusting the instructions for preceeding releases: the instructions
have only been sufficient to upgrading one release at a time: 17.0 to
17.1, 17.1 to 17.2, etc.
Conficting load order requirements make smooth upgrade from an
arbitrarily old release impossible. In this case, 17.3 looks to be as
far back as we can go, so require restart from 17.[0-2] or older.
Update the app suite to deal with binary regexps in appup, and to match
version numbers harder.
|
|
To start checking that the counters are counting what's expected. The
parent commit fixes a case in which they weren't.
|
|
|
|
As for the port number in the parent commit, a FQDN can't be arbitrarily
long, at most 255 octets. Make decode fail if it's more.
|
|
A port number is a 16-bit integer, but the regexp used to parse it in
commit 1590920 slavishly followed the RFC 6733 grammar in matching an
arbitrary number of digits. Make decode fail if it's anything more than
5, to avoid doing erlang:list_to_integer/1 on arbitrarily large lists.
Also make it fail if the resulting integer is outside of the expected
range.
|
|
To bound the length of incoming messages that will be decoded. A message
longer than the specified number of bytes is discarded. An
incoming_maxlen_exceeded counter is incremented to make note of the
occurrence.
The motivation is to prevent a sufficiently malicious peer from
generating significant load by sending long messages with many AVPs for
diameter to decode. The 24-bit message length header accomodates
(16#FFFFFF - 20) div 12 = 1398099
Unsigned32 AVPs for example, which the current record-valued decode is
too slow with in practice. A bound of 16#FFFF bytes allows for 5461
small AVPs, which is probably more than enough for the majority of
applications, but the default is the full 16#FFFFFF.
|
|
It was possible to configure the option, but doing so caused the service
to fail when starting a watchdog process:
{function_clause,
[{diameter_service,'-spawn_opts/1-lc$^0/1-0-',
[false],
[{file,"base/diameter_service.erl"},{line,846}]},
{diameter_service,start,5,
[{file,"base/diameter_service.erl"},{line,820}]},
{diameter_service,start,3,
[{file,"base/diameter_service.erl"},{line,782}]},
{diameter_service,handle_call,3,
[{file,"base/diameter_service.erl"},{line,385}]},
{gen_server,try_handle_call,4,[{file,"gen_server.erl"},{line,607}]},
{gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,639}]},
{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,237}]}]}
Tests for the option in the config suite were also missing.
Bungled in commit 78b3dc6.
|
|
* anders/diameter/dpr/OTP-12609:
Discard incoming/outgoing requests after incoming DPR
Add transport_opt() dpr_timeout
Be lenient with errors in incoming DPR
|
|
Both RFC 3588 and 6733 disallow the combination. Make its encode fail.
|
|
* anders/diameter/string_decode/OTP-11952:
Let examples override default service options
Set {restrict_connections, false} in example server
Set {string_decode, false} in examples
Test {string_decode, false} in traffic suite
Add service_opt() string_decode
Strip potentially large terms when sending outgoing Diameter messages
Improve language consistency in diameter(1)
|
|
By adding string decode or not in the server or client as another
combination. Run all traffic cases in parallel: remove the sequential
tests. Common test seems unable to deal with {group, X, [parallel]}
within a group.
|
|
To control whether stringish Diameter types are decoded to string or
left as binary. The motivation is the same as in the parent commit: to
avoid large strings being copied when incoming Diameter messages are
passed between processes; or *if* in the case of messages destined for
handle_request and handle_answer callbacks, since these are decoded in
the dedicated processes that the callbacks take place in. It would be
possible to do something about other messages without requiring an
option, but disabling the decode is the most effective.
The value is a boolean(), true being the default for backwards
compatibility. Setting false causes both diameter_caps records and
decoded messages to contain binary() in relevant places that previously
had string(): diameter_app(3) callbacks need to be prepared for the
change.
The Diameter types affected are OctetString and the derived types that
can contain arbitrarily large values: OctetString, UTF8String,
DiameterIdentity, DiameterURI, IPFilterRule, and QoSFilterRule. Time and
Address are unaffected.
The DiameterURI decode has been redone using re(3), which both
simplifies and does away with a vulnerability resulting from the
conversion of arbitrary strings to atom.
The solution continues the use and abuse of the process dictionary for
encode/decode purposes, last seen in commit 0f9cdba.
|
|
To cause a peer connection to be closed following an outgoing DPA, in
case the peer fails to do so. It is the recipient of DPA that should
close the connection according to RFC 6733.
|