Age | Commit message (Collapse) | Author |
|
Whether making record declarations unreadable to compensate for
dialyzer's ignorance of match specs is worth it is truly debatable.
|
|
* anders/diameter/retransmission/OTP-13342:
Fix handling of shared peer connections in watchdog state SUSPECT
Remove unnecessary parentheses
Remove dead export
|
|
A peer connection shared from a remote node was regarded as being
available for peer selection (aka up) as long as its peer_fsm process
was alive; that is, for the lifetime of the peer connection. In
particular, it didn't take note of transitions into watchdog state
SUSPECT, when the connection remains. As a result, retransmissions could
select the same peer connection whose watchdog transition caused the
retransmission.
A service process now broadcasts a peer_down event just as it
does a peer_up event.
The fault predates the table rearrangements of commit 8fd4e5f4.
|
|
Each service process maintains a dictionary of peers, mapping an
application alias to a {pid(), #diameter_caps{}} list of connected
peers. These lists are potentially large, peers were appended to the end
of the list for no particular reason, and these long lists were
constructed/deconstructed when filtering them for pick_peer callbacks.
Many simultaneous outgoing request could then slow the VM to a crawl,
with many scheduled processes mired in list manipulation.
The pseudo-dicts are now replaced by plain ets tables. The reason for
them was (once upon a time) to have an interface interchangeable with a
plain dict for debugging purposes, but strict swapablity hasn't been the
case for some time now, and in practice a swap has never taken place.
Additional tables mapping Origin-Host/Realm have also been introduced,
to minimize the size of the peers lists when peers are filtered on
host/realm. For example, a filter like
{any, [{all, [realm, host]}, realm]}
is probably a very common case: preferring a Destination-Realm/Host
match before falling back on Destination-Realm alone. This is now more
efficiently (but not equivalently) expressed as
{first, [{all, [realm, host]}, realm]}
to stop the search when the best match is made, and extracts peers from
host/realm tables instead of searching through the list of all peers
supporting the application in question. The code to try and start with a
lookup isn't exhaustive, and the 'any' filter is still as inefficient as
previously.
|
|
See commit 862af31d.
|
|
|
|
Each service process maintains a dictionary of peers, mapping an
application alias to a {pid(), #diameter_caps{}} list of connected
peers. These lists are potentially large, peers were appended to the end
of the list for no particular reason, and these long lists were
constructed/deconstructed when filtering them for pick_peer callbacks.
Many simultaneous outgoing request could then slow the VM to a crawl,
with many scheduled processes mired in list manipulation.
The pseudo-dicts are now replaced by plain ets tables. The reason for
them was (once upon a time) to have an interface interchangeable with a
plain dict for debugging purposes, but strict swapablity hasn't been the
case for some time now, and in practice a swap has never taken place.
Additional tables mapping Origin-Host/Realm have also been introduced,
to minimize the size of the peers lists when peers are filtered on
host/realm. For example, a filter like
{any, [{all, [realm, host]}, realm]}
is probably a very common case: preferring a Destination-Realm/Host
match before falling back on Destination-Realm alone. This is now more
efficiently (but not equivalently) expressed as
{first, [{all, [realm, host]}, realm]}
to stop the search when the best match is made, and extracts peers from
host/realm tables instead of searching through the list of all peers
supporting the application in question. The code to try and start with a
lookup isn't exhaustive, and the 'any' filter is still as inefficient as
previously.
|
|
See commit 862af31d.
|
|
* anders/diameter/17.5.6.7/OTP-13211:
vsn -> 1.9.2.2
Update/fix appup for 17.5.6.7
Be resilient to diameter_service state upgrades
|
|
By not failing in code that looks up state: pick_peer and service_info.
|
|
* anders/diameter/M-bit/OTP-12947:
Add service_opt() strict_mbit
|
|
There are differing opinions on whether or not reception of an arbitrary
AVP setting the M-bit is an error. 1.3.4 of RFC 6733 says this about
how an existing Diameter application may be modified:
o The M-bit allows the sender to indicate to the receiver whether or
not understanding the semantics of an AVP and its content is
mandatory. If the M-bit is set by the sender and the receiver
does not understand the AVP or the values carried within that AVP,
then a failure is generated (see Section 7).
It is the decision of the protocol designer when to develop a new
Diameter application rather than extending Diameter in other ways.
However, a new Diameter application MUST be created when one or more
of the following criteria are met:
M-bit Setting
An AVP with the M-bit in the MUST column of the AVP flag table is
added to an existing Command/Application. An AVP with the M-bit
in the MAY column of the AVP flag table is added to an existing
Command/Application.
The point here is presumably interoperability: that the command grammar
should specify explicitly what mandatory AVPs much be understood, and
that anything more is an error.
On the other hand, 3.2 says thus about command grammars:
avp-name = avp-spec / "AVP"
; The string "AVP" stands for *any* arbitrary AVP
; Name, not otherwise listed in that Command Code
; definition. The inclusion of this string
; is recommended for all CCFs to allow for
; extensibility.
This renders 1.3.4 pointless unless "*any* AVP" is qualified by "not
setting the M-bit", since the sender can effectively violate 1.3.4
without this necessitating an error at the receiver. If clients add
arbitrary AVPs setting the M-bit then request handling becomes more
implementation-dependent.
The current interpretation in diameter is strict: if a command grammar
doesn't explicitly allow an AVP setting the M-bit then reception of such
an AVP is regarded as an error. The strict_mbit option now allows this
behaviour to be changed, false turning all responsibility for the M-bit
over to the user.
|
|
The diffs are all about adapting to the OTP 18 time interface. The code
was previously backwards compatible, falling back on the erlang:now/0 if
erlang:monotonic_time/0 is unavailable, but this was seen to be a bad
thing in commit 9c0f2f2c. Use of erlang:now/0 is now removed.
|
|
By doing away with more wrapping that the parent commit started to
remove.
|
|
|
|
To bound the length of incoming messages that will be decoded. A message
longer than the specified number of bytes is discarded. An
incoming_maxlen_exceeded counter is incremented to make note of the
occurrence.
The motivation is to prevent a sufficiently malicious peer from
generating significant load by sending long messages with many AVPs for
diameter to decode. The 24-bit message length header accomodates
(16#FFFFFF - 20) div 12 = 1398099
Unsigned32 AVPs for example, which the current record-valued decode is
too slow with in practice. A bound of 16#FFFF bytes allows for 5461
small AVPs, which is probably more than enough for the majority of
applications, but the default is the full 16#FFFFFF.
|
|
To control whether stringish Diameter types are decoded to string or
left as binary. The motivation is the same as in the parent commit: to
avoid large strings being copied when incoming Diameter messages are
passed between processes; or *if* in the case of messages destined for
handle_request and handle_answer callbacks, since these are decoded in
the dedicated processes that the callbacks take place in. It would be
possible to do something about other messages without requiring an
option, but disabling the decode is the most effective.
The value is a boolean(), true being the default for backwards
compatibility. Setting false causes both diameter_caps records and
decoded messages to contain binary() in relevant places that previously
had string(): diameter_app(3) callbacks need to be prepared for the
change.
The Diameter types affected are OctetString and the derived types that
can contain arbitrarily large values: OctetString, UTF8String,
DiameterIdentity, DiameterURI, IPFilterRule, and QoSFilterRule. Time and
Address are unaffected.
The DiameterURI decode has been redone using re(3), which both
simplifies and does away with a vulnerability resulting from the
conversion of arbitrary strings to atom.
The solution continues the use and abuse of the process dictionary for
encode/decode purposes, last seen in commit 0f9cdba.
|
|
* anders/diameter/time/OTP-12439:
Use new time api in test suites
Use new time api in implementation
|
|
* anders/diameter/pool/OTP-12428:
Fix SCTP match blunder in suites
Be backwards compatible with diameter_sctp listener state
Add gen_tcp testcase that fails sporadically
Simplify transport suite
Remove (ancient) dead code
Don't orphan slave nodes in example suite
Refresh example code
Improve language consistency in diameter(1)
Add pool suite to test transport_opt() pool_size
Adapt tcp/sctp transport modules for pool_size > 1
Add transport_opt() pool_size
|
|
In particular, deal with the deprecation of erlang:now/0 in OTP 18. Be
backwards compatible with older releases: the new api is only used when
available.
The test suites have not been modified.
|
|
Transport processes are started by diameter one at a time. In the
listening case, a transport process accepts a connection, tells the
peer_fsm process, which tells its watchdog process, which tells its
service process, which then starts a new watchdog, which starts a new
peer_fsm, which starts a new transport process, which (finally) goes
about accepting another connection. In other words, not particularly
aggressive in accepting new connections. This behaviour doesn't do
particularly well with a large number of concurrent connections: with
TCP and 250 connecting peers we see connections being refused.
This commit adds the possibilty of configuring a pool of accepting
processes, by way of a new transport option, pool_size. Instead of
diameter:add_transport/2 starting just a single process, it now starts
the configured number, so that instead of a single process waiting for a
connection there's now a pool.
The option is even available for connecting processes, which provides an
alternate to adding multiple transports when multiple connections to the
same peer are required. In practice this also means configuring
{restrict_connections, false}: this is not implicit.
For backwards compatibility, the form of
diameter:service_info(_,transport) differs in the connecting case,
depending on whether or not pool_size is configured.
Note that transport processes for the same transport_ref() can be
started concurrently when pool_size > 1. This places additional
requirements on diameter_{tcp,sctp}, that will be dealt with in a
subsequent commit.
|
|
There's no need for building a pid list only to map it to a list of
monitor references. Also, monitoring before banging the shutdown
message makes for better trace, avoiding unnecessary noproc reasons when
the process dies before the monitor is created.
|
|
The order of peers presented to a diameter_app(3) pick_peer callback has
previously not been documented, but there are use cases that are
simplified by an ordering. For example, consider preferring a direct
connection to a specified Destination-Host/Realm to any host in the
realm. The implementation previously treated this as a special case by
placing matching hosts at the head of the peers list, but the
documentation made no guarantees. Now present peers in match-order, so
that the desired sorting is the result of the following filter.
{any, [{all, [host, realm]}, realm]}
The implementation is not backwards compatible in the sense that a realm
filter alone is no longer equivalent in this case. However, as stated,
the documentation never made any guarantees regarding the sorting.
|
|
That is, instead of including the list in a diameter:service_info/2 info
tuple, only include the number of references and the number of bytes
referenced. The list itself can be quite large and typically isn't that
interesting, at least not to a diameter user.
|
|
To extract only process info from connections info, which can be useful
to reduce the amount of information returned.
Choose 'info' for the item since process_info is more than one word: all
others are one. Don't choose memory since it's too specific: might want
to use it for more.
|
|
To show process_info of interest. This is not yet documented since it
may well change.
|
|
The former were a little over-enthusiastic and could cause a node to be
logged to death if a peer Diameter node was sufficiently ill-willed.
The function calls are to diameter_lib:log/4, the arguments of which
identify the happening in question, and which does nothing but provide a
function to trace on. Many existing log calls have been shrunk.
The only remaining traffic-related report (hopefully) is that resulting
from {answer_errors, report} config, and this has been slimmed.
|
|
* anders/diameter/watchdog_leak/OTP-11934:
Simplify sending of 'close' to watchdog
Fix watchdog table leak
|
|
* anders/diameter/request_leak/OTP-11893:
Fix leaking request table
Add check that request table is empty to failover suite
Comment fix
|
|
There's no need to send the message immediately if there's no transport
configuration since that in itself means the service process will tell
the watchdogs to die.
|
|
Commit ef5fddcb (diameter-1.4.1, R16B) caused the leak in the case of an
accepting watchdog with restrict_connections = false. It (correctly)
ensured the state remained at INITIAL but a subsequent 'close' message
to terminate the process was ignored since the state was not DOWN. In
fact, no 'close' was sent since there was no state transition or
previous connection: the former triggers the message from
diameter_service, the latter from diameter_watchdog. The message is now
sent to self() from the watchdog itself.
Send 'close' in the same way when multiple connections to the same peer
are allowed, to avoid waiting for a watchdog timer expiry for the
process to terminate in this case.
|
|
A new connection writes the pid to the table diameter_request. The
normal handling is that loss of a connection leads to a watchdog state
change in the service process, which removes the entry, but this usually
won't happen in the case of diameter:stop_service/1 since the service
process is terminated without waiting for watchdog transitions.
The request table should really be service-specific, so that the table
is deleted when the service is stopped, which requires passing the table
identifier into request processes and handling that the table may not
exist. Just clear out the service-specific entries at service process
termination for now.
|
|
* anders/diameter/pick_peer/OTP-11789:
Fix pick_peer case clause failure
|
|
In the case of {call_mutates_state, true} configuration on the service
in question, any peer selection that failed to select a peer resulted in
a case clause failure in diameter_service:pick_peer/5, when the call to
the service process returned false. This was noticed in the case of a
peer failover in which an alternate peer wasn't available.
The explicit matching is intentional, to match exactly what's expected.
|
|
No longer needed to update code in runtime since the emulator is
restarted at a major release.
|
|
The former was misleading since the timer only applies to initial
connection attempts, reconnection attempts being governed by
watchdog_timer. The name is a historic remnant from a (dark, pre-OTP)
time in which RFC 3539 was followed less slavishly than it is now, and
the timer actually did apply to reconnection attempts.
Note that connect_timer corresponds to RFC 6733 Tc, while watchdog_timer
corresponds to RFC 3539 TwInit. The latter RFC makes clear that TwInit
should apply to reconnection attempts. It's less clear if only RFC 6733
is read.
Note also that reconnect_timer is still accepted for backwards
compatibility. It would be possible to add an option to make
reconnect_timer behave strictly as the name suggests (ie. ignore RFC 3539
and interpret RFC 6733 at face value; something that has some value for
testing at least) but no such option is implemented in this commit.
|
|
The option was morphed into a boolean() and then ignored.
|
|
That is, for the process that's spawned for each incoming Diameter
request message.
|
|
A service process maintains a table keyed on watchdog process pids. When
a watchdog process dies the corresponding entry should be removed but
this was broken in commit f115a9f7, causing entries with watchdog state
DOWN to accumulate.
Watchdog processes die as a result of diameter:remove_transport/2, or
when a peer reestablishes a connection in the listening case. Neither is
typically a frequent occurrence.
The fault manifests itself in the return value of
diameter:service_info(SvcName, transport), which displays entries for
watchdog processes that are no longer alive.
|
|
|
|
Allow both share_peers and use_shared_peers to be a list of nodes, or a
function that returns a list of nodes.
|
|
This is the functionality that allows transports to be shared between
identically-named services on different nodes, which has been neither
documented nor tested (until now).
|
|
Counters read by diameter:service_info(SvcName, transport) can be
selected at the same time as the diameter_stats server is folding them
into another key, possibly resulting in inaccurate values. Have
diameter_stats select from the server process to avoid this and add
diameter_stats:sum/1 to sum values from all contributors on a given
term.
|
|
|
|
Traffic handling is connected to the service implementation through the
pick_peer callback and failover but diameter_service was getting
unwieldy as home to both the service process and traffic handling.
|
|
In particular, remove fields containing values that are known (as of the
preceding commit) to the request process.
|
|
In order to be able to remove fields from the request process that don't
need to be there and do less in the service process. The pick_peer
callback now takes place in the request process in the case of immutable
state, just as in the case of the initial send.
|
|
Instead, use whatever dictionary a transport has configured as
supporting application id 0. This is to support the updated RFC 6733
dictionaries (which bring with them updated records) and also to be able
to transparently support any changed semantics (eg. 5xxx in
answer-message).
|
|
This was the result of the watchdog process exiting as a consequence of
peer death in some casesi, causing a restarted transport to enter
INITIAL when it should enter REOPEN. The watchdog now remains alive as
long as peer shutdown isn't requested and a 'close' message to the
service process (instead of watchdog death) generates 'closed' events
from the service.
|
|
|