Age | Commit message | Author |
|
* anders/diameter/capx_strictness/OTP-14257:
Add transport_opt() capx_strictness
|
|
To allow relaxation of the Peer State Machine requirement that only the
expected capabilities exchange message be received in the relevant
state. If {capx_strictness, false} is configured then anything but the
expected CER/CEA is ignored.
This is non-standard behaviour, and thus far undocumented. Use at your
own risk.
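As a rough sketch (not from the commit; the service name and the other
transport options are placeholders), the option is passed like any other
transport_opt():

  diameter:add_transport(SvcName,
                         {listen, [{capx_strictness, false},
                                   {transport_module, diameter_tcp},
                                   {transport_config, [{port, 3868}]}]})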
|
|
When relaying outgoing requests through transport on a remote node,
terms that were stripped when sending to the transport process weren't
stripped when spawning a process on the remote node.
Also, don't save the request to the process dictionary in a process that
just relays an answer.
|
|
The table has existed forever, to route incoming answers to a waiting
request process: each outgoing request writes to the table, and each
incoming answer reads. This has been seen to suffer from lock contention
at high load however, so this commit moves the routing into the
diameter_peer_fsm processes that are diameter's conduit to transport
processes. The request table is still used for failover detection, but
entries are only written when a watchdog state transition leaves or
enters state OKAY.
|
|
Commit 9a878743 addressed inefficiency at failover, but in so doing
introduced inefficiency in the sending of outgoing requests: each
outgoing request added a request table entry keyed on a transport pid,
then looked for a specific element with this key, and then (later)
removed the inserted element. Since the request table is a bag, this
results in linear searches over a potentially long list of elements
keyed on the same pid. The higher the rate of outgoing calls, the more
costly it becomes.
Instead of writing entries to the request table, the peer_up/down calls
to diameter_traffic that mirror transitions to and from the OKAY state
in the RFC 3539 watchdog state machine now result in a process for
request processes to monitor in order to detect failover.
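In outline (illustrative names and messages only, not diameter's
internals), request processes now detect failover by monitoring a
per-peer process rather than by consulting the request table:

  %% Created at peer_up, killed at peer_down; exists only to be monitored.
  PeerAlive = spawn(fun() -> receive stop -> ok end end),

  %% Each request process monitors it instead of writing a table entry.
  MRef = erlang:monitor(process, PeerAlive),
  receive
      {answer, Answer} ->                       %% hypothetical answer message
          {ok, Answer};
      {'DOWN', MRef, process, PeerAlive, _} ->  %% watchdog left OKAY: failover
          failover
  after 30000 ->
      timeout
  end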
|
|
* anders/diameter/19.1/OTP-13838:
vsn -> 1.12.1
Update appup for 19.1
Fix xmllint errors in documentation
Remove documentation overkill
Don't run traffic tests in parallel when {string_decode, true}
Remove copyright from generated dictionary modules
Fix dictionary function typo
Fix dictionary typo in relay example
|
|
* anders/diameter/failover/OTP-13412:
Make peer failover more efficient
|
|
The min_heap_size setting in all diameter server processes has existed since the
beginning of time. Whether it's actually useful is questionable, but it
does lead to increased memory usage, especially if there are many peer
connections whose processes wouldn't otherwise be large. Let the setting
be disabled with -diameter min_heap_size false. (Or any value that isn't
a non-negative integer.)
The diameter application itself only calls
diameter_lib:spawn_opts(server, []), but let other arguments remain for
backwards compatibility, since diameter_lib:spawn_opts/2 has been abused
from outside of diameter.
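For reference, the command line form named above sets the application
environment value; the sys.config form is the usual equivalent:

  erl -diameter min_heap_size false

  %% equivalently, in sys.config:
  [{diameter, [{min_heap_size, false}]}].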
|
|
It's '#get-'/2, not 'get-'/2. Only failed if the dictionary in question
defined no Failed-AVP, which is rarely the case in practice.
Thanks to Ferenc Holzhauser.
|
|
* anders/diameter/rand/OTP-13664:
Use rand(3) instead of random(3)
|
|
The latter is deprecated in OTP 19.
|
|
Not difficult to avoid, and better without.
|
|
To properly handle system messages. Initially implemented in commit
5ca5fb71.
|
|
The transport interface documented in diameter_transport(3) is used to
start/stop accepting/connecting transport processes: they're started
with a function call, and told to die with their parent process. In the
accepting case, both diameter_tcp and diameter_sctp start a listening
process when the first accepting transport is started. However, there's
no way for a listening process to find out that it should stop
listening when transport configuration is removed.
Both diameter_tcp and diameter_sctp have used a timer to terminate the
listening process after all existing accepting processes have died as a
consequence of transport removal. The problem with this is that nothing
stops a new client from connecting before this, and also that no new
transport can succeed in opening the same listening port (e.g.
reconfiguration) until the old listener dies.
This commit solves the problem by adding diameter_reg:subscribe/2, to
allow callers to subscribe to messages about added/removed associations.
A call to diameter:add_transport/2 results in a new child process that
registers a term that a listening process subscribes to. Transport
removal results in the death of the child, and the resulting
notification to the listener causes the latter to close its socket and
terminate.
This is still an internal interface, but the subscription mechanism
should probably be made external (e.g. a diameter:subscribe/1 that can
be used to subscribe to specified messages), so that transport modules
other than diameter's own can make use of it. There is no support for
soft upgrade.
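In outline, the listening process's side of the subscription looks
something like the sketch below; the arguments to
diameter_reg:subscribe/2 and the shape of the notification message are
assumptions, since the interface is internal and not spelled out here:

  %% Subscribe to notifications about a matching registration; 'Pattern'
  %% and the message format below are assumed, not documented.
  listen(LSock, Pattern) ->
      _ = diameter_reg:subscribe(Pattern, true),
      loop(LSock).

  loop(LSock) ->
      receive
          {diameter_reg, remove, _Term} ->  %% transport config removed
              gen_tcp:close(LSock);         %% stop listening and terminate
          _Other ->
              loop(LSock)
      end.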
|
|
A replacement accepting transport could be started after the service
process received a shutdown message from diameter_config, if a
connection was accepted before the transport process in question was
terminated. The replacement lived on until the service needed to restart
it.
|
|
Letters are cheap.
|
|
To allow processes to subscribe to a message when a matching association
is added or removed. The intention is to use this in
diameter_{tcp,sctp}, in order for listening processes to find out when
their transport configuration has been removed.
|
|
Last missed in commit 25bef13f.
|
|
Unused, and in the way for what's to come.
|
|
Unexpected messages don't happen in practice, and no_auto_import is
neither necessary nor difficult to avoid.
|
|
* anders/diameter/info/OTP-13508:
Add diameter:peer_find/1
Add diameter:peer_info/1
|
|
* anders/diameter/overload/OTP-13330:
Suppress dialyzer warning
Remove dead case clause
Let throttling callback send a throttle message
Acknowledge answers to notification pids when throttling
Throttle properly with TLS
Don't ask throttling callback to receive more unless needed
Let a throttling callback answer a received message
Let a throttling callback discard a received message
Let throttling callback return a notification pid
Make throttling callbacks on message reception
Add diameter_tcp option throttle_cb
|
|
To return a peer_fsm/transport pair given one of them.
|
|
To return information about a single peer_ref(), to avoid having to
retrieve more than is needed with service_info/2.
|
|
Failover caused the entire request table to be scanned in search of
entries with the transport process in question. With many entries
(possibly as a result of the leak fixed in commit 6c9cbd96), this can
lead to the service process hanging in ets:select_trap/1, with memory
growth when many request processes write concurrently. Now write entries
keyed on the transport pid, so that finding request processes at
failover is a lookup rather than a select scanning the entire table.
There is no upgrade handling, in that new code doesn't consider that old
code didn't write entries keyed on the transport pid. Thus, a request whose
table entries were written in old code will timeout rather than failover
in new code. That is, there is a small window for failover to be missed
(since request processes are short-lived), but it requires that it take
place during the upgrade.
As a minor aside, don't ignore failovers when sending binaries (which
isn't officially supported); let prepare_retransmit callbacks deal with
modifying the binary as required.
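Returning to the main change, the underlying ets point in isolation
(hypothetical table and entries, standing in for diameter's request
table):

  %% Placeholder transport pid and entry.
  TPid = self(),
  Tab = ets:new(requests, [bag]),
  true = ets:insert(Tab, {TPid, make_ref()}),

  %% Keyed on the transport pid, finding requests at failover is a lookup ...
  Found = ets:lookup(Tab, TPid),

  %% ... whereas a select with the pid in the match head scans every entry.
  Found = ets:select(Tab, [{{TPid, '_'}, [], ['$_']}]).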
|
|
|
|
By sending {diameter, {answer, pid()}} when an incoming answer is sent
to the specified pid, instead of a discard message as previously. The
latter now literally means that the message has been discarded.
|
|
In addition to returning ok or {timeout, Tmo}, let a throttling callback
for message reception return a pid(), which is then notified if the
message in question is either discarded or results in a request process.
Notification is by way of messages of the form
{diameter, discard | {request, pid()}}
where the pid is that of a request process resulting from the received
message. This allows the notification process to keep track of how many
request processes a peer connection has given rise to.
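Put together with the answer notification described earlier, a sketch of
the callback and notification process follows; the callback here simply
returns the pid to notify, as described, but its exact arguments and how
it is installed via the throttle_cb option are assumptions:

  %% Throttling callback on message reception: return the pid to notify.
  throttle(NotifyPid) ->
      NotifyPid.

  %% The notification process can then count request processes.
  notify_loop(NumRequests) ->
      receive
          {diameter, discard} ->            %% message was discarded
              notify_loop(NumRequests);
          {diameter, {request, _Pid}} ->    %% spawned a request process
              notify_loop(NumRequests + 1);
          {diameter, {answer, _Pid}} ->     %% incoming answer was passed on
              notify_loop(NumRequests)
      end.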
|
|
* anders/diameter/dialyzer/OTP-13400:
Fix dialyzer warnings
|
|
|
|
Whether making record declarations unreadable to compensate for
dialyzer's ignorance of match specs is worth it is truly debatable.
|
|
Whether making record declarations unreadable to compensate for
dialyzer's ignorance of match specs is worth it is truly debatable.
|
|
|
|
* anders/diameter/retransmission/OTP-13342:
Fix handling of shared peer connections in watchdog state SUSPECT
Remove unnecessary parentheses
Remove dead export
|
|
A peer connection shared from a remote node was regarded as being
available for peer selection (aka up) as long as its peer_fsm process
was alive; that is, for the lifetime of the peer connection. In
particular, it didn't take note of transitions into watchdog state
SUSPECT, when the connection remains. As a result, retransmissions could
select the same peer connection whose watchdog transition caused the
retransmission.
A service process now broadcasts a peer_down event just as it
does a peer_up event.
The fault predates the table rearrangements of commit 8fd4e5f4.
|
|
Not needed as of commit 6c9cbd96.
|
|
The export of diameter_traffic:failover/1 happened with the creation of
the module in commit e49e7acc, but was never needed since the calling
code was also moved into diameter_traffic.
|
|
Each service process maintains a dictionary of peers, mapping an
application alias to a {pid(), #diameter_caps{}} list of connected
peers. These lists are potentially large, peers were appended to the end
of the list for no particular reason, and these long lists were
constructed/deconstructed when filtering them for pick_peer callbacks.
Many simultaneous outgoing requests could then slow the VM to a crawl,
with many scheduled processes mired in list manipulation.
The pseudo-dicts are now replaced by plain ets tables. The reason for
them was (once upon a time) to have an interface interchangeable with a
plain dict for debugging purposes, but strict swappability hasn't been the
case for some time now, and in practice a swap has never taken place.
Additional tables mapping Origin-Host/Realm have also been introduced,
to minimize the size of the peers lists when peers are filtered on
host/realm. For example, a filter like
{any, [{all, [realm, host]}, realm]}
is probably a very common case: preferring a Destination-Realm/Host
match before falling back on Destination-Realm alone. This is now more
efficiently (but not equivalently) expressed as
{first, [{all, [realm, host]}, realm]}
which stops the search when the best match is made, and extracts peers
from the host/realm tables instead of searching through the list of all
peers supporting the application in question. The code that tries to
start with a lookup isn't exhaustive, and the 'any' filter is still as
inefficient as previously.
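For reference, a filter is passed in the options list of
diameter:call/4; a call preferring a host+realm match before falling
back on realm alone would look something like this (service, application
and request are placeholders):

  diameter:call(SvcName, the_app, Request,
                [{filter, {first, [{all, [realm, host]}, realm]}}])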
|
|
See commit 862af31d.
|
|
|
|
|
|
Each service process maintains a dictionary of peers, mapping an
application alias to a {pid(), #diameter_caps{}} list of connected
peers. These lists are potentially large, peers were appended to the end
of the list for no particular reason, and these long lists were
constructed/deconstructed when filtering them for pick_peer callbacks.
Many simultaneous outgoing requests could then slow the VM to a crawl,
with many scheduled processes mired in list manipulation.
The pseudo-dicts are now replaced by plain ets tables. The reason for
them was (once upon a time) to have an interface interchangeable with a
plain dict for debugging purposes, but strict swappability hasn't been the
case for some time now, and in practice a swap has never taken place.
Additional tables mapping Origin-Host/Realm have also been introduced,
to minimize the size of the peers lists when peers are filtered on
host/realm. For example, a filter like
{any, [{all, [realm, host]}, realm]}
is probably a very common case: preferring a Destination-Realm/Host
match before falling back on Destination-Realm alone. This is now more
efficiently (but not equivalently) expressed as
{first, [{all, [realm, host]}, realm]}
which stops the search when the best match is made, and extracts peers
from the host/realm tables instead of searching through the list of all
peers supporting the application in question. The code that tries to
start with a lookup isn't exhaustive, and the 'any' filter is still as
inefficient as previously.
|
|
See commit 862af31d.
|
|
* anders/diameter/17.5.6.7/OTP-13211:
vsn -> 1.9.2.2
Update/fix appup for 17.5.6.7
Be resilient to diameter_service state upgrades
|
|
* anders/diameter/request_leak/OTP-13137:
Fix request table leak at retransmission
Fix request table leak at exit signal
|
|
By not failing in code that looks up state: pick_peer and service_info.
|
|
|
|
* anders/diameter/request_leak/OTP-13137:
Fix request table leak at retransmission
Fix request table leak at exit signal
|
|
In the case of retransmission, a prepare_retransmit callback could modify
End-to-End and/or Hop-by-Hop identifiers so that the resulting
diameter_request entry was not removed, since the removal was of entries
with the identifiers of the original request. The chances of someone doing
this in practice are probably minimal.
|
|
The storing of request records in the ets table diameter_request was
wrapped in a try/after so that the latter would unconditionally remove
written entries. The problem is that it didn't deal with the process
exiting as a result of an exit signal, since this doesn't raise an
exception. Since the process in question applies callbacks to user code,
we can potentially be linked to other processes and exit as a result.
Trapping exits changes the current behaviour of the process, so spawn a
monitoring process that cleans up upon reception of 'DOWN'.
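The shape of that fix, as a sketch (illustrative function and key, not
the actual diameter_traffic code):

  %% Spawn a monitor that removes a request's table entry when the request
  %% process dies for any reason, including an exit signal that try/after
  %% wouldn't catch. 'Key' stands for whatever the entry is keyed on.
  spawn_cleaner(Key) ->
      ReqPid = self(),
      spawn(fun() ->
                    MRef = erlang:monitor(process, ReqPid),
                    receive
                        {'DOWN', MRef, process, ReqPid, _Reason} ->
                            ets:delete(diameter_request, Key)
                    end
            end).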
|