Age | Commit message (Collapse) | Author |
|
* anders/diameter/dpr/OTP-12609:
Discard incoming/outgoing requests after incoming DPR
Add transport_opt() dpr_timeout
Be lenient with errors in incoming DPR
|
|
Despite claims of full backwards compatibility, the text of RFC 6733
changes the interpretation of unspecified values in a DiameterURI. In
particular, 3588 says that the default port and transport are 3868 and
sctp respectively, while 6733 says it's either 3868/tcp (aaa) or
5658/tcp (aaas). The 3588 defaults were used regardless, but now use
them only if the common dictionary is diameter_gen_base_rfc3588. The
6733 defaults are used otherwise.
This kind of change in the standard can lead to interop problems, since
a node has to know which RFC its peer is following to know that it will
properly interpret missing URI components. Encode of a URI includes all
components to avoid such confusion.
That said, note that the defaults in the diameter_uri record have *not*
been changed. This avoids breaking code that depends on them, but the
risk is that such code sends inappropriate values. The record defaults
may be changed in a future release, to force values to be explicitly
specified.
|
|
Both RFC 3588 and 6733 disallow the combination. Make its encode fail.
|
|
* anders/diameter/string_decode/OTP-11952:
Let examples override default service options
Set {restrict_connections, false} in example server
Set {string_decode, false} in examples
Test {string_decode, false} in traffic suite
Add service_opt() string_decode
Strip potentially large terms when sending outgoing Diameter messages
Improve language consistency in diameter(1)
|
|
* anders/diameter/route_record/OTP-12551:
Fix ordering of AVPs in relayed messages
|
|
To control whether stringish Diameter types are decoded to string or
left as binary. The motivation is the same as in the parent commit: to
avoid large strings being copied when incoming Diameter messages are
passed between processes; or *if* in the case of messages destined for
handle_request and handle_answer callbacks, since these are decoded in
the dedicated processes that the callbacks take place in. It would be
possible to do something about other messages without requiring an
option, but disabling the decode is the most effective.
The value is a boolean(), true being the default for backwards
compatibility. Setting false causes both diameter_caps records and
decoded messages to contain binary() in relevant places that previously
had string(): diameter_app(3) callbacks need to be prepared for the
change.
The Diameter types affected are OctetString and the derived types that
can contain arbitrarily large values: OctetString, UTF8String,
DiameterIdentity, DiameterURI, IPFilterRule, and QoSFilterRule. Time and
Address are unaffected.
The DiameterURI decode has been redone using re(3), which both
simplifies and does away with a vulnerability resulting from the
conversion of arbitrary strings to atom.
The solution continues the use and abuse of the process dictionary for
encode/decode purposes, last seen in commit 0f9cdba.
|
|
Both incoming and outgoing Diameter messages pass through two or three
processes, depending on whether they're incoming or outgoing: the
transport process and corresponding peer_fsm process and (for incoming)
watchdog processes. Since terms other than binary are copied when
passing process boundaries, large terms lead to copying that can be
problematic, if frequent enough. Since only the bin and transport_data
fields of a diameter_packet record are needed by the transport process,
discard others when sending outgoing messages.
Strictly speaking, the statement that only the aforementioned fields are
needed by the transport process depends on the transport process. It's
true of those implemented by diameter (in diameter_tcp and
diameter_sctp), but an implementation that makes use of other fields is
assuming more than the documentation in diameter_transport(3) promises.
|
|
|
|
With the same motivation as in commits 5bd2d72 and b1fd629.
As in the latter, incoming DPR is the only exception.
|
|
To cause a peer connection to be closed following an outgoing DPA, in
case the peer fails to do so. It is the recipient of DPA that should
close the connection according to RFC 6733.
|
|
To avoid having the peer interpret the error as meaning the connection
shouldn't be closed, which probably does more harm than ignoring
syntactic errors in the DPR.
Note that RFC 6733 says this about incoming DPR, in 5.4 Disconnecting
Peer Connections:
Upon receipt of the message, the Disconnect-Peer-Answer message is
returned, which SHOULD contain an error if messages have recently been
forwarded, and are likely in flight, which would otherwise cause a race
condition.
The race here is presumably between answers to forwarded requests and
the outgoing DPA, but we have no handling for this: whether or not there
are pending answers is irrelevant to how DPR is answered. It's
questionable that a peer should be able to prevent disconnection in any
case: it has to be the node sending DPR that decides if it's approriate,
and the peer should take it as an indication of what's coming.
Incoming DPA is already treated leniently: the only error that's not
ignored is mismatching End-to-End and Hop-by-Hop Identifiers, since
there's no distinguishing an erroneous value from an unsolicited DPA.
This mismatch could also be ignored, which is the case for DWA for
example, but this problem is already dealt with by dpa_timeout, which
causes a connection to be closed even when the expected DPA isn't
received.
|
|
* anders/diameter/dpr/OTP-12542:
Discard CER or DWR sent with diameter:call/4
Allow DPR to be sent with diameter:call/4
Add transport_opt() dpa_timeout
Add testcase for sending DPR with diameter:call/4
|
|
* anders/diameter/dpr/OTP-12543:
Discard incoming requests after outgoing DPR
Discard outgoing requests after outgoing DPR
|
|
6.1.9 of RFC 6733 states this:
A relay or proxy agent MUST append a Route-Record AVP to all requests
forwarded.
The AVP was inserted as the head of the AVP list, not appended, since
the entire AVP list was reversed relative to the received order.
Thanks to Andrzej TrawiĆski.
|
|
These are requests that diameter itself sends. It's previously been
possible to send them, but answers timed out at the caller since they
were discarded in diameter_watchdog. Answers will still timeout, but now
the requests are discarded before being sent.
|
|
DPR is sent by diameter at application shutdown, service stop, or
transport removal. It has been possible to send the request with
diameter:call/4, but the answer was discarded, instead of the transport
process being terminated. This commit causes DPR to be handled in the
same way regardless of whether it's sent by diameter or by
diameter:call/4.
Note that the behaviour subsequent to DPA is unchanged. In particular,
in the connecting case, the closed connection will be reestablished
after a connect_timer expiry unless the transport is removed. The more
probable use case is the listening case, to disconnect a single peer
associated with a listening transport.
|
|
To make the default DPA timeout configurable. The timeout say how many
milliseconds to wait for DPA in response to an outgoing DPR before
terminating the transport process regardless.
|
|
Since there's a race between an answer being sent and the connection
being closed upon the reception of DPA that's likely to be lost, and
because of the questionability of sending messages after DPR, as
discussed in the parent commit. An exception is made for DPR so that
simultaneous DPR in both directions doesn't result in it being discarded
on both ends.
|
|
RFC 6733 isn't terribly clear about what should happen to incoming or
outgoing messages once DPR is sent and the Peer State Machine
transitions into state Closing. There's no event for this in section
5.6, Peer State Machine, and no clarification in section 5.4,
Disconnecting Peer Connections. There is a little bit of discussion in
2.1.1, SCTP Guidelines, in relation to unordered message delivery, but
the tone there is that messages might be received after DPR because of
unordered delivery, not because they were actually sent after DPR.
Discarding outgoing answers may do more harm than good, but requests are
more likely to be unexpected, as has been seen to be the case with DWR
following DPR. DPR indicates a desire to close the connection: discard
any subsequent outgoing requests.
|
|
|
|
* anders/diameter/time/OTP-12439:
Adapt to changes in time api
|
|
* rickard/time_api/OTP-11997: (22 commits)
Update primary bootstrap
inets: Suppress deprecated warning on erlang:now/0
inets: Cleanup of multiple copies of functions Add inets_lib with common functions used by multiple modules
inets: Update comments
Suppress deprecated warning on erlang:now/0
Use new time API and be back-compatible in inets Remove unused functions and removed redundant test
asn1 test SUITE: Eliminate use of now/0
Disable deprecated warning on erlang:now/0 in diameter_lib
Use new time API and be back-compatible in ssh
Replace all calls to now/0 in CT with new time API functions
test_server: Replace usage of erlang:now() with usage of new API
Replace usage of erlang:now() with usage of new API
Replace usage of erlang:now() with usage of new API
Replace usage of erlang:now() with usage of new API
Replace usage of erlang:now() with usage of new API
otp_SUITE: Warn for calls to erlang:now/0
Replace usage of erlang:now() with usage of new API
Multiple timer wheels
Erlang based BIF timer implementation for scalability
Implement ethread events with timeout
...
Conflicts:
bootstrap/bin/start.boot
bootstrap/bin/start_clean.boot
bootstrap/lib/compiler/ebin/beam_asm.beam
bootstrap/lib/compiler/ebin/compile.beam
bootstrap/lib/kernel/ebin/auth.beam
bootstrap/lib/kernel/ebin/dist_util.beam
bootstrap/lib/kernel/ebin/global.beam
bootstrap/lib/kernel/ebin/hipe_unified_loader.beam
bootstrap/lib/kernel/ebin/inet_db.beam
bootstrap/lib/kernel/ebin/inet_dns.beam
bootstrap/lib/kernel/ebin/inet_res.beam
bootstrap/lib/kernel/ebin/os.beam
bootstrap/lib/kernel/ebin/pg2.beam
bootstrap/lib/stdlib/ebin/dets.beam
bootstrap/lib/stdlib/ebin/dets_utils.beam
bootstrap/lib/stdlib/ebin/erl_tar.beam
bootstrap/lib/stdlib/ebin/escript.beam
bootstrap/lib/stdlib/ebin/file_sorter.beam
bootstrap/lib/stdlib/ebin/otp_internal.beam
bootstrap/lib/stdlib/ebin/qlc.beam
bootstrap/lib/stdlib/ebin/random.beam
bootstrap/lib/stdlib/ebin/supervisor.beam
bootstrap/lib/stdlib/ebin/timer.beam
erts/aclocal.m4
erts/emulator/beam/bif.c
erts/emulator/beam/erl_bif_info.c
erts/emulator/beam/erl_db_hash.c
erts/emulator/beam/erl_init.c
erts/emulator/beam/erl_process.h
erts/emulator/beam/erl_thr_progress.c
erts/emulator/beam/utils.c
erts/emulator/sys/unix/sys.c
erts/preloaded/ebin/erlang.beam
erts/preloaded/ebin/erts_internal.beam
erts/preloaded/ebin/init.beam
erts/preloaded/src/erts_internal.erl
lib/common_test/test/ct_hooks_SUITE_data/cth/tests/empty_cth.erl
lib/diameter/src/base/diameter_lib.erl
lib/kernel/src/os.erl
lib/ssh/test/ssh_basic_SUITE.erl
system/doc/efficiency_guide/advanced.xml
|
|
The code itself is backwards compatible with OTP 17, since development
is still largely based on 17. Updates for the new time api in OTP 18
were merged into maint in commit 5e5b2221, and on to master in commit
ebf24297.
Conflicts:
lib/diameter/src/base/diameter_lib.erl
|
|
erlang:convert_time_resolution/3 has been renamed to convert_time_unit.
erlang:time_resolution/0 has been removed: use new time resolution
values instead.
|
|
* anders/diameter/grouped_decode/OTP-12475:
Allow encode of decoded diameter_avp list
Add testcases for diameter_avp decode
Fix handling of length errors on Grouped AVPs
Don't discard component diameter_avp list on Grouped AVP decode error
Fix process dictionary manipulation during message decode
|
|
* anders/diameter/time/OTP-12439:
Use new time api in test suites
Use new time api in implementation
|
|
* anders/diameter/pool/OTP-12428:
Fix SCTP match blunder in suites
Be backwards compatible with diameter_sctp listener state
Add gen_tcp testcase that fails sporadically
Simplify transport suite
Remove (ancient) dead code
Don't orphan slave nodes in example suite
Refresh example code
Improve language consistency in diameter(1)
Add pool suite to test transport_opt() pool_size
Adapt tcp/sctp transport modules for pool_size > 1
Add transport_opt() pool_size
|
|
* anders/diameter/shutdown/OTP-12412:
Increase service shutdown timeout
Set shutdown = infinity for supervisor children
Monitor more efficiently at shutdown
|
|
* anders/diameter/retransmission/OTP-12415:
Fix retransmission of messages sent as header/avps list
|
|
The decode of an incoming request in a non-relay application results in
a deep list of diameter_avp records. Encoding such a list resulted in a
function_clause error in diameter_codec:pack_avp/1, which expected a
flat list. The list is only flat in the relay case, or in the absence of
AVPs of type Grouped.
This is also related to code that exists but isn't documented. It's
documented that a diameter_app(3) handle_request callback can return
{relay, Opts} to relay a request received in the relay application.
What's not documented is that it can also return {proxy|resend, Opts} in
a non-relay application, but this leads to encode failure when there are
Grouped AVPs. This shouldn't be interpreted as meaning that proxy|resend
are now supported: they aren't. The two extra terms are a historical
relic that should probably be removed. Neither are generally usable
since, for example, a proxy agent may want to modify a request before
resending it. A specific handle_request return is not needed to
implement a proxy agent. Even {relay, Opts} isn't strictly necessary.
|
|
The decode of a Grouped AVP ignored the case that extracting component
AVPs with diameter_codec:collect_avps/1 returned a tuple, in the case of
a truncated AVP header.
|
|
Outgoing answers missing a Result-Code AVP or setting an E-bit
inappropriately were discarded, but there's no particular reason for
doing so if the answer can be encoded, and the sender has no way of
knowing that their answer has been discarded. It's also inappropriate
that the message be discarded in the relay case. Answers are now sent,
and an error counter incremented.
|
|
In particular, deal with the deprecation of erlang:now/0 in OTP 18. Be
backwards compatible with older releases: the new api is only used when
available.
The test suites have not been modified.
|
|
Transport processes are started by diameter one at a time. In the
listening case, a transport process accepts a connection, tells the
peer_fsm process, which tells its watchdog process, which tells its
service process, which then starts a new watchdog, which starts a new
peer_fsm, which starts a new transport process, which (finally) goes
about accepting another connection. In other words, not particularly
aggressive in accepting new connections. This behaviour doesn't do
particularly well with a large number of concurrent connections: with
TCP and 250 connecting peers we see connections being refused.
This commit adds the possibilty of configuring a pool of accepting
processes, by way of a new transport option, pool_size. Instead of
diameter:add_transport/2 starting just a single process, it now starts
the configured number, so that instead of a single process waiting for a
connection there's now a pool.
The option is even available for connecting processes, which provides an
alternate to adding multiple transports when multiple connections to the
same peer are required. In practice this also means configuring
{restrict_connections, false}: this is not implicit.
For backwards compatibility, the form of
diameter:service_info(_,transport) differs in the connecting case,
depending on whether or not pool_size is configured.
Note that transport processes for the same transport_ref() can be
started concurrently when pool_size > 1. This places additional
requirements on diameter_{tcp,sctp}, that will be dealt with in a
subsequent commit.
|
|
Extracting the End-to-End and Hop-by-Hop identifiers resulted in a
function clause error, causing the send to fail.
|
|
Shutting down the service causes DPR to be sent on all open transports
under the service. These in turn have a timeout for the reception of
DPA, but the timeout is bounded by the supervisor's in practice. Both
timeouts were 1 second. Increase the supervisor timeout to 5 seconds.
Note that the service supervisor is furthest to the right in the
supervision tree in diameter_sup. Thus is significant, so that the
transport-related processes aren't shutdown first.
|
|
As suggested in supervisor(3). The leaves of the supervision tree should
determine the timeouts.
|
|
There's no need for building a pid list only to map it to a list of
monitor references. Also, monitoring before banging the shutdown
message makes for better trace, avoiding unnecessary noproc reasons when
the process dies before the monitor is created.
|
|
* anders/diameter/filters/OTP-12308:
Order peers in pick_peer callbacks
|
|
* anders/diameter/connect_timer/OTP-12281:
Tweak reason in closed event
Fix ignored connect timer
Check {connect,watchdog}_timer distinction in event testcases
Rename reconnect_timer to connect_timer in examples and suites
|
|
* anders/diameter/3xxx/OTP-12233:
Fix handling of 3xxx Result-Code without E-bit
|
|
The order of peers presented to a diameter_app(3) pick_peer callback has
previously not been documented, but there are use cases that are
simplified by an ordering. For example, consider preferring a direct
connection to a specified Destination-Host/Realm to any host in the
realm. The implementation previously treated this as a special case by
placing matching hosts at the head of the peers list, but the
documentation made no guarantees. Now present peers in match-order, so
that the desired sorting is the result of the following filter.
{any, [{all, [host, realm]}, realm]}
The implementation is not backwards compatible in the sense that a realm
filter alone is no longer equivalent in this case. However, as stated,
the documentation never made any guarantees regarding the sorting.
|
|
From {error, Reason} to {no_connection, Reason} when a connection can't
be established. The exit reason of a diameter_peer_fsm process is turned
into a message from the corresponding diameter_watchdog process to the
relevant diameter_service process, the latter sending a 'closed' event
including the reason to any subscribers. Reason = [] when none of the
configured transport modules succeeds in establishing a connection,
which admittedly isn't terribly descriptive. (The lists is of error
reasons from transport start functions, which is empty as long as
transport processes start successfully.)
Note that this form of the closed event is undocumented, aside from the
documentation saying that one should expect undocumented events. The
explicitly documented forms are currently specific to CER/CEA failures.
|
|
There are two timers governing the establishment of peer connections:
connect_timer and watchdog_timer. The former is the RFC 6733 Tc timer
and is used by diameter_service to establish an initial connection. The
latter is RFC 3539 TwInit and is used by diameter_watchdog for
connection reestablishment after the watchdog leaves state INITIAL. A
connecting transport ignored the connect timer since the watchdog
process never died, regardless of the watchdog state, causing the
watchdog timer to handle reconnection.
This seems to have been broken for some time.
|
|
Commit 00584303 broke the population of the errors field of the
diameter_packet record when an incoming request with an
E-bit/Result-Code mismatch was decoded. Instead of the intended
{5004, #diameter_avp{value = integer()}},
the value was a 4-tuple containing the integer Result-Code.
|
|
An outgoing request whose pick_peer callback selected a transport on
another node resulted in an orphaned diameter_request entry on that
node.
|
|
* anders/diameter/Failed-AVP/OTP-12094:
Fix best effort decode of Failed-AVP
Fix decode of Failed-AVP in RFC 3588 answer-message
|
|
* anders/diameter/counters/OTP-12080:
Fix counters for answer-message
Count relayed messages on {relay, Rbit}
Count request retransmissions
Fix counting of outgoing requests
|
|
Commit c2c00fdd didn't get it quite right: it only decoded failed AVPs
in the common dictionary since it's this dictionary an answer-message is
decoded in. An extra dictionary isn't something that's easily passed
through the decode without rewriting dictionary compilation however, and
that's no small job, so continue with the use/abuse of the process
dictionary by storing the dictionary module for the decode to retrieve.
This is one step worse than previous uses since the dictionary is put in
one module (diameter_codec) and got in another (the dictionary module),
but it's the lesser of two evils.
|
|
An answer message that sets the E-bit is encoded/decoded with Diameter
common dictionary, using the answer-message grammar specified in the
RFC. However, the dictionary of the application in question is the one
that knows the command code of the message. Commit df19c272 didn't make
this distinction when incrementing counters for an answer-message, using
the common dictionary for both purposes, causing the message to be
counted as unknown. This commit remedies that.
|