aboutsummaryrefslogtreecommitdiffstats
path: root/lib/diameter/src/base/diameter_watchdog.erl
AgeCommit message (Collapse)Author
2017-09-09Merge branch 'anders/diameter/config_consistency/OTP-14555' into maintAnders Svensson
* anders/diameter/config_consistency/OTP-14555: Fix type spec Fix strict_arities blunder
2017-09-08Fix strict_arities blunderAnders Svensson
The watchdog process retained the configuration, causing DWR/DWA encode to fail on a string-valued Origin-Host/Realm if the encode arity was relaxed. Bungled in commit 5f3becad.
2017-09-01Rename decode_format false to noneAnders Svensson
Which reads better and makes it easier to distinguish this false from others.
2017-08-25Let a service configure default transport optionsAnders Svensson
Only a default spawn_opt has been possible to configure, but there's no reason why most others should need to be configured per transport. Those options that still only make sense on a transport are transport_module/config (because of the semantics of multiple values), applications/capabilities (since these override service options), and private (since it's only to allow user-specific options in a backwards compatible way).
2017-08-10Add service_opt() traffic_countersAnders Svensson
To be able to disable the counting of messages for which application callbacks take place. Messages sent/handled by diameter itself are always counted.
2017-08-03Add service_opt() strict_aritiesAnders Svensson
To be able to disable the relatively expensive check that the number of AVPs received in a message or grouped AVP agrees with the dictionary in question. The may well be easier for the user in handle_request/answer callbacks, when digesting the received message, and in some cases may not be important. The check at encode can also be disabled, allowing messages that don't agree with the dictionary in question to be sent, which can be useful in test (at least).
2017-08-03Rename record_decode -> decode_formatAnders Svensson
{record_decode, map} is a bit too quirky.
2017-08-03Add service_opt() record_decodeAnders Svensson
To control whether or not messages and grouped AVPs are decoded to records, in #diameter_packet.msg and #diameter_avp.value respectively. The decode became unnecessary for diameter's needs in parent commit, which decoupled it from the checking of AVP arities.
2017-06-13Add diameter_codec option ordered_encodeAnders Svensson
To allow list-valued messaged to be encoded in the specified order, instead of in the dictionary order by first converting the list to a record. This is not yet exposed in configuration.
2017-06-13Remove use of process dictionary in decodeAnders Svensson
By passing additional arguments through it.
2017-06-11Simplify acks to transport processesAnders Svensson
What's interesting when implementing some form of load regulation is when an incoming request has been answered or discarded. Acknowledge exactly this, not the identity of handler processes as previously. A transport process can request acks of nonforthcoming answers by sending {diameter, ack} to the parent peer_fsm, a handler processes identifies itself with a {handler, pid()} message, and the peer_fsm monitors on this to be able to send a notification to the transport if the handler dies before sending an answer.
2017-06-11Fix broken discard acknowledgementAnders Svensson
A transport process can request acknowledgement of the fate of an incoming message to a specified pid, causing it to receive one of {diameter, {request|answer, pid()} | discard} depending on whether or not diameter passes the message off to a handler process. This was broken in commit a4da06a5 (since recv/3 threw a message that should be received), but is of little consequence since the interface isn't yet documented and is only used from diameter_tcp with configuration that will soon change.
2017-03-07Don't use request table for answer routingAnders Svensson
The table has existed forever, to route incoming answers to a waiting request process: each outgoing request writes to the table, and each incoming answer reads. This has been seen to suffer from lock contention at high load however, so this commit moves the routing into the diameter_peer_fsm processes that are diameter's conduit to transport processes. The request table is still used for failover detection, but entries are only written when a watchdog state transitions leaves or enters state OKAY.
2016-06-11Use rand(3) instead of random(3)Anders Svensson
The latter is deprecated in OTP 19.
2016-05-09Merge branch 'anders/diameter/overload/OTP-13330'Anders Svensson
* anders/diameter/overload/OTP-13330: Suppress dialyzer warning Remove dead case clause Let throttling callback send a throttle message Acknowledge answers to notification pids when throttling Throttle properly with TLS Don't ask throttling callback to receive more unless needed Let a throttling callback answer a received message Let a throttling callback discard a received message Let throttling callback return a notification pid Make throttling callbacks on message reception Add diameter_tcp option throttle_cb
2016-03-13Let throttling callback return a notification pidAnders Svensson
In addition to returning ok or {timeout, Tmo}, let a throttling callback for message reception return a pid(), which is then notified if the message in question is either discarded or results in a request process. Notification is by way of messages of the form {diameter, discard | {request, pid()}} where the pid is that of a request process resulting from the received message. This allows the notification process to keep track of the maximum number of request processes a peer connection can have given rise to.
2015-09-07Fix watchdog function_clauseAnders Svensson
Commit 4f365c07 introduced the error on set_watchdog/2, as a consequence of timeout/1 returning stop, which only happens with accepting transports with {restrict_connections, false}.
2015-09-07Fix watchdog function_clauseAnders Svensson
Commit 4f365c07 introduced the error on set_watchdog/2, as a consequence of timeout/1 returning stop, which only happens with accepting transports with {restrict_connections, false}.
2015-08-13Merge branch 'maint-17' into maintAnders Svensson
The diffs are all about adapting to the OTP 18 time interface. The code was previously backwards compatible, falling back on the erlang:now/0 if erlang:monotonic_time/0 is unavailable, but this was seen to be a bad thing in commit 9c0f2f2c. Use of erlang:now/0 is now removed.
2015-08-13Merge branch 'anders/diameter/17/time/OTP-12926' into maint-17Erlang/OTP
* anders/diameter/17/time/OTP-12926: Simplify time manipulation Remove use of monotonic time in pre-18 code Remove unnecessary redefinition of erlang:max/2
2015-08-13Merge branch 'anders/diameter/lcnt/OTP-12912' into maint-17Erlang/OTP
* anders/diameter/lcnt/OTP-12912: Make ets diameter_stats a set Remove unnecessary sorting in stats suite Set ets {write_concurrency, true} on diameter_stats Don't start watchdog timers unnecessarily Remove unnecessary erlang:monitor/2 qualification Add missing watchdog suite clause
2015-08-05Simplify time manipulationAnders Svensson
By doing away with more wrapping that the parent commit started to remove.
2015-08-05Remove use of monotonic time in pre-18 codeAnders Svensson
This has been seen to be a bottleneck at high load: each undef results in a loop out to the code server. Originally implemented as suggested in the erts user's guide, in commits e6d19a18 and d4386254.
2015-08-04Truncate potentially large terms passed to diameter_lib:log/4Anders Svensson
Last visited in commit 00584303.
2015-07-19Don't start watchdog timers unnecessarilyAnders Svensson
In particular, restart the timer with each incoming Diameter message, only when the previous timer has expired. Doing so has been seen to result in high lock contention at load, as in the example below: (diameter@test)9> lcnt:conflicts([{print, [name, tries, ratio, time]}]). lock #tries collisions [%] time [us] ----- ------- --------------- ---------- bif_timers 7844528 99.4729 1394434884 db_tab 17240988 1.7947 6286664 timeofday 7358692 5.6729 1399624 proc_link 4814938 2.2736 482985 drv_ev_state 2324012 0.5951 98920 run_queue 21768213 0.2091 63516 pollset 1190174 1.7170 42499 pix_lock 1956 2.5562 39770 make_ref 4697067 0.3669 20211 proc_msgq 9475944 0.0295 5200 timer_wheel 5325966 0.0568 2654 proc_main 10005332 2.8190 1079 pollset_rm_list 59768 1.7752 480
2015-07-19Remove unnecessary erlang:monitor/2 qualificationAnders Svensson
The function has been auto-exported since R14B.
2015-06-22Merge branch 'bruce/change-license'Bruce Yinhe
OTP-12845 * bruce/change-license: fix errors caused by changed line numbers Change license text to APLv2
2015-06-20Remove dead upgrade-related codeAnders Svensson
Not needed with the parent commit's restart_application.
2015-06-18Change license text to APLv2Bruce Yinhe
2015-03-24Adapt to changed DiameterURI defaults in RFC 6733Anders Svensson
Despite claims of full backwards compatibility, the text of RFC 6733 changes the interpretation of unspecified values in a DiameterURI. In particular, 3588 says that the default port and transport are 3868 and sctp respectively, while 6733 says it's either 3868/tcp (aaa) or 5658/tcp (aaas). The 3588 defaults were used regardless, but now use them only if the common dictionary is diameter_gen_base_rfc3588. The 6733 defaults are used otherwise. This kind of change in the standard can lead to interop problems, since a node has to know which RFC its peer is following to know that it will properly interpret missing URI components. Encode of a URI includes all components to avoid such confusion. That said, note that the defaults in the diameter_uri record have *not* been changed. This avoids breaking code that depends on them, but the risk is that such code sends inappropriate values. The record defaults may be changed in a future release, to force values to be explicitly specified.
2015-03-24Merge branch 'anders/diameter/string_decode/OTP-11952' into maintAnders Svensson
* anders/diameter/string_decode/OTP-11952: Let examples override default service options Set {restrict_connections, false} in example server Set {string_decode, false} in examples Test {string_decode, false} in traffic suite Add service_opt() string_decode Strip potentially large terms when sending outgoing Diameter messages Improve language consistency in diameter(1)
2015-03-24Add service_opt() string_decodeAnders Svensson
To control whether stringish Diameter types are decoded to string or left as binary. The motivation is the same as in the parent commit: to avoid large strings being copied when incoming Diameter messages are passed between processes; or *if* in the case of messages destined for handle_request and handle_answer callbacks, since these are decoded in the dedicated processes that the callbacks take place in. It would be possible to do something about other messages without requiring an option, but disabling the decode is the most effective. The value is a boolean(), true being the default for backwards compatibility. Setting false causes both diameter_caps records and decoded messages to contain binary() in relevant places that previously had string(): diameter_app(3) callbacks need to be prepared for the change. The Diameter types affected are OctetString and the derived types that can contain arbitrarily large values: OctetString, UTF8String, DiameterIdentity, DiameterURI, IPFilterRule, and QoSFilterRule. Time and Address are unaffected. The DiameterURI decode has been redone using re(3), which both simplifies and does away with a vulnerability resulting from the conversion of arbitrary strings to atom. The solution continues the use and abuse of the process dictionary for encode/decode purposes, last seen in commit 0f9cdba.
2015-03-23Strip potentially large terms when sending outgoing Diameter messagesAnders Svensson
Both incoming and outgoing Diameter messages pass through two or three processes, depending on whether they're incoming or outgoing: the transport process and corresponding peer_fsm process and (for incoming) watchdog processes. Since terms other than binary are copied when passing process boundaries, large terms lead to copying that can be problematic, if frequent enough. Since only the bin and transport_data fields of a diameter_packet record are needed by the transport process, discard others when sending outgoing messages. Strictly speaking, the statement that only the aforementioned fields are needed by the transport process depends on the transport process. It's true of those implemented by diameter (in diameter_tcp and diameter_sctp), but an implementation that makes use of other fields is assuming more than the documentation in diameter_transport(3) promises.
2015-03-23Merge branch 'anders/diameter/dpr/OTP-12542' into maintAnders Svensson
* anders/diameter/dpr/OTP-12542: Discard CER or DWR sent with diameter:call/4 Allow DPR to be sent with diameter:call/4 Add transport_opt() dpa_timeout Add testcase for sending DPR with diameter:call/4
2015-03-22Allow DPR to be sent with diameter:call/4Anders Svensson
DPR is sent by diameter at application shutdown, service stop, or transport removal. It has been possible to send the request with diameter:call/4, but the answer was discarded, instead of the transport process being terminated. This commit causes DPR to be handled in the same way regardless of whether it's sent by diameter or by diameter:call/4. Note that the behaviour subsequent to DPA is unchanged. In particular, in the connecting case, the closed connection will be reestablished after a connect_timer expiry unless the transport is removed. The more probable use case is the listening case, to disconnect a single peer associated with a listening transport.
2015-02-20Use new time api in implementationAnders Svensson
In particular, deal with the deprecation of erlang:now/0 in OTP 18. Be backwards compatible with older releases: the new api is only used when available. The test suites have not been modified.
2014-11-03Fix ignored connect timerAnders Svensson
There are two timers governing the establishment of peer connections: connect_timer and watchdog_timer. The former is the RFC 6733 Tc timer and is used by diameter_service to establish an initial connection. The latter is RFC 3539 TwInit and is used by diameter_watchdog for connection reestablishment after the watchdog leaves state INITIAL. A connecting transport ignored the connect timer since the watchdog process never died, regardless of the watchdog state, causing the watchdog timer to handle reconnection. This seems to have been broken for some time.
2014-05-27Merge branch 'anders/diameter/hardening/OTP-11721' into maintAnders Svensson
* anders/diameter/hardening/OTP-11721: Simplify example server Make example server answer unsupported requests with 3001 Make example code quiet Don't count messages on arbitrary keys Replace traffic-related log reports with no-op function calls
2014-05-26Don't count messages on arbitrary keysAnders Svensson
That is, don't use a key constructed from an incoming Diameter header unless the message is known to the dictionary in question. Otherwise there are 2^32 application ids, 2^24 command codes, and 2 R-bits for an ill-willed peer to choose from, each resulting in new keys in the counter table (diameter_stats). The usual {ApplicationId, CommandCode, Rbit} in a key is replaced by the atom 'unknown' if the message in question is unknown to the decoding dictionary. Counters for messages sent and received by a relay are (still) not implemented.
2014-05-26Replace traffic-related log reports with no-op function callsAnders Svensson
The former were a little over-enthusiastic and could cause a node to be logged to death if a peer Diameter node was sufficiently ill-willed. The function calls are to diameter_lib:log/4, the arguments of which identify the happening in question, and which does nothing but provide a function to trace on. Many existing log calls have been shrunk. The only remaining traffic-related report (hopefully) is that resulting from {answer_errors, report} config, and this has been slimmed.
2014-05-26Merge branch 'anders/diameter/dpr/OTP-11938' into maintAnders Svensson
* anders/diameter/dpr/OTP-11938: Ensure watchdog dies with transport if DPA was sent
2014-05-25Merge branch 'anders/diameter/rc_counters/OTP-11937' into maintAnders Svensson
* anders/diameter/rc_counters/OTP-11937: Count encode errors in outgoing messages Count decode errors in incoming requests Count decode errors independently of result codes
2014-05-25Merge branch 'anders/diameter/rc_counters/OTP-11891' into maintAnders Svensson
* anders/diameter/rc_counters/OTP-11891: Count result codes in CEA/DWA/DPA
2014-05-25Merge branch 'anders/diameter/watchdog_leak/OTP-11934' into maintAnders Svensson
* anders/diameter/watchdog_leak/OTP-11934: Simplify sending of 'close' to watchdog Fix watchdog table leak
2014-05-23Ensure watchdog dies with transport if DPA was sentAnders Svensson
A DPR/DPA exchange should always cause the watchdog process in question to die with the transport, so that a subsequent connection with the same peer doesn't result in a 3 x DWR/DWA exchange. Commit 5903d6db saw to this for the sending of DPR but neglected the corresponding problem for DPA. In the case of sending DPR (the aforementioned commit), note that there's no distinction between receiving DPA as expected and not: the watchdog dies with the transport regardless. diameter_watchdog must be loaded first at upgrade.
2014-05-23Count encode errors in outgoing messagesAnders Svensson
Only decode errors were counted previously. Keys are of the form {Id, send, error}, where Id is: {ApplicationId, CommandCode, Rbit} | unknown The latter will be the case if not even a #diameter_header{} can be constructed.
2014-05-22Count decode errors in incoming requestsAnders Svensson
Errors were only counted in incoming answers. Counters are keyed on tuples of the same form: {{ApplicationId, CommandCode, Rbit}, recv, error}
2014-05-21Count result codes in CEA/DWA/DPAAnders Svensson
Corresponding counters for other answer messages have been counted previously, but those for CEA, DWA, and DPA have been missing since diameter itself sends these messages and the implementation is as bit more separate than it might be. The counters are keyed on values of the following form. {{ApplicationId, CommandCode, 0 = Rbit}, send|recv, {'Result-Code', RC}}
2014-05-20Simplify sending of 'close' to watchdogAnders Svensson
There's no need to send the message immediately if there's no transport configuration since that in itself means the service process will tell the watchdogs to die.
2014-05-20Fix watchdog table leakAnders Svensson
Commit ef5fddcb (diameter-1.4.1, R16B) caused the leak in the case of an accepting watchdog with restrict_connections = false. It (correctly) ensured the state remained at INITIAL but a subsequent 'close' message to terminate the process was ignored since the state was not DOWN. In fact, no 'close' was sent since there was no state transition or previous connection: the former triggers the message from diameter_service, the latter from diameter_watchdog. The message is now sent to self() from the watchdog itself. Send 'close' in the same way when multiple connections to the same peer are allowed, to avoid waiting for a watchdog timer expiry for the process to terminate in this case.