path: root/lib/diameter/src/base/diameter_watchdog.erl
Age Commit message (Author)
2015-09-07 Fix watchdog function_clause (Anders Svensson)
Commit 4f365c07 introduced the error on set_watchdog/2, as a consequence of timeout/1 returning stop, which only happens with accepting transports with {restrict_connections, false}.
2015-08-13 Merge branch 'anders/diameter/17/time/OTP-12926' into maint-17 (Erlang/OTP)
* anders/diameter/17/time/OTP-12926:
  Simplify time manipulation
  Remove use of monotonic time in pre-18 code
  Remove unnecessary redefinition of erlang:max/2
2015-08-13 Merge branch 'anders/diameter/lcnt/OTP-12912' into maint-17 (Erlang/OTP)
* anders/diameter/lcnt/OTP-12912:
  Make ets diameter_stats a set
  Remove unnecessary sorting in stats suite
  Set ets {write_concurrency, true} on diameter_stats
  Don't start watchdog timers unnecessarily
  Remove unnecessary erlang:monitor/2 qualification
  Add missing watchdog suite clause
2015-08-05 Simplify time manipulation (Anders Svensson)
By doing away with more wrapping that the parent commit started to remove.
2015-08-05 Remove use of monotonic time in pre-18 code (Anders Svensson)
This has been seen to be a bottleneck at high load: each undef results in a loop out to the code server. Originally implemented as suggested in the erts user's guide, in commits e6d19a18 and d4386254.
2015-08-04 Truncate potentially large terms passed to diameter_lib:log/4 (Anders Svensson)
Last visited in commit 00584303.
2015-07-19 Don't start watchdog timers unnecessarily (Anders Svensson)
In particular, don't restart the timer with each incoming Diameter message; restart it only when the previous timer has expired. Restarting with each message has been seen to result in high lock contention at load, as in the example below:

    (diameter@test)9> lcnt:conflicts([{print, [name, tries, ratio, time]}]).

    lock               #tries  collisions [%]   time [us]
    -----             ------- ---------------  ----------
    bif_timers        7844528         99.4729  1394434884
    db_tab           17240988          1.7947     6286664
    timeofday         7358692          5.6729     1399624
    proc_link         4814938          2.2736      482985
    drv_ev_state      2324012          0.5951       98920
    run_queue        21768213          0.2091       63516
    pollset           1190174          1.7170       42499
    pix_lock             1956          2.5562       39770
    make_ref          4697067          0.3669       20211
    proc_msgq         9475944          0.0295        5200
    timer_wheel       5325966          0.0568        2654
    proc_main        10005332          2.8190        1079
    pollset_rm_list     59768          1.7752         480
2015-07-19 Remove unnecessary erlang:monitor/2 qualification (Anders Svensson)
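The idea of leaving a running timer alone can be sketched as a pair of pure functions. This is only an illustration under stated assumptions, not the code in diameter_watchdog: the module name, the {Tw, LastRecv} state and the function names are all hypothetical.

```erlang
-module(lazy_timer_sketch).
-export([new/1, on_message/2, on_expiry/2]).

%% Hypothetical state: {Tw, LastRecv}, where Tw is the timer interval in
%% milliseconds and LastRecv is the arrival time of the most recent
%% incoming message since the timer was started, or 'undefined' if none.

new(Tw) when is_integer(Tw), Tw > 0 ->
    {Tw, undefined}.

%% An incoming message only records its arrival time: no timer is
%% cancelled or started here.
on_message({Tw, _LastRecv}, Now) ->
    {Tw, Now}.

%% On timer expiry, ask for a new timer covering only the time that
%% remains since the last message. With no message in the interval the
%% watchdog interval has genuinely elapsed.
on_expiry({_Tw, undefined} = State, _Now) ->
    {expired, State};
on_expiry({Tw, LastRecv}, Now) ->
    Remaining = max(0, Tw - (Now - LastRecv)),
    {restart, Remaining, {Tw, undefined}}.
```

A caller would act on {restart, Remaining, State} with something like erlang:start_timer(Remaining, self(), tw), so that at most one timer is started per interval regardless of message rate.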
The function has been auto-exported since R14B.
2015-03-24 Adapt to changed DiameterURI defaults in RFC 6733 (Anders Svensson)
Despite claims of full backwards compatibility, the text of RFC 6733 changes the interpretation of unspecified values in a DiameterURI. In particular, 3588 says that the default port and transport are 3868 and sctp respectively, while 6733 says it's either 3868/tcp (aaa) or 5658/tcp (aaas). The 3588 defaults were used regardless, but now use them only if the common dictionary is diameter_gen_base_rfc3588. The 6733 defaults are used otherwise. This kind of change in the standard can lead to interop problems, since a node has to know which RFC its peer is following to know that it will properly interpret missing URI components. Encode of a URI includes all components to avoid such confusion. That said, note that the defaults in the diameter_uri record have *not* been changed. This avoids breaking code that depends on them, but the risk is that such code sends inappropriate values. The record defaults may be changed in a future release, to force values to be explicitly specified.
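The choice of defaults described here can be written down as a small lookup. The sketch below only restates the values given in the commit message; the module and function names are hypothetical and this is not the diameter implementation.

```erlang
-module(uri_defaults_sketch).
-export([defaults/2]).

%% defaults(CommonDictionary, Scheme) -> {Port, Transport}
%%
%% Scheme is aaa or aaas. With the RFC 3588 common dictionary the 3588
%% defaults apply; with any other dictionary the RFC 6733 defaults apply.
defaults(diameter_gen_base_rfc3588, Scheme)
  when Scheme == aaa; Scheme == aaas ->
    {3868, sctp};
defaults(_Dict, aaa) ->
    {3868, tcp};
defaults(_Dict, aaas) ->
    {5658, tcp}.
```

For example, uri_defaults_sketch:defaults(diameter_gen_base_rfc6733, aaas) gives {5658, tcp}, while the same scheme under diameter_gen_base_rfc3588 gives {3868, sctp}.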
2015-03-24 Merge branch 'anders/diameter/string_decode/OTP-11952' into maint (Anders Svensson)
* anders/diameter/string_decode/OTP-11952:
  Let examples override default service options
  Set {restrict_connections, false} in example server
  Set {string_decode, false} in examples
  Test {string_decode, false} in traffic suite
  Add service_opt() string_decode
  Strip potentially large terms when sending outgoing Diameter messages
  Improve language consistency in diameter(1)
2015-03-24 Add service_opt() string_decode (Anders Svensson)
To control whether stringish Diameter types are decoded to string or left as binary. The motivation is the same as in the parent commit: to avoid large strings being copied when incoming Diameter messages are passed between processes; or *if* in the case of messages destined for handle_request and handle_answer callbacks, since these are decoded in the dedicated processes that the callbacks take place in. It would be possible to do something about other messages without requiring an option, but disabling the decode is the most effective. The value is a boolean(), true being the default for backwards compatibility. Setting false causes both diameter_caps records and decoded messages to contain binary() in relevant places that previously had string(): diameter_app(3) callbacks need to be prepared for the change. The Diameter types affected are OctetString and the derived types that can contain arbitrarily large values: OctetString, UTF8String, DiameterIdentity, DiameterURI, IPFilterRule, and QoSFilterRule. Time and Address are unaffected. The DiameterURI decode has been redone using re(3), which both simplifies and does away with a vulnerability resulting from the conversion of arbitrary strings to atom. The solution continues the use and abuse of the process dictionary for encode/decode purposes, last seen in commit 0f9cdba.
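As a rough illustration of how a service might opt in, the sketch below builds a service option list with {string_decode, false} and shows one way a callback could accept either representation. The host, realm and product values and the callback module name my_server_cb are placeholders, and normalize/1 is a hypothetical helper, not part of diameter.

```erlang
-module(string_decode_sketch).
-export([service_opts/0, normalize/1]).

%% Options for diameter:start_service/2. The identity values and the
%% callback module my_server_cb are placeholders for illustration.
service_opts() ->
    [{'Origin-Host', "server.example.com"},
     {'Origin-Realm', "example.com"},
     {'Vendor-Id', 0},
     {'Product-Name', "Server"},
     {'Auth-Application-Id', [0]},
     {string_decode, false},   %% leave stringish AVPs as binary()
     {application, [{alias, common},
                    {dictionary, diameter_gen_base_rfc6733},
                    {module, my_server_cb}]}].

%% A callback that has to work with both settings can normalize a
%% stringish value to binary() before using it.
normalize(Value) when is_list(Value); is_binary(Value) ->
    iolist_to_binary(Value).
```

With these options a service would be started with diameter:start_service(Name, string_decode_sketch:service_opts()), after which callbacks should expect binary() where they previously received string().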
2015-03-23 Strip potentially large terms when sending outgoing Diameter messages (Anders Svensson)
Both incoming and outgoing Diameter messages pass through two or three processes, depending on whether they're incoming or outgoing: the transport process and corresponding peer_fsm process and (for incoming) watchdog processes. Since terms other than binary are copied when passing process boundaries, large terms lead to copying that can be problematic, if frequent enough. Since only the bin and transport_data fields of a diameter_packet record are needed by the transport process, discard others when sending outgoing messages. Strictly speaking, the statement that only the aforementioned fields are needed by the transport process depends on the transport process. It's true of those implemented by diameter (in diameter_tcp and diameter_sctp), but an implementation that makes use of other fields is assuming more than the documentation in diameter_transport(3) promises.
2015-03-23 Merge branch 'anders/diameter/dpr/OTP-12542' into maint (Anders Svensson)
* anders/diameter/dpr/OTP-12542:
  Discard CER or DWR sent with diameter:call/4
  Allow DPR to be sent with diameter:call/4
  Add transport_opt() dpa_timeout
  Add testcase for sending DPR with diameter:call/4
2015-03-22 Allow DPR to be sent with diameter:call/4 (Anders Svensson)
DPR is sent by diameter at application shutdown, service stop, or transport removal. It has been possible to send the request with diameter:call/4, but the answer was discarded, instead of the transport process being terminated. This commit causes DPR to be handled in the same way regardless of whether it's sent by diameter or by diameter:call/4. Note that the behaviour subsequent to DPA is unchanged. In particular, in the connecting case, the closed connection will be reestablished after a connect_timer expiry unless the transport is removed. The more probable use case is the listening case, to disconnect a single peer associated with a listening transport.
2015-02-20 Use new time api in implementation (Anders Svensson)
In particular, deal with the deprecation of erlang:now/0 in OTP 18. Be backwards compatible with older releases: the new api is only used when available. The test suites have not been modified.
2014-11-03 Fix ignored connect timer (Anders Svensson)
There are two timers governing the establishment of peer connections: connect_timer and watchdog_timer. The former is the RFC 6733 Tc timer and is used by diameter_service to establish an initial connection. The latter is RFC 3539 TwInit and is used by diameter_watchdog for connection reestablishment after the watchdog leaves state INITIAL. A connecting transport ignored the connect timer since the watchdog process never died, regardless of the watchdog state, causing the watchdog timer to handle reconnection. This seems to have been broken for some time.
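Both timers are set per transport. The fragment below is a sketch of a connecting transport configuration that names them explicitly; the peer address, port and timer values are arbitrary illustration values rather than recommendations.

```erlang
-module(connect_timers_sketch).
-export([transport_opts/0]).

%% Argument for diameter:add_transport/2. Address, port and timer
%% values are placeholders chosen for illustration.
transport_opts() ->
    {connect,
     [{transport_module, diameter_tcp},
      {transport_config, [{raddr, {127,0,0,1}},
                          {rport, 3868}]},
      %% RFC 6733 Tc: governs initial connection attempts.
      {connect_timer, 30000},
      %% RFC 3539 TwInit: governs reconnection after the watchdog
      %% has left state INITIAL.
      {watchdog_timer, 30000}]}.
```

It would be passed as diameter:add_transport(SvcName, connect_timers_sketch:transport_opts()).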
2014-05-27 Merge branch 'anders/diameter/hardening/OTP-11721' into maint (Anders Svensson)
* anders/diameter/hardening/OTP-11721:
  Simplify example server
  Make example server answer unsupported requests with 3001
  Make example code quiet
  Don't count messages on arbitrary keys
  Replace traffic-related log reports with no-op function calls
2014-05-26 Don't count messages on arbitrary keys (Anders Svensson)
That is, don't use a key constructed from an incoming Diameter header unless the message is known to the dictionary in question. Otherwise there are 2^32 application ids, 2^24 command codes, and 2 R-bits for an ill-willed peer to choose from, each resulting in new keys in the counter table (diameter_stats). The usual {ApplicationId, CommandCode, Rbit} in a key is replaced by the atom 'unknown' if the message in question is unknown to the decoding dictionary. Counters for messages sent and received by a relay are (still) not implemented.
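The effect on the counter table can be illustrated with a one-function sketch. The module and function names are hypothetical, and the Known flag stands in for whatever lookup the decoding dictionary provides; this is not the diameter implementation.

```erlang
-module(stats_key_sketch).
-export([id/2]).

%% id(Known, {ApplicationId, CommandCode, Rbit}) -> identifier used in
%% a counter key.
%%
%% Known says whether the message is known to the decoding dictionary.
%% Header values from unknown messages are never used in the key, so a
%% peer cannot create new table entries by varying them.
id(true, {AppId, Code, Rbit}) ->
    {AppId, Code, Rbit};
id(false, {_AppId, _Code, _Rbit}) ->
    unknown.
```

However many distinct headers an ill-willed peer sends, all of its unknown messages map to the single identifier unknown.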
2014-05-26 Replace traffic-related log reports with no-op function calls (Anders Svensson)
The former were a little over-enthusiastic and could cause a node to be logged to death if a peer Diameter node was sufficiently ill-willed. The function calls are to diameter_lib:log/4, the arguments of which identify the happening in question, and which does nothing but provide a function to trace on. Many existing log calls have been shrunk. The only remaining traffic-related report (hopefully) is that resulting from {answer_errors, report} config, and this has been slimmed.
2014-05-26 Merge branch 'anders/diameter/dpr/OTP-11938' into maint (Anders Svensson)
* anders/diameter/dpr/OTP-11938:
  Ensure watchdog dies with transport if DPA was sent
2014-05-25 Merge branch 'anders/diameter/rc_counters/OTP-11937' into maint (Anders Svensson)
* anders/diameter/rc_counters/OTP-11937:
  Count encode errors in outgoing messages
  Count decode errors in incoming requests
  Count decode errors independently of result codes
2014-05-25 Merge branch 'anders/diameter/rc_counters/OTP-11891' into maint (Anders Svensson)
* anders/diameter/rc_counters/OTP-11891:
  Count result codes in CEA/DWA/DPA
2014-05-25 Merge branch 'anders/diameter/watchdog_leak/OTP-11934' into maint (Anders Svensson)
* anders/diameter/watchdog_leak/OTP-11934:
  Simplify sending of 'close' to watchdog
  Fix watchdog table leak
2014-05-23 Ensure watchdog dies with transport if DPA was sent (Anders Svensson)
A DPR/DPA exchange should always cause the watchdog process in question to die with the transport, so that a subsequent connection with the same peer doesn't result in a 3 x DWR/DWA exchange. Commit 5903d6db saw to this for the sending of DPR but neglected the corresponding problem for DPA. In the case of sending DPR (the aforementioned commit), note that there's no distinction between receiving DPA as expected and not: the watchdog dies with the transport regardless. diameter_watchdog must be loaded first at upgrade.
2014-05-23 Count encode errors in outgoing messages (Anders Svensson)
Only decode errors were counted previously. Keys are of the form {Id, send, error}, where Id is:
    {ApplicationId, CommandCode, Rbit} | unknown
The latter will be the case if not even a #diameter_header{} can be constructed.
2014-05-22 Count decode errors in incoming requests (Anders Svensson)
Errors were only counted in incoming answers. Counters are keyed on tuples of the same form:
    {{ApplicationId, CommandCode, Rbit}, recv, error}
2014-05-21 Count result codes in CEA/DWA/DPA (Anders Svensson)
Corresponding counters for other answer messages have been counted previously, but those for CEA, DWA, and DPA have been missing since diameter itself sends these messages and the implementation is a bit more separate than it might be. The counters are keyed on values of the following form:
    {{ApplicationId, CommandCode, 0 = Rbit}, send|recv, {'Result-Code', RC}}
2014-05-20 Simplify sending of 'close' to watchdog (Anders Svensson)
There's no need to send the message immediately if there's no transport configuration since that in itself means the service process will tell the watchdogs to die.
2014-05-20 Fix watchdog table leak (Anders Svensson)
Commit ef5fddcb (diameter-1.4.1, R16B) caused the leak in the case of an accepting watchdog with restrict_connections = false. It (correctly) ensured the state remained at INITIAL but a subsequent 'close' message to terminate the process was ignored since the state was not DOWN. In fact, no 'close' was sent since there was no state transition or previous connection: the former triggers the message from diameter_service, the latter from diameter_watchdog. The message is now sent to self() from the watchdog itself. Send 'close' in the same way when multiple connections to the same peer are allowed, to avoid waiting for a watchdog timer expiry for the process to terminate in this case.
2014-01-27 Remove upgrade-related code (Anders Svensson)
No longer needed to update code in runtime since the emulator is restarted at a major release.
2013-12-02 Merge branch 'anders/diameter/timer_confusion/OTP-11168' into maint (Anders Svensson)
* anders/diameter/timer_confusion/OTP-11168:
  Rename reconnect_timer -> connect_timer
2013-11-29 Rename reconnect_timer -> connect_timer (Anders Svensson)
The former was misleading since the timer only applies to initial connection attempts, reconnection attempts being governed by watchdog_timer. The name is a historic remnant from a (dark, pre-OTP) time in which RFC 3539 was followed less slavishly than it is now, and the timer actually did apply to reconnection attempts. Note that connect_timer corresponds to RFC 6733 Tc, while watchdog_timer corresponds to RFC 3539 TwInit. The latter RFC makes clear that TwInit should apply to reconnection attempts. It's less clear if only RFC 6733 is read. Note also that reconnect_timer is still accepted for backwards compatibility. It would be possible to add an option to make reconnect_timer behave strictly as the name suggests (ie. ignore RFC 3539 and interpret RFC 6733 at face value; something that has some value for testing at least) but no such option is implemented in this commit.
2013-10-07 Fix broken DWA (Anders Svensson)
Commit e762d7d1 broke outgoing DWA by setting new Hop-by-Hop and End-to-End identifiers instead of those of the incoming DWR.
2013-07-12 Ensure DWR isn't sent immediately after DWA (Anders Svensson)
Having the peer_fsm process answer DWR meant that watchdog timer expiry could result in an outgoing DWR despite the fact that an incoming DWR was just answered. Having the watchdog process answer avoids this. diameter_peer_fsm must be loaded before diameter_watchdog. It's possible for one incoming DWR to go unanswered but a subsequent DWR will be answered so no harm is done.
2013-05-27 Fix watchdog function_clause (Anders Svensson)
Commit 0b7c87dc caused diameter_watchdog:restart/2 to start returning 'stop', so that a watchdog process for a listening transport that allowed multiple connections to the same peer would die one watchdog timeout after losing a connection. The new return value was supposed to be passed up to transition/2, but was instead passed to set_watchdog/1, resulting in a function_clause error. The resulting crash was harmless but unseemly. Not detected by dialyzer. Thanks to Aleksander Nycz.
2013-03-26 Deal with config errors detected at transport start less brutally (Anders Svensson)
Crashing watchdog and peer_fsm processes was somewhat unseemly. Emit an error report and die silently instead.
2013-03-26 Move most transport_opt() validation into diameter_config (Anders Svensson)
Faulty configuration was previously passed directly on to watchdog and peer_fsm processes, diameter:add_transport/2 happily returning ok and the error resulting on failure of watchdog and/or peer_fsm processes. Now check for errors before getting this far, returning {error, Reason} from diameter:add_transport/2 when one is detected. There are still some errors that can only be detected after transport start (eg. a misbehaving callback) but most will be caught early.
2013-03-12 Tweak okay -> suspect config (Anders Svensson)
Make it just a number of timeouts, without a new DWR being sent.
2013-03-04 Add transport_opt() watchdog_config (Anders Svensson)
To make the number of watchdogs sent before the transitions REOPEN -> OKAY and OKAY -> SUSPECT configurable. Using anything other than the default config is non-standard and should only be used for test.
2013-02-08 Split message handling in diameter_service into diameter_traffic (Anders Svensson)
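A test configuration using the option might look like the sketch below. The listening address, port and the counts given for okay and suspect are arbitrary values picked for illustration; as the commit message says, non-default values are non-standard and only suitable for test.

```erlang
-module(watchdog_config_sketch).
-export([transport_opts/0]).

%% Argument for diameter:add_transport/2 on a listening transport.
%% Address, port and counts are placeholders for a test setup.
transport_opts() ->
    {listen,
     [{transport_module, diameter_tcp},
      {transport_config, [{ip, {127,0,0,1}},
                          {port, 3868}]},
      %% okay:    count governing the transition REOPEN -> OKAY
      %% suspect: count governing the transition OKAY -> SUSPECT
      {watchdog_config, [{okay, 1},
                         {suspect, 2}]}]}.
```

It would be passed as diameter:add_transport(SvcName, watchdog_config_sketch:transport_opts()).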
Traffic handling is connected to the service implementation through the pick_peer callback and failover but diameter_service was getting unwieldy as home to both the service process and traffic handling.
2013-02-08 Don't hardcode common dictionary (Anders Svensson)
Instead, use whatever dictionary a transport has configured as supporting application id 0. This is to support the updated RFC 6733 dictionaries (which bring with them updated records) and also to be able to transparently support any changed semantics (eg. 5xxx in answer-message).
2013-02-08 Fix faulty watchdog transition INITIAL -> DOWN (Anders Svensson)
There is no such transition in RFC 3539, the state remains in INITIAL.
2013-02-08 Fix faulty watchdog transition DOWN -> INITIAL (Anders Svensson)
This was the result of the watchdog process exiting as a consequence of peer death in some cases, causing a restarted transport to enter INITIAL when it should enter REOPEN. The watchdog now remains alive as long as peer shutdown isn't requested and a 'close' message to the service process (instead of watchdog death) generates 'closed' events from the service.
2013-02-08 Simplify watchdog transitions in service process (Anders Svensson)
In particular, use watchdog messages as input and do away with the older connection_up/down (and other) messages. Also, only maintain the watchdog state, not the older up/down op state.
2013-02-08 Simplify transport shutdown (Anders Svensson)
Service process informs the watchdog process which informs the peer process. (Instead of going directly to the latter in one case.)
2013-02-08 Remove upgrade code not needed after application restart (Anders Svensson)
Which will be the case with R16B in this case.
2013-01-23 Remove upgrade code not needed at a major release (Anders Svensson)
2012-11-15 Add comment about lack of identifier checks on DWA (Anders Svensson)
2012-11-15 Ensure watchdog dies with transport if DPR was sent (Anders Svensson)
A watchdog timeout after DPR but before DPA would previously result in the watchdog restarting the transport.
2012-11-05 Implement service_opt() restrict_connections (Anders Svensson)