aboutsummaryrefslogtreecommitdiffstats
path: root/lib/diameter/src/base/diameter_watchdog.erl
AgeCommit message (Collapse)Author
2014-05-26Merge branch 'anders/diameter/dpr/OTP-11938' into maintAnders Svensson
* anders/diameter/dpr/OTP-11938: Ensure watchdog dies with transport if DPA was sent
2014-05-25Merge branch 'anders/diameter/rc_counters/OTP-11937' into maintAnders Svensson
* anders/diameter/rc_counters/OTP-11937: Count encode errors in outgoing messages Count decode errors in incoming requests Count decode errors independently of result codes
2014-05-25Merge branch 'anders/diameter/rc_counters/OTP-11891' into maintAnders Svensson
* anders/diameter/rc_counters/OTP-11891: Count result codes in CEA/DWA/DPA
2014-05-25Merge branch 'anders/diameter/watchdog_leak/OTP-11934' into maintAnders Svensson
* anders/diameter/watchdog_leak/OTP-11934: Simplify sending of 'close' to watchdog Fix watchdog table leak
2014-05-23Ensure watchdog dies with transport if DPA was sentAnders Svensson
A DPR/DPA exchange should always cause the watchdog process in question to die with the transport, so that a subsequent connection with the same peer doesn't result in a 3 x DWR/DWA exchange. Commit 5903d6db saw to this for the sending of DPR but neglected the corresponding problem for DPA. In the case of sending DPR (the aforementioned commit), note that there's no distinction between receiving DPA as expected and not: the watchdog dies with the transport regardless. diameter_watchdog must be loaded first at upgrade.
2014-05-23Count encode errors in outgoing messagesAnders Svensson
Only decode errors were counted previously. Keys are of the form {Id, send, error}, where Id is: {ApplicationId, CommandCode, Rbit} | unknown The latter will be the case if not even a #diameter_header{} can be constructed.
2014-05-22Count decode errors in incoming requestsAnders Svensson
Errors were only counted in incoming answers. Counters are keyed on tuples of the same form: {{ApplicationId, CommandCode, Rbit}, recv, error}
2014-05-21Count result codes in CEA/DWA/DPAAnders Svensson
Corresponding counters for other answer messages have been counted previously, but those for CEA, DWA, and DPA have been missing since diameter itself sends these messages and the implementation is as bit more separate than it might be. The counters are keyed on values of the following form. {{ApplicationId, CommandCode, 0 = Rbit}, send|recv, {'Result-Code', RC}}
2014-05-20Simplify sending of 'close' to watchdogAnders Svensson
There's no need to send the message immediately if there's no transport configuration since that in itself means the service process will tell the watchdogs to die.
2014-05-20Fix watchdog table leakAnders Svensson
Commit ef5fddcb (diameter-1.4.1, R16B) caused the leak in the case of an accepting watchdog with restrict_connections = false. It (correctly) ensured the state remained at INITIAL but a subsequent 'close' message to terminate the process was ignored since the state was not DOWN. In fact, no 'close' was sent since there was no state transition or previous connection: the former triggers the message from diameter_service, the latter from diameter_watchdog. The message is now sent to self() from the watchdog itself. Send 'close' in the same way when multiple connections to the same peer are allowed, to avoid waiting for a watchdog timer expiry for the process to terminate in this case.
2014-01-27Remove upgrade-related codeAnders Svensson
No longer needed to update code in runtime since the emulator is restarted at a major release.
2013-12-02Merge branch 'anders/diameter/timer_confusion/OTP-11168' into maintAnders Svensson
* anders/diameter/timer_confusion/OTP-11168: Rename reconnect_timer -> connect_timer
2013-11-29Rename reconnect_timer -> connect_timerAnders Svensson
The former was misleading since the timer only applies to initial connection attempts, reconnection attempts being governed by watchdog_timer. The name is a historic remnant from a (dark, pre-OTP) time in which RFC 3539 was followed less slavishly than it is now, and the timer actually did apply to reconnection attempts. Note that connect_timer corresponds to RFC 6733 Tc, while watchdog_timer corresponds to RFC 3539 TwInit. The latter RFC makes clear that TwInit should apply to reconnection attempts. It's less clear if only RFC 6733 is read. Note also that reconnect_timer is still accepted for backwards compatibility. It would be possible to add an option to make reconnect_timer behave strictly as the name suggests (ie. ignore RFC 3539 and interpret RFC 6733 at face value; something that has some value for testing at least) but no such option is implemented in this commit.
2013-10-07Fix broken DWAAnders Svensson
Commit e762d7d1 broke outgoing DWA by setting new Hop-by-Hop and End-to-End identifiers instead of those of the incoming DWR.
2013-07-12Ensure DWR isn't sent immediately after DWAAnders Svensson
Having the peer_fsm process answer DWR meant that watchdog timer expiry could result in an outgoing DWR despite the fact that an incoming DWR was just answered. Having the watchdog process answer avoids this. diameter_peer_fsm must be loaded before diameter_watchdog. It's possible for one incoming DWR to go unanswered but a subsequent DWR will be answered so no harm is done.
2013-05-27Fix watchdog function_clauseAnders Svensson
Commit 0b7c87dc caused diameter_watchdog:restart/2 to start returning 'stop', so that a watchdog process for a listening transport that allowed multiple connections to the same peer would die one watchdog timeout after losing a connection. The new return value was supposed to be passed up to transition/2, but was instead passed to set_watchdog/1, resulting in a function_clause error. The resulting crash was harmless but unseemly. Not detected by dialyzer. Thanks to Aleksander Nycz.
2013-03-26Deal with config errors detected at transport start less brutallyAnders Svensson
Crashing watchdog and peer_fsm processes was somewhat unseemly. Emit an error report and die silently instead.
2013-03-26Move most transport_opt() validation into diameter_configAnders Svensson
Faulty configuration was previously passed directly on to watchdog and peer_fsm processes, diameter:add_transport/2 happily returning ok and the error resulting on failure of watchdog and/or peer_fsm processes. Now check for errors before getting this far, returning {error, Reason} from diameter:add_transport/2 when one is detected. There are still some errors that can only be detected after transport start (eg. a misbehaving callback) but most will be caught early.
2013-03-12Tweak okay -> suspect configAnders Svensson
Make it just a number of timeouts, without a new DWR being sent.
2013-03-04Add transport_opt() watchdog_configAnders Svensson
To make the number of watchdogs sent before the transitions REOPEN -> OKAY and OKAY -> SUSPECT configurable. Using anything other then the default config is non-standard and should only be used for test.
2013-02-08Split message handling in diameter_service into diameter_trafficAnders Svensson
Traffic handling is connected to the service implementation through the pick_peer callback and failover but diameter_service was getting unwieldy as home to both the service process and traffic handling.
2013-02-08Don't hardcode common dictionaryAnders Svensson
Instead, use whatever dictionary a transport has configured as supporting application id 0. This is to support the updated RFC 6733 dictionaries (which bring with them updated records) and also to be able to transparently support any changed semantics (eg. 5xxx in answer-message).
2013-02-08Fix faulty watchdog transition INITIAL -> DOWNAnders Svensson
There is no such transition in RFC 3539, the state remains in INITIAL.
2013-02-08Fix faulty watchdog transition DOWN -> INITIALAnders Svensson
This was the result of the watchdog process exiting as a consequence of peer death in some casesi, causing a restarted transport to enter INITIAL when it should enter REOPEN. The watchdog now remains alive as long as peer shutdown isn't requested and a 'close' message to the service process (instead of watchdog death) generates 'closed' events from the service.
2013-02-08Simplify watchdog transitions in service processAnders Svensson
In particular, use watchdog messages as input and do away with the older connection_up/down (and other) messages. Also, only maintain the watchdog state, not the older up/down op state.
2013-02-08Simplify transport shutdownAnders Svensson
Service process informs the watchdog process which informs the peer process. (Instead of going directly to the latter in one case.)
2013-02-08Remove upgrade code not needed after application restartAnders Svensson
Which will be the case with R16B in this case.
2013-01-23Remove upgrade code not needed at a major releaseAnders Svensson
2012-11-15Add comment about lack of identifier checks on DWAAnders Svensson
2012-11-15Ensure watchdog dies with transport if DPR was sentAnders Svensson
A watchdog timeout after DPR but before DPA would previously result in the watchdog restarting the transport.
2012-11-05Implement service_opt() restrict_connectionsAnders Svensson
2012-11-05Add reopen message from watchdogAnders Svensson
This makes capabilities available to service_info as soon as capabilities exchange has been completed. In particular, before state OKAY is reached.
2012-11-05Implement sequence masksAnders Svensson
Code should be loaded in this order: diameter_session (sequence/1) diameter_peer_fsm (calls to sequence/1) diameter_service (sequence config, mask in receive_message/3) diameter_watchdog (mask in peer start and receive_message/3) diameter_config (accept sequence config) Order of diameter and diameter_peer doesn't matter.
2012-09-25Exit peer_fsm with {shutdown, watchdog_timeout}, not shutdownAnders Svensson
This was a remnant of the time when sasl interpreted everything but shutdown or normal as a crash.
2012-08-26Minor spec fixAnders Svensson
2012-08-24Add events for watchdog state transitionsAnders Svensson
2011-12-16Ensure capabilities exchange can't fail too earlyAnders Svensson
In particular, not before the service process has a monitor on the watchdog since the watchdog's exit reason is meaningful.
2011-11-10Rename some functions plus comment tweakAnders Svensson
In diameter_service: make_packet -> make_request_packet make_header -> make_request_header make_reply_packet -> make_answer_packet
2011-10-17One makefile for src build instead of recursionAnders Svensson
Simpler, no duplication of similar makefiles and makes for better dependencies. (Aka, recursive make considered harmful.)