path: root/lib/os_mon/src/cpu_sup.erl
Age  Commit message  Author
2015-03-18  Handle big loadavg values correctly  Viacheslav V. Kovalev
Do not crash with badmatch when integer part of loadavg has more than 2 digits.
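For illustration only, here is a minimal sketch of parsing a line in the `/proc/loadavg` layout without a bare match, so that unexpected input yields an error tuple instead of a `badmatch`. The module name `loadavg_sketch` and the function `parse/1` are hypothetical and are not taken from `cpu_sup.erl`; the sketch does not claim to reproduce the actual fix in this commit.

```erlang
-module(loadavg_sketch).
-export([parse/1]).

%% Hypothetical helper, not part of cpu_sup.
%% Parses one line in the /proc/loadavg layout, for example
%% "123.45 98.76 101.00 3/456 7890\n". The integer part of each
%% load value may have any number of digits.
parse(Line) ->
    case io_lib:fread("~f ~f ~f ~d/~d", Line) of
        {ok, [Avg1, Avg5, Avg15, _Running, Total], _Rest} ->
            {ok, {Avg1, Avg5, Avg15, Total}};
        Other ->
            %% Covers {error, _} and {more, _, _, _} from io_lib:fread/2.
            {error, {unexpected_loadavg, Other}}
    end.
```

Using `case` rather than a direct pattern match on the `io_lib:fread/2` result means a malformed or truncated line is reported to the caller rather than crashing the process.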
2014-08-29  Clarify error for slow `cpu_sup` port init  Luca Favatella
I noticed that running an R16B03-1 node on an overloaded host produced log entries like the following ones:

```
2014-08-22 21:52:31 =ERROR REPORT====
Error in process <0.24112.3> on node '[email protected]' with exit value: {{case_clause,{data,4711}},[{cpu_sup,get_uint32_measurement,2,[{file,"cpu_sup.erl"},{line,227}]},{cpu_sup,measurement_server_loop,1,[{file,"cpu_sup.erl"},{line,585}]}]}
```

```
===== ALIVE Fri Aug 22 21:50:14 CEST 2014
[os_mon] cpu supervisor port (cpu_sup): Erlang has closed
[os_mon] cpu supervisor port (cpu_sup): Erlang has closed
[os_mon] cpu supervisor port (cpu_sup): Erlang has closed
[os_mon] cpu supervisor port (cpu_sup): Erlang has closed
===== ALIVE Fri Aug 22 22:07:46 CEST 2014
```

I performed a code inspection on the `cpu_sup` module and I concluded that the `case_clause` error shows a small issue in the `cpu_sup` module, that happens when the port used in `cpu_sup` is slow to start - as it may happen on an overloaded node.

The `cpu_sup` `gen_server` process keeps in its state the pid of an unlinked process (called "measurement server" - see `cpu_sup:init/1`), in order to do dirty stuff (e.g. reading the filesystem, running OS commands) and start_link-ing & managing the connected process to a port (called "port server" - see `measurement_server_init/0`). So the process organization looks like this:

```
cpu_sup - measurement server - port server - port
```

When the measurement server start_links the port server (see `port_server_start/0`) it sends a `{self(), ?ping}` message to it and expects an answer within 6s, otherwise it returns `{error, timeout}` rather than the pid. This has two issues:

* The measurement server keeps `{error, ...}` in its state as if it were a pid - that makes no sense;
* A late `{Pid, {data,4711}}` response may arrive in the mailbox of the measurement server, that will believe it to be a request to be processed, causing a `case_clause` error.

This commit teaches the measurement server to check the success of the initialization of the port server by matching on the return value of `port_server_start/0` (renamed to `port_server_start_link/0` for the sake of clarity) in order to fail earlier and with an error clearer than `{case_clause,{data,4711}}`. In such case I expect the measurement server to be restarted by the `cpu_sup` `gen_server` (see `handle_call/3`) - as before.

BTW It is not clear to me when the `handle_info({'EXIT', _Port, Reason}, State)` may be called (the `cpu_sup` `gen_server` does not link to the measurement server) but I am leaving it.
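The ping-with-timeout handshake described in this commit message can be sketched roughly as follows. This is not the code from `cpu_sup.erl`: the module name `port_start_sketch`, the function `start_link_checked/2`, the `fake_port_server/1` stand-in, and the value of the `PING` macro are all assumptions made for illustration. Only the `{data,4711}` reply and the `{error, timeout}` result come from the message above.

```erlang
-module(port_start_sketch).
-export([start_link_checked/2]).

-define(PING, ping).   % assumed value; the real ?ping macro is not shown here
-define(PONG, 4711).   % reply payload seen in the log as {data,4711}

%% Hypothetical stand-in for the port server. It becomes responsive only
%% after DelayMs milliseconds, which imitates a slow port start.
fake_port_server(DelayMs) ->
    timer:sleep(DelayMs),
    receive
        {From, ?PING} ->
            From ! {self(), {data, ?PONG}},
            fake_port_server(0)
    end.

%% Start the (fake) port server, ping it, and return {ok, Pid} only if it
%% answers within TimeoutMs. On timeout the server is unlinked and killed,
%% and {error, timeout} is returned for the caller to match on.
start_link_checked(DelayMs, TimeoutMs) ->
    Pid = spawn_link(fun() -> fake_port_server(DelayMs) end),
    Pid ! {self(), ?PING},
    receive
        {Pid, {data, ?PONG}} ->
            {ok, Pid}
    after TimeoutMs ->
        unlink(Pid),
        exit(Pid, kill),
        {error, timeout}
    end.
```

A caller that writes `{ok, Pid} = port_start_sketch:start_link_checked(0, 6000)` fails at the point of initialization when the server is too slow, instead of storing `{error, timeout}` where a pid is expected.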
2012-01-25  Look for port in priv/bin/arch/ as well as priv/bin/  Lukas Larsson
2010-02-17  Merge branch 'ks/cleanups' into ccase/r13b04_dev  Erlang/OTP
* ks/cleanups:
  percept: Clean up as suggested by tidier
  percept: Modernize types and specs
  parsetools: Don't use 'try...of' when 'try' will do
  parsetools: Use %% for comments at the beginning of a line
  parsetools: Replace lists:keysearch/3 with lists:keyfind/3
  parsetools: Modernize types and specs
  parsetools: Replace TABs with spaces
  runtime_tools: Modernize specs
  sasl: Eliminate tuple used as fun
  sasl: Add missing modules to app file
  asn1: Clean up as suggested by tidier
  os_mon: Modernize types and specs
  wx: Clean up as suggested by tidier

OTP-8455 ks/cleanups
2010-02-16  os_mon: Modernize types and specs  Kostis Sagonas
2009-11-20  The R13B03 release.  OTP_R13B03  Erlang/OTP