Age | Commit message (Collapse) | Author |
|
* maint:
erts: Add nif_SUITE:t_on_load
erts: Improve nif_SUITE:upgrade test
Don't leak old code when loading a modules with an on_load function
Conflicts:
erts/preloaded/ebin/erts_code_purger.beam
erts/preloaded/ebin/erts_internal.beam
erts/preloaded/src/erts_code_purger.erl
|
|
* bjorn/erts/on_load/ERL-240/OTP-13893:
erts: Add nif_SUITE:t_on_load
erts: Improve nif_SUITE:upgrade test
Don't leak old code when loading a modules with an on_load function
|
|
|
|
Normally, calling code:delete/1 before re-loading the code for a
module is unnecessary but causes no problem.
But there will be be problems if the new code has an on_load function.
Code with an on_load function will always be loaded as old code
to allowed it to be easily purged if the on_load function would fail.
If the on_load function succeeds, the old and current code will be
swapped.
So in the scenario where code:delete/1 has been called explicitly,
there is old code but no current code. Loading code with an
on_load function will cause the reference to the old code to be
overwritten. That will at best cause a memory leak, and at worst
an emulator crash (especially if NIFs are involved).
To avoid that situation, we will put the code with the on_load
function in a special, third slot in Module.
ERL-240
|
|
|
|
When printing the error term from on_load, insert a newline because the
term is never short and will produce hard to read/copy output that is
just hanging off to the right.
|
|
* lukas/erts/fix_init_stop_code_load_race/OTP-13802:
erts/kernel: Fix code loading deadlock during init:stop
|
|
When init:stop is called it walks the application hierarchy
and terminates each process. Some of these processes may do
something while terminating and sometimes that something
needs to load some new code in order to work. When this happens
the code_server could just be in the process of terminating
or the erl_prim_loader could be active. In both these cases
the request to load the new code would cause a deadlock in the
termination of the system.
This commit fixes this by init rejecting attempts to load new code
when init:stop has been called and fixing a termination race in
the code_server.
This however means that the process that tried to do something
when told to terminate (for instance logging that it is terminating)
will crash instead of loading the code.
|
|
|
|
If an on_load function had not yet returned, the code server
would only correctly handle calls to code:ensure_loaded/1. That
is, each process that called code:ensured_loaded/1 for the
module in question would be suspended until the on_load function
had returned.
The code server handled calls to code:load_binary/1, code:load_file/1,
and code:load_abs/1 in the same way as for code:ensure_loaded/1.
That means that if call to one those functions attempted to load
*different* code for the module, that code would not get loaded.
Note that code:finish_loading/1 (code:atomic_load/1) will still return
{error,pending_on_load} if there is a pending on_load function for one
of the modules that are about to be loaded. The reason is that
code:finish_loading/1 is meant to either succeed directly, or fail
quickly if there is any problem.
|
|
itself
|
|
For historical reasons, the second argument in handle_call/3 is
{Pid,Tag}. The tag is always 'call' and is never used.
|
|
Load the module as old code; swap old and new code if the
-on_load function succeeds. That way, a failed update attempt
for a module that has an -on_load function will preserve the
previous version of the code.
|
|
|
|
|
|
|
|
Refactor get_name/1 to facilitate an optimization in the next commit.
|
|
make_path() returns a tuple whose second element is not used by
any of its caller. Eliminate the tuple.
Also avoid calling filename:dirname/1 when we already have the result
stored in a variable, and avoid calling filename:basename/2 on a
complete path when we already have the last part of the pathname
stored in a variable.
|
|
absname/1 is quite expensive, so we should not call if we already
have a normalized absolute path.
|
|
The same filtering of sub directories is done in archive_subdirs/1
and try_archive_subdirs/1. Actually, archive_subdirs/1 does nothing
useful. It can be removed and all_archive_subdirs/1 can be renamed
to archive_subdirs/1.
In try_archive_subdirs/1, we can also replace filename:join/1
with the cheaper filename:append/1, since we know that pathnames
have already been normalized.
|
|
It is faster and easier to use filename:split/1 and filename:join/1
than use four different 'filename' operations. Each call in the
original call first flattens the input list, and then traverse the
entire string to the end. So there are roughly 8 complete traversals.
In comparison, filename:split/1 will traverse the input string
twice (once to flatten, once to split) and filename:join/1 once.
For background on why this optimization is worthwhile, here is the
result of profiling the start-up of the run-time system on my computer
(done before this optimization):
$ $ERL_TOP/bin/erl -profile_boot
.
.
.
filename:join1/4 - 13573 : 13805 us
erlang:finish_loading/1 - 27 : 19963 us
filename:do_flatten/2 - 29337 : 49518 us
erlang:prepare_loading/2 - 49 : 52270 us
Note that filename:do_flatten/2 ends up in second place, almost
as expensive as erlang:prepare_loading/2. Your mileage may vary,
depending on the length of $ERL_TOP and the number of extra
directories added to the path using $ERL_LIBS.
|
|
* henrik/update-copyrightyear:
update copyright-year
|
|
On Windows, the pathnames for modules that are loaded early are
returned with mixed backslashes and slashes:
1> code:which(lists).
"C:\\Program Files\\erl8.0/lib/stdlib-2.7/ebin/lists.beam"
2>
Modules loaded later are fully normalized.
When starting the code_server, normalize the pathnames for all modules
that have been loaded so far.
|
|
|
|
Add functions to 'code' to allow loading of multiple modules
at once.
code:atomic_load(Modules) will load all modules at once, or fail
having loaded none of them. Since we cannot guarantee the atomicity if
there are modules with -on_load functions, the list of modules must
not contain any modules with an -on_load function.
Also, to make it possible to put an application into an inactive state
for as short time as possible, also add code:prepare_loading/1 and
code:finish_loading/1. They are used like this:
{ok,Prepared} = code:prepare_loading(Modules)
.
.
.
ok = code:finish_loading(Prepared)
code:ensure_modules_loaded/1 is useful as a pure optimization to
ensure that modules that will be needed soon have indeed been
loaded. It will not reload modules that have already been loaded and
it *will* accept modules that have an on_load function. Therefore, it
does not make sense to give any atomicity guarantees.
I did consider overloading the existing code:ensure_loaded/1
function, but rejected it because the return value is very
different. Having different forms of return values depending
on the types of arguments is confusing.
|
|
After loading a module without native code, it is still necessary
to call hipe_unified_loader:post_beam_load() to ensure that any
native calls to the module is done to the newly loaded module
(and not to a previous version of the module in native code).
Unfortunately, hipe_unified_loader:post_beam_load() can be slow
and most of the time it doesn't do anything because no previous
native code was loaded. Therefore, ad2962278f added a kludge using
the process dictionary to avoid calling post_beam_load() if no
native code at all has been loaded.
Remove the kludge by keeping track exactly of which modules that
have native code in the existing ets table. Also generalize
post_beam_load() to handle severals modules at once, since we
will soon need that functionality.
|
|
The main ets table kept by code_server contains several pieces
of information. Therefore, code_server:all_loaded/1 need to
filter the information in the table.
code_server:all_loaded/1 can be simplified if we use
ets:select/2. Currently, the filtering is done by filtering
away unwanted stuff ({sticky_dir,Mod} tuples). It is more
robust to filter on the stuff that we want to keep
({Mod,Path} tuples, where Mod is an atom) in case that we'll
add more auxiliary records to the table later.
|
|
* bjorn/kernel/clean-up-code_server:
code_server: Add specs for all exported functions
code_server: Add types to the state record
code_server: Don't export internal system_* functions
Simplify starting of code server
Remove first argument of code_server:call()
|
|
|
|
* maint:
Update documentation for code-loading functions
code: Correct the types for error returns
Eliminate run-time system crash in code:load_abs/1
|
|
|
|
There is no reason to export system_continue/3 and system_terminate/4
from code_server. Servers that use proc_lib and 'sys' to handle system
message do need those functions exported, but code_server contains a
modified copy of the system message handling code from 'sys', and that
code only make local calls to system_continue/3 and
system_terminate/4.
|
|
There is unnecessary knowledge about the -nostick option
in the 'kernel' module. -nostick can be handled entirely
in the 'code' module.
There is no need to have both code:start_link/0 and
code:start_link/1. code:start_link/0 is sufficient.
Also get rid of error handling for things that cannot happen:
The 'init' module has made sure that init:get_argument(root)
can't fail, so there is no need for any error-reporting code.
(See e1dc0aa4.)
We also don't need to test the return value from
code_server:start_link/1, because it will always return {ok,Pid}
(or crash).
|
|
Remove the first argument for code_server:call(). It makes no
sense. The only caller is 'code'.
Note that the code_server module is undocumented and that
code_server:call() is an internal helper function.
|
|
The run-time system would terminate if code:load_abs/1 was
called with a filename containing any non-latin1 characters.
The reason is that code_server would attempt to construct a
module name from the filename using list_to_atom/1 and that
atoms currently are limited to the latin1 character set.
But how should the error be reported?
I have decided to that the simplest and least confusing way
is to move the call to list_to_atom/1 to 'code' module and
let it crash the calling process. The resulting stack back
trace will make it clear what the reason for the crash was.
|
|
For some reason, there is a test in code_server that
hipe_unified_loader is loaded before trying to call it. The test was
added in R9B, but it is not clear why. Before starting the code
server, the 'code' module would always load hipe_unified_loader;
thus there is now way that the test can ever fail.
|
|
|
|
The following functions in the 'code' module only allow the
module argument to an atom:
load_file/1
load_binary/3
ensure_loaded/1
delete/1
purge/1
soft_purge/1
get_object_code/1
Therefore, there is no reason that the corresponding implementation in
code_server should allow the module to be either an atom or a
string. Only accept an atom and remove the helper functions to_atom/1
and do_mod_call/4.
|
|
by moving code from code_server to erts_code_purger.
This is more or less a copy-paste from code_server.erl
to erts_code_purger.erl. All the inner mechanics of
code:purge/1 and code:soft_purge/1 are unchanged.
|
|
In practice, it does not seem that code path cache can
improve performance. Looking for any file that is not found
will cause the cache to be rebuilt, which will negate any
gain of using the cache.
|
|
|
|
|
|
* egil/fix-cover-error_logger/OTP-12818:
cover: Unstick modules before loading remote
kernel: Add module name to sticky_dir error message
kernel: Remove ?line macros in error_logger_warn_SUITE
|
|
|
|
Make code_server be responsible for finding the architecture and deciding
whether to try to load native code, in order to avoid repeated calls to
system_info(hipe_architecture) and clumsy uses of try/catch to test if hipe
is enabled.
|
|
Commit ed06dd12ea74018b902a2c4c7924313d23cedb75 made pre-loaded
modules (such as erlang) sticky, and also made it impossible to
unstick them to prevent them from ever be reloaded.
It turns out that there are tools that may want to instrument
the erlang module (such as concuerror), so making it
impossible to unstick modules is harsh. Therefore, revert the
part of the commit that prevented unsticking the pre-loaded
modules.
|
|
Modules in the kernel, stdlib, and compiler applications are by
default "sticky", meaning that the code server will refuse to
re-load them.
The pre-loaded modules (those that are part of the run-time system
itself, such as 'erlang') are, however, not sticky. They used to be
sticky a long time ago when the pre-loaded modules were part of
the kernel application. Now they are part of the erts application.
Since re-loading a pre-loaded module can be catastrophic (especially
re-loading the 'erlang' module), the pre-loaded modules must be
sticky. Furthermore, it should not be allowed to unstick them.
The sticky_dir/1 test case in code_SUITE is never actually run and
is broken. Rewrite it.
|
|
The following sequence will NOT leave the code path unchanged:
code:add_path("/some/app/"),
.
.
.
code:del_path("/some/app/")
The reason is that code:add_path/1 will normalize the path name
(removing the trailing slash), while code:del_path/1 does not
normalize the path before searching for it in the code path.
|
|
* rickard/garbage_collect/OTP-11388:
Parallel check_process_code when code_server purge a module
Functionality for disabling garbage collection
Use asynchronous check_process_code in code_parallel_SUITE
Execution of system tasks in context of another process
Conflicts:
bootstrap/lib/kernel/ebin/hipe_unified_loader.beam
erts/preloaded/ebin/erlang.beam
erts/preloaded/ebin/erts_internal.beam
|
|
When the code_server is about to purge a module it will now issue
asynchronous check_process_code requests to all processes at once
instead of one at a time. These check_process_code operation can
execute in parallel.
|