Diffstat (limited to 'erts/emulator/internal_doc')
-rw-r--r--  erts/emulator/internal_doc/CarrierMigration.md      | 46
-rw-r--r--  erts/emulator/internal_doc/CodeLoading.md           |  6
-rw-r--r--  erts/emulator/internal_doc/CountingInstructions.md  | 53
-rw-r--r--  erts/emulator/internal_doc/GarbageCollection.md     | 53
-rw-r--r--  erts/emulator/internal_doc/PTables.md               |  4
-rw-r--r--  erts/emulator/internal_doc/SuperCarrier.md          | 10
-rw-r--r--  erts/emulator/internal_doc/Tracing.md               |  6
-rw-r--r--  erts/emulator/internal_doc/beam_makeops.md          |  7
8 files changed, 112 insertions, 73 deletions
diff --git a/erts/emulator/internal_doc/CarrierMigration.md b/erts/emulator/internal_doc/CarrierMigration.md
index 3a796d11b7..40f6031ca8 100644
--- a/erts/emulator/internal_doc/CarrierMigration.md
+++ b/erts/emulator/internal_doc/CarrierMigration.md
@@ -1,6 +1,9 @@
Carrier Migration
=================
+Introduction
+------------
+
The ERTS memory allocators manage memory blocks in two types of raw
memory chunks. We call these chunks of raw memory
*carriers*. Single-block carriers which only contain one large block,
@@ -34,8 +37,7 @@ Solution
--------
In order to prevent scenarios like this we've implemented support for
-migration of multi-block carriers between allocator instances of the
-same type.
+migration of multi-block carriers between allocator instances.
### Management of Free Blocks ###
@@ -130,10 +132,6 @@ threads may have references to it via the pool.
### Migration ###
-There exists one pool for each allocator type enabling migration of
-carriers between scheduler specific allocator instances of the same
-allocator type.
-
Each allocator instance keeps track of the current utilization of its
multi-block carriers. When the total utilization falls below the "abandon
carrier utilization limit" it starts to inspect the utilization of the
@@ -146,11 +144,11 @@ Since the carrier has been unlinked from the data structure of
available free blocks, no more allocations will be made in the
carrier.
-The allocator instance that created a carrier is called its **owner**.
+The allocator instance that created a carrier is called its *owner*.
Ownership never changes.
The allocator instance that has the responsibility to perform deallocations in a
-carrier is called its **employer**. The employer may also perform allocations if
+carrier is called its *employer*. The employer may also perform allocations if
the carrier is not in the pool. Employment may change when a carrier is fetched from
or inserted into the pool.
@@ -158,14 +156,14 @@ Deallocations in a carrier, while it remains in the pool, is always performed
the owner. That is, all pooled carriers are employed by their owners.
Each carrier has an atomic word containing a pointer to the employing allocator
-instance and three bit flags; IN_POOL, BUSY and HOMECOMING.
+instance and three bit flags: IN\_POOL, BUSY and HOMECOMING.
When fetching a carrier from the pool, employment may change and further
deallocations in the carrier will be redirected to the new
employer using the delayed dealloc functionality.
When a foreign allocator instance abandons a carrier back into the pool, it will
-also pass it back to its **owner** using the delayed dealloc queue. When doing
+also pass it back to its *owner* using the delayed dealloc queue. When doing
this it will set the HOMECOMING bit flag to mark it as "enqueued". The owner
will later clear the HOMECOMING bit when the carrier is dequeued. This mechanism
prevents a carrier from being enqueued again before it has been dequeued.
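
For illustration, here is a minimal C sketch of this state word, using hypothetical names rather than the actual erl\_alloc\_util code. Allocator structs are assumed to be at least 8-byte aligned, so the three low pointer bits are free to hold the flags:

    /* Hypothetical sketch: employer pointer and flags in one atomic word. */
    #include <stdatomic.h>
    #include <stdint.h>

    #define CRR_IN_POOL    ((uintptr_t)1 << 0)
    #define CRR_BUSY       ((uintptr_t)1 << 1)
    #define CRR_HOMECOMING ((uintptr_t)1 << 2)
    #define CRR_FLG_MASK   (CRR_IN_POOL | CRR_BUSY | CRR_HOMECOMING)

    typedef struct Allctr Allctr;               /* an allocator instance */
    typedef struct { atomic_uintptr_t state; } Carrier;

    static Allctr *crr_employer(Carrier *crr)
    {
        uintptr_t s = atomic_load_explicit(&crr->state, memory_order_acquire);
        return (Allctr *)(s & ~CRR_FLG_MASK);
    }

    /* Set HOMECOMING when passing an abandoned carrier back to its owner.
     * Returning 0 when the bit is already set is exactly what prevents a
     * carrier from being enqueued twice before it has been dequeued. */
    static int crr_mark_homecoming(Carrier *crr, Allctr *owner)
    {
        uintptr_t old = atomic_load_explicit(&crr->state, memory_order_relaxed);
        uintptr_t new;
        do {
            if (old & CRR_HOMECOMING)
                return 0;                       /* already enqueued */
            new = ((uintptr_t)owner) | (old & CRR_FLG_MASK) | CRR_HOMECOMING;
        } while (!atomic_compare_exchange_weak_explicit(&crr->state, &old, new,
                     memory_order_acq_rel, memory_order_relaxed));
        return 1;
    }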
@@ -185,14 +183,14 @@ back to the owner for deallocation using the delayed dealloc functionality.
In short:
-* The allocator instance that created a carrier **owns** it.
-* An empty carrier is always deallocated by its **owner**.
-* **Ownership** never changes.
-* The allocator instance that uses a carrier **employs** it.
-* An **employer** can abandon a carrier into the pool.
+* The allocator instance that created a carrier *owns* it.
+* An empty carrier is always deallocated by its *owner*.
+* *Ownership* never changes.
+* The allocator instance that uses a carrier *employs* it.
+* An *employer* can abandon a carrier into the pool.
* Pooled carriers are not allocated from.
-* Pooled carriers are always **employed** by their **owner**.
-* **Employment** can only change from **owner** to a foreign allocator
+* Pooled carriers are always *employed* by their *owner*.
+* *Employment* can only change from *owner* to a foreign allocator
when a carrier is fetched from the pool.
@@ -208,8 +206,8 @@ limited. We only inspect a limited number of carriers. If none of
those carriers had a free block large enough to satisfy the allocation
request, the search will fail. A carrier in the pool can also be BUSY
if another thread is currently doing block deallocation work on the
-carrier. A BUSY carrier will also be skipped by the search as it can
-not satisfy the request. The pool is lock-free and we do not want to
+carrier. A BUSY carrier will also be skipped by the search as it cannot
+satisfy the request. The pool is lock-free and we do not want to
block, waiting for the other thread to finish.
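
A hedged sketch of that bounded search, again with hypothetical names; the real pool is a lock-free linked structure and the claiming step is more involved:

    /* Hypothetical sketch of the bounded pool search. */
    #include <stddef.h>
    #include <stdatomic.h>
    #include <stdint.h>

    #define CPOOL_BUSY   ((uintptr_t)1 << 1)    /* flag bit from above */
    #define SEARCH_LIMIT 10                     /* assumed inspection limit */

    typedef struct PooledCarrier {
        atomic_uintptr_t state;                 /* employer pointer | flags */
        size_t largest_free_block;
        struct PooledCarrier *next;             /* simplified pool link */
    } PooledCarrier;

    /* Inspect at most SEARCH_LIMIT carriers. BUSY carriers are skipped:
     * the pool is lock-free, so waiting for the deallocating thread is
     * not an option. The search may fail even though a fitting carrier
     * sits further down the list. */
    PooledCarrier *cpool_fetch(PooledCarrier *head, size_t want)
    {
        PooledCarrier *crr = head;
        int i;
        for (i = 0; i < SEARCH_LIMIT && crr != NULL; i++, crr = crr->next) {
            uintptr_t s = atomic_load_explicit(&crr->state,
                                               memory_order_acquire);
            if (s & CPOOL_BUSY)
                continue;                       /* cannot satisfy the request */
            if (crr->largest_free_block >= want)
                return crr;                     /* caller claims employment */
        }
        return NULL;                            /* bounded search failed */
    }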
### The bad cluster problem ###
@@ -234,7 +232,7 @@ carrier. When the cluster gets to the same size as the search limit,
all searches will essentially fail.
To counter the "bad cluster" problem and also ease the contention, the
-search will now always start by first looking at the allocators **own**
+search will now always start by first looking at the allocator's *own*
carriers. That is, carriers that were initially created by the
allocator itself and later had been abandoned to the pool. If none of
our own abandoned carrier would do, then the search continues into the
@@ -287,11 +285,3 @@ reduced using the `aoffcbf` strategy. A trade off between memory
consumption and performance is however inevitable, and it is up to
the user to decide what is most important.
-Further work
-------------
-
-It would be quite easy to extend this to allow migration of multi-block
-carriers between all allocator types. More or less the only obstacle
-is maintenance of the statistics information.
-
-
diff --git a/erts/emulator/internal_doc/CodeLoading.md b/erts/emulator/internal_doc/CodeLoading.md
index 151b9cd57c..0b2e3070e7 100644
--- a/erts/emulator/internal_doc/CodeLoading.md
+++ b/erts/emulator/internal_doc/CodeLoading.md
@@ -45,7 +45,7 @@ free to schedule other work while the second loader is waiting. (See
`erts_release_code_write_permission`).
The ability to prepare several modules in parallel is not currently
-used as almost all code loading is serialized by the code_server
+used as almost all code loading is serialized by the code\_server
process. The BIF interface is however prepared for this.
erlang:prepare_loading(Module, Code) -> LoaderState
@@ -71,8 +71,8 @@ structures. These *code access structures* are
* Export table. One entry for every exported function.
* Module table. One entry for each loaded module.
-* "beam_catches". Identifies jump destinations for catch instructions.
-* "beam_ranges". Map code address to function and line in source file.
+* "beam\_catches". Identifies jump destinations for catch instructions.
+* "beam\_ranges". Map code address to function and line in source file.
The most frequently used of these structures is the export table that
is accessed in run time for every executed external function call to
diff --git a/erts/emulator/internal_doc/CountingInstructions.md b/erts/emulator/internal_doc/CountingInstructions.md
new file mode 100644
index 0000000000..d4b1213d00
--- /dev/null
+++ b/erts/emulator/internal_doc/CountingInstructions.md
@@ -0,0 +1,53 @@
+Counting Instructions
+=====================
+
+Here is an example that shows how to count how many times each
+instruction is executed:
+
+ $ (cd erts/emulator && make icount)
+ MAKE icount
+ make[1]: Entering directory `/home/uabbgus/otp/erts/emulator'
+ .
+ .
+ .
+ make[1]: Leaving directory `/home/uabbgus/otp/erts/emulator'
+ $ cat t.erl
+ -module(t).
+ -compile([export_all,nowarn_export_all]).
+
+ count() ->
+ erts_debug:ic(fun benchmark/0).
+
+ benchmark() ->
+ %% Run dialyzer.
+ Root = code:root_dir(),
+ Wc1 = filename:join(Root, "lib/{kernel,stdlib}/ebin/*.beam"),
+ Wc2 = filename:join(Root, "erts/preloaded/ebin/*.beam"),
+ Files = filelib:wildcard(Wc1) ++ filelib:wildcard(Wc2),
+ Opts = [{analysis_type,plt_build},{files,Files},{get_warnings,true}],
+ dialyzer:run(Opts).
+ $ $ERL_TOP/bin/cerl -icount
+ Erlang/OTP 22 [RELEASE CANDIDATE 1] [erts-10.2.4] [source-ac0d451] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:1] [hipe] [instruction-counting]
+
+ Eshell V10.2.4 (abort with ^G)
+ 1> c(t).
+ {ok,t}
+ 2> t:count().
+ 0 badarg_j
+ 0 badmatch_x
+ 0 bs_add_jsstx
+ 0 bs_context_to_binary_x
+ .
+ .
+ .
+ 536461394 move_call_last_yfQ
+ 552405176 allocate_tt
+ 619920327 i_is_eq_exact_immed_frc
+ 636419163 is_nonempty_list_allocate_frtt
+ 641859278 i_get_tuple_element_xPx
+ 678196718 move_return_c
+ 786289914 is_tagged_tuple_frAa
+ 865826424 i_call_f
+ Total: 20728870321
+ []
+ 3>
diff --git a/erts/emulator/internal_doc/GarbageCollection.md b/erts/emulator/internal_doc/GarbageCollection.md
index 1d9e3f4160..a1627b3233 100644
--- a/erts/emulator/internal_doc/GarbageCollection.md
+++ b/erts/emulator/internal_doc/GarbageCollection.md
@@ -1,6 +1,6 @@
# Erlang Garbage Collector
-Erlang manages dynamic memory with a [tracing garbage collector](https://en.wikipedia.org/wiki/Tracing_garbage_collection). More precisely a per process generational semi-space copying collector using [Cheney's](#cheney) copy collection algorithm together with a global large object space.
+Erlang manages dynamic memory with a [tracing garbage collector](https://en.wikipedia.org/wiki/Tracing_garbage_collection). More precisely, a per-process generational semi-space copying collector using Cheney's copy collection algorithm together with a global large object space. (See C. J. Cheney in [References](#references).)
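
To show the idea behind the algorithm, here is a toy C model of Cheney's collector over a simplified record heap. It is only a sketch: the object layout, tagging and forwarding encoding are all invented for the example, not the ERTS implementation.

    /* Toy Cheney model. Objects are records: word 0 holds the field
     * count, followed by the fields. A field is either a tagged small
     * integer (low bit set) or a pointer to another record. */
    #include <stdint.h>
    #include <string.h>

    typedef uintptr_t word;
    #define FWD_BIT     ((word)1 << (8 * sizeof(word) - 1))
    #define IS_IMMED(w) ((w) & 1)       /* tagged integer, not a pointer */

    static word *copy_obj(word *obj, word **top)
    {
        if (obj[0] & FWD_BIT)           /* move marker: already copied */
            return (word *)(obj[0] & ~FWD_BIT);
        word n = obj[0];
        word *copy = *top;
        memcpy(copy, obj, (n + 1) * sizeof(word));
        *top += n + 1;
        obj[0] = FWD_BIT | (word)copy;  /* destructive move marker */
        return copy;
    }

    /* Copy everything reachable from the roots into to_space. The scan
     * pointer chasing the allocation pointer is what makes the traversal
     * iterative: no recursion and no auxiliary stack. Returns new top. */
    word *cheney(word **roots, int nroots, word *to_space)
    {
        word *top = to_space, *scan = to_space;
        for (int i = 0; i < nroots; i++)
            roots[i] = copy_obj(roots[i], &top);
        while (scan < top) {
            word n = *scan++;           /* header: field count */
            for (word i = 0; i < n; i++, scan++)
                if (!IS_IMMED(*scan))
                    *scan = (word)copy_obj((word *)*scan, &top);
        }
        return top;
    }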
## Overview
@@ -12,12 +12,11 @@ Terms are created on the heap by evaluating expressions. There are two major types
Let's look at an example that returns a tuple with the newly created data.
-```erlang
-data(Foo) ->
- Cons = [42|Foo],
- Literal = {text, "hello world!"},
- {tag, Cons, Literal}.
-```
+
+ data(Foo) ->
+ Cons = [42|Foo],
+ Literal = {text, "hello world!"},
+ {tag, Cons, Literal}.
In this example we first create a new cons cell with an integer and a tuple with some text. Then a tuple of size three wrapping the other values with an atom tag is created and returned.
@@ -25,7 +24,6 @@ On the heap tuples require a word size for each of its elements as well as for t
Compiling this code to beam assembly (`erlc -S`) shows exactly what is happening.
-```erlang
...
{test_heap,6,1}.
{put_list,{integer,42},{x,0},{x,1}}.
@@ -34,9 +32,8 @@ Compiling this code to beam assembly (`erlc -S`) shows exactly what is happening
{put,{x,1}}.
{put,{literal,{text,"hello world!"}}}.
return.
-```
-Looking at the assembler code we can see three things; The heap requirement in this function turns out to be only six words, as seen by the `{test_heap,6,1}` instruction. All the allocations are combined to a single instruction. The bulk of the data `{text, "hello world!"}` is a *literal*. Literals, sometimes referred to as constants, are not allocated in the function since they are a part of the module and allocated at load time.
+Looking at the assembler code we can see three things: The heap requirement in this function turns out to be only six words, as seen by the `{test_heap,6,1}` instruction. All the allocations are combined to a single instruction. The bulk of the data `{text, "hello world!"}` is a *literal*. Literals, sometimes referred to as constants, are not allocated in the function since they are a part of the module and allocated at load time.
If there is not enough space available on the heap to satisfy the `test_heap` instruction's request for memory, then a garbage collection is initiated. It may happen immediately in the `test_heap` instruction, or it can be delayed until a later time depending on what state the process is in. If the garbage collection is delayed, any memory needed will be allocated in heap fragments. Heap fragments are extra memory blocks that are a part of the young heap, but are not allocated in the contiguous area where terms normally reside. See [The young heap](#the-young-heap) for more details.
@@ -50,11 +47,9 @@ It follows all the pointers from the root-set to the heap and copies each term w
After the header word has been copied, a [*move marker*](https://github.com/erlang/otp/blob/OTP-18.0/erts/emulator/beam/erl_gc.h#L45-L46) is destructively placed in it, pointing to the term in the *to space*. Any other term that points to the already moved term will [see this move marker](https://github.com/erlang/otp/blob/OTP-18.0/erts/emulator/beam/erl_gc.c#L1125) and copy the referring pointer instead. For example, if we have the following Erlang code:
-```erlang
-foo(Arg) ->
- T = {test, Arg},
- {wrapper, T, T, T}.
-```
+ foo(Arg) ->
+ T = {test, Arg},
+ {wrapper, T, T, T}.
Only one copy of T exists on the heap and during the garbage collection only the first time T is encountered will it be copied.
@@ -86,15 +81,15 @@ In the next garbage collection, any pointers to the old heap will be ignored and
Generational garbage collection aims to increase performance at the expense of memory. This is achieved because only the young, smaller, heap is considered in most garbage collections.
-The generational [hypothesis](#ungar) predicts that most terms tend to die young, and for an immutable language such as Erlang, young terms die even faster than in other languages. So for most usage patterns the data in the new heap will die very soon after it is allocated. This is good because it limits the amount of data copied to the old heap and also because the garbage collection algorithm used is proportional to the amount of live data on the heap.
+The generational hypothesis predicts that most terms tend to die young (see D. Ungar in [References](#references)), and for an immutable language such as Erlang, young terms die even faster than in other languages. So for most usage patterns the data in the new heap will die very soon after it is allocated. This is good because it limits the amount of data copied to the old heap and also because the garbage collection algorithm used is proportional to the amount of live data on the heap.
One critical issue to note here is that any term on the young heap can reference terms on the old heap but *no* term on the old heap may refer to a term on the young heap. This is due to the nature of the copy algorithm. Anything referenced by an old heap term is not included in the reference tree, root-set and its followers, and hence is not copied. If it was, the data would be lost, fire and brimstone would rise to cover the earth. Fortunately, this comes naturally for Erlang because the terms are immutable and thus there can be no pointers modified on the old heap to point to the young heap.
-To reclaim data from the old heap, both young and old heaps are included during the collection and copied to a common *to space*. Both the *from space* of the young and old heap are then deallocated and the procedure will start over from the beginning. This type of garbage collection is called a full sweep and is triggered when the size of the area under the high-watermark is larger than the size of the free area of the old heap. It can also be triggered by doing a manual call to [erlang:garbage_collect()](http://erlang.org/doc/man/erlang.html#garbage_collect-0), or by running into the young garbage collection limit set by [spawn_opt(fun(),[{fullsweep_after, N}])](http://erlang.org/doc/man/erlang.html#spawn_opt-4) where N is the number of young garbage collections to do before forcing a garbage collection of both young and old heap.
+To reclaim data from the old heap, both young and old heaps are included during the collection and copied to a common *to space*. Both the *from space* of the young and old heap are then deallocated and the procedure will start over from the beginning. This type of garbage collection is called a full sweep and is triggered when the size of the area under the high-watermark is larger than the size of the free area of the old heap. It can also be triggered by doing a manual call to [erlang:garbage_collect()](http://erlang.org/doc/man/erlang.html#garbage_collect-0), or by running into the young garbage collection limit set by [spawn\_opt(fun(),[{fullsweep\_after, N}\])](http://erlang.org/doc/man/erlang.html#spawn_opt-4) where N is the number of young garbage collections to do before forcing a garbage collection of both young and old heap.
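
As a hedged sketch, the full-sweep trigger described above might be expressed like this, with invented field names standing in for the process heap bookkeeping:

    /* Hypothetical sketch of the full-sweep trigger condition. */
    #include <stddef.h>

    typedef struct {
        char *heap, *high_water;       /* young heap start and watermark */
        char *old_htop, *old_hend;     /* old heap top and end */
        unsigned gen_gcs, max_gen_gcs; /* minor GCs done / fullsweep_after */
    } ProcHeap;

    static int need_fullsweep(const ProcHeap *p)
    {
        size_t under_watermark = (size_t)(p->high_water - p->heap);
        size_t old_free = (size_t)(p->old_hend - p->old_htop);
        return under_watermark > old_free      /* old heap cannot absorb it */
            || p->gen_gcs >= p->max_gen_gcs;   /* fullsweep_after reached */
    }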
## The young heap
-The young heap, or the allocation heap, consists of the stack and heap as described in the Overview. However, it also includes any heap fragments that are attached to the heap. All of the heap fragments are considered to be above the high-watermark and part of the young generation. Heap fragments contain terms that either did not fit on the heap, or were created by another process and then attached to the heap. For instance if the bif binary_to_term created a term which does not fit on the current heap without doing a garbage collection, it will create a heap-fragment for the term and then schedule a garbage collection for later. Also if a message is sent to the process, the payload may be placed in a heap-fragment and that fragment is added to young heap when the message is matched in a receive clause.
+The young heap, or the allocation heap, consists of the stack and heap as described in the Overview. However, it also includes any heap fragments that are attached to the heap. All of the heap fragments are considered to be above the high-watermark and part of the young generation. Heap fragments contain terms that either did not fit on the heap, or were created by another process and then attached to the heap. For instance, if the BIF `binary_to_term/1` creates a term which does not fit on the current heap without doing a garbage collection, it will create a heap fragment for the term and then schedule a garbage collection for later. Also, if a message is sent to the process, the payload may be placed in a heap fragment and that fragment is added to the young heap when the message is matched in a receive clause.
This procedure differs from how it worked prior to Erlang/OTP 19.0. Before 19.0, only a contiguous memory block where the young heap and stack resided was considered to be part of the young heap. Heap fragments and messages were immediately copied into the young heap before they could be inspected by the Erlang program. The behaviour introduced in 19.0 is superior in many ways - most significantly it reduces the number of necessary copy operations and the root set for garbage collection.
@@ -118,21 +113,19 @@ The old heap is always one step ahead in the heap growth stages than the young heap
When garbage collecting a heap (young or old) all literals are left in place and not copied. To figure out if a term should be copied or not when doing a garbage collection the following pseudo code is used:
-```c
-if (erts_is_literal(ptr) || (on_old_heap(ptr) && !fullsweep)) {
- /* literal or non fullsweep - do not copy */
-} else {
- copy(ptr);
-}
-```
+ if (erts_is_literal(ptr) || (on_old_heap(ptr) && !fullsweep)) {
+ /* literal or non fullsweep - do not copy */
+ } else {
+ copy(ptr);
+ }
The [`erts_is_literal`](https://github.com/erlang/otp/blob/OTP-19.0/erts/emulator/beam/global.h#L1452-L1465) check works differently on different architectures and operating systems.
-On 64 bit systems that allow mapping of unreserved virtual memory areas (most operating systems except Windows), an area of size 1 GB (by default) is mapped and then all literals are placed within that area. Then all that has to be done to determine if something is a literal or not is [two quick pointer checks](https://github.com/erlang/otp/blob/OTP-19.0/erts/emulator/beam/erl_alloc.h#L322-L324). This system relies on the fact that a memory page that has not been touched yet does not take any actual space. So even if 1 GB of virtual memory is mapped, only the memory which is actually needed for literals is allocated in ram. The size of the literal area is configurable through the +MIscs erts_alloc option.
+On 64 bit systems that allow mapping of unreserved virtual memory areas (most operating systems except Windows), an area of size 1 GB (by default) is mapped and then all literals are placed within that area. Then all that has to be done to determine if something is a literal or not is [two quick pointer checks](https://github.com/erlang/otp/blob/OTP-19.0/erts/emulator/beam/erl_alloc.h#L322-L324). This system relies on the fact that a memory page that has not been touched yet does not take any actual space. So even if 1 GB of virtual memory is mapped, only the memory which is actually needed for literals is allocated in RAM. The size of the literal area is configurable through the `+MIscs` erts\_alloc option.
On 32 bit systems, there is not enough virtual memory space to allocate 1 GB for just literals, so instead small 256 KB sized literal regions are created on demand and a card mark bit-array of the entire 32 bit memory space is then used to determine if a term is a literal or not. Since the total memory space is only 32 bits, the card mark bit-array is only 256 words large. On a 64 bit system the same bit-array would have to be 1 tera words large, so this technique is only viable on 32 bit systems. Doing [lookups in the array](https://github.com/erlang/otp/blob/OTP-19.0/erts/emulator/beam/erl_alloc.h#L316-L319) is a little more expensive than just doing the pointer checks that can be done in 64 bit systems, but not extremely so.
-On 64 bit windows, on which erts_alloc cannot do unreserved virtual memory mappings, a [special tag](https://github.com/erlang/otp/blob/OTP-19.0/erts/emulator/beam/erl_term.h#L59) within the Erlang term object is used to determine if something [is a literal or not](https://github.com/erlang/otp/blob/OTP-19.0/erts/emulator/beam/erl_term.h#L248-L252). This is very cheap, however, the tag is only available on 64 bit machines, and it is possible to do a great deal of other nice optimizations with this tag in the future (like for instance a more compact list implementation) so it is not used on operating systems where it is not needed.
+On 64 bit Windows, on which erts\_alloc cannot do unreserved virtual memory mappings, a [special tag](https://github.com/erlang/otp/blob/OTP-19.0/erts/emulator/beam/erl_term.h#L59) within the Erlang term object is used to determine if something [is a literal or not](https://github.com/erlang/otp/blob/OTP-19.0/erts/emulator/beam/erl_term.h#L248-L252). This is very cheap; however, the tag is only available on 64 bit machines, and since it is possible to do a great deal of other nice optimizations with this tag in the future (like, for instance, a more compact list implementation), it is not used on operating systems where it is not needed.
This behaviour is different from how it worked prior to Erlang/OTP 19.0. Before 19.0 the literal check was done by checking if the pointer pointed to the young or old heap block. If it did not, then it was considered a literal. This led to considerable overhead and strange memory usage scenarios, so it was removed in 19.0.
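
On 64 bit systems the check described above thus reduces to something like the following sketch (illustrative names, not the actual erl\_alloc.h macros):

    /* Hypothetical sketch: with all literals placed in one reserved
     * virtual range, the literal test is two pointer comparisons. */
    #include <stdint.h>

    extern char *lit_area_start, *lit_area_end;   /* 1 GB range by default */

    static inline int is_literal(const void *ptr)
    {
        return (const char *)ptr >= lit_area_start
            && (const char *)ptr <  lit_area_end;
    }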
@@ -182,6 +175,8 @@ Using `on_heap` will force all messages to be part of the young heap which will
Which one of these strategies is best depends a lot on what the process is doing and how it interacts with other processes. So, as always, profile the application and see how it behaves with the different options.
- <a name="cheney">[1]</a>: C. J. Cheney. A nonrecursive list compacting algorithm. Commun. ACM, 13(11):677–678, Nov. 1970.
+## References
+
+C. J. Cheney. A nonrecursive list compacting algorithm. Commun. ACM, 13(11):677–678, Nov. 1970.
- <a name="ungar">[2]</a>: D. Ungar. Generation scavenging: A non-disruptive high performance storage reclamation algorithm. SIGSOFT Softw. Eng. Notes, 9(3):157–167, Apr. 1984.
+D. Ungar. Generation scavenging: A non-disruptive high performance storage reclamation algorithm. SIGSOFT Softw. Eng. Notes, 9(3):157–167, Apr. 1984.
diff --git a/erts/emulator/internal_doc/PTables.md b/erts/emulator/internal_doc/PTables.md
index 6fe0e7665d..ef61963a40 100644
--- a/erts/emulator/internal_doc/PTables.md
+++ b/erts/emulator/internal_doc/PTables.md
@@ -85,13 +85,13 @@ following:
3. Depending on use, issue appropriate memory barrier.
A common barrier used is a barrier with acquire semantics. On
- x86/x86_64 this maps to a compiler barrier preventing the compiler
+ x86/x86\_64 this maps to a compiler barrier preventing the compiler
from reordering instructions, but on other hardware often some kind of
light weight hardware memory barrier is also needed.
When comparing with a locked approach, at least one heavy weight
memory barrier will be issued when locking the lock on most, if
- not all, hardware architectures (including x86/x86_64), and often
+ not all, hardware architectures (including x86/x86\_64), and often
some kind of light weight memory barrier will be issued when
unlocking the lock.
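
For illustration, a minimal C11 sketch of step 3, pairing a release store by the writer with an acquire load by the reader; the names are hypothetical, not the erl\_ptab.c API:

    /* Hypothetical sketch: publishing and reading a table slot. */
    #include <stdatomic.h>

    typedef struct process Process;
    #define PTAB_SIZE 1024
    static _Atomic(Process *) ptab[PTAB_SIZE];

    void ptab_publish(int ix, Process *p)
    {
        /* release: everything initialized in *p before this store is
         * visible to any reader that observes the pointer */
        atomic_store_explicit(&ptab[ix], p, memory_order_release);
    }

    Process *ptab_lookup(int ix)
    {
        /* acquire: on x86/x86_64 just a compiler barrier; weaker memory
         * models emit a light weight hardware barrier here */
        return atomic_load_explicit(&ptab[ix], memory_order_acquire);
    }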
diff --git a/erts/emulator/internal_doc/SuperCarrier.md b/erts/emulator/internal_doc/SuperCarrier.md
index acf722ea37..f52c6613d5 100644
--- a/erts/emulator/internal_doc/SuperCarrier.md
+++ b/erts/emulator/internal_doc/SuperCarrier.md
@@ -5,7 +5,7 @@ A super carrier is a large memory area, allocated at VM start, which can
be used during runtime to allocate normal carriers from.
The super carrier feature was introduced in OTP R16B03. It is
-enabled with command line option +MMscs <size in Mb>
+enabled with command line option +MMscs &lt;size in Mb&gt;
and can be configured with other options.
Problem
@@ -65,7 +65,7 @@ carrier is full.
### Implementation ###
-The entire super carrier implementation is kept in erl_mmap.c. The
+The entire super carrier implementation is kept in erl\_mmap.c. The
name suggests that it can be viewed as our own mmap implementation.
A super carrier needs to satisfy two slightly different kinds of
@@ -98,8 +98,8 @@ other.
### Data structures ###
-The MBC area is called **sa** as in super aligned and the SBC area is
-called **sua** as in super un-aligned.
+The MBC area is called *sa* as in super aligned and the SBC area is
+called *sua* as in super un-aligned.
Note that the "super" in super alignment and the "super" in super
carrier have nothing to do with each other. We could have chosen
@@ -128,7 +128,7 @@ down or up.
We need to keep track of all the free segments in order to reuse them
for new carrier allocations. One initial idea was to use the same
mechanism that is used to keep track of free blocks within MBCs
-(alloc_util and the different strategies). However, that would not be
+(alloc\_util and the different strategies). However, that would not be
as straightforward as one might think and can also waste quite a lot of
memory as it uses prepended block headers. The granularity of the
super carrier is one memory page (usually 4kb). We want to allocate
diff --git a/erts/emulator/internal_doc/Tracing.md b/erts/emulator/internal_doc/Tracing.md
index 7f97f64765..196ae0dd4e 100644
--- a/erts/emulator/internal_doc/Tracing.md
+++ b/erts/emulator/internal_doc/Tracing.md
@@ -37,6 +37,7 @@ what different type of break actions that are enabled.
Same Same but Different
-----------------------
+
Even though `trace_pattern` uses the same technique as the non-blocking
code loading with replicated generations of data structures and an
atomic switch, the implementations are quite separate from each
@@ -72,6 +73,7 @@ aligned write operation on all hardware architectures we use.
Adding a new Breakpoint
-----------------------
+
This is a simplified sequence describing what `trace_pattern` goes
through when adding a new breakpoint.
@@ -82,7 +84,7 @@ through when adding a new breakpoint.
instruction word in the breakpoint.
3. Write a pointer to the breakpoint at offset -4 from the first
- instruction "func_info" header.
+ instruction "func\_info" header.
4. Set the staging part of the breakpoint as enabled with specified
breakpoint data.
@@ -139,7 +141,7 @@ and removing breakpoints.
2. Allocate new breakpoint structures with a disabled active part and
the original beam instruction. Write a pointer to the breakpoint in
- "func_info" header at offset -4.
+ "func\_info" header at offset -4.
3. Update the staging part of all affected breakpoints. Disable
breakpoints that are to be removed.
diff --git a/erts/emulator/internal_doc/beam_makeops.md b/erts/emulator/internal_doc/beam_makeops.md
index 1da8d2ab05..2880099b70 100644
--- a/erts/emulator/internal_doc/beam_makeops.md
+++ b/erts/emulator/internal_doc/beam_makeops.md
@@ -403,7 +403,7 @@ A line with `//` is also a comment. It is recommended to only
use this style of comments in files that define implementations of
instructions.
-A long line can be broken into shorter lines by a placing a`\` before
+A long line can be broken into shorter lines by placing a `\` before
the newline.
### Variable definitions ###
@@ -1159,7 +1159,6 @@ implementation of `gen_element()`:
return op;
}
-}
### Defining the implementation ###
@@ -1452,7 +1451,7 @@ optionally additional heap space.
##### The NEXT_INSTRUCTION pre-bound variable #####
-The NEXT_INSTRUCTION is a pre-bound variable that is available in
+The NEXT\_INSTRUCTION is a pre-bound variable that is available in
all instructions. It expands to the address of the next instruction.
Here is an example:
@@ -1545,7 +1544,7 @@ register, the pointer will no longer be valid. (Y registers are
stored on the stack.)
In those circumstances, `$REFRESH_GEN_DEST()` must be invoked
-to set up the pointer again. **beam\_makeops** will notice
+to set up the pointer again. **beam\_makeops** will notice
if there is a call to a function that does a garbage collection and
`$REFRESH_GEN_DEST()` is not called.