Diffstat (limited to 'system/doc/efficiency_guide/profiling.xml')
-rw-r--r-- system/doc/efficiency_guide/profiling.xml 258
1 files changed, 258 insertions, 0 deletions
diff --git a/system/doc/efficiency_guide/profiling.xml b/system/doc/efficiency_guide/profiling.xml
new file mode 100644
index 0000000000..65d13408bc
--- /dev/null
+++ b/system/doc/efficiency_guide/profiling.xml
@@ -0,0 +1,258 @@
+<?xml version="1.0" encoding="latin1" ?>
+<!DOCTYPE chapter SYSTEM "chapter.dtd">
+
+<chapter>
+ <header>
+ <copyright>
+ <year>2001</year><year>2009</year>
+ <holder>Ericsson AB. All Rights Reserved.</holder>
+ </copyright>
+ <legalnotice>
+ The contents of this file are subject to the Erlang Public License,
+ Version 1.1, (the "License"); you may not use this file except in
+ compliance with the License. You should have received a copy of the
+ Erlang Public License along with this software. If not, it can be
+ retrieved online at http://www.erlang.org/.
+
+ Software distributed under the License is distributed on an "AS IS"
+ basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See
+ the License for the specific language governing rights and limitations
+ under the License.
+
+ </legalnotice>
+
+ <title>Profiling</title>
+ <prepared>Ingela Anderton</prepared>
+ <docno></docno>
+ <date>2001-11-02</date>
+ <rev></rev>
+ <file>profiling.xml</file>
+ </header>
+
+ <section>
+ <title>Do not guess about performance - profile</title>
+
+ <p>Even experienced software developers often guess wrong about where
+ the performance bottlenecks are in their programs.</p>
+
+ <p>Therefore, profile your program to see where the performance
+ bottlenecks are and concentrate on optimizing them.</p>
+
+ <p>Erlang/OTP contains several tools to help find bottlenecks.</p>
+
+ <p><c>fprof</c> and <c>eprof</c> provide the most detailed information
+ about where the time is spent, but they significantly slow down the
+ programs they profile.</p>
+
+ <p>If the program is too big to be profiled by <c>fprof</c> or <c>eprof</c>,
+ <c>cover</c> and <c>cprof</c> could be used to locate parts of the
+ code that should be more thoroughly profiled using <c>fprof</c> or
+ <c>eprof</c>.</p>
+
+ <p><c>cover</c> provides execution counts per line per process,
+ with less overhead than <c>fprof</c> or <c>eprof</c>. With some caution,
+ execution counts can be used to locate potential performance bottlenecks.
+ The most lightweight tool is <c>cprof</c>, but it only provides execution
+ counts per function (for all processes, not per process).</p>
+ </section>
+
+ <section>
+ <title>Big systems</title>
+ <p>If you have a big system, it can be useful to start by profiling
+ a simulated and limited scenario. However, bottlenecks
+ tend to appear or cause problems only when
+ many things are going on at the same time and when
+ many nodes are involved. Therefore, it is desirable to also run
+ profiling in a system test plant on a real target system.</p>
+ <p>When your system is big, you do not want to run the profiling
+ tools on the whole system. Instead, concentrate on the processes
+ and modules that you know are central and account for a large part of the
+ execution.</p>
+ </section>
+
+ <section>
+ <title>What to look for</title>
+ <p>When analyzing the result file from the profiling activity,
+ look for functions that are called many
+ times and have a long "own" execution time (time excluding calls
+ to other functions). Functions that are simply called very
+ many times can also be interesting, as even small things add
+ up to quite a bit if they are repeated often. Then
+ ask yourself what you can do to reduce this time. Appropriate
+ questions to ask yourself are:</p>
+ <list type="bulleted">
+ <item>Can I reduce the number of times the function is called?</item>
+ <item>Are there tests that can be run less often if I change
+ the order of tests?</item>
+ <item>Are there redundant tests that can be removed? </item>
+ <item>Is there some expression calculated giving the same result
+ each time? </item>
+ <item>Are there other ways of doing this that are equivalent and
+ more efficient?</item>
+ <item>Can I use another internal data representation to make
+ things more efficient? </item>
+ </list>
+ <p>These questions are not always trivial to answer. You might
+ need to do some benchmarks to back up your theory, to avoid
+ making things slower if your theory is wrong. See <seealso marker="#benchmark">benchmarking</seealso>.</p>
+ </section>
+
+ <section>
+ <title>Tools</title>
+
+ <section>
+ <title>fprof</title>
+ <p><c>fprof</c> measures the execution time for each function,
+ both own time, that is, how much time a function has used for its
+ own execution, and accumulated time, that is, including called
+ functions. The values are displayed per process. You also get
+ to know how many times each function has been
+ called. <c>fprof</c> is based on trace to file in order to
+ minimize runtime performance impact. Using <c>fprof</c> is just a
+ matter of calling a few library functions; see the <c>fprof</c> manual
+ page in the Tools application.</p>
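+ <p>As a minimal sketch of the "few library functions" involved, the
+ module below traces one call with <c>fprof:apply/3</c>, processes the
+ trace with <c>fprof:profile/0</c>, and writes the result with
+ <c>fprof:analyse/1</c>. The module name <c>fprof_sketch</c>, the sorted
+ list used as workload, and the output file name are assumptions made
+ for illustration only.</p>

```erlang
-module(fprof_sketch).
-export([run/0]).

%% Profile a single call with fprof and write the analysis to a file.
%% The workload and the file name "fprof.analysis" are illustrative
%% assumptions, not part of any real system.
run() ->
    Input = lists:seq(10000, 1, -1),
    %% Run the call with tracing enabled. With no options, the trace
    %% is written to the default trace file "fprof.trace".
    Sorted = fprof:apply(lists, sort, [Input]),
    %% Turn the raw trace into profiling data held by the fprof server.
    fprof:profile(),
    %% Write call counts, accumulated time, and own time to a text file.
    ok = fprof:analyse([{dest, "fprof.analysis"}]),
    fprof:stop(),
    length(Sorted).
```

+ <p>Calling <c>fprof_sketch:run()</c> from the shell leaves the analysis
+ in <c>fprof.analysis</c> in the current directory, where it can be
+ inspected with a text editor.</p>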
+ <p><c>fprof</c> was introduced in version R8 of Erlang/OTP. Its
+ predecessor <c>eprof</c>, which is based on the Erlang trace BIFs,
+ is still available; see the <c>eprof</c> manual page in the
+ Tools application. <c>eprof</c> shows how much time has been used by
+ each process, and in which function calls this time has been
+ spent. Time is shown as a percentage of total time, not as
+ absolute time.</p>
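+ <p>A comparable sketch for <c>eprof</c> is shown here. It assumes a
+ reasonably recent Erlang/OTP release in which <c>eprof:profile/1</c>
+ accepts a fun; older releases may only offer the variants that take
+ a root set and a module, function, and argument list. The module name
+ <c>eprof_sketch</c> and the workload are hypothetical.</p>

```erlang
-module(eprof_sketch).
-export([run/0]).

%% Profile one fun with eprof and print the result to the shell.
%% Assumes the eprof server is not already running.
run() ->
    {ok, _Pid} = eprof:start(),
    %% profile/1 runs the fun under tracing and returns its value.
    {ok, Sorted} = eprof:profile(fun() ->
                                     lists:sort(lists:seq(10000, 1, -1))
                                 end),
    %% Print time per function as a percentage of the total.
    eprof:analyze(),
    stopped = eprof:stop(),
    length(Sorted).
```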
+ </section>
+
+ <section>
+ <title>cover</title>
+ <p>The primary use of <c>cover</c> is coverage analysis to verify
+ test cases, making sure that all relevant code is covered.
+ <c>cover</c> counts how many times each executable line of
+ code is executed when a program is run. This is done on a per
+ module basis. This information can also be used to
+ determine what code is run very frequently and could therefore
+ be a candidate for optimization. Using <c>cover</c> is just a matter of
+ calling a few library functions; see the <c>cover</c> manual
+ page in the Tools application.</p>
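+ <p>The following sketch shows one way to obtain per-line execution
+ counts with <c>cover</c>. The module name <c>cover_sketch</c> and the
+ function <c>line_counts/2</c> are hypothetical; the caller supplies the
+ source file to instrument and a fun that exercises the code.</p>

```erlang
-module(cover_sketch).
-export([line_counts/2]).

%% Cover-compile SourceFile, run Fun, and return a list of
%% {{Module, Line}, Calls} tuples, one per executable line.
line_counts(SourceFile, Fun) when is_function(Fun, 0) ->
    %% start/0 returns an error tuple if the server already runs,
    %% which is acceptable here, so the result is ignored.
    _ = cover:start(),
    %% Recompile the module with instrumentation and load it.
    {ok, Module} = cover:compile(SourceFile),
    Fun(),
    %% Ask for execution counts (calls) at line level.
    {ok, Counts} = cover:analyse(Module, calls, line),
    cover:stop(),
    Counts.
```

+ <p>Sorting the returned list on the count element gives a rough view of
+ which lines run most often.</p>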
+ </section>
+
+ <section>
+ <title>cprof</title>
+ <p><c>cprof</c> is something in between <c>fprof</c> and
+ <c>cover</c> regarding features. It counts how many times each
+ function is called when the program is run, on a per module
+ basis. <c>cprof</c> causes less performance degradation than
+ <c>fprof</c> and <c>eprof</c> and, unlike <c>cover</c>, does not need to recompile
+ any modules to profile them.</p>
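+ <p>As an illustration, the sketch below counts calls in a single
+ module. The module name <c>cprof_sketch</c> is hypothetical, and the
+ <c>calendar</c> call is only a convenient workload; no particular
+ call count is implied.</p>

```erlang
-module(cprof_sketch).
-export([run/0]).

%% Count calls to functions in the calendar module while one
%% calendar function runs.
run() ->
    %% Enable call counting for all functions in calendar only.
    _Matched = cprof:start(calendar),
    _Day = calendar:day_of_the_week(1896, 4, 27),
    %% Freeze the counters so the analysis itself is not counted.
    cprof:pause(),
    {calendar, Total, FuncCounts} = cprof:analyse(calendar),
    cprof:stop(),
    {Total, FuncCounts}.
```

+ <p>Each element of the returned list has the form
+ <c>{{Module, Function, Arity}, Count}</c>.</p>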
+ </section>
+
+ <section>
+ <title>Tool summary</title>
+ <table>
+ <row>
+ <cell align="center" valign="middle">Tool</cell>
+ <cell align="center" valign="middle">Results</cell>
+ <cell align="center" valign="middle">Size of result</cell>
+ <cell align="center" valign="middle">Effects on program execution time</cell>
+ <cell align="center" valign="middle">Records number of calls</cell>
+ <cell align="center" valign="middle">Records execution time</cell>
+ <cell align="center" valign="middle">Records called by</cell>
+ <cell align="center" valign="middle">Records garbage collection</cell>
+ </row>
+ <row>
+ <cell align="left" valign="middle"><c>fprof </c></cell>
+ <cell align="left" valign="middle">per process to screen/file </cell>
+ <cell align="left" valign="middle">large </cell>
+ <cell align="left" valign="middle">significant slowdown </cell>
+ <cell align="left" valign="middle">yes </cell>
+ <cell align="left" valign="middle">total and own</cell>
+ <cell align="left" valign="middle">yes </cell>
+ <cell align="left" valign="middle">yes </cell>
+ </row>
+ <row>
+ <cell align="left" valign="middle"><c>eprof </c></cell>
+ <cell align="left" valign="middle">per process/function to screen/file </cell>
+ <cell align="left" valign="middle">medium </cell>
+ <cell align="left" valign="middle">significant slowdown </cell>
+ <cell align="left" valign="middle">yes </cell>
+ <cell align="left" valign="middle">only total </cell>
+ <cell align="left" valign="middle">no </cell>
+ <cell align="left" valign="middle">no </cell>
+ </row>
+ <row>
+ <cell align="left" valign="middle"><c>cover </c></cell>
+ <cell align="left" valign="middle">per module to screen/file</cell>
+ <cell align="left" valign="middle">small </cell>
+ <cell align="left" valign="middle">moderate slowdown</cell>
+ <cell align="left" valign="middle">yes, per line </cell>
+ <cell align="left" valign="middle">no </cell>
+ <cell align="left" valign="middle">no </cell>
+ <cell align="left" valign="middle">no </cell>
+ </row>
+ <row>
+ <cell align="left" valign="middle"><c>cprof </c></cell>
+ <cell align="left" valign="middle">per module to caller</cell>
+ <cell align="left" valign="middle">small </cell>
+ <cell align="left" valign="middle">small slowdown </cell>
+ <cell align="left" valign="middle">yes </cell>
+ <cell align="left" valign="middle">no </cell>
+ <cell align="left" valign="middle">no </cell>
+ <cell align="left" valign="middle">no </cell>
+ </row>
+ <tcaption></tcaption>
+ </table>
+ </section>
+ </section>
+
+ <section>
+ <marker id="benchmark"></marker>
+ <title>Benchmarking</title>
+
+ <p>The main purpose of benchmarking is to find out which
+ implementation of a given algorithm or function is the fastest.
+ Benchmarking is far from an exact science. Today's operating systems
+ generally run background tasks that are difficult to turn off.
+ Caches and multiple CPU cores do not make it any easier.
+ It would be best to run Unix computers in single-user mode when
+ benchmarking, but that is inconvenient, to say the least, for casual
+ testing.</p>
+
+ <p>Benchmarks can measure wall-clock time or CPU time.</p>
+
+ <p><seealso marker="stdlib:timer#tc/3">timer:tc/3</seealso> measures
+ wall-clock time. The advantage of wall-clock time is that I/O,
+ swapping, and other activities in the operating-system kernel are
+ included in the measurements. The disadvantage is that
+ the measurements will vary wildly. Usually it is best to run the
+ benchmark several times and note the shortest time, as that time should
+ be the minimum time that is possible to achieve under the best of
+ circumstances.</p>
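+ <p>Running a benchmark several times and keeping the shortest time
+ could be sketched as follows. The module name <c>bench_wall</c> and
+ the function <c>min_time/4</c> are hypothetical helpers, not part of
+ Erlang/OTP.</p>

```erlang
-module(bench_wall).
-export([min_time/4]).

%% Run M:F(A) Runs times and return the shortest wall-clock time
%% in microseconds, as reported by timer:tc/3.
min_time(M, F, A, Runs) when is_integer(Runs), Runs > 0 ->
    Times = lists:map(fun(_) ->
                          {Micros, _Result} = timer:tc(M, F, A),
                          Micros
                      end,
                      lists:seq(1, Runs)),
    lists:min(Times).
```

+ <p>For example, <c>bench_wall:min_time(lists, seq, [1, 1000000], 10)</c>
+ returns the best of ten runs.</p>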
+
+ <p><seealso marker="erts:erlang#statistics/1">statistics/1</seealso>
+ with the argument <c>runtime</c> measures CPU time spent in the Erlang
+ virtual machine. The advantage is that the results are more
+ consistent from run to run. The disadvantage is that the time
+ spent in the operating system kernel (such as swapping and I/O)
+ is not included. Therefore, measuring CPU time is misleading if
+ any I/O (file or socket) is involved.</p>
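+ <p>A CPU-time measurement might look like the sketch below. The module
+ name <c>bench_cpu</c> and the function <c>cpu_time/1</c> are
+ hypothetical. The sketch takes the difference between two readings of
+ the total run time, which <c>statistics(runtime)</c> reports in
+ milliseconds.</p>

```erlang
-module(bench_cpu).
-export([cpu_time/1]).

%% Run Fun and return {Millis, Result}, where Millis is the CPU time
%% consumed by the whole Erlang virtual machine while Fun ran.
%% Other busy processes on the node therefore inflate the figure.
cpu_time(Fun) when is_function(Fun, 0) ->
    {Before, _} = statistics(runtime),
    Result = Fun(),
    {After, _} = statistics(runtime),
    {After - Before, Result}.
```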
+
+ <p>It is probably a good idea to do both wall-clock measurements and
+ CPU time measurements.</p>
+
+ <p>Some additional advice:</p>
+
+ <list type="bulleted">
+ <item>The granularity of both types of measurement can be quite
+ coarse, so make sure that each individual measurement
+ lasts for at least several seconds.</item>
+
+ <item>To make the test fair, each new test run should run in its own,
+ newly created Erlang process. Otherwise, if all tests run in the
+ same process, the later tests start out with larger heap sizes
+ and therefore probably do fewer garbage collections. You could
+ also consider restarting the Erlang emulator between each test.</item>
+
+ <item>Do not assume that the fastest implementation of a given algorithm
+ on computer architecture X is also the fastest on computer architecture Y.</item>
+
+ </list>
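+ <p>The advice about running each test in its own, newly created process
+ could be followed with a helper along these lines. The module name
+ <c>bench_fresh</c> and the function <c>run/1</c> are hypothetical; the
+ sketch assumes a release in which <c>timer:tc/1</c> accepts a fun.</p>

```erlang
-module(bench_fresh).
-export([run/1]).

%% Time Fun in a freshly spawned process, so that every run starts
%% with a new, small heap. Returns wall-clock microseconds.
run(Fun) when is_function(Fun, 0) ->
    Parent = self(),
    Ref = make_ref(),
    {Pid, MRef} = spawn_monitor(fun() ->
                                    {T, _Result} = timer:tc(Fun),
                                    Parent ! {Ref, T}
                                end),
    receive
        {Ref, Micros} ->
            %% Drop the monitor and any pending 'DOWN' message.
            erlang:demonitor(MRef, [flush]),
            Micros;
        {'DOWN', MRef, process, Pid, Reason} ->
            %% The benchmarked fun crashed before reporting a time.
            erlang:error({benchmark_failed, Reason})
    end.
```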
+ </section>
+</chapter>
+