Why Test

Copyright 2002-2009 Ericsson AB. All Rights Reserved. The contents of this file are subject to the Erlang Public License, Version 1.1, (the "License"); you may not use this file except in compliance with the License. You should have received a copy of the Erlang Public License along with this software. If not, it can be retrieved online at http://www.erlang.org/. Software distributed under the License is distributed on an "AS IS" basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License for the specific language governing rights and limitations under the License.

Siri Hansen

Goals

Testing cannot prove that a program is correct. On the contrary, it has been formally proven that testing cannot, in general, prove programs correct. Formal program proofs or careful examination of the code may be viable options for those who wish to certify that a program is correct. The test server, being based on testing, cannot be used for certification. Its intended use is instead to find bugs cost-effectively. A successful test suite is one that reveals a bug. If a test suite results in Ok, we know very little that we didn't know before.

What to test?

There are many kinds of test suites. Some concentrate on calling every function in the interface to some module or server. Others do the same, but use all kinds of illegal parameters and verify that the server stays alive and rejects the requests with reasonable error codes. Some test suites simulate an application (typically using a few of the application's modules), some try tricky requests in general, and some even test internal functions.
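The illegal-parameter style of suite described above can be sketched as follows. This is a minimal, language-neutral illustration in Python, not test server code: `CounterServer`, its `handle` function, the reply tuples, and the `ILLEGAL` list are all hypothetical names invented for the example.

```python
# Hypothetical stand-in for a server under test (not a real test server API).
class CounterServer:
    def __init__(self):
        self.total = 0

    def handle(self, request):
        """Return ("ok", total) for a valid request, ("error", reason) otherwise."""
        if not isinstance(request, tuple) or len(request) != 2:
            return ("error", "badarg")
        op, amount = request
        if op != "add":
            return ("error", "unknown_op")
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(amount, bool) or not isinstance(amount, int):
            return ("error", "badarg")
        self.total += amount
        return ("ok", self.total)


def check_illegal_requests(server, illegal_requests):
    """Send each illegal request and collect those that were not cleanly rejected."""
    failures = []
    for request in illegal_requests:
        try:
            reply = server.handle(request)
        except Exception as exc:
            # A crash on bad input counts as a found bug.
            failures.append((request, repr(exc)))
            continue
        if not (isinstance(reply, tuple) and reply[0] == "error"):
            failures.append((request, reply))
    return failures


# Assumed examples of illegal input for this hypothetical interface.
ILLEGAL = [None, 42, ("add",), ("add", "1"), ("add", None),
           ("delete", 1), ("add", 1, 2), ("add", True)]
```

A test case would assert that `check_illegal_requests` returns an empty list and then send one valid request, such as `("add", 5)`, to confirm that the server is still alive and its state is intact.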

Another interesting category of test suites is those that check that fixed bugs don't recur. When a bug fix is introduced, a test case that checks for that specific bug should be written and submitted to the affected test suite(s).
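As a rough illustration of such a regression test case, the Python sketch below assumes a hypothetical function `chunk` and a hypothetical bug report (an earlier version dropping the last, shorter chunk). Neither comes from OTP; the point is only that the test pins down the exact input that once failed.

```python
def chunk(items, size):
    """Split items into consecutive lists of at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]


def test_chunk_keeps_short_final_chunk():
    # Regression test for a hypothetical bug: an earlier version dropped
    # the final chunk when len(items) was not a multiple of size.
    assert chunk([1, 2, 3, 4, 5], 2) == [[1, 2], [3, 4], [5]]
```

Keeping the test small and naming it after the bug makes it clear, if it ever fails again, which fix has been undone.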

Aim for finding bugs. Write whichever test has the highest probability of finding a bug, now or in the future. Concentrate more on the critical parts. Bugs in critical subsystems are a lot more expensive than others.

Aim for functionality testing rather than implementation details. Implementation details change quite often, and the test suites should be long lived. Often implementation details differ on different platforms and versions. If implementation details have to be tested, try to factor them out into separate test cases. Later on these test cases may be rewritten, or just skipped.

Also, aim for testing everything once, no less, no more. It's not effective having every test case fail just because one function in the interface changed.

How much to test

There is a Unix shell script that counts the non-comment words (and lines and characters) of source code in each application's test directory and divides the result by the corresponding count for the src directory. The ratio is a measure of how much test code there is.
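The script itself is not reproduced here. The following Python sketch shows one way such a measure could be computed for the word count only. It rests on several assumptions: source files end in `.erl`, a comment starts at the first `%` on a line (which is naive, since a `%` inside a string literal is also treated as a comment), and the function names are made up for this example.

```python
import os


def count_source_words(directory, suffix=".erl", comment_char="%"):
    """Count whitespace-separated words outside comments in files under directory.

    Naive assumption: everything from the first comment_char on a line is a
    comment, even if the character sits inside a string literal.
    """
    total = 0
    for root, _dirs, files in os.walk(directory):
        for name in files:
            if not name.endswith(suffix):
                continue
            path = os.path.join(root, name)
            with open(path, encoding="utf-8", errors="replace") as source:
                for line in source:
                    code = line.split(comment_char, 1)[0]
                    total += len(code.split())
    return total


def ratio_of_test_to_source(app_dir):
    """Return test words divided by src words, or None if src has no words."""
    test_words = count_source_words(os.path.join(app_dir, "test"))
    src_words = count_source_words(os.path.join(app_dir, "src"))
    if src_words == 0:
        return None
    return test_words / src_words
```

A ratio near 1.0 would mean about as much test code as production code.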

There has been much debate over how much test code, compared to production code, should be written in a project. More test code finds more bugs, but test code needs to be maintained just like the production code, and it's expensive to write it in the first place. In several articles from relatively mature software organizations that I have read, the amount of test code has been about the same as the production code.

In OTP, at the time of writing, few applications come even close to this; some have no test code at all.

Full coverage

It is possible to cover compile the modules being tested before running the test suites. Doing so shows which branches of the code are exercised by the test suite and which are not. Many use this as a measure of a good test suite: by that measure, the test suite is finished when every single line of source code is covered once.

A coverage of 100% still proves nothing, though. It doesn't mean that the code is error free or that everything has been tested. For instance, a function that contains a division has to be executed at least twice: once with parameters that cause division by zero, and once with other parameters.
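The division case can be made concrete with a small Python sketch. The function `average_speed` and the suite `run_suite` are hypothetical names chosen for illustration.

```python
def average_speed(distance, seconds):
    """Return distance divided by elapsed seconds."""
    return distance / seconds


def run_suite():
    # This single call executes every line of average_speed, so a line
    # coverage tool would report 100% for the function.
    assert average_speed(100, 20) == 5
```

The suite passes with full line coverage, yet `average_speed(100, 0)` raises `ZeroDivisionError`. Only a second test case with a zero divisor reveals whether that behaviour is what callers expect.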

A high degree of coverage is good, of course; it means that no major parts of the code have been left untested. Whether it is cost effective is another question. You're only likely to find 50% more bugs when going from 67% to 100% coverage, but the work (cost) is maybe 200% as large, or more, because reaching all of those obscure branches is usually complicated.
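The cost argument can be worked through with the figures above. The Python sketch below assumes that "200% as large" means the total work at 100% coverage is twice the work needed to reach 67%, and it normalises both the bugs found and the work at 67% coverage to 1.0. The numbers are the rough estimates from the text, not measurements.

```python
def bugs_per_unit_cost(bugs, cost):
    """Return the number of bugs found per unit of work."""
    return bugs / cost


# Normalised baseline: reaching 67% coverage finds 1.0 bugs for 1.0 work.
rate_at_67 = bugs_per_unit_cost(1.0, 1.0)

# Reaching 100% coverage: 50% more bugs, twice the total work (assumed reading).
rate_at_100 = bugs_per_unit_cost(1.5, 2.0)

# The extra stretch from 67% to 100% taken on its own.
marginal_rate = bugs_per_unit_cost(1.5 - 1.0, 2.0 - 1.0)
```

Under these assumptions the overall yield drops from 1.0 to 0.75 bugs per unit of work, and the last stretch on its own yields only 0.5, half the rate of the first stretch.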

Again, the reason for testing with the test server is to find bugs, not to create certificates of valid code. Maximizing the number of bugs found per hour probably means not going for 100% coverage. For some modules the optimum may be 70%; for others it may be 250%, in the sense that lines need to be exercised more than once with different input, as with the division above. 100% shouldn't be a goal in itself.

User interface testing

It is very difficult to do sensible testing of user interfaces, especially the graphic ones. The test server has some support for capturing the text I/O that goes to the user, but none for graphics. There are several tools on the market that help with this.