CodeCoverage
From APIDesign
"When is there enough code coverage?" was one of the questions asked by my 2005 post about Test Patterns in Java. Here is an extracted answer, prefaced by an executive summary.
There is no such thing as 100% coverage!
Whenever somebody starts arguing about CodeCoverage and especially measuring it, remember:
The state of a program is vast
The state of a system or program is vast. To claim 100% coverage, one would have to test all of it. Hence repeat:
- testing that each method is covered isn't enough
- testing that each line is covered isn't enough
- testing that each branch is visited isn't enough
All Function Arguments
Nobody tests all possible arguments of a function. Yet unless that is done, we cannot claim 100% coverage, because:
- the behavior of a function depends on all the memory values it reads
- every combination of those values would have to be covered
- repeating each test that deals with a long Long.MAX_VALUE times would be prohibitively time consuming
- nobody has time to test it all
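To see why a single passing test proves so little, consider this hypothetical Java method (the class and method names are invented for illustration). One call gives full line coverage, yet the method still breaks on extreme inputs:

```java
public class OverflowDemo {
    // doubles its argument - looks trivially correct
    static long twice(long x) {
        return 2 * x;
    }

    public static void main(String[] args) {
        // a single test gives 100% line coverage of twice()...
        if (twice(3) != 6) {
            throw new AssertionError("twice(3) should be 6");
        }
        // ...yet the method is still wrong for huge inputs:
        // 2 * Long.MAX_VALUE silently overflows to -2
        System.out.println(twice(Long.MAX_VALUE)); // prints -2
    }
}
```

One value out of 2^64 has been checked; the coverage report is green all the same.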
Systems are not Deterministic
Most critical systems are multi-threaded and thus at risk of deadlocks or race conditions. Unless you test for those, you cannot claim 100% coverage:
- Java is not single-threaded - the GC, finalizers, etc. run in parallel by default
- JavaScript gets non-determinism via XMLHttpRequest; NodeJS is asynchronous (i.e. non-deterministic) by default
- simulating all possible race conditions is hard
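A minimal sketch of such a race, using a made-up counter class: two threads increment a plain int field, whose ++ is a non-atomic read-modify-write, so updates can be lost - and no finite number of test runs can prove they never will be:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RaceDemo {
    static int unsafeCount = 0;                      // plain field: count++ is not atomic
    static final AtomicInteger safeCount = new AtomicInteger();

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) {
                unsafeCount++;                       // lost updates are possible here
                safeCount.incrementAndGet();         // atomic, never loses an update
            }
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start(); t2.start();
        t1.join(); t2.join();
        // safeCount is always 200000; unsafeCount is often less, but a
        // lucky run in which it happens to equal 200000 proves nothing
        System.out.println("safe=" + safeCount.get() + " unsafe=" + unsafeCount);
    }
}
```

Every line of this program can be covered by a test that never once hits the losing interleaving.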
Executive Summary
Please memorize and repeat all the time: there is no such thing as 100% coverage!
When are there enough tests?
While writing tests, people may ask: how many of them should be written? The simple answer is to write tests as long as they are useful. The more precise, more complex and less clear answer is covered in this chapter.
There are various tools out there that help to measure test coverage. The NetBeans project selected EMMA for measuring the coverage of our application code by our tests. When invoked, it instruments the application code and runs the automated tests against it. While running, it collects information about all called methods, visited classes and lines, and then shows a summary in a web browser.
Counting coverage by visited methods is a very rough criterion, yet it can be surprisingly hard to get close to 100%. And even if you succeed, there is no guarantee that the resulting application code works correctly. Every method has input parameters, and knowing that it succeeded once with one selection of them says nothing about the other cases.
Much better is to count coverage by branches or lines. When there is a
if (...) { x(); } else { y(); }
statement in the code of your method, you want to be sure that both methods, x() and y(), get called. The EMMA tool supports this, and by helping us ensure that every line is visited, it gives us confidence that our application code contains no useless lines.
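Even full line and branch coverage can miss a bug that only shows up on a particular combination of branches. A contrived Java example (names invented for illustration):

```java
public class PathDemo {
    static int compute(boolean a, boolean b) {
        int d = 1;
        if (a) {
            d = 0;
        }
        if (b) {
            return 10 / d;   // divides by zero only when a && b
        }
        return d;
    }

    public static void main(String[] args) {
        // these two calls visit every line and every branch of compute()
        System.out.println(compute(true, false));  // prints 0
        System.out.println(compute(false, true));  // prints 10
        // yet this combination of the branches still throws ArithmeticException:
        // compute(true, true);
    }
}
```

Branch coverage counts each if separately; it does not require every path through the method, and the buggy path here is exactly the one the two passing tests skip.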
Still, the fact that a line is visited once does not mean that our application code is not buggy.
private int sum = 10;
public void add(int x) { sum += x; }
public int percentage(int howMuch) { return 100 * howMuch / sum; }
It is good if both methods get executed, and fine if we test them with various parameters - still, we can get an error if we call
add(-10);
percentage(5);
because sum will then be zero, and division by zero throws an exception. To be sure that our application is not vulnerable to problems like this, we would have to test each method in each possible state of the memory it depends on (e.g. each value of the sum variable), and that would give us the ultimate proof that our application code works correctly in a single-threaded environment.
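The failing sequence can be captured as a small regression check. This is a self-contained sketch (the class name is invented) of the sum/percentage snippet plus a test that the division by zero really happens:

```java
public class DivisionByZeroDemo {
    private int sum = 10;

    public void add(int x) { sum += x; }

    public int percentage(int howMuch) { return 100 * howMuch / sum; }

    public static void main(String[] args) {
        DivisionByZeroDemo demo = new DivisionByZeroDemo();
        demo.add(-10);              // sum is now 0
        try {
            demo.percentage(5);     // evaluates 100 * 5 / 0
            throw new AssertionError("expected an ArithmeticException");
        } catch (ArithmeticException ex) {
            System.out.println("division by zero, as predicted: " + ex);
        }
    }
}
```

Tests of add() and percentage() in isolation can each pass with full coverage; only a test of this particular call order exposes the bug.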
But there is another problem - Java is not single-threaded. A lot of applications start new threads by themselves, and even if they do not, there is the AWT event dispatch thread, the finalizer thread, etc. So one has to count on some amount of non-determinism. Sometimes the garbage collector just kicks in and removes some "unneeded" objects from memory, which can change the behavior of the application - we used to have a never-ending loop that could be reproduced only when two Mozilla browsers and an Evolution mail client were running, as only then was memory scarce enough to trigger the garbage collector. This kind of coverage is not measurable.
That is why we suggest that people use code coverage tools as a sanity check that something is not left completely untested. But it is necessary to remind ourselves that however high the coverage is, it does not fully prevent our application code from having bugs. So, to help fight the strange moves of an application's amoeba shape, we suggest writing a test whenever something gets broken - when there is a bug report, write a test to verify the fix and prevent regressions. That way the coverage is focused on the code where it matters - the code that really was broken.