Defect testing

Goals:
- to detect as many defects as possible (preferably all), or
- to detect the most damaging defects (preferably all)
by testing (i.e. executing code), not by other means (inspections, formal proofs).

Problems:
1. Which input data should be tested (all inputs is almost never possible)?
2. How do we know that the output is correct (oracle problem)?
Most of this note is about the first problem.

Note: the input does not have to be "correct".
It is important to test the proper response to invalid inputs.
Requirements need to state what this response is!

Most of this note is about testing medium small, sequential components.
In the end I briefly address the other issues.

Two approaches:
Black-box testing: the source code is not considered (maybe even not known).
Glass-box testing: the tests are chosen based on the source code.

Black-box testing

A useful technique is partition testing, see Sommerville 8.1.2 (23.3.2). 9th ed. (8th ed.)
- Partitions are based on input and output equivalences.
- Combination of typical values (in "the middle" of a partition) and boundary values (on the edge of a partition).
- Invalid inputs are also partitions, and should be handled according to the specification.

Example: in a drawing program, objects (partly) obscuring others. ...

Example: a procedure that sorts a list.
Classes (at least one test case must be chosen from each class) can be
lenght of list:
- empty list: boundary value
- list with one element: boundary value
- list with some "typical" number of elements
- list with extremely many elements: boundary value.
comparisons:
- no duplicates (typical?)
- some duplicates (typical?)
- all elements are the same (boundary value)
invalid inputs:
- not a list
- a list with elements that cannot be compared

Guideline testing (Sommerville 8.1.2)
- "Find the boundaries" is a useful guideline
- Many guidlines are about some kind of boundaries

Glass-box testing

Excluded from the 9th edition, a snippet remains at www.softwareengineering-9.com/Web/Testing/PathTest.html
(In Sommerville 8th ed. called "structural testing", 23.3.3 and 23.3.4, but these sections are not very good.)

Idea: "all code" should be tested at least once.
- testing once is rather weak
- there are different ways to define what "all code" means.

Definition: coverage is the percentage of "all code" that is tested by a test suite.

Statement coverage:
Branch coverage:

Example

Specification (intentionally ambiguous)
inputs: result, taxrate, threshold
output: tax
relation: tax is <taxrate> percent of the profit, but the first <threshold> SEK is not taxed.
glossary: profit - a positive result.

Based on this specification, the following program was written.

tax = 0 ;
if result > 0 then
    % there is a profit
    tax = taxrate/100 * (result-threshold) ;
end if ;

Statement coverage requires only 1 test case: if result > 0, then all statements are executed.
Branch coverage requires 2 test cases: it must also test the case result ≤ 0, that is the branch from the test directly to the end of the program.

In contrast, statement coverage and branch coverage are the same (requiring 2 test cases) for the following program:

if result > 0 then
    % there is a profit
    tax = taxrate/100 * (result-threshold) ;
else
tax = 0 ;
end if ;

Note: both criteria allow a test suite that can miss the error, namely that if 0 < result < threshold, then tax < 0.

Conclusion (from the example):
coverage testing can test code that exists. It cannot detect that code should exist, but doesn't.

How to do coverage testing

The practical way:
- decide on test data based on some criterium (statistical, partitioning).
- use a testing tool that records which statements/branches are taken during the test, and computes coverage.
Problem: you may reach 80%, 90% or 95% coverage, but often there are obscure parts of the code that are only reached for very specific input values, not easily guessed. There may even be dead code: code that is not executed for any input value. Remark: discovering dead code can be a side-effect of a coverage testing method. But it is not the purpose. Dead code does not have to be an error, it may be there for a reason.

The theoretical way:
- decide on a number of paths (executions from start to finish) through the code that cover the code.
- compute for each path which input is required to take that path.
- run the tests with these inputs.
Problems:
1. how to compute which input is required to take a path?
2. there may be no such input.

Definition: a path for which no input exists is an infeasible path.

Example (infeasible path)
Suppose that we write a quick and (very) dirty fix for the program above:

tax = 0 ;
if result > 0 then
    % there is a profit
    tax = taxrate/100 * (result-threshold) ;
end if ;
if tax < 0 then
    % oops
    tax = 0;
end if ;

There are 4 paths through this program:
1. result > 0, tax < 0
2. result > 0, tax ≥ 0
3. result ≤ 0, tax < 0
4. result ≤ 0, tax ≥ 0
But path 3 is infeasible: if result ≤ 0, then tax = 0 when we the program comes to the test tax < 0.

Note: infeasible paths are not the same as dead code, and do occur also is well-structured programs.

How to compute which input is required to take a path

This can be done by backwards reasoning with constraints:
- when a test is encountered, add the required outcome to the constraints
- when an assignment is encountered, apply it to the current constraints.

Example (continued)
Suppose that we want to compute input for path 1.
- start out with no constraints
- pass the assignment "tax = 0" in the second if-statement - no constraints contain "tax"
- add the test "tax < 0" to the constraints
- pass the assignment "tax = taxrate/100 * (result-threshold)"; the new constraint is: taxrate/100 * (result-threshold) < 0.
(- simplify the constraint: if taxrate is a positive symbolic constant, then the constraint becomes: result < threshold.)
- add the test "result > 0"; the constraint is 0 < result < threshold.
- pass the assignment "tax = 0", no constraints contain "tax".
- we reached the beginning of the program with 0 < result < threshold.

Exercise: try the same for path 3. You will note that the constraint becomes result ≤ 0 and 0 < 0.

Further coverage criteria

Path coverage: all paths must be taken. The most comprehensive coverage test, but if there is an unbounded loop, then there are infinitely many paths. Only applicable in very specific circumstances. Note that path coverage does not guarantee correctness. In the very first example, the 2 tests for branch coverage give path coverage.

Condition coverage: is like branch coverage, except when tests are composed with "and" and "or"
all boolean sub-expressions must be tested being both true and false.

Example: a program doing a linear search through an array A could begin with

i = 1 ;
found = false;
while i ≤ A.length and not found do

Ordinary branch coverage would require only one test case: the condition is first true, the program enters the loop, finds the item, and the next time the condition is fale.
Condition coverage requires 2 test cases: one exiting the loop because i > A.length, and one because found is true. Note that a black-box partitioning test would also require these two cases.

There are further variations on this theme.

Relational operator coverage: for tests like a < b (resp a ≤ b), it records that there is a test case where a = b, and one where a = b-1 (resp b+1). Thus it tests the boundary values for these tests, detecting "one-off" errors.

Loop coverage: for each loop, there should be tests passing through the loop 0 times (if possible), once, and several times.

Data flow coverage: for each variable, follow a path from where it is used (read) back to where it is set (written).
Example: "tax" is used in the test "tax < 0". The value of "tax" there can either come from "tax = taxrate/100 * (result-threshold)" or from "tax = 0".

Error message coverage: choose inputs that force the system to generate all error messages (a guideline test in Sommerville)

State coverage, transition coverage: for units that implement a finite state machine.

For interface testing (8.1.3 - 23.2.1)

When testing interfaces, we assume that the components have been tested.

Function coverage: each function is tested at least once. Weak, but a useful first test.

Call coverage: each function call is tested at least once. This should detect the most common errors in function calls, such as missing or exchanged arguments.

Further testing can be done in black-box manner: testing each call with boundary values: extreme values, special values (empty list, null pointer), and incorrect values.

Other sources of errors:
- Shared memory
- Timing (below)

Testing concurrent systems

Problems:
- What is a path? The sequence of executed statements comes now from more than one source code.
- A combinatorial explosion of executable sequences.
- Consequence: programmers are likely to make errors because of unforeseen sequences.
- Hard to control which sequence is tested, as a result errors may be hard to reproduce.
- Timing errors that do not occur "in the laboratory" (debugging code present, simulated environment, testing tool) may pop up after installing the system in "reality".

Solutions:
- Well ... try alternatives to testing, such as formal verification, model checking.

The oracle problem

How do you know that the test gives the correct answer?
1. Well ... at least the program didn't crash ...
2. Compute the answer by hand and compare (usually not practical).
3. Check that the answer satisfies certain requirements. For example: the list sorting program should produce a sorted list (checking that a list is sorted is easier than sorting it). The tax-program should not return a negative tax. If you put a yellow ball in front of the camera, the picture should show a yellow ball (which is easier to check than computing the RGB-value for each pixel in 16 bits precision).
4. If you have a previous version of the program, you can do back-to-back testing.