How do I understand the testing coverage achieved by my Hexawise-generated tests?

This lesson describes how to use the Analyze Tests tab of Hexawise and explains what the testing coverage metrics mean.

Click on the "Analyze Tests" button to use this feature.

The Analyze Tests coverage chart can be extremely useful. It can help answer questions like "How much coverage is each of your tests adding?" and "How much testing is enough?", but it takes a few minutes to understand what the chart is telling you. Here goes. The number of Parameters and Values you entered in the "Define Inputs" screen determines how many possible pairs of Values there are in your test plan. A simple example with 8 Values makes this point:

Given these inputs, your plan has exactly 24 possible pairs of Values, as shown here:
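
To see where the 24 comes from, the minimal Python sketch below enumerates every pair of Values drawn from two different Parameters. It assumes the example's 8 Values form four Parameters with two Values each, which is the only split that fits both the four-Value test cases and the total of 24 pairs. The Parameter names (Size, Weight, Color, Shape) and the eighth Value, OtherShape, are hypothetical placeholders, and this illustrates the counting only, not how Hexawise computes it internally.

```python
from itertools import combinations, product

# Hypothetical reconstruction of the example: four Parameters with two Values
# each (8 Values in total). The Parameter names and "OtherShape" (the unnamed
# eighth Value) are placeholders, not taken from the original screen.
PARAMETERS = {
    "Size": ["Large", "Small"],
    "Weight": ["Heavy", "Light"],
    "Color": ["Purple", "Green"],
    "Shape": ["Hexagon", "OtherShape"],
}


def all_pairs(parameters):
    """Return every possible pair of Values taken from two different Parameters."""
    pairs = set()
    for p1, p2 in combinations(parameters, 2):
        for v1, v2 in product(parameters[p1], parameters[p2]):
            pairs.add(frozenset({(p1, v1), (p2, v2)}))
    return pairs


# 6 Parameter pairs x (2 x 2) Value combinations each = 24
print(len(all_pairs(PARAMETERS)))  # 24
```

In general, the total is the sum, over every pair of Parameters, of the product of their Value counts, which is why adding Parameters or Values grows the total quickly.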

The first test case (Large / Heavy / Purple / Hexagon) tests 6 of the 24 possible pairs.

So after the first test case, the coverage chart shows that 25% of the possible pairs in this simple example (6 of 24) have been tested. So far so good.
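
As a minimal sketch (with hypothetical Parameter names), a single test case with four Parameters exercises every pair of its own Values, which is 4 × 3 / 2 = 6 pairs, and 6 / 24 is the 25% the chart reports:

```python
from itertools import combinations

TOTAL_PAIRS = 24  # total possible pairs in this example


def pairs_in_test(test):
    """Return the pairs of Values exercised by one test case.

    `test` maps each Parameter to its chosen Value; a test with k Parameters
    exercises k * (k - 1) / 2 pairs.
    """
    return {frozenset(pair) for pair in combinations(test.items(), 2)}


# Parameter names are hypothetical; the Values are from the first test case.
test_1 = {"Size": "Large", "Weight": "Heavy", "Color": "Purple", "Shape": "Hexagon"}
covered = pairs_in_test(test_1)
print(len(covered))                         # 6
print(f"{len(covered) / TOTAL_PAIRS:.0%}")  # 25%
```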

The second test case (Small / Light / Green / Hexagon) tests another 6 pairs. Importantly, none of these 6 pairs of Values have been tested yet, so after our first two tests we have tested a total of 12 pairs of Values.

So after 2 test cases, the chart shows that 50% of the possible pairs (i.e., 12 tested out of 24 possible) have been tested.
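
Extending the same idea, this minimal sketch tracks how many new pairs each test adds and the resulting cumulative coverage. The Parameter names are hypothetical, and coverage_by_test is an illustrative helper, not a Hexawise API.

```python
from itertools import combinations


def pairs_in_test(test):
    """Pairs of Values exercised by one test case (test: Parameter -> Value)."""
    return {frozenset(pair) for pair in combinations(test.items(), 2)}


def coverage_by_test(tests, total_pairs):
    """For each test in order, return (new pairs added, cumulative coverage fraction)."""
    covered, results = set(), []
    for test in tests:
        new_pairs = pairs_in_test(test) - covered  # only pairs not seen before count
        covered |= new_pairs
        results.append((len(new_pairs), len(covered) / total_pairs))
    return results


# Parameter names are hypothetical; the Values are the first two test cases.
tests = [
    {"Size": "Large", "Weight": "Heavy", "Color": "Purple", "Shape": "Hexagon"},
    {"Size": "Small", "Weight": "Light", "Color": "Green", "Shape": "Hexagon"},
]
for number, (new, coverage) in enumerate(coverage_by_test(tests, 24), start=1):
    print(f"Test {number}: +{new} new pairs, {coverage:.0%} cumulative")
# Test 1: +6 new pairs, 25% cumulative
# Test 2: +6 new pairs, 50% cumulative
```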

Why do coverage charts start off with a steep trajectory (with lots of added coverage per test) only to flatten out towards the end (with only a little added coverage per test)?  Analyzing test number 3 shows us why:

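There is no way to choose Values so that test 3 covers 6 new pairs, as each of the first two tests did. For the three Parameters that do not include Hexagon, tests 1 and 2 used opposite Values, so whatever test 3 picks for those three Parameters, at least two of its picks will match the same earlier test, and the pair they form has already been tested. The best we can do is test 5 new pairs of Values (shown in green) and 1 previously tested pair. In this 3rd test, "Large and Purple" had already been tested in the first test.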
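
A brute-force check confirms this. The minimal sketch below, again using hypothetical Parameter names and a placeholder OtherShape for the unnamed eighth Value, tries all 16 possible third tests and reports the most new pairs any of them adds:

```python
from itertools import combinations, product

# Hypothetical names; "OtherShape" stands in for the unnamed eighth Value.
PARAMETERS = {
    "Size": ["Large", "Small"],
    "Weight": ["Heavy", "Light"],
    "Color": ["Purple", "Green"],
    "Shape": ["Hexagon", "OtherShape"],
}


def pairs_in_test(test):
    """Pairs of Values exercised by one test case (test: Parameter -> Value)."""
    return {frozenset(pair) for pair in combinations(test.items(), 2)}


first_two_tests = [
    {"Size": "Large", "Weight": "Heavy", "Color": "Purple", "Shape": "Hexagon"},
    {"Size": "Small", "Weight": "Light", "Color": "Green", "Shape": "Hexagon"},
]
covered = set().union(*(pairs_in_test(t) for t in first_two_tests))

# Score all 2**4 = 16 candidate third tests by how many untested pairs each adds.
candidates = [dict(zip(PARAMETERS, values)) for values in product(*PARAMETERS.values())]
best_gain = max(len(pairs_in_test(c) - covered) for c in candidates)
print(best_gain)  # 5
```

The only candidate that repeats "Large and Purple" and still adds 5 new pairs is Large / Light / Purple / OtherShape, since Heavy or Hexagon would each repeat a further pair from test 1.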

After test 3, we have tested 17 of the 24 possible pairs. The coverage chart shows 70.8% (vs. 75% had we been able to squeeze 6 new pairs into test 3).
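
The arithmetic, counting the new pairs added by tests 1, 2, and 3:

```latex
\frac{6 + 6 + 5}{24} = \frac{17}{24} \approx 70.8\%
\qquad \text{versus} \qquad
\frac{6 + 6 + 6}{24} = \frac{18}{24} = 75\%
```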

What is up with the final two test cases? Tests 1 and 2 each added 25% coverage of pairs. Why do tests 5 and 6 each add only a measly 4.2%?

The final two tests each add only a tiny amount of coverage because the first four test cases covered all but two pairs of Values, and testing those final two pairs requires at least two more test cases.

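The only pair tested for the first time in test 5 is "Small and Purple". The only new pair tested in test 6 is "Large and Green". No single test can cover both, because one needs Small and the other needs Large. Each of these tests therefore adds only one new pair (1 of 24, or about 4.2%), one sixth as many new pairs as either of the first two tests.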
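
A minimal sketch of why two more tests are unavoidable, again with hypothetical Parameter names: it checks every possible test case against the two remaining pairs and finds that none covers more than one of them.

```python
from itertools import product

# Hypothetical Parameter names; "OtherShape" stands in for the unnamed eighth Value.
PARAMETERS = {
    "Size": ["Large", "Small"],
    "Weight": ["Heavy", "Light"],
    "Color": ["Purple", "Green"],
    "Shape": ["Hexagon", "OtherShape"],
}

# The two pairs still untested after test 4, as described in the lesson.
REMAINING = [
    {("Size", "Small"), ("Color", "Purple")},
    {("Size", "Large"), ("Color", "Green")},
]


def covers(test, pair):
    """True if the test case assigns both Values in the pair."""
    return all(test[param] == value for param, value in pair)


candidates = [dict(zip(PARAMETERS, values)) for values in product(*PARAMETERS.values())]
most_per_test = max(sum(covers(c, pair) for pair in REMAINING) for c in candidates)
print(most_per_test)    # 1: no single test covers both, so two more are needed
print(f"{1 / 24:.1%}")  # 4.2%: the coverage each of tests 5 and 6 adds
```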

The likelihood of finding a new defect in test 5 or 6 is much lower than in test 1 or 2.

A few important points to consider when analyzing coverage information:

A few things are worth pointing out about the coverage information in these charts:

First, when used correctly (and thoughtfully), this coverage information can be extremely useful. It gives you a quick, objective way to assess "How much extra testing coverage am I achieving with each new test?" and "How much testing is enough?"

Many testing teams, for example, have a rule of thumb to stop executing the Hexawise-generated tests once they have achieved 80% coverage, because they can clearly see diminishing marginal returns from further testing after that point.

The second thing to keep in mind is cautionary. As George Box said, "All models are wrong, but some are useful." It would be a mistake to look at the graph, see that "100% coverage" has been achieved after the final Hexawise-generated test, and conclude that the tests cover everything that should be tested.

An "Analyze Tests" chart generated by Hexawise, like all software testing coverage reports, is an imperfect model of what should be covered (which is itself based on an imperfect model of the System Under Test). There could be significant aspects of the System Under Test that were not entered into the "Define Inputs" screen. It is important to remember that one or more of those excluded aspects (a hardware configuration? a software configuration or plug-in? the order in which actions are executed? whether a user is navigating with a mouse or a keyboard? whether or not "submit" buttons are clicked multiple times quickly? etc.) could cause a defect that your current set of tests would not find.