Lately I’ve been exploring various automated testing techniques, and it’s about time I got to property-based testing, also known simply as property testing. I think the easiest way to explain property-based testing is to contrast it with example-based testing, which is by far the most common type of automated testing.
If we were building an arithmetic library in JavaScript, we’d want to make sure to write tests verifying that our functions observe things like commutativity of addition. For example:
```javascript
describe('addition', () => {
  describe('is commutative', () => {
    it('gives the same result when adding numbers in either order', () => {
      // presumably both equal 3, but that could be tested elsewhere
      assert.equal(add(1, 2), add(2, 1));
    });
  });
});
```
This might be sufficient for some use cases, but there's a weakness in the fact that we're only testing one specific combination of numbers. If our function failed to handle zero properly, this test wouldn't catch it. If our code sloppily ignores very large numbers, we're unlikely to write tests using those numbers either, and even if we add those cases, we might be forgetting something else.
Property tests solve this by executing our code many times with randomly generated inputs, covering a wider variety of use cases than we could by manually writing out examples. Here’s how the same property (commutativity of addition) could be tested using JSVerify.
```javascript
const jsc = require('jsverify');

describe('addition', () => {
  it('is commutative', () => {
    const commutativeTest = (a, b) => add(a, b) === add(b, a);
    const commutativeProperty = jsc.forall(jsc.number, jsc.number, commutativeTest);
    jsc.assert(commutativeProperty);
  });
});
```
Perhaps the most confusing thing here is that `jsverify` is imported as `jsc`, in homage to Douglas Crockford's jscheck. Property testing has a bit of an academic history, starting with QuickCheck for Haskell. The QuickCheck paper explains how to use QuickCheck to formally specify and test properties of Haskell programs, also outlining why that's a good idea and some interesting lessons they learned along the way. As QuickCheck has inspired similar libraries in just about every language out there, the paper is now descriptive of property testing in general. The conclusion section starts with an observation that really impacted me:
We are convinced that one of the major advantages of using QuickCheck is that it encourages us to formulate formal specifications, thus improving our understanding of our programs. While it is open to the programmer to do this anyway, few really do, perhaps because there is little short term payoff, and perhaps because a specification is of less value if there is no check at all that it corresponds to the implemented program.
Another related observation rings true to me as well: property testing finds as many bugs in the property definitions as in the implementation itself. That could be an indicator that property testing is error-prone and of little value, or it could be an indication that it helps software authors understand what they’re building before they ship a shoddy version of it.
When is property testing useful?
Technically, you can imagine that any function you write would need to satisfy certain invariants that could be formally specified and tested with arbitrary data. In practice, I think certain problem types are a more natural fit, depending on the shape of that arbitrary data and the characteristics of that specification. If a piece of code operates on a very limited set of values and returns a result that's easy to predict, it is usually simple enough to write example-based tests that hit all of the code paths and corner cases. Also, just by virtue of the sheer repetition involved, property-based testing is going to slow down your suite more than a handful of well-chosen examples.
But when your inputs are things like “any three numbers” or “an arbitrary binary stream,” it’s much harder to feel confident that your examples are really enough. Likewise, testing for specific hard-coded outputs becomes more difficult when algorithms and data structures get too complex, and so property-based testing earns its place in these domains.
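To make the mechanics concrete, here's a toy, hand-rolled sketch of what a property runner does under the hood — plain JavaScript, not JSVerify, and `add`, `commutative`, and `checkProperty` are illustrative names of my own:

```javascript
// The function under test.
const add = (a, b) => a + b;

// The property: addition gives the same result in either order.
const commutative = (a, b) => add(a, b) === add(b, a);

// A hand-rolled runner: try many random inputs and return the
// first counterexample found, or null if the property held every time.
const checkProperty = (property, runs = 1000) => {
  for (let i = 0; i < runs; i++) {
    const a = (Math.random() - 0.5) * 1e6;
    const b = (Math.random() - 0.5) * 1e6;
    if (!property(a, b)) {
      return [a, b];
    }
  }
  return null;
};

console.log(checkProperty(commutative)); // null: no counterexample found
```

A real framework adds generators for structured data, shrinking, and reproducible seeds on top of this basic loop.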
I would definitely consider property-based testing for things like:
- libraries for statistics, currency, time, or arbitrary-precision arithmetic
- code for generating or manipulating complex data structures
- encodings and decodings
I can imagine property-based testing as the driving force in creating a library like this. If you take time to carefully define the interfaces and properties, the implementation itself might feel almost like a formality.
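Encoders and decoders are a nice example because they usually satisfy a round-trip property: decoding an encoded value returns the original. Here's a hand-rolled sketch of that check for Node's built-in base64 support (the helper names are my own):

```javascript
// Round-trip pair: bytes -> base64 string -> bytes.
const encode = bytes => Buffer.from(bytes).toString('base64');
const decode = str => Array.from(Buffer.from(str, 'base64'));

// Generate a random array of byte values (0-255).
const randomBytes = length =>
  Array.from({ length }, () => Math.floor(Math.random() * 256));

// Check the round-trip property on many random byte arrays.
let counterexample = null;
for (let i = 0; i < 500 && counterexample === null; i++) {
  const bytes = randomBytes(Math.floor(Math.random() * 64));
  const decoded = decode(encode(bytes));
  if (JSON.stringify(decoded) !== JSON.stringify(bytes)) {
    counterexample = bytes;
  }
}
console.log(counterexample); // null if the round-trip property holds
```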
Example: solving polynomials
Note: You can find the full source for these examples on GitHub.
To extend what I did a couple of months ago with linear functions, I want to look at one method for solving general polynomials. I have a class that models a polynomial by storing coefficients as an array of integers, and exposes a function to evaluate the polynomial for a given `x` value:
```javascript
class Polynomial {
  /**
   * @param {array} coefficients
   *   with coefficients[i] being the coefficient for X^i
   */
  constructor(coefficients) {
    // remove leading 0s, so 0x^2 + 2x + 1 becomes 2x + 1
    this._coefficients = this._trimCoefficients(coefficients);
  }

  get coefficients() {
    return this._coefficients;
  }

  evaluate(x) {
    const reducer = (y, coefficient, power) =>
      y + coefficient * Math.pow(x, power);
    return this.coefficients.reduce(reducer, 0);
  }

  // ...
}
```
So we can create and evaluate a polynomial like:

```javascript
const polynomial = new Polynomial([1, 2, 3]); // 3x^2 + 2x + 1
const y1 = polynomial.evaluate(1); // 6
```
One way to numerically solve functions is with Newton's Method, which requires knowledge of the function's derivative, plus a guess that is close to the true solution. We can write a function for the derivative, and we can structure our tests such that a relatively good guess is always available. Here's the derivative:
```javascript
class Polynomial {
  // ...
  get derivative() {
    const newCoefficients = this.coefficients
      .map((coefficient, degree) => coefficient * degree)
      .slice(1);
    return new Polynomial(newCoefficients);
  }
  // ...
}
```
The function for Newton's Method is an iterative one. It starts with a guessed x value that is hopefully near the zero we're looking for. Each iteration, it finds the intersection of the tangent line with the x-axis by updating `x` to `x - p(x) / p'(x)`, where `p'` is the derivative of `p`. This gif illustrates the process.
```javascript
/**
 * Find a zero of the given polynomial near the provided guess
 * @param {Polynomial} polynomial
 * @param {number} guessX - x value near the zero we're looking for
 * @param {number} tolerance - how close we need to be to the desired Y
 * @param {number} maxIterations - how many iterations before we fail
 */
const solve = (polynomial, guessX, tolerance = 0.000001, maxIterations = 1000) => {
  const derivative = polynomial.derivative;
  let iteration = 0;
  let x = guessX;
  let y = polynomial.evaluate(x);
  while (Math.abs(y) > tolerance && iteration < maxIterations) {
    x = x - (polynomial.evaluate(x) / derivative.evaluate(x));
    y = polynomial.evaluate(x);
    iteration++;
  }
  if (Math.abs(y) <= tolerance) {
    return x;
  } else {
    throw new Error(
      `Gave up after ${iteration} iterations. Last approximation: ${x}`
    );
  }
};
```
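To see the update rule in isolation, here's a standalone sketch of the same iteration for p(x) = x² − 2, whose positive zero is √2 (this bypasses the Polynomial class entirely; `p` and `pPrime` are my own names):

```javascript
// p(x) = x^2 - 2 and its derivative p'(x) = 2x.
const p = x => x * x - 2;
const pPrime = x => 2 * x;

let x = 1.5; // a guess near the zero we want
// Each step moves x to where the tangent line crosses the x-axis.
for (let i = 0; i < 20 && Math.abs(p(x)) > 1e-6; i++) {
  x = x - p(x) / pPrime(x);
}

console.log(x); // ~1.4142135, i.e. approximately sqrt(2)
```

Starting from 1.5, the iteration converges to machine-level accuracy in just a few steps, which is why a decent initial guess matters so much.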
Property testing our solver: a custom arbitrary
Now we're ready to start writing our property test. In order to get a random polynomial from `jsverify`, we need to create what's called an *arbitrary*. An arbitrary generates inputs of a particular type, so the test framework can feed them into our function. Creating an arbitrary for polynomials involves providing three functions: `generator`, `show`, and `shrink`. `show` is just a function to turn our arbitrary polynomial into a string for printing on the console, so we won't go into that.
`generator` creates a random polynomial of a given size:
```javascript
/**
 * Size in the case of polynomials will be the length of the coefficients array,
 * which is one more than the degree of the polynomial.
 * For simplicity, use integer coefficients.
 * @param {Integer} size
 */
const generator = size => {
  const coefficients = [];
  for (let i = 0; i < size; i++) {
    coefficients.push(jsc.random(-MAX_COEFFICIENT, MAX_COEFFICIENT));
  }
  return new Polynomial(coefficients);
};
```
I found the `shrink` function to be the most challenging and interesting part of creating an arbitrary. Of course, we need to know what it is before we can write one, so we'll take a little detour to talk about that.
Shrinking
When an example-based test fails, you get a message indicating which test case failed, and maybe also some additional context about the specific failed assertion. Because the test case contains the whole setup, this is generally enough to make finding the bug straightforward. But property tests are different, because reading the test code won't tell us which inputs caused the failure. Knowing the inputs might be enough, but since they're randomly generated, they may be quite ugly and unwieldy. It's hard to know what about a particular set of inputs caused the issue.
So, when a property test case fails, it doesn’t just bark at you. The framework keeps running the test with smaller and/or simpler inputs generated using the shrink functions associated with the inputs, attempting to find a minimal set of inputs to cause the failure. When this works well, it becomes much more apparent exactly what about this set of inputs tripped up your code.
To create a simple example, I made a fake function that takes an array of integers as its input and behaves incorrectly whenever there are at least 5 integers, all 10 or greater. Here's what the sequence of test inputs looks like, starting with the first failure:
```
... snip: a bunch of passing inputs
[ 13, 10, 38, 33, 19 ]
[ 10, 38, 33, 19 ]
[ 6, 10, 38, 33, 19 ]
[ 9, 10, 38, 33, 19 ]
[ 10, 10, 38, 33, 19 ]
[ 10, 38, 33, 19 ]
[ 5, 10, 38, 33, 19 ]
[ 7, 10, 38, 33, 19 ]
[ 8, 10, 38, 33, 19 ]
[ 9, 10, 38, 33, 19 ]
[ 10, 38, 33, 19 ]
[ 10, 5, 38, 33, 19 ]
...
... snip: about 150 cases
...
[ 10, 10, 10, 10 ]
[ 10, 10, 10, 5, 10 ]
[ 10, 10, 10, 7, 10 ]
[ 10, 10, 10, 8, 10 ]
[ 10, 10, 10, 9, 10 ]
[ 10, 10, 10, 10 ]
[ 10, 10, 10, 10, 5 ]
[ 10, 10, 10, 10, 7 ]
[ 10, 10, 10, 10, 8 ]
[ 10, 10, 10, 10, 9 ]
...
Error: Failed after 30 tests and 9 shrinks.
rngState: 0590d7e8cf5613b4a2;
Counterexample: [10, 10, 10, 10, 10];
```
By combining the built-in shrinks for `array` and `integer`, JSVerify has found the minimal array of integers to cause a failure! I don't know if you're impressed, but I am.
About the rest of the output: I’m not 100% sure how the “30 tests” and “9 shrinks” numbers came about. The latter is smaller than the number of inputs tried because each shrink produces a whole collection of smaller inputs, not just one.
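The core mechanics can be sketched in a few lines of plain JavaScript. This is an illustration of the idea, not JSVerify's actual algorithm, and `shrinkInt` with its particular candidate rule is my own invention:

```javascript
// Propose a collection of "smaller" candidates for an integer:
// zero, half the value, and one step toward zero.
const shrinkInt = n =>
  n === 0 ? [] : [0, Math.trunc(n / 2), n - Math.sign(n)];

// A buggy property: it fails for any n >= 10.
const property = n => n < 10;

// Suppose random generation stumbled onto a failing input of 37.
// Keep moving to a smaller failing candidate until none fail.
let failing = 37;
let failingCandidates = shrinkInt(failing).filter(c => !property(c));
while (failingCandidates.length > 0) {
  failing = failingCandidates[0];
  failingCandidates = shrinkInt(failing).filter(c => !property(c));
}
console.log(failing); // 10: the minimal failing input
```

Each round produces a whole collection of candidates, which is why the count of "shrinks" is smaller than the count of inputs tried.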
The `rngState` is another peculiarity of property testing: because inputs are randomly generated, you may not catch the same errors in every test run. But if you know the random seed used when a particular failure was triggered, you can re-run the tests with that seed to produce the same sequence of inputs and verify your fix. In this case, we could use something like `mocha test/shrink-test.spec.js --jsverifyRngState 0590d7e8cf5613b4a2` to run every test in that file with the same inputs as in the failing run shown above.
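The reason a recorded seed makes runs reproducible is simply that test inputs come from a deterministic, seeded PRNG. A quick sketch with mulberry32 (a well-known small PRNG, not the generator JSVerify actually uses) shows the same seed producing the same sequence:

```javascript
// mulberry32: a tiny deterministic PRNG. Same seed, same sequence.
const mulberry32 = seed => () => {
  seed = (seed + 0x6d2b79f5) | 0;
  let t = seed;
  t = Math.imul(t ^ (t >>> 15), t | 1);
  t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
  return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
};

const runA = mulberry32(42);
const runB = mulberry32(42);
const seqA = [runA(), runA(), runA()];
const seqB = [runB(), runB(), runB()];
console.log(seqA.every((v, i) => v === seqB[i])); // true
```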
Shrinking our polynomials
It took some thought and reading before I was able to apply the concept of shrinking sensibly to my own arbitraries. At first I tried just randomly generating some number of “smaller” cases, but the results didn’t exhibit the behavior shown above. Unsurprisingly (knowing what I know now), the resulting shrinks lacked the power to home in on the minimal failing inputs. The winning approach, I learned, is to use the power of the framework and build your shrinks out of the provided shrinks for the underlying native types. Here’s a shrink function for my polynomial arbitrary:
```javascript
const shrink = polynomial => {
  const arrayShrinker = jsc.shrink.array;
  const arrayIntShrinker = arrayShrinker(jsc.integer.shrink);
  const coefficientSets = arrayIntShrinker(polynomial.coefficients);
  return coefficientSets.map(Polynomial.create);
};
```
This feels right, and it works. It also made it even more apparent that “polynomials with integer coefficients” and “arrays of integers” are just different ways of thinking about the same things. In fact, digging into the documentation a bit more reveals that my whole polynomial arbitrary could be generated from existing arbitraries and a couple of static functions:
```javascript
const boundedInt = jsc.integer(-MAX_COEFFICIENT, MAX_COEFFICIENT);
const polynomialArbitrary = jsc.nearray(boundedInt).smap(
  Polynomial.create,
  Polynomial.getCoefficients
);
```
The arguments to `smap` are the two static functions: one to create a `Polynomial` from a nonempty array of integers, and another to go in the reverse direction. The return value is a new arbitrary that produces `Polynomial` instances. (The `size` measurement ends up a bit different this way, because the built-in array arbitraries use a logarithmic size scale. We could supply a max size of 1024 to keep the polynomial degree under 10.)
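A quick standalone check of the contract those two functions must satisfy: they should act as inverses, so a round trip returns the original array. The `create`/`getCoefficients` stand-ins below are simplified sketches of my own, ignoring the real class's internals:

```javascript
// Minimal stand-ins for the two static functions handed to smap.
const create = coefficients => ({ coefficients: coefficients.slice() });
const getCoefficients = polynomial => polynomial.coefficients;

const original = [1, 2, 3];
const roundTripped = getCoefficients(create(original));
console.log(JSON.stringify(roundTripped) === JSON.stringify(original)); // true
```

One subtlety worth noting: because the real `Polynomial` constructor trims leading zero coefficients, an input array ending in zeros would not round-trip exactly, which is the kind of edge case to think through when defining an `smap` pair.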
Property testing our solver: the test
```javascript
describe('newton\'s method', () => {
  it('solves polynomial functions from a guess within 0.5', () => {
    const solverTest = (polynomial, x) => {
      if (polynomial.degree <= 0) {
        return true; // no fun solving constant functions
      }
      // polynomial(x) = y, so we make a new polynomial
      // polynomialWithZero such that polynomialWithZero(x) = 0
      const y = polynomial.evaluate(x);
      const polynomialWithZero =
        polynomial.setCoefficient(0, polynomial.coefficients[0] - y);
      // use a random guess within 0.5 of the right answer
      const delta = jsc.random.number(-0.5, 0.5);
      const solution = newton.solve(polynomialWithZero, x + delta);
      // because we're doing numerical math with native JavaScript numbers
      // we expect floating point error, so we test
      // approximate equality w/ a custom function
      return approxEqual(polynomial.evaluate(solution), y);
    };
    const solvesPolynomialFunctions = jsc.forall(
      arbitraryPolynomial,
      jsc.integer(10),
      solverTest
    );
    // size: 10 limits the _degree_ of the polynomials tested
    // this is where we'd provide a larger size for the smap arbitrary
    jsc.assert(solvesPolynomialFunctions, { size: 10 });
  });
});
```
On a given test run, this single `jsc.assert` call tests our solver ~75-85 times, using a different polynomial (of degree 9 or less) and a different x value (between -10 and 10) each time. This works, and it gives me a lot more confidence than if I'd made up a couple of polynomials on my own and tested them each at a handful of places.
Conclusion
I’ve still got a lot more to learn about property-based testing, and I’m excited to explore the tools available in other languages, especially Elixir. I’ll feel better prepared for the next task involving a general-purpose data structure or number crunching algorithm now that I have this in my tool belt.
At the same time, having found what I think is a pretty decent use case for this paradigm, I’m also aware that it’s not a good fit for a lot of everyday app code. In many contexts, thinking carefully about the intent and behavior of the code is enough to construct a handful of examples that exercise the important logic and catch the most likely bugs. But that’s exactly where I think a basic understanding of property-based testing will be most helpful: it teaches you to clarify exactly what a piece of code is promising to do and think carefully about how it might fall short.