Hypothesis, Antithesis, Synthesis

The creators of Hypothesis have introduced Hegel, a new family of property-based testing libraries designed to bring high-quality automated bug-finding to multiple programming languages.

Hello. I wrote Hypothesis. Then, back in November, I joined Antithesis, shortly followed by Liam DeVoe (another core Hypothesis maintainer). The inevitable result was synthesis, which is why today we’re introducing our new family of property-based testing libraries, Hegel.

Hegel is an attempt to bring the quality of property-based testing found in Hypothesis to every language, and to integrate it seamlessly with Antithesis to increase its bug-finding power. Today we’re releasing Hegel for Rust, but this is the first of many libraries. We plan to release Hegel for Go in the next week or two, and we’ve got Hegel libraries in various states of readiness for C++, OCaml, and TypeScript that we plan to release over the coming weeks or months.

Here’s an example from Hegel for Rust to whet your appetite:

```rust
use std::str::FromStr;

use fraction::Fraction;
use hegel::generators;

#[hegel::test(test_cases = 1000)]
fn test_fraction_parse_robustness(tc: hegel::TestCase) {
    let s: String = tc.draw(generators::text());
    let _ = Fraction::from_str(&s); // should never panic
}
```

This finds a bug in the fraction crate where from_str("0/0") panics rather than returning an error value.

If that’s already enough of a sales pitch for you, you can check out Hegel here. If not, let me tell you a bit more about why property-based testing, and Hegel in particular, are pretty great, and why I think you should use them.

Property-based testing, as in the Hegel for Rust example above, is testing where, rather than writing out a full concrete test case yourself, you use the library to specify a range of values for which the test should pass. In our fraction example, the property was a common one: the parser should never crash; it should always produce either a valid result or an error value.

You can think of that property-based test as infinitely many copies of tests that look like the following, where each test replaces the s value with a different string:

```rust
use std::str::FromStr;

use fraction::Fraction;

#[test]
fn test_fraction_parse_robustness() {
    let s: String = "0/0".to_string();
    let _ = Fraction::from_str(&s); // should never panic
}
```

The benefit of property-based testing libraries is that you don’t have to come up with those strings.
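To make the “infinitely many copies” idea concrete without relying on Hegel’s API, here is a minimal hand-rolled sketch in plain Rust. It assumes a hypothetical `parse_fraction` function (not the fraction crate’s API) and uses a tiny xorshift generator to produce random strings from a small alphabet. A real library like Hegel generates far more varied input and shrinks failures for you; this only shows the basic loop of “generate an input, check the property”.

```rust
/// Hypothetical fraction parser (not the `fraction` crate's API). It returns
/// an error for malformed input or a zero denominator instead of panicking.
fn parse_fraction(s: &str) -> Result<(i64, i64), String> {
    let (num, den) = s.split_once('/').ok_or("missing '/'")?;
    let num: i64 = num.parse().map_err(|e| format!("bad numerator: {e}"))?;
    let den: i64 = den.parse().map_err(|e| format!("bad denominator: {e}"))?;
    if den == 0 {
        return Err("zero denominator".to_string());
    }
    Ok((num, den))
}

/// Tiny xorshift64 PRNG so the sketch needs no external crates.
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Generates a short random string over an alphabet likely to hit edge cases.
fn random_string(rng: &mut XorShift) -> String {
    const ALPHABET: &[u8] = b"0123456789/-";
    let len = (rng.next_u64() % 8) as usize;
    (0..len)
        .map(|_| ALPHABET[(rng.next_u64() % ALPHABET.len() as u64) as usize] as char)
        .collect()
}

/// Property: for every generated string, parsing returns (Ok or Err) without
/// panicking, and any accepted fraction has a nonzero denominator.
fn check_parse_fraction(cases: usize) {
    let mut rng = XorShift(0x9E37_79B9_7F4A_7C15);
    for _ in 0..cases {
        let s = random_string(&mut rng);
        if let Ok((_, den)) = parse_fraction(&s) {
            assert_ne!(den, 0, "accepted a zero denominator for {s:?}");
        }
    }
}

fn main() {
    assert_eq!(parse_fraction("3/4"), Ok((3, 4)));
    assert!(parse_fraction("0/0").is_err());
    check_parse_fraction(10_000);
    println!("property held for all generated inputs");
}
```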

“Doesn’t crash” is probably the most boring property you can test, but it’s surprisingly useful. In Python it’s especially valuable (it’s remarkably hard to write a Python program that never crashes), but as we saw, crashes happen even in Rust.

Here’s an example of a more interesting property that is also very common:

```rust
use hegel::generators::{self, Generator, integers, booleans};
use rust_decimal::Decimal;
use std::str::FromStr;

#[hegel::composite]
fn decimal_gen(tc: hegel::TestCase) -> Decimal {
    let int_part = tc.draw(integers::<i64>());
    let has_frac = tc.draw(booleans());
    if has_frac {
        let frac_digits = tc.draw(integers::<u32>()
            .min_value(1).max_value(28));
        let frac_val = tc.draw(integers::<u64>()
            .max_value(10u64.saturating_pow(frac_digits.min(18))));
        let s = format!("{}.{:0>width$}", int_part, frac_val,
            width = frac_digits as usize);
        // If the string fails to parse, fall back to the integer part.
        Decimal::from_str(&s).unwrap_or(Decimal::from(int_part))
    } else {
        Decimal::from(int_part)
    }
}

#[hegel::test(test_cases = 1000)]
fn test_decimal_scientific_roundtrip(tc: hegel::TestCase) {
    let d = tc.draw(decimal_gen());
    let sci = format!("{:e}", d);
    let parsed = Decimal::from_scientific(&sci)
        .unwrap_or_else(|_| panic!("Failed to parse {:?} from {}", sci, d));
    assert_eq!(d, parsed);
}
```

Here we had to define our own custom generator for Decimal using Hegel’s support for composing generators. After that, we could test a common property called “round tripping”: if you serialize a value into some format and then read it back, you should get the same value. This is probably one of the most common non-trivial properties worth testing in most projects, as most software needs to transform data between formats at some point. In this case it turns out that rust_decimal doesn’t correctly handle zero when converting numbers to scientific notation, and this test finds the bug.
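Round tripping is easy to express without any library support. As a sketch in plain Rust, assuming a hypothetical hex encoder and decoder (`to_hex`, `from_hex`) standing in for Decimal’s scientific notation, the property is that decoding an encoded value gives back the original. The random inputs include the empty vector and zero bytes, the usual suspects for this kind of bug.

```rust
/// Tiny xorshift64 PRNG so the sketch needs no external crates.
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Hypothetical serializer: encode bytes as lowercase hex.
fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{b:02x}")).collect()
}

/// Hypothetical parser for the format produced by `to_hex`.
fn from_hex(s: &str) -> Result<Vec<u8>, String> {
    // Rejecting non-ASCII input up front keeps the byte slicing below safe.
    if !s.is_ascii() || s.len() % 2 != 0 {
        return Err(format!("not a hex string: {s:?}"));
    }
    (0..s.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&s[i..i + 2], 16).map_err(|e| e.to_string()))
        .collect()
}

/// Round-trip property: decoding an encoded value gives back the original.
fn check_hex_round_trip(cases: usize) {
    let mut rng = XorShift(42);
    for _ in 0..cases {
        let len = (rng.next_u64() % 32) as usize;
        let bytes: Vec<u8> = (0..len).map(|_| rng.next_u64() as u8).collect();
        let encoded = to_hex(&bytes);
        assert_eq!(from_hex(&encoded), Ok(bytes), "round trip failed via {encoded:?}");
    }
}

fn main() {
    check_hex_round_trip(1000);
    println!("round trip held for all generated inputs");
}
```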

I roughly classify the bugs found by property-based testing into three categories:

1. You forgot about zero.
2. This data type is cursed and you fell afoul of the curse.
3. You made an error in a complicated structural invariant.

At Antithesis we’re most excited about the third category, but generally I find a lot of the initial value of property-based testing comes from shaking out the first two, because bugs of this type are so easy to find.

For example, here’s a test that shows heck running afoul of Unicode being cursed (reported bug):

```rust
use heck::ToTitleCase;
use hegel::generators;

#[hegel::test(test_cases = 1000)]
fn test_title_case_idempotent(tc: hegel::TestCase) {
    let s: String = tc.draw(generators::text());
    let once = s.to_title_case();
    let twice = once.to_title_case();
    assert_eq!(once, twice);
}
```

This tests the intuitive property that once you’ve converted something into title case, it’s in title case and shouldn’t need further changes. Unfortunately, this fails on the input “ß”, which the first to_title_case turns into “SS” and the second then turns into “Ss”.
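The “ß” failure is easy to reproduce with a deliberately naive title-casing function. The sketch below assumes a hypothetical `naive_title_word` that uppercases the first character and lowercases the rest. It is not heck’s implementation, but it breaks idempotence on the same input, because Rust’s `char::to_uppercase` maps 'ß' to the two characters "SS".

```rust
/// Hypothetical naive title-casing for a single word (not heck's code):
/// uppercase the first character, lowercase everything after it.
fn naive_title_word(word: &str) -> String {
    let mut chars = word.chars();
    match chars.next() {
        Some(first) => {
            // `to_uppercase` can yield several chars, e.g. 'ß' -> "SS".
            let mut out: String = first.to_uppercase().collect();
            out.push_str(&chars.as_str().to_lowercase());
            out
        }
        None => String::new(),
    }
}

fn main() {
    let once = naive_title_word("ß");
    let twice = naive_title_word(&once);
    println!("{once:?} -> {twice:?}");
    // Idempotence fails: the second pass changes the result again.
    assert_ne!(once, twice);
}
```

On plain ASCII words such as "hello" the function is idempotent, which is exactly why a hand-picked example test is unlikely to catch this.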

The best example I’ve got for you right now of “complicated structural invariants” comes from this (it turns out, already known) bug Hegel found in the im library:

```rust
use std::collections::BTreeMap;

use hegel::generators;
use im::OrdMap;

#[hegel::test(test_cases = 1000)]
fn test_ordmap_get_prev(tc: hegel::TestCase) {
    // Trick to boost the size to make sure we test on large key sets.
    let n = tc.draw(generators::integers::<usize>().max_value(200));
    let keys: Vec<i32> = tc.draw(generators::vecs(generators::integers()).min_size(n));
    let im_map: OrdMap<i32, i32> = keys.iter().map(|&k| (k, k)).collect();
    let bt_map: BTreeMap<i32, i32> = keys.iter().map(|&k| (k, k)).collect();
    let key = tc.draw(generators::integers::<i32>());
    let im_prev = im_map.get_prev(&key).map(|(k, v)| (*k, *v));
    let bt_prev = bt_map.range(..=key).next_back().map(|(&k, &v)| (k, v));
    assert_eq!(im_prev, bt_prev, "get_prev({}) mismatch with {} keys", key, im_map.len());
}
```

This finds that above a certain size, get_prev returns the wrong value.

This sort of test is a simple example of what we usually call “model-based testing”: you’ve got something you want to test, and you construct a “model” of it, usually a naive implementation of the same thing that, e.g., stores everything in memory or does things inefficiently. You can then use property-based testing to check that the model and reality always agree.
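The same model-based pattern works without any testing library. In this sketch, a hypothetical `SortedVecMap` plays the role of the implementation under test and std’s `BTreeMap` serves as the model: random inserts are applied to both, and `get_prev` on the real structure must agree with `range(..=key).next_back()` on the model, mirroring the im test above.

```rust
use std::collections::BTreeMap;

/// Tiny xorshift64 PRNG so the sketch needs no external crates.
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Hypothetical implementation under test: a map backed by a sorted Vec.
#[derive(Default)]
struct SortedVecMap {
    entries: Vec<(i32, i32)>,
}

impl SortedVecMap {
    fn insert(&mut self, key: i32, value: i32) {
        match self.entries.binary_search_by_key(&key, |&(k, _)| k) {
            Ok(i) => self.entries[i].1 = value,
            Err(i) => self.entries.insert(i, (key, value)),
        }
    }

    /// Entry with the largest key <= `key`, like `OrdMap::get_prev`.
    fn get_prev(&self, key: i32) -> Option<(i32, i32)> {
        match self.entries.binary_search_by_key(&key, |&(k, _)| k) {
            Ok(i) => Some(self.entries[i]),
            Err(0) => None,
            Err(i) => Some(self.entries[i - 1]),
        }
    }
}

/// Model-based property: the real map and the BTreeMap model always agree.
fn check_against_model(cases: usize) {
    let mut rng = XorShift(7);
    for _ in 0..cases {
        let mut real = SortedVecMap::default();
        let mut model = BTreeMap::new();
        let n = rng.next_u64() % 200;
        for _ in 0..n {
            // Keys come from a small range so duplicates and exact hits occur.
            let k = (rng.next_u64() % 1000) as i32 - 500;
            real.insert(k, k);
            model.insert(k, k);
        }
        let key = (rng.next_u64() % 1200) as i32 - 600;
        let expected = model.range(..=key).next_back().map(|(&k, &v)| (k, v));
        assert_eq!(real.get_prev(key), expected, "get_prev({key}) disagrees with model");
    }
}

fn main() {
    check_against_model(1000);
    println!("implementation agreed with the model");
}
```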

There are many more ways to use property-based testing than this. This post just showcases some of the more effective sorts of tests you can write with it. When getting started I actually tend to recommend starting with one of your existing tests and refactoring it, but once you start thinking in terms of this sort of testing you’ll start to see examples like the above ones everywhere.

If you’re not familiar with it, Hypothesis is the most widely used property-based testing library in the world.

Part of the reason Hypothesis is the most widely used library of its kind is that it’s written in Python, which I’m given to understand has a few users. But Hypothesis wasn’t the first property-based testing library in Python, only the first to achieve widespread use, and it got there because it has a lot of advantages over other property-based testing libraries.

The main ones are:

Hypothesis has a great library of high-quality generators, and flexible tools for building on them.
Hypothesis has “internal shrinking”, which means that it will basically always give you a high-quality and readable final example. It avoids many of the pitfalls of shrinking in other property-based testing libraries, such as producing invalid test cases, requiring manually writing shrinkers, and poor quality out-of-the-box shrinking.
Hypothesis has a test database, which means that when a test fails and you rerun it, it will automatically and quickly fail again in the same way.
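The core idea behind internal shrinking can be sketched in a few dozen lines: generators draw from a recorded sequence of choices, and the shrinker simplifies that sequence and replays it through the generator, rather than shrinking the generated value directly. Because every candidate comes out of the generator, shrunk examples always satisfy its constraints (here, every element is even). This is a heavily simplified illustration with hypothetical names (`Choices`, `generate`, `shrink`), not Hypothesis’s or Hegel’s actual shrinker, which uses many more passes and heuristics.

```rust
/// Replays a recorded sequence of raw choices. Past the end of the
/// sequence it returns 0, the "simplest" choice.
struct Choices<'a> {
    seq: &'a [u64],
    pos: usize,
}

impl Choices<'_> {
    /// Draws a value in 0..=max from the next raw choice.
    fn draw(&mut self, max: u64) -> u64 {
        let raw = self.seq.get(self.pos).copied().unwrap_or(0);
        self.pos += 1;
        raw % (max + 1)
    }
}

/// Generator: a list of up to 10 even numbers, each at most 1000.
/// Every output is valid by construction, whatever the choices are.
fn generate(seq: &[u64]) -> Vec<u64> {
    let mut c = Choices { seq, pos: 0 };
    let len = c.draw(10);
    (0..len).map(|_| 2 * c.draw(500)).collect()
}

/// Deliberately false property "the sum is always below 1000";
/// returns true when a choice sequence produces a counterexample.
fn fails(seq: &[u64]) -> bool {
    generate(seq).iter().sum::<u64>() >= 1000
}

/// Greedily shrinks a failing choice sequence by deleting choices and
/// lowering individual choices, keeping any change that still fails.
fn shrink(mut seq: Vec<u64>) -> Vec<u64> {
    assert!(fails(&seq), "can only shrink a failing sequence");
    let mut improved = true;
    while improved {
        improved = false;
        // Pass 1: try deleting each choice.
        let mut i = 0;
        while i < seq.len() {
            let mut candidate = seq.clone();
            candidate.remove(i);
            if fails(&candidate) {
                seq = candidate;
                improved = true;
            } else {
                i += 1;
            }
        }
        // Pass 2: try lowering each choice toward zero.
        for i in 0..seq.len() {
            for smaller in [0, seq[i] / 2, seq[i].saturating_sub(1)] {
                if smaller < seq[i] {
                    let mut candidate = seq.clone();
                    candidate[i] = smaller;
                    if fails(&candidate) {
                        seq = candidate;
                        improved = true;
                        break;
                    }
                }
            }
        }
    }
    seq
}

fn main() {
    // A failing choice sequence, e.g. one found by random generation.
    let start = vec![3, 400, 300, 200];
    let shrunk = shrink(start);
    println!("{:?} -> {:?}", shrunk, generate(&shrunk));
}
```

Deleting a choice can change how later choices are interpreted (the first choice sets the list length), and that is fine: whatever the replay produces is still a valid list, and the shrinker keeps it only if the test still fails.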

My running joke with Hypothesis is that every other property-based tes

Source: Hacker News