...that uses a new cachegrind runner to exercise
tests/microbenchmarks.py in CI and compare against baseline benchmark
results. A run fails if any benchmark slows down by 10% or more. Add
a 'benchmark' tox environment for running the benchmarks locally.
Also remove uses of `@override` to improve performance, and include some
other minor code improvements.
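
For illustration only, the 10% failure rule described above could be
sketched roughly as follows. This is not the runner's actual code: the
function name `find_regressions`, the dict-of-costs input shape, and the
benchmark names are all hypothetical, and it assumes each benchmark is
summarized as a single cost number (for example, a cachegrind
instruction count).

```python
def find_regressions(baseline, current, threshold=0.10):
    """Return {name: relative_change} for benchmarks that slowed down
    by `threshold` or more relative to the baseline.

    Hypothetical helper: `baseline` and `current` map benchmark names
    to a single cost number (assumed here to be an instruction count).
    """
    regressions = {}
    for name, base_cost in baseline.items():
        # Skip benchmarks with no current result or an unusable baseline.
        if name not in current or base_cost <= 0:
            continue
        change = (current[name] - base_cost) / base_cost
        if change >= threshold:
            regressions[name] = change
    return regressions


if __name__ == "__main__":
    # Made-up numbers, purely to show the shape of the check.
    baseline = {"bench_a": 1000, "bench_b": 1000}
    current = {"bench_a": 1050, "bench_b": 1200}
    failed = find_regressions(baseline, current)
    for name, change in sorted(failed.items()):
        print(f"{name}: {change:+.1%} vs baseline")
    # A CI job would exit non-zero here if `failed` is non-empty.
    raise SystemExit(1 if failed else 0)
```

Comparing relative change rather than absolute cost keeps the threshold
meaningful across benchmarks of very different sizes; whether the real
runner does the same is an assumption.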