Automate Your Way to Better Code: Advanced Property Testing (with Oskar Wickström)

Developer Voices

zhlédnutí 6 946

Přidat do
- Můj playlist
- Přehrát později
Sdílet

Sdílet

Vložit

Velikost videa:

Zobrazit ovladače přehrávání

Automatické přehrávání

Přehrát

čas přidán 27. 02. 2024
One of the most promising techniques for software reliability is property testing. The idea that, instead of writing unit tests we describe some property of our code that ought to always be true, then have the computer figure out thousands of unit tests that try to break that rule.
For example, you might say, “No matter which page you visit on my website, there should always be a login button or a logout button.” Then the test’s job is to try to break that rule, but clicking around until it finds some combination of clicks fails that assertion. Like, maybe it finds the 404 page, and you realise it was missing the website’s normal header.
At its best, property testing takes far less work than unit testing, but is far more thorough, because it lets us write the rules and has the computer write the examples. The downside is, it often seems theoretical. It can be hard to apply property testing to real-world cases. Let’s fix that.
We’re joined by Oskar Wickstrom, who’s been building all kinds of different systems and bringing property testing with him wherever he goes. We discuss the basics of property testing, then he goes into the advanced and cunning techniques that go beyond the ordinary into testing databases, webpages and more. With a bit of thought, he can help us test a ten times as much code with a tenth of the effort.
--
Oskar’s book, Property Testing a Screencast Editor [ebook]: leanpub.com/property-based-te...
Quickstrom: quickstrom.io/
F# for Fun & Profit: Property Testing Series: fsharpforfunandprofit.com/ser...
Linear Temporal Logic: en.wikipedia.org/wiki/Linear_...
The Quickstrom Paper: arxiv.org/abs/2203.11532
TodoMVC (One frontend app, many implementations): todomvc.com/
Oskar on Twitter: / owickstrom
Kris on Mastodon: mastodon.social/@krisajenkins
Kris on LinkedIn: / krisjenkins
Kris on Twitter: / krisajenkins
--
#softwaredevelopment #podcast #programming #testdrivendevelopment #propertytesting

Komentáře • 24

@fun_iqp Před 4 měsíci ⁺²²
Man, such a warming host deserves the best guests. And this one video proofs that's true.
@nexovec Před 4 měsíci ⁺¹⁵
This is the best tech interview channel, period. By a long shot
@nummaric Před 4 měsíci ⁺²
After listening to the first 10 minutes, it sounds to me that these tests are written the way I would write an implementation in Prolog, which I really like as long as it stays simple. Great channel/interviews, I enjoy it a lot.
@HyperFocusMarshmallow Před 4 měsíci ⁺⁵
31:00 This strategy is really nice. Mathematically many inputs map to the same output.
But all the possible outputs can be mapped to a unique representative in the space of inputs.
So you can generate simple elements of the output space, construct a unique representative in the input space and then run the function you want to test to see that you get back the same result.
Then you just have to assert that it’s an identity map.
Reminds me alot of some tricks that are used in differential geometry and mathematics more generally.
I’ll leave it to more competent people to state it in the language of category theory.
@kurthui Před měsícem
thanks for a detail explanation. that makes a lot of sense.
f(a) = b, instead of manually generating a list of a, we generate a list of b, and find a inverse function g, to generate a, by a = g(b), the only thing I am not sure is that one cannot generate cases of a when there is not a 1-to-1 mapping.
I guess, the option is to manually make a random function to generate a in that case if that make sense
@HyperFocusMarshmallow Před měsícem
@@kurthui The trick as I understand it is to restrict the test to apply to just a subset of inputs. On that subset the maps will be 1-to-1. And you're exactly right. That means that such a test does typically not cover the entire input space on its own. And the "inverse map" g is not a full inverse. In general there is no inverse function. That's part of the point. But if we cleverly pick a subset of the inputs and cleverly pick a subset of the outputs, then g might be an honest inverse on the restricted sets. The subset's we pick must make the map we pick be the kind of map we want. For example, if we pick a subset from B and a candidate restricted inverse g and that g maps two distinct elements from B to the same element in A, well then we just picked a bad g that doesn't work. It just can't be an inverse however we restrict A. So there are a bunch of things to keep straight here.
So assume we do that and it works. What do we loose from the restrictions?
Software tests rarely test every case. That's typically to computationally expensive.
Even writing a test for a particular general case might be hard and easy to mess up. So the idea as I understand it is to make this kind of restrictive choice to construct a subset of cases which are easy to write a test for, and it's easy because we pick the sets so that the restricted map have nice mathematical properties.
So is that good enough? How much of the general input space does these kinds of subsets capture? The answer is that it depends on the details, right. But at least now it wasn't prohibitively difficult to write and maintain the test.
If input space coverage becomes a worry, one could maybe pick a differently shaped subset of the input space and pick a different map g' that is an inverse on that subset.
Maybe there are other much better ideas than that. I'm just making things up here based on my understanding.
But the philosophy of software tests is to write as few tests as possible as simply as possible, with the constraints that you get sufficient amount of confidence that the various bits and pieces of the functionality works and that the tests catch various kinds of regressions.
If the issue was, that it was just too tricky to write a test for general functionality, well then, even just testing a restricted part of it in a simple straightforward way could be a huge gain. If you do that in 15 different ways, maybe you actually catch most actual regressions in the general case anyway. That's not a proof or anything. It's probably trivial to come up with counter examples. But the input sets and output sets often may have a bunch of extra structure for many practical problems and then maybe one could prove that it is enough for special cases. In principle at least.
That's my perspective. I'm just a guy in the comments. This is not my area of expertise at all. I'm not even sure I actually understand all the details and it was I while, since I listened to the interview. So take this with a huge pinch of salt here.
@kurthui Před měsícem ⁺¹
@@HyperFocusMarshmallow thanks for your speedy reply! and appreciate your humble.
realistically, I feel the mapping either is straightforward 1-to-1 then almost trivial to do testing, or convoluted, there gonna be significant work itself to implement the mapping. Since if the mapping is easy, it means your software is some simple functions that can be easily reverse engineered.
also if we treat the software/function as a black box, the main thing is to find a mapping, then it seems fine to do property testing from source f or target g. And I feel a lot of times case testing can effectively cover a lot of ground and easy to understand.
I am just trying to reconcile with the fact that property testing is not widely used. still seems a niche tool in some narrow areas, but definitely a good tool in our toolbox. Software testing is still very hard, there is no shortcut.
@krumbergify Před 4 měsíci ⁺³
The point of TDD is not to get excellent code coverage- although you do! The point of TDD is that you will be forced to design testable code since that will make it much easier to write your tests. It is really hard to write tests if the code uses singletons so you are instead forced to adopt a more functional approach (dependency injection) which decouples concerns that should not depend on each other.
@Delease Před 4 měsíci ⁺²
Property Testing sounds like a really great idea. As someone who is already using data-driven tests in production, if I could extend the code which is producing these cases with auto-generated tests, it would be an easy transition.
I do think there will always be a place for example-based tests. Imagine trying to property test the implementation of a cryptographic hash function. Unless you had another implementation to compare it against (and even if you did, how would test that one?), you should not be able to associate the input and the output except by doing the work to compute the hash. Thus you cannot cover all cases using Property Testing. But, if you can generate the bulk of you're test cases using it, then mop up the rest with a few examples, it would be a big win.
@DeveloperVoices Před 4 měsíci ⁺⁴
Agreed, there will always be a place for both. I actually faced that this week. I've got some serialization code that writes data to and from JSON. I've got some property tests that ensure that `x == decode(encode(x))` and it's way more thorough than I would ever be. But I also have a couple of hard-coded cases just so I know the JSON looks exactly the way I want it to look. :-)
FWIW though, I think you could also write some property tests for cryptographic functions. For example, a good hashing function shouldn't generate similar-looking hashes for similar-looking input. So you might say, "Generate a document. Hash it. Now change one character in the document and check that at least 5 characters have changed in the hash. Now do that 10,000 times." Something like that would be a valuable check. :-)
@namewastaken360 Před 4 měsíci ⁺⁷
CZcams comments are a difficult place to have a detailed conversation about property based testing, but it is a very powerful technique. I've used it event based systems consisting of event sourced aggregate and reactions with a small number well defined projections. The system inputs were mostly events coming from an external system, as well as import type commands based on the state of the external system. We wrote generators to produce a sensible stream on input commands and events, and wrote property tests to check the ultimate state of the system, ignoring the intermediate steps in the process. This meant that when we drastically changed the intermediate components or existing tests did not need to be changed. Adding more stuff to the output projections meant only adding dinner new property assertions.
Idempotency was easy to prove, just have a generator that takes an input stream and add some duplicates then check the output of both are the same. Likewise you can do the same thing with shuffling the streams if your system should be insensitive to order.
It can take a while to set up, but once you have property tests and generators in place they are so much more useful than unit tests and adding tests is very easy as you don't need to set-up state for edge cases etc.
@namewastaken360 Před 4 měsíci ⁺¹
Round trip tests are trivial to wait too, great for testing serialization, type mapping etc.
@namewastaken360 Před 4 měsíci ⁺¹
I hadn't formalised some of these ideas like metamorphic tests in my mind before. Great talk so far.
@johnwalker7422 Před měsícem
Super surprised there was no mention of shrinking. One of the amazing properties of property-based tests
@Nellak2011 Před měsícem
Is it possible to have a test server running property based tests 24/7 on your code, and then when something fails it automatically notifies the developer?
If so, that would be so cool!
@PaulSebastianM Před 4 měsíci
This idea of property testing seems like it can be solved by building a DSL in your tests. See Dave Farley's talks.
@DeveloperVoices Před 4 měsíci ⁺⁴
That's in the same area, but I think it's a different approach. A DSL generally makes it easy to write example-based tests. It's a great technique and a real time saver. But property testing is more general, saying something like:
1. An example username is any string.
2. An example email is `@.com`.
3. Come up with 10,000 random username/email pairs and call the register function. The number of registered users should always increase by one.
@PaulSebastianM Před 4 měsíci ⁺¹
@@DeveloperVoices OK then I see that as being kinda redundant and energy consuming... Maybe I'd run such tests for spacecraft software.
@archiekesseler2132 Před 4 měsíci ⁺²
I think the point is to find bugs you didn't think to check for. To me at least that's not redundant at all.
@PaulSebastianM Před 4 měsíci
@@archiekesseler2132 isn't that called fuzzing?
Před 3 měsíci
@@PaulSebastianMFuzzing is a bit simpler since you only check for unexpected failures (exceptions, 5xx errors); here you check more specific conditions on the outputs.

Další v pořadí

Automatické přehrávání

Neovim: Creating, Curating and Customising Your Ideal Editor (with TJ DeVries)