The lazy programmer's guide to writing thousands of tests - Scott Wlaschin

NDC Conferences

55,620 views


Added on 23 July 2024
Don't forget to check out our links below!
ndcporto.com/
ndcconferences.com/
We are all familiar with example-based testing, as typified by TDD and BDD, where each test is hand-crafted.
But there's another approach to writing tests. In the "property-based testing" approach, a single test is run hundreds of times with randomly generated inputs. Property-based testing is a great way to find edge cases, and also helps you to understand and document the behavior of your code under all conditions.
This talk will introduce property-based testing, show you how it works, and demonstrate why you should consider adding this powerful technique to your toolbelt.
Science and technology

Comments • 119

@scottrankin4501 4 years ago ⁺³²
EDFH isn't always evil, stupid, or lazy. Often they work for a boss who constantly screams "get it done and move on to the next feature."
@toxic_narcissist 4 years ago ⁺⁴⁸
I'm a noob programmer and I haven't thought of testing this way. Thanks
@sqeaky8190 4 years ago ⁺⁷
I am an experienced developer and have filled SDET roles before. I hadn't thought of the idea of creating properties to represent kinds of tests before either. It is really clever. We shouldn't feel bad for not inventing everything. Let's just keep doing our best.
@VasuJaganath 2 years ago ⁺²
@@sqeaky8190 This revolution has come from the "Type Theory" and "functional programming" communities after at least 2 decades of research in automated theorem proving. These frameworks became polished only very recently. The idea is deceptively simple, but if you think carefully, this is a precursor to proving those properties!
@ChristianBrugger 1 year ago ⁺⁴
Group theory and finite fields being applied to the add example. Brilliant!
@LeeOades 4 years ago ⁺¹³
I have an unnatural fondness for unit tests and am always looking for ways to improve and enrich my techniques. So I really enjoyed this talk. I'm still not entirely clear yet on when I would use it but I'm sure that with a bit of practical experience I'd start to understand where and when it would be a superior approach. Thanks!
@964tractorboy 4 years ago ⁺¹⁵
A fantastic lecture with lots of food for thought. Let the feast commence. Thanks so much.
@HA7DN 4 years ago ⁺⁴
This has actually answered most of my questions about testing, so I welcome this. It might raise some more questions later, though...
@whiitehead 3 years ago ⁺³
Thanks a lot for this talk! I learned a ton and I have never written a line of F#.
@TheRavinoth 4 years ago ⁺⁴
Thanks, this was well done.
@danielregassa9805 4 years ago ⁺¹
Amazing stuff! Watching this for the fifth time to absorb it all
@iknowmyname7 2 years ago
Great talk. Thank you
@sinistergeek 4 years ago ⁺⁴
Wow!! Very helpful! Thanks for the upload!! hehe
@forgottenmohawks8734 4 years ago ⁺¹
Great talk. Only thing that I wonder about is at the very end with the facial recognition examples. I don’t think it is given that a facial recognition software would necessarily place the “box” around the face in the exact same way on two images that are otherwise identical but where one has been rotated at an angle. Even with a simple angle like 90, 180 or 270 degrees. And likewise, turning one copy of the image into black and white could probably affect the result too.
@itachi2011100 4 years ago ⁺⁵
This totally sounded like something I'd see in Haskell and OCaml, wasn't disappointed.
@NostraDavid2 2 years ago
I'm sure QuickCheck was the first library that implemented PBT, which was written in Haskell. So yeah.
@maertscisum 10 months ago
Love the anecdote...
@schmoab 4 years ago ⁺¹
This is very interesting as a mathematical proof, but I am having trouble understanding how to apply it as a tester without a degree in mathematics. I ran into this reimplementation problem while trying to test a calculator on my last project. It proved to be beneficial because having two implementations in 2 different languages developed independently is a rock solid test. It was also a great way to find bugs without having to manually sit down with a calculator to try to figure out expected results. I did have to think up some input values, though.
This was a great presentation and gives me a lot to think about.
@dBug404 4 years ago
I don't think 2 different implementations are a rock solid test. If one fails to understand every detail of a requirement, the two implementations will most likely have the same conceptual flaws. But that can also be true if one just writes tests against specific parameters. I am not sure I really think the shown approach is efficient, but it can definitely lead to a deeper understanding.
@toddblackmon 2 years ago ⁺²
Note that these 3 properties do not uniquely define addition: Consider 8-bit bitwise OR vs. 8-bit addition.
x OR y = y OR x, x OR (y OR z) = (x OR y) OR z, and x OR 0 = x. However 1 + 1 = 2, but 1 OR 1 = 1.
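The point about OR can be checked mechanically. The following is a minimal sketch, not code from the talk: the helper name `holds_three_properties` and the 8-bit input range are assumptions chosen for illustration. It runs the three properties against both 8-bit addition and 8-bit OR, then shows the single example that tells them apart.

```python
import random

def holds_three_properties(f, trials=1000, seed=0):
    """Check commutativity, associativity and the zero identity on random 8-bit inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y, z = (rng.randrange(256) for _ in range(3))
        if f(x, y) != f(y, x):
            return False
        if f(f(x, y), z) != f(x, f(y, z)):
            return False
        if f(x, 0) != x:
            return False
    return True

def add8(x, y):
    return (x + y) & 0xFF  # 8-bit addition, wrapping on overflow

def or8(x, y):
    return (x | y) & 0xFF  # 8-bit bitwise OR

print(holds_three_properties(add8))  # True
print(holds_three_properties(or8))   # True
print(add8(1, 1), or8(1, 1))         # 2 1
```

Both functions pass the three properties, so a concrete example such as 1 + 1 = 2 (or an additional property) is still needed to separate them.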
@BennetVella 1 year ago
That's the tricky thing, how do you capture the actual increment from a function without re-implementing it. To me, it seems like the properties provide a very good foundation, but ultimately you'll still want to include plain and simple "examples" to confirm the essence of what you want, in addition to the property testing that's being set. Ultimately most tests that require complexity will always start off with those straightforward example tests, so they're already there. Adding property tests as they're determined will provide additional strength.
@RamHomier 4 years ago ⁺¹
In part V when you mention model based testing, is it basically just making an oracle as you mentioned earlier?
@idontcarefuku 4 years ago ⁺²
Really good examples... In my head trying to avoid your own implementation of the method to test the method... This is the answer... Wicked
@flaskeyboredom5942 3 years ago
Robert Martin says (re: TDD) "as the tests get more specific, the implementation gets more generic" - as a workaround to not adding another naive case statement to handle the new test.
@jl6723 3 years ago ⁺¹
What Test would you write to get around the EDFH if they wrote the add function such that if one of your inputs is zero, it returns the other variable, else it returns 0?
@freddiepage6162 2 years ago ⁺¹
Associativity
@HappyKatze 4 years ago ⁺⁴⁴
The audio is a little bad, but found it informative nonetheless. Thanks for the upload
@toxic_narcissist 4 years ago ⁺¹¹
typical online conference, I guess. I'm surprised there were no cat or baby sounds in the background
@jameshoiby 4 years ago ⁺²
@@toxic_narcissist I'm pretty tolerant of household noises in the background at this point. We're all in this together!
@Vlfkfnejisjejrjtjrie 4 years ago
@@jameshoiby we are? Did you get that from MSM? Covid 19 appears to ignore BLM protesters too.
@twitchyarby 4 years ago ⁺⁴
Not everyone has good audio recording gear or space in their house
@CripplingDuality 4 years ago ⁺⁶
@@Vlfkfnejisjejrjtjrie he got it from not living with his head stuck in his ass
@Pezsmapatkany 4 years ago ⁺³
How to handle exceptions with this? Like when creating properties for a division operator, we would want to handle division by zero separately. Also, for addition, how to handle arithmetic overflows if the generated numbers are too big?
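One common way to deal with the division question is to split the input space: one property for non-zero divisors and a separate test for the zero case. The sketch below assumes Python's floor division as the function under test; `divide` and the two test names are hypothetical.

```python
import random

def divide(x, y):
    """Integer division under test; Python raises ZeroDivisionError when y == 0."""
    return x // y

def test_division_properties(trials=1000, seed=2):
    rng = random.Random(seed)
    for _ in range(trials):
        x = rng.randint(-10**6, 10**6)
        y = rng.randint(-10**6, 10**6)
        if y == 0:
            continue  # the zero divisor is covered by its own test below
        q = divide(x, y)
        r = x - q * y
        # Floor division: the remainder is smaller in magnitude than the divisor
        # and, when non-zero, has the same sign as the divisor.
        assert abs(r) < abs(y)
        assert r == 0 or (r > 0) == (y > 0)

def test_division_by_zero(trials=100, seed=3):
    rng = random.Random(seed)
    for _ in range(trials):
        x = rng.randint(-10**6, 10**6)
        try:
            divide(x, 0)
        except ZeroDivisionError:
            continue  # expected behaviour for every x
        raise AssertionError(f"divide({x}, 0) did not raise")

test_division_properties()
test_division_by_zero()
```

Skipping generated inputs that violate a precondition is the simplest option; constraining the generator so it never produces them is an alternative when such inputs are common.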
@rainbowevil 4 years ago
@Chris Warburton - I disagree that this approach makes you consider the edge cases up front when looking at the overflow problem. If you look at the rules for addition at 14:30 and you’re using an addition implementation which overflows silently (which maybe you didn’t want, but didn’t know you didn’t want it beforehand) you will find all those tests pass. Or even if the implementation spits out garbage when overflowing, but deterministic garbage, it will pass the tests.
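This claim is easy to reproduce. The sketch below assumes a 32-bit signed integer that wraps silently; `wrapping_add` and `check` are hypothetical helpers. The three addition properties all pass, and it takes an extra property (here: the sum of two non-negative numbers is never smaller than either operand) to expose the overflow.

```python
import random

INT_MIN, INT_MAX = -2**31, 2**31 - 1

def wrapping_add(x, y):
    """32-bit signed addition that wraps silently on overflow (two's complement)."""
    s = (x + y) & 0xFFFFFFFF
    return s - 2**32 if s > INT_MAX else s

def check(prop, low, high, trials=1000, seed=1):
    """Return True if prop holds for every randomly generated triple in [low, high]."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y, z = (rng.randint(low, high) for _ in range(3))
        if not prop(x, y, z):
            return False
    return True

def commutative(x, y, z):
    return wrapping_add(x, y) == wrapping_add(y, x)

def associative(x, y, z):
    return wrapping_add(wrapping_add(x, y), z) == wrapping_add(x, wrapping_add(y, z))

def identity(x, y, z):
    return wrapping_add(x, 0) == x

def no_shrinking(x, y, z):
    # Only meaningful for non-negative inputs: the sum must not drop below x.
    return wrapping_add(x, y) >= x

for prop in (commutative, associative, identity):
    print(prop.__name__, check(prop, INT_MIN, INT_MAX))   # all True
print("no_shrinking", check(no_shrinking, 0, INT_MAX))    # False
print(wrapping_add(INT_MAX, 1) == INT_MIN)                # True
```

Whether wrapping is a bug depends on the requirement, which is the commenter's point: the properties only catch it once someone decides overflow matters and writes a property for it.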
@zackyezek3760 4 years ago ⁺¹
Ultimately, you're looking to check that your function (add, divide, more complicated function f) satisfies certain behaviors.
This is where knowing advanced math (group theory, type theory, formal math for functions) has practical use for software development. You can literally write "operator+" for your objects that meets the mathematical definition of a group operator, ensuring it'll behave in the way users & other code expect from 'addition'. From there you can do similar things, like guarantee your "undo" is a true inverse function, or your UUID function is injective (one to one). And so forth. You use the PROPERTIES a function, object, etc. should have to guide its actual code & then use those same properties to black-box test it in unit tests.
An advanced example is class types that form a group or ring. As in, I have objects X that have APIs allowing consumers to transform an X into a different X. By implementing class X and its APIs so that the set of 'X' forms a mathematical group (or ring, field, etc.), I guarantee that no sequence of calls to those APIs can generate an invalid 'X' object as an output, without lots of wasteful or brittle checks in the production code. And the same formal math gives clear direction on what tests I need to have 100% coverage for this subassembly.
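The "undo is a true inverse" idea maps directly onto a round-trip property. Here is a minimal sketch with a hypothetical pair of text-editing functions, `apply_insert` and `undo_insert`, that are not taken from the talk or from any library.

```python
import random

def apply_insert(text, pos, s):
    """Insert string s into text at index pos."""
    return text[:pos] + s + text[pos:]

def undo_insert(text, pos, s):
    """Remove the len(s) characters that apply_insert placed at index pos."""
    return text[:pos] + text[pos + len(s):]

def test_undo_is_inverse(trials=1000, seed=4):
    rng = random.Random(seed)
    alphabet = "ab \n"
    for _ in range(trials):
        text = "".join(rng.choice(alphabet) for _ in range(rng.randint(0, 20)))
        s = "".join(rng.choice(alphabet) for _ in range(rng.randint(0, 5)))
        pos = rng.randint(0, len(text))
        # Property: doing and then undoing an edit returns the original text.
        assert undo_insert(apply_insert(text, pos, s), pos, s) == text

test_undo_is_inverse()
```

The property never states what the edited text should look like; it only requires that the two operations cancel out, which is what makes it cheap to write.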
@MrDaniel560 2 years ago
great talk :PPPP loved the content very useful
@timanderson5717 4 years ago ⁺¹⁰
what if the EDFH said (in python)
def add(a,b):
    if a==0: return b
    if b==0: return a
    return 2
@wojciechwal2953 4 years ago
assert add(x, -x) == 0
@trulyUnAssuming 4 years ago
@@wojciechwal2953
def add(a,b):
    if a==0: return b
    if b==0: return a
    return 0
associativity fails though: (1+2)+3 = 0+3=3 while 1+(2+3)=1+0=1
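The exchange above can be run as a small experiment. This sketch uses a hypothetical `find_counterexample` helper with a small input range so that zeros and repeated values come up often; it confirms that the zero-special-casing implementation survives commutativity, identity and the inverse check, and that associativity is the property that catches it.

```python
import random

def edfh_add(a, b):
    if a == 0:
        return b
    if b == 0:
        return a
    return 0

def find_counterexample(prop, trials=200, seed=6):
    """Return the first random triple that violates prop, or None if all pass."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y, z = (rng.randint(-10, 10) for _ in range(3))
        if not prop(x, y, z):
            return (x, y, z)
    return None

def commutative(x, y, z):
    return edfh_add(x, y) == edfh_add(y, x)

def identity(x, y, z):
    return edfh_add(x, 0) == x

def inverse(x, y, z):
    return edfh_add(x, -x) == 0

def associative(x, y, z):
    return edfh_add(edfh_add(x, y), z) == edfh_add(x, edfh_add(y, z))

for prop in (commutative, identity, inverse, associative):
    print(prop.__name__, find_counterexample(prop))
# Only associative reports a counterexample; the other three print None.
print(edfh_add(edfh_add(1, 2), 3), edfh_add(1, edfh_add(2, 3)))  # 3 1
```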
@bellajbadr 2 years ago
enjoyed this 1
@coopergates9680 4 years ago ⁺²⁰
14:20 The EDFH says
let add (x,y) =
    -1 * (-1 * y - x)
@entcraft44 4 years ago ⁺⁵
This is actually correct, isn't it? The implementation is kind of stupid, but still valid.
@matemolnar653 4 years ago ⁺¹²
Isn't this just addition with extra steps?
@nirrepluap 4 years ago ⁺³
This is a valid implementation, just not very "clean", but tests can't test for that haha
@coopergates9680 4 years ago ⁺¹
@@entcraft44 Any incorrect implementation will fail at least one test that a correct one passes, so this is the EDFH's final answer.
@surters 4 years ago
Or just return zero ... always
@RajaKajiev 4 years ago ⁺¹
EDFH - now I know how to name it.
Indeed, I once met such an implementation in real production chartplotter software, and it made memories for life for me.
@DeveloperPA 4 years ago
Another option to force the EDFH is use the inverse of addition, eg, x - -y == add(x,y)
@igorswies5913 1 year ago
That's just an implementation, the EDFH will just copy "x - -y" from the test. So all you're checking is that x - -y is equal to x - -y
@NostraDavid2 2 years ago ⁺¹
"Adding 0 is the same as doing nothing". In other words: 0 is the identity value for addition, right? 0 is also the identity value for subtraction, but not for multiplication (1 is the identity value for multiplication)
@warpzone8421 4 years ago ⁺¹²
14:20 EDFH would just perform addition normally. Then if the second value was zero, return the answer. If the second value wasn't zero, multiply the answer by 2 and return it. His goal isn't to write less code that works. His goal is to hand in as much code as possible that doesn't work.
The only real solution is to get the EDFH fired and burn down and re-implement every file he ever touched.
@liger04 4 years ago ⁺²
Indeed. Property testing relies on the code performing the same actions on the entire testbed, so literally any perfect code can be ruined by the EDFH adding a special case at the top that kills the program whenever one specific input is processed. That's why no amount of testing will ever negate the need for proofreading other people's code.
@Pezsmapatkany 4 years ago ⁺¹
This will still fail test 1 (Adding 1 twice is the same as adding 2).
Let's say x = 10, then x+1 will be 22, then 22 + 1 will be 46. But x+2 will be 24
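The arithmetic in this reply can be verified directly. The sketch below is one reading of the "double the result unless the second value is zero" implementation described above; `warp_add` is a hypothetical name for it.

```python
def warp_add(x, y):
    """Add normally, then double the result unless the second argument is zero."""
    total = x + y
    return total if y == 0 else total * 2

def adding_one_twice_matches_adding_two(add, x):
    return add(add(x, 1), 1) == add(x, 2)

print(warp_add(10, 1), warp_add(warp_add(10, 1), 1), warp_add(10, 2))  # 22 46 24

# Which starting values in -100..100 would let this implementation slip through?
passing = [x for x in range(-100, 101)
           if adding_one_twice_matches_adding_two(warp_add, x)]
print(passing)  # [-1]
```

Algebraically, adding 1 twice gives 4x + 6 while adding 2 gives 2x + 4, so the two agree only at x = -1 and a run of 100 random inputs is very likely to fail.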
@armareum 4 years ago
@@Pezsmapatkany only for that particular input value. there are 100 different variants that must all pass
@Duiker36 3 years ago
It's funny that this criticism reveals the actual problem of Wlaschin's examples: arithmetic addition is an operation that is well-studied and its properties are extremely well-established. Which is to say he's correct: the _definition_ of an arithmetic addition operation is that, given two real numbers, the operation obeys the commutative property, associative property, and identity property.
Pezsmapatkany's point is that Warp Zone's algorithm fails the associative property test. But Warp Zone's actual conclusion is still correct: they merely chose the wrong example. In the vast majority of real world cases, we do not understand the operation as thoroughly as mathematicians understand arithmetic addition. We do not have a coherent and complete set of properties to draw from to describe all given computations, nevermind the abstractions on top of that like GUIs.
@herp_derpingson 3 years ago ⁺²
This is how I imagine GPT-3 based programs would look like
@MrDivineManiac 2 years ago
You were spot on. haha
@SteinGauslaaStrindhaug 4 years ago ⁺⁶
While this seems like a good idea for dealing with actively malicious coders, or for testing library code and really mission-critical rocket control systems etc., it does seem really hard to write for any code that is just slightly more complex than the trivial examples. Which, to be fair, is also a problem with writing good example-based tests that actually test something useful.
I may be slightly biased since I'm primarily a frontend coder; in frontend code the hard logic is trivial but the real issues are basically untestable. Frontend code is "correct" when it looks good, feels good and is understandable to a human, so the only way to test that is to have an actual human test it. I almost never write unit tests/automated tests (as in test code that inputs data and expects things about the output). I do write a lot of code in order to test things, though, and that code often uses randomly generated inputs as well, but it's mostly code that generates synthetic content to test how it looks, or explicitly inputs perverse combinations or extreme amounts of data just to see how it behaves at the limits. This code almost never does any verification itself; I just open the app, look at it and interact with it, and later remove the test code once I'm satisfied.
The only times I encounter classical test code is when I'm touching the (mostly backend) code that others have written. And usually the only reason I think about the tests is because some minor change I make to the code causes lots of tests to fail; or because I've discovered bizarre bugs in code that is supposedly "covered" by test code; or because I wonder how on earth they managed to write tests that just randomly fail 30% of the time you run them in the CI server even if the code they're testing is unchanged (i.e. simply re-running the test will succeed 70% of the time).
Usually when I break tests, it's not because I have introduced a bug. Often I have fixed a bug, causing the test to fail since the asshole who wrote the test probably changed the test so it expected the buggy output rather than the correct output. Or, more infuriatingly, it breaks because the test uses "mocks" that expect the code to work exactly as implemented, so if I fix the code by moving an expensive function call outside the loop (thus making it way faster without changing the result), the so-called test fails because it "expected expensiveCalculation(fixedInput) to be called 30 times but got 1 calls". Sometimes I try to add a test to catch a known bug in code that has lots of tests that are neither readable, understandable nor able to catch the actual bugs that do exist in the code they're supposed to test; and not only does my new test fail, but it also causes dozens of other tests to suddenly fail, because the test code is leaving crap in the database between tests and expects that exact crap to be there later. So if I reset the test database it fails, and if I insert something it fails because then there are more elements in the table than a dozen tests expect; and even if I carefully try to manually remove the added elements it still might fail, because this does not reset the primary key counter, so maybe some later test bizarrely expects a primary key to be 2 and gets 3! And trying to fix the tests by destroying and re-initializing the database (and manually fixing all the tests that expected side effects from previous tests) makes the tests 10 times slower, since apparently initializing the database has a fixed cost of 20 seconds each time (which is probably why they didn't do that in the first place).
@SteinGauslaaStrindhaug 4 years ago ⁺²
You might say that unit tests should not involve the database; but then what use are the tests?
Most of the software I've worked with has fairly complex frontend code that is not really testable with typical unit tests, and some backend that usually mostly just forwards the requests or data between the database and the frontend and occasionally forwards requests or data to a sub system. In most sensible projects the backend is really trivial and generally doesn't have obscure bugs; it either works or completely fails to compile and deploy.
In one of the more insane projects I've worked on, all parts of the code are an unholy mess of ugly code forwarding requests and data back and forth through endless layers of abstractions and buzzword technologies, up and down between needlessly many subsystems and even horizontally between subsystems. Some of these are Node servers using non-relational NoSQL databases (because that's "new and cool") for storing relational data, half of which is stored in a different database, perhaps written in Go (because that's also a new and cool language), using a good relational database to store big blobs of non-relational data (because of course), half of which is partially duplicated in a different subservice called "statistics", also written in Go and using a separate relational database with string fields implicitly referencing primary keys in a different database (but in a completely different style), that apparently was created in order to "improve performance" by moving some of the heavy statistics queries away from the "core" system. Never mind that the primary purpose of both the mobile app and the dashboard (the two clients of the backend system) was to register or display statistics, meaning almost every frequently called query needed data from the two isolated Postgres databases and relational data from the non-relational database, so it had to make multiple requests from the API server down to the sub-servers and then iteratively merge this data in the API server, and of course cache this result in an unreliable caching system, otherwise it would be unusably slow, causing all sorts of weird cache invalidation issues. (Also, of course, every service, parameter, table and field in this system used incredibly generic names like "Group" or "Item" and lots of 1-letter variable names repeated all over the 10s of subsystems, so it's also impossible to find anything by grepping since almost anything you look for will match hundreds of lines in every subsystem.)
Anyway besides the horrible structural issues with this system (mainly that most requests required "joins" between isolated databases); almost all the actual bugs (as in wrong rather than just slow behaviour) originated in horribly complex queries involving datefields. All the incorrect queries had "unit tests" that failed to identify the bugs since the synthetic data used in the test was completely different from the data the actual code would insert. One of the tests did in fact occasionally fail though because of one of the actual bugs: because it used system time +/- some fixed offsets for timestamps in the test data rather than fixed time and one of the bugs in the buggy code was that it tried to split the dates into hour long buckets (apparently for "performance" or something) some of the queries reading the data used date
@jon9103 4 years ago ⁺¹
I think a lot of the problems you've experienced are more symptoms of wider problems that pervaded into the tests. Also, it sounds like the tests were written after the fact to verify implementation details that miss the point of what the code is intended to do (e.g. the 30 method calls: at the very least there should have been a comment explaining the significance of the number, and if there is no significance it shouldn't be tested). While I agree that testing is no panacea, it's absolutely a prerequisite for the success of complex systems, but just like anything else, if done poorly tests can become more of a liability than a benefit.
@gyroninjamodder 4 years ago ⁺¹
@@SteinGauslaaStrindhaug Unit tests should not involve the database because you are no longer testing a unit. When you start involving the database you move into integration testing territory. In the story you gave us the problem could have been caught using unit tests for the database queries themselves.
@SteinGauslaaStrindhaug 4 years ago
@@gyroninjamodder I don't care what you call the test, "unit" or "integration" whatever. But if a function more or less wraps a big SQL query, how do you test that function in any meaningful way without involving the database?
If you mock/fake the database you're only testing that the programming language is able to call another function, which you have to assume is working anyway to write tests at all.
Are you saying there is a way to unit test a SQL query without using a database? How does that work?
@gyroninjamodder 4 years ago ⁺¹
@@SteinGauslaaStrindhaug You test the SQL queries alone. Instead of testing a function which calls the database then does something with the results you have your test directly invoke the query and check the results. Doing it this way means that you will see that a unit test for a specific query is failing instead of a test touching a lot more logic.
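One way to read this suggestion is to run the query by itself against a throwaway database that each test creates from scratch. The sketch below uses Python's standard `sqlite3` module with an in-memory database; the `events` table, its columns and the hourly-bucket query are hypothetical stand-ins, not the schema from the story above.

```python
import sqlite3

# Hypothetical query under test: total value per hour-long bucket.
HOURLY_TOTALS = """
SELECT strftime('%Y-%m-%d %H:00', ts) AS bucket, SUM(value) AS total
FROM events
GROUP BY bucket
ORDER BY bucket
"""

def run_hourly_totals(rows):
    """Run the query against a fresh in-memory database seeded with rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE events (ts TEXT NOT NULL, value INTEGER NOT NULL)")
        conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
        return conn.execute(HOURLY_TOTALS).fetchall()
    finally:
        conn.close()  # nothing is left behind for the next test

# Boundary case: one second before and exactly on the hour.
rows = [
    ("2024-01-01 10:59:59", 1),
    ("2024-01-01 11:00:00", 2),
    ("2024-01-01 11:30:00", 3),
]
print(run_hourly_totals(rows))
# [('2024-01-01 10:00', 1), ('2024-01-01 11:00', 5)]
```

Because every call builds its own database, tests cannot depend on rows or primary-key counters left over from earlier tests. This still involves a database engine, so whether it counts as a unit test is the naming question the thread is arguing about.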
@IcedLance 4 years ago ⁺¹²
And that leads to Test-Driven Development.
@prepost1420 4 years ago
f(x, y) = f(y, x);
f(f(x, y), z) = f(x, f(y, z));
f(x, 0) = x;
Do these requirements specify only the operation add?
@kvdveer 3 years ago ⁺¹
Nope, bitwise XOR, AND and OR also fit.
@freddiepage6162 2 years ago
@@kvdveer f(x , 0) = x doesn't hold for bitwise AND
@freddiepage6162 2 years ago ⁺¹
Maybe chuck in an f(x, -x) = 0 too
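A quick experiment shows what the extra rule f(x, -x) = 0 buys. This sketch assumes Python's arbitrary-precision integers (so negative numbers are allowed in the bitwise operators) and a hypothetical `property_report` helper that records which of the four properties survive random testing.

```python
import operator
import random

def property_report(f, trials=500, seed=5):
    """Return which of the four properties hold for f on random integer triples."""
    rng = random.Random(seed)
    report = {"commutative": True, "associative": True,
              "identity": True, "inverse": True}
    for _ in range(trials):
        x, y, z = (rng.randint(-1000, 1000) for _ in range(3))
        report["commutative"] &= f(x, y) == f(y, x)
        report["associative"] &= f(f(x, y), z) == f(x, f(y, z))
        report["identity"] &= f(x, 0) == x
        report["inverse"] &= f(x, -x) == 0
    return report

for name, f in [("add", operator.add), ("xor", operator.xor),
                ("or", operator.or_), ("and", operator.and_)]:
    print(name, property_report(f))
```

In this run only `add` keeps all four properties: XOR and OR lose the inverse property, and AND loses both identity and inverse. That rules out these three candidates, although it does not by itself prove that addition is the only function satisfying all four.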
@NullToChaos 4 years ago ⁺²
The solution presented at 13:00 still passes all tests listed at 14:30 (return 0)
@Pezsmapatkany 4 years ago ⁺⁶
No, the 3rd test will fail (Adding zero is the same as doing nothing), the return value should be x and not 0
@Subject38 4 years ago ⁺¹
@@Pezsmapatkany but you can just add two if statements like so:
int add(int x, int y){
    if(x == 0)
        return y;
    if(y == 0)
        return x;
    return 0;
}
@Pezsmapatkany 4 years ago ⁺¹
@@Subject38 Yeah, but that will fail for the 1st test (Adding 1 twice is the same as adding 2)
@o.sunsfamily 4 years ago
@@Pezsmapatkany only if the test tries 0+1+1 and 0+2. If it's a random test, it's not that unlikely that 0 won't come up.
@Pezsmapatkany 4 years ago ⁺¹
@@o.sunsfamily Well, then you would still get a flaky test, so cannot get away with it in the long run, however you're not right.
Let's say x = 10, then x+1 will result in 0, then 0+1 will result in 1, and x+2 will result in 0.
@tissueboi 4 years ago ⁺¹
It's funny there's an E in EDFH.
@jonathan-._.- 4 years ago ⁺⁵
if(x
@EpicNicks 4 years ago ⁺¹
RandInt() doesn't return a value between int.MIN_VALUE and 100 inclusive, so this test would fail if one of the random numbers is out of range.
@dbtx 4 years ago ⁺⁴
"The EDFH can't create an incorrect implementation!" well... I paused at 19:13 to say it doesn't prevent the misguided dev from hardcoding a giant lookup table or incrementing/decrementing a copy of x, y times... I hope that sometime in the next N minutes you'll say "here's how to define & impose compute/memory constraints".
@toequantumspace 4 years ago ⁺¹
He wouldn’t be lazy :)
@NostraDavid2 2 years ago ⁺¹
"if you sort a collection, the size should be the same"
This guy never heard about GulagSort: check if the items are sorted, and any item that isn't, gets eliminated (from the top of my mind). 😂
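The joke makes a fair point about why the size property exists. Here is a minimal sketch of an element-dropping "sort" as described above (`gulag_sort` is a hypothetical name): its output is always ordered, so only the size or permutation property exposes it.

```python
from collections import Counter

def gulag_sort(items):
    """Keep each item only if it does not break the ordering; drop the rest."""
    result = []
    for item in items:
        if not result or item >= result[-1]:
            result.append(item)
    return result

def is_sorted(items):
    return all(a <= b for a, b in zip(items, items[1:]))

def is_permutation(a, b):
    return Counter(a) == Counter(b)

data = [3, 1, 2]
out = gulag_sort(data)
print(out)                        # [3]
print(is_sorted(out))             # True: the "output is ordered" property passes
print(len(out) == len(data))      # False: the size property catches it
print(is_permutation(out, data))  # False: so does the stronger permutation property
```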
@lawrencemiller3829 4 years ago ⁺²⁷
I've never liked this term lazy programmer. I prefer efficient system oriented designer and programmer using computer science, structured execution flow, modular programming, OOD/OOP, and more.
@jeffwells641 4 years ago ⁺¹⁹
Lazy programmer is way shorter.
@joshuaemery7291 4 years ago ⁺¹¹
@@jeffwells641 also more efficient
@hellNo116 4 years ago ⁺⁴
I like it, because it is an oxymoron. In order to become lazy you first need to do more work than most.
@danielebonatto8378 4 years ago
def add(x, y):
    if isinstance(x, int) or isinstance(y, int):
        return x + y
    return 0 # fails for x and y both floats :)
we should add tests for several input types and do not forget the properties of + over the reals at least
czcams.com/video/IYzDFHx6QPY/video.html
@NostraDavid2 2 years ago ⁺²
I remember being confused by PBT, because properties meant "variables within a class", and not "mathematical properties, such as commutativity" to me.
Another reason why functional programming is never going to be popular with newbies: it overloads jargon with its own semantics, creating more confusion than clarification. I learned PBT and FP despite its jargon, not because of it.
@doublepmcl6391 4 years ago ⁺¹
Two plus two is four, minus one is three, quick maths!
@lepidoptera9337 1 year ago
Yes, but it is that in any modular arithmetic greater than five, as well... so in reality you won't even pass a char vs. int vs long error check with these kindergarten games. This is simply a man talking who is so stupid that he can't even tell just how stupid he is. ;-)
@coopergates9680 4 years ago ⁺¹
At least for addition, you can turn a specific example into a test suite, e.g. given adding 1 and 3 yields 4:
[<Test>]
let ``Adding 1 and 3 to a number is the same as adding 4 to it``() =
    for _ in [1..100] do
        let x = randInt()
        let result = add(3, add(x, 1))
        let result2 = add(1, add(x, 3))
        let result3 = add(x, 4)
        Assert.AreEqual(result, result2)
        Assert.AreEqual(result, result3)
@rainbowevil 4 years ago
Was this not covered by the associativity test he showed? Where adding 1 twice is the same as adding 2?
@coopergates9680 4 years ago ⁺¹
@@rainbowevil If add is called as add(add(x, 1), 1), then implementing x + 2y instead of x + y would pass that test. If add(1, add(1, x)) were used instead, implementing 2x + y would pass. I meant to create a single test set that could rule out a lot of incorrect implementations, hence scrambling the parameter order a bit.
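The two wrong implementations mentioned here can be tried against each form of the test. This is a sketch in Python rather than F#, with hypothetical function names; it assumes the "adding 1 twice" test compares against `add(x, 2)` in the first form and `add(2, x)` in the second.

```python
def add_x_plus_2y(x, y):
    return x + 2 * y  # wrong: doubles the second argument

def add_2x_plus_y(x, y):
    return 2 * x + y  # wrong: doubles the first argument

def one_twice_outer(add, x):
    return add(add(x, 1), 1) == add(x, 2)

def one_twice_inner(add, x):
    return add(1, add(1, x)) == add(2, x)

def scrambled(add, x):
    return add(3, add(x, 1)) == add(1, add(x, 3)) == add(x, 4)

xs = range(-50, 51)
print(all(one_twice_outer(add_x_plus_2y, x) for x in xs))  # True: slips through
print(all(one_twice_inner(add_2x_plus_y, x) for x in xs))  # True: slips through
print(any(scrambled(add_x_plus_2y, x) for x in xs))        # False: always caught
print(any(scrambled(add_2x_plus_y, x) for x in xs))        # False: always caught
```

Mixing the argument positions is what defeats both variants: each wrong implementation treats its two parameters differently, and the scrambled test exercises the constants in both positions.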
@Clumpfy 4 years ago
Nobody gonna point out that the thumbnail falsely uses the apostrophe twice and in the title he corrected it just once?! It's "programmers" and "tests" - just simple plural.
@jeremydavis3631 4 years ago ⁺¹
"Programmer's" is correct. It's intended to be possessive, not plural. This is "a guide for the lazy programmer", just like the Hitchhiker's Guide to the Galaxy is a guide intended for a hitchhiker. "1000's", on the other hand, is more controversial: some style guides say to write it that way, and others say to omit the apostrophe because it's a simple plural rather than a singular possessive.
Also, I don't see "test's" anywhere, but maybe the thumbnail was fixed before I arrived. That seems likely if he fixed it in the title first.
@17plus9 3 years ago ⁺²
MAXINT + 1
@kylelo5986 4 years ago
let add (x, y) =
    if x = 0 then y
    elif y = 0 then x
    else 0
@jasonleo 4 years ago ⁺¹
Tdd should make code more and more general, not more and more specific, so 5:30 definitely not tdd, just doing things stupidly
@123TeeMee 4 years ago
how is tdd meant to improve generality? (not saying you're wrong)
@lepidoptera9337 1 year ago
Tests don't work. Never have, never will. They simply represent the programmer's imagination of what could possibly go wrong with the software rather than the reality of what actually will go wrong. We have half a century of experience with serious bugs that have bypassed any and all test benches. End of story.
@Fasteroid 4 years ago
lol
@CliseruGabriel 4 years ago ⁺³
As a tester I find this a nice brain exercise for the developer and a waste of resources for everyone else, because the end result of this does not help the organization in any way. At least the intersection of unit tests across multiple functions gives something palpable. The output of such a test needs to be re-structured in order to be presented somewhere in a human readable way.
Also I truly hope that the Shrinker's algorithm was described as such for the sake of ease of understanding. Because else it's written in a very bad way. It can be solved by generating random numbers between last non failing and first failing. That's how boundaries are tested.
Then, it's highly prone to false positives for implementations that have 1 failure in a range of values. Let's say it should only fail for value 31. Statistically the chances of getting a random 31 out of 1..100 are... not quite big.
The output of the test result is not very useful. The fact that it can be falsifiable combined with the fact that it failed after 23 tests combined with the fact that it needed 3 shrinks == "Fails for inputs bigger than 81." The fact that it took 3 shrinks cannot be reliably useful as a risk or likelihood assessment because it's random. With a different combination it can return 4 or 5 at worst (based on your description of the algorithm). So there's a big difference in the confidence factor between 33% and 20%. The number of tests, again, varies. At best I can read it as "how difficult was it for the Shrinker to find the boundary".
In conclusion, for me your presentation was good. Not the greatest sound but you did speak very clearly and you did explain simple enough to not have me re-watch it. Thank you for your time and please accept my like. The idea is acceptable for some use-cases but in almost every instance the unit tests are better. Usually at QA generators and builders for worst cases are used for a very long time without outputs more optimized. Almost every acceptable testing framework has one and those boundaries are found in the specifications of the feature.
The absolute only case when I find this being better than everything else is if I receive a blackbox which I don't know what and how it should do it. Then, yes, this method of discovery would be useful. Gladly I have never encountered this. Also gladly ISTQB covers all these types in depth. And for those special strings I do have a list of them that I do paste with scripts in whatever input. This tool has literally 0% chance of finding something I will not find in the first hour.
P.S. I am not a gherkin fan.
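The "fails only for 31" scenario can be quantified. Assuming each test draws one value uniformly from 1..100 and draws are independent, the chance of hitting a single bad value at least once in n draws is 1 - (99/100)^n. The helper names below are hypothetical.

```python
import math

def chance_of_hitting(domain_size, trials):
    """Probability that at least one of `trials` uniform draws hits one specific value."""
    return 1 - (1 - 1 / domain_size) ** trials

def trials_needed(domain_size, target):
    """Smallest number of draws that reaches the target hit probability."""
    return math.ceil(math.log(1 - target) / math.log(1 - 1 / domain_size))

print(round(chance_of_hitting(100, 100), 3))  # 0.634
print(trials_needed(100, 0.99))               # 459
```

So a single run of 100 random cases would miss the bug roughly one time in three, and about 459 cases are needed to reach a 99% chance of seeing it. This supports the comment's concern about isolated failures, and it also shows that the odds depend on how many cases are run and how the generator is distributed.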
@espeon91 4 years ago ⁺³
I think that he should have covered a more complex example of PBT (property based testing) but as he said, it is difficult to fit it on a slide.
Quickcheck and most production ready property testing libraries are quite sophisticated in generating and shrinking. The shrinking example he gave was more for easy explanation of what a shrinker does. Also, do not forget that a computer is running these tests, not a human. That means that millions of test cases can be generated and shrunk in a second. It does not replace human testing or completely replace unit tests. Unit tests are useful especially for testing known regressions and what the developer thinks are edge cases.
PBT is just another tool in your arsenal to be more confident in the correctness of your code. Another interesting way of testing similar to PBT is fuzzing which has uncovered a lot of bugs in many opensource libraries.
Check out this post : begriffs.com/posts/2017-01-14-design-use-quickcheck.html for more information on how quickcheck (Haskell) works.
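To make the generate-then-shrink loop discussed above concrete, here is a minimal sketch in plain Python. It is not how QuickCheck or FsCheck work internally; the names `check`, `shrink` and `shrink_candidates`, the integer range, and the halve-then-decrement strategy are all assumptions made for illustration. The property `x <= 81` is a stand-in borrowed from the "fails for inputs bigger than 81" wording earlier in the thread.

```python
import random

def shrink_candidates(n):
    """Yield simpler integers to try: zero, half of n, one step closer to zero.
    (A hypothetical strategy; real libraries use richer ones.)"""
    if n == 0:
        return
    yield 0
    yield n // 2 if n > 0 else -((-n) // 2)
    yield n - 1 if n > 0 else n + 1

def shrink(value, fails):
    """Greedily replace `value` with a smaller value that still fails,
    until no candidate fails. Returns (minimal value, number of shrink steps)."""
    steps = 0
    improved = True
    while improved:
        improved = False
        for candidate in shrink_candidates(value):
            if abs(candidate) < abs(value) and fails(candidate):
                value, steps, improved = candidate, steps + 1, True
                break
    return value, steps

def check(prop, runs=100, seed=None):
    """Run `prop` on random integers; on the first failure, shrink and report."""
    rng = random.Random(seed)
    for i in range(1, runs + 1):
        x = rng.randint(-1000, 1000)
        if not prop(x):
            minimal, steps = shrink(x, lambda v: not prop(v))
            return {"passed": False, "tests": i, "original": x,
                    "minimal": minimal, "shrinks": steps}
    return {"passed": True, "tests": runs}

# Stand-in property: holds only for inputs up to 81.
result = check(lambda x: x <= 81, runs=1000, seed=1)
print(result)
```

In this sketch the number of tests before failure and the number of shrink steps both depend on which random input happened to fail first, while the minimal counterexample does not. That is consistent with reading the shrink count as incidental and the shrunk value as the useful part of the report.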
@CliseruGabriel 4 years ago
@@espeon91 after reading your comment I think I can pinpoint why I don't trust it:
Not all inputs of a function can exist by themselves.
In the case of an add function, the random integer generator is great. In the case of a stock management function, the integer generator is almost useless. In the latter, the properties are the business logic. The output of the function is tied to the input in a way that it actually matters which stock goes to which product and in which order. I can very well imagine how I can use PBT in that context, and it's borderline TDD or data-driven testing. So borderline that I'd argue that any other practice except PBT brings more advantages. Having the shitty spreadsheet of data-driven testing is useful for linking the testing of this function to another. Having the TDD helps with the documentation. Having unit tests stands in between the two.
@espeon91 4 years ago ⁺²
@@CliseruGabriel Which is exactly why you should not use just the primitive int generator. PBT works with complex data types, as mentioned in the video. So you will not be testing a list of ints but a type like StockPortfolio, which will have a suitable range of acceptable values and properties. Using a method in a sub-optimal way does not mean the method is useless.
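As a rough illustration of generating a domain type instead of bare ints, the following sketch builds random but always-valid portfolios and checks one business rule against them. Everything here is hypothetical: `Holding`, the fields of `StockPortfolio`, the made-up tickers, the `sell` function under test, and the value ranges are assumptions for the example, not anything shown in the talk.

```python
import random
from dataclasses import dataclass, replace

SYMBOLS = ["AAA", "BBB", "CCC", "DDD", "EEE"]  # made-up tickers

@dataclass(frozen=True)
class Holding:
    symbol: str
    quantity: int      # generated as >= 1
    price_cents: int   # generated as >= 1

@dataclass(frozen=True)
class StockPortfolio:
    holdings: tuple    # tuple of Holding, one per symbol

def gen_portfolio(rng):
    """Generate a valid portfolio: unique symbols, positive quantities and prices."""
    chosen = rng.sample(SYMBOLS, k=rng.randint(1, len(SYMBOLS)))
    return StockPortfolio(tuple(
        Holding(s, rng.randint(1, 1000), rng.randint(1, 100_000)) for s in chosen))

def total_value(portfolio):
    return sum(h.quantity * h.price_cents for h in portfolio.holdings)

def sell(portfolio, symbol, quantity):
    """Hypothetical function under test: sell `quantity` shares of `symbol`."""
    updated = []
    for h in portfolio.holdings:
        if h.symbol == symbol:
            if quantity > h.quantity:
                raise ValueError("cannot sell more than is held")
            h = replace(h, quantity=h.quantity - quantity)
        if h.quantity > 0:  # drop holdings that were sold out completely
            updated.append(h)
    return StockPortfolio(tuple(updated))

def prop_sell_reduces_value_by_sale_amount(rng):
    """Property: a sale lowers total value by exactly quantity * price,
    and never leaves an empty or negative holding behind."""
    portfolio = gen_portfolio(rng)
    holding = rng.choice(portfolio.holdings)
    quantity = rng.randint(1, holding.quantity)
    after = sell(portfolio, holding.symbol, quantity)
    value_drop = total_value(portfolio) - total_value(after)
    return (value_drop == quantity * holding.price_cents
            and all(h.quantity > 0 for h in after.holdings))

def run(prop, runs=500, seed=0):
    rng = random.Random(seed)
    return all(prop(rng) for _ in range(runs))

print(run(prop_sell_reduces_value_by_sale_amount))
```

The generator encodes the constraints (unique symbols, positive amounts), so the inputs cannot "exist by themselves" in an invalid state, and the property reads as a statement of the business rule rather than a list of hand-picked cases.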
@CliseruGabriel 4 years ago
@@chriswarburton4296 which, if put into words like you just wrote to me, gives us a test suite which doubles as documentation. By reading them, even if I have no clue about the project, it totally makes sense to me (kudos btw). But put into the framework presented in the video, it gives "not much". And then, if we take the framework presented and add words to it, it is almost like taking Gherkin and adding random values to it. Which is almost like a dynamic version of data-driven testing. Which is almost like having a bunch of well-written unit tests.
This is my "argument" against the presented framework. It doesn't bring much to the table because all the other approaches end up testing the properties, whether they admit it or not. They are different representations of how to "invent" a set of values for which a logic must hold true. And the presented approach solves the problem of a very bad actor while taking away from readability. We can solve the same problem of the bad actor by adding random generation to the existing approaches while keeping the documentation part and having a more human-usable result.
@FilipCordas 4 years ago ⁺¹³
So your manager decides you need to improve quality, so he hires one of them friendly neighborhood consultants. So the guy of course tells you that you don't have enough unit tests. And he decides to refactor your add function that has been working for years to show you small brain plebs how a big brain would do it. After he is done the number of tests has grown 2000 times; he took two weeks to do it, but he is paid by the hour, so really it was a waste of his time more than anything. A month later someone reports a bug, you investigate, and you find that your add function changed from decimal to double because the tests took way too long to run. I mean, really, you are testing every possible combination now, so it started getting slow. You ask the guy writing integration tests why he didn't write a test for this, and he tells you that the big brain told him there is no need to do it any more, edge cases are covered by unit tests now, and that he should focus on things that have real value to the user.
@L30N4tER 4 years ago ⁺²
lul, cool story bro
@adambickford8720 4 years ago ⁺⁵
Is there a point to this screed?
@GLu-tb1pb 4 years ago ⁺¹
he could just return x to pass the three tests
@squirel52 4 years ago ⁺⁴
That would fail commutativity
@jeffwells641 4 years ago ⁺¹
If you always return x, then add(x,y) is not equal to add(y,x) unless x and y are the same. It fails the very first test (commutativity, like the squirrel said).
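The point about commutativity can be checked with a few lines of plain Python. This is only a sketch: `bad_add`, `commutative` and `find_counterexample` are names invented for the example, and the input range of -100 to 100 is an arbitrary assumption.

```python
import random

def bad_add(x, y):
    """The 'just return x' implementation suggested above."""
    return x

def commutative(add, x, y):
    """Property: the order of the arguments must not matter."""
    return add(x, y) == add(y, x)

def find_counterexample(add, runs=100, seed=0):
    """Try random pairs and return the first one that breaks commutativity."""
    rng = random.Random(seed)
    for _ in range(runs):
        x, y = rng.randint(-100, 100), rng.randint(-100, 100)
        if not commutative(add, x, y):
            return (x, y)
    return None

print(find_counterexample(bad_add))               # some pair with x != y
print(find_counterexample(lambda x, y: x + y))    # None: real addition passes
```

Any pair with different values is a counterexample, since `bad_add(x, y)` gives `x` while `bad_add(y, x)` gives `y`, so a random search finds one almost immediately.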
@spicybaguette7706 4 years ago ⁺⁹
The whole point of programming is to be lazy
@17plus9 3 years ago
To automate
