Building Operable Software with TDD (but not the way you think) - Martin Thwaites - NDC London 2023

  • Published 22 May 2024
  • Building operable software is becoming more important as microservice-based systems become more common. Developers increasingly rely on long-running "integration" tests in deployed environments because it's the only way to gain confidence to deploy their applications. There is a better way: "outside-in" testing that targets the boundaries of your service.
    In this talk, we'll go through some of the pitfalls of relying on unit testing to give you confidence in an application. We'll then go through how you can use TDD as a workflow to build tests in a "Contract First" way, and how much more flexible your testing becomes. We'll talk about the benefits over a unit-testing focus, and how it can aid in understanding service boundaries. Finally, we'll show how you can correlate all of this with tracing tools like Honeycomb to see the performance of your tests and how internal code interacts.
    This talk focuses on the WebApplicationFactory in .NET to provide the scaffold, and Honeycomb to provide the visibility; however, the concepts will likely apply to other languages.
    Check out our new channel:
    NDC Clips:
    @ndcclips
    Check out more of our featured speakers and talks at
    ndcconferences.com/
    ndclondon.com/
  • Science & Technology

Comments • 53

  • @aaronzhong • 1 year ago +4

    Love the practical examples in the talk. For the areas that "can't be tested this way", I personally think the last two points (connecting to external dependencies, and configuration being correct) are more important to verify in our deployed environments than the things that can be achieved via ODD. For a greenfield project I'm working on, we have been writing our tests in a very similar manner (albeit leaning more towards the BDD style with Given/When/Then), and we're currently deciding how to achieve confidence on those points. We're weighing two options:
    1. A separate health check API that we ping during deployment to verify those things are working, giving us explicit confidence; or
    2. Running the BDD test suite against the deployed environment. The checks here are implicit.
    I want option 2 to work because it means we won't need to maintain a separate part of the system; but if we find those tests take too long, are flaky, or don't work for other reasons, we may fall back to option 1.

    • @DotNetMartin • 10 months ago +3

      These tests aren't a replacement for anything, unless they cover all the things you need.
      My preference for those checks is:
      * Gated deploys based on advanced healthcheck endpoints. Check that you can do `SELECT 1` against the DB, that you can connect to Service Bus without an error, etc. App Service initialisation healthchecks for slot switches are a good example (see the sketch below).
      * 15-minute journey tests run on a schedule that don't block the deploy.
      * Fast release pipelines that you can deploy with all the time (with confidence), and robust observability tooling with SLOs/alerts.
      It's not "1 thing", and I think that's what people get hung up on.
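
      For anyone wondering what such an "advanced healthcheck" can look like in ASP.NET Core, here's a minimal sketch; the check name, connection string, and `/healthz` path are illustrative:

      ```csharp
      // Assumes a modern .NET project (implicit usings enabled) plus the
      // Microsoft.Data.SqlClient package.
      using Microsoft.Data.SqlClient;
      using Microsoft.Extensions.Diagnostics.HealthChecks;

      public class SqlSelectOneHealthCheck : IHealthCheck
      {
          private readonly string _connectionString;
          public SqlSelectOneHealthCheck(string connectionString) => _connectionString = connectionString;

          public async Task<HealthCheckResult> CheckHealthAsync(
              HealthCheckContext context, CancellationToken ct = default)
          {
              try
              {
                  await using var conn = new SqlConnection(_connectionString);
                  await conn.OpenAsync(ct);
                  await using var cmd = new SqlCommand("SELECT 1", conn);
                  await cmd.ExecuteScalarAsync(ct); // a cheap round-trip that proves connectivity
                  return HealthCheckResult.Healthy();
              }
              catch (Exception ex)
              {
                  return HealthCheckResult.Unhealthy("Database unreachable", ex);
              }
          }
      }

      // Registration in Program.cs; a gated deploy then polls /healthz before switching slots.
      // builder.Services.AddHealthChecks()
      //     .AddCheck("sql", new SqlSelectOneHealthCheck(connectionString));
      // app.MapHealthChecks("/healthz");
      ```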

  • @m13v2 • 11 months ago +2

    That’s quite in line with “Growing Object-Oriented Software, Driven by Tests” plus the clarification “When you don’t follow the ‘Tell, Don’t Ask’ rule, prefer social over solitary tests.”

    • @DotNetMartin • 10 months ago

      There is nothing new in software

  • @animanaut • 9 months ago +2

    Nothing to add to the discussion, just wanted to say thanks to the speaker for engaging in the comments. It's missing for most other conference talk videos online.

    • @DotNetMartin • 7 months ago +1

      Thanks! The questions and comments are what help me make the talk (and other similar talks) better.

  • @nytofteAS • 7 months ago

    Great talk with some interesting takes on the approach of using outside-in TDD. Curious how you would proceed in your test class as further requirements are added to the system. For instance, would the progression of requirements for a specific endpoint mean adding complexity to the arrange code for the tests (extracted to methods, of course, to keep test intention clear)? In other words, would adding business requirements mean a continuous focus on refactoring towards test helpers for arrange code, making them more generic and flexible over time (but also potentially more complex)?

    • @DotNetMartin • 7 months ago +2

      Ultimately the tests map to the requirements. If the original requirements drove some particular tests, and those requirements are no longer valid, then the tests should be removed (or repurposed). If the new requirements are additive, then you should add more tests for those requirements.
      Removing tests removes the confidence that the original requirements are still being met.
      That's also where refactoring tests comes in. However, that's a completely separate step, and shouldn't be done when adding "new" requirements.
      As with everything though, there is no "one true way", context is important, so use your judgement, write the important tests, focus on confidence.

  • @babgab • 5 months ago +1

    From my experience with whole-app automation testing in video games, I find that such tests tend to be slow and brittle. For one thing, having to boot the app means that every test run pays the app startup costs, which in a AAA video game can be on the order of minutes as the game builds and caches the data it needs (customers don't typically see this apart from shader compilation, because this data cooking process happens prior to shipping, but internally it must be redone every time the game's content changes). Similarly, having to boot the app means paying the compile-time cost of building the entire app, which for a AAA video game can also be on the order of minutes (possibly tens of minutes depending on what's changed!). Then even if it does boot up, any bug that causes the app not to function will block *every* test from running, which causes people to scramble to fix tests that aren't *actually* broken (somebody else's code was broken). Heisenbugs that only show up 1% of the time will randomly fail test runs for no clear reason (threading issues are a common cause of this). It has also been my experience that a test harness that can command a video game in a shipping environment needs more maintenance and has more ways to fail than a unit test; not only does the game itself need a bigger API surface to talk to the tests (because the actual output of a video game is graphics and audio, not something that one can easily measure in a test harness), but also some tests need to take into account network latency, which is a source of flakiness as the time between test actions may be measured in milliseconds...
    None of this is to suggest that we shouldn't have outside-in tests, only to give some perspective on what it was like to mainly have outside-in tests in the context of something that isn't a banking app - that I have not had a good experience with them and I don't think they're sufficient to avert manual testing and therefore if I'm going to do TDD, I would like to write more unit tests. Frankly, I find the main value proposition of TDD (which for me is "iterating faster") is hard to realize with outside-in tests, so I don't feel incentivized to use it with this kind of testing. Every attempt at it has been frustrating and I eventually gave up and went back to test-after with outside-in integration tests.
    I would also like to note that it is nice to see someone acknowledging that you can do TDD with things that aren't unit tests, even if I'd generally find that more valuable.

    • @DotNetMartin • 4 months ago

      There are definitely places where this doesn't work; in .NET APIs it does.
      The key part is moving as far to the outside as you can instead of staying at the class level. For some languages it's not viable to stay at the consumer side, but in every language you can move further towards the outside.

  • @dogoku • 1 year ago +4

    All the things he is talking about from 13:15 (including abstracting into reusable steps, etc) is what I have been doing as part of BDD.
    I guess I never used BDD "correctly" just how it made sense to me...

    • @RobMyers • 1 year ago +1

      E.g. Cucumber step definitions and background statements? Yeah, that's exactly what they're for. The thing that seems to be missing from the speaker's definition of BDD, and that Cucumber provides, is the whole-team readability of the product specification, in an environment where not everyone can read and understand the programming language. His tests were not readable to most bankers or users, so I was a bit confused (annoyed?) by that.

    • @DotNetMartin • 10 months ago +1

      @RobMyers The tests here are examples; you can make tests readable without Given/When/Then style syntax. The goal of Gherkin syntax was readability, for sure, but that's not the only way to do it. In a lot of cases you become constrained trying to get the wording you want into those steps, and end up with very long step names.
      If Gherkin syntax works, and you can get external people to write scenarios based on a small set of steps, I'd say do it; that's the thing we want.
      These aren't focused on others writing them, or necessarily on others without any IT knowledge reading them.

  • @WilliamPowerDental • 2 months ago

    Great talk and a great practical application of contract testing. You lost me on the pro tip, "can introduce path approval checks". How would that work?

    • @DotNetMartin • 2 months ago +1

      You can check whether an individual path has changed. So if you store all your contracts in the test project under the same folder, and those files change, you add an additional approval check to ensure that a second set of eyes looks at it. Specifically, that second set of eyes has the express instruction that "Contracts have been updated, they must be checked before this goes out".
      Since a core part of this kind of testing is that it's done in-memory, you don't have to run the app up and you're not relying on servers spinning up, etc. You can protect things quickly.
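
      One way to wire up that approval gate, assuming the repo is on GitHub and the contracts live under a tests/Contracts folder (both of those are illustrative), is a CODEOWNERS rule so any change to that path requires a named reviewer:

      ```
      # Hypothetical .github/CODEOWNERS entry: any PR touching the contracts
      # folder needs approval from the contract-reviewers group before merge.
      /tests/Contracts/ @my-org/contract-reviewers
      ```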

    • @WilliamPowerDental • 2 months ago

      @DotNetMartin Wow, I see what you mean: an extra layer of protection on your contract tests if they have been updated. Thanks!

  • @bikerd12 • 1 year ago

    Is there a link for the code examples?

    • @bikerd12 • 1 year ago

      The link is at the end of the talk

  • @alekseimenkov8317 • 11 months ago +4

    I saw a lot of systems where devs did this type of "BDD" and just checked a status code from the response. They were proud of their 80% code coverage.
    None of these systems were easy to maintain.
    There are a lot of articles and videos on why e2e and acceptance tests don't help you with software development. These high-level tests can't give you enough trust to deliver your software fast.
    It is a good addition to unit tests and integration tests. But it is not enough to write only behaviour tests, because it is not possible to test all the logic of the application through high-level tests.

    • @DotNetMartin • 10 months ago +5

      I'd have to disagree; I've built an entire system with the majority of tests focused on these. It's not "all" these tests; the skew is towards more of these tests over single units. Sometimes you can get all the confidence you need from unit tests, but in my experience you get more value from these types of tests being front and centre in your engineers' workflow.

    • @seNick7 • 3 months ago +1

      But the tests in this talk are not "end to end" tests. They are a mix of integration tests and sociable TDD/unit tests of a single service.
      E2E tests would run against the frontend and go through all services.

  • @pendax • 1 year ago +2

    I'm not familiar with dotnet programming, but I would like to know what Span means in this context.

    • @simonk1844 • 11 months ago +4

      That's nothing dotnet-specific; "span" is a term used in distributed tracing. The trace output of any request is a tree of the operations it triggered, e.g. request -> (operation A -> (operation A1 then A2 then A3)) -> (operation B -> (B1 -> B2)). Each node in the tree is a "span", which contains the info about the "nested" operations. Starting a new span is simply saying "I'm about to do some nested operations now (which may emit their own trace info)".
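
      In .NET terms, each span maps to a System.Diagnostics.Activity; here's a rough sketch of the tree described above (the source and operation names are illustrative):

      ```csharp
      using System.Diagnostics;

      // Each Activity is one span; nesting the using blocks builds the tree.
      // Note: StartActivity returns null unless a listener (e.g. an
      // OpenTelemetry SDK) is registered for this source.
      var source = new ActivitySource("MyService");

      using (var request = source.StartActivity("HandleRequest"))   // root span
      {
          using (var opA = source.StartActivity("OperationA"))      // child of request
          {
              using var a1 = source.StartActivity("OperationA1");   // child of A
          }
          using var opB = source.StartActivity("OperationB");       // sibling of A
      }
      ```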

  • @br3nto • 1 year ago +6

    37:29 this isn’t how you’d write caching code? Why not? It’s like every single example ever given on the web. I lol when people say “this isn’t production ready code”… ok, well just show the production ready code instead of the simplified “never use this in production” code so we can all see what production code actually looks like and what we should be doing instead. Don’t perpetuate what we shouldn’t be doing in prod.

    • @DotNetMartin • 10 months ago

      Honestly, the code was an example; it wasn't part of the actual demo. I'm reworking the demo, as I don't think it properly shows what I was trying to get across.
      Why isn't it production code? It doesn't have a cache expiry, and therefore no new items would make it in ;) Other than that... ship it!
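
      For completeness, a sketch of the missing piece using IMemoryCache; the type names and the five-minute TTL are illustrative:

      ```csharp
      using System;
      using System.Threading.Tasks;
      using Microsoft.Extensions.Caching.Memory;

      public record Product(string Id, string Name);

      public class ProductCache
      {
          private readonly IMemoryCache _cache;
          public ProductCache(IMemoryCache cache) => _cache = cache;

          public Task<Product?> GetAsync(string id, Func<string, Task<Product?>> load) =>
              _cache.GetOrCreateAsync(id, entry =>
              {
                  // The expiry the example lacked: without it, stale entries
                  // live forever and new items never make it into the cache.
                  entry.AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(5);
                  return load(id);
              });
      }
      ```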

  • @NickMaovich • 8 months ago

    24:52 that assert makes no sense and won't ever pass (or even compile):
    it compares the whole object to the ID.

    • @DotNetMartin • 7 months ago

      Glad you were paying attention, hope you got something out of the video

  • @KyleSmithNH • 8 months ago

    I do wish people started with this approach, but reasonably complex systems do suffer from high combinatorial test cases at a certain point (e.g., "given my system has 100 widgets, it cleans up the oldest 50 widgets" -- am I really going to call the API to set up the Arrange phase?). I've worked with a lot of people that start at the class-as-a-SUT approach, so I value this talk as an introduction to an extreme alternative, but I hope the speaker hits on the downsides and how to cope with them.

    • @DotNetMartin • 7 months ago

      Sounds like you're commenting without actually watching the talk? Maybe try that first before commenting next time.

    • @fellowseb • 4 months ago

      @DotNetMartin Wow, I liked your talk but this comment is just rude. Chill out! Plus the question about the high combinatorial cases is interesting IMO.

    • @DotNetMartin • 4 months ago +1

      @fellowseb If someone isn't going to bother watching the talk, where I cover the concern they mention, then I feel I responded proportionally to their rudeness.
      As for high combinatorial cases, they work too, and it's still the best way to do it. We had 8000 tests on a single service doing it this way.

  • @iorch82 • 10 months ago +1

    There's no silver bullet. Testing from the edge ends up quite messy once you have complex business requirements, since you will need quite a big arrangement phase. If you're writing a CRUD app, sure, go this way.

    • @DotNetMartin • 10 months ago +1

      Disagree here; your setup is important. Abstract it away and give it context.
      If your prerequisites are actually that complicated, I'd be looking at the system design, at whether you can split up the units of the system, and at whether you should.
      Every time I've seen that reason given, it's been possible to abstract those requirements into something that makes sense.
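
      As an illustration of that kind of abstraction (the helper, route, and scenario are hypothetical, borrowing the "100 widgets" example from another comment), the arrange phase can read as intent while the plumbing lives in one place:

      ```csharp
      using System.Net.Http;
      using System.Net.Http.Json;
      using System.Threading.Tasks;

      // Hypothetical arrange helper for outside-in tests.
      public static class Given
      {
          public static async Task SystemHasWidgets(HttpClient client, int count)
          {
              // Seeding through the API keeps the test outside-in; if this is
              // too slow, seed the backing store directly inside this one helper.
              for (var i = 0; i < count; i++)
                  await client.PostAsJsonAsync("/widgets", new { Name = $"widget-{i}" });
          }
      }

      // Usage inside a test:
      //   await Given.SystemHasWidgets(client, 100);
      //   await client.PostAsync("/widgets/cleanup", content: null);
      //   // assert only the newest 50 widgets remain
      ```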

  • @interstellar3997 • 6 months ago +2

    some good stuff but also a lot of needless grumpy rants and a bit too much of stating the obvious. "no one cares about your 4000 unit tests!!" good one.. 😴

    • @DotNetMartin • 5 months ago

      Stating the obvious is very relative. To me, the entire talk is obvious, I'm not sure why anyone would test any other way. A lot of building relatable talks is making sure that everyone gets to the level of your understanding/point of view so that they can understand what you're talking about.
      Also, "Grumpy rant" is my style ;) Glad you got something out of it despite that though!

  • @vikas6024 • 1 year ago +1

    Observability is not part of the development code. When the code is touched by many developers, your tests will eventually break if you assert on spans, even if there are no behavior changes in your code. I think it follows the same path as comments: they are valid until they are not.

    • @awsumgeorge • 10 months ago

      If you can run all tests in 8 seconds, you can ensure they don't break. Institute a rule: "Everybody is responsible for fixing their broken tests before check-in." If tests are maintainable, the rule will be accepted without much complaint.

    • @DotNetMartin • 10 months ago

      Testing the things that are important is the theme here. If Observability isn't important to you as an engineer, I think there are bigger problems. Relying on auto-instrumentation from vendor APMs only goes so far. When you've worked on a manually instrumented codebase with a team of engineers who own, maintain and support their own code, you'll soon see how much of a development concern Observability is.

    • @vikas6024 • 10 months ago

      @DotNetMartin I think I didn't state it well. I didn't mean that Observability is not important; it's obviously as important as writing tests.
      My problem with this is that it's prone to giving you false positives and false negatives. As a developer you can leave the span there and still remove the caching logic. Or you can remove or change the span while the caching logic is still there; now your test is failing but the caching is there. You can argue "Who would do that? How can someone be that stupid?", but it happens, and it happens often; that's why tests are there to guard us and let us ship with confidence. Personally, I would be afraid to work in a codebase where I see no tests failing for a behaviour change in my software.

    • @DotNetMartin • 10 months ago

      @vikas6024 What you're describing is not unique to spans; I could do the same no matter what. Would I test for a span? Yes, if I have downstream things relying on it, like SLOs. If I don't, all tests could pass and my production observability would be broken.
      If something has to be there for a reason, have a test that describes WHY it needs to be there. That way someone has to purposefully delete the test that says "test that the span for monitoring production is there". Will someone delete it? Maybe. Will someone delete the test that says "make sure the customer's balance is correct"? Maybe. These techniques don't work in isolation; there's loads of other process that needs to be around them.
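
      As a sketch of what such a regression test can look like in .NET (the source name, span name, and stand-in code under test are all illustrative), you can record Activities with an ActivityListener and assert the span exists:

      ```csharp
      using System.Collections.Generic;
      using System.Diagnostics;
      using Xunit;

      public class SpanRegressionTests
      {
          [Fact]
          public void CacheLookup_EmitsSpanThatProductionSlosRelyOn()
          {
              var recorded = new List<Activity>();
              using var listener = new ActivityListener
              {
                  ShouldListenTo = src => src.Name == "MyService",
                  Sample = (ref ActivityCreationOptions<ActivityContext> _) =>
                      ActivitySamplingResult.AllDataAndRecorded,
                  ActivityStopped = recorded.Add
              };
              ActivitySource.AddActivityListener(listener);

              // Stand-in for exercising the real code under test, which
              // would emit this span from its caching path.
              using var source = new ActivitySource("MyService");
              using (source.StartActivity("cache.lookup")) { }

              // If someone deletes the span, this test name tells them why it existed.
              Assert.Contains(recorded, a => a.OperationName == "cache.lookup");
          }
      }
      ```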

  • @user-vu8ch5eo7w • 1 year ago

    We don't want to test the cache. The cache is not the goal; it's just a means to an end. And the goal is to satisfy NFRs, request latency for example. One possible way to do it is defining clear SLOs and checking them in production.

    • @keithang9335 • 1 year ago

      I think they both go hand in hand; if you find that you're not meeting request latency requirements, then tracing would help you verify that calls to the cache do occur when you're expecting them to, as part of the triage process.

    • @DotNetMartin • 10 months ago

      You're not wrong that satisfying NFRs should be done through SLOs. What I don't agree with is that you shouldn't ensure that the attributes you rely on to build those SLOs in prod are regression tested in code.

  • @GoodTechConf • 11 months ago +2

    This is so wrong. This guy needs to listen to James Coplien as soon as possible.

    • @m13v2 • 11 months ago +1

      Because unit tests are the least effective tests and, if done badly, make refactoring/changes impossible? Hence Coplien sometimes suggests throwing away unit tests which haven't failed for a year.
      Well, this talk is also about putting more importance on component/acceptance tests over unit tests.
      (“unit tests” with the exception of classicist unit tests serving as component/acceptance tests.)

    • @GoodTechConf • 11 months ago +1

      @m13v2 Fair enough, but here we can see a big mixture of DevOps concepts (Observability, OpenTelemetry) with testing techniques, which is a big no-no for me. Observability has nothing to do with tests; it's a runtime technique to raise alerts and KPIs. Using it for tests is a form of hack.

    • @DotNetMartin • 10 months ago +2

      @GoodTechConf That's kinda the problem. Observability is, and always should be, a developer concern. It can also drive alerts and dashboards. Observability (which comes from control theory) is about understanding inner system state, not specifically about alerts/KPIs/metrics.
      Ensuring that the outputs of your application are correct, and always consistent with what you expect, is what testing is for; and if you output telemetry, you should ensure that it's there when you rely on it.

  • @imartynenko • 10 months ago +2

    This talk sucks because he confuses unit testing with integration testing. Integration testing is slow and does not promote clean code. Your code could be garbage and totally unmaintainable, but it could work. You won't know you have a bug somewhere until you hit it, because you can't think of edge cases just based on "business requirements".

    • @DotNetMartin • 10 months ago +4

      I'd love for someone to show me a definition of "unit" that talks about classes and methods; I've tried, and honestly I can't find it anywhere. Same with integration testing: there isn't a definition, everyone has their own. The name isn't the point, it's what they do. "Developer Tests" is what I think is a better term: all the tests that a developer should write to understand their system.
      Like I say, I care less about what is a unit test and what's an integration test. If you follow the video to the end, you'll see that I say this is another thing you can do, and it gives you more confidence to deploy your software. Lower-level tests that test classes and methods in isolation may also be useful. Those lower-level tests serve a different purpose.
      Not every talk has to promote every concept. I promote clean tests, and I promote developer-driven tests that are best at bringing out requirements.

    • @imartynenko • 10 months ago +1

      @DotNetMartin The "developer test" definition is too broad. Which one is it? If you have a mix of unit and integration tests in one project, how do you plan to run those? All at the same time? Agile teams have many thousands of tests, and they all run at different times for a reason: some are slow, while some (like unit tests) are designed to be very fast. Again, you're confusing things.

    • @DotNetMartin • 10 months ago

      @imartynenko The point of these tests is that they're as fast as class/method-based tests, and therefore run at the same time.
      You're right: if these were slow and required external dependencies, then I'd be looking at splitting them. With this approach though, as WebApplicationFactory is in-memory and uses an in-memory channel, you get tests that run as fast as hitting a class.
      As I said in this talk, we had around 8000 of these tests running in under 10 seconds.
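
      For anyone who hasn't seen one, this is roughly the shape of such a test; "Program" is the API's entry point class, and the route and expected status code are illustrative:

      ```csharp
      using System.Net;
      using System.Threading.Tasks;
      using Microsoft.AspNetCore.Mvc.Testing;
      using Xunit;

      public class ProductApiTests : IClassFixture<WebApplicationFactory<Program>>
      {
          private readonly WebApplicationFactory<Program> _factory;
          public ProductApiTests(WebApplicationFactory<Program> factory) => _factory = factory;

          [Fact]
          public async Task GettingAnUnknownProduct_ReturnsNotFound()
          {
              // CreateClient wires an HttpClient straight into the in-memory
              // host: no sockets and no deployed environment, which is why
              // thousands of these can run in seconds.
              var client = _factory.CreateClient();

              var response = await client.GetAsync("/products/does-not-exist");

              Assert.Equal(HttpStatusCode.NotFound, response.StatusCode);
          }
      }
      ```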

    • @seNick7 • 3 months ago +3

      @imartynenko The joke is on you: the author didn't mention unit testing :) And since he says his 2000 tests take 8 seconds, why do you say it's too slow?
      Either way, I agree that he shows integration tests, but when doing TDD a class is never the unit of testing. Google "BDD is TDD done right".
      BTW, no tests check the structure. That's what code review is for (and the refactoring phase in the red/green/refactor cycle). If your tests test interactions, you're doing it wrong, because you won't be able to refactor the code without breaking the tests.
      PS. Your last statement is not true. You can write tests for special cases without an explicit requirement.

    • @KasperPlougmann • 1 month ago +2

      You can do clean code and produce garbage too... In fact, that's often the case.