Sunday 6 December 2015

ShouldBeEquivalentTo().ShouldBeConsideredHarmful()

0. Introduction

I test-drive my code these days. (I first tried it back in 2004 or so, right after devouring Kent Beck's book on the subject, but I was writing research code in C and OpenGL so I didn't have much in the way of widely-distributed testing frameworks or, worse, best practices. I did okay, but I strayed.) Lately I've been writing C# in Visual Studio, which means I have access to a bunch of great tools that make my testing life suck a lot less.

In particular, Fluent Assertions makes my tests read a lot better than the NUnit default. Test legibility is underrated; if a test fails on broadly-owned code, the person trying to turn red back to green is probably not going to be intimately familiar with what I meant when I wrote the thing in the first place. It might be someone else, or it might be me three months down the road, having worked on enough other shit in the meanwhile that I've forgotten exactly what I was trying to say. In that situation, any incremental improvement in test legibility sums to a hell of a lot of added value when you integrate over the rest of your project's lifetime. Fluent Assertions is great.

1. But there's a problem

Let's suppose you're writing example code for a TDD book. Given that it's not the early 2000s, you won't be writing a CD- or DVD-rental package, but we'll generalize and say you're writing a Widget rental package.

And boy howdy, renting widgets is complex. Your rental quote for a widget isn't something straightforward like "two bucks a day", it's usually more like "fifty cents an hour after nine in the morning but before noon, three bucks an hour if you use it over lunch but you get a dollar discount if you don't, a buck an hour from one in the afternoon until five, and you're not allowed to use it at all after five if your blender's out of cheese". Hey, the widget-rental domain is weird, and your quotes have to be able to represent all of this nonsense.

So you end up with a fair bit of code not entirely unlike the following:

[Test]
public void widget_use_denied_after_1700_when_blender_out_of_cheese()
{
    var blender = A.Fake<ICustomerBlender>();
    A.CallTo(() => blender.CheeseLevel()).Returns(0);
    var request = new WidgetRentalRequest(4.PM(), 6.PM(), blender);
    // ...

    var expected = new WidgetRentalQuote( /* ... */ );
    expected.Start = 4.PM();
    expected.End = 5.PM();

    widgetRentalService.GetQuoteFor(request)
        .ShouldBeEquivalentTo(expected);
}

...which isn't terrible. You explicitly build the kind of object you expect to find, and then you check that the widget rental service gave you what you expected.

The problem is, you check that the widget rental service gave you exactly what you built.

2. Annotating widget rental quotes

With widget rental being as complex as it is, people have trouble following the logic behind a rental quote. Maybe you only deny rentals after five in the afternoon (if your blender's out of cheese) for flaming blue widgets -- your most popular model -- and you instead charge a modest overage fee of ten cents an hour if the customer's renting an inert red widget with insufficient blender-cheese. You and your dev team keep getting things a little bit wrong, your client-facing techs keep getting things a little bit wrong, your clients themselves can't keep it all straight, and the front-end team desperately wants to be able to explain to customers why they weren't able to rent a flaming blue widget past five in the afternoon.

So you go into WidgetRentalQuote and add a simple string/string map that every step of the quote generation process can write to, recording why it did what it did. Maybe later on you map this into JSON that the front end can parse out and tell the customer "Oh by the way, you can't rent a flaming blue widget past 5pm unless you refill your blender's cheese reservoir". Wonderful!
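
Here's a minimal sketch of what the annotated quote class might look like. Only Start and End appear in the test above; the Annotations property, the Annotate helper, and the use of TimeSpan for times of day are assumptions made for illustration, not the real class.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical shape of the quote class. Only Start and End come from the
// original test; everything else is assumed for illustration.
public class WidgetRentalQuote
{
    // Times of day, modelled here as offsets from midnight (an assumption).
    public TimeSpan Start { get; set; }
    public TimeSpan End { get; set; }

    // The new side-band data: a plain string/string map.
    public IDictionary<string, string> Annotations { get; } =
        new Dictionary<string, string>();

    // Any step of quote generation can record why it did what it did.
    public void Annotate(string key, string message)
    {
        Annotations[key] = message;
    }
}
```

A quoting step that cuts a rental off at five would then call something like quote.Annotate("end-time", "...") alongside setting End. The key name is made up; the point is only that the map rides along on every quote the service produces.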

And just like that, every single test you've written against WidgetRentalQuotes fails.

3. What just happened?

Well, you're now generating WidgetRentalQuotes with annotations. You probably never annotated the exemplar quotes when you built them, because the features you were testing didn't involve annotation. So, ShouldBeEquivalentTo goes and checks the annotated quote you generated from the rental service against the un-annotated quote you gave as a target, and helpfully reports that something your test doesn't care about is different. Now you have eleventy billion spurious test failures to clean up.
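
A rough sketch of the failure mode, in case it helps. NaiveEquivalence is a hypothetical, hand-rolled stand-in for a whole-object comparison; it is not how Fluent Assertions is implemented, and the quote class is an assumed shape (only Start and End come from the test above). The point is just that anything walking every public property will trip over a property the test never meant to talk about.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Assumed shape of the quote; only Start and End come from the original test.
public class WidgetRentalQuote
{
    public TimeSpan Start { get; set; }
    public TimeSpan End { get; set; }
    public IDictionary<string, string> Annotations { get; } =
        new Dictionary<string, string>();
}

// Hypothetical stand-in for a structural comparison over all public properties.
public static class NaiveEquivalence
{
    // Returns the name of every public property whose value differs.
    public static List<string> Differences(WidgetRentalQuote actual, WidgetRentalQuote expected)
    {
        var differences = new List<string>();
        foreach (PropertyInfo property in typeof(WidgetRentalQuote).GetProperties())
        {
            object a = property.GetValue(actual);
            object e = property.GetValue(expected);
            if (!ValuesMatch(a, e))
                differences.Add(property.Name);
        }
        return differences;
    }

    // Maps are compared by content; everything else by ordinary equality.
    private static bool ValuesMatch(object a, object e)
    {
        var aMap = a as IDictionary<string, string>;
        var eMap = e as IDictionary<string, string>;
        if (aMap != null && eMap != null)
        {
            return aMap.Count == eMap.Count
                && aMap.All(pair => eMap.TryGetValue(pair.Key, out var value)
                                    && value == pair.Value);
        }
        return object.Equals(a, e);
    }
}
```

Give this an annotated quote from the service and an un-annotated exemplar with the same Start and End, and the only difference it reports is Annotations, which is exactly the thing the test didn't care about.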

So you could go back and add appropriate annotations to each exemplar quote you've built for testing. That's all well and good, but it's likely to add a lot of extra clutter to your tests. All that extra code, just to satisfy a fairly lightweight side-band data stream! Let's maybe not do that.

You could, if you're a very cool person, implement annotations as a monad that wraps your existing quoting code, but that's a blog post for another time. Let's suppose that the rest of your team is a bit frightened of monads (as devs often are), so you've decided not to do that.

You could use the existing quote-generation logic to build your exemplar quote, and cook up mocks that annotate the exemplar correctly. But that gets clumsy, and risks verging into the territory of tautological testing -- you end up invoking the code under test to build the exemplar, and just verifying that the code executes the same way twice in a row. That's not always a safe assumption, but it's probably not what you set out to test.

You could build a separate quote builder interface that's careful to add the right annotations. That's in many respects a useful thing to do, especially if you and your team spend a lot of time writing integration tests. The only concern there is that it's easy to write your way into a parallel class hierarchy, where changes to the code under test cause spurious test failures until you make the same changes to the builder.
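
For the sake of argument, a builder along those lines might look like the sketch below. Every name in it (WidgetRentalQuoteBuilder, From, Until, CutOffAtFiveForEmptyBlender, the "end-time" key and its message) is hypothetical, as is the shape of the quote class beyond Start and End.

```csharp
using System;
using System.Collections.Generic;

// Assumed shape of the quote; only Start and End come from the original test.
public class WidgetRentalQuote
{
    public TimeSpan Start { get; set; }
    public TimeSpan End { get; set; }
    public IDictionary<string, string> Annotations { get; } =
        new Dictionary<string, string>();
}

// Hypothetical test-side builder for exemplar quotes.
public class WidgetRentalQuoteBuilder
{
    private TimeSpan start;
    private TimeSpan end;
    private readonly Dictionary<string, string> annotations =
        new Dictionary<string, string>();

    public WidgetRentalQuoteBuilder From(TimeSpan time) { start = time; return this; }
    public WidgetRentalQuoteBuilder Until(TimeSpan time) { end = time; return this; }

    // Mirrors a production rule: end the rental at 5pm and say why.
    // The key and message are made up, and they would have to match
    // whatever the real quoting code writes.
    public WidgetRentalQuoteBuilder CutOffAtFiveForEmptyBlender()
    {
        end = TimeSpan.FromHours(17);
        annotations["end-time"] = "Rental ends at 5pm: blender is out of cheese.";
        return this;
    }

    public WidgetRentalQuote Build()
    {
        var quote = new WidgetRentalQuote { Start = start, End = end };
        foreach (var pair in annotations)
            quote.Annotations[pair.Key] = pair.Value;
        return quote;
    }
}
```

Notice that CutOffAtFiveForEmptyBlender restates a business rule and its annotation text. Change the wording in the quoting code and this method has to change with it, which is the parallel-hierarchy trap in miniature.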

Or, you could write a widget rental quote-specific set of fluent assertions.
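
To give a feel for what that could mean, here's one possible shape, written without any dependency on Fluent Assertions so it stands on its own. All of the names (WidgetRentalQuoteAssertions, StartAt, EndAt, BeAnnotatedWith, QuoteAssertionException) are hypothetical, and so is the quote class beyond Start and End; a real version would more likely hook into the assertion library's own extension points.

```csharp
using System;
using System.Collections.Generic;

// Assumed shape of the quote; only Start and End come from the original test.
public class WidgetRentalQuote
{
    public TimeSpan Start { get; set; }
    public TimeSpan End { get; set; }
    public IDictionary<string, string> Annotations { get; } =
        new Dictionary<string, string>();
}

// Hypothetical failure type; any uncaught exception fails an NUnit test.
public class QuoteAssertionException : Exception
{
    public QuoteAssertionException(string message) : base(message) { }
}

// Each assertion checks one thing the test cares about and nothing else.
public class WidgetRentalQuoteAssertions
{
    private readonly WidgetRentalQuote subject;

    public WidgetRentalQuoteAssertions(WidgetRentalQuote subject)
    {
        this.subject = subject;
    }

    // Lets assertions chain: quote.Should().StartAt(x).And.EndAt(y).
    public WidgetRentalQuoteAssertions And { get { return this; } }

    public WidgetRentalQuoteAssertions StartAt(TimeSpan expected)
    {
        if (subject.Start != expected)
            throw new QuoteAssertionException(
                "Expected quote to start at " + expected + ", but it starts at " + subject.Start + ".");
        return this;
    }

    public WidgetRentalQuoteAssertions EndAt(TimeSpan expected)
    {
        if (subject.End != expected)
            throw new QuoteAssertionException(
                "Expected quote to end at " + expected + ", but it ends at " + subject.End + ".");
        return this;
    }

    public WidgetRentalQuoteAssertions BeAnnotatedWith(string key)
    {
        if (!subject.Annotations.ContainsKey(key))
            throw new QuoteAssertionException(
                "Expected quote to carry an annotation for '" + key + "', but it has none.");
        return this;
    }
}

public static class WidgetRentalQuoteAssertionExtensions
{
    public static WidgetRentalQuoteAssertions Should(this WidgetRentalQuote quote)
    {
        return new WidgetRentalQuoteAssertions(quote);
    }
}
```

With something like this, the 5pm test only has to say that the quote should StartAt four and EndAt five. Annotations can come and go without touching it, and only the tests that are actually about annotations call BeAnnotatedWith.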