Unit Testing In a Nutshell

Over the last several months I have been taking a deeper dive into software testing best practices. Testing is a common “pain point” for a lot of developers and I’ve seen multiple projects that even have no tests at all because “it slows us down”. I’m a big believer in testing (and test driving) software to reduce bugs, enable confident refactoring and improve the design of the system. Having no tests or poor coverage will really slow projects down in the long run. This post distills some best practices for Unit Testing and my hope is to make a case for embracing testing as a tool to help us build better software and make our lives easier, not more difficult.

Unit Testing in a Nutshell:

A unit test verifies a small piece of the system, runs fast and is isolated from other tests.
Unit tests should verify units of behavior and not just lines of code or implementation details. This makes the tests resistant to refactoring so they don’t break when implementation, but not functionality, changes in the system.
Unit tests can serve as documentation or a kind of Spec for the system and ideally should be written in a way that even a non-developer or business person can understand (i.e. the description of the test should describe a business scenario in plain English)
The purpose of unit tests are to enable sustainability of the system (to keep it from grinding to a halt due to bugs or breaking changes in the long run and to allow for refactoring and making changes with confidence)

Definitions:

Spec: Short for Specification. A description of what the software or system should do to satisfy stakeholders goals.
Unit of Behavior: A small grouping of code representing some behavior of the system. Unlike a unit of code (i.e. a single function or class), a unit of behavior can span multiple classes or objects.
Regression: A bug. When a feature stops working after a certain event (i.e. a code change etc.)
System Under Test (SUT): The part of the code in the system that is being tested. i.e. the classes, objects or functions.
Test Isolation: The ability for tests to run in parallel, sequentially, and in any order is called test isolation.
False Positive: A test that fails, but for the wrong reasons. This term can be confusing, so think of a test for the flu – if you test positive, that’s bad and means you have an illness, so here “positive” counter intuitively translates to a negative (i.e. a test failure) result.
False Negative: A test that passes, but for the wrong reasons.
Test Double: A Mock, Stub, Spy, Dummy or Fake to mimic a dependency instead of using it directly in a test. The term comes from “Stunt Double” used in film making. A future blog post will explore these in more detail.

What makes a good test? The properties of a good unit test:

Resistant to regressions (Catches bugs.)
Resistant to refactoring (The test is not brittle and will not break when code implementation, not functionality/behavior, changes.)
Runs fast for rapid feedback (You should be able to keep unit tests running during development to verify changes do not break something as you’re coding.)
Easy to maintain
Isolated (The test can be run independently of other tests and in any sequence of order. This allows for parallel execution for faster test runs and eliminates unexpected bugs caused from shared states.)

Do not focus on achieving 100% Line Coverage, focus on coverage of Behavior and Use Cases

One of the benefits of good test coverage is to give you confidence that changes to the system (i.e. refactoring or adding features) does not break it’s desired behavior.

Focusing on achieving 100% line coverage can, counter-intuitively, actually be problematic because it forces you or your team to write tests just to achieve that benchmark, whether the tests are valuable or not. This also has the side effect of incentivizing testing implementation details making for brittle tests that will break when refactoring occurs. You will start getting more and more false positives from your tests which will reduce confidence in them.

Not only does this create extra work unnecessarily, it also makes developers hesitant to make changes and refactor because they will then need to fix a bunch of brittle tests. The code stagnates and rots over time. Line coverage requirements are not necessarily a good idea and focusing on testing the observable behavior of the system will make for more valuable tests to cover what really matters.

Unit Tests need to run fast

You should have a fast running suite of tests that you can leave on watch mode while developing. The purpose of this is to enable you to catch bugs and breaking changes as soon as they happen. This greatly decreases debugging time because it saves you the trouble of hunting down exactly which part of the code broke something. If you have the tests running and they start failing, the part of the code that broke things was….the last change you made. This quick feedback loop will save lots of time that would otherwise be dedicated to debugging.

A Note on Naming Tests:

You should try to name your tests in such a way that a business person or domain expert would understand what the test is covering. This helps to make your test suite a documentation or spec for the system and what it’s expected functionality is. This in turn also makes the system as a whole and what it’s goals are easier to reason about and understand.

Suggestion: Tests should verify facts about the system. You may want to consider removing words like “should” and replacing with “is”, and when naming tests in general consider making a statement as if it were a fact about the behavior of the system. The C# testing tool XUnit, for example, actually uses [Fact] to declare a test definition.

Another suggestion is to just use plain English when naming tests instead of following a convention like: [MethodUnderTest]_[Scenario]_[ExpectedResult].
You shouldn’t use the name of the method or class in the test description because that is an implementation detail. If the function name changes, then you have to update all of the test names referencing it.

The Classical School vs. London School of Unit Testing:

There are two schools of testing: The Classical School and The London School. There are pros and cons to each, but I prefer the Classical School style. It is advocated by industry stalwarts such as Martin Fowler and the arguments for it, in my opinion, are compelling.

Differences:

The London School prefers using Test Doubles for all dependencies except immutable objects vs. the Classical School which only mocks shared dependencies
The London School considers a unit a class or function, while The Classical School considers a unit a unit of behavior which could span multiple classes or functions.
The London School considers a test isolated if the class or unit is isolated completely from other classes through mocking all outside dependencies, while The Classical School considers isolation of an entire test (the unit of behavior is separate from another unit, dependencies for classes may not be mocked).

One of the main advantages of The London School of testing is that if a test fails, you know exactly where and what part of the code failed since it considers a unit to be very granular (a specific isolated class or function). The disadvantage is that more mocking makes for more fragile and brittle tests. Since mocking everything except immutable objects, including intra-system communication, will require tests knowing more about implementation details instead of only verifying observable outcomes, they will be less resistant to refactoring.

Even though The Classical School of testing has some disadvantages like the lack of easily finding the specific failing function or class, these can be overcome by running the tests constantly to give you instant feedback when a breaking change is introduced – the source of the bug is easily found, it’s the last thing you wrote. The main advantage is that by testing observable behavior and outcomes, your tests become more resistant to refactoring, valuable and test the meaningful parts of the system. This also provides important feedback about pieces of code that cause cascading failures across the system. The London School approach would only indicate the specific class that failed since the classes under test are completely isolated.

Brent Marquez | Web and Application Development