Leveraging Annotations in Tests

The purpose of leveraging annotations (or hooks) is to set up an environment and seed the appropriate amount of data for the tests to use. The common JUnit hooks are covered here; any modern testing framework follows this basic model.

A quick rundown of when to use what:

BeforeClass

Use BeforeClass (or its equivalent) when you want something to execute once for the test class, before any tests run and before any of the Before code runs.

This is where you set up your environment and prepare any data. Any data that is shared between all tests belongs in here.

Before

Before will run before each test method. If you have 3 tests, whatever is in Before will run 3 times in total.

This is where data preparation should go if a fresh set of data is needed for every test. If this is an E2E (web) test, this is also where the browser should be opened and navigation to the appropriate page for testing should occur.

After

After runs after each test method.

This is where data cleanup should go if you are creating new data before each test.

AfterClass

AfterClass runs once, after all of the tests have finished and after the last execution of After.

This is where data cleanup should happen for anything that was created in BeforeClass. If having a pristine environment is a goal, this is where you should try to delete anything that was created but not successfully deleted. If something still can’t be deleted, log the information so it isn’t lost; it may take someone digging into why the cleanup is failing. This often uncovers bugs around edge cases that weren’t thought about.
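
Putting it all together, a minimal JUnit 4 sketch might look like the following. The class name and the data helpers (createSharedData, createPerTestData, and so on) are hypothetical placeholders for whatever your suite actually needs.

import org.junit.After;
import org.junit.AfterClass;
import org.junit.Before;
import org.junit.BeforeClass;
import org.junit.Test;

public class ShoppingCartTests {

    @BeforeClass
    public static void setUpClass() {
        createSharedData(); // runs once, before any test: environment and shared data
    }

    @Before
    public void setUp() {
        createPerTestData(); // runs before each test: fresh data, open browser, navigate
    }

    @Test
    public void cartTotalUpdates_whenItemIsAdded() {
        // ...
    }

    @After
    public void tearDown() {
        deletePerTestData(); // runs after each test: clean up per-test data
    }

    @AfterClass
    public static void tearDownClass() {
        deleteSharedData(); // runs once at the end: clean up shared data, log anything left behind
    }

    // Hypothetical helpers – replace with whatever your environment requires.
    private static void createSharedData() { }
    private void createPerTestData() { }
    private void deletePerTestData() { }
    private static void deleteSharedData() { }
}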

Naming Test Methods

Creating a good name can be a difficult task at first, but it pays dividends in the long run. The larger a codebase gets, the more complex it becomes, or the more members of a team are working on it – the more important clear, concise test names become.

Tests should be named according to what the user is trying to accomplish. When you see a failed test title, you should have a good idea of what the test was trying to do. A test method with a small scope and a clear, distinct name allows anyone to quickly and easily identify what it is doing.

thisShouldDoThat is usually a good approach to the name. If you’re having trouble deciding how to name your test, ask yourself:

  • Is this test trying to verify too much, and should it be broken up?
  • Is this a full test, or should this be part of another test?

A good indicator of a bad test name is if it includes any of the following words:

  • check
  • verify
  • test
  • expect
  • assert
  • correct
  • right
  • good
  • okay

“Check”, “verify”, “test”, “expect”, and “assert” are bad words for a test name because the method is already a test. If it weren’t checking/verifying/testing/expecting/asserting something, it wouldn’t be a test, so the word is redundant and unnecessary.

“Correct”, “right”, and “good” are bad words because somebody unfamiliar with the feature who is looking at a failure doesn’t know what they mean. Be more specific about the criteria that make it correct/right/good.

If a test is named eventStatusIsCorrect, how does someone who hasn’t worked on the feature know what correct is? eventStatusIsNotNull is an improvement, but we can go further. What happens that makes the event status not null? eventStatusIsNotNull_WhenStatusButtonIsClicked is a good test name. It says what is expected and what action produces that expectation.
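
As a sketch, that final name might sit on top of a test like the one below; the EventPage page object and its methods are hypothetical, and the point is the name, not the implementation.

@Test
public void eventStatusIsNotNull_WhenStatusButtonIsClicked() {
    EventPage eventPage = new EventPage();     // hypothetical page object
    eventPage.clickStatusButton();             // the action named in the title
    assertNotNull(eventPage.getEventStatus()); // the expectation named in the title
}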

This is a carefully crafted list of bad test names, and generally bad practices, that you can look at as a reference for things not to do.

Naming Test Classes

Test classes should represent one story or, more likely, part of one story. Test classes with a clear and small scope allow anyone to quickly identify what the class is doing. It is completely valid to have multiple test classes, each with multiple tests, per user story that your team delivers.

Like test method naming, test classes should be named based on the feature and it should be easy to understand what that test class is doing simply based on the name.

ForgotPasswordTests is an example. It’s not specific to any page, and simply by its name you immediately have a pretty good understanding of what that test class is about. Going further, even this can be broken up or extended upon with additional test classes such as ResetPasswordTests, PasswordLockoutTests, PasswordCreationTests, PasswordStrengthIndicatorTests, and so on.
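
A sketch of what such a class might contain – the test names below are illustrative, not taken from any real suite:

public class ForgotPasswordTests {

    @Test
    public void resetEmailIsSent_whenForgotPasswordIsSubmitted() { /* ... */ }

    @Test
    public void errorIsShown_whenEmailIsNotRegistered() { /* ... */ }

    @Test
    public void resetLinkExpires_whenLinkIsOlderThan24Hours() { /* ... */ }
}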

A good rule to follow would be that there should be at least as many test classes as your team has user stories, per sprint. Each of those classes should have multiple, smaller tests within it.

Code is Guilty Until Proven Innocent

The importance of defensive programming

Defensive programming is a commonly used term. Wikipedia defines it as “a form of defensive design intended to ensure the continuing function of a piece of software under unforeseen circumstances. Defensive programming practices are often used where high availability, safety or security is needed.” These are important principles to keep in mind when building something that will be released into the wild.

The example that immediately jumps to mind is Jenkins. When Jenkins executes a build process, it defaults to passing. Unless the process encounters an error or failing tests appear in the JUnit XML, the build will always be marked as passed. This is not defensive programming.

This has previously been the cause of a number of problems for me. Builds that should have been marked as failed were actually marked as passed. Builds that did nothing were marked as passed, and so I was able to continue thinking that everything was OK. Everything was not OK.

The way this currently works can look something like this:


Start job
Pull down some code
Unsuccessfully execute some script
Jenkins build result = passed

The correct way to handle this is to have the system fail everything by default, and don’t let anything pass until a condition is met. Jenkins should go through a code flow, and the flow should explicitly set the build to passed at the end.


Start job
Pull down code
Execute some script
If a specific condition == true
    Jenkins build result = passed

If the boolean condition doesn’t match what is expected, then the build won’t be set to passed, which indicates that something went wrong during the process. Again, this is the ideal way to do it, not what actually happens.

The way that Jenkins currently works can be problematic for anyone, but especially problematic for anything mission critical. Example: a healthcare company performs batch processing for blood results every 15 minutes. The processing job opens a file containing information about bloodwork. In the file is the reason for the blood draw, the expected range of values for the measured information, and the actual value. Each person’s test result will be Positive, Negative, or Other (indicating a problem with the procedure). Now, someone performs a bad update on this script so that it no longer writes out each individual’s test result. When Jenkins runs this batch processing job, it will pass. It will fail only if the file being written out is used later in the job and isn’t there. If that’s not the case, everything will continue on and (hopefully) this failure will be caught somewhere down the line.

In a Jenkinsfile (a DSL on top of Groovy that lets you put each Jenkins job into code), the parameter is currentBuild.result, and it defaults to passed. If it is ever set to anything other than passed, it cannot be reset to passed; CloudBees has made it more difficult to program defensively here. The workaround is to create a new variable, actualResult, and default it to a failing value. Code the job in the Jenkinsfile as it normally would run, set actualResult at the point where it is known whether the build should pass or fail, and at the end set currentBuild.result based on actualResult.

Outside of the awfulness that is Jenkins, a more common, simpler real-world example makes this concept easier to grasp. When a form has a user input field that accepts a phone number, one would assume the developers put validation on the field. The field should, by default, say that the input (or lack thereof) is invalid, and it should go through validation to prove that it is good. Without any validation, “abcd” would be a valid phone number. That’s not what the database field will want to accept, and it should error out. Instead, the field should determine that there is the appropriate number of digits, that an extension is or isn’t included, that a country code is or isn’t present, or whatever else may be desired. It shouldn’t accept undesirable characters.
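
A minimal sketch of that guilty-until-proven-innocent approach, assuming a simple ten-digit format with an optional +1 country code purely for illustration:

public final class PhoneNumberValidator {

    // Guilty until proven innocent: the input is invalid until it passes every check.
    public static boolean isValid(String input) {
        if (input == null) {
            return false;
        }
        // Strip common formatting characters before counting digits.
        String digits = input.replaceAll("[\\s().-]", "");
        // Allow an optional +1 country code (illustrative assumption).
        if (digits.startsWith("+1")) {
            digits = digits.substring(2);
        }
        // Only exactly ten digits proves innocence; "abcd" never gets through.
        return digits.matches("\\d{10}");
    }
}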

By assuming that code is guilty and it has to prove its innocence, by validating input, and checking expected conditions before blindly stamping a seal of approval (a passing build), a larger amount of safety and security will be baked into applications. This is important for mission critical systems, and it’s not any more difficult to put into smaller applications. Follow these practices and save yourself the headache later on. Always prove innocence in code.

Making Every Line Count

Keep It Simple, Stupid

Every line of a test should be useful. If you can remove a single (non-assertion) line from a test, and have it still pass, then you’ve got a test that should change. When writing software, the mindset you’ll want to be aware of is KISS – Keep It Simple, Stupid. The idea is to keep it as simple and straightforward as possible. This concept also applies to writing application level code. A test performing unnecessary actions is undesirable. A good test will only test one thing, be explicit about it, and do it well by making every line count.

To demonstrate what a good test looks like, let’s make an example that applies brakes to a car and verifies that the car is slowed down.

@Test
public void carSlowsDown_whenBrakesArePressed() {
    Car myCar = new Car();
    myCar.setSpeed(50, MPH);
    myCar.pressBrakes(20, 6); // Press it 20% of the way down for 6s
    assertTrue(myCar.getSpeedInMph() < 50);
}

As you can see in this example, the test title expects the car to slow down after applying the brakes. We start by creating a car object followed by explicitly setting the speed. I have opted not to check that the speed is 50mph after setting it, because presumably that would be covered in another unit test. We apply the brakes for a period of time, and then we assert that the speed of the vehicle is less than the speed we started with in the previous line. Every line serves a purpose in this test.

Of course, not every test is this simple. That’s not to say tests can’t be made simpler, but it does take work. If the majority of tests were as simple and straightforward as this one, working with tests wouldn’t carry the stigma it often does. This is something to strive for. When writing a test, or doing a code review, ask yourself “what is the purpose of this line?”. Cut away the fat and only include the necessary lines.

Tests Should Be Deterministic By Design

The purpose of a test is to figure out whether something does or doesn’t work. When writing any type of automated test, it pays off to be certain about what is and is not being tested. After figuring out what needs to be tested, the next step is to figure out the minimum amount of data required to produce the desired outcome. When the author is in control of all of the data and the item under test, then any scenario relating to data should be under control. What I’ve seen all too often are non-deterministic tests – tests that might do this or that. This is bad.

I have seen several automated tests set up data and then do the following:

@Test
void productWorks_whenThingIsClicked() {
    setupData();
    navigateToSomePage();

    if (thing.isVisible()) {
        clickSomeObject(); 
        doSomethingInteresting(); 
    } else {
        clickSomethingElse(); 
    }

    assertTrue(someItemOnPage.isVisible()); 
}

The problem with this test is that it exercises only one of two flows. It should be broken up into two tests, as sketched below, so that each scenario is covered. When a single test tries, and fails, to cover two scenarios, a code path is left untested. In that case, it’s not very different from having a missing test case; the difference is that it looks like something is being tested here when it actually isn’t. When looking at a test, it should be clear what the test is doing and why it’s being done.
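
One way to split it, sketched with the same hypothetical helpers as the example above, is to force each path with its own setup and give each path its own explicitly named test:

@Test
void interestingThingHappens_whenThingIsVisible() {
    setupDataThatMakesThingVisible(); // hypothetical helper that forces this path
    navigateToSomePage();

    clickSomeObject();
    doSomethingInteresting();

    assertTrue(someItemOnPage.isVisible());
}

@Test
void somethingElseIsClicked_whenThingIsNotVisible() {
    setupDataThatHidesThing(); // hypothetical helper that forces the other path
    navigateToSomePage();

    clickSomethingElse();

    assertTrue(someItemOnPage.isVisible());
}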

Sometimes the reason for this is because of a feature flag. In my experience, it’s OK to create tests that rely on a feature flag IF the feature flag is removed in a timely manner (2 weeks or less) and the test code is updated to reflect this. After two weeks (one sprint for many companies), it begins to fall off of people’s radars. It may drop in priority, and then the conditional logic remains in the test.

Another approach to solving the scenario above is to create an annotation, or a subset of tests, that only runs depending on the parameters passed to the test run. Instead of one test there would be two, annotated with @FeatureFlagOn and @FeatureFlagOff. When the command to run the tests is given, it filters to pick up all of the FeatureFlagOn tests and exclude all of the FeatureFlagOff tests, or vice versa. Again, this is adding a condition, but one step removed: which tests run depends on configuration. This is a slightly better approach than the previous one, but it still suffers from the same pitfall of being forgotten. The upside is that the names and code are explicit.
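
With JUnit 5, for example, the same idea can be sketched with tags instead of custom annotations (the tag names and test bodies here are illustrative):

import org.junit.jupiter.api.Tag;
import org.junit.jupiter.api.Test;

class ProductPageTests {

    @Test
    @Tag("FeatureFlagOn")
    void interestingThingHappens_whenFlagIsOn() {
        // ...
    }

    @Test
    @Tag("FeatureFlagOff")
    void somethingElseHappens_whenFlagIsOff() {
        // ...
    }
}

The build is then pointed at one tag or the other (for example, via Maven Surefire’s groups/excludedGroups or Gradle’s includeTags), so which tests run is driven by configuration rather than by an if inside a test.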

The best way to solve this problem, in my experience, is to get everyone to agree that the code will be short-lived and will be removed after the experiment. Once they agree and the code is written, create two pull requests: one to merge the code in, and another to revert it. Any time the pull request page appears, it will be on everyone’s minds that the code needs to be ripped out.

Conditional logic in tests is bad because it covers up code paths that should be tested, and it’s not explicit as to what does and doesn’t get tested. Timebox A/B experiments so they cannot live on forever. Create a pull request to revert the code as soon as you create the code so it stays in sight.

External Dependencies in Tests

They’re bad.

Unit tests are the lowest level of testing. A unit test should test the logic of a given function. Any database calls, web requests, or other external dependencies should be mocked out. There should be a happy path test, and the majority of edge case tests should happen here. In short, with the codebase set up and the build and dependency management tools (gems, jars, whatever the codebase needs to run) installed, the unit tests should be able to run without spinning up any databases or services, and without requiring the internet.

A store API will be our example. A common action will be to find out how many groceries are in the shopping cart, and to display a particular error message based on the JSON response. This logic can be covered by a unit test. A properly written unit test will mock out the response. It will look something like this:

@requests_mock.mock()
def test_success_message_displayed_when_json_has_returned(mock):
    # The mock adapter is injected by the decorator; no live request is made.
    mock.get('https://groceries.greatstore.com/cart',
             json={'items': {'apple': {'quantity': 1}}})
    assert store_cart.get_item_count() == 1

This is a good example because the request is mocked out, so the data is hardcoded and won’t change. There are no live requests being made in a unit test. No matter how many times this test is run, the JSON will always be returned the same until it’s changed. A bad example is to perform the same scenario using a live request.

def test_success_message_displayed_when_json_has_returned():
    setup_test_data_for_cart()
    urllib2.urlopen('https://groceries.greatstore.com/cart')
    # Live requests in unit tests are bad
    assert store_cart.get_item_count() == 1

There are several cases where the logic under test will pass, but the test itself will fail due to an external dependency. A few ways this can happen:

  1. Greatstore.com doesn’t return the item count because of a request timeout
  2. HTTP response was a 500
  3. HTTP response is a 302 because the URL was temporarily changed
  4. HTTP response is a 301 because the URL was permanently changed

There are a myriad of places where this could go wrong. This would be a bad test because it introduces a level of uncertainty. Without logging the HTTP response, it would be very difficult to figure out what actually happened.

Another reason to use a mocked-out response instead of a live request is that the performance is much better. When performing a live request, the packets need to make it out of the machine where the tests are running, across the network, reach the destination, wait for a response, and then carry a payload back. The tests can slow down purely because of the amount of network traffic. With a mocked-out response, hardcoded data is simply loaded into a variable, skipping all of the network activity needed to retrieve the same information.

Minimizing the number of external dependencies in a test improves its stability, and that increases the code confidence of the team working on the project. Removing external dependencies from unit tests avoids investigations into flaky test failures. It’s also worth mentioning that if time is important, live requests to a system take much longer than reading in mocked-out data. Reducing the number of external dependencies in unit tests will provide the best performance and stability.

Write The Damn Tests

One of the problems I’ve recently seen is people writing code in a codebase that has tests, but choosing not to run the tests. When questioned why they don’t run them, they say what they’re working on is important and has to “get in” for the release. I wholeheartedly disagree with this approach for a number of reasons, the biggest being that it’s bad for quality.

In this scenario, there is someone writing new code, but not adding to or running the existing tests. We also have an automated CI system in place that will run the tests. There will be peers looking at the pull request and seeing new code with no new tests. There will be repository owners who need to approve the change, but who may see that existing tests are now broken. These are all bad, and easily avoidable, problems. To fix this situation, the culture needs to emphasize that quality is more important than quantity, and people need to change their mindset from “just doing it” because it’s on a checklist to something that is more purposefully driven.

Jason Fried and David Heinemeier Hansson wrote in their book, Rework, that “It’s better to have one half of a kick ass feature than to have one half assed feature”. Those words have resonated with me ever since. It’s the staple of quality over quantity. It doesn’t matter how many features are shipped if they don’t work.

Get the organization to understand that quality is important to the customer. When one of those hard-to-find race conditions that screw up a customer’s workflow is located, it should be fixed instead of working on another feature that wasn’t requested. Quality isn’t something that is added later on in the product. Quality is an organizational effort that’s baked in from the beginning. It’s not a small team of people going through a checklist to add quality – it’s a mindset that everyone working on the product decides to take up.

Compromise quality in the beginning, and it will be easy to keep ignoring it, thanks to broken windows theory: if something is already in a bad state, it’s easy to work on something else rather than fix it. Working in a company like that won’t promote a good growth environment. Run the tests, write the code, fix the problems, run the tests.

Passing Isn’t Always Good

When a failing test can be a good thing

While understanding the importance of passing tests before promoting code to downstream environments, it can be easy to lose track of what tests are there for. Tests are in place to find out when things aren’t going to work as expected. There have been many cases where a test started failing after a change, and the test was either removed or the assertion was changed.

There are certainly reasons to remove tests; if a user story changes and functionality is removed or changed, the tests need to be updated or removed to reflect this as well. In a good BDD or Spec test structure, a developer should be able to easily identify which tests would be affected by the change, and update/remove as necessary. In other cases, this can prove to be more difficult.

Removed tests don’t strike me as a problem if they’re being replaced by new tests. If they are not, that should raise a red flag, because new, untested code is being introduced.

In cases where an assertion is flipped (e.g. expectTrue becomes expectFalse) and the title isn’t changed, that should also raise a red flag. The assertions drive the intent of the test. engineTurnsOn() will have a final assert, assertTrue(engine.getRunStatus()). If that assert becomes assertFalse(), then the intention of engineTurnsOn() has changed, and the test should get a more appropriate name, engineDoesNotTurnOn().
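
A sketch with a hypothetical Engine class makes the point concrete:

@Test
public void engineTurnsOn() {
    Engine engine = new Engine(); // hypothetical class
    engine.pressStartButton();
    assertTrue(engine.getRunStatus());
}

// If the expected behavior genuinely changed and the assert must become assertFalse(),
// the name has to change with it:
@Test
public void engineDoesNotTurnOn() {
    Engine engine = new Engine();
    engine.pressStartButton();
    assertFalse(engine.getRunStatus());
}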

When someone is changing test code, there should be a reason other than “so the tests pass”. Passing tests that don’t mean what they’re supposed to don’t offer much value. When spotted, this is the sort of thing that needs to be called out. Tests that say they do one thing but actually do another are misleading. Those types of tests lead to increased complexity, results that are difficult to understand, a lack of clarity, and so on. They are cancerous.

You don’t want to hear “so the tests pass”, because it leads to bad culture. What you want to hear instead is “so the tests do what they say” or “so the tests are accurate” – something you can identify as a positive change. You don’t want failing tests to be a pain; you want failing tests to point out where something isn’t working the way you expect it to.