Quality Testing

Data Interactions With Tests

Correct data use is an integral part of scaling a software development organization. Not knowing what data a test needs is a warning sign that the test may fail in higher environments, which reduces confidence in future deployments and increases the time spent debugging failed runs. Thankfully, this is a solved problem with a simple paradigm: each test should be idempotent. This means that it can be run any number of times (system load aside) and return the same result without affecting other tests. To achieve this, each test needs to create and destroy its own data.

Data Creation

A test should not rely on pre-existing data, whether that is rows already in the database or objects that someone arranged in the environment ahead of time. Instead, a test should seed its own data.

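Use test lifecycle annotations (for example, JUnit’s @BeforeAll for setup that runs once per class and @BeforeEach for setup that runs before every test) according to your data needs.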

Data creation needs will change from scenario to scenario. A login can (in many cases) be shared among tests in the test class. Whether it’s a service-level test or an end-to-end test, logging in once and passing that authentication around is usually a good practice. However, some tests need an object of their own, such as a unique transaction or purchase, or a dedicated account for testing login attempts with invalid or revoked credentials.
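The split between shared and per-test data can be sketched in plain Java, so that it runs without a test framework. This is only an illustration: FakeLoginApi, login, and createPurchase are hypothetical stand-ins for your own API client, and in JUnit 5 the two setup methods would typically carry the @BeforeAll and @BeforeEach annotations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

class PurchaseTests {
    static final FakeLoginApi api = new FakeLoginApi();
    static String sharedToken; // created once and shared by every test in the class
    String purchaseId;         // created fresh for each test

    // In JUnit 5 this would typically be annotated with @BeforeAll.
    static void logInOnce() {
        sharedToken = api.login("test-user");
    }

    // In JUnit 5 this would typically be annotated with @BeforeEach.
    void createPurchase() {
        purchaseId = api.createPurchase(sharedToken);
    }

    public static void main(String[] args) {
        logInOnce();
        PurchaseTests first = new PurchaseTests();
        first.createPurchase();
        PurchaseTests second = new PurchaseTests();
        second.createPurchase();
        System.out.println("Logins performed: " + api.loginCalls);
        System.out.println("Purchases are distinct: " + !first.purchaseId.equals(second.purchaseId));
    }
}

// Hypothetical in-memory stand-in for the application's API (an assumption, not a real client).
class FakeLoginApi {
    int loginCalls = 0;
    final List<String> purchases = new ArrayList<>();

    String login(String user) {
        loginCalls++;
        return "token-" + user;
    }

    String createPurchase(String token) {
        String id = UUID.randomUUID().toString();
        purchases.add(id);
        return id;
    }
}
```

The login happens once, while every test instance gets its own purchase, so one test cannot change the object another test depends on.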

Test data should not be inserted with direct database calls. That would be dangerous and difficult to maintain. Instead, the tests should call a controller or API endpoint directly. Utility (or helper) classes can be used to create complex objects that require multiple embedded objects.

An example is a test that verifies an employee has an emergency number. You generate a phone number and assign it to an employee, but that employee belongs to a department, and that department may belong to an organization. That would be a lot of work to maintain if raw SQL were being used. Creating the needed data through endpoints means much less maintenance in the future.
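A helper class for the employee example might look roughly like the sketch below. InMemoryApi is a hypothetical stand-in for HTTP calls to your create endpoints, and the type and field names are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

class EmployeeFixtureDemo {
    public static void main(String[] args) {
        InMemoryApi api = new InMemoryApi();
        String employeeId = EmployeeFixture.employeeWithEmergencyNumber(api, "555-0100");
        // prints "Emergency number: 555-0100"
        System.out.println("Emergency number: " + api.get(employeeId).get("emergencyNumber"));
    }
}

// Hypothetical stand-in for the create endpoints; a real helper would issue HTTP requests.
class InMemoryApi {
    private final Map<String, Map<String, String>> records = new HashMap<>();

    String create(String type, Map<String, String> fields) {
        String id = type + "-" + UUID.randomUUID();
        records.put(id, new HashMap<>(fields));
        return id;
    }

    Map<String, String> get(String id) {
        return records.get(id);
    }
}

// Helper that hides the organization -> department -> employee chain from the test.
class EmployeeFixture {
    static String employeeWithEmergencyNumber(InMemoryApi api, String phone) {
        String orgId = api.create("organization", Map.of("name", "Test Org"));
        String deptId = api.create("department", Map.of("organizationId", orgId));
        return api.create("employee", Map.of("departmentId", deptId, "emergencyNumber", phone));
    }
}
```

The test asks for one thing (an employee with an emergency number) and the helper owns the knowledge of which parent objects have to exist first.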

Note: To make the test data more unique (and visually appealing), you can use a library such as Faker to generate random, realistic-looking values such as names and phone numbers.

Data should be created before any navigation occurs in a test. This ensures that when an end-to-end test navigates to a page where it expects data, the data already exists and is not still in flight. If the system is slow to create data, the page may render before the data has been written to the database, and the test becomes flaky because of API performance. This is often easy to avoid, so let’s avoid it.
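One way to guard against data that is still in flight is to confirm that it can be read back before navigating. The sketch below is an assumption about how that could be done, not a prescribed approach; waitUntil is a hypothetical helper, and the readiness check in main only simulates a slow create call.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

class WaitForDataDemo {
    // Polls the condition until it holds or the timeout expires.
    static boolean waitUntil(BooleanSupplier condition, Duration timeout) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (System.nanoTime() < deadline) {
            if (condition.getAsBoolean()) {
                return true;
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        // One last check in case the condition became true right at the deadline.
        return condition.getAsBoolean();
    }

    public static void main(String[] args) {
        // Simulates data that becomes readable about 200 ms after the create call.
        long readyAt = System.currentTimeMillis() + 200;
        boolean ready = waitUntil(() -> System.currentTimeMillis() >= readyAt, Duration.ofSeconds(2));
        System.out.println(ready ? "Data is ready, safe to navigate" : "Data never appeared");
    }
}
```

In a real test the condition would be a read against the API for the object that was just created, and navigation would start only after it returns true.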

Data Use

The only data that should be created is the data that the test is going to use. Anything that is created and goes unused wastes time and resources and will confuse the next person to edit that area of code. Code left in a test that does nothing adds needless complexity. If too little data is created, the test obviously won’t work correctly. Use only what you need, and cut everything else out.

Data Destruction

Much like the Scouts, we want to leave no trace. All data that is created during testing should also be destroyed. The best way to do this is to create an object in the BaseTest class (let’s call it DataCollector) and have each test add the data it creates to DataCollector.

// Register the new person with DataCollector so it is cleaned up after the test.
Person person = DataCollector
    .add(new PersonBuilder("firstName", "lastName"));
System.out.println("Person ID is: " + person.id);

With every test extending BaseTest, DataCollector can be called to destroy all of the data automatically for every test. Ideally, all of the API delete endpoints have cascading deletes, so a person is deleted, then the department, and then the organization, to use the example above. If the endpoints don’t support cascading deletes, you can set the order in which DataCollector deletes: remove all objects of one type, then their parent objects, and so on all the way up the chain. It is more work, but it’s superior to sometimes, maybe, cleaning out the database by hand. A cron job that runs a cleanup script isn’t bad either, but it’s less ideal because you have to wait for it.
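When cascading deletes are not available, the ordering can be sketched as follows. OrderedCollector is a hypothetical simplification of DataCollector that tracks only a type name and an id, and the type order is an assumption based on the person, department, and organization example.

```java
import java.util.ArrayList;
import java.util.List;

class OrderedCleanupDemo {
    public static void main(String[] args) {
        OrderedCollector collector = new OrderedCollector();
        collector.add("organization", "org-1");
        collector.add("department", "dept-1");
        collector.add("person", "person-1");
        collector.add("person", "person-2");
        // prints [person-1, person-2, dept-1, org-1]
        System.out.println(collector.deletionOrder());
    }
}

class OrderedCollector {
    // Children first, parents last, so no delete is blocked by a dependent object.
    private static final List<String> TYPE_ORDER = List.of("person", "department", "organization");

    // Each entry holds {type, id} in creation order.
    private final List<String[]> created = new ArrayList<>();

    void add(String type, String id) {
        created.add(new String[] {type, id});
    }

    // Returns the ids grouped by type, following TYPE_ORDER.
    List<String> deletionOrder() {
        List<String> ordered = new ArrayList<>();
        for (String type : TYPE_ORDER) {
            for (String[] entry : created) {
                if (entry[0].equals(type)) {
                    ordered.add(entry[1]);
                }
            }
        }
        return ordered;
    }
}
```

Regardless of the order in which tests created the objects, every object of a child type is removed before any of its parents.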

When the test suite has finished executing, you can go through the list of collected objects and delete each object by its id. When a delete returns a 4xx response, you can often safely ignore it, because some objects are deleted while testing the frontend.
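A cleanup loop that tolerates 4xx responses could look like the sketch below. FakeDeleteEndpoint is a hypothetical stand-in that returns HTTP-style status codes; a real implementation would call your delete endpoints instead.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class TolerantCleanupDemo {
    public static void main(String[] args) {
        FakeDeleteEndpoint endpoint = new FakeDeleteEndpoint(Set.of("a", "b"));
        endpoint.delete("a"); // simulates a frontend test that already deleted the object
        List<String> failures = TolerantCleanup.deleteAll(endpoint, List.of("a", "b"));
        System.out.println("Unexpected failures: " + failures);
    }
}

// Hypothetical delete endpoint: 204 when the object existed, 404 when it is already gone.
class FakeDeleteEndpoint {
    private final Set<String> existing;

    FakeDeleteEndpoint(Set<String> ids) {
        existing = new HashSet<>(ids);
    }

    int delete(String id) {
        return existing.remove(id) ? 204 : 404;
    }
}

class TolerantCleanup {
    // Deletes every id and reports only responses that are neither 2xx nor 4xx.
    static List<String> deleteAll(FakeDeleteEndpoint endpoint, List<String> ids) {
        List<String> failures = new ArrayList<>();
        for (String id : ids) {
            int status = endpoint.delete(id);
            boolean success = status >= 200 && status < 300;
            boolean clientError = status >= 400 && status < 500;
            if (!success && !clientError) {
                failures.add(id + " -> " + status);
            }
        }
        return failures;
    }
}
```

Ignoring the whole 4xx range follows the text above. Narrowing the tolerated codes to 404 would be a stricter variant if authorization errors should still surface.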

The belt-and-suspenders approach to ensuring that the environment is clean is to first run queries confirming that the database is empty (sans seed data, of course). Run the tests and then allow the cleanup script to run. Running the initial queries once more should produce the same results: no extra data in the database. If that is not the case, then either some tests are not using DataCollector or one or more of the delete operations are failing (because of the deletion order or because the delete endpoints aren’t working correctly).
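The before-and-after comparison can be sketched as a diff of row counts per table. The table names and counts below are made up for illustration; in practice the two maps would be filled from queries run before the suite and after cleanup.

```java
import java.util.Map;
import java.util.TreeMap;

class CleanStateCheckDemo {
    public static void main(String[] args) {
        // Hypothetical row counts captured before the suite and after cleanup.
        Map<String, Integer> before = Map.of("person", 2, "department", 1);
        Map<String, Integer> after = Map.of("person", 3, "department", 1);
        // prints {person=1}
        System.out.println(CleanStateCheck.leftovers(before, after));
    }
}

class CleanStateCheck {
    // Returns table name -> number of extra rows, for every table that grew.
    static Map<String, Integer> leftovers(Map<String, Integer> before, Map<String, Integer> after) {
        Map<String, Integer> extra = new TreeMap<>();
        for (Map.Entry<String, Integer> entry : after.entrySet()) {
            int diff = entry.getValue() - before.getOrDefault(entry.getKey(), 0);
            if (diff > 0) {
                extra.put(entry.getKey(), diff);
            }
        }
        return extra;
    }
}
```

An empty result means the environment is back to its seed state. Any entry points at a table where a test skipped DataCollector or a delete failed.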