The topic of testing has been a constant presence in all of the teams and projects I’ve been involved in while working at SensioLabs. I often find myself explaining the same concepts over and over again, so I figured it was about time to gather my thoughts on this topic and put them into words.
Unfortunately, the definition of "unit" is quite vague, and many have made the assumption that a unit must always be a single class or function. However, this need not be the case. If you have a group of objects that work together, can semantically be seen as a unit and are testable as a unit (more on that below), then a single unit test will suffice.
Unit tests can test a single object or a group of objects, depending on the situation.
With that said, people generally like the symmetry gained by having one test class per production code class, so it’s important to not be dogmatic about it.
Now, how do we decide whether to write one test case for a group of objects or test each component individually? This depends on how many different code paths exist.
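Let’s say we have 3 objects that we want to test in unison. We’ll call them A, B and C. A has a dependency on B and C and makes full use of the different behaviors they provide. For the sake of the example, say the three objects have 3, 2 and 4 distinct code paths respectively.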
Assuming A makes use of all the functionality B and C have to offer, in theory, if we wanted to test all the possible different scenarios and test every single execution path through these 3 objects, we would need 3 * 2 * 4 = 24 separate tests to cover all these scenarios.
On the other hand, if we test each component individually (and use test doubles in place of B and C, since we want to test A in isolation), we would only need 3 + 2 + 4 = 9 tests.
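The arithmetic above can be checked with a few lines of code. This is only an illustrative sketch, written in Python for brevity. It assumes that A, B and C have 3, 2 and 4 distinct code paths respectively, which is my reading of the numbers used in the text.

```python
from itertools import product

# Assumed number of distinct code paths per object (illustrative only).
paths = {"A": 3, "B": 2, "C": 4}

# Tested as one unit: every combination of paths is its own scenario.
combined_scenarios = list(product(*(range(n) for n in paths.values())))

# Tested in isolation: each object's paths are covered independently.
isolated_scenarios = sum(paths.values())

print(len(combined_scenarios))  # 24
print(isolated_scenarios)       # 9
```

Adding a fourth object with, say, 5 code paths would raise the combined count to 120, while the isolated count would only grow to 14.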
Testing objects individually and in isolation requires fewer tests to be written.
That said, it does come at a cost. When setting up test doubles, you need to ensure that the contract of the original object is followed; otherwise you fall victim to API drift, and your tests may continue to pass even though the code they are testing is broken. This is absolutely crucial, and the use of test doubles is often the subject of heated debate as a result.
PHPUnit already does this to some degree. If you try to mock a method that doesn’t actually exist on an object, it will complain and fail.
When using test doubles, you bear the responsibility of ensuring that they stay up-to-date. When a change in an object’s contract is made, all test doubles for that object must be updated as well.
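To illustrate what this kind of contract checking looks like, here is a minimal sketch using Python's `unittest.mock.create_autospec`, which behaves in a broadly similar way to what is described for PHPUnit above. The `Mailer` class and its `send` method are hypothetical names invented for this example.

```python
from unittest.mock import create_autospec


class Mailer:
    """Hypothetical dependency, used for illustration only."""

    def send(self, recipient: str, body: str) -> bool:
        raise NotImplementedError("talks to a real mail server")


# The double is built from the real class, so it mirrors its contract.
mailer_double = create_autospec(Mailer, instance=True)
mailer_double.send.return_value = True

sent = mailer_double.send("a@example.com", "hi")

# A method that is not part of the contract is rejected...
try:
    mailer_double.deliver("a@example.com", "hi")
    drift_detected = False
except AttributeError:
    drift_detected = True

# ...and so is a call that does not match the real signature.
try:
    mailer_double.send("a@example.com")
    signature_checked = False
except TypeError:
    signature_checked = True
```

Note that this only catches changes to method names and signatures. A change in what `send` returns or does would still go unnoticed, which is why keeping doubles up to date remains your responsibility.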
Ultimately it’s a judgement call that needs to be made on a case-by-case basis. All roads lead to Rome, as they say. The most important thing is ensuring that an object is well tested and does what it is supposed to. Whether mocks are used or not is of secondary importance (with a few exceptions).
A good rule of thumb to follow is to start by writing tests without test doubles. Only start using test doubles if you feel like you’re forced to write too many test methods in one test case.
With that said, there are exceptions to the rule. There are certain things that should, in most cases, be mocked and certain things that should never be mocked:
If you have calls to Doctrine or Guzzle in an object whose primary responsibility is something other than interacting with the external resource, consider extracting that logic into its own object. This will allow you to mock the extracted object when testing the original class and to write an integration test for the extracted object.
Objects that call external resources should be mocked when used as a dependency, and an integration test should be written for these objects.
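The extraction described above might look roughly like the following sketch. It is written in Python rather than PHP, and `ExchangeRateClient`, `PriceConverter` and their methods are hypothetical names chosen for illustration; they do not correspond to any Doctrine or Guzzle API.

```python
from unittest.mock import create_autospec


class ExchangeRateClient:
    """Hypothetical object whose only job is talking to the external API."""

    def rate(self, source: str, target: str) -> float:
        raise NotImplementedError("performs a real HTTP request")


class PriceConverter:
    """Business logic that depends on the client instead of calling the API."""

    def __init__(self, client: ExchangeRateClient) -> None:
        self._client = client

    def convert(self, amount: float, source: str, target: str) -> float:
        if source == target:
            return amount  # no external call needed
        return round(amount * self._client.rate(source, target), 2)


# Unit test setup: the external-facing object is replaced by a double.
client = create_autospec(ExchangeRateClient, instance=True)
client.rate.return_value = 1.25
converter = PriceConverter(client)

converted = converter.convert(10.0, "EUR", "USD")
unchanged = converter.convert(10.0, "EUR", "EUR")
```

With this split, `PriceConverter` can be unit tested quickly with a double, while `ExchangeRateClient` gets its own integration test against a real test environment.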
Note: The term "integration test" can mean very different things depending on the context. What I generally refer to as an integration test may be called by a different name in your team or technical domain.
Integration tests ensure that an object that relies on an external resource works correctly and interacts with said resource in the desired fashion. Typical examples include repositories and API clients.
These tests should communicate with a real instance of the external resource and be as close to production as possible. Most API vendors offer test environments of their services that don’t affect production data. As for databases, use a test database that runs the same database software as production. For example, if MySQL is used in production, test against MySQL rather than an in-memory SQLite database.
Integration tests should communicate with an instance that is as close to the production setup as possible. No mocking should take place.
Objects that interact with external resources should be kept as simple as possible. They should only be responsible for the interaction and not contain any other type of logic. Integration tests are expensive and slow, and we should strive to write as few as possible (whilst still thoroughly testing that our objects work).
Integration tests are orders of magnitude slower than unit tests. We should keep these objects simple so that fewer tests are needed to ensure they function correctly.
The reason for not mocking the external resource itself is simple: you’re at a much higher risk of API drift.
Changes in PHP objects whose behavior you have complete control over are much more visible. Your mocking library might even help you in that regard by refusing to mock methods that don’t exist.
On the other hand, breaking changes in, for example, AWS APIs need to be tracked manually, and you need to have faith that the vendor will commit to not introducing breaking changes without letting API consumers know.
The chances of being notified of API drift are much lower when dealing with external resources not part of your codebase.
Many modern frameworks provide developers with the tooling to test their applications from the perspective of the user and to interact with an application much like a user with a browser would. This often comes in the form of a test client that can send (or at least simulate) HTTP requests, or a headless browser making real HTTP requests.
This ultimately raises the question: assuming you cover all possible actions a user could perform, shouldn’t this style of testing be enough to ensure a working application?
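To give a rough idea of what a simulated request looks like, here is a minimal sketch using only Python's standard library. The `app` function and the `simulate_get` helper are hypothetical and exist purely for illustration; the test clients that real frameworks ship are considerably more capable.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """Tiny hypothetical WSGI application with a single route."""
    if environ["PATH_INFO"] == "/hello":
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"Hello, user!"]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"Not found"]


def simulate_get(wsgi_app, path):
    """Call the application directly, without opening a network socket."""
    environ = {"PATH_INFO": path, "REQUEST_METHOD": "GET"}
    setup_testing_defaults(environ)  # fills in the remaining WSGI keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    body = b"".join(wsgi_app(environ, start_response))
    return captured["status"], body


hello = simulate_get(app, "/hello")
missing = simulate_get(app, "/nope")
```

The request never leaves the process, which is what makes this style of test client faster than a headless browser making real HTTP requests.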
Like most things in software engineering, there is no definitive answer that applies to all situations.
By testing exclusively from the user’s perspective, you’re essentially testing the whole application as a single unit, as described in the section above. This means that, for most non-trivial applications, covering every single conceivable user action would require (quite literally) an exponential number of tests.
But oftentimes you don’t have to cover every single user action.
In my opinion this really isn’t any different from the judgement call you need to make when deciding whether to test objects in isolation or as a group.
The only thing that matters is having tests that you are confident will catch regressions going forward. Whether you achieve that using plain unit tests or some fancy full-stack testing framework is completely irrelevant.