When you work on a monolithic system and automate tests for it — which you’re supposed to — the time it takes to run the whole suite grows roughly in proportion to the size of the system. In a sufficiently large system, a full run can take many hours or even days to complete.
There are several reasons we automate tests. At the very least, they give us confidence that our latest changes haven’t broken anything. The less time it takes to confirm that, the sooner we can release the brain space occupied by those changes and move to the next thing.
If that feedback loop takes hours to return results, though, it creates multiple problems.
First, you can’t fully release the brain space occupied by your latest changes, because you’re not yet sure they haven’t broken something. A part of your brain remains tied up by the previous task, so you can’t fully focus on the next one.
Second, even if you don’t care and release that brain space right after pushing your changes to the repository, if you do break something, then as soon as you find out, you have to stop whatever you’re working on and go back to fix the regression.
Depending on how long it takes to run the whole suite in your particular case, it might be later in the day, the next day, or even the next week.
The larger a monolithic system becomes, the more likely it is that a change to the codebase will break a thing or two. That’s because most monolithic systems are big balls of mud, where almost every subsystem knows about and interacts with almost every other subsystem. The lack of clear boundaries is another problem microservices solve.
Given that you’re working on a typical monolithic system and pushing changes almost daily, your changes are very likely to break something sooner or later. At some point you’ll reach a state of constant interruptions, caused by having to fix regressions you introduced hours or days ago.
Imagine a scenario where you learn about regressions two days after pushing the changes that caused them. You have to recall whatever you had in mind while changing the code two days ago. Now imagine that happening on a more or less regular basis, say a couple of times a week. Not a great situation to be in.
Another benefit of the confidence provided by automated tests is being able to deploy to production whenever there’s something worth deploying. When test automation is done properly and actually catches regressions, a passing suite means that nothing we test for has broken, and hence we can deploy to production.
And we do get that benefit while the system we work on is relatively fresh and small. But since most successful projects don’t just stop and enjoy their success, the system keeps evolving. Eventually it grows so much that the whole suite of tests inevitably starts taking hours to complete. Give it a few more months or years in that direction, and it might start taking days.
And if we stay loyal to the idea of only deploying changes that pass the tests, each deployment is delayed by however long the whole suite takes to run. If that’s a couple of days — which is not the worst case in the real world — we can’t deploy anything we’ve implemented or fixed today.
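The gating policy itself is trivial; what hurts is how long the suite inside it runs. Here’s a minimal sketch in Python — the suite runner and deploy step are hypothetical stand-ins, injected as callables so the gate logic stays easy to see:

```python
from typing import Callable


def gated_deploy(run_suite: Callable[[], bool],
                 deploy: Callable[[], None]) -> bool:
    """Deploy only when the whole test suite passes.

    Returns True if a deployment happened, False if a red suite
    blocked it. In a real pipeline, run_suite would shell out to your
    test runner and deploy to your release tooling (both assumed here).
    """
    if run_suite():
        deploy()
        return True
    # A red suite blocks the release until the regression is fixed,
    # and with a slow monolith suite, "until" can mean days.
    return False
```

The point of the sketch is that the entire delay lives inside `run_suite`: the faster that call returns, the faster anything worth deploying can actually ship.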
It might be okay if that only meant delaying new features, but it’s not that simple. It also delays bug fixes, because you can never be sure the fix you’ve just pushed doesn’t introduce another bug by breaking some remote part of the system.
And not being able to immediately deploy fixes for bugs you know affect some of your customers is just sad. It’s especially sad when you already have the fix but can’t ship it for fear of breaking something else.
Microservices to the Rescue
Microservices solve these problems almost automatically, simply by virtue of being much smaller than monoliths.
Test suites of most microservices are likely to take at most a few minutes to complete. Even if you have a relatively big microservice, it’s very unlikely its suite of tests will take longer than an hour to run.
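To put rough numbers on that contrast, here’s a back-of-the-envelope sketch. The test counts and the half-second average per test are made-up illustrative assumptions, not measurements:

```python
AVG_SECONDS_PER_TEST = 0.5  # assumed average, incl. setup/teardown


def suite_runtime_hours(test_count: int,
                        seconds_per_test: float = AVG_SECONDS_PER_TEST) -> float:
    """Runtime of a sequentially executed suite, in hours."""
    return test_count * seconds_per_test / 3600


# Hypothetical sizes: one big monolith suite vs. one service's suite.
print(f"monolith (100k tests):   ~{suite_runtime_hours(100_000):.1f} h")
print(f"microservice (2k tests): ~{suite_runtime_hours(2_000) * 60:.0f} min")
```

Parallelizing the suite helps, of course, but the linear-in-size shape of the problem stays the same.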
That means that once you’ve pushed a change to the codebase of a typical microservice, you can get confirmation that you haven’t introduced a regression, unload the context from your brain, and move on to the next task — all in a matter of minutes.
It also means that if you have a bug in a microservice, you can investigate, fix, and deploy the fix to production in a matter of minutes. Sure, some bugs take longer to figure out, but once you have figured it out and added or modified a test to confirm the bug, you can fix it, rerun the whole suite of tests, and deploy right away.
Another nice side effect of microservices when it comes to tests is that you’re less likely to break a distant part of the codebase that you have no clue about but would have to figure out to fix a regression. In fact, you’re likely to know the entire codebase of an average-sized microservice you work on.