Rewriting a Monolith – Where the Journey Begins
We work in an ever-changing industry, where the market value of a software developer is often measured by their relevance. I bet pretty much every one of us aims to keep our know-how up to date to ensure that demand for our services exists now and in the future.
This is the story of a rewrite of an ecosystem that has evolved over the years into a distributed monolith. It is also a story familiar to so many of us – developing something new with the “weight” of the past on our shoulders. My idea is to write several blog posts as the implementation proceeds, covering the topics I find worth sharing.
Let’s start our journey from the very bottom of our stack.
Data ties our hands but keeps the show going
Wouldn’t it be nice and easy to start from a clean slate, with no strings attached to the previous version? Well, yes, and that’s what we’re doing. But only kind of.
The biggest asset of the monolith being rewritten is the data that has been gathered during the decades its initial version has been in production use. In the field where the monolith operates, data is crucial. It provides the backbone for correct decision making now and in the future, and it has nationwide importance for statistical analyses and even political decision making. Competitors might already have superseded the monolith with more modern solutions if we didn’t have the recorded history on our side. It’s fair to say that the data keeps the whole thing alive – and will keep it alive through implementation round two.
We have to keep the initial version of the monolith running in production, unbroken, while implementing its next-generation version. As we know, playing on two teams at the same time is impossible, which has led us to go forward with a database-first approach. This means that we’re keeping and using the current database implementation as it is, with all its defects and inconveniences, and focusing new work – new tables and so on – on a more modern approach within it. Did I just flush this blog post down the drain while writing that? Bear with me: the biggest change is how the database will be used after the rewrite, and we’ll talk about that in the next chapter.
The birth of a distributed monolith
In the beginning there was the monolith and in the end there was a distributed monolith. What happened in between?
As I wrote earlier, the monolith would probably already have been replaced with a more modern solution if it weren’t for the data. However, as important as it is, data is not everything. The customers of the monolith recognized the value of the data, but also the fact that time had passed the monolith by in areas like usability, intuitiveness, and clarity. This led to a situation where some customers ended up writing their own UIs on top of the business logic located in the monolith’s backend. A cynic might say it’s never that straightforward, and they would be right. With several end users (stakeholders from here on) implementing their own UIs, the mixed requirements of the stakeholders needed to be handled somehow. This is where the monolith started to evolve into a distributed monolith, as several new generic and/or stakeholder-specific backend services were introduced to meet the mixed, sometimes even contradictory, requirements of the stakeholders.
Doesn’t sound too bad, right? Well, there are more layers to this cake. The database couldn’t be modified to meet the requirements of each stakeholder, so the needed dynamism and flexibility had to be sought from the programmability of the database – meaning its stored procedures, user-defined functions, and so on. These decisions led to where we were standing on the edge of the rewrite. Business logic has, in many parts, sneaked its way into the database layer, complicating e.g. unit testing significantly. And it doesn’t stop there. Despite the fact that many stakeholders have licensed only some of the newly introduced backend services, those services are in many parts heavily interdependent. As backend API A can’t rely on the existence of backend API B in the same operating environment, a lot of redundant business-logic implementations can be found in every layer of the backend stack. The redundancy occurs in the form of stored procedures, user-defined functions, API endpoints including their validation logic, and so on.
As a result of all of the above, we find ourselves quite deep in the dungeons of dependency hell.
The exit from dependency hell
The divine lift ride out of it is offered by Entity Framework. How so, one might ask? Let me clarify that next.
As you have read above, much of the monolith’s business logic was located in the stored procedures, user-defined functions, etc. of the database. You can imagine what mayhem it was to try to unit test them. Some might disagree, but writing business logic as SQL scripts is also quite tricky compared to normal C# or Java backend programming. Entity Framework allows us to pull the business logic out of the database layer and place it in the actual business logic layer of our backend services, where it is easier to implement, unit test, and maintain. You might think I’m on to something but are still unable to taste the steak? Let me explain a bit further.
The heart of Entity Framework is the database context, which takes care of all data interactions between your application and the database. The database context is also elemental when doing AI-assisted programming, as the AI is now able to understand the database structure and help you both implement the business logic and write unit tests for it. Without the database structure being contextually available, all of this would be much harder.
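To make this concrete, here is a minimal sketch of what such a database context and a relocated piece of business logic could look like with EF Core. The entity, table, and service names are hypothetical, not taken from the actual monolith:

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

// Hypothetical entity mapped onto an existing table in the legacy database.
public class Measurement
{
    public int Id { get; set; }
    public DateTime RecordedAt { get; set; }
    public decimal Value { get; set; }
}

// The database context: EF Core's gateway between the application and the database.
public class MonolithDbContext : DbContext
{
    public MonolithDbContext(DbContextOptions<MonolithDbContext> options)
        : base(options) { }

    public DbSet<Measurement> Measurements => Set<Measurement>();

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // Map the entity onto the existing legacy table instead of creating a new one –
        // this is the database-first part of the approach.
        modelBuilder.Entity<Measurement>().ToTable("MEASUREMENT");
    }
}

// Business logic that previously lived in a stored procedure can now sit in a
// plain service class, where it is straightforward to unit test.
public class MeasurementService
{
    private readonly MonolithDbContext _db;
    public MeasurementService(MonolithDbContext db) => _db = db;

    public async Task<decimal> AverageSinceAsync(DateTime since) =>
        await _db.Measurements
            .Where(m => m.RecordedAt >= since)
            .Select(m => m.Value)
            .DefaultIfEmpty()
            .AverageAsync();
}
```

In a unit test, the same context can be backed by an in-memory or SQLite provider, so the logic above is testable without touching the production database.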
So far so good, but what about the customer-specific needs in the business logic? This issue will be tackled with the help of license bits and feature toggles. We still need to write the code and the unit tests for the bits of business logic where the stakeholders’ needs vary, but now we can do so AI-assisted, which results in e.g. improved productivity and faster turnaround times.
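As a sketch of the idea – with entirely hypothetical feature names, since the real license bits belong to the product – a license-bit-driven feature toggle could look like this:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical license bits; real ones would come from the stakeholder's license record.
[Flags]
public enum LicenseBits
{
    None            = 0,
    ExtendedReports = 1 << 0,
    BulkImport      = 1 << 1,
}

// Answers "is this feature licensed for this stakeholder?" in one place.
public class FeatureToggles
{
    private readonly LicenseBits _licensed;
    public FeatureToggles(LicenseBits licensed) => _licensed = licensed;

    public bool IsEnabled(LicenseBits feature) => (_licensed & feature) == feature;
}

// Stakeholder-specific branching lives in one well-tested spot
// instead of being duplicated across backend services.
public static class ReportService
{
    public static IEnumerable<string> AvailableReports(FeatureToggles toggles)
    {
        yield return "BasicReport";
        if (toggles.IsEnabled(LicenseBits.ExtendedReports))
            yield return "ExtendedReport";
    }
}
```

A service would read the license bits at startup (or per request) and pass them in, which keeps the conditional logic both centralized and easy to unit test.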
How about the interdependency of the backend services? Let’s talk about that in the next chapter.
Doctor – we got mono!!!
Some say mono, also in the form of repos, is a disease, but let me explain how it can also be the cure.
I’m not trying to say a monorepo comes only with benefits. I fully admit that we are likely to face slower build times, more complex CI/CD setups, and so on. But when it comes to dependency hell, escaping it is actually the single most significant reason behind this architectural choice.
As we’ve learned, the backend services of the current monolith are heavily interdependent. How is this possible when the services are distributed across multiple repos? Well, it isn’t – not without each individual backend service keeping its own copy of the shared business logic in its codebase, which is a full-blown dependency mayhem of its own. This is what we’re trying to avoid with the monorepo. We will still publish multiple different backend services, but they will share the same codebase, making the common business logic available to each service without duplicate copies living in different repos. I admit that this design choice might also backfire when it comes to dependency and version management, but we value the benefits over the risks here.
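To illustrate (with hypothetical project names – the real solution layout will differ), the monorepo could be organized so that the shared business logic and the EF Core data layer are single projects referenced by every deployable service:

```
monorepo/
├── src/
│   ├── Shared.BusinessLogic/     # common rules, referenced by every service
│   ├── Shared.Data/              # EF Core database context and entities
│   ├── Service.A/                # independently deployable backend service
│   └── Service.B/                # another service, same shared references
└── tests/
    ├── Shared.BusinessLogic.Tests/
    └── Service.A.Tests/
```

Each service still builds and deploys on its own, but a change to a shared rule is made exactly once and picked up by all of them at build time.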
What’s up next
Stay tuned: my next blog post will concentrate on managing third-party integrations, UI development, and the local development setup.
