> I've worked in large polyrepo environments. By the time you get big enough that you have internal libraries that depend on other internal libraries, debugging becomes too much like solving a murder mystery. In particular, on more than one occasion I had a stacktrace that was impossible with the code that should have been running. A properly-configured monorepo basically makes that problem disappear.
On the contrary, a monorepo makes it impossible, because you can't ever check out what was actually deployed. If what was running at the time was two different versions of the same internal library in service A and service B, that sucks, but if you have separate checkouts for service A and service B it sucks less than trying to look at two different versions of parts of the same monorepo at the same time.
There is no source of truth for "what was deployed at time T" except the orchestration system responsible for the deployment environment. There is no relationship between source code revision and deployed artifacts.
Hopefully you have a tag in your VCS for each released/deployed version. (The fact that tags are repository-global is another argument for aligning your repository boundaries with the scope of what you deploy).
Why not? I’m doing it right now. The infrastructure is versioned just like the app and I can say with certainty that we are on app version X and infra version Y.
I even have a nice little db/graph of what versions were in service at what times so I can give you timestamp -> all app and infra versions for the last several years.
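For anyone curious, the core of such a timestamp -> versions lookup is tiny. A minimal sketch (illustrative only; assumes deploy records are kept sorted by the time each deploy went green, and the history/field names are made up):

```python
from bisect import bisect_right

# Hypothetical deployment history: each entry records the moment a deploy
# went green and the full set of versions in effect from then on.
HISTORY = [
    (1700000000, {"app": "1.4.0", "infra": "2.1.0"}),
    (1700500000, {"app": "1.5.0", "infra": "2.1.0"}),
    (1701000000, {"app": "1.5.1", "infra": "2.2.0"}),
]

def versions_at(timestamp):
    """Return the app/infra versions that were live at the given timestamp."""
    times = [t for t, _ in HISTORY]
    i = bisect_right(times, timestamp)  # last deploy at or before timestamp
    if i == 0:
        raise ValueError("timestamp predates recorded history")
    return HISTORY[i - 1][1]
```

The only real requirement is that you record a complete version snapshot at each green deploy, not just the delta.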
Unless your infrastructure is a single deployable artifact, its "version" is a function of all of the versions of all of the running services. You can define a version that establishes specific versions of each service, but that's an intent, not a fact -- it doesn't mean that's what's actually running.
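To make "a function of all of the versions of all of the running services" concrete: you can derive a single fingerprint from whatever is actually observed running, and compare it to the fingerprint of what you intended to deploy. A rough sketch (the function name and format are assumptions, not anyone's actual tooling):

```python
import hashlib

def composite_version(running):
    """Derive one fingerprint from the service versions actually running.

    Two environments get the same fingerprint iff every service is on the
    same version; ordering of the input dict doesn't matter.
    """
    canonical = ",".join(f"{svc}={ver}" for svc, ver in sorted(running.items()))
    return hashlib.sha256(canonical.encode()).hexdigest()[:12]
```

Intent vs. fact then becomes a one-line check: fingerprint the desired state, fingerprint the observed state, and alert when they diverge.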
Am I missing some nuance here? Yes, the infra version is an amalgamation of the pinned versions of all the underlying services. But once the deploy goes green I know exactly what's running, down to the exact commit hashes everywhere. And during the deploy I know that each service is on either version n-1 or n.
The kinds of failures you're describing require throwing away all assumptions and accepting that everything from Terraform to the compiler could be broken, which is too paranoid to be practically useful or actionable.
If a deploy fails, I assume the new state is undefined and throw it away, having never switched over to it. If the deploy passes, I now have the next known-good state.
Oh, this implies you're deploying your entire infrastructure, from provisioned resources up to application services, with a single Terraform command, and managed by a single state file. That's fine and works up to a certain scale. It's not the context I thought we were working in. Normally multi-service architectures are used in order to allow services to be deployed independently and without this form of central locking.
If what was deployed was foo version x and bar version y, it's a lot easier to debug by checking out tag x in the foo repo and tag y in the bar repo than achieving the same thing in a monorepo.
I'm not sure I understand how that scenario would arise with a monorepo. The whole point of a monorepo is that everything changes together, so if you have a shared internal library, every service should be using the same version of that library at all times.
And every service deploys instantly whenever anything changes?
(I actually use that as my rule of thumb for where repository splits should happen: things that are deployed together should go in the same repo, things that deploy on different cycles should go in different repos)
Not necessarily instantly, but our CD is fast enough that changes are in production 5-10 minutes after hitting master.
But what's more valuable is that our artifacts are tagged with the commit hash that produced them, which is then emitted with every log event, so you can go straight from a log event to a checked-out copy of every relevant bit of code for that service.
Admittedly this doesn't totally guarantee you'll never have to worry about multiple monorepo revisions when debugging an interaction between services, but I haven't found that to come up much in practice.
Edit: I should also clarify, a change to any internal library in our monorepo will cause all services that consume that library to be redeployed.
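In case it's useful, the commit-hash-on-every-log-event pattern is a few lines with Python's standard logging module. A hedged sketch, assuming the build pipeline bakes the producing commit into the artifact as an environment variable (`GIT_COMMIT` here is an assumption, not a standard):

```python
import logging
import os

# Assumed: the build bakes the producing commit into the artifact,
# e.g. as an environment variable set at image build time.
COMMIT = os.environ.get("GIT_COMMIT", "unknown")

class CommitFilter(logging.Filter):
    """Attach the commit hash to every log record."""
    def filter(self, record):
        record.commit = COMMIT
        return True

handler = logging.StreamHandler()
handler.setFormatter(
    logging.Formatter("%(asctime)s %(commit)s %(levelname)s %(message)s")
)
logger = logging.getLogger("service")
logger.addHandler(handler)
logger.addFilter(CommitFilter())
logger.setLevel(logging.INFO)

logger.info("request handled")  # every line now carries the commit hash
```

From there, going from a log event to the exact source is just `git checkout <commit>`.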
> What do you do with libraries shared between different deployment targets?
The short answer is "make an awkward compromise". If it's a library that mostly belongs to A but is used by B, it can live in A (though this means you may sometimes have to release A with changes made purely for the sake of B); if it's a genuinely shared library that might be changed for the sake of either A or B, I generally put it in a third repo of its own, which means a two-step release process. The way to mitigate that pain is to make sure the library can be tested on its own, without needing A or B. As for the case where a library is shared between two independent components A and B but tightly coupled to both, such that it can't really be tested on its own: all I can suggest is to avoid it.
That's a great test, and I think an argument for a monorepo for most companies. Unless you work on products that are hermetically sealed from each other, there are very likely going to be tight dependencies between them. Your various frontends and backends are going to want to share data models for the data they exchange, for example. You don't really want multiple versions of those models to exist across your deployments, at least not long-term.
I think it's maybe an argument for a single repo per (two-pizza) team. Beyond that, you really don't want your components to be that tightly coupled together (e.g. you need each team to be able to control their own release cycles independently of each other). Conway's law works both ways.
If they have independent release cycles, they shouldn't be tightly coupled (sharing models etc. beyond a specific, narrowly-scoped, and carefully versioned API layer), and in that case there is little benefit and nontrivial cost to having them be in a monorepo.
Not GP, but I use versioned packages (npm, NuGet, etc.) for that. They're published just like an open source project would be, ideally using semantic versioning or matching the version of a parent project (in cases where, e.g., we produce a client library from the same repo as the main service).
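For what it's worth, the semver compatibility rule consumers rely on is mechanical enough to sketch in a few lines. A minimal illustration of caret-style matching (the way npm's `^` ranges behave for versions >= 1.0.0; function names are made up, and real ranges have more cases like pre-release tags):

```python
def parse_semver(v):
    """Split 'MAJOR.MINOR.PATCH' into a comparable (int, int, int) tuple."""
    major, minor, patch = v.split(".")
    return (int(major), int(minor), int(patch))

def compatible(installed, required):
    """Caret-style check: same major version, and installed >= required.

    Under semver, bumping minor/patch must be backwards-compatible,
    while a major bump signals a breaking change.
    """
    i, r = parse_semver(installed), parse_semver(required)
    return i[0] == r[0] and i >= r
```

This is why publishing shared models as a versioned package works: consumers pin a major version and pick up compatible updates on their own release cycle.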