> this also requires you update (or remove) these additional checks as your software and infra change... that's what most people forget, and then alert fatigue sets in, and everything becomes a non-alert
This is what I forgot to mention: monitoring is a process, not a tool you put in place and forget about. And I agree that this is the main reason monitoring goes to shit. But if you think about it a little deeper, you arrive at the requirement that monitoring be done by somebody who understands the business and the infra/software stack and has been at the company for at least a few years. In reality, though, monitoring is mostly treated as an afterthought and a cost center, and employees are rotated far too often. It's not a glamorous position either; I don't think I've ever met anybody who wants to work there.