Hacker News

> I'm running more complicated stacks to capture massive amounts of data in order to use it less and less.
> […]
> My experience is companies do not anticipate that the cost of monitoring an application can easily exceed the cost of hosting the application even for simple applications.

These problems definitely resonate with me. Having tackled them at other companies, our team took them as part of the inspiration for building Heii On-Call [1], a lightweight, minimal monitoring platform for apps, APIs, cron jobs, etc.

Instead of “ingest all the metrics/logs” we have guides showing, for example, how to set up the most minimal possible Prometheus / Alertmanager, with the smallest footprint of CPU/memory/disk requirements. [2]
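To put a number on "smallest footprint" for disk, the Prometheus storage documentation gives a rule of thumb: needed disk space ≈ retention time in seconds × ingested samples per second × bytes per sample, with samples averaging roughly 1 to 2 bytes. Below is a minimal sketch of that arithmetic. The function name and the example inputs (1,000 active series, 15 s scrape interval, 15 days of retention) are hypothetical, and real usage varies with compression and series churn.

```python
def estimate_prometheus_disk_bytes(
    active_series: int,
    scrape_interval_s: float,
    retention_days: float,
    bytes_per_sample: float = 2.0,
) -> float:
    """Rough TSDB disk estimate using the Prometheus docs' rule of thumb:
    retention_seconds * samples_per_second * bytes_per_sample.

    The 2-byte default is the pessimistic end of the documented 1-2 byte range.
    """
    # Each active series yields one sample per scrape.
    samples_per_second = active_series / scrape_interval_s
    retention_seconds = retention_days * 86_400
    return retention_seconds * samples_per_second * bytes_per_sample


if __name__ == "__main__":
    # Hypothetical small setup: 1,000 series, 15 s scrapes, 15 days retention.
    size = estimate_prometheus_disk_bytes(1_000, 15, 15)
    print(f"{size / 1e6:.1f} MB")  # 172.8 MB
```

Retention is controlled by Prometheus's `--storage.tsdb.retention.time` flag, so halving retention (or the number of scraped series) roughly halves the estimate.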

Many of our users don't even use our Prometheus or Datadog integrations; they just stick with our dead-simple, out-of-the-box HTTP monitoring. [3]
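An HTTP uptime check can be quite small. The sketch below is a generic illustration using only Python's standard library, not Heii On-Call's implementation; the `http_check` name, the `CheckResult` fields, and the 10-second default timeout are assumptions. It treats any 2xx response within the timeout as healthy and everything else (4xx/5xx, timeouts, refused connections, DNS errors) as down.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckResult:
    ok: bool
    status: Optional[int]  # HTTP status code, or None if no response arrived
    latency_s: float
    error: Optional[str] = None


def http_check(url: str, timeout_s: float = 10.0) -> CheckResult:
    """Send one GET and report whether it returned 2xx within the timeout.

    Hypothetical helper: a generic sketch, not any vendor's checker.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            status = resp.status
        return CheckResult(200 <= status < 300, status, time.monotonic() - start)
    except urllib.error.HTTPError as exc:
        # urlopen raises for 4xx/5xx responses; the status code is still available.
        return CheckResult(False, exc.code, time.monotonic() - start, str(exc))
    except OSError as exc:
        # URLError, timeouts, and refused connections all subclass OSError.
        return CheckResult(False, None, time.monotonic() - start, str(exc))
```

Run on a schedule (say, once a minute) against a hypothetical health endpoint such as `http_check("https://example.com/health")`, it answers a single question, whether the endpoint is up right now, and needs no storage at all.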

Note that these address the “now” problem of monitoring: is it working right now, or not? Most of the article, by contrast, is about the “past” problem of monitoring: long-term storage of logs and metrics, and pretty charts.
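For a cron job there is no URL to poll, so a common way to answer "is it working now?" is a heartbeat (a dead man's switch). The job checks in when it finishes, and the monitor alerts if the latest check-in is older than the expected interval plus a grace period. Below is a minimal sketch of that staleness test. The `heartbeat_is_healthy` name and its parameters are hypothetical, not Heii On-Call's API. Note that it needs only one stored timestamp, with no history.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def heartbeat_is_healthy(
    last_checkin: Optional[datetime],
    expected_interval: timedelta,
    grace: timedelta,
    now: Optional[datetime] = None,
) -> bool:
    """Return True if the job has checked in recently enough.

    Hypothetical helper: only the latest check-in time is kept, because
    "is it working now?" needs no history beyond that one timestamp.
    """
    if last_checkin is None:
        return False  # the job has never checked in
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_checkin <= expected_interval + grace


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 3, 0, tzinfo=timezone.utc)  # hypothetical nightly run
    day, hour = timedelta(days=1), timedelta(hours=1)
    print(heartbeat_is_healthy(t0, day, hour, now=t0 + timedelta(hours=20)))  # True
    print(heartbeat_is_healthy(t0, day, hour, now=t0 + timedelta(hours=26)))  # False
```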

The “past” part is what the author identifies as rarely useful and as the source of the cost and complexity. I think there’s probably still value in dumping whatever logs and metrics you want into S3 just in case you need to go digging one day, but for the most part that is pretty low value. The higher value, at lower implementation complexity, is in the “now”.
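The "dump it into S3 just in case" option can stay very simple: compress each day's logs and write them under a date-based key. Below is a minimal sketch of the preparation step using only the standard library. The `archive_logs` name and the `logs/<service>/YYYY/MM/DD.log.gz` key layout are hypothetical conventions, and the upload itself is left as a comment that assumes boto3 is installed and AWS credentials are configured.

```python
import gzip
from datetime import date


def archive_logs(lines: list[str], service: str, day: date) -> tuple[str, bytes]:
    """Compress one day's log lines and choose a date-partitioned object key.

    Hypothetical helper and key layout: cold, compressed storage is enough
    for logs you only expect to dig through occasionally.
    """
    key = f"logs/{service}/{day:%Y/%m/%d}.log.gz"
    body = gzip.compress("".join(line + "\n" for line in lines).encode("utf-8"))
    return key, body


if __name__ == "__main__":
    key, body = archive_logs(["GET / 200", "GET /health 200"], "api", date(2024, 1, 15))
    print(key)  # logs/api/2024/01/15.log.gz
    # Upload step (assumes boto3 and configured credentials; bucket name is hypothetical):
    # boto3.client("s3").put_object(Bucket="my-log-archive", Key=key, Body=body)
```

Finding something later means downloading and scanning the relevant days, which is slow but cheap if it only happens occasionally.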

[1] https://heiioncall.com/

[2] https://heiioncall.com/guides/minimalistic-monitoring-and-al...

[3] https://heiioncall.com/docs



>Instead of “ingest all the metrics/logs” we have guides showing, for example, how to set up the most minimal possible Prometheus / Alertmanager, with the smallest footprint of CPU/memory/disk requirements. [2]

I'll bite... why _not_ "ingest all the metrics/logs"? I see two possible reasons: 1) you don't have enough resources (CPU/memory/disk) to consume "all the logs"; 2) you won't be able to find what you need.
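To put reason 1 in numbers, here is a back-of-the-envelope sketch of raw log volume. Every input (100 requests/s, 5 log lines per request, 200 bytes per line) is a hypothetical example, the function name is illustrative, and any ingestion or indexing overhead would come on top of these raw bytes.

```python
SECONDS_PER_DAY = 86_400


def raw_log_bytes_per_day(
    requests_per_s: float, lines_per_request: float, bytes_per_line: float
) -> float:
    """Raw (uncompressed, unindexed) log bytes produced per day."""
    return requests_per_s * lines_per_request * bytes_per_line * SECONDS_PER_DAY


if __name__ == "__main__":
    per_day = raw_log_bytes_per_day(100, 5, 200)  # hypothetical traffic profile
    print(f"{per_day / 1e9:.2f} GB/day, {per_day * 30 / 1e12:.2f} TB/month")
    # 8.64 GB/day, 0.26 TB/month
```

The arithmetic is trivial, but it makes the trade-off concrete: doubling traffic or log verbosity doubles what has to be stored and searched.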

I think both are failures of the tools in this space.



