Logging is one of those things that remains mostly an afterthought for a lot of languages, frameworks, and engineers.
It's not that hard to get organized. Here's what I've been doing for the last ten years on most of my projects (Java & Kotlin mostly, but you should be able to do this for anything).
1) Log levels matter. Debug/tracing is disabled in production. Info is informational only and should not have a signal-to-noise ratio that gets annoying (people doing debug logging at info level). Warning frequency should be low: if you are not going to fix it, it's not worthy of a warning. Errors should cause alerts and people to be woken up. Simple rule, but only if you enforce it. Don't log at error level unless it really is worth waking somebody up for (e.g. me). An error is not "something totally expected happened but I could not be arsed to think about handling that in a sane way". I've seen projects that routinely log thousands of errors per hour and never fix anything until after customers start yelling on the phone. Log levels are totally irrelevant in such projects. Nobody bothers to look at those errors. They are no longer actionable. If errors are normal and expected behavior, how do you tell when something abnormal and unexpected happens? You can't, unless you make people do something about those errors and create a culture where having errors is simply not an acceptable state for the product to be in. Life is great when you do that. We have zero errors on most days. When they do happen, it's usually because something changed. And then we fix that and it goes quiet again. Simple rule to enforce. Generates very little work. But you have to enforce it.
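To make the rule concrete, here's a minimal sketch using `java.util.logging` from the stdlib (its FINE roughly corresponds to debug and SEVERE to error; Log4j/Logback work the same way with different names). A production logger set to INFO simply drops the debug chatter; the handler here just collects records so you can see what survives the filter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

public class LogLevels {
    // Collects published records so we can see what survives the level filter.
    public static final List<LogRecord> published = new ArrayList<>();

    public static Logger productionLogger() {
        Logger log = Logger.getLogger("app");
        log.setUseParentHandlers(false);
        for (Handler h : log.getHandlers()) log.removeHandler(h); // idempotent setup
        log.setLevel(Level.INFO);           // production: debug/trace disabled
        log.addHandler(new Handler() {
            @Override public void publish(LogRecord r) { published.add(r); }
            @Override public void flush() {}
            @Override public void close() {}
        });
        return log;
    }

    public static void main(String[] args) {
        Logger log = productionLogger();
        log.fine("debug detail");             // FINE ~ debug: dropped in production
        log.info("user signed up");           // INFO: informational, low volume
        log.warning("retrying flaky call");   // WARNING: rare, and worth fixing
        log.severe("payment pipeline down");  // SEVERE ~ error: wake someone up
        System.out.println(published.size() + " records published");
    }
}
```

The FINE entry never reaches the handler; only the three INFO-and-above entries do.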
2) Java logging frameworks have something called a mapped diagnostic context (MDC). This is great. Basically it means every log entry can carry a context where you keep track of things about e.g. the current request: headers, user agents, IP addresses, session ids, etc. Why don't other languages have this? I don't know. Seriously, how is this not a thing for every web development framework worthy of the name? Why would you not want to know this information when something happens?
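For languages that lack one, the core idea is small enough to sketch yourself. This is a minimal re-implementation of what SLF4J's `org.slf4j.MDC` does: a per-thread key/value map that a log formatter attaches to every entry emitted while handling a request.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of a mapped diagnostic context (MDC): per-thread key/value
// pairs that a formatter can attach to every log entry. SLF4J/Logback ship
// a real one; this just shows the mechanism.
public class Mdc {
    private static final ThreadLocal<Map<String, String>> ctx =
            ThreadLocal.withInitial(HashMap::new);

    public static void put(String key, String value) { ctx.get().put(key, value); }
    public static String get(String key) { return ctx.get().get(key); }
    public static void clear() { ctx.get().clear(); }  // call at end of request

    // What a formatter would do: prepend the context to the message.
    public static String format(String message) {
        return ctx.get() + " " + message;
    }

    public static void main(String[] args) {
        // At the start of handling a request, stash the identifying bits once...
        Mdc.put("requestId", "r-42");
        Mdc.put("userAgent", "curl/8.0");
        // ...and every log line in that thread gets them for free.
        System.out.println(Mdc.format("order created"));
        Mdc.clear();
    }
}
```

The `clear()` at request end matters in practice: on thread pools, a stale context would leak one request's attributes into the next.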
3) Logging messages are structured data, whether you like it or not. Plain text is a shitty way to represent structured data. If you can, log in JSON format. You have timestamps, logger names, log levels, attributes in your MDC, attributes coming from your server environment like the host name, service name, etc. All of it is relevant.
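A hand-rolled sketch of what such a line looks like. The field names (`ts`, `level`, `message`) are illustrative, not a standard, and in a real setup you'd use the JSON encoder that ships with your logging framework rather than building strings by hand:

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of a structured (JSON) log line: timestamp, level, message, plus
// MDC and environment attributes as extra fields. Escaping here is minimal
// (quotes and backslashes only); real encoders handle the full JSON spec.
public class JsonLine {
    public static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static String logLine(String level, String message, Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("{");
        sb.append("\"ts\":\"").append(Instant.now()).append("\",");
        sb.append("\"level\":\"").append(escape(level)).append("\",");
        sb.append("\"message\":\"").append(escape(message)).append("\"");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append(",\"").append(escape(e.getKey())).append("\":\"")
              .append(escape(e.getValue())).append("\"");
        }
        return sb.append("}").toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("host", "web-1");       // from the server environment
        fields.put("requestId", "r-42");   // from the MDC
        System.out.println(logLine("INFO", "order created", fields));
    }
}
```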
4) Tailing and grepping plain text logs simply does not scale. It's what you have to do when your ops team is too incompetent to set up proper logging. Usually goes hand in hand with having snowflake servers that people ssh into. It's actually the #1 excuse for having boxes you can ssh into to begin with. Solution: logs go into a data store that allows you to filter on this data. Not having this is the equivalent of running blind. Completely and utterly unacceptable. Most cloud environments come with a reasonably OK logging console, but you might want to upgrade to something with a bit more querying capability. Done right, even those default logging consoles can be capable enough.
5) Your logs should have alerts on them. If errors happen, alerts should happen and people should do things about those errors. If logs go silent when they shouldn't, alerts should happen because something is probably broken. If there's a weird spike in logging volume, alerts should happen. Alerts should be actionable and exceptional. If something is alerting all the time, nobody will check when something important actually does happen. Tricky to get right, but once you do, you can react quickly to any incident.
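The three alert triggers above can be sketched as a single predicate evaluated over a counting window against a recent baseline. The thresholds here are made-up placeholders you'd tune to your own traffic, not recommendations:

```java
// Sketch of the three alert conditions: any errors, logs gone silent, or a
// weird volume spike. Counts would come from whatever store holds your logs;
// the 10x spike factor is an arbitrary placeholder.
public class LogAlerts {
    public static boolean shouldAlert(long errorsInWindow,
                                      long linesInWindow,
                                      long baselineLines) {
        if (errorsInWindow > 0) return true;                      // errors are exceptional
        if (baselineLines > 0 && linesInWindow == 0) return true; // gone silent
        if (linesInWindow > baselineLines * 10) return true;      // volume spike
        return false;
    }

    public static void main(String[] args) {
        System.out.println(shouldAlert(0, 1000, 1200));  // normal traffic: no alert
        System.out.println(shouldAlert(3, 1000, 1200));  // errors happened: alert
        System.out.println(shouldAlert(0, 0, 1200));     // logs went silent: alert
        System.out.println(shouldAlert(0, 50000, 1200)); // volume spike: alert
    }
}
```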
Deliberately keeping product names out of this. There are plenty of libraries, tools and products that allow you to do this properly. That most likely includes your preferred software stack. And if it doesn't, use something more production-ready that does. Or fix it (not that hard usually).
I'd characterize logs as a poor tool for doing the two tasks people use them for: investigating the state of the server, or investigating execution of a request. I instead am a strong believer in separate tools for those two tasks.
Server state should be exposed through metrics. Metrics have far fewer sharp edges than logs, and it's more obvious how to correctly produce, consume, and alert on them. I've seen (variations of) your 5 action items needed for the logs of every company I've worked at, but they've never applied to metrics.
Executions should be exposed through tracing. I'm kind of cheating here: I expect the traces to have logs attached. But a well-done tracing system, where a developer can add a flag to their Postman query and have their request traced with the debug level set only for that request, is a magical thing.
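A sketch of how that per-request override can work under the hood: an incoming flag flips a per-thread level threshold, so the traced request logs at debug while the rest of production stays at info. The `X-Debug-Trace` header name and this tiny API are assumptions for illustration, not any particular tracing product's interface:

```java
// Sketch: per-request debug logging via a thread-local level override.
// The X-Debug-Trace header is a made-up name; real systems also propagate
// the flag across service boundaries along with the trace context.
public class PerRequestDebug {
    public enum Level { DEBUG, INFO }

    private static final ThreadLocal<Level> threshold =
            ThreadLocal.withInitial(() -> Level.INFO);

    public static void onRequestStart(java.util.Map<String, String> headers) {
        if ("1".equals(headers.get("X-Debug-Trace"))) threshold.set(Level.DEBUG);
    }

    public static boolean isEnabled(Level level) {
        return level.compareTo(threshold.get()) >= 0;  // enum order: DEBUG < INFO
    }

    public static void onRequestEnd() { threshold.remove(); }  // don't leak to pooled threads

    public static void main(String[] args) {
        System.out.println(isEnabled(Level.DEBUG)); // normal request: debug off
        onRequestStart(java.util.Map.of("X-Debug-Trace", "1"));
        System.out.println(isEnabled(Level.DEBUG)); // flagged request: debug on
        onRequestEnd();
    }
}
```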
> Solution: logs go into a data store that allows you to filter on this data.
Getting to know the operating system's remote logging machinery can get you very far on this. It's amazing how often people basically duplicate it, writing logs to text files or database tables instead of hooking up to the tooling that comes with the OS.
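For example, with rsyslog (a common Linux syslog daemon), forwarding everything to a central collector is a one-line config rather than a custom script. The host and port here are placeholders:

```
# /etc/rsyslog.d/forward.conf -- forward all facilities/priorities to a
# central collector (single @ = UDP, double @@ = TCP)
*.*  @@logs.example.internal:514
```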
We used to call it Perl Programmer's Disease: at some point every Perl programmer in the late 1990s wrote a script to send Apache logs to a remote host because doing that was faster than learning how to make Apache log to a remote host directly.
> log levels matter. Logging messages are structured data.
Swift has those [0][1] and other features, like jumping to the file and line of code from where the log was generated, but I wish there was a way to easily add extra information to each message in the debug console, such as the current frame being rendered. Something I've been wrestling with for the past few days; if I write a custom logging function, the IDE's debug console thinks every log message was generated from my custom function.
And god I wish we started making use of COLOR within text-heavy information. Being able to color different words/values in a log message would massively improve readability and comprehension.
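In a terminal this doesn't need framework support; plain ANSI escape codes will do. A minimal sketch (the escape codes are the standard ANSI ones, the level-to-color mapping is my own choice, and this assumes a terminal that renders them):

```java
// Sketch: coloring parts of a log line with ANSI escape codes, so the level
// and the interesting value stand out from the surrounding text.
public class ColorLog {
    static final String RED = "\u001b[31m", YELLOW = "\u001b[33m",
                        CYAN = "\u001b[36m", RESET = "\u001b[0m";

    public static String colorize(String level, String message, String value) {
        String levelColor = level.equals("ERROR") ? RED : YELLOW;  // arbitrary mapping
        return levelColor + level + RESET + " " + message + " "
                + CYAN + value + RESET;  // highlight the value being discussed
    }

    public static void main(String[] args) {
        System.out.println(colorize("ERROR", "frame dropped at", "frame=1042"));
    }
}
```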
I only used it once for a class in grad school, but it's things like this that make Swift feel like a really well intentioned programming language, especially paired with the Xcode ecosystem.
You've made a lot of good points. I've stepped into a team that is supporting a large product that has been going for years. There are so many error logs and alerts that nobody notices them any more; it's so frustrating.
- I (as the CTO) get grumpy when I get alerted for nothing or spammed with non-stop alerts. And I see all the alerts. Basically that means I tell people to get their act together (or lead by example). In fairness, it's quite often me that made the changes that caused me to get alerted and grumpy. This is not about finger pointing but about it genuinely being annoying to have to deal with this. This is a necessary level of pain that you seek to minimize.
- I get more grumpy when I don't get alerted when the thing actually breaks. This means I have to explain to others why shit was broken for hours/days on end without me doing anything about it. "The dog ate my homework" doesn't quite cut it here. I'm responsible, so I need to know.
The balance here is making sure every error gets logged, and then making sure that everything that does get logged gets resolved in a way that makes the problem go away permanently. It's either a bug (fix it), an infrastructure failure (fix it), or something that isn't an error (so fix it so that it no longer logs as one).