This seems backwards to me in 2023. Logs seem like valuable data to train your A...

darkclouds · on Aug 8, 2023

What if the logs were missing entries though? Would you train your AI on missing data?

I found a bug in rsyslog a few years ago where, under certain conditions, log entries went missing. If you knew the pattern of which log entries went missing, you could craft a very stealthy attack on the system and remain undetected in the logs.

eesmith · on Aug 8, 2023

Consider this point:

> there's no guarantee that the message you're monitoring for won't change. Maybe someday the Linux kernel developers will decide to put more information in their MCE messages and change the format.

How often do the messages change? How often do you need to re-train the AI? Is it worthwhile?
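
To make the quoted concern concrete, here is a minimal sketch of how an exact string match can silently stop firing after a format change. The two log lines and the "CPU 3" addition are made up for illustration and are not real kernel output; the function names are hypothetical too.

```python
import re

# Hypothetical log lines, invented for illustration only.
OLD_LINE = "kernel: mce: [Hardware Error]: Machine check events logged"
NEW_LINE = "kernel: mce: [Hardware Error]: CPU 3: Machine check events logged"

# Monitor 1: exact substring match against the old wording.
EXACT = "[Hardware Error]: Machine check events logged"

def exact_match(line: str) -> bool:
    """Return True only if the old wording appears verbatim."""
    return EXACT in line

# Monitor 2: a looser regex that tolerates extra text between the two parts.
LOOSE = re.compile(r"\[Hardware Error\]:.*Machine check")

def loose_match(line: str) -> bool:
    """Return True if both markers appear in order, with anything between."""
    return LOOSE.search(line) is not None

for name, line in (("old", OLD_LINE), ("new", NEW_LINE)):
    print(name, exact_match(line), loose_match(line))
```

The looser pattern survives this particular change, but only because I guessed which part of the message would move. Either way, someone has to notice the change and maintain the matcher, which is the same cost I'm asking about for a trained model.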

cobertos · on Aug 8, 2023

A properly trained one shouldn't overfit. If it were trained just to detect error lines and flag them for human review, I think that'd be very beneficial.

Not as accurate as string matching, though, and more time/compute intensive.

eesmith · on Aug 8, 2023

That doesn't seem to address my question.

How much work does it take to train and maintain a properly trained AI, given the three problems mentioned in the essay?