
What would you use for storing and querying long-term audit logs (e.g. 6 months retention), which should be searchable with subsecond latency and would serve 10k writes per second?

AFAICT this system feels like a decent choice. Alternatives?



I would first question whether the system really needs subsecond search latency, and whether it has to be the same system that handles 10k writes/sec.

Even Google Cloud and others make you wait for longer search queries. If it's not business critical, you can definitely wait a bit.

And the write system might not need to write the data in its end format, especially as it also has to handle transformation and filtering.

Nonetheless, as mentioned in my other comment, the interesting details of this are missing.


Let's say that it powers a "search logs" page that an end user wants to see. And let's say that they want the last 1d, 14d, 1m, 6m.

So subsecond I would say is a requirement.

And no, it doesn't have to be the same system that ingests/indexes the logs.


"So subsecond I would say is a requirement." You don't give any specific reason for how you came to that conclusion.

You can easily keep users engaged by showing them that the system is doing something in the background, without losing them. And if they are colleagues who actually need to search, you don't even need to retain them, as they have to use your setup anyway.


OK, let's say it needs to be <3s, for reasons.


You'll find many case studies about using Clickhouse for this purpose.


Do you know any specific case studies for unstructured logs on clickhouse?

I think achieving sub-second read latency for ad hoc text search over ~150B rows of unstructured data is going to be quite challenging without a high cost. Clickhouse’s inverted indices are still experimental.

If the data can be organized in a way that is conducive to the searching itself, or structured into columns, that’s definitely possible. Otherwise I suppose a large number of CPUs (150-300) to split the job and just brute force each search?
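For reference, the ~150B figure follows from the numbers in the original question. A quick back-of-the-envelope sketch, assuming "6 months" means roughly 180 days and a 1-second budget per search. The per-core rate below is only the result of the division, not a measured throughput for any system:

```python
WRITES_PER_SEC = 10_000
SECONDS_PER_DAY = 86_400
RETENTION_DAYS = 180  # assumption: "6 months" is about 180 days


def total_rows(writes_per_sec=WRITES_PER_SEC, days=RETENTION_DAYS):
    """Rows retained at a steady write rate over the retention window."""
    return writes_per_sec * SECONDS_PER_DAY * days


def rows_per_core_per_sec(rows, cores, budget_sec=1.0):
    """Scan rate each core must sustain to brute-force all rows in budget_sec."""
    return rows / (cores * budget_sec)


rows = total_rows()
print(f"{rows:,} rows retained")
for cores in (150, 300):
    rate = rows_per_core_per_sec(rows, cores)
    print(f"{cores} cores -> {rate:,.0f} rows/sec per core")
```

So a full 6-month brute-force scan within one second would need each core to get through somewhere between half a billion and a billion rows per second, which is why narrowing by time range or structured columns first matters so much.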


There is at least https://news.ycombinator.com/item?id=40936947 though it's a bit mixed in terms of how they handle schema.


Not sure if an excellent joke or an honest mistake


Let's go with the former, I definitely didn't mean to link https://www.uber.com/en-FI/blog/logging/ :)


What if I don't have such latency requirements? I'm willing to trade that for flexibility or anything else


10k audit logs per sec? I think we have different definitions of audit logs.


NATS?


NATS doesn't really have advanced query features though. It has a lot of really nice things, but advanced querying isn't one of them. Not to mention, I don't know if NATS does well with large datasets. Does it have sharding capability for its KV and object stores?


I use NATS at work, and I have had the privilege to speak with some of the folks at Synadia about this stuff.

Re: advanced querying: the recommended way to do this is to build an index out of band (like Redis (or a fork) or SQLite or something) that references the stored messages by sequence number. By doing that, your index is just this ephemeral thing that can be dynamically built to exactly optimize for the queries you're using it for.
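A minimal sketch of that out-of-band index idea, using SQLite from the Python standard library. Everything here is hypothetical: the `STREAM` dict merely stands in for messages stored in a stream and addressable by sequence number, and the `actor`/`action` fields are made-up audit-log attributes. No real NATS client calls are shown.

```python
import sqlite3

# Stand-in for a stream: sequence number -> stored message.
# In a real setup the messages would live in the stream itself.
STREAM = {
    1: {"actor": "alice", "action": "login"},
    2: {"actor": "bob", "action": "delete_user"},
    3: {"actor": "alice", "action": "delete_user"},
}


def build_index(stream):
    """Build a throwaway index that maps query fields to sequence numbers."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE idx (seq INTEGER PRIMARY KEY, actor TEXT, action TEXT)"
    )
    db.execute("CREATE INDEX idx_actor_action ON idx (actor, action)")
    db.executemany(
        "INSERT INTO idx (seq, actor, action) VALUES (?, ?, ?)",
        [(seq, m["actor"], m["action"]) for seq, m in stream.items()],
    )
    return db


def find_seqs(db, actor, action):
    """Return the sequence numbers of messages matching the query."""
    rows = db.execute(
        "SELECT seq FROM idx WHERE actor = ? AND action = ? ORDER BY seq",
        (actor, action),
    )
    return [row[0] for row in rows]


db = build_index(STREAM)
seqs = find_seqs(db, "alice", "delete_user")
# The index only yields sequence numbers; the payloads come from the stream.
messages = [STREAM[seq] for seq in seqs]
print(seqs, messages)
```

Since the index holds only references, it can be dropped and rebuilt with a different schema whenever the query patterns change.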

Re: sharding: no, it doesn't support simple sharding. You can achieve sharding by standing up multiple NATS instances, making a new stream (KV and object store are also just streams) on each instance, and capturing some subset of the stream on each instance. The client (or perhaps a service querying on behalf of the client) would have to be smart enough to mux the sources together.
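Roughly what that client-side muxing could look like, as a sketch only. The shard lists, the `(timestamp, shard, payload)` tuple layout, and the CRC-based `shard_for` routing are all assumptions made for illustration; in practice each list would be read from a separate NATS instance.

```python
import heapq
import zlib


def shard_for(subject, num_shards):
    """Pick a shard for a subject with a stable hash (hypothetical routing)."""
    return zlib.crc32(subject.encode("utf-8")) % num_shards


def mux(*shards):
    """Merge per-shard message lists, each already sorted by timestamp."""
    return heapq.merge(*shards, key=lambda msg: msg[0])


# Hypothetical per-instance results: (timestamp, shard name, payload).
shard_a = [(1, "a", "login"), (4, "a", "logout")]
shard_b = [(2, "b", "create_key"), (6, "b", "delete_key")]
shard_c = [(3, "c", "login"), (5, "c", "update_role")]

merged = list(mux(shard_a, shard_b, shard_c))
for ts, shard, payload in merged:
    print(ts, shard, payload)
```

`heapq.merge` is lazy and only needs each input to be sorted, so the client can stream results from every instance without loading everything into memory first.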


Does it handle clustering/redundancy for the data stored in KV/object store? My intuition says yes because I believe it supports it at the "node" level


Yes. When you create a stream (including a KV or object store) you say what cluster you want to put it on, and how many replicas you want it to have.


Very cool, I'll have to keep that in mind next time I'm in need of something similar!



