sfg75's comments | Hacker News

> Plus you get 11x9's of availability (99.999999999% availability).

Note that those 9s are for durability, not availability.

From https://aws.amazon.com/s3/faqs/#How_durable_is_Amazon_S3:

"This durability level corresponds to an average annual expected loss of 0.000000001% of objects. For example, if you store 10,000,000 objects with Amazon S3, you can on average expect to incur a loss of a single object once every 10,000 years"
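The arithmetic in that quote checks out; a quick back-of-the-envelope in Python:

```python
# Sanity check of the S3 durability figure quoted above.
annual_loss_rate = 1e-11          # 0.000000001% expressed as a fraction
objects = 10_000_000

expected_losses_per_year = objects * annual_loss_rate   # ~0.0001 objects/year
years_per_lost_object = 1 / expected_losses_per_year

print(expected_losses_per_year)   # ~0.0001
print(years_per_lost_object)      # ~10000 -> one object every ~10,000 years
```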


That's a fair point. Indeed we started looking at doing aggregations across raw events, before realizing this was probably ill-fated.

It's very possible we could have done the same with Redshift, but it wasn't obvious how. With Citus offering extensions like topn and hll, however, we quickly saw how that could work for us.

Thanks for the link btw!


Yeah, that's a good point. Redshift doesn't have the same level of probabilistic counting functions that can be used from rollups. Redshift does have HLL-based approximation (SELECT APPROXIMATE COUNT(DISTINCT ...)), but that can only be applied when scanning the full data; I'm not sure it's possible to store an HLL object in a rollup and later aggregate them.
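For what it's worth, the reason HLL objects can be stored in rollups and aggregated later is that merging two sketches is just an element-wise max of their registers. A toy Python sketch to illustrate the idea (not the Citus hll extension's implementation, and skipping its bias corrections):

```python
import hashlib

P = 12        # precision: 2**12 = 4096 registers
M = 1 << P

def _hash64(item):
    """Stable 64-bit hash of an item."""
    digest = hashlib.sha1(str(item).encode()).digest()
    return int.from_bytes(digest[:8], "big")

class HLL:
    """Toy HyperLogLog sketch, just enough to show that sketches
    built per rollup bucket can later be merged losslessly."""

    def __init__(self):
        self.registers = [0] * M

    def add(self, item):
        x = _hash64(item)
        idx = x >> (64 - P)                # first P bits pick a register
        rest = x & ((1 << (64 - P)) - 1)   # remaining 52 bits
        # rank = 1-based position of the leftmost 1-bit in `rest`
        rank = (64 - P) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other):
        # Merging is just an element-wise max -- this is why HLL
        # objects stored per time bucket can be rolled up later.
        self.registers = [max(a, b)
                          for a, b in zip(self.registers, other.registers)]

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / M)
        z = sum(2.0 ** -r for r in self.registers)
        return alpha * M * M / z

# Per-day sketches, merged, estimate the union of both days:
day1, day2 = HLL(), HLL()
for i in range(60_000):
    day1.add(i)
for i in range(40_000, 100_000):
    day2.add(i)
day1.merge(day2)
print(round(day1.estimate()))  # ~100,000 distinct values across both days
```

Because merge is a max, merging per-day sketches gives exactly the same sketch as building one over the combined raw data, which is the property the rollup approach relies on.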


Pretty much automatic. With the exception of our search engine, which is in C++ (as performance is paramount there), Go is becoming our language of choice for most of our backend services. We found that Go strikes a great balance between productivity and performance.

After building the aggregation and ingestion services in Go, sticking with this language for the API sounded like a good idea as well, since Go makes it trivial to build an HTTP server and the logic of the API is simple enough that we didn’t see the need for any web framework.


Thanks! I asked because I’m looking at the same problem, though at a smaller scale than you guys.

I decided to build our personalization API using Python’s Flask. It worked great at the start because it helped us move quickly when adding new features. Six months later, we have more clients and hence more traffic, and response times have gone up significantly. I ran a benchmark with the Apache Benchmark tool (a simple DB query returning the result as JSON) and found my Go implementation to be on the order of 20-25x faster than Falcon, which is meant to be a lot faster than Flask.

Decided to just go ahead and implement the most performance sensitive parts of the API in Go.


Hey, sorry if that wasn't clear enough (author here).

We decided not to go with ClickHouse because we were mostly looking for a SaaS solution. That's also why we didn't spend much time on Druid.

Choosing Citus meant we could leverage a technology we already had a bit of experience with (Postgres) and not have to really care about the infrastructure underneath it. We're still a fairly small team, and those are meaningful factors to us.

At the end of the day I'm sure any of those systems (ClickHouse, Druid) would do the job fine; we just went for what seemed the easiest to implement and scale.


That makes sense. If you do ever want to check out ClickHouse and want someone to run it for you, Percona or Altinity [1] can probably help. Not affiliated with either, I just read their ClickHouse-related content.

[1] https://www.altinity.com


We've been using this extension for a while now at Algolia, great to see that it's now open sourced!

We heavily rely on this to power our analytics API. We use it to precompute tops for billions of daily events. We can then fetch tops across specific time ranges on the fly, usually in a matter of milliseconds. This was a game changer for us :)
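Roughly, the trick is that each rollup bucket stores only its own top entries, and the top for an arbitrary time range is built by merging those small lists instead of rescanning raw events. A simplified Python analogy (exact counters truncated to top-k; the actual topn extension uses a different algorithm):

```python
from collections import Counter

K = 3  # keep only the top-K entries per rollup bucket

def truncate(counter, k=K):
    """Keep only the k most frequent entries -- what a rollup bucket stores.
    Dropping the tail is what makes this approximate."""
    return Counter(dict(counter.most_common(k)))

def merge_tops(buckets, k=K):
    """Combine per-bucket tops into a top for the whole time range.
    Adding Counters sums counts for keys present in several buckets."""
    total = Counter()
    for bucket in buckets:
        total += bucket
    return total.most_common(k)

# One pre-aggregated bucket per hour, say:
hour1 = truncate(Counter({"query_a": 50, "query_b": 30, "query_c": 20, "query_d": 1}))
hour2 = truncate(Counter({"query_b": 60, "query_a": 10, "query_e": 5, "query_f": 1}))

print(merge_tops([hour1, hour2]))
# -> [('query_b', 90), ('query_a', 60), ('query_c', 20)]
```

The merge only ever touches a handful of small pre-aggregated lists per bucket, which is why fetching tops over a range can stay in the millisecond range regardless of how many raw events went into the rollups.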


Great article. We use Citus and have a very similar approach :)

The article mentions HLL, but there are even more useful extensions (e.g. topn, to handle tops through the jsonb format).

