(Admitting bias: I've only ever worked with postgres in production with update-h...

srcreigh · on May 12, 2023

Interesting. PG docs don’t clarify whether visibility map gets updated for HOT update. Maybe even HOT update spoils index only scans. Although I can’t see why-no new index entries, heap visibility status hasn’t changed for any indexes.. wish to find some answers here but I could not.

Wrt secondary indexes, yes and no. There is a cost to traverse a B-tree for point lookups. Also, foreign keys may now be composite keys if primary key is composite as in the Dropbox example.

If the secondary index is very different from the primary, it will be more expensive. However it’s pretty common to at least use a “user_id” as the first part of the primary key. This will make partial full scans a lot faster for queries regarding a single user; only need to scan that users data, and it comes at a 1-2 order of magnitude cheaper disk read cost. So you’d need a secondary index only if the data you need is spread across 1000s of pages (megabytes of data for a single user in one table) and you’re looking for only a handful of rows randomly located in that sequence.

Twitter is a characteristic case where you need many different clustered sets for the same data (tweets) to power different peoples feeds. I believe twitter just stores many copies of tweets in different clusters in Redis- basically the same as having a (author_id, ts) primary key tweets table and a (follower_id, ts) primary key feed table, both having tweet data inlined. If one clustered table isn’t enough, use two.