
>You could just use sqlite with :memory: for the Raft FSM

That's the basic design that rqlite[1] had for its first ~7 years. :-) But rqlite moved to on-disk SQLite, since with WAL mode, and with 'PRAGMA synchronous=OFF' [2], it is about as fast as writing to RAM. Or at least close enough, and I avoid all the limitations that come with :memory: SQLite databases (max size of 2GB being one). I should have just used on-disk mode from the start, but only now know better.

(I'm guessing you may know some of this because rqlite uses the same Raft library [3] as Nomad.)

As for the upgrade issue you mention, yes, it's real. Do you find it in the field much with Nomad? I've managed to introduce new Raft Entry types very infrequently during rqlite's 10 years of development, and only once did someone hit it in the field with rqlite. Of course, one way to deal with it is to first release a version of one's software that understands the new types but never writes them. Then, once that version is fully deployed, upgrade to the version that actually writes the new types too. I've never bothered to do this in practise however, and it requires discipline on the part of the end-users too.
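
A minimal sketch of that two-step rollout, in Go since both projects are written in it. Every name here (`EntryType`, `Release`, `MaxRead`, `MaxWrite`) is hypothetical and is not taken from rqlite or Nomad; the only point is that the set of types a build can read is widened one release before the set it is allowed to write.

```go
package main

import (
	"errors"
	"fmt"
)

// EntryType tags each replicated log entry (hypothetical, for illustration).
type EntryType uint8

const (
	EntryExecute   EntryType = iota // understood by every release
	EntryChangeset                  // the newly introduced type
)

// ErrUnknownEntry is what an old node reports when it meets a type it predates.
var ErrUnknownEntry = errors.New("unknown entry type")

// Release describes what one build can read versus what it will ever write.
type Release struct {
	Name     string
	MaxRead  EntryType // highest type Apply understands
	MaxWrite EntryType // highest type this build appends to the log
}

// Apply rejects entry types newer than this release understands.
func (r Release) Apply(t EntryType) error {
	if t > r.MaxRead {
		return fmt.Errorf("%s: %w %d", r.Name, ErrUnknownEntry, t)
	}
	return nil
}

// Encode returns the type this release will actually write, falling back
// to the older type when it is not yet allowed to write the new one.
func (r Release) Encode(want EntryType) EntryType {
	if want > r.MaxWrite {
		return r.MaxWrite
	}
	return want
}

func main() {
	v1 := Release{Name: "v1", MaxRead: EntryExecute, MaxWrite: EntryExecute}
	v2 := Release{Name: "v2", MaxRead: EntryChangeset, MaxWrite: EntryExecute}
	v3 := Release{Name: "v3", MaxRead: EntryChangeset, MaxWrite: EntryChangeset}

	// Skipping the intermediate release: a v3 leader breaks a v1 follower.
	fmt.Println("v3 -> v1:", v1.Apply(v3.Encode(EntryChangeset)))
	// Staged rollout: each adjacent pair of releases can coexist.
	fmt.Println("v2 -> v1:", v1.Apply(v2.Encode(EntryChangeset)))
	fmt.Println("v3 -> v2:", v2.Apply(v3.Encode(EntryChangeset)))
}
```

The scheme only holds if every node reaches the intermediate release before any node runs the final one, which is where the end-user discipline comes in.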

[1] https://www.rqlite.io

[2] This might sound dangerous, but in the current design of rqlite the underlying SQLite database is completely rebuilt from the Raft log on startup (and the Raft log is fsync'ed on every write). So any corruption of the SQLite database due to power loss, etc. is moot, since the SQLite database is not the authoritative store of data in rqlite.

[3] https://github.com/hashicorp/raft
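
To make the reasoning in [2] concrete, here is a toy sketch of a derived store being rebuilt from an authoritative log. It is not rqlite's code: `Entry`, `State`, and `Rebuild` are hypothetical names, and a Go map stands in for the SQLite database. It assumes the log is durable and replayed in order.

```go
package main

import "fmt"

// Entry is a simplified log record standing in for a replicated statement.
type Entry struct {
	Key, Value string
}

// State stands in for the derived database: fast to write, but disposable.
type State map[string]string

// Rebuild ignores whatever derived state existed before the restart and
// replays the durable log from the beginning, in order.
func Rebuild(log []Entry) State {
	s := make(State)
	for _, e := range log {
		s[e.Key] = e.Value
	}
	return s
}

func main() {
	log := []Entry{{"a", "1"}, {"b", "2"}, {"a", "3"}}
	s := Rebuild(log)
	fmt.Println(s["a"], s["b"])
}
```

Because the derived state is thrown away and recomputed, a torn or corrupt copy of it after a crash cannot affect correctness; only the log needs to be written durably.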



> I should have just used on-disk mode from the start, but only now know better.

Yeah, I saw the recent post about reducing rqlite disk space usage. Using the on-disk sqlite as both the FSM and the Raft snapshot makes a lot of sense here. I'm curious whether you've had concerns about write amplification though? Because we have only the periodic Raft snapshots and the FSM is in-memory, during high write volumes we're only really hammering disk with the Raft logs.

> Do you find it in the field much with Nomad? I've managed to introduce new Raft Entry types very infrequently during rqlite's 10 years of development, and only once did someone hit it in the field with rqlite.

My understanding is that rqlite Raft entries are mostly SQL statements (is that right?). Where Nomad is somewhat different (and probably closer to the OP) is that the Raft entries are application-level entries. For entries that are commands like "stop this job"[0] upgrades are simple.

The tricky entries are where the entry is "upsert this large deeply-nested object that I've serialized", like the Job or Node (where the workloads run). The typical bug here is you've added a field way down in the guts of one of these objects that's a pointer to a new struct. When old versions deserialize the message they ignore the new field and that's easy to reason about. But if the leader is still on an old version and the new code deserializes the old object (or your new code is just reading in the Raft snapshot on startup), you need to make sure you're not missing any nil pointer checks. Without sum types enforced at compile time (i.e. Option/Maybe), we have to catch all these via code review and a lot of tedious upgrade testing.
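
A small sketch of that failure mode and one way to guard against it. The types below are hypothetical and far simpler than Nomad's real Job or Node structs, and `encoding/json` is used only as a stand-in for whatever encoding the log really uses. The idea shown is a single normalization step run after every decode, so that later code does not need a nil check at each use site.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Limits is the struct introduced by the new version (hypothetical).
type Limits struct {
	MaxRetries int
}

// Task gained a pointer field that payloads from old writers never contain.
type Task struct {
	Name   string
	Limits *Limits
}

// Job is the large nested object that gets serialized into a log entry.
type Job struct {
	ID   string
	Task Task
}

// normalize fills in defaults for fields that old writers did not know about.
func (j *Job) normalize() {
	if j.Task.Limits == nil {
		j.Task.Limits = &Limits{MaxRetries: 3} // assumed default
	}
}

// decodeJob is the one place where entries and snapshots get deserialized.
func decodeJob(data []byte) (*Job, error) {
	var j Job
	if err := json.Unmarshal(data, &j); err != nil {
		return nil, err
	}
	j.normalize()
	return &j, nil
}

func main() {
	// Payload as an old leader would have written it: no Limits field.
	old := []byte(`{"ID":"web","Task":{"Name":"server"}}`)
	job, err := decodeJob(old)
	if err != nil {
		panic(err)
	}
	// Without normalize, this dereference would panic on old data.
	fmt.Println(job.Task.Limits.MaxRetries)
}
```

This moves the burden from every reader to one function, but the compiler still cannot prove the field is non-nil, so the upgrade testing remains necessary.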

> it requires discipline on the part of the end-users too.

Oh for sure. Nomad runs into some commercial realities here around how much discipline we can demand from end-users. =)

[0] https://github.com/hashicorp/nomad/blob/v1.8.2/nomad/fsm.go#...

[1] https://github.com/hashicorp/nomad/blob/v1.8.2/nomad/fsm.go#...


>I'm curious whether you've had concerns about write amplification though?

I mean, yes, the more disk I/O rqlite has to do, the more write performance will be affected. However, I believe the advantages of running with an on-disk SQLite database are worth it. In addition, rqlite supports storing the SQLite database file on a memory-backed file system if users really want that[1]. That can help squeeze more write throughput out of rqlite.

>My understanding is that rqlite Raft entries are mostly SQL statements (is that right?).

That's right, rqlite does statement-based replication, though I'm currently looking into extending it so it also does changeset[2] replication where it makes sense.

[1] https://rqlite.io/docs/guides/performance/#use-a-memory-back...

[2] https://www.sqlite.org/sessionintro.html



