The article says that SQLite may not be a good choice for large datasets. Assuming the other constraints (concurrent writes) are not a concern, why not? Assume a 750GB-1TB dataset of time series data, which fits on an SSD — what system would work better than SQLite?
It's not that SQLite is slower; it's a question of machine resources. When you use a database cluster spanning multiple machines you are just throwing more hardware at the problem; there is no 'magic' per se going on in the other tools. There is also likely better indexing, which enables efficient querying.
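To the OP's time-series question: a single composite index often gets SQLite most of the way there. A minimal sketch using Python's built-in sqlite3 module (the schema and names here are made up for illustration, not from the article):

```python
import sqlite3

# Hypothetical time-series schema; sensor_id/ts/value are illustrative names.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE readings (
        sensor_id INTEGER NOT NULL,
        ts        INTEGER NOT NULL,  -- unix epoch seconds
        value     REAL    NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [(s, t, float(s * t)) for s in range(3) for t in range(1000)],
)

# A composite (sensor_id, ts) index lets a range query over one sensor's
# time window touch only the relevant pages instead of scanning the table.
conn.execute("CREATE INDEX idx_readings ON readings (sensor_id, ts)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT avg(value) FROM readings "
    "WHERE sensor_id = 1 AND ts BETWEEN 100 AND 200"
).fetchall()
print(plan)  # the plan detail should mention idx_readings, i.e. an index search
```

On a real 750GB-1TB file the same plan applies: with the index, a window query reads only the pages for that sensor's range rather than the whole database.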
The whole database is in one file, though, and while 140TB may be the file size limit, searching through an index over 140TB of data will still be a lot slower. That is the case even with client/server models, and even with Hadoop.
Most people claiming to have a big data problem actually don't have one. It's just a poor understanding of SQL, coupled with NoSQL fashion, that pushes people to opt out of SQL. One more problem with SQL is that it's a career path in itself. There is a whole industry built around it: database design, administration, etc. People who find this a high barrier to entry take the easy way out and choose NoSQL-based tools, hoping they will act as a panacea, only to re-implement SQL badly at some point in their stack. But SQL has other advantages: it teaches you to think about efficient representation of data, which in turn leads to an overall better design of everything that connects to it.
For most of your everyday so-called 'big data' problems, SQLite will work like a charm. This covers most of the shops that claim to be doing big data work.
For the real big data problems, well, SQLite wasn't designed for them anyway.
> An SQLite database is limited in size to 140 terabytes (2^47 bytes, 128 tebibytes). And even if it could handle larger databases, SQLite stores the entire database in a single disk file and many filesystems limit the maximum size of files to something less than this. So if you are contemplating databases of this magnitude, you would do well to consider using a client/server database engine that spreads its content across multiple disk files, and perhaps across multiple volumes.
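For what it's worth, the quoted 140 TB figure falls out of two hard limits: a maximum page size of 65536 bytes, and a page count that (at the time of that quote) had to fit in a signed 32-bit integer. A quick sanity check of the arithmetic (the constants are assumptions based on that quote; newer SQLite versions raised the page-count limit, which is why current docs cite a larger figure):

```python
# Assumed historic SQLite limits behind the quoted 140 TB figure.
MAX_PAGE_SIZE = 65536        # 2**16 bytes, the largest allowed page size
MAX_PAGE_COUNT = 2**31 - 1   # page numbers fit in a signed 32-bit int

max_bytes = MAX_PAGE_SIZE * MAX_PAGE_COUNT
print(max_bytes)             # just under 2**47 bytes
print(max_bytes // 10**12)   # terabytes: 140
print(max_bytes / 2**40)     # tebibytes: just under 128
```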