"With a relational database the complexity is hidden"
That is my main issue. I use Cassandra over relational databases first for its linear scalability and multi-master-esque HA. But even ignoring those, I understand exactly what is being scanned and what is not; I don't have to fight with a query optimizer that makes runtime decisions based on several parameters.
I understand your point (and it’s a good one), and here is mine: unless you're working in a team with a lot of good IT people, you're likely to end up with worse performance and more problems.
For example, when I started in Big Data, in less than 3 weeks I was able to optimize some batches just because I read the documentation of the framework we used (Pig in this case) and a small part of its source code to dig deeper. And these were not tricky optimizations: I used in-memory joins and reduced the number of relations in the scripts to reduce the number of Hadoop jobs generated, which made the batches 4 times faster.
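The comment does not show the actual Pig scripts, so the following is only a minimal Python sketch of the general idea behind an in-memory (fragment-replicate) join, the technique Pig exposes as `JOIN ... USING 'replicated'`: the small relation is loaded into a hash table and the large relation is streamed past it once, so no shuffle or sort is needed. The function name `replicated_join` and the sample `clicks`/`users` data are hypothetical, and the sketch assumes the small side fits in memory.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def replicated_join(large: Iterable[tuple], small: Iterable[tuple],
                    large_key: int = 0, small_key: int = 0) -> Iterator[tuple]:
    """Inner-join `large` with `small`, holding only `small` in memory (hypothetical helper)."""
    # Build phase: index the small relation by its join key (assumed to fit in memory).
    index = defaultdict(list)
    for row in small:
        index[row[small_key]].append(row)
    # Probe phase: stream the large relation once; .get() avoids inserting empty keys.
    for row in large:
        for match in index.get(row[large_key], ()):
            yield row + match


# Hypothetical sample data: a large "clicks" relation and a small "users" lookup table.
clicks = [("u1", "home"), ("u2", "cart"), ("u1", "checkout"), ("u3", "home")]
users = [("u1", "FR"), ("u2", "DE")]

# "u3" has no match in users, so it is dropped (inner-join semantics).
print(list(replicated_join(clicks, users)))
```

Because each large-side row only needs a hash lookup, a join like this can run entirely on the map side; the trade-off is that it fails or degrades badly if the "small" relation turns out not to fit in memory.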
We often have problems with our HBase database because it’s frequently overloaded (I’m not an IT operator, so I can’t give more details), and no one really masters this database even though it has been in production since 2014.
I do understand that in some cases a NoSQL database is mandatory, and like you, I like to understand what I’m doing. But:
- I’m not working in Silicon Valley
- Most of my co-workers are not geeks (and I respect that)
- It's VERY hard to find guys with real Big Data or NoSQL skills (this comes from a French technical recruiter)
So, while the geek part of me loves Big Data and NoSQL, the rational part prefers well-known technologies. If NoSQL and Big Data become mainstream and better known, then the rational part will love them too.
While I agree with a couple of your premises, I don't think they all apply to Cassandra as much, and they paint with too broad a brush.
I don't believe NoSQL means no validation. In fact, I've found that things like Cassandra's CQL actively prevent me from running expensive queries unless I opt in (e.g. ALLOW FILTERING). Validation is DB-specific, but I don't think it's fair to say it's a footgun in NoSQL any more than in SQL databases.
As for choosing what is known in the job market, I personally don't choose technologies that way (though I do choose based on maturity, of course). I look less for existing skills than for the ability to learn new ones, but I understand that can be a pipe dream when you're hiring juniors.
Those optimizers you are fighting with have thousands of man-hours of research behind them. For every silly choice they make, they make hundreds or thousands of correct ones.