I swear IOPS/request latency is one of the least understood things at these huge companies. "But we have 40 bazillion GB and a 100Gb LAN." Cool story bro: your disk queue latency is pegged at your depth limit and your fs response latency is over a second, and everything is going to suck until you deal with that.
I have had customers say this when they weren't getting the throughput they wanted cross-continent. "But the pipes! They are fat!" Yes, but your window also needs to scale... six-figure network engineers who don't know about window scaling and can't analyze a packet trace.
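For reference, the arithmetic behind that: TCP can only have one window of unacknowledged data in flight per round trip, and without window scaling (RFC 7323) the receive window tops out at 64 KiB. A quick back-of-the-envelope sketch, with an RTT and link speed that are purely illustrative assumptions:

```python
# Throughput is bounded by window / RTT; link speed only matters once the
# window is big enough to keep the pipe full. Numbers are made-up examples.
window_bytes = 64 * 1024                  # max TCP receive window without window scaling
rtt_s = 0.100                             # assumed cross-continent round trip

max_throughput = window_bytes * 8 / rtt_s
print(f"unscaled window ceiling: {max_throughput / 1e6:.1f} Mbit/s")   # ~5.2 Mbit/s

# Window needed to actually fill a 10 Gbit/s path at that RTT (bandwidth-delay product):
bdp_bytes = 10e9 / 8 * rtt_s
print(f"window needed for 10 Gbit/s: {bdp_bytes / 1e6:.0f} MB")        # 125 MB
```

So on that assumed 100 ms path, an unscaled window caps you around 5 Mbit/s no matter how fat the pipe is.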
I recently had someone suggest that they needed 1ms latency cross-continent. I explained patiently that the laws of physics would have to change for them to hit that number.
Oh, this is too common! I can't even count how many times I've had to blame Einstein for setting the speed of light too low in his law. (Nitpick: I know he's the wrong person to blame, but he's famous enough that everybody gets the joke.)
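For anyone who wants the numbers behind the joke: light in fiber travels at roughly 200,000 km/s, about two thirds of c. A quick sketch, with a 4,000 km path as a purely illustrative assumption rather than any real route:

```python
# Light in fiber covers roughly 200,000 km/s (about 2/3 of c in vacuum).
SPEED_IN_FIBER_KM_PER_S = 200_000

distance_km = 4_000                       # illustrative cross-continent path
one_way_ms = distance_km / SPEED_IN_FIBER_KM_PER_S * 1000
print(f"one-way: {one_way_ms:.0f} ms, round trip: {2 * one_way_ms:.0f} ms")  # 20 ms / 40 ms

# Turn it around: a 1 ms round-trip budget buys about this much fiber, one way:
reach_km = SPEED_IN_FIBER_KM_PER_S * 0.001 / 2
print(f"1 ms RTT reaches ~{reach_km:.0f} km")                                # ~100 km
```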
Once I had to write a root cause analysis report and it basically explained how TCP works.
Truth. I literally have a signal on one of my monitoring dashboards which indicates "database is currently undergoing online backup", because it is the single most important performance signal I have. This doesn't come to me as a signal from the database team; I have to poll the database for it myself.
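For the curious, the poll itself is nothing fancy. A minimal sketch, assuming PostgreSQL 13+, where pg_stat_progress_basebackup lists any base backup currently streaming (logical pg_dump-style backups won't show up there, and the DSN and bare 0/1 gauge are placeholders for whatever your monitoring stack actually ingests):

```python
# Poll PostgreSQL for "is a base backup running right now?" and emit a 0/1
# gauge a dashboard can scrape. Assumes PostgreSQL 13+; DSN is a placeholder.
import psycopg2

def backup_in_progress(dsn: str) -> bool:
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute("SELECT count(*) FROM pg_stat_progress_basebackup")
        (running,) = cur.fetchone()
        return running > 0

if __name__ == "__main__":
    print(1 if backup_in_progress("dbname=app host=db.example.internal") else 0)
```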
Noisy neighbor in inadequately-isolated, shared-tenancy models is just the worst.
> your disk queue latency is pegged at your depth limit and your fs response latency is over a second
You said words. And I totally, 100% understand them. But, like, for the plebs that are totally not me, could you elaborate on what you're talking about here? I^WThey would like to understand.
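Sure. The short version: every block device has a bounded request queue, and once it fills, new I/Os sit and wait their turn, so the time a request spends queued (the "disk queue latency") is pinned at whatever the queue depth allows, and every filesystem call above it inherits that wait. You can watch it with iostat -x (aqu-sz and await), or with a rough sketch like this (Linux only, and the device name "sda" is just an example):

```python
# Sample /proc/diskstats twice and derive the same numbers iostat -x reports:
# average queue depth ("aqu-sz") and average per-I/O latency ("await").
import time

def read_stats(dev):
    with open("/proc/diskstats") as f:
        for line in f:
            fields = line.split()
            if fields[2] == dev:
                ios = int(fields[3]) + int(fields[7])      # reads + writes completed
                io_ms = int(fields[6]) + int(fields[10])   # ms spent servicing reads + writes
                weighted_ms = int(fields[13])              # time-weighted ms with I/O queued or in flight
                return ios, io_ms, weighted_ms
    raise ValueError(f"device {dev!r} not found in /proc/diskstats")

dev, interval = "sda", 1.0                                 # device name is just an example
ios0, io_ms0, w0 = read_stats(dev)
time.sleep(interval)
ios1, io_ms1, w1 = read_stats(dev)

d_ios = ios1 - ios0
avg_queue_depth = (w1 - w0) / (interval * 1000)                # aqu-sz
avg_latency_ms = (io_ms1 - io_ms0) / d_ios if d_ios else 0.0   # await

print(f"avg queue depth: {avg_queue_depth:.1f}")
print(f"avg I/O latency: {avg_latency_ms:.1f} ms")
```

If the queue depth number sits at your device's cap and the latency number is in the hundreds of milliseconds or worse, you're in the situation the parent comment describes.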