That changed slightly with Ryzen: AMD closed the gap on single-threaded IPC (close enough, anyway) but the new issue with Zen 1 and Zen+ was memory/cache/inter-CCX latencies. Zen+ solved most of the memory latency issues but hadn't fixed cache/CCX latencies much.
Supposedly Zen 2 solved most of that. (And some game benchmarks like CSGO suggest they really did) We'll see how it actually pans out since there's still the issue of inter-CCX latency (and now even cross-chiplet latency).
It doesn't solve all of it however. If your program has more than "$number_of_cores / 2" threads, you'll cross the CCX boundary at some point(s). On Zen 2, that instead changes to "$number_of_cores / 4" (CCX boundary) or "$number_of_cores / 2" (chiplet boundary).
Inter-CCX communication requires hopping over the Infinity Fabric bus, which (in case of Zen 1, no newer benchmarks) increases thread latency from ~45us to ~131us. I'm sure it was reduced in Zen+ and is probably closer to 100us by now. However, I'm not sure if inter-chiplet communication will be the same (e.g.: has its own IF bus) or worse (IO chip overhead).
Hopefully someone runs the same inter-thread communication benchmarks on Zen 2.
Supposedly Zen 2 solved most of that. (And some game benchmarks like CSGO suggest they really did) We'll see how it actually pans out since there's still the issue of inter-CCX latency (and now even cross-chiplet latency).